


Keywords: healthcare data, tabular data, data anonymization, privacy models, k-anonymity, l-diversity, t-closeness, data anonymization techniques, differential privacy


Privacy and personal data protection have become pressing concerns, especially in healthcare, where large volumes of patient data are increasingly used for research. Data protection laws regulate the use of personal data and require anonymization to minimize the risk of identifying individuals. Anonymization is a process that makes it possible to use sensitive data with a reduced risk of disclosing personal information while preserving its utility. This article discusses the main privacy models and anonymization techniques used to protect tabular healthcare data. The privacy models considered are k-anonymity, l-diversity, and t-closeness. The k-anonymity model ensures that every combination of quasi-identifier values present in the dataset is shared by at least k records, so each record is indistinguishable from at least k − 1 others. The l-diversity model complements k-anonymity by requiring each equivalence class to contain at least l distinct, well-represented values of the sensitive attribute (SA). The t-closeness model also takes the distribution of sensitive attribute values into account, requiring that the distance between the SA distribution within each equivalence class and its distribution in the whole dataset not exceed a specified threshold t. The anonymization techniques considered are generalization, suppression, relocation, permutation, perturbation, slicing, differential privacy, and synthetic data generation. Generalization reduces the precision of quasi-identifier values, for example by replacing an exact age with an age range. Suppression removes individual values or entire records, typically outliers, so that the privacy model can be satisfied with less generalization. Relocation changes a limited number of values in the data to strengthen protection. Permutation shuffles quasi-identifier values between records, breaking the link between them while preserving the distribution of each attribute. Slicing partitions the attributes into groups of correlated columns and the records into buckets, then randomly permutes the column groups within each bucket. Perturbation adds random noise to the data values to increase privacy. Differential privacy also relies on added noise, but the noise is calibrated to the query and applied to query results at the query processing stage. Synthetic data generation creates new datasets whose statistical characteristics resemble those of the original data.
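To make the three privacy models concrete, the following Python sketch checks them on a toy table. The records, the attribute names (`age`, `zip`, `diagnosis`), and all helper function names are hypothetical and serve only as an illustration. l-diversity is implemented in its simplest "distinct" form, and t-closeness assumes a categorical sensitive attribute with equal ground distance between all values, in which case the Earth Mover's Distance reduces to the total variation distance; a numeric attribute would need an ordered distance instead.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def equivalence_classes(
    records: Sequence[Record], qi: Sequence[str]
) -> Dict[Tuple[str, ...], List[Record]]:
    """Group records that share the same combination of quasi-identifier values."""
    classes: Dict[Tuple[str, ...], List[Record]] = defaultdict(list)
    for r in records:
        classes[tuple(r[a] for a in qi)].append(r)
    return dict(classes)


def is_k_anonymous(records: Sequence[Record], qi: Sequence[str], k: int) -> bool:
    """k-anonymity: every equivalence class contains at least k records."""
    return all(len(g) >= k for g in equivalence_classes(records, qi).values())


def is_distinct_l_diverse(
    records: Sequence[Record], qi: Sequence[str], sa: str, ell: int
) -> bool:
    """Distinct l-diversity: every class holds at least `ell` distinct SA values."""
    return all(
        len({r[sa] for r in g}) >= ell
        for g in equivalence_classes(records, qi).values()
    )


def _distribution(values: Sequence[str]) -> Dict[str, float]:
    """Relative frequency of each value."""
    n = len(values)
    return {v: c / n for v, c in Counter(values).items()}


def max_class_distance(records: Sequence[Record], qi: Sequence[str], sa: str) -> float:
    """Largest distance between a class's SA distribution and the overall one.

    Assumes a categorical SA with equal ground distance, where the Earth
    Mover's Distance equals the total variation distance 0.5 * sum |p - q|.
    """
    overall = _distribution([r[sa] for r in records])
    worst = 0.0
    for g in equivalence_classes(records, qi).values():
        local = _distribution([r[sa] for r in g])
        # Every value seen in a class also occurs in the whole table.
        d = 0.5 * sum(abs(local.get(v, 0.0) - p) for v, p in overall.items())
        worst = max(worst, d)
    return worst


def is_t_close(records: Sequence[Record], qi: Sequence[str], sa: str, t: float) -> bool:
    """t-closeness: no class distribution is farther than t from the overall one."""
    return max_class_distance(records, qi, sa) <= t


# Hypothetical, already generalized toy table.
table = [
    {"age": "30-39", "zip": "612**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "612**", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "612**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "613**", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "613**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "613**", "diagnosis": "asthma"},
]
QI = ["age", "zip"]

print(is_k_anonymous(table, QI, k=3))                      # True
print(is_distinct_l_diverse(table, QI, "diagnosis", ell=2))  # True
print(round(max_class_distance(table, QI, "diagnosis"), 3))  # 0.167
```

In this toy table both equivalence classes have three records, so the table is 3-anonymous but not 4-anonymous, and the largest class-to-table distance is 1/6, so it satisfies t-closeness for t = 0.2 but not for t = 0.1.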

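Several of the anonymization techniques can likewise be sketched in a few lines of Python. The example below is illustrative and not taken from the article: it uses hypothetical records and helper names, arbitrary generalization parameters (age bands of width 10, ZIP codes truncated to three digits), suppression of records in equivalence classes smaller than k, permutation of a single quasi-identifier column, and the Laplace mechanism, which is one standard way to achieve differential privacy for a counting query. Relocation, slicing, and synthetic data generation are not shown.

```python
import random
from collections import Counter
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]


def generalize_age(age: int, width: int = 10) -> str:
    """Generalization: replace an exact age with a fixed-width interval."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code: str, keep: int = 3) -> str:
    """Generalization: keep the first `keep` digits and mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def suppress_small_classes(
    records: Sequence[Record], qi: Sequence[str], k: int
) -> List[Record]:
    """Suppression: drop records whose quasi-identifier combination occurs < k times."""
    counts = Counter(tuple(r[a] for a in qi) for r in records)
    return [r for r in records if counts[tuple(r[a] for a in qi)] >= k]


def permute_attribute(
    records: Sequence[Record], attr: str, rng: random.Random
) -> List[Record]:
    """Permutation: shuffle one column across records.

    The column's multiset of values (its marginal distribution) is unchanged,
    but each value is detached from the rest of its original record.
    """
    values = [r[attr] for r in records]
    rng.shuffle(values)
    return [{**r, attr: v} for r, v in zip(records, values)]


def laplace_count(
    true_count: int, epsilon: float, rng: random.Random, sensitivity: float = 1.0
) -> float:
    """Laplace mechanism for a counting query (epsilon-differential privacy).

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is sensitivity / epsilon.
    """
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise


# Hypothetical raw records.
raw = [
    {"age": 34, "zip": "61201", "diagnosis": "flu"},
    {"age": 37, "zip": "61245", "diagnosis": "asthma"},
    {"age": 31, "zip": "61230", "diagnosis": "flu"},
    {"age": 45, "zip": "61310", "diagnosis": "diabetes"},
    {"age": 52, "zip": "61399", "diagnosis": "flu"},
]
QI = ["age", "zip"]

generalized = [
    {**r, "age": generalize_age(r["age"]), "zip": generalize_zip(r["zip"])}
    for r in raw
]
# Only the ("30-39", "612**") class has at least 3 records, so 3 records remain.
released = suppress_small_classes(generalized, QI, k=3)

rng = random.Random(0)  # fixed seed only to make the example reproducible
shuffled = permute_attribute(raw, "age", rng)
flu_count = sum(r["diagnosis"] == "flu" for r in released)
noisy_flu = laplace_count(flu_count, epsilon=1.0, rng=rng)
```

Smaller values of epsilon produce a larger noise scale and therefore stronger privacy at the cost of less accurate query answers.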
Author Biographies

Denys Kalinin, National Technical University "Kharkiv Polytechnic Institute"

Postgraduate student, Department of System Analysis and Information-Analytical Technologies, National Technical University "Kharkiv Polytechnic Institute", Kharkiv, Ukraine

Valerii Severyn, National Technical University "Kharkiv Polytechnic Institute"

Doctor of Technical Sciences, Professor, Professor of the Department of System Analysis and Information-Analytical Technologies, National Technical University "Kharkiv Polytechnic Institute", Kharkiv, Ukraine

Mykola Bezmenov, National Technical University "Kharkiv Polytechnic Institute"

Candidate of Technical Sciences (PhD), Docent, Professor of the Department of System Analysis and Information-Analytical Technologies, National Technical University "Kharkiv Polytechnic Institute", Kharkiv, Ukraine


