Feature Based Data Anonymization for High Dimensional Data

Gachanga, Esther; Kimwele, Michael; Nderu, Lawrence

Authors: Esther Gachanga, Michael Kimwele, Lawrence Nderu
Publication date: 28 April 2019
Publisher: The International Institute for Science, Technology and Education (IISTE)

Abstract

Information surges and advances in machine learning tools have enabled the collection and storage of large amounts of data. These data are high-dimensional. Individuals are deeply concerned about the consequences of sharing and publishing these data, as they may contain personal information and may compromise privacy. Anonymization techniques have been used widely to protect sensitive information in published datasets. However, anonymizing high-dimensional data while balancing privacy and utility is a challenge. In this paper we use feature selection with information gain and ranking to demonstrate that the challenge of high dimensionality can be addressed by applying more anonymization to attributes with less relevant features. We conduct experiments with real-life datasets and build classifiers with the anonymized datasets. Our results show that by combining feature selection with slicing and reducing the amount of data distortion for features with high relevance in a dataset, the utility of the anonymized dataset can be enhanced.

Keywords: High Dimension, Privacy, Anonymization, Feature Selection, Classifier, Utility

DOI: 10.7176/JIEA/9-2-03

Publication date: April 30th 201
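
The feature-ranking step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy records, the column names (`age_group`, `zip_prefix`, `income`) and the helper functions are all hypothetical, and the slicing-based anonymization step is not shown. The sketch only computes information gain for categorical features against a class label and ranks them, so that low-ranked features could be treated as candidates for heavier anonymization.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, feature, target):
    """Entropy of the target minus its expected entropy after splitting on `feature`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def rank_features(rows, features, target):
    """Return (feature, gain) pairs sorted from most to least relevant."""
    scores = {f: information_gain(rows, f, target) for f in features}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy records; a real dataset would be loaded from file.
records = [
    {"age_group": "young", "zip_prefix": "001", "income": "low"},
    {"age_group": "young", "zip_prefix": "002", "income": "low"},
    {"age_group": "old", "zip_prefix": "001", "income": "high"},
    {"age_group": "old", "zip_prefix": "002", "income": "high"},
]

ranking = rank_features(records, ["age_group", "zip_prefix"], "income")
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy data `age_group` fully determines `income` while `zip_prefix` carries no information about it, so `age_group` ranks first. Under the approach the abstract describes, a feature like `zip_prefix` would be the one to distort more, leaving the highly relevant feature closer to its original values.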

Available Versions

International Institute for Science, Technology and Education (IISTE): E-Journals

oai:ojs.localhost:article/4734...

Last time updated on 30/10/2019