Complementing Privacy and Utility Trade-Off with Self-Organising Maps
In recent years, data-enabled technologies have intensified the rate and scale at which organisations
collect and analyse data. Data mining techniques are applied to realise the full potential
of large-scale data analysis. These techniques are highly efficient in sifting through big data to extract
hidden knowledge and assist evidence-based decisions, offering significant benefits to their adopters.
However, this capability is constrained by important legal, ethical and reputational concerns. These
concerns arise because data mining techniques can be exploited to make inferences about sensitive data, thus
posing severe threats to individuals’ privacy. Studies have shown that Privacy-Preserving Data Mining
(PPDM) can adequately address this privacy risk and permit knowledge extraction in mining processes.
Several published works in this area have utilised clustering techniques to enforce anonymisation
models on private data, which work by grouping the data into clusters using a quality measure and
generalising the data in each group separately to achieve an anonymisation threshold. However, existing
approaches do not work well with high-dimensional data, since it is difficult to develop good groupings
without incurring excessive information loss. Our work aims to complement this balancing act by
optimising utility in PPDM processes. To illustrate this, we propose a hybrid approach that combines
self-organising maps with conventional privacy-based clustering algorithms. We demonstrate through
experimental evaluation that our approach yields more utility for data mining tasks and
outperforms conventional privacy-based clustering algorithms. This approach can significantly enable
large-scale analysis of data in a privacy-preserving and trustworthy manner.
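The self-organising map component can be illustrated with a minimal sketch. The following toy 1-D SOM trainer in plain Python shows the general technique only; it is not the authors' implementation, and the data values, unit count, and decay schedule are assumptions chosen for demonstration.

```python
import math

def train_som(data, n_units, epochs=50, lr=0.5):
    """Toy 1-D self-organising map: for each sample, find the best-matching
    unit (BMU) and pull it and its grid neighbours towards the sample,
    shrinking the neighbourhood radius and learning rate over time."""
    lo, hi = min(data), max(data)
    # Deterministic linear initialisation across the data range.
    units = [lo + (hi - lo) * i / (n_units - 1) for i in range(n_units)]
    for epoch in range(epochs):
        frac = 1 - epoch / epochs
        radius = max(0.5, (n_units / 2) * frac)   # neighbourhood shrinks
        rate = lr * frac                          # learning rate decays
        for x in data:
            bmu = min(range(n_units), key=lambda i: abs(units[i] - x))
            for i in range(n_units):
                influence = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                units[i] += rate * influence * (x - units[i])
    return units

# Two units organise towards the two value clusters in the data.
units = train_som([1, 2, 3, 20, 21, 22], n_units=2)
```

On one plausible reading of the abstract, a trained map serves as a topology-preserving summary of high-dimensional records, over which a privacy-based clustering algorithm is then applied.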
Data utility and privacy protection in data publishing
Data about individuals is being increasingly collected and disseminated for purposes such as business analysis and medical research. This has raised some privacy concerns. In response, a number of techniques have been proposed which attempt to transform data prior to its release so that sensitive information about the individuals contained within it is protected. K-anonymisation is one such technique that has attracted much recent attention from the database research community. K-anonymisation works by transforming data in such a way that each record is made identical to at least k-1 other records with respect to those attributes that are likely to be used to identify individuals. This helps prevent sensitive information associated with individuals from being disclosed, as each individual is represented by at least k records in the dataset. Ideally, a k-anonymised dataset should maximise both data utility and privacy protection, i.e. it should allow intended data analytic tasks to be carried out without loss of accuracy while preventing sensitive information disclosure, but these two notions are conflicting and only a trade-off between them can be achieved in practice. The existing works, however, focus on how either the utility or the protection requirement may be satisfied, which often results in anonymised data with an unnecessarily and/or unacceptably low level of utility or protection. In this thesis, we study how to construct k-anonymous data that satisfies both data utility and privacy protection requirements. We propose new criteria to capture utility and protection requirements, and new algorithms that allow k-anonymisations with required utility/protection trade-offs or guarantees to be generated.
Our extensive experiments using both benchmarking and synthetic datasets show that our methods are efficient, can produce k-anonymised data with desired properties, and outperform the state-of-the-art methods in retaining data utility and providing privacy protection.
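The k-anonymity property defined above is straightforward to check mechanically: every combination of quasi-identifier values must appear in at least k records. A minimal sketch in Python (the column names and records are invented for illustration):

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True iff every quasi-identifier value combination occurs >= k times."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

rows = [
    {"age": "30-39", "zip": "441*", "disease": "flu"},
    {"age": "30-39", "zip": "441*", "disease": "cold"},
    {"age": "20-29", "zip": "442*", "disease": "flu"},
]
print(is_k_anonymous(rows, ["age", "zip"], 2))  # False: one group has a single record
```

Note that the sensitive attribute (here "disease") is excluded from the check; only the identifying attributes must be indistinguishable.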
Clustering-based K-anonymisation algorithms
K-anonymisation is an approach to protecting private information contained within a dataset. Many k-anonymisation methods have been proposed recently and one class of such methods are clustering-based. These methods are able to achieve high quality anonymisations and thus have great application potential. However, existing clustering-based techniques use different quality measures and employ different data grouping strategies, and their comparative quality and performance are unclear. In this paper, we present and experimentally evaluate a family of clustering-based k-anonymisation algorithms in terms of data utility, privacy protection and processing efficiency.
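The clustering-plus-generalisation recipe these algorithms share can be sketched on 1-D numeric data: greedily grow clusters of at least k records around seeds, then generalise each cluster to its value range. This is an illustrative simplification, not any specific algorithm evaluated in the paper:

```python
def greedy_k_clusters(values, k):
    """Greedily form clusters of at least k values, absorbing each seed's
    k-1 nearest neighbours (real algorithms differ in seed selection and
    in the quality measure used)."""
    remaining = sorted(values)
    clusters = []
    while len(remaining) >= 2 * k:
        seed = remaining.pop(0)
        remaining.sort(key=lambda v: abs(v - seed))
        clusters.append([seed] + remaining[:k - 1])
        remaining = remaining[k - 1:]
    if remaining:
        if clusters and len(remaining) < k:
            clusters[-1].extend(remaining)  # too small to stand alone
        else:
            clusters.append(remaining)
    return clusters

def generalise(cluster):
    """Replace each value with the cluster's range, a common generalisation."""
    lo, hi = min(cluster), max(cluster)
    return [(lo, hi)] * len(cluster)
```

Tighter clusters yield narrower ranges after generalisation, which is why the choice of quality measure and grouping strategy drives the utility of the resulting anonymisation.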
Speeding up clustering-based k-anonymisation algorithms with pre-partitioning
K-anonymisation is a technique for protecting private information contained within a dataset. Many k-anonymisation algorithms have been proposed, and one class of such algorithms are clustering-based. These algorithms can offer high quality solutions, but are rather inefficient to execute. In this paper, we propose a method that partitions a dataset into groups first and then clusters the data within each group for k-anonymisation. Our experiments show that combining partitioning with clustering can improve the performance of clustering-based k-anonymisation algorithms significantly while maintaining the quality of anonymisations they produce.
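The pre-partitioning step can be sketched as a recursive median split that caps partition sizes before the more expensive clustering runs inside each partition. This is a simplified stand-in for the paper's method, on 1-D values; choosing max_size >= 2k ensures every partition still holds at least k records:

```python
def prepartition(values, max_size):
    """Recursively median-split sorted values until no partition exceeds
    max_size; clustering-based k-anonymisation then runs per partition."""
    values = sorted(values)
    if len(values) <= max_size:
        return [values]
    mid = len(values) // 2
    return (prepartition(values[:mid], max_size)
            + prepartition(values[mid:], max_size))

# Example: 20 records, partitions capped at 6 (>= 2k for k = 3).
parts = prepartition(list(range(20)), max_size=6)
```

Because clustering cost typically grows super-linearly in dataset size, confining it to bounded partitions is what yields the speed-up, while the splits keep similar values together so anonymisation quality is largely preserved.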