User's Privacy in Recommendation Systems Applying Online Social Network Data, A Survey and Taxonomy
Recommender systems have become an integral part of many social networks and
extract knowledge from a user's personal and sensitive data both explicitly,
with the user's knowledge, and implicitly. This trend has created major privacy
concerns, as users are mostly unaware of what data is being collected, how much
of it is used, and how securely it is handled. In this context, several works
have addressed privacy concerns in online social network data and in
recommender systems. This paper surveys the main privacy concerns, measurements
and privacy-preserving techniques used in large-scale online social networks
and recommender systems. It builds on prior work on security, privacy
preservation, statistical modeling, and datasets to provide an overview
of the technical difficulties and problems associated with privacy preservation
in online social networks.
Comment: 26 pages, IET book chapter on big data recommender systems
Utility Promises of Self-Organising Maps in Privacy Preserving Data Mining
Data mining techniques are highly efficient in sifting through big data to extract hidden knowledge and assist evidence-based decisions. However, they pose severe threats to individuals’ privacy because they can be exploited to allow inferences to be made on sensitive data. Researchers have proposed several privacy-preserving data mining techniques to address this challenge. One unique method is to extend anonymisation privacy models in data mining processes to enhance privacy and utility. Several published works in this area have utilised clustering techniques to enforce anonymisation models on private data; these work by grouping the data into clusters using a quality measure and then generalising the data in each group separately to achieve an anonymisation threshold. Although these approaches are highly efficient and practical, guaranteeing an adequate balance between data utility and privacy protection remains a challenge. In addition, existing approaches do not work well with high-dimensional data, since it is difficult to develop good groupings without incurring excessive information loss. Our work aims to overcome these challenges by proposing a hybrid approach that combines self-organising maps with conventional privacy-based clustering algorithms. The main contribution of this paper is to show that dimensionality reduction techniques can improve the anonymisation process by incurring less information loss, thus producing a more desirable balance between the privacy and utility properties.
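The hybrid idea above can be sketched in a few lines. The following is a minimal, illustrative NumPy implementation (the abstract does not show the authors' actual code; grid size, learning-rate schedule, and data are all assumptions): a tiny self-organising map projects high-dimensional records onto a small grid, and each winning node collects a group of records that could then be generalised together.

```python
import numpy as np

def train_som(data, grid=(3, 3), iters=200, lr0=0.5, seed=0):
    """Train a tiny SOM; returns a (grid_h*grid_w, n_features) codebook."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h * w, data.shape[1]))
    coords = np.array([(i, j) for i in range(h) for j in range(w)], float)
    for t in range(iters):
        x = data[rng.integers(len(data))]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
        lr = lr0 * (1 - t / iters)                         # decaying learning rate
        sigma = max(grid) / 2 * (1 - t / iters) + 0.5      # decaying neighbourhood
        dist = ((coords - coords[bmu]) ** 2).sum(axis=1)
        influence = np.exp(-dist / (2 * sigma ** 2))
        weights += lr * influence[:, None] * (x - weights)
    return weights

def assign_clusters(data, weights):
    """Map each record to its best-matching SOM node (its candidate group)."""
    d = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

rng = np.random.default_rng(1)
records = rng.random((60, 8))      # 60 records, 8 quasi-identifier columns (toy data)
codebook = train_som(records)
labels = assign_clusters(records, codebook)
# Each node's records can now be generalised jointly (e.g. replaced by
# min-max ranges) on the way to an anonymisation threshold such as k-anonymity.
```

The dimensionality reduction happens implicitly: 8-dimensional records are organised on a 2-D grid, so groupings are formed in the low-dimensional map rather than the raw feature space.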
Technical Research Priorities for Big Data
To drive innovation and competitiveness, organisations need to foster the development and broad adoption of data technologies, value-adding use cases and sustainable business models. Enabling an effective data ecosystem requires overcoming several technical challenges associated with the cost and complexity of management, processing, analysis and utilisation of data. This chapter details a community-driven initiative to identify and characterise the key technical research priorities for research and development in data technologies. The chapter examines the systemic and structured methodology used to gather inputs from over 200 stakeholder organisations. The result of the process identified five key technical research priorities in the areas of data management, data processing, data analytics, data visualisation and user interactions, and data protection, together with 28 sub-level challenges. The process also highlighted the important role of data standardisation, data engineering and DevOps for Big Data.
Big data analytics: balancing individuals’ privacy rights and business interests
This research thesis analyses and discusses the importance of having a legal framework that can control and manage the use of data during the Big Data analysis process.
The thesis firstly examines the data analytics technologies, such as Hadoop Distributed File System (HDFS) and the technologies that are used to protect data during the analytics process. Then there is an examination of the legal principles that are part of the new General Data Protection Regulation (GDPR), and the other laws that are in place in order to manage the new era of Big Data analytics. Both the legal principles Chapter and data analytics Chapter are part of the literature review.
The IT section of the literature review begins with an analysis of the data analytics technologies, such as HDFS and Map-Reduce. The second part consists of the technologies to protect privacy, especially with respect to protection during the data generation phase. Furthermore, there is a discussion on whether these current technologies are good enough to provide protection for personal data in the Big Data age.
The legal section of the literature review starts by discussing some risk mitigation schemes that can be used to help individuals protect their data. This is followed by an analysis of consent issues in the Big Data era and later by an examination of the important legal principles that can help to control the Big Data process and ultimately protect individuals’ personal data.
The motivation for carrying out this research was to examine how Big Data could affect ordinary individuals, specifically with respect to how their data and privacy could be infringed during the data analytics process. This was done by bringing together the legal and technological perspectives of the Big Data world, by hearing the thoughts and views of those individuals who could be affected, and by hearing from experts who could shine a light on the realities of the Big Data era.
The research includes the analysis and results of three surveys, with over 100 respondents in total, who expressed their views on a number of issues, including their fears about privacy online. These comprised a survey of mainly closed questions for students at Canterbury Christ Church University, an online SurveyMonkey survey for students at University College Cork in Ireland, and a survey for students in Sri Lanka.
Questions were also posed to experts in the areas of IT law, Big Data analytics, and security. The results of these interviews were analysed and discussed, producing much debate on what can be done to manage and protect citizens’ personal data privacy in the age of Big Data analytics. The software packages Statistical Package for the Social Sciences (SPSS) and Minitab were used to analyse the results of the surveys, while Qualitative Data Analysis Miner (QDA Miner) software was used to analyse the results of the interviews.
A Comprehensive Bibliometric Analysis on Social Network Anonymization: Current Approaches and Future Directions
In recent decades, social network anonymization has become a crucial research
field due to its pivotal role in preserving users' privacy. However, the high
diversity of approaches introduced in relevant studies poses a challenge to
gaining a profound understanding of the field. In response to this, the current
study presents an exhaustive and well-structured bibliometric analysis of the
social network anonymization field. To begin our research, related studies from
the period 2007-2022 were collected from the Scopus database and then
pre-processed. Following this, VOSviewer was used to visualize the network
of authors' keywords. Subsequently, extensive statistical and network analyses
were performed to identify the most prominent keywords and trending topics.
Additionally, the application of co-word analysis through SciMAT and the
Alluvial diagram allowed us to explore the themes of social network
anonymization and scrutinize their evolution over time. These analyses
culminated in an innovative taxonomy of the existing approaches and
anticipation of potential trends in this domain. To the best of our knowledge,
this is the first bibliometric analysis in the social network anonymization
field, which offers a deeper understanding of the current state and an
insightful roadmap for future research in this domain.
Comment: 73 pages, 28 figures
Energy cost and machine learning accuracy impact of k-anonymisation and synthetic data techniques
To address increasing societal concerns regarding privacy and climate, the EU
adopted the General Data Protection Regulation (GDPR) and committed to the
Green Deal. Considerable research has studied the energy efficiency of software and
the accuracy of machine learning models trained on anonymised data sets. Recent
work began exploring the impact of privacy-enhancing techniques (PET) on both
the energy consumption and accuracy of the machine learning models, focusing on
k-anonymity. As synthetic data is becoming an increasingly popular PET, this
paper analyses the energy consumption and accuracy of two phases: a) applying
privacy-enhancing techniques to the concerned data set, b) training the models
on the concerned privacy-enhanced data set. We use two privacy-enhancing
techniques: k-anonymisation (using generalisation and suppression) and
synthetic data, and three machine-learning models. Each model is trained on
each privacy-enhanced data set. Our results show that models trained on
k-anonymised data consume less energy than models trained on the original data,
with a similar performance regarding accuracy. Models trained on synthetic data
have similar energy consumption and similar or lower accuracy compared to
models trained on the original data.
Comment: Published in the proceedings (pages 57-65) of the International Conference on Information and Communications Technology for Sustainability (ICT4S) 2023 in Rennes, France. 9 pages, 4 figures, 5 tables
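Phase (a) above, k-anonymisation via generalisation and suppression, can be illustrated with a toy example. This sketch is an assumption for exposition only (the paper's data sets, columns, and parameters are not shown here): a numeric quasi-identifier is generalised into fixed-width ranges, and records whose range contains fewer than k rows are suppressed.

```python
from collections import Counter

def k_anonymise_ages(rows, k=3, bin_width=10):
    """Generalise 'age' into bins of bin_width; suppress bins with < k rows."""
    binned = [(row["age"] // bin_width) * bin_width for row in rows]
    counts = Counter(binned)
    out = []
    for row, b in zip(rows, binned):
        anon = dict(row)
        if counts[b] >= k:
            anon["age"] = f"{b}-{b + bin_width - 1}"   # generalisation
        else:
            anon["age"] = "*"                          # suppression
        out.append(anon)
    return out

people = [{"age": a, "disease": d} for a, d in
          [(23, "flu"), (27, "cold"), (29, "flu"),
           (34, "cold"), (36, "flu"), (38, "cold"), (71, "flu")]]
anon = k_anonymise_ages(people)
# Ages 23/27/29 become "20-29", 34/36/38 become "30-39";
# the lone age 71 falls in a bin of size 1 < k and is suppressed to "*".
```

The energy/accuracy trade-off studied in the paper then comes from training models on `anon` rather than `people`: coarser bins cost less information but remove more signal.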
On supporting K-anonymisation and L-diversity of crime databases with genetic algorithms in a resource constrained environment
The social benefits derived from analysing crime data need to be weighed against issues relating to privacy loss. To facilitate such analysis of crime data, Burke and Kayem [7] proposed a framework (MCRF) to enable mobile crime reporting in a developing country. Here crimes are reported via mobile phones and stored in a database owned by a law enforcement agency. The expertise required to perform analysis on the crime data is, however, unlikely to be available within the law enforcement agency. Burke and Kayem [7] proposed anonymising the data (using manual input parameters) at the law enforcement agency before sending it to a third party for analysis. Whilst analysis of the crime data requires expertise, adequate skill to appropriately anonymise the data is also required. What is lacking in the original MCRF is therefore an automated scheme for the law enforcement agency to adequately anonymise the data before sending it to the third party. This should, however, be done whilst maximising the information utility of the anonymised data from the perspective of the third party. In this thesis we introduce a crime severity scale to facilitate the automation of data anonymisation within the MCRF. We consider a modified loss metric to capture information loss incurred during the anonymisation process. This modified loss metric also gives third-party users the flexibility to specify attributes of the anonymised data when requesting data from the law enforcement agency. We employ a genetic algorithm (GA) approach called "Crime Genes" (CG) to optimise the utility of the anonymised data based on our modified loss metric whilst adhering to notions of privacy defined by k-anonymity and l-diversity. Our CG implementation is modular and can therefore be easily integrated with the original MCRF. We also show how our CG approach is designed to be suitable for implementation in a developing country where particular resource constraints exist.
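The two privacy notions the thesis optimises against can be stated compactly: every quasi-identifier group must contain at least k records (k-anonymity) and at least l distinct sensitive values (l-diversity). The following checker is illustrative only, not the Crime Genes implementation; the column names and toy crime reports are assumptions.

```python
from collections import defaultdict

def satisfies_k_l(rows, quasi_ids, sensitive, k, l):
    """True iff every quasi-identifier group has >= k rows
    and >= l distinct sensitive values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in quasi_ids)].append(row[sensitive])
    return all(len(vals) >= k and len(set(vals)) >= l
               for vals in groups.values())

reports = [
    {"area": "north", "severity": "high", "crime": "theft"},
    {"area": "north", "severity": "high", "crime": "assault"},
    {"area": "south", "severity": "low",  "crime": "theft"},
    {"area": "south", "severity": "low",  "crime": "fraud"},
]
print(satisfies_k_l(reports, ["area", "severity"], "crime", k=2, l=2))  # True
```

A GA such as the one described above would use a check like this as a hard constraint, searching over generalisations of the quasi-identifiers to minimise the loss metric among the candidates that pass.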