561 research outputs found

    User's Privacy in Recommendation Systems Applying Online Social Network Data, A Survey and Taxonomy

    Full text link
    Recommender systems have become an integral part of many social networks, extracting knowledge from users' personal and sensitive data both explicitly, with the user's knowledge, and implicitly. This trend has created major privacy concerns, as users are mostly unaware of what data is being used, how much of it, and how securely. In this context, several works have addressed privacy concerns in online social network data and in recommender systems. This paper surveys the main privacy concerns, measurements, and privacy-preserving techniques used in large-scale online social networks and recommender systems. It draws on prior work on security, privacy preservation, statistical modelling, and datasets to provide an overview of the technical difficulties and problems associated with privacy preservation in online social networks.
    Comment: 26 pages, IET book chapter on big data recommender systems

    Utility Promises of Self-Organising Maps in Privacy Preserving Data Mining

    Get PDF
    Data mining techniques are highly efficient at sifting through big data to extract hidden knowledge and support evidence-based decisions. However, they pose severe threats to individuals' privacy, because they can be exploited to draw inferences about sensitive data. Researchers have proposed several privacy-preserving data mining techniques to address this challenge. One notable approach extends anonymisation privacy models into the data mining process to enhance both privacy and utility. Several published works in this area use clustering techniques to enforce anonymisation models on private data: the data are grouped into clusters using a quality measure, and the data in each group are then generalised separately to achieve an anonymisation threshold. Although these methods are efficient and practical, guaranteeing an adequate balance between data utility and privacy protection remains a challenge. In addition, existing approaches do not work well with high-dimensional data, since it is difficult to form good groupings without incurring excessive information loss. Our work aims to overcome these challenges with a hybrid approach that combines self-organising maps with conventional privacy-preserving clustering algorithms. The main contribution of this paper is to show that dimensionality reduction techniques can improve the anonymisation process by incurring less information loss, thus producing a more desirable balance between privacy and utility.
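The cluster-then-generalise step this abstract describes can be sketched in a few lines. The following is a minimal, hypothetical illustration, not the paper's SOM hybrid: records are greedily grouped into clusters of at least k members, and each numeric quasi-identifier is then generalised to its cluster's [min, max] range. The data values are made up for the example.

```python
def generalise(cluster):
    """Replace each numeric quasi-identifier with its cluster's (min, max) range."""
    cols = list(zip(*cluster))
    return [tuple((min(c), max(c)) for c in cols)] * len(cluster)

def k_member_clusters(records, k):
    """Greedy grouping: sort on the first attribute, cut into chunks of >= k."""
    ordered = sorted(records)
    clusters, i = [], 0
    while i < len(ordered):
        # Absorb the tail into the last cluster so no group is smaller than k.
        end = i + k if len(ordered) - (i + k) >= k else len(ordered)
        clusters.append(ordered[i:end])
        i = end
    return clusters

def anonymise(records, k):
    """Cluster the records, then generalise each cluster separately."""
    out = []
    for cluster in k_member_clusters(records, k):
        out.extend(generalise(cluster))
    return out

# Toy (age, postcode) records; every output row now matches >= k others.
ages_zips = [(34, 4370), (29, 4372), (41, 4371), (30, 4370), (52, 4375)]
print(anonymise(ages_zips, 2))
```

The quality measure here is simply ordering on the first attribute; a real implementation would cluster on all quasi-identifiers, which is where dimensionality reduction becomes relevant.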

    Technical Research Priorities for Big Data

    Get PDF
    To drive innovation and competitiveness, organisations need to foster the development and broad adoption of data technologies, value-adding use cases and sustainable business models. Enabling an effective data ecosystem requires overcoming several technical challenges associated with the cost and complexity of management, processing, analysis and utilisation of data. This chapter details a community-driven initiative to identify and characterise the key priorities for research and development in data technologies. The chapter examines the systemic and structured methodology used to gather inputs from over 200 stakeholder organisations. The process identified five key technical research priorities in the areas of data management, data processing, data analytics, data visualisation and user interactions, and data protection, together with 28 sub-level challenges. It also highlighted the important role of data standardisation, data engineering and DevOps for Big Data.

    Big data analytics: balancing individuals’ privacy rights and business interests

    Get PDF
    This research thesis analyses and discusses the importance of having a legal framework that can control and manage the use of data during the Big Data analysis process. The thesis first examines data analytics technologies, such as the Hadoop Distributed File System (HDFS), and the technologies used to protect data during the analytics process. It then examines the legal principles of the new General Data Protection Regulation (GDPR) and the other laws in place to manage the new era of Big Data analytics. Both the legal-principles chapter and the data-analytics chapter form part of the literature review. The IT section of the literature review begins with an analysis of data analytics technologies such as HDFS and MapReduce. The second part covers technologies to protect privacy, especially during the data generation phase, and discusses whether these current technologies are good enough to protect personal data in the Big Data age. The legal section of the literature review starts by discussing risk mitigation schemes that can help individuals protect their data. This is followed by an analysis of consent issues in the Big Data era and then by an examination of the key legal principles that can help to control the Big Data process and ultimately protect individuals’ personal data. The motivation for this research was to examine how Big Data could affect ordinary individuals, specifically how their data and privacy could be infringed during the analytics process. This was done by bringing together the legal and technological perspectives on Big Data, by hearing the thoughts and views of individuals who could be affected, and by consulting experts who could shed light on the realities of the Big Data era.
The research includes the analysis and results of three surveys, with over 100 respondents in total, who expressed their views on a number of issues, including their fears about privacy online. These comprised a survey of mainly closed questions for students at Canterbury Christ Church University, a SurveyMonkey survey for students at University College Cork in Ireland, and a survey for students in Sri Lanka. Questions were also posed to experts in IT law, Big Data analytics and security. The results of these interviews were analysed and discussed, producing much debate about what can be done to manage and protect citizens’ personal data privacy in the age of Big Data analytics. The software packages Statistical Package for the Social Sciences (SPSS) and Minitab were used to analyse the survey results, while Qualitative Data Analysis Miner (QDA Miner) was used to analyse the interviews.

    A Comprehensive Bibliometric Analysis on Social Network Anonymization: Current Approaches and Future Directions

    Full text link
    In recent decades, social network anonymization has become a crucial research field due to its pivotal role in preserving users' privacy. However, the high diversity of approaches introduced in relevant studies makes it hard to gain a profound understanding of the field. In response, the current study presents an exhaustive and well-structured bibliometric analysis of the social network anonymization field. To begin, related studies from the period 2007-2022 were collected from the Scopus database and pre-processed. VOSviewer was then used to visualize the network of authors' keywords. Subsequently, extensive statistical and network analyses were performed to identify the most prominent keywords and trending topics. Additionally, co-word analysis with SciMAT and an Alluvial diagram allowed us to explore the themes of social network anonymization and scrutinize their evolution over time. These analyses culminated in an innovative taxonomy of the existing approaches and an anticipation of potential trends in this domain. To the best of our knowledge, this is the first bibliometric analysis in the social network anonymization field, offering a deeper understanding of the current state and an insightful roadmap for future research.
    Comment: 73 pages, 28 figures

    Energy cost and machine learning accuracy impact of k-anonymisation and synthetic data techniques

    Full text link
    To address increasing societal concerns regarding privacy and climate, the EU adopted the General Data Protection Regulation (GDPR) and committed to the Green Deal. Considerable research has studied the energy efficiency of software and the accuracy of machine learning models trained on anonymised data sets. Recent work began exploring the impact of privacy-enhancing techniques (PETs) on both the energy consumption and the accuracy of machine learning models, focusing on k-anonymity. As synthetic data is becoming an increasingly popular PET, this paper analyses the energy consumption and accuracy of two phases: a) applying privacy-enhancing techniques to the data set, and b) training the models on the privacy-enhanced data set. We use two privacy-enhancing techniques, k-anonymisation (using generalisation and suppression) and synthetic data, and three machine-learning models; each model is trained on each privacy-enhanced data set. Our results show that models trained on k-anonymised data consume less energy than models trained on the original data, with similar accuracy. Models trained on synthetic data have similar energy consumption and similar or lower accuracy compared to models trained on the original data.
    Comment: Published in the proceedings (pages 57-65) of the International Conference on Information and Communications Technology for Sustainability (ICT4S) 2023 in Rennes, France. 9 pages, 4 figures, 5 tables
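The two k-anonymisation operations the abstract names, generalisation and suppression, can be illustrated with a toy example. The postcode data and the truncation rule below are assumptions made for this sketch, not the paper's experimental setup: a quasi-identifier is coarsened by truncation, and any record whose equivalence class still falls below k is suppressed.

```python
from collections import Counter

def k_anonymise(postcodes, k, digits_kept):
    """Generalise postcodes by truncation, then suppress classes smaller than k."""
    # Generalisation: keep the leading digits, mask the rest.
    generalised = [p[:digits_kept] + "*" * (len(p) - digits_kept) for p in postcodes]
    # Suppression: drop records whose equivalence class has fewer than k members.
    sizes = Counter(generalised)
    return [g for g in generalised if sizes[g] >= k]

codes = ["4370", "4371", "4372", "5210", "4375", "9999"]
print(k_anonymise(codes, 3, 2))  # only the "43**" class survives
```

More generalisation (fewer digits kept) means less suppression but lower utility; that trade-off is what the paper's energy and accuracy measurements are taken over.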

    On supporting K-anonymisation and L-diversity of crime databases with genetic algorithms in a resource constrained environment

    Get PDF
    The social benefits derived from analysing crime data need to be weighed against the privacy loss it risks. To facilitate such analysis, Burke and Kayem [7] proposed a mobile crime reporting framework (MCRF) for a developing country, in which crimes are reported via mobile phones and stored in a database owned by a law enforcement agency. The expertise required to analyse the crime data is, however, unlikely to be available within the agency, so Burke and Kayem [7] proposed anonymising the data (using manual input parameters) at the agency before sending it to a third party for analysis. But just as analysing the data requires expertise, so does anonymising it adequately. What is lacking in the original MCRF is therefore an automated scheme for the law enforcement agency to adequately anonymise the data before sending it to the third party, while maximising the information utility of the anonymised data from the third party's perspective. In this thesis we introduce a crime severity scale to facilitate the automation of data anonymisation within the MCRF. We consider a modified loss metric to capture the information loss incurred during anonymisation; this metric also gives third-party users the flexibility to specify attributes of the anonymised data when requesting data from the law enforcement agency. We employ a genetic algorithm (GA) approach called "Crime Genes" (CG) to optimise the utility of the anonymised data based on our modified loss metric, while adhering to the notions of privacy defined by k-anonymity and l-diversity. Our CG implementation is modular and can therefore be easily integrated with the original MCRF. We also show how our CG approach is designed to suit implementation in a developing country where particular resource constraints exist.
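The two privacy notions the thesis enforces can be checked mechanically. The sketch below, on a made-up crime table rather than the MCRF's data, tests k-anonymity (every equivalence class of quasi-identifiers has at least k records) and l-diversity (every class contains at least l distinct sensitive values, so an attacker who locates a victim's class still cannot infer the crime type).

```python
from collections import defaultdict

def satisfies_k_l(records, k, l):
    """records: (quasi_identifier, sensitive_value) pairs.
    True iff the table is k-anonymous and l-diverse."""
    classes = defaultdict(list)
    for qid, sensitive in records:
        classes[qid].append(sensitive)
    return all(len(vals) >= k and len(set(vals)) >= l
               for vals in classes.values())

# Generalised postcodes as quasi-identifiers, crime type as sensitive value.
table = [("43**", "burglary"), ("43**", "assault"), ("43**", "burglary"),
         ("52**", "fraud"), ("52**", "theft")]
print(satisfies_k_l(table, 2, 2))  # prints True
```

A GA such as the thesis's Crime Genes approach would use a check like this as a hard constraint and the loss metric as the fitness function to maximise.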

    Cybersecurity Research: Challenges and Course of Action

    Get PDF
    • …