
    Privacy-Preserving Record Linkage for Big Data: Current Approaches and Research Challenges

    The growth of Big Data, especially personal data dispersed across multiple data sources, presents enormous opportunities and insights for businesses to explore and leverage the value of linked and integrated data. However, privacy concerns impede the sharing or exchange of data for linkage across different organizations. Privacy-preserving record linkage (PPRL) aims to address this problem by identifying and linking records that correspond to the same real-world entity across several data sources held by different parties, without revealing any sensitive information about these entities. PPRL is increasingly required in many real-world application areas; examples range from public health surveillance to crime and fraud detection and national security. PPRL for Big Data poses several challenges, the three major ones being (1) scalability to multiple large databases, due to their massive volume and the flow of data within Big Data applications, (2) achieving high-quality linkage results in the presence of the variety and veracity of Big Data, and (3) preserving the privacy and confidentiality of the entities represented in Big Data collections. In this chapter, we describe the challenges of PPRL in the context of Big Data, survey existing techniques for PPRL, and provide directions for future research. This work was partially funded by the Australian Research Council under Discovery Project DP130101801, by the German Academic Exchange Service (DAAD) and Universities Australia (UA) under the Joint Research Co-operation Scheme, and by the German Federal Ministry of Education and Research within the project Competence Center for Scalable Data Services and Solutions (ScaDS) Dresden/Leipzig (BMBF 01IS14014B).
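The abstract surveys PPRL techniques without detailing any single one. A widely used building block in the PPRL literature is Bloom-filter encoding: each party hashes the character bigrams of a sensitive field into a bit vector, and the vectors can be compared (e.g. via the Dice coefficient) without exchanging the cleartext values. The sketch below is a minimal illustration under assumed parameters (filter length m, k hash iterations, SHA-256 bigram hashing); it is not the chapter's own method, and a production scheme would add hardening against frequency attacks.

```python
import hashlib

def bloom_encode(value, m=64, k=3):
    """Hash a string's character bigrams into an m-bit Bloom filter
    using k seeded SHA-256 hash functions (illustrative parameters)."""
    bits = [0] * m
    bigrams = [value[i:i + 2] for i in range(len(value) - 1)]
    for g in bigrams:
        for seed in range(k):
            pos = int(hashlib.sha256(f"{seed}:{g}".encode()).hexdigest(), 16) % m
            bits[pos] = 1
    return bits

def dice_similarity(a, b):
    """Dice coefficient between two bit vectors; tolerant of typos
    because similar strings share most of their bigrams."""
    common = sum(x & y for x, y in zip(a, b))
    return 2 * common / (sum(a) + sum(b))

# Two spelling variants of the same name stay more similar to each
# other than to an unrelated name, even in encoded form.
e1 = bloom_encode("christine")
e2 = bloom_encode("christina")
e3 = bloom_encode("robert")
```

In a PPRL protocol, only the bit vectors (or further transformations of them) would be exchanged with the linkage unit, never the underlying identifiers.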

    Clustering Approaches for Multi-source Entity Resolution

    Entity Resolution (ER), or deduplication, aims at identifying entities, such as specific customer or product descriptions, in one or several data sources that refer to the same real-world entity. ER is of key importance for improving data quality and plays a crucial role in data integration and querying. The previous generation of ER approaches focused on integrating records from two relational databases or performing deduplication within a single database. However, in the era of Big Data the number of available data sources is increasing rapidly, so large-scale data mining and querying systems need to integrate data obtained from numerous sources. For example, in online digital libraries or e-shops, publications or products are incorporated from a large number of archives or suppliers across the world, or within a specified region or country, to provide a unified view for the user. This process requires data consolidation from numerous heterogeneous data sources, most of which are evolving. As the number of sources grows, data heterogeneity and velocity increase, as does the variance in data quality. Multi-source ER, i.e., finding matching entities in an arbitrary number of sources, is therefore a challenging task. Previous efforts for matching and clustering entities between multiple sources (> 2) mostly treated all sources as a single source. This approach cannot exploit metadata or provenance information to enhance integration quality and leads to poor results because it ignores quality differences between sources. The conventional ER pipeline consists of blocking, pair-wise matching of entities, and classification. To meet the new requirements, holistic clustering approaches that are capable of scaling to many data sources are needed.
Holistic clustering-based ER should further overcome the restriction of pairwise linking by grouping entities from multiple sources into clusters. The clustering step aims at removing false links while adding missing true links across sources. Additionally, incremental clustering and repairing approaches need to be developed to cope with the ever-increasing number of sources and new incoming entities. To this end, we developed novel clustering and repairing schemes for multi-source entity resolution. The approaches are capable of grouping entities from multiple clean (duplicate-free) sources, as well as handling data from an arbitrary combination of clean and dirty sources. The clustering schemes developed specifically for multi-source ER obtain superior results compared to general-purpose clustering algorithms. We also developed incremental clustering and repairing methods to handle evolving sources. The proposed incremental approaches can incorporate new sources as well as new entities from existing sources. The more sophisticated approach is able to repair previously determined clusters, and consequently yields improved quality and a reduced dependency on the insertion order of new entities. To ensure scalability, parallel variants of all approaches are implemented on top of Apache Flink, a distributed processing engine. The proposed methods have been integrated in a new end-to-end ER tool named FAMER (FAst Multi-source Entity Resolution system). The FAMER framework comprises Linking and Clustering components encompassing both batch and incremental ER functionalities. The output of the Linking component is a similarity graph in which each vertex represents an entity and each edge records the similarity between two entities. This similarity graph is the input of the Clustering component.
Comprehensive comparative evaluations show that the proposed clustering and repairing approaches for both batch and incremental ER achieve high quality while maintaining scalability.
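The similarity graph described above can be illustrated with the simplest possible clustering scheme: connected components over edges whose similarity exceeds a threshold, computed with union-find. This is only a baseline sketch, not FAMER's actual clustering or repair algorithms (which are more sophisticated and, for clean sources, enforce source constraints); the entity identifiers and threshold below are invented for illustration.

```python
from collections import defaultdict

def cluster_similarity_graph(edges, threshold=0.8):
    """Group entities into clusters: connected components over the
    similarity-graph edges at or above the threshold (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for a, b, sim in edges:
        find(a)  # register both vertices, even if the edge is dropped
        find(b)
        if sim >= threshold:
            union(a, b)

    clusters = defaultdict(set)
    for node in parent:
        clusters[find(node)].add(node)
    return list(clusters.values())

# Hypothetical entities "source:id" from three sources; the weak
# 0.40 edge is pruned, leaving one 3-entity cluster and a singleton.
edges = [("s1:e1", "s2:e7", 0.93),
         ("s2:e7", "s3:e2", 0.88),
         ("s1:e1", "s3:e9", 0.40)]
clusters = cluster_similarity_graph(edges)
```

Threshold pruning here plays the role the abstract assigns to the clustering step: removing likely false links while keeping entities connected transitively across sources.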

    Privacy-preserving systems around security, trust and identity

    Data has proved to be the most valuable asset in a modern world of rapidly advancing technologies. Companies try to maximise their profits by extracting valuable insights from collected data about people’s trends and behaviour, data which can often be considered personal and sensitive. Additionally, sophisticated adversaries often target organisations aiming to exfiltrate sensitive data to sell it to third parties or to demand ransom. Hence, privacy assurance is a matter of great importance to individual data producers, who must rely on simply trusting that the services they use have taken all the necessary countermeasures to protect them. Distributed ledger technology and its variants can securely store data and preserve its privacy thanks to their novel characteristics. Additionally, the concept of self-sovereign identity, which gives control back to the data subjects, is an expected future step once these approaches mature further. Last but not least, big data analysis typically occurs through machine learning techniques.
However, the security of these techniques is often questioned, since adversaries aim to exploit them for their own benefit. The aspects of security, privacy and trust are highlighted throughout this thesis, which investigates several emerging technologies that aim to protect and analyse sensitive data, comparing them with existing systems, tools and approaches in terms of security guarantees and performance efficiency. The contributions of this thesis are: i) the presentation of a novel distributed ledger infrastructure tailored to the domain name system; ii) the adaptation of this infrastructure to a critical healthcare use case; iii) the development of a novel self-sovereign identity healthcare scenario in which a data scientist analyses sensitive data stored on the premises of three hospitals through a privacy-preserving machine learning approach; and iv) a thorough investigation of adversarial attacks that aim to exploit machine learning intrusion detection systems by “tricking” them into misclassifying carefully crafted inputs, such as malware identified as benign. A significant finding is that the security and privacy of data are often neglected when they do not directly impact people’s lives. It is common for the protection and confidentiality of systems, even of a critical nature, to be an afterthought, considered only after malicious incidents occur. Further, emerging technologies, tools and approaches built on fundamental security and privacy principles, such as distributed ledger technology, should be favoured by existing systems that can adopt them without significant changes and compromises.
Additionally, it has been shown that the decentralisation of machine learning algorithms through self-sovereign identity technologies providing novel end-to-end encrypted channels is possible without sacrificing the valuable utility of the original machine learning algorithms. However, alongside these technological advancements, adversaries are becoming more sophisticated and are trying to exploit the aforementioned machine learning approaches, and similar ones, for their own benefit through various tools and techniques. Adversarial attacks pose a real threat to any machine learning algorithm and artificial intelligence technique, and their detection is challenging and often problematic. Hence, any security professional operating in this domain should consider the impact of these attacks and the countermeasures needed to combat or minimise them.
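The adversarial attacks described above can be illustrated with the classic fast-gradient-sign idea: nudge each input feature in the direction that increases the classifier's loss, so a "malicious" sample drifts toward a "benign" score. The sketch below applies this to a plain logistic classifier in NumPy; the weights, features, and epsilon are invented for illustration, and the thesis's actual models and attack tooling are not specified here.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def fgsm_perturb(x, w, b, y, eps=0.25):
    """One fast-gradient-sign step against a logistic classifier:
    for cross-entropy loss, d(loss)/dx = (p - y) * w, so moving x by
    eps * sign(gradient) increases the loss on the true label y."""
    p = sigmoid(w @ x + b)          # current predicted probability
    grad = (p - y) * w              # gradient of the loss w.r.t. x
    return x + eps * np.sign(grad)

# Hypothetical 3-feature "intrusion detection" model and a sample
# it initially scores as malicious (probability well above 0.5).
w = np.array([1.5, -2.0, 0.5])
b = 0.0
x = np.array([1.0, -1.0, 0.5])

p_before = sigmoid(w @ x + b)
x_adv = fgsm_perturb(x, w, b, y=1.0)   # y=1: true label "malicious"
p_after = sigmoid(w @ x_adv + b)       # score drops toward "benign"
```

Against deep models the same principle applies with backpropagated gradients, which is why hardening an ML-based detector requires considering such crafted inputs, not just natural traffic.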

    The State of World Fisheries and Aquaculture: Sustainability in Action

    “The 2020 edition of The State of World Fisheries and Aquaculture continues to demonstrate the significant and growing role of fisheries and aquaculture in providing food, nutrition and employment. It also shows the major challenges ahead despite the progress made on a number of fronts. For example, there is growing evidence that when fisheries are properly managed, stocks are consistently above target levels or rebuilding, giving credibility to the fishery managers and governments around the world that are willing to take strong action. However, the report also demonstrates that the successes achieved in some countries and regions have not been sufficient to reverse the global trend of overfished stocks, indicating that in places where fisheries management is not in place, or is ineffective, the status of fish stocks is poor and deteriorating. This unequal progress highlights the urgent need to replicate and re-adapt successful policies and measures in the light of the realities and needs of specific fisheries. It calls for new mechanisms to support the effective implementation of policy and management regulations for sustainable fisheries and ecosystems, as the only solution to ensure fisheries around the world are sustainable.”

    Information technology and military performance

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Political Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 519-544). Militaries have long been eager to adopt the latest information technology (IT) in a quest to improve knowledge of and control over the battlefield. At the same time, uncertainty and confusion have remained prominent in the actual experience of war. IT usage sometimes improves knowledge, but it sometimes contributes to tactical blunders and misplaced hubris. As militaries invest intensively in IT, they also tend to develop larger headquarters staffs, depend more heavily on planning and intelligence, and employ a larger percentage of personnel in knowledge work rather than physical combat. Both optimists and pessimists about the so-called "revolution in military affairs" have tended to overlook the ways in which IT is profoundly and ambiguously embedded in everyday organizational life. Technocrats embrace IT to "lift the fog of war," but IT often becomes a source of breakdowns, misperception, and politicization. To describe the conditions under which IT usage improves or degrades organizational performance, this dissertation develops the notion of information friction, an aggregate measure of the intensity of organizational struggle to coordinate IT with the operational environment. It articulates hypotheses about how the structure of the external battlefield, internal bureaucratic politics, and patterns of human-computer interaction can either exacerbate or relieve friction, which in turn degrades or improves performance. Technological determinism alone cannot account for the increasing complexity and variable performance of information phenomena. Information friction theory is empirically grounded in a participant-observation study of U.S. special operations in Iraq from 2007 to 2008.
To test the external validity of insights gained through fieldwork in Iraq, a historical study of the 1940 Battle of Britain examines IT usage in a totally different structural, organizational, and technological context. These paired cases show that high information friction, and thus degraded performance, can arise with sophisticated IT, while lower friction and impressive performance can occur with far less sophisticated networks. The social context, not just the quality of technology, makes all the difference. Many shorter examples from recent military history are included to illustrate the concepts. This project should be of broad interest to students of organizational knowledge, IT, and military effectiveness. By Jon Randall Lindsay. Ph.D.

    XVI Agricultural Science Congress 2023: Transformation of Agri-Food Systems for Achieving Sustainable Development Goals

    The XVI Agricultural Science Congress, jointly organized by the National Academy of Agricultural Sciences (NAAS) and the Indian Council of Agricultural Research (ICAR) during 10-13 October 2023 at Hotel Le Meridien, Kochi, is a mega event echoing the theme “Transformation of Agri-Food Systems for Achieving Sustainable Development Goals”. ICAR-Central Marine Fisheries Research Institute takes great pride in hosting the XVI ASC, which will be the perfect point of convergence of academicians, researchers, students, farmers, fishers, traders, entrepreneurs, and other stakeholders involved in agri-production systems that ensure food and nutritional security for a burgeoning population. With impending challenges such as growing urbanization, rising unemployment, a growing population, increasing food demand, degradation of natural resources through human interference, climate change impacts and natural calamities, the challenges India faces in achieving the Sustainable Development Goals (SDGs) set out by the United Nations are many. The XVI ASC will provide an interface for the dissemination of useful information across all sectors of stakeholders invested in developing India’s agri-food systems, not only to meet the SDGs but also to ensure a stable structure on par with agri-food systems around the world. It is an honour to present this Book of Abstracts, a compilation of 668 abstracts that convey the results of R&D programmes being carried out in India. The abstracts have been categorized under 10 major themes: 1. Ensuring Food & Nutritional Security: Production, Consumption and Value Addition; 2. Climate Action for Sustainable Agri-Food Systems; 3. Frontier Science and Emerging Genetic Technologies: Genome, Breeding, Gene Editing; 4. Livestock-based Transformation of Food Systems; 5. Horticulture-based Transformation of Food Systems; 6. Aquaculture & Fisheries-based Transformation of Food Systems; 7. Nature-based Solutions for Sustainable Agri-Food Systems; 8. Next Generation Technologies: Digital Agriculture, Precision Farming and AI-based Systems; 9. Policies and Institutions for Transforming Agri-Food Systems; 10. International Partnership for Research, Education and Development. This Book of Abstracts sets the stage for the mega event itself, which will see a flow of knowledge emanating from a zeal to transform and push India’s agri-food systems to perform par excellence and achieve not only the SDGs of the UN but also to rise as a world leader in the sector. I thank and congratulate all the participants who have submitted abstracts for this mega event, and I applaud the team that has strived hard to publish this Book of Abstracts ahead of the event. I wish all the delegates and participants a very vibrant and memorable time at the XVI ASC.