15,195 research outputs found

    Non-parametric Methods for Correlation Analysis in Multivariate Data with Applications in Data Mining

    Get PDF
    In this thesis, we develop novel methods for correlation analysis in multivariate data, with a special focus on mining correlated subspaces. Our methods handle major open challenges arisen when combining correlation analysis with subspace mining. Besides traditional correlation analysis, we explore interaction-preserving discretization of multivariate data and causality analysis. We conduct experiments on a variety of real-world data sets. The results validate the benefits of our methods

    Robustness and Explainability of Artificial Intelligence

    Get PDF
    In the light of the recent advances in artificial intelligence (AI), the serious negative consequences of its use for EU citizens and organisations have led to multiple initiatives from the European Commission to set up the principles of a trustworthy and secure AI. Among the identified requirements, the concepts of robustness and explainability of AI systems have emerged as key elements for a future regulation of this technology. This Technical Report by the European Commission Joint Research Centre (JRC) aims to contribute to this movement for the establishment of a sound regulatory framework for AI, by making the connection between the principles embodied in current regulations regarding to the cybersecurity of digital systems and the protection of data, the policy activities concerning AI, and the technical discussions within the scientific community of AI, in particular in the field of machine learning, that is largely at the origin of the recent advancements of this technology. The individual objectives of this report are to provide a policy-oriented description of the current perspectives of AI and its implications in society, an objective view on the current landscape of AI, focusing of the aspects of robustness and explainability. This also include a technical discussion of the current risks associated with AI in terms of security, safety, and data protection, and a presentation of the scientific solutions that are currently under active development in the AI community to mitigate these risks. This report puts forward several policy-related considerations for the attention of policy makers to establish a set of standardisation and certification tools for AI. First, the development of methodologies to evaluate the impacts of AI on society, built on the model of the Data Protection Impact Assessments (DPIA) introduced in the General Data Protection Regulation (GDPR), is discussed. Secondly, a focus is made on the establishment of methodologies to assess the robustness of systems that would be adapted to the context of use. This would come along with the identification of known vulnerabilities of AI systems, and the technical solutions that have been proposed in the scientific community to address them. Finally, the promotion of transparency systems in sensitive systems is discussed, through the implementation of explainability-by-design approaches in AI components that would provide guarantee of the respect of the fundamental rights.JRC.E.3-Cyber and Digital Citizens' Securit

    Understanding and controlling leakage in machine learning

    Get PDF
    Machine learning models are being increasingly adopted in a variety of real-world scenarios. However, the privacy and confidentiality implications introduced in these scenarios are not well understood. Towards better understanding such implications, we focus on scenarios involving interactions between numerous parties prior to, during, and after training relevant models. Central to these interactions is sharing information for a purpose e.g., contributing data samples towards a dataset, returning predictions via an API. This thesis takes a step toward understanding and controlling leakage of private information during such interactions. In the first part of the thesis we investigate leakage of private information in visual data and specifically, photos representative of content shared on social networks. There is a long line of work to tackle leakage of personally identifiable information in social photos, especially using face- and body-level visual cues. However, we argue this presents only a narrow perspective as images reveal a wide spectrum of multimodal private information (e.g., disabilities, name-tags). Consequently, we work towards a Visual Privacy Advisor that aims to holistically identify and mitigate private risks when sharing social photos. In the second part, we address leakage during training of ML models. We observe learning algorithms are being increasingly used to train models on rich decentralized datasets e.g., personal data on numerous mobile devices. In such cases, information in the form of high-dimensional model parameter updates are anonymously aggregated from participating individuals. However, we find that the updates encode sufficient identifiable information and allows them to be linked back to participating individuals. We additionally propose methods to mitigate this leakage while maintaining high utility of the updates. In the third part, we discuss leakage of confidential information during inference time of black-box models. In particular, we find models lend themselves to model functionality stealing attacks: an adversary can interact with the black-box model towards creating a replica `knock-off' model that exhibits similar test-set performances. As such attacks pose a severe threat to the intellectual property of the model owner, we also work towards effective defenses. Our defense strategy by introducing bounded and controlled perturbations to predictions can significantly amplify model stealing attackers' error rates. In summary, this thesis advances understanding of privacy leakage when information is shared in raw visual forms, during training of models, and at inference time when deployed as black-boxes. In each of the cases, we further propose techniques to mitigate leakage of information to enable wide-spread adoption of techniques in real-world scenarios.Modelle für maschinelles Lernen werden zunehmend in einer Vielzahl realer Szenarien eingesetzt. Die in diesen Szenarien vorgestellten Auswirkungen auf Datenschutz und Vertraulichkeit wurden jedoch nicht vollständig untersucht. Um solche Implikationen besser zu verstehen, konzentrieren wir uns auf Szenarien, die Interaktionen zwischen mehreren Parteien vor, während und nach dem Training relevanter Modelle beinhalten. Das Teilen von Informationen für einen Zweck, z. B. das Einbringen von Datenproben in einen Datensatz oder die Rückgabe von Vorhersagen über eine API, ist zentral für diese Interaktionen. Diese Arbeit verhilft zu einem besseren Verständnis und zur Kontrolle des Verlusts privater Informationen während solcher Interaktionen. Im ersten Teil dieser Arbeit untersuchen wir den Verlust privater Informationen bei visuellen Daten und insbesondere bei Fotos, die für Inhalte repräsentativ sind, die in sozialen Netzwerken geteilt werden. Es gibt eine lange Reihe von Arbeiten, die das Problem des Verlustes persönlich identifizierbarer Informationen in sozialen Fotos angehen, insbesondere mithilfe visueller Hinweise auf Gesichts- und Körperebene. Wir argumentieren jedoch, dass dies nur eine enge Perspektive darstellt, da Bilder ein breites Spektrum multimodaler privater Informationen (z. B. Behinderungen, Namensschilder) offenbaren. Aus diesem Grund arbeiten wir auf einen Visual Privacy Advisor hin, der darauf abzielt, private Risiken beim Teilen sozialer Fotos ganzheitlich zu identifizieren und zu minimieren. Im zweiten Teil befassen wir uns mit Datenverlusten während des Trainings von ML-Modellen. Wir beobachten, dass zunehmend Lernalgorithmen verwendet werden, um Modelle auf umfangreichen dezentralen Datensätzen zu trainieren, z. B. persönlichen Daten auf zahlreichen Mobilgeräten. In solchen Fällen werden Informationen von teilnehmenden Personen in Form von hochdimensionalen Modellparameteraktualisierungen anonym verbunden. Wir stellen jedoch fest, dass die Aktualisierungen ausreichend identifizierbare Informationen codieren und es ermöglichen, sie mit teilnehmenden Personen zu verknüpfen. Wir schlagen zudem Methoden vor, um diesen Datenverlust zu verringern und gleichzeitig die hohe Nützlichkeit der Aktualisierungen zu erhalten. Im dritten Teil diskutieren wir den Verlust vertraulicher Informationen während der Inferenzzeit von Black-Box-Modellen. Insbesondere finden wir, dass sich Modelle für die Entwicklung von Angriffen, die auf Funktionalitätsdiebstahl abzielen, eignen: Ein Gegner kann mit dem Black-Box-Modell interagieren, um ein Replikat-Knock-Off-Modell zu erstellen, das ähnliche Test-Set-Leistungen aufweist. Da solche Angriffe eine ernsthafte Bedrohung für das geistige Eigentum des Modellbesitzers darstellen, arbeiten wir auch an einer wirksamen Verteidigung. Unsere Verteidigungsstrategie durch die Einführung begrenzter und kontrollierter Störungen in Vorhersagen kann die Fehlerraten von Modelldiebstahlangriffen erheblich verbessern. Zusammenfassend lässt sich sagen, dass diese Arbeit das Verständnis von Datenschutzverlusten beim Informationsaustausch verbessert, sei es bei rohen visuellen Formen, während des Trainings von Modellen oder während der Inferenzzeit von Black-Box-Modellen. In jedem Fall schlagen wir ferner Techniken zur Verringerung des Informationsverlusts vor, um eine weit verbreitete Anwendung von Techniken in realen Szenarien zu ermöglichen.Max Planck Institute for Informatic

    Privacy Management in Smart Environments

    Get PDF
    This thesis addresses the issue of managing privacy in smart environments, while emphasizing problems and solutions in context of interpersonal privacy. It elaborates different concepts of privacy and how smart environments interfere with these concepts. In this context this work develops solutions to understand patterns of interpersonal privacy management, to orchestrate different disclosure control methods to a composite disclosure control system, and to automate disclosure decisions using machine learning techniques.Diese Arbeit befasst sich mit dem Umgang von privaten Daten in intelligenten Umgebungen, speziell im Kontext von sozialen Interaktionen. Es werden verschiedene Konzepte des Begriffes "Privacy" erarbeitet und aufgezeigt, welche Konflikte in intelligenten Umgebungen daraus resultieren. Entsprechend werden Lösungen erarbeitet, um Muster der Informationsfreigabe in sozialen Interaktionen zu erkennen, verschiedene Methoden der Freigabekontrolle zu einer integrierten Freigabekontrolle zu kombinieren und um Freigabeentscheidungen mit maschinellen Lernverfahren vorherzusagen

    Ontology based data warehousing for mining of heterogeneous and multidimensional data sources

    Get PDF
    Heterogeneous and multidimensional big-data sources are virtually prevalent in all business environments. System and data analysts are unable to fast-track and access big-data sources. A robust and versatile data warehousing system is developed, integrating domain ontologies from multidimensional data sources. For example, petroleum digital ecosystems and digital oil field solutions, derived from big-data petroleum (information) systems, are in increasing demand in multibillion dollar resource businesses worldwide. This work is recognized by Industrial Electronic Society of IEEE and appeared in more than 50 international conference proceedings and journals

    Deep Learning Techniques for Music Generation -- A Survey

    Full text link
    This paper is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. We propose a methodology based on five dimensions for our analysis: Objective - What musical content is to be generated? Examples are: melody, polyphony, accompaniment or counterpoint. - For what destination and for what use? To be performed by a human(s) (in the case of a musical score), or by a machine (in the case of an audio file). Representation - What are the concepts to be manipulated? Examples are: waveform, spectrogram, note, chord, meter and beat. - What format is to be used? Examples are: MIDI, piano roll or text. - How will the representation be encoded? Examples are: scalar, one-hot or many-hot. Architecture - What type(s) of deep neural network is (are) to be used? Examples are: feedforward network, recurrent network, autoencoder or generative adversarial networks. Challenge - What are the limitations and open challenges? Examples are: variability, interactivity and creativity. Strategy - How do we model and control the process of generation? Examples are: single-step feedforward, iterative feedforward, sampling or input manipulation. For each dimension, we conduct a comparative analysis of various models and techniques and we propose some tentative multidimensional typology. This typology is bottom-up, based on the analysis of many existing deep-learning based systems for music generation selected from the relevant literature. These systems are described and are used to exemplify the various choices of objective, representation, architecture, challenge and strategy. The last section includes some discussion and some prospects.Comment: 209 pages. This paper is a simplified version of the book: J.-P. Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music Generation, Computational Synthesis and Creative Systems, Springer, 201
    corecore