12 research outputs found

    Contribution to privacy-enhancing technologies for machine learning applications

    For some time now, big data applications have been enabling revolutionary innovation in every aspect of our daily life by exploiting the vast amounts of data generated by users' interactions with technology. Supported by machine learning and unprecedented computation capabilities, different entities are able to exploit such data efficiently to obtain significant utility. However, since personal information is involved, these practices raise serious privacy concerns. Although multiple privacy protection mechanisms have been proposed, several challenges must be addressed before these mechanisms can be adopted in practice, i.e., before they become “usable” beyond the privacy guarantee they offer. To start, the real impact of privacy protection mechanisms on data utility is not clear, so an empirical evaluation of this impact is crucial. Moreover, since privacy is commonly obtained through the perturbation of large data sets, usable privacy technologies may require not only the preservation of data utility but also computationally efficient algorithms. Satisfying both requirements is key to encouraging the adoption of privacy initiatives. Although considerable effort has been devoted to designing less “destructive” privacy mechanisms, the utility metrics employed may not be appropriate, so the quality of such mechanisms may be measured incorrectly. Furthermore, despite the advent of big data, more efficient approaches are not being considered. Failing to meet the requirements of current applications may hinder the adoption of privacy technologies.
    In the first part of this thesis, we address the problem of measuring the effect of k-anonymous microaggregation on the empirical utility of microdata. We quantify utility as the accuracy of classification models learned from microaggregated data and evaluated over original test data. Our experiments show that the impact of the de facto microaggregation standard on the performance of machine-learning algorithms is often minor for a variety of data sets. Furthermore, experimental evidence suggests that the traditional measure of distortion in the microdata-anonymization community may be inappropriate for evaluating the utility of microaggregated data.
    Second, we address the problem of preserving the empirical utility of data. By transforming the original data records into a different data space, our approach, based on linear discriminant analysis, enables k-anonymous microaggregation to be adapted to the application domain of the data. To do this, the data is first rotated (projected) towards the direction of maximum discrimination and then scaled in this direction, penalizing distortion across the classification threshold. As a result, data utility is preserved in terms of the accuracy of machine-learned models on a number of standard data sets. Afterwards, we propose a mechanism to reduce the running time of the k-anonymous microaggregation algorithm, obtained by simplifying the internal operations of the original algorithm. Through extensive experimentation over multiple data sets, we show that the new algorithm is significantly faster; remarkably, this speedup is achieved with no additional loss of data utility.
    Finally, in a more applied vein, we propose a tool that protects the privacy of individuals and organizations by anonymizing sensitive data contained in security logs. Several anonymization mechanisms are designed and implemented according to the definition of a privacy policy, in the context of a European project that aims to build a unified security system.
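    As a concrete illustration of the evaluation methodology, the sketch below implements a simplified MDAV-style microaggregation (MDAV is commonly regarded as the de facto standard alluded to above) and measures empirical utility as the abstract describes: a classifier is trained on microaggregated records and tested on original data. This is a minimal sketch assuming numeric attributes and scikit-learn, not the thesis implementation; the data set and parameter k=5 are chosen arbitrarily.

```python
# Minimal sketch: simplified MDAV microaggregation + empirical-utility check.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def mdav_microaggregate(X, k):
    """Cluster records into groups of >= k and replace each record by its
    group centroid (simplified MDAV: leftover < 3k records form one group,
    whereas full MDAV would split leftovers of size >= 2k into two)."""
    X = np.asarray(X, dtype=float)
    out = np.empty_like(X)
    idx = np.arange(len(X))                       # indices still unassigned
    while len(idx) >= 3 * k:
        centroid = X[idx].mean(axis=0)
        # first seed: the record furthest from the centroid of remaining data
        seed = idx[np.argmax(np.linalg.norm(X[idx] - centroid, axis=1))]
        for _ in range(2):
            # group = seed plus its k-1 nearest unassigned neighbours
            d = np.linalg.norm(X[idx] - X[seed], axis=1)
            members = idx[np.argsort(d)[:k]]
            out[members] = X[members].mean(axis=0)
            idx = np.setdiff1d(idx, members)
            # second seed: the unassigned record furthest from the first seed
            seed = idx[np.argmax(np.linalg.norm(X[idx] - X[seed], axis=1))]
    out[idx] = X[idx].mean(axis=0)                # final leftover group
    return out

# Utility as in the abstract: train on microaggregated data, test on originals.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for name, X_fit in [("original", X_tr), ("5-anonymous", mdav_microaggregate(X_tr, 5))]:
    clf = RandomForestClassifier(random_state=0).fit(X_fit, y_tr)
    print(name, accuracy_score(y_te, clf.predict(X_te)))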

    INRISCO: INcident monitoRing in Smart COmmunities

    Major advances in information and communication technologies (ICTs) allow citizens to be regarded as sensors in motion. Carrying their mobile devices, moving in their connected vehicles, or actively participating in social networks, citizens provide a wealth of information that, after proper processing, can support numerous applications for the benefit of the community. In the context of smart communities, the INRISCO [1] proposal aims at (i) the early detection of abnormal situations in cities (i.e., incidents); (ii) the analysis of whether, according to their impact, those incidents are really adverse for the community; and (iii) automatic actuation through the dissemination of appropriate information to citizens and authorities. Thus, INRISCO will identify and report on incidents in traffic (jams, accidents) or public infrastructure (e.g., roadworks, street closures), the occurrence of specific events that affect other citizens' lives (e.g., demonstrations, concerts), or environmental problems (e.g., pollution, bad weather). Of particular interest to this proposal is the identification of incidents with a social and economic impact that affect the quality of life of citizens.
    This work was supported in part by the Spanish Government through the projects INRISCO under Grant TEC2014-54335-C4-1-R, Grant TEC2014-54335-C4-2-R, Grant TEC2014-54335-C4-3-R, and Grant TEC2014-54335-C4-4-R, in part by MAGOS under Grant TEC2017-84197-C4-1-R, Grant TEC2017-84197-C4-2-R, and Grant TEC2017-84197-C4-3-R, in part by the European Regional Development Fund (ERDF), and in part by the Galician Regional Government under the agreement funding the Atlantic Research Center for Information and Communication Technologies (AtlantTIC).

    Erzeugung Mehrfach Imputierter Synthetischer Datensätze: Theorie und Implementierung

    The book describes different approaches to generating multiply imputed synthetic datasets, which can be made available to the interested research community without violating confidentiality. Each chapter is dedicated to one approach, first describing the general concept and then providing a detailed application to a real dataset, with useful guidelines on how to implement the theory in practice.
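    To make the general idea concrete, here is a minimal sketch of one common flavour of data synthesis: sequential regression synthesis, where each variable is modelled conditionally on the previous ones on the real data and then redrawn from the fitted distribution. The model here is deliberately simple (linear-Gaussian) and all names are illustrative; the book's approaches are more general, and proper multiple imputation would also draw the regression parameters from their posterior rather than reusing the point estimates.

```python
# Illustrative sketch of fully synthetic data via sequential regression.
import numpy as np

rng = np.random.default_rng(0)

def synthesize(X, m=3):
    """Return m synthetic copies of X: column j is regressed on columns
    0..j-1 of the real data, then redrawn from the fitted predictive
    distribution (a proper scheme would also draw beta from its posterior)."""
    n, p = X.shape
    copies = []
    for _ in range(m):
        S = np.empty_like(X)
        S[:, 0] = rng.choice(X[:, 0], size=n)     # bootstrap the first column
        for j in range(1, p):
            A = np.column_stack([np.ones(n), X[:, :j]])        # design matrix
            beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
            sigma = (X[:, j] - A @ beta).std()                 # residual scale
            A_syn = np.column_stack([np.ones(n), S[:, :j]])
            S[:, j] = A_syn @ beta + rng.normal(0.0, sigma, n) # predictive draw
        copies.append(S)
    return copies

# Toy demo: the synthetic copies should roughly preserve the correlations.
real = rng.multivariate_normal([0, 0, 0],
                               [[1, .8, .3], [.8, 1, .5], [.3, .5, 1]], 500)
for s in synthesize(real):
    print(np.corrcoef(s, rowvar=False)[0, 1])  # near 0.8 in each copy
```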

    Privacy-Aware Risk-Based Access Control Systems

    Modern organizations collect massive amounts of data, both internally (from their employees and processes) and externally (from customers, suppliers, and partners). The increasing availability of these large datasets was made possible by growing storage and processing capabilities. From a technical perspective, organizations are therefore now in a position to exploit these diverse datasets to create new data-driven businesses or to optimize existing processes (real-time customization, predictive analytics, etc.). However, such data often contains very sensitive information that, if leaked or misused, can lead to privacy violations. Privacy is becoming increasingly relevant for organizations and businesses, due to strong regulatory frameworks (e.g., the EU General Data Protection Regulation, GDPR, and the Health Insurance Portability and Accountability Act, HIPAA) and the increasing awareness of citizens about personal data issues. Privacy breaches and failure to meet privacy requirements can have a tremendous impact on companies (e.g., reputation loss, noncompliance fines, legal actions). Privacy violation threats are not exclusively caused by external actors gaining access through security gaps; privacy breaches can also originate from internal actors, sometimes even trusted and authorized ones. As a consequence, most organizations prefer to strongly limit (even internally) the sharing and dissemination of data, making most of the information unavailable to decision-makers and thus preventing the organization from fully exploiting the power of these new data sources.
    In order to unlock this potential while controlling the privacy risk, it is necessary to develop novel data sharing and access control mechanisms able to support risk-based decision making and to weigh the advantages of information against privacy considerations. To achieve this, access control decisions must be based on a dynamically assessed estimation of expected costs and benefits compared to the risk, and not, as in traditional access control systems, on a predefined policy that statically defines which accesses are allowed and denied. In risk-based access control, the risk of each access request is estimated; if the risk is lower than a given threshold (possibly related to the trustworthiness of the requester), access is granted, and otherwise it is denied. The aim is to be more permissive than traditional access control systems, allowing for a better exploitation of data.
    Although existing risk-based access control models are an important step towards better management and exploitation of data, they have a number of drawbacks that limit their effectiveness. In particular, most existing risk-based systems only support binary access decisions: the outcome is “allowed” or “denied”, whereas in real life we often have exceptions based on additional conditions (e.g., “I cannot provide this information unless you sign the following non-disclosure agreement” or “I cannot disclose this data because it contains personally identifiable information, but I can disclose an anonymized version”). In other words, instead of denying risky access requests, the system should be able to propose risk mitigation measures (e.g., disclosing a partial or anonymized version of the requested data). Alternatively, it should be able to propose appropriate trust enhancement measures (e.g., stronger authentication); once these are accepted or fulfilled by the requester, more information can be shared.
    The aim of this thesis is to propose and validate a novel privacy-enhancing access control approach offering adaptive and fine-grained access control for sensitive datasets. This approach enhances access to data, but it also mitigates privacy threats originated by authorized internal actors. In more detail:
    1. We demonstrate the relevance and evaluate the impact of threats from authorized actors. To this end, we developed EPIC (Evaluating Privacy violation rIsk in Cyber security systems), a privacy threat identification methodology, and apply it in a cybersecurity use case where very sensitive information is used.
    2. We present a privacy-aware risk-based access control framework that supports access control in dynamic contexts through trust enhancement mechanisms and privacy risk mitigation strategies. This allows us to strike a balance between the privacy risk and the trustworthiness of the data request. If the privacy risk is too large compared to the trust level, the framework can identify adaptive strategies that decrease the privacy risk (e.g., by removing or obfuscating part of the data through anonymization) and/or increase the trust level (e.g., by asking the requester for additional obligations), as illustrated in the sketch after this list.
    3. We show how the privacy-aware risk-based approach can be integrated with existing access control models such as RBAC and ABAC, and that it can be realized using a declarative policy language, with a number of advantages including usability, flexibility, and scalability.
    4. We evaluate our approach using several industrially relevant use cases, elaborated to meet the requirements of the industrial partner (SAP) of this industrial doctorate.
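    The decision loop described in point 2 can be illustrated with a small sketch. Everything below is hypothetical: the risk formula, threshold, and mitigation catalogue are invented purely for illustration and are not the thesis framework's actual API; the sketch only shows the shape of a risk-versus-trust decision that proposes measures instead of flatly denying.

```python
# Hypothetical sketch of a risk-vs-trust access decision with mitigations.
from dataclasses import dataclass

@dataclass
class Request:
    sensitivity: float  # privacy risk carried by the requested data, in [0, 1]
    trust: float        # trustworthiness of the requester, in [0, 1]

# Each measure closes part of the gap between risk and trust; the catalogue
# and the reduction values are invented for this example.
MITIGATIONS = [
    ("sign a non-disclosure agreement", 0.2),           # trust enhancement
    ("receive an anonymized version of the data", 0.3), # risk mitigation
]

def decide(req: Request) -> str:
    risk = req.sensitivity * (1.0 - req.trust)  # toy risk estimate
    if risk <= req.trust:                       # threshold tied to trust level
        return "allow"
    # Propose (cumulative) measures instead of a flat denial.
    accepted = []
    for measure, reduction in MITIGATIONS:
        accepted.append(measure)
        risk -= reduction
        if risk <= req.trust:
            return "allow, provided the requester agrees to: " + "; ".join(accepted)
    return "deny"

print(decide(Request(sensitivity=0.9, trust=0.3)))
# -> allow, provided the requester agrees to: sign a non-disclosure
#    agreement; receive an anonymized version of the data
```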

    Ethical and Unethical Hacking

    The goal of this chapter is to provide a conceptual analysis of ethical hacking, comprising its history, common usage, and an attempt to provide a systematic classification that is both compatible with common usage and normatively adequate. Subsequently, the article identifies a tension between common usage and a normatively adequate nomenclature. ‘Ethical hackers’ are often identified with hackers that abide by a code of ethics privileging business-friendly values. However, there is no guarantee that respecting such values is always compatible with the all-things-considered morally best act. It is recognised, however, that in terms of assessment it may be quite difficult to determine who is an ethical hacker in the ‘all things considered’ sense, while society may agree more easily on who is one in the limited ‘business-friendly’ sense. The article concludes by suggesting a pragmatic best-practice approach for characterising ethical hacking, which reaches beyond business-friendly values and helps in making decisions that respect the hackers' individual ethics in morally debatable grey zones.