316 research outputs found

    Understanding and controlling leakage in machine learning

    Machine learning models are being increasingly adopted in a variety of real-world scenarios. However, the privacy and confidentiality implications introduced in these scenarios are not well understood. Towards better understanding such implications, we focus on scenarios involving interactions between numerous parties prior to, during, and after training relevant models. Central to these interactions is sharing information for a purpose, e.g., contributing data samples towards a dataset or returning predictions via an API. This thesis takes a step toward understanding and controlling leakage of private information during such interactions. In the first part of the thesis, we investigate leakage of private information in visual data, specifically photos representative of content shared on social networks. There is a long line of work tackling leakage of personally identifiable information in social photos, especially using face- and body-level visual cues. However, we argue this presents only a narrow perspective, as images reveal a wide spectrum of multimodal private information (e.g., disabilities, name tags). Consequently, we work towards a Visual Privacy Advisor that aims to holistically identify and mitigate privacy risks when sharing social photos. In the second part, we address leakage during training of ML models. We observe that learning algorithms are increasingly used to train models on rich decentralized datasets, e.g., personal data on numerous mobile devices. In such cases, information in the form of high-dimensional model parameter updates is anonymously aggregated from participating individuals. However, we find that the updates encode sufficient identifiable information to be linked back to participating individuals. We additionally propose methods to mitigate this leakage while maintaining high utility of the updates. In the third part, we discuss leakage of confidential information during inference time of black-box models.
In particular, we find that models lend themselves to model functionality stealing attacks: an adversary can interact with the black-box model to create a replica 'knock-off' model that exhibits similar test-set performance. As such attacks pose a severe threat to the intellectual property of the model owner, we also work towards effective defenses. Our defense strategy, which introduces bounded and controlled perturbations to predictions, can significantly amplify model stealing attackers' error rates. In summary, this thesis advances understanding of privacy leakage when information is shared in raw visual forms, during training of models, and at inference time when models are deployed as black boxes. In each case, we further propose techniques to mitigate leakage of information to enable widespread adoption of techniques in real-world scenarios.
    Max Planck Institute for Informatics

    Court-System Transparency

    This article applies systems analysis to two ends. First, it identifies simple changes that would make the court system transparent. Second, it projects transparency's consequences. Transparency means that both the patterns across, and details of, case files are revealed to policymakers, litigants, and the public in easily understood forms. Government must make two changes to achieve court system transparency. The first is to remove the existing restrictions on the electronic release of court documents, including the requirements for registration, separate requests for each document, and monetary payment. The second, already being implemented in the federal courts, is to require the use of data-enabled forms. Once these changes are in place, institutions and private parties will process the available data at the parties' own expense. That processing will generate millions of real-time views of court system operation using automatically updated regression analyses and both textual and graphical data displays. The effect would be a renaissance. Corruption, incompetence, inefficiency, prejudice and favoritism would be exposed and wither. Litigation would be cheap and easy because parties could see all court files in the system and copy the work of others. Policymakers could see the human consequences of the laws they enact and adjust accordingly. Lawyers could predict the outcomes of their cases, making litigation less necessary. Citizens would for the first time be able to derive and see the real rules by which they are governed. Transparency would have a minimal effect on privacy. The data processed are already public record and adequate privacy protections are already provided through sealing orders and redaction requirements. Transparency would generate pressures on judges and court administrators, but the effects of those pressures would be generally positive.
Limitations on the public enforcement of private arbitration awards might be necessary to prevent parties from opting out of the transparent system.

    Information Extraction in an Optical Character Recognition Context

    In this dissertation, we investigate the effectiveness of information extraction in the presence of Optical Character Recognition (OCR) errors. It is well known that OCR errors have no effect on general retrieval tasks, mainly because of the redundancy of information in textual documents. Our work shows that the information extraction task, by contrast, is significantly influenced by OCR errors. Intuitively, this is because extraction algorithms rely on a small window of text surrounding the objects to be extracted. We show that extraction methodologies based on Hidden Markov Models are not robust enough to deal with extraction in this noisy environment. We also show that both precise shallow parsing and fuzzy shallow parsing can be used to increase recall at the price of a significant drop in precision. Most of our experimental work deals with the extraction of dates of birth and of postal addresses. Both of these specific extractions are part of general methods for identifying privacy information in textual documents. Privacy information is particularly important when large collections of documents are posted on the Internet.

    Methods for the de-identification of electronic health records for genomic research

    Electronic health records are increasingly being linked to DNA repositories and used as a source of clinical information for genomic research. Privacy legislation in many jurisdictions, and most research ethics boards, require either that personal health information is de-identified or that patient consent or authorization is sought before the data are disclosed for secondary purposes. Here, I discuss how de-identification has been applied in current genomic research projects. Recent metrics and methods that can be used to ensure that the risk of re-identification is low and that disclosures are compliant with privacy legislation and regulations (such as the Health Insurance Portability and Accountability Act Privacy Rule) are reviewed. Although these methods can protect against the known approaches for re-identification, residual risks and specific challenges for genomic research are also discussed.

    Satirical News and Political Subversiveness: A Critical Approach to The Daily Show and The Colbert Report

    Television shows like The Daily Show and The Colbert Report are often venerated for their satirical criticisms of mainstream media and for their pedagogical value as critical resources for political consciousness. The programs are said to provide interrogations of contemporary forms of power while fostering more active, collaborative and politically engaged audiences. This thesis interrogates such claims by introducing a critical reading of the shows. It engages in dialogue with scholars working within a Culturalist approach to media and politics by demonstrating the importance of a Marxist-inspired approach to the study of satirical news. Attention is given to the political economy of satirical programming with a specific focus on its kinship with mainstream news media. Equal consideration is given to the programs' branding strategies, including savvy forms of 'cool' consumption and the commodification and exploitation of online fan labor that increasingly complicate the shows' pedagogical value.

    Building a payment card skimmer database

    With the rise of credit cards as an integral part of the economy, crime related to them has risen accordingly. One of the most common ways to steal credit card data is through skimmers in gas pumps. The skimmer device consists of a simple PCB (printed circuit board) that is inserted inside the gas pump to steal customers' credit card data. Costs incurred due to fraud can run well into the thousands per person. Criminal groups install multiple skimmers across counties and states in the US. When skimmers are eventually discovered, it is practically impossible for police to conduct a successful investigation. Police departments rarely collaborate on cases of this sort that span different counties and states, which eliminates any possibility of their being solved. Skimmer Tracker is a web application that lets law enforcement agencies publish the skimmers they find.
By sharing this evidence, we aim to group different skimmers and connect them as part of the same case through computer-vision-based analysis.

    NetStage : Web application for music event comparison and management

    This project developed an application for searching and comparing music events, so that users can find events based on advanced criteria and follow the events and artists that interest them to keep abreast of newly published and updated information. The application also covers administration and maintenance of this information, allowing event administrators and artists to update their data and event participation.
    Bachelor's thesis (UDC.FIC). Computer Engineering. Academic year 2019/2020