    Privacy leakage analysis of text representations for Natural Language Processing

    Natural language processing (NLP) offers powerful tools, such as text suggestions that help users be more efficient or classifiers that categorize documents by their content. These tools are usually powered by machine learning (ML) models trained on textual data, such as emails, chats, or medical records, which frequently contain sensitive data or personally identifiable information. It is therefore important for companies working in this field to assess the risk of customer data leakage and potential privacy breaches. This assessment is also required to proactively comply with data protection laws, such as the General Data Protection Regulation (GDPR), which enforces the need for privacy and demands protection against data breaches. This thesis analyzes some of the major privacy threats, identified in recent research, that arise from using ML models in the NLP domain. Particular attention is placed on text representation models, which convert texts into numerical vectors. The objective is to assess whether sensitive information can be inferred simply by accessing the vector representations (embeddings) of texts. For this purpose, we first review different text representation approaches, ranging from classical models to more recent ones based on deep learning. We then implement a recently proposed inversion attack and test it against the representations produced by the various models to analyze what type of information can be leaked and under which conditions recovery is possible. Empirical results show that vectors that encode texts can reveal an astonishing amount of sensitive information, potentially compromising user privacy. For example, proper names can be recovered from vectors produced even by the most recent state-of-the-art deep learning models.
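
    As a concrete illustration of what such an embedding-inversion attack can look like, the sketch below trains a classifier to recover proper names given only sentence embeddings. It is a minimal, hypothetical reconstruction: the stand-in embedder, the toy corpus, and the scikit-learn classifier are illustrative assumptions, not the specific attack or representation models evaluated in the thesis.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Stand-in text-representation model: character n-gram counts pushed through a
# fixed random projection. Any real embedder (TF-IDF, word2vec, BERT, ...)
# could take its place; the attacker only needs the resulting vectors.
names = ["alice", "bob", "carol", "dave", "erin", "frank"]
corpus = [f"patient {n} was admitted to ward {i % 5}" for i, n in enumerate(names * 50)]

vectorizer = CountVectorizer(analyzer="char_wb", ngram_range=(2, 3)).fit(corpus)
projection = np.random.default_rng(0).normal(size=(len(vectorizer.vocabulary_), 64))

def embed(texts):
    """Map texts to the dense vectors the attacker is assumed to observe."""
    return vectorizer.transform(texts).toarray() @ projection

# Inversion attack: with an auxiliary corpus of known texts, learn a mapping
# from embeddings back to the sensitive tokens (here, proper names) they encode.
X_aux = embed(corpus)
y_aux = np.array([[name in text for name in names] for text in corpus], dtype=int)
attack = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X_aux, y_aux)

# Given only a victim's embedding, predict which names appear in the text.
victim_embedding = embed(["patient carol was admitted to ward 2"])
recovered = attack.predict(victim_embedding)[0]
print({name: bool(flag) for name, flag in zip(names, recovered)})
```

    The same attacker-trains-a-decoder pattern applies regardless of which representation model produced the vectors, which is why the thesis compares leakage across classical and deep-learning-based embedders.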

    Making broadband access networks transparent to researchers, developers, and users

    Broadband networks are used by hundreds of millions of users to connect to the Internet today. However, most ISPs are hesitant to reveal details about their network deployments, and as a result the characteristics of broadband networks are often not known to users, developers, and researchers. In this thesis, we make progress towards mitigating this lack of transparency in broadband access networks in two ways. First, using novel measurement tools, we performed the first large-scale study of the characteristics of broadband networks. We found that broadband networks have very different characteristics from academic networks. We also developed Glasnost, a system that enables users to test their Internet access links for traffic differentiation. Glasnost has been used by more than 350,000 users worldwide and allowed us to study ISPs' traffic management practices. We found that ISPs increasingly throttle or even block traffic from popular applications such as BitTorrent. Second, we developed two new approaches to enable realistic evaluation of networked systems in broadband networks. We developed Monarch, a tool that enables researchers to study and compare the performance of new and existing transport protocols at large scale in broadband environments. Furthermore, we designed SatelliteLab, a novel testbed that can easily add arbitrary end nodes, including broadband nodes and even smartphones, to existing testbeds like PlanetLab.
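
    Glasnost's full methodology is documented in the corresponding publications; the sketch below only illustrates the basic comparison idea behind such differentiation tests, using hypothetical per-second throughput samples, a made-up ratio threshold, and a standard rank test. It is an assumption-laden illustration, not Glasnost's actual detection logic.

```python
from statistics import median
from scipy.stats import mannwhitneyu

def differentiation_suspected(app_kbps, control_kbps,
                              ratio_threshold=0.8, alpha=0.05):
    """Flag throttling if the application-shaped flow is both practically and
    statistically slower than a content-neutral control flow."""
    ratio = median(app_kbps) / median(control_kbps)
    stat, p = mannwhitneyu(app_kbps, control_kbps, alternative="less")
    return ratio < ratio_threshold and p < alpha

# Example: throughput samples (kbit/s) from two back-to-back 20-second transfers
# over the same access link, one BitTorrent-shaped and one neutral.
bittorrent_flow = [310, 295, 280, 300, 290, 305, 285, 275, 300, 290,
                   295, 280, 310, 285, 300, 290, 275, 295, 305, 280]
control_flow    = [980, 1010, 995, 970, 1005, 990, 1000, 985, 975, 1010,
                   995, 1000, 980, 990, 1005, 985, 1000, 970, 995, 1010]
print(differentiation_suspected(bittorrent_flow, control_flow))  # True
```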

    Proceedings of the 2021 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory

    In 2021, the annual joint workshop of Fraunhofer IOSB and KIT IES was hosted at the IOSB in Karlsruhe. For a week, from 2 to 6 July, the doctoral students presented extensive reports on the status of their research. The results and ideas presented at the workshop are collected in this book in the form of detailed technical reports.

    Security in Distributed, Grid, Mobile, and Pervasive Computing

    This book addresses the increasing demand to guarantee privacy, integrity, and availability of resources in networks and distributed systems. It first reviews security issues and challenges in content distribution networks, describes key agreement protocols based on the Diffie-Hellman key exchange and key management protocols for complex distributed systems like the Internet, and discusses secure design patterns for distributed systems. The next section focuses on security in mobile computing and wireless networks. After a section on grid computing security, the book presents an overview of security solutions for pervasive healthcare systems and surveys wireless sensor network security.
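
    Since the first part of the book builds on the Diffie-Hellman key exchange, a toy worked example may help fix ideas. The sketch below shows only the textbook protocol; the modulus and generator are illustrative choices rather than a standardized group, and real systems should rely on vetted cryptographic libraries instead of hand-rolled code.

```python
import secrets

# Toy public parameters (illustrative only, not a standardized group).
p = 2**521 - 1   # a known Mersenne prime used here purely for demonstration
g = 3            # public base

# Each party draws a private exponent and publishes g^x mod p.
a = secrets.randbelow(p - 2) + 1    # Alice's secret
b = secrets.randbelow(p - 2) + 1    # Bob's secret
A = pow(g, a, p)                    # sent Alice -> Bob
B = pow(g, b, p)                    # sent Bob -> Alice

# Both sides compute the same shared secret: (g^b)^a = (g^a)^b mod p.
shared_alice = pow(B, a, p)
shared_bob = pow(A, b, p)
assert shared_alice == shared_bob
print(shared_alice.bit_length(), "bit shared secret agreed")
```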

    Coordinated Transit Response Planning and Operations Support Tools for Mitigating Impacts of All-Hazard Emergency Events

    This report summarizes current computer simulation capabilities and the availability of near-real-time data sources, allowing for a novel approach to analyzing and determining optimized responses during disruptions of a complex multi-agency transit system. The authors integrated a number of technologies and data sources to detect disruptive transit system performance issues, analyze the impact on overall system-wide performance, and statistically model likely traveler choices and responses. Unaffected transit resources and the provision of temporary resources are then analyzed and optimized to minimize the overall impact of the initiating event.

    Future Transportation

    Greenhouse gas (GHG) emissions associated with transportation activities account for approximately 20 percent of all carbon dioxide (CO2) emissions globally, making the transportation sector a major contributor to global warming. This book focuses on the latest advances in technologies aimed at the sustainable future transportation of people and goods. Reducing the burning of fossil fuels and technological transitions are the main approaches toward sustainable future transportation. Particular attention is given to automobile technological transitions, bike-sharing systems, supply chain digitalization, and transport performance monitoring and optimization, among others.

    LIPIcs, Volume 277, GIScience 2023, Complete Volume

    LIPIcs, Volume 277, GIScience 2023, Complete Volume

    Effective techniques for detecting and locating traffic differentiation in the internet

    Advisor: Elias P. Duarte Jr. Co-advisor: Luis C. E. Bona. Doctoral thesis, Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defended in Curitiba, 24/09/2019. Includes references: p. 115-126. Area of concentration: Computer Science.
    Network Neutrality is becoming increasingly important as the global debate intensifies and governments worldwide implement and withdraw regulations. According to this principle, all traffic must be processed without differentiation, regardless of origin, destination, and/or content. Traffic Differentiation (TD) practices should be transparent, regardless of regulations, since they can significantly affect end users. It is thus essential to monitor TD in the Internet. Several solutions have been proposed to detect TD; these solutions are based on network measurements and statistical inference, but open challenges remain. This thesis has three main objectives: (i) to consolidate the state of the art regarding the problem of detecting TD; (ii) to investigate TD in contexts not yet explored, in particular the Internet of Things (IoT); and (iii) to propose new solutions for TD detection that address open challenges, in particular locating the source of TD. We first describe the current state of the art, including a description of multiple solutions for detecting TD. We also propose a taxonomy for the different types of TD and the different types of detection, and identify open challenges. Then, we evaluate the impact of TD on IoT by simulating TD on different IoT traffic patterns. Results show that even a small prioritization may have a significant impact on the performance of IoT devices. Next, we propose a solution for detecting TD in the Internet, which relies on a new strategy of combining several metrics to detect different types of TD. Simulation results show that this strategy is capable of detecting TD under several conditions. We then propose a general model for continuously monitoring TD on the Internet, which aims at unifying current and future TD detection solutions while taking advantage of current and emerging technologies. In this context, a new solution for locating the source of TD in the Internet is proposed. The goal of this proposal is both to enable the implementation of our general model and to address the problem of locating TD. The proposal takes advantage of properties of Internet routing to identify in which Autonomous System (AS) TD occurs. Probes from multiple vantage points are combined, and the source of TD is inferred based on the AS-level routes between the measurement points. To evaluate this proposal, we first ran several experiments to confirm that Internet routes do indeed present the required properties. Then, several simulations were performed to assess the efficiency of the proposal for locating TD. The results show that, in several different scenarios, issuing probes from a few end hosts in core Internet ASes achieves results similar to those obtained from numerous end hosts at the edge. Keywords: Network Neutrality, Traffic Differentiation, Network Measurement.
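
    The localization step can be pictured as an elimination over AS-level paths: paths that exhibit differentiation are intersected, and ASes seen on clean paths are removed. The sketch below is a deliberately simplified, hypothetical heuristic that illustrates the idea of combining vantage points, not the thesis's actual inference procedure.

```python
def locate_td(measurements):
    """measurements: list of (as_path, differentiated) pairs,
    where as_path is a sequence of AS numbers observed by one probe."""
    suspects, cleared = None, set()
    for as_path, differentiated in measurements:
        if differentiated:
            # Candidate culprits must appear on every differentiated path.
            suspects = set(as_path) if suspects is None else suspects & set(as_path)
        else:
            # ASes on clean paths are assumed not to differentiate.
            cleared |= set(as_path)
    return (suspects or set()) - cleared

# Example: three vantage points probing the same target AS. AS 3356 appears on
# every differentiated path but on no clean path, so it is the prime suspect.
probes = [
    ([65001, 174, 3356, 65100], True),
    ([65002, 1299, 3356, 65100], True),
    ([65003, 1299, 174, 65100], False),
]
print(locate_td(probes))  # {3356}
```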

    Combating Attacks and Abuse in Large Online Communities

    Internet users today are connected more widely and ubiquitously than ever before. As a result, various online communities have formed, ranging from online social networks (Facebook, Twitter), to mobile communities (Foursquare, Waze), to content- and interest-based networks (Wikipedia, Yelp, Quora). While users benefit from the ease of access to information and social interactions, there is a growing concern for users' security and privacy against various attacks such as spam, phishing, malware infection, and identity theft. Combating attacks and abuse in online communities is challenging. First, today's online communities are increasingly dependent on users and user-generated content. Securing online systems demands a deep understanding of complex and often unpredictable human behaviors. Second, online communities can easily have millions or even billions of users, which requires the corresponding security mechanisms to be highly scalable. Finally, cybercriminals are constantly evolving to launch new types of attacks. This further demands high robustness of security defenses. In this thesis, we take concrete steps towards measuring, understanding, and defending against attacks and abuse in online communities. We begin with a series of empirical measurements to understand user behaviors in different online services and the unique security and privacy challenges that users face. This effort covers a broad set of popular online services, including social networks for question answering (Quora), anonymous social networks (Whisper), and crowdsourced mobile communities (Waze). Despite the differences between specific online communities, our study provides a first look at their user activity patterns based on empirical data, and reveals the need for reliable mechanisms to curate user content, protect privacy, and defend against emerging attacks. Next, we turn our attention to attacks targeting online communities, with a focus on spam campaigns. While traditional spam is mostly generated by automated software, attackers today have started to introduce "human intelligence" to implement attacks. This is malicious crowdsourcing (or crowdturfing), where a large group of real users is organized to carry out malicious campaigns, such as writing fake reviews or spreading rumors on social media. Using collective human effort, attackers can easily bypass many existing defenses (e.g., CAPTCHA). To understand the ecosystem of crowdturfing, we first use measurements to examine its detailed campaign organization, workers, and revenue. Based on insights from empirical data, we develop effective machine learning classifiers to detect crowdturfing activities. In the meantime, considering the adversarial nature of crowdturfing, we also build practical adversarial models to simulate how attackers can evade or disrupt machine learning based defenses. To aid in this effort, we next explore using user behavior models to detect a wider range of attacks. Instead of making assumptions about attacker behavior, our idea is to model normal user behaviors and capture (malicious) behaviors that deviate from the norm. In this way, we can detect previously unknown attacks. Our behavior model is based on detailed clickstream data, which are sequences of click events generated by users when using the service. We build a similarity graph where each user is a node and the edges are weighted by clickstream similarity. By partitioning this graph, we obtain "clusters" of users with similar behaviors. We then use a small set of known good users to "color" these clusters to differentiate the malicious ones. This technique has been adopted by real-world social networks (Renren and LinkedIn), and has already detected unexpected attacks. Finally, we extend the clickstream model to understand finer-grained behaviors of attackers (and real users), and to track how user behavior changes over time. In summary, this thesis illustrates a data-driven approach to understanding and defending against attacks and abuse in online communities. Our measurements have revealed new insights about how attackers are evolving to bypass existing security defenses today. In addition, our data-driven systems provide new solutions for online services to gain a deep understanding of their users, and to defend them from emerging attacks and abuse.
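
    The clickstream-clustering idea can be illustrated with a small, self-contained sketch: users are compared by the overlap of their click-event bigrams, the thresholded similarity graph is split into connected components, and components containing no known-good user are flagged. The similarity measure, threshold, and toy data are illustrative assumptions; the systems described above use richer behavioral models and proper graph partitioning.

```python
from itertools import combinations

def bigrams(clicks):
    """Represent a clickstream by its set of consecutive event pairs."""
    return {(a, b) for a, b in zip(clicks, clicks[1:])}

def jaccard(s1, s2):
    return len(s1 & s2) / len(s1 | s2) if s1 | s2 else 0.0

def suspicious_clusters(clickstreams, known_good, threshold=0.5):
    users = list(clickstreams)
    grams = {u: bigrams(clickstreams[u]) for u in users}
    # Build the similarity graph: edge if clickstream overlap exceeds threshold.
    adj = {u: set() for u in users}
    for u, v in combinations(users, 2):
        if jaccard(grams[u], grams[v]) >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    # Connected components stand in for clusters of similar behavior.
    seen, clusters = set(), []
    for u in users:
        if u in seen:
            continue
        stack, comp = [u], set()
        while stack:
            w = stack.pop()
            if w not in comp:
                comp.add(w)
                stack.extend(adj[w] - comp)
        seen |= comp
        clusters.append(comp)
    # "Color" clusters: those without any known-good user are suspicious.
    return [c for c in clusters if not (c & known_good)]

streams = {
    "alice": ["login", "feed", "post", "feed", "logout"],
    "bob":   ["login", "feed", "post", "feed", "logout"],
    "bot1":  ["login", "follow", "follow", "follow", "follow"],
    "bot2":  ["login", "follow", "follow", "follow", "follow"],
}
print(suspicious_clusters(streams, known_good={"alice"}))  # [{'bot1', 'bot2'}]
```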