
    Assessing NER tools for dialogue data anonymization

    As the number of organizations processing sensitive data grows, so does the need for businesses to protect their customers' privacy. However, prevailing methods for protecting sensitive data often rely on manual or semi-automatic procedures, which are resource-intensive and error-prone. This dissertation addresses data anonymization with a focus on Named Entity Recognition (NER) models. In particular, we investigate and compare several NER models for the Portuguese language to anonymize unstructured data automatically and effectively. In the machine learning approach, the SpaCy, STRING, WikiNEuRal, and RoBERTa models are used to identify classes such as Person, Location, and Organization. The rule-based approach, in turn, identifies classes such as NIF (the Portuguese tax identification number), Email, Car Plate, and Postal Code. Additionally, we built a Flask API tool that processes unstructured data and anonymizes it: given a string that simulates a message, the tool automatically anonymizes any content that may be considered sensitive. The tool combines rule-based and machine learning techniques for identifying and extracting named entities in Portuguese. Combining both kinds of model in the same tool was crucial for covering more sensitive classes. The entity-extraction results reported for the tool comprise the results for the three classes obtained with the SpaCy model, together with the results for the rule-based models created.
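    The rule-based half of such a tool can be illustrated with regular expressions. The sketch below is an illustration under stated assumptions, not the dissertation's code: the patterns, the placeholder labels, and the functions `is_valid_nif` and `anonymize` are hypothetical; the plate layouts are assumed to be the Portuguese formats listed in the comments; and nine-digit NIF candidates are filtered with the standard mod-11 check digit so that other nine-digit numbers (such as phone numbers) are not masked. No Flask wiring is shown.

```python
import re

# Hypothetical patterns for four of the rule-based classes named in the abstract.
# Order matters: e-mails are masked first so their digits are not re-matched later.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "POSTAL_CODE": re.compile(r"\b\d{4}-\d{3}\b"),  # Portuguese format NNNN-NNN
    # Assumed Portuguese plate layouts: AA-00-00, 00-00-AA, 00-AA-00, AA-00-AA
    "CAR_PLATE": re.compile(
        r"\b(?:[A-Z]{2}-\d{2}-\d{2}|\d{2}-\d{2}-[A-Z]{2}"
        r"|\d{2}-[A-Z]{2}-\d{2}|[A-Z]{2}-\d{2}-[A-Z]{2})\b"
    ),
    "NIF": re.compile(r"\b\d{9}\b"),  # candidates, filtered by the check digit below
}


def is_valid_nif(candidate: str) -> bool:
    """Verify the mod-11 check digit of a 9-digit Portuguese NIF."""
    digits = [int(c) for c in candidate]
    # Weights 9, 8, ..., 2 apply to the first eight digits.
    total = sum(d * w for d, w in zip(digits[:8], range(9, 1, -1)))
    remainder = total % 11
    check = 0 if remainder < 2 else 11 - remainder
    return digits[8] == check


def anonymize(text: str) -> str:
    """Replace every rule-matched entity with a [LABEL] placeholder."""
    for label, pattern in PATTERNS.items():
        def replace(match: re.Match, label: str = label) -> str:
            # Nine-digit numbers that fail the checksum (e.g. phone numbers) are kept.
            if label == "NIF" and not is_valid_nif(match.group()):
                return match.group()
            return f"[{label}]"
        text = pattern.sub(replace, text)
    return text


message = ("Olá, sou a Joana, NIF 123456789, email joana.silva@example.pt, "
           "moro no 1000-001 Lisboa e o meu carro é AA-12-BC. Tel 912345678.")
print(anonymize(message))
```

    Person, Location, and Organization mentions (here "Joana" and "Lisboa") are left untouched: in the dissertation those classes come from the machine learning models, which this sketch does not include.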

    Kansas: Baseline Report - State Level Field Network Study of the Implementation of the Affordable Care Act

    This report is part of a series of 21 state and regional studies examining the rollout of the ACA. The national network, with 36 states and 61 researchers, is led by the Rockefeller Institute of Government (the public policy research arm of the State University of New York), the Brookings Institution, and the Fels Institute of Government at the University of Pennsylvania. The Kansas report highlights the diverse approaches to implementation taken by elected officials, including Insurance Commissioner Sandy Praeger, Governor Sam Brownback, and the Kansas Legislature. The political landscape changed this year when Commissioner Praeger decided not to seek re-election. Her replacement, Commissioner-Elect Ken Selzer, has expressed opposition to the ACA.

    Lessons in Funder Collaboration: What the Packard Foundation Has Learned about Working with Other Funders

    As the Foundation approached its 50th anniversary this year, it asked The Bridgespan Group to help take stock of what Packard can learn from its many collaborations. Bridgespan identified five main structural models for collaboration: knowledge exchange, coordination of funding, co-investing in an existing entity, creating a new entity, and funding the funder. As with all taxonomies, these five categories are meant to serve as signposts along a continuum. Collaborations differ in many ways, including the flow of funds, decision making, the expectations and roles of funding partners, and legal structure. Bridgespan focused on Packard's collaborations that require alignment and more intensive coordination by program staff. Six case studies provide an in-depth look at examples of all four types of collaboration that go beyond exchanging knowledge.

    On the anonymity risk of time-varying user profiles

    Websites and applications use personalisation services to profile their users, collect their patterns and activities, and eventually use this data to provide tailored suggestions. User preferences and social interactions are therefore aggregated and analysed. Every time a user publishes a new post or creates a link with another entity, either another user or some online resource, new information is added to the user profile. Exposing private data not only reveals information about individual users' preferences, increasing their privacy risk, but can also expose more about their network than individual users intended. This mechanism is particularly evident in social networks, where users receive suggestions based on their friends' activities. We propose an information-theoretic approach to measure the differential update of the anonymity risk of time-varying user profiles. This measure expresses how privacy is affected when new content is posted and how much third-party services learn about users when a new activity is shared. We use actual Facebook data to show how our model can be applied to a real-world scenario.
    Peer Reviewed. Postprint (published version).
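    The abstract does not spell out its risk measure. One common information-theoretic choice is the Kullback-Leibler divergence between a user's activity profile and the population's profile: the further the user's distribution is from the crowd's, the easier the user is to single out. The sketch below assumes that measure, a hypothetical set of activity categories, and hypothetical function names (`profile`, `kl_divergence`, `risk_update`); it computes the risk before and after one new activity and their difference, as a rough analogue of a "differential update", not the paper's exact formulation.

```python
import math
from collections import Counter


def profile(counts: Counter, categories: list[str]) -> list[float]:
    """Normalize activity counts into a probability distribution over categories."""
    total = sum(counts[c] for c in categories)
    return [counts[c] / total for c in categories]


def kl_divergence(p: list[float], q: list[float]) -> float:
    """D(p || q) in bits; assumes every q_i > 0, and terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def risk_update(counts: Counter, new_category: str,
                population: list[float], categories: list[str]) -> tuple[float, float, float]:
    """Return (risk before, risk after, differential update) for one new activity."""
    before = kl_divergence(profile(counts, categories), population)
    updated = counts.copy()
    updated[new_category] += 1
    after = kl_divergence(profile(updated, categories), population)
    return before, after, after - before


# Hypothetical example: three activity categories, uniform population profile.
categories = ["sports", "politics", "music"]
population = [1 / 3, 1 / 3, 1 / 3]
counts = Counter(sports=2, politics=1, music=1)

for cat in ("sports", "music"):
    before, after, delta = risk_update(counts, cat, population, categories)
    print(f"new {cat} post: {before:.3f} -> {after:.3f} bits (delta {delta:+.3f})")
```

    Under this assumed measure, a post in a category the user already over-represents pushes the profile further from the population (the risk rises), while a post in an under-represented category moves it closer (the risk falls).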