6 research outputs found

    The Status of Adoption of Social Media Analytics: Three Cases in South African and German Government Departments

    Lack of access to technologies and quality data are key challenges for reducing the digital divide and developing digital citizens to support Smart City initiatives. This paper reviews efforts towards Smart Cities and access to smart technology and Open Data in developed economies globally and in South Africa. Reviews of literature and websites were conducted, and the Qualitative Content Analysis method was used to analyse the data. The contributions are the commonalities and differences between Smart City initiatives in developed economies and in South Africa. The findings revealed that in developed countries the focus was mainly on e-services, citizen engagement, Intelligent Transport Systems and energy systems. These initiatives provided city-wide connectivity and addressed integration and interoperability challenges. The technologies included large IoT sensors and WiFi in-motion networks incorporating internationally accepted standards. Initiatives in South Africa were less mature, mostly in the initial stages, and were not addressing other more urgent needs of the country such as water, food, shelter and education. Collaboration with best practice Smart Cities is needed to provide support to current and future initiatives in South Africa and for the development of African digital citizens.

    Automatic privacy and utility evaluation of anonymized documents via deep learning

    Text anonymization methods are evaluated by comparing their outputs with human-based anonymizations through standard information retrieval (IR) metrics. On the one hand, the residual disclosure risk is quantified with the recall metric, which gives the proportion of re-identifying terms successfully detected by the anonymization algorithm. On the other hand, the preserved utility is measured with the precision metric, which accounts for the proportion of masked terms that were also annotated by the human experts. Nevertheless, because these evaluation metrics were meant for information retrieval rather than privacy-oriented tasks, they suffer from several drawbacks. First, they assume a unique ground truth, and this does not hold for text anonymization, where several masking choices could be equally valid to prevent re-identification. Second, annotation-based evaluation relies on human judgements, which are inherently subjective and may be prone to errors. Finally, both metrics weight terms uniformly, thereby ignoring the fact that some terms may influence the disclosure risk or utility preservation much more than others. To overcome these drawbacks, in this thesis we propose two novel methods to evaluate both the disclosure risk and the utility preserved in anonymized texts. Our approach leverages deep learning methods to perform this evaluation automatically, thereby not requiring human annotations. For assessing disclosure risks, we propose using a re-identification attack, which we define as a multi-class classification task built on top of state-of-the-art language models. To make it feasible, the attack has been designed to capture the means and computational resources expected to be available at the attacker's end. For utility assessment, we propose a method that measures the information loss incurred during the anonymization process, which relies on neural masked language modeling.
We illustrate the effectiveness of our methods by evaluating the disclosure risk and retained utility of several well-known techniques and tools for text anonymization on a common dataset. Empirical results show significant privacy risks for all of them (including manual anonymization) and consistently proportional utility preservation.
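
As a rough illustration of the annotation-based baseline this abstract criticizes, the recall and precision it describes can be computed over sets of terms. This is a minimal sketch, not the thesis's code: the function name `anonymization_scores` and the example terms are assumptions made up for illustration, and terms are treated as uniformly weighted, which is exactly the limitation the abstract points out.

```python
def anonymization_scores(masked_terms, annotated_terms):
    """Return (precision, recall) of a masking against human annotations.

    Hypothetical helper for illustration only.
    recall    = share of human-annotated terms that the system masked
                (a proxy for residual disclosure risk)
    precision = share of masked terms that the annotators also marked
                (a proxy for preserved utility)
    """
    masked = set(masked_terms)
    annotated = set(annotated_terms)
    agreed = masked & annotated
    # Empty sets are treated as a perfect score by convention (an assumption).
    precision = len(agreed) / len(masked) if masked else 1.0
    recall = len(agreed) / len(annotated) if annotated else 1.0
    return precision, recall


# Made-up example: annotators marked three terms, the system masked four.
annotated = {"Alice Smith", "Berlin", "1984"}
masked = {"Alice Smith", "Berlin", "engineer", "Tuesday"}
precision, recall = anonymization_scores(masked, annotated)
```

Here two of the four masked terms agree with the annotators, and two of the three annotated terms were masked, so every term counts equally regardless of how identifying it is.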

    Utility-Preserving Anonymization of Textual Documents

    Every day, people post a significant amount of data on the Internet, such as tweets, reviews, photos, and videos. Organizations collecting these types of data use them to extract information in order to improve their services or for commercial purposes. Yet, if the collected data contain sensitive personal information, they cannot be shared with third parties or released publicly without consent or adequate protection of the data subjects. Privacy-preserving mechanisms provide ways to sanitize data so that identities and/or confidential attributes are not disclosed. A great variety of mechanisms have been proposed to anonymize structured databases with numerical and categorical attributes; however, automatically protecting unstructured textual data has received much less attention. In general, textual data anonymization requires first detecting pieces of text that may disclose sensitive information and then masking those pieces via suppression or generalization. In this work, we leverage several technologies to anonymize textual documents. We first improve state-of-the-art techniques based on sequence labeling. We then extend them to align them better with the notion of privacy risk and the privacy requirements. Finally, we propose a complete framework based on word embedding models that captures a broader notion of data protection and provides flexible protection driven by privacy requirements. We also leverage ontologies to preserve the utility of the masked text, that is, its semantics and readability. Extensive experimental results show that our methods outperform the state of the art by providing more robust anonymization while reasonably preserving the utility of the protected outcome.
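
The detect-then-mask pipeline described in this abstract can be sketched in a few lines. This is only a toy under stated assumptions: detection is replaced by a hand-supplied list of sensitive terms (the thesis uses sequence labeling and word embeddings instead), and the function name `mask_text`, the placeholder string, and the generalization table are all hypothetical.

```python
def mask_text(text, sensitive_terms, generalizations=None, placeholder="[REDACTED]"):
    """Mask each sensitive term by generalization or suppression.

    Hypothetical helper for illustration only.
    - generalization: replace the term with a broader description, if one
      is supplied in `generalizations`
    - suppression: otherwise replace the term with `placeholder`
    """
    generalizations = generalizations or {}
    masked = text
    # Handle longer terms first so a short term never splits a longer one.
    for term in sorted(sensitive_terms, key=len, reverse=True):
        replacement = generalizations.get(term, placeholder)
        masked = masked.replace(term, replacement)
    return masked


# Made-up example: one term is suppressed, the other generalized.
original = "Alice Smith moved to Berlin in 1984."
protected = mask_text(
    original,
    sensitive_terms=["Alice Smith", "Berlin"],
    generalizations={"Berlin": "a European city"},
)
```

Generalization keeps more of the sentence's meaning than suppression, which is the utility trade-off the abstract refers to.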

    Dictionary of privacy, data protection and information security

    The Dictionary of Privacy, Data Protection and Information Security explains the complex technical terms, legal concepts, privacy management techniques, conceptual matters and vocabulary that inform public debate about privacy. The revolutionary and pervasive influence of digital technology affects numerous disciplines and sectors of society, and concerns about its potential threats to privacy are growing. With over a thousand terms meticulously set out, described and cross-referenced, this Dictionary enables productive discussion by covering the full range of fields accessibly and comprehensively. In the ever-evolving debate surrounding privacy, this Dictionary takes a longer view, transcending the details of today's problems, technology, and the law to examine the wider principles that underlie privacy discourse. Interdisciplinary in scope, this Dictionary is invaluable to students, scholars and researchers in law, technology and computing, cybersecurity, sociology, public policy and administration, and regulation. It is also a vital reference for diverse practitioners including data scientists, lawyers, policymakers and regulators.