30 research outputs found

    Privacy-Aware Recommender Systems Challenge on Twitter's Home Timeline

    Recommender systems constitute the core engine of most social network platforms nowadays, aiming to maximize user satisfaction along with other key business objectives. Twitter is no exception. Although Twitter data has been used extensively to understand socioeconomic and political phenomena and user behaviour, the implicit feedback that users provide on Tweets through their engagements on the Home Timeline has been explored only to a limited extent. At the same time, there is a lack of large-scale public social network datasets that would enable the scientific community to both benchmark and build more powerful and comprehensive models that tailor content to user interests. By releasing an original dataset of 160 million Tweets along with engagement information, Twitter aims to address exactly that. In this release, special attention is paid to maintaining compliance with existing privacy laws. Apart from user privacy, this paper touches on the key challenges faced by researchers and professionals striving to predict user engagements. It further describes the key aspects of the RecSys 2020 Challenge, which was organized by ACM RecSys in partnership with Twitter using this dataset. Comment: 16 pages, 2 tables.

    k-ANONYMITY: A MODEL FOR PROTECTING PRIVACY

    Anonymity-preserving location data publishing

    Advances in wireless communication and positioning technology have made it possible to identify a user's location and hence to collect large volumes of personal location data. While such data are useful to many organizations, making them publicly accessible is generally prohibited because location data may imply sensitive private information. This thesis investigates the challenges inherent in publishing location data while preserving the location privacy of data subjects. Since location data itself may lead to subject re-identification, simply removing user identity from location data is not sufficient to preserve anonymity, and other measures must be employed. We provide a literature survey and discuss the limitations of related work on this problem. We then propose a novel location depersonalization technique that efficiently depersonalizes large volumes of location data. The proposed technique is evaluated using simulation. Our study shows that it is possible to guarantee a desired level of anonymity protection while allowing accurate location data to be published.

    Development and evaluation of an open source software tool for deidentification of pathology reports

    BACKGROUND: Electronic medical records, including pathology reports, are often used for research purposes. Currently, few freely available programs remove identifiers while leaving the remainder of the pathology report text intact. Our goal was to produce an open source, Health Insurance Portability and Accountability Act (HIPAA) compliant deidentification tool tailored for pathology reports. We designed a three-step process for removing potential identifiers. The first step looks for identifiers known to be associated with the patient, such as name, medical record number, pathology accession number, etc. Next, a series of pattern matches looks for predictable patterns likely to represent identifying data, such as dates, accession numbers and addresses, as well as patient, institution and physician names. Finally, individual words are compared with a database of proper names and geographic locations. Pathology reports from three institutions were used to design and test the algorithms. The software was improved iteratively on training sets until it exhibited good performance. Then 1800 new pathology reports were processed. Each report was reviewed manually before and after deidentification to catalog all identifiers and note those that were not removed.
    RESULTS: Of the 1800 pathology reports, 1254 (69.7%) contained identifiers in the body of the report. Of the 3499 unique identifiers in the test set, 3439 (98.3%) were removed. Only 19 HIPAA-specified identifiers (mainly consult accession numbers and misspelled names) were missed. Of the 41 non-HIPAA identifiers missed, the majority were partial institutional addresses and ages. Outside consultation case reports typically contain numerous identifiers and were the most challenging to deidentify comprehensively. Performance varied among reports from the three institutions, highlighting the need for site-specific customization, which is easily accomplished with our tool.
    CONCLUSION: We have demonstrated that it is possible to create an open-source deidentification program that performs well on free-text pathology reports.
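    The three-step process described in this abstract can be illustrated with a minimal Python sketch. It is not the tool's actual code: the function name `deidentify`, the `[REDACTED]` placeholder, the two regular expressions, and the tiny name dictionary are all hypothetical stand-ins chosen for illustration.

```python
import re

# Hypothetical placeholder; the abstract does not say what replaces an identifier.
REDACTED = "[REDACTED]"

# Step 2 patterns (illustrative assumptions, not the tool's real rules).
PATTERNS = [
    re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),   # dates such as 03/14/2005
    re.compile(r"\b[A-Z]{1,2}\d{2}-\d{3,6}\b"),   # accession-like numbers such as S05-12345
]

# Step 3 dictionary: a toy stand-in for a database of proper names and places.
PROPER_NAMES = {"smith", "boston"}


def deidentify(text, known_identifiers):
    """Apply the three steps in order and return the scrubbed text."""
    # Step 1: remove identifiers already known to belong to the patient.
    # Longer strings go first so a full name is removed before its parts.
    for ident in sorted(known_identifiers, key=len, reverse=True):
        text = re.sub(re.escape(ident), REDACTED, text, flags=re.IGNORECASE)

    # Step 2: remove text that matches predictable identifier patterns.
    for pattern in PATTERNS:
        text = pattern.sub(REDACTED, text)

    # Step 3: compare each remaining word with the proper-name dictionary.
    def check_word(match):
        word = match.group(0)
        return REDACTED if word.lower() in PROPER_NAMES else word

    return re.sub(r"[A-Za-z]+", check_word, text)


report = ("Patient Jane Doe, MRN 123456, seen 03/14/2005 by Dr. Smith in Boston. "
          "Case S05-12345: benign nevus.")
print(deidentify(report, ["Jane Doe", "123456"]))
```

    Running the steps in this order means the cheap, high-precision match on known identifiers happens first, and the dictionary lookup only has to catch what the first two steps leave behind. Site-specific customization, as the abstract mentions, would amount to editing the pattern list and the dictionary.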

    Indivo: a personally controlled health record for health information exchange and communication

    Background: Personally controlled health records (PCHRs), a subset of personal health records (PHRs), enable a patient to assemble, maintain and manage a secure copy of his or her medical data. Indivo (formerly PING) is an open source, open standards PCHR with an open application programming interface (API). Results: We describe how the PCHR platform can provide standard building blocks for networked PHR applications. Indivo allows the ready integration of diverse sources of medical data under a patient's control through the use of standards-based communication protocols and APIs for connecting PCHRs to existing and future health information systems. Conclusion: The strict and transparent personal control model is designed to encourage widespread participation by patients, healthcare providers and institutions, thus creating the ecosystem for development of innovative, consumer-focused healthcare applications.

    z-anonymity: Zero-Delay Anonymization for Data Streams

    With the advent of big data and the birth of data markets that sell personal information, individuals' privacy is of utmost importance. The classical response is anonymization, i.e., sanitizing the information that can directly or indirectly allow users' re-identification. The most popular solution in the literature is k-anonymity. However, k-anonymity is hard to achieve on a continuous stream of data, and also when the number of dimensions becomes high. In this paper, we propose a novel anonymization property called z-anonymity. Unlike k-anonymity, it can be achieved with zero delay on data streams and is well suited to high-dimensional data. The idea underlying z-anonymity is to release an attribute (an atomic piece of information) about a user only if at least z - 1 other users have presented the same attribute in a past time window. z-anonymity is weaker than k-anonymity since it does not work on combinations of attributes, but treats them individually. We present a probabilistic framework to map z-anonymity onto the k-anonymity property. Our results show that a proper choice of the z-anonymity parameters allows the data curator to obtain a k-anonymized dataset with a precisely measurable probability. We also evaluate a real use case, in which we consider the website visits of a population of users, and show that z-anonymity can work in practice for obtaining k-anonymity too.
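    The release rule stated in this abstract (publish an attribute only if at least z - 1 other users presented it within a past time window) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the class name `ZAnonymizer`, the per-attribute dictionary, and the eviction strategy are assumptions, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict


class ZAnonymizer:
    """Hypothetical zero-delay filter for a stream of (time, user, attribute)."""

    def __init__(self, z, window):
        self.z = z            # minimum number of distinct users required
        self.window = window  # length of the past time window
        # attribute -> {user: time of that user's most recent observation}
        self.last_seen = defaultdict(dict)

    def observe(self, t, user, attribute):
        """Record the observation; return True if it may be released now."""
        seen = self.last_seen[attribute]
        # Evict users whose latest observation fell out of the window.
        for stale in [u for u, ts in seen.items() if t - ts > self.window]:
            del seen[stale]
        seen[user] = t
        # The current user plus at least z - 1 others means z distinct users.
        return len(seen) >= self.z


anon = ZAnonymizer(z=3, window=10)
stream = [(0, "alice", "site.example"), (1, "bob", "site.example"),
          (2, "carol", "site.example"), (20, "dave", "site.example")]
for t, user, attr in stream:
    print(t, user, anon.observe(t, user, attr))
```

    Each decision is made as soon as the observation arrives, which is what makes the scheme zero-delay. Because the counter is kept per attribute, combinations of attributes are never examined, which is why the abstract calls the property weaker than k-anonymity.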

    A Clustering K

    Wearable technology is one of the greatest applications of the Internet of Things. The popularity of wearable devices has generated personal (user-specific) data on a massive scale. Generally, the data holders (manufacturers) of wearable devices are willing to share these data with others to obtain benefits. However, significant privacy concerns arise when the data are shared with third parties improperly. In this paper, we first propose a specific threat model for the process of sharing wearable devices’ data. We then propose a K-anonymity method based on clustering to preserve the privacy of wearable IoT devices’ data while guaranteeing the usability of the collected data. Experimental results demonstrate the effectiveness of the proposed method.

    Looting Hoards of Gold and Poaching Spotted Owls: Data Confidentiality Among Archaeologists & Zoologists

    Researchers in the social and health sciences are used to dealing with confidential data, and repositories in these areas have developed mechanisms to prevent unethical or illegal disclosure of this data. However, other scientific communities also collect data whose disclosure may cause harm to communities, cultures, or the environment. This paper presents results from 62 interviews and observations with archaeologists and zoologists. It focuses on how researchers’ perceptions of potential harm influence attitudes about data confidentiality, and how these, in turn, influence opinions about who should be responsible for managing access to data. This is particularly problematic in archaeology when harm is not to a living individual but is targeted at a community or culture that may or may not have living representatives, and in zoology when an environment or a species may be at risk. We find that while both archaeologists and zoologists view location information as highly important and valuable in facilitating use and reuse of data, they also acknowledge that location should at times be considered confidential information, since it can be used to facilitate the destruction of cultural property through looting or the decimation of endangered species through poaching. While researchers in both disciplines understand the potential dangers of allowing disclosure of this information, they disagree about who should take responsibility for access decisions and conditions.
    The DIPIR Project was made possible by a National Leadership Grant from the Institute for Museum and Library Services, LG-06-10-0140-10, “Dissemination Information Packages for Information Reuse.”
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/115883/1/Frank_etal_ASIST2015_Looting_Hoards_of_Gold_postprint.pdf
    Description of Frank_etal_ASIST2015_Looting_Hoards_of_Gold_postprint.pdf: Conference paper