    Ethical Considerations in Web 2.0 Archives

    In April 2010, the Internet company Twitter announced that it had granted its entire archive of “Tweets” to the Library of Congress. These Tweets are typically generated by public users, who may not understand or expect that their submissions will be archived by a government agency. Archives of Web 2.0 material raise new ethical considerations for archivists, who must balance the interest in preserving material against the privacy interests of the users who generated the content. Archivists can address these concerns by requiring corporate donors to fully disclose the nature of the archive to users and by allowing users to opt out of the archive. Archivists can also restrict access to the archive for a reasonable period of time.

    Multimodal Network Alignment

    A multimodal network encodes relationships between the same set of nodes in multiple settings, and network alignment is a powerful tool for transferring information and insight between a pair of networks. We propose a method for multimodal network alignment that computes a matrix indicating the alignment, but produces the result directly as a low-rank factorization. We then propose new methods to compute approximate maximum weight matchings of low-rank matrices to produce an alignment. We evaluate our approach on synthetic networks and use it to de-anonymize a multimodal transportation network.
    Comment: 14 pages, 6 figures, SIAM Data Mining 201
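The abstract above describes matching nodes via a low-rank alignment matrix without materializing it. As a hedged illustration (not the paper's algorithm), a simple greedy approximate maximum-weight matching can score one row of X = U Vᵀ at a time, so only O(n · rank) work is done per node; the function name and greedy heuristic here are illustrative assumptions:

```python
import numpy as np

def greedy_low_rank_matching(U, V):
    """Greedy approximate maximum-weight matching on the similarity
    matrix X = U @ V.T, scored one row at a time so the full n x n
    matrix is never materialized. Illustrative sketch only."""
    n = U.shape[0]
    matched_cols = set()
    match = {}
    for i in range(n):
        scores = U[i] @ V.T          # one row of X, cost O(n * rank)
        order = np.argsort(-scores)  # try the best candidates first
        for j in order:
            j = int(j)
            if j not in matched_cols:  # each column matched at most once
                match[i] = j
                matched_cols.add(j)
                break
    return match
```

A greedy pass like this gives only an approximation; the paper proposes its own methods for this matching step, which this sketch does not reproduce.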

    Anonymizing datasets with demographics and diagnosis codes in the presence of utility constraints

    Publishing data about patients that contain both demographics and diagnosis codes is essential to perform large-scale, low-cost medical studies. However, preserving the privacy and utility of such data is challenging, because it requires: (i) guarding against identity disclosure (re-identification) attacks based on both demographics and diagnosis codes, (ii) ensuring that the anonymized data remain useful in intended analysis tasks, and (iii) minimizing the information loss, incurred by anonymization, to preserve the utility of general analysis tasks that are difficult to determine before data publishing. Existing anonymization approaches are not suitable for this setting, because they cannot satisfy all three requirements. Therefore, in this work, we propose a new approach to deal with this problem. We enforce requirement (i) by applying (k, k^m)-anonymity, a privacy principle that prevents re-identification by attackers who know the demographics of a patient and up to m of their diagnosis codes, where k and m are tunable parameters. To capture requirement (ii), we propose the concept of utility constraints for both demographics and diagnosis codes. Utility constraints limit the amount of generalization and are specified by data owners (e.g., the healthcare institution that performs anonymization). We also capture requirement (iii) by employing well-established information loss measures for demographics and for diagnosis codes. To realize our approach, we develop an algorithm that enforces (k, k^m)-anonymity on a dataset containing both demographics and diagnosis codes, in a way that satisfies the specified utility constraints and with minimal information loss, according to the measures. Our experiments with a large dataset containing more than 200,000 electronic health records show the effectiveness and efficiency of our algorithm.
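To make the (k, k^m)-anonymity guarantee concrete: every attacker view, i.e. a record's demographics plus any subset of up to m of its diagnosis codes, must match at least k records. The following checker is a minimal sketch under an assumed data representation (a list of demographics-tuple / diagnosis-code-set pairs); it verifies the property rather than enforcing it, which is what the paper's algorithm does:

```python
from collections import Counter
from itertools import combinations

def satisfies_k_km_anonymity(records, k, m):
    """Check (k, k^m)-anonymity: each record's demographics combined
    with any subset of up to m of its own diagnosis codes must be
    shared by at least k records. `records` is a list of
    (demographics_tuple, diagnosis_code_set) pairs (assumed layout)."""
    counts = Counter()
    for demo, codes in records:
        # size 0 covers an attacker who knows only the demographics
        for size in range(0, m + 1):
            for combo in combinations(sorted(codes), size):
                counts[(demo, combo)] += 1
    # every observed attacker view must match at least k records
    return all(c >= k for c in counts.values())
```

Note the exponential number of code subsets for large m, which is why practical algorithms generalize codes rather than enumerate views naively; this brute-force check is only for illustration.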