7,462 research outputs found
An Automated Social Graph De-anonymization Technique
We present a generic and automated approach to re-identifying nodes in
anonymized social networks which enables novel anonymization techniques to be
quickly evaluated. It uses machine learning (decision forests) to matching
pairs of nodes in disparate anonymized sub-graphs. The technique uncovers
artefacts and invariants of any black-box anonymization scheme from a small set
of examples. Despite a high degree of automation, classification succeeds with
significant true positive rates even when small false positive rates are
sought. Our evaluation uses publicly available real world datasets to study the
performance of our approach against real-world anonymization strategies, namely
the schemes used to protect datasets of The Data for Development (D4D)
Challenge. We show that the technique is effective even when only small numbers
of samples are used for training. Further, since it detects weaknesses in the
black-box anonymization scheme it can re-identify nodes in one social network
when trained on another.Comment: 12 page
Defending against Sybil Devices in Crowdsourced Mapping Services
Real-time crowdsourced maps such as Waze provide timely updates on traffic,
congestion, accidents and points of interest. In this paper, we demonstrate how
lack of strong location authentication allows creation of software-based {\em
Sybil devices} that expose crowdsourced map systems to a variety of security
and privacy attacks. Our experiments show that a single Sybil device with
limited resources can cause havoc on Waze, reporting false congestion and
accidents and automatically rerouting user traffic. More importantly, we
describe techniques to generate Sybil devices at scale, creating armies of
virtual vehicles capable of remotely tracking precise movements for large user
populations while avoiding detection. We propose a new approach to defend
against Sybil devices based on {\em co-location edges}, authenticated records
that attest to the one-time physical co-location of a pair of devices. Over
time, co-location edges combine to form large {\em proximity graphs} that
attest to physical interactions between devices, allowing scalable detection of
virtual vehicles. We demonstrate the efficacy of this approach using
large-scale simulations, and discuss how they can be used to dramatically
reduce the impact of attacks against crowdsourced mapping services.Comment: Measure and integratio
Implementing a Portable Clinical NLP System with a Common Data Model - a Lisp Perspective
This paper presents a Lisp architecture for a portable NLP system, termed
LAPNLP, for processing clinical notes. LAPNLP integrates multiple standard,
customized and in-house developed NLP tools. Our system facilitates portability
across different institutions and data systems by incorporating an enriched
Common Data Model (CDM) to standardize necessary data elements. It utilizes
UMLS to perform domain adaptation when integrating generic domain NLP tools. It
also features stand-off annotations that are specified by positional reference
to the original document. We built an interval tree based search engine to
efficiently query and retrieve the stand-off annotations by specifying
positional requirements. We also developed a utility to convert an inline
annotation format to stand-off annotations to enable the reuse of clinical text
datasets with inline annotations. We experimented with our system on several
NLP facilitated tasks including computational phenotyping for lymphoma patients
and semantic relation extraction for clinical notes. These experiments
showcased the broader applicability and utility of LAPNLP.Comment: 6 pages, accepted by IEEE BIBM 2018 as regular pape
- …