Boosting the Accuracy of Differentially-Private Histograms Through Consistency
We show that it is possible to significantly improve the accuracy of a
general class of histogram queries while satisfying differential privacy. Our
approach carefully chooses a set of queries to evaluate, and then exploits
consistency constraints that should hold over the noisy output. In a
post-processing phase, we compute the consistent input most likely to have
produced the noisy output. The final output is differentially-private and
consistent, but in addition, it is often much more accurate. We show, both
theoretically and experimentally, that these techniques can be used for
estimating the degree sequence of a graph very precisely, and for computing a
histogram that can support arbitrary range queries accurately.
Comment: 15 pages, 7 figures, minor revisions to previous version
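The post-processing idea in this abstract can be illustrated for the sorted (degree-sequence style) case. The sketch below is not the authors' code: it assumes the Laplace mechanism for the noisy release and uses a non-decreasing least-squares fit (pool adjacent violators) as the consistency step. The function names, the `sensitivity` default, and the example counts are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def isotonic_nondecreasing(values):
    # Pool Adjacent Violators: least-squares fit subject to
    # y[0] <= y[1] <= ... <= y[n-1].
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def private_sorted_counts(counts, epsilon, rng, sensitivity=1.0):
    # ASSUMPTION: `sensitivity` is the L1 sensitivity of the sorted sequence
    # under the chosen neighbouring-dataset definition; 1.0 is a placeholder.
    scale = sensitivity / epsilon
    noisy = [c + laplace_noise(scale, rng) for c in sorted(counts)]
    # Post-processing only touches the noisy output, so the privacy
    # guarantee of the noisy release is unchanged.
    return isotonic_nondecreasing(noisy)


if __name__ == "__main__":
    rng = random.Random(0)
    print(private_sorted_counts([5, 1, 1, 3, 2, 2, 2, 8], epsilon=0.5, rng=rng))
```

The fit only averages adjacent runs that violate the ordering, so the output is always sorted and keeps the same total as the noisy input.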
Differential Privacy for Edge Weights in Social Networks
Social networks can be analyzed to uncover important social issues, but the analysis itself can disclose private information. Edge weights play an important role in social graphs and are often associated with sensitive information (e.g., the price of a commercial trade). In this paper, we propose the MB-CI (Merging Barrels and Consistency Inference) strategy to protect weighted social graphs. By viewing the edge-weight sequence as an unattributed histogram, differential privacy for edge weights can be implemented on the histogram. Because some edges in a social network have the same weight, we merge the barrels with the same count into one group to reduce the noise required. Moreover, because a simple merging operation may disclose information through the magnitude of the noise itself, we propose k-indistinguishability between groups so that differential privacy is not violated. To keep most of the shortest paths unchanged, we perform consistency inference according to the original order of the sequence as an important post-processing step. Experimental results show that the proposed approach effectively improves the accuracy and utility of the released data.
Using Deep Learning and Google Street View to Estimate the Demographic Makeup of the US
The United States spends more than $1B each year on initiatives such as the
American Community Survey (ACS), a labor-intensive door-to-door study that
measures statistics relating to race, gender, education, occupation,
unemployment, and other demographic factors. Although a comprehensive source of
data, the lag between demographic changes and their appearance in the ACS can
exceed half a decade. As digital imagery becomes ubiquitous and machine vision
techniques improve, automated data analysis may provide a cheaper and faster
alternative. Here, we present a method that determines socioeconomic trends
from 50 million images of street scenes, gathered in 200 American cities by
Google Street View cars. Using deep learning-based computer vision techniques,
we determined the make, model, and year of all motor vehicles encountered in
particular neighborhoods. Data from this census of motor vehicles, which
enumerated 22M automobiles in total (8% of all automobiles in the US), was used
to accurately estimate income, race, education, and voting patterns, with
single-precinct resolution. (The average US precinct contains approximately
1000 people.) The resulting associations are surprisingly simple and powerful.
For instance, if the number of sedans encountered during a 15-minute drive
through a city is higher than the number of pickup trucks, the city is likely
to vote for a Democrat during the next Presidential election (88% chance);
otherwise, it is likely to vote Republican (82%). Our results suggest that
automated systems for monitoring demographic trends may effectively complement
labor-intensive approaches, with the potential to detect trends with fine
spatial resolution, in close to real time.
Comment: 41 pages including supplementary material. Under review at PNAS
Information Theoretic-Based Privacy Protection on Data Publishing and Biometric Authentication
Ph.D. (Doctor of Philosophy)
Analysis of Patient Information: An Empirical Modeling Approach
With rising costs and increasing complexity, many hospitals seek to better understand the details of their operations. Increasingly, these organizations want to predict accurately the resources required to treat their patient load effectively. This research investigates patient length of stay in a hospital neurological unit using an empirical modeling approach. Factors significantly affecting patient length of stay were identified and used to construct a regression model. The predictive model gives hospital decision makers a compact tool for entering what-if scenarios and predicting future patient treatment lengths, thus allowing the hospital to allocate resources properly.
- …