Exploring Privacy Preservation in Outsourced K-Nearest Neighbors with Multiple Data Owners
The k-nearest neighbors (k-NN) algorithm is a popular and effective
classification algorithm. Due to its large storage and computational
requirements, it is suitable for cloud outsourcing. However, k-NN is often run
on sensitive data such as medical records, user images, or personal
information. It is important to protect the privacy of data in an outsourced
k-NN system.
Prior works have all assumed the data owners (who submit data to the
outsourced k-NN system) are a single trusted party. However, we observe that in
many practical scenarios, there may be multiple mutually distrusting data
owners. In this work, we present the first framing and exploration of privacy
preservation in an outsourced k-NN system with multiple data owners. We
consider the various threat models introduced by this modification. We discover
that under a particularly practical threat model that covers numerous
scenarios, there exists a set of adaptive attacks that breach the data privacy
of any exact k-NN system. The vulnerability is a result of the mathematical
properties of k-NN and its output. Thus, we propose a privacy-preserving
alternative system supporting kernel density estimation using a Gaussian
kernel, a classification algorithm from the same family as k-NN. In many
applications, this similar algorithm serves as a good substitute for k-NN. We
additionally investigate solutions for other threat models, often through
extensions of prior single-data-owner systems.
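To make the contrast between exact k-NN and Gaussian kernel density classification concrete, here is a minimal plaintext sketch. It is not the privacy-preserving system the abstract describes; the function names, the `bandwidth` parameter, and the toy data are assumptions chosen purely for illustration.

```python
import math
from collections import Counter


def knn_classify(train, query, k):
    """Exact k-NN: majority vote among the k training points closest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def kde_classify(train, query, bandwidth=1.0):
    """Gaussian-kernel classification: every training point contributes a
    smooth, distance-weighted vote to its own class (hypothetical helper)."""
    scores = {}
    for point, label in train:
        sq_dist = sum((a - b) ** 2 for a, b in zip(point, query))
        weight = math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)


# Toy data (assumed for illustration): two small clusters with labels "a" and "b".
train = [
    ((0.0, 0.0), "a"),
    ((0.1, 0.2), "a"),
    ((1.0, 1.0), "b"),
    ((0.9, 1.1), "b"),
]
```

In k-NN the output depends on a hard cutoff at the k-th neighbor, whereas the kernel version aggregates contributions from all points, which is one intuition for why the two can behave differently under adaptive queries.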
Large Scale Visual Recommendations From Street Fashion Images
We describe a completely automated large scale visual recommendation system
for fashion. Our focus is to efficiently harness the availability of large
quantities of online fashion images and their rich meta-data. Specifically, we
propose four data driven models in the form of Complementary Nearest Neighbor
Consensus, Gaussian Mixture Models, Texture Agnostic Retrieval and Markov Chain
LDA for solving this problem. We analyze relative merits and pitfalls of these
algorithms through extensive experimentation on a large-scale data set and
baseline them against existing ideas from color science. We also illustrate key
fashion insights learned through these experiments and show how they can be
employed to design better recommendation systems. Finally, we also outline a
large-scale annotated data set of fashion images (Fashion-136K) that can be
exploited for future vision research.
Reverse Nearest Neighbor Heat Maps: A Tool for Influence Exploration
We study the problem of constructing a reverse nearest neighbor (RNN) heat
map by finding the RNN set of every point in a two-dimensional space. Based on
the RNN set of a point, we obtain a quantitative influence (i.e., heat) for the
point. The heat map provides a global view on the influence distribution in the
space, and hence supports exploratory analyses in many applications such as
marketing and resource management. To construct such a heat map, we first
reduce it to a problem called Region Coloring (RC), which divides the space
into disjoint regions within which all the points have the same RNN set. We
then propose a novel algorithm named CREST that efficiently solves the RC
problem by labeling each region with the heat value of its containing points.
In CREST, we propose innovative techniques to avoid processing expensive RNN
queries and greatly reduce the number of region labeling operations. We perform
detailed analyses on the complexity of CREST and lower bounds of the RC
problem, and prove that CREST is asymptotically optimal in the worst case.
Extensive experiments with both real and synthetic data sets demonstrate that
CREST outperforms alternative algorithms by several orders of magnitude.
Comment: Accepted to appear in ICDE 201
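As a rough illustration of what an RNN heat map computes, the following brute-force sketch evaluates the heat at sampled grid points. It is not CREST and makes no attempt at its efficiency. It assumes a bichromatic setting with hypothetical `clients` and `facilities` point sets, and it takes the heat of a point to be the size of its RNN set; the abstract does not fix either choice.

```python
import math


def rnn_set(q, clients, facilities):
    """Indices of clients that would be strictly closer to q than to any
    existing facility, i.e. clients for which q would become the nearest neighbor."""
    return frozenset(
        i
        for i, c in enumerate(clients)
        if math.dist(c, q) < min(math.dist(c, f) for f in facilities)
    )


def heat_map(clients, facilities, xs, ys):
    """Brute-force heat map: one RNN query per grid point (assumed heat = |RNN set|)."""
    return [[len(rnn_set((x, y), clients, facilities)) for x in xs] for y in ys]


# Toy data (assumed for illustration).
clients = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
facilities = [(1.0, 0.0)]
grid = heat_map(clients, facilities, xs=[2.0, 3.0], ys=[0.0])
```

Neighboring grid points often share the same RNN set, which is the redundancy that the Region Coloring formulation exploits by labeling whole regions at once.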
Big Universe, Big Data: Machine Learning and Image Analysis for Astronomy
Astrophysics and cosmology are rich with data. The advent of wide-area
digital cameras on large aperture telescopes has led to ever more ambitious
surveys of the sky. Data volumes of entire surveys a decade ago can now be
acquired in a single night and real-time analysis is often desired. Thus,
modern astronomy requires big data know-how; in particular, it demands highly
efficient machine learning and image analysis algorithms. But scalability is
not the only challenge: Astronomy applications touch several current machine
learning research questions, such as learning from biased data and dealing with
label and measurement noise. We argue that this makes astronomy a great domain
for computer science research, as it pushes the boundaries of data analysis. In
the following, we will present this exciting application area for data
scientists. We will focus on exemplary results, discuss main challenges, and
highlight some recent methodological advancements in machine learning and image
analysis triggered by astronomical applications.