Indexing the Earth Mover's Distance Using Normal Distributions
Querying uncertain data sets (represented as probability distributions)
presents many challenges due to the large amount of data involved and the
difficulty of comparing uncertainty between distributions. The Earth Mover's
Distance (EMD) has increasingly been employed to compare uncertain data due to
its ability to effectively capture the differences between two distributions.
Computing the EMD entails finding a solution to the transportation problem,
which is computationally intensive. In this paper, we propose a new lower bound
to the EMD and an index structure that significantly improve the performance of
EMD-based K-nearest neighbor (K-NN) queries on uncertain databases. The lower
bound approximates the EMD on a projection vector.
Each distribution is projected onto a vector and approximated by a normal
distribution together with an accompanying error term. We then represent each
normal as a point in a Hough transformed space and use the concept of
stochastic dominance to implement an efficient index structure in the
transformed space. We show that our method significantly decreases K-NN query
time on uncertain databases. The index structure also scales well with database
cardinality. It is well suited for heterogeneous data sets, helping to keep
EMD-based queries tractable as uncertain data sets become larger and more complex.
Comment: VLDB201
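The projection idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's bound: the normal approximation, error term, Hough transform, and index are all omitted. It relies only on the standard fact that, under a Euclidean ground distance, the one-dimensional EMD between two distributions projected onto a unit vector never exceeds their EMD in the original space. The function names (`project`, `emd_1d`, `projection_lower_bound`) and the weighted-point representation of a distribution are assumptions made for illustration.

```python
import math

def project(points, direction):
    """Project each point onto the unit vector along `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    return [sum(p_i * u_i for p_i, u_i in zip(p, unit)) for p in points]

def emd_1d(xs, wx, ys, wy):
    """Exact 1-D EMD between two weighted point sets of equal total mass.

    In one dimension the EMD equals the area between the two
    cumulative distribution functions.
    """
    # Positive mass for the first distribution, negative for the second.
    events = sorted([(x, w) for x, w in zip(xs, wx)] +
                    [(y, -w) for y, w in zip(ys, wy)])
    total = 0.0
    cdf_diff = 0.0
    prev = events[0][0]
    for pos, mass in events:
        total += abs(cdf_diff) * (pos - prev)
        cdf_diff += mass
        prev = pos
    return total

def projection_lower_bound(points_p, w_p, points_q, w_q, direction):
    """Lower bound on the EMD (Euclidean ground distance) via one projection."""
    return emd_1d(project(points_p, direction), w_p,
                  project(points_q, direction), w_q)

# Two unit point masses whose true EMD is the Euclidean distance 5.
bound_x = projection_lower_bound([(0, 0)], [1.0], [(3, 4)], [1.0], (1, 0))
bound_best = projection_lower_bound([(0, 0)], [1.0], [(3, 4)], [1.0], (3, 4))
```

In this toy case the bound along the x-axis is 3 and the bound along the direction joining the two points reaches the true value 5, which shows why the choice of projection vector matters for how tight such a bound is.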
Answering Top-k Queries Over a Mixture of Attractive and Repulsive Dimensions
In this paper, we formulate a top-k query that compares objects in a database
to a user-provided query object on a novel scoring function. The proposed
scoring function combines the idea of attractive and repulsive dimensions into
a general framework to overcome the weakness of traditional distance or
similarity measures. We study the properties of the proposed class of scoring
functions and develop efficient and scalable index structures that index the
isolines of the function. We demonstrate various scenarios where the query
finds application. Empirical evaluation demonstrates a performance gain of one
to two orders of magnitude on querying time over existing state-of-the-art
top-k techniques. Further, a qualitative analysis is performed on a real
dataset to highlight the potential of the proposed query in discovering hidden
data characteristics.
Comment: VLDB201
Mind Reader: Reconstructing complex images from brain activities
Understanding how the brain encodes external stimuli and how these stimuli
can be decoded from the measured brain activities are long-standing and
challenging questions in neuroscience. In this paper, we focus on
reconstructing the complex image stimuli from fMRI (functional magnetic
resonance imaging) signals. Unlike previous works that reconstruct images with
single objects or simple shapes, our work aims to reconstruct image stimuli
that are rich in semantics, closer to everyday scenes, and can reveal more
perspectives. However, the scarcity of fMRI data is the main obstacle to
applying state-of-the-art deep learning models to this problem. We find that
incorporating an additional text modality is beneficial for the reconstruction
problem compared to directly translating brain signals to images. Therefore,
the modalities involved in our method are: (i) voxel-level fMRI signals, (ii)
observed images that trigger the brain signals, and (iii) textual descriptions
of the images. To further address data scarcity, we leverage an aligned
vision-language latent space pre-trained on massive datasets. Instead of
training models from scratch to find a latent space shared by the three
modalities, we encode fMRI signals into this pre-aligned latent space. Then,
conditioned on embeddings in this space, we reconstruct images with a
generative model. The reconstructed images from our pipeline balance
naturalness and fidelity: they are photo-realistic and capture the ground truth
image contents well
- …