15,181 research outputs found
A reliable order-statistics-based approximate nearest neighbor search algorithm
We propose a new algorithm for fast approximate nearest neighbor search based
on the properties of ordered vectors. Data vectors are classified based on the
index and sign of their largest components, thereby partitioning the space in a
number of cones centered in the origin. The query is itself classified, and the
search starts from the selected cone and proceeds to neighboring ones. Overall,
the proposed algorithm corresponds to locality sensitive hashing in the space
of directions, with hashing based on the order of components. Thanks to the
statistical features emerging through ordering, it deals very well with the
challenging case of unstructured data, and is a valuable building block for
more complex techniques dealing with structured data. Experiments on both
simulated and real-world data prove the proposed algorithm to provide a
state-of-the-art performance
Efficient transfer entropy analysis of non-stationary neural time series
Information theory allows us to investigate information processing in neural
systems in terms of information transfer, storage and modification. Especially
the measure of information transfer, transfer entropy, has seen a dramatic
surge of interest in neuroscience. Estimating transfer entropy from two
processes requires the observation of multiple realizations of these processes
to estimate associated probability density functions. To obtain these
observations, available estimators assume stationarity of processes to allow
pooling of observations over time. This assumption however, is a major obstacle
to the application of these estimators in neuroscience as observed processes
are often non-stationary. As a solution, Gomez-Herrero and colleagues
theoretically showed that the stationarity assumption may be avoided by
estimating transfer entropy from an ensemble of realizations. Such an ensemble
is often readily available in neuroscience experiments in the form of
experimental trials. Thus, in this work we combine the ensemble method with a
recently proposed transfer entropy estimator to make transfer entropy
estimation applicable to non-stationary time series. We present an efficient
implementation of the approach that deals with the increased computational
demand of the ensemble method's practical application. In particular, we use a
massively parallel implementation for a graphics processing unit to handle the
computationally most heavy aspects of the ensemble method. We test the
performance and robustness of our implementation on data from simulated
stochastic processes and demonstrate the method's applicability to
magnetoencephalographic data. While we mainly evaluate the proposed method for
neuroscientific data, we expect it to be applicable in a variety of fields that
are concerned with the analysis of information transfer in complex biological,
social, and artificial systems.Comment: 27 pages, 7 figures, submitted to PLOS ON
Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information
Conditional independence testing is a fundamental problem underlying causal
discovery and a particularly challenging task in the presence of nonlinear and
high-dimensional dependencies. Here a fully non-parametric test for continuous
data based on conditional mutual information combined with a local permutation
scheme is presented. Through a nearest neighbor approach, the test efficiently
adapts also to non-smooth distributions due to strongly nonlinear dependencies.
Numerical experiments demonstrate that the test reliably simulates the null
distribution even for small sample sizes and with high-dimensional conditioning
sets. The test is better calibrated than kernel-based tests utilizing an
analytical approximation of the null distribution, especially for non-smooth
densities, and reaches the same or higher power levels. Combining the local
permutation scheme with the kernel tests leads to better calibration, but
suffers in power. For smaller sample sizes and lower dimensions, the test is
faster than random fourier feature-based kernel tests if the permutation scheme
is (embarrassingly) parallelized, but the runtime increases more sharply with
sample size and dimensionality. Thus, more theoretical research to analytically
approximate the null distribution and speed up the estimation for larger sample
sizes is desirable.Comment: 17 pages, 12 figures, 1 tabl
Approximate Nearest Neighbor Fields in Video
We introduce RIANN (Ring Intersection Approximate Nearest Neighbor search),
an algorithm for matching patches of a video to a set of reference patches in
real-time. For each query, RIANN finds potential matches by intersecting rings
around key points in appearance space. Its search complexity is reversely
correlated to the amount of temporal change, making it a good fit for videos,
where typically most patches change slowly with time. Experiments show that
RIANN is up to two orders of magnitude faster than previous ANN methods, and is
the only solution that operates in real-time. We further demonstrate how RIANN
can be used for real-time video processing and provide examples for a range of
real-time video applications, including colorization, denoising, and several
artistic effects.Comment: A CVPR 2015 oral pape
Enhancing Domain Word Embedding via Latent Semantic Imputation
We present a novel method named Latent Semantic Imputation (LSI) to transfer
external knowledge into semantic space for enhancing word embedding. The method
integrates graph theory to extract the latent manifold structure of the
entities in the affinity space and leverages non-negative least squares with
standard simplex constraints and power iteration method to derive spectral
embeddings. It provides an effective and efficient approach to combining entity
representations defined in different Euclidean spaces. Specifically, our
approach generates and imputes reliable embedding vectors for low-frequency
words in the semantic space and benefits downstream language tasks that depend
on word embedding. We conduct comprehensive experiments on a carefully designed
classification problem and language modeling and demonstrate the superiority of
the enhanced embedding via LSI over several well-known benchmark embeddings. We
also confirm the consistency of the results under different parameter settings
of our method.Comment: ACM SIGKDD 201
- …