HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces
Nearest neighbor searching of large databases in high-dimensional spaces is
inherently difficult due to the curse of dimensionality. A flavor of
approximation is, therefore, necessary to practically solve the problem of
nearest neighbor search. In this paper, we propose a novel yet simple indexing
scheme, HD-Index, to solve the problem of approximate k-nearest neighbor
queries in massive high-dimensional databases. HD-Index consists of a set of
novel hierarchical structures called RDB-trees built on Hilbert keys of
database objects. The leaves of the RDB-trees store distances of database
objects to reference objects, thereby allowing efficient pruning using distance
filters. In addition to triangular inequality, we also use Ptolemaic inequality
to produce better lower bounds. Experiments on massive (up to billion scale)
high-dimensional (up to 1000+ dimensions) datasets show that HD-Index is
effective, efficient, and scalable.
Comment: PVLDB 11(8):906-919, 201
Multi-Scale Morphological Analysis of SDSS DR5 Survey using the Metric Space Technique
Following novel development and adaptation of the Metric Space Technique
(MST), a multi-scale morphological analysis of the Sloan Digital Sky Survey
(SDSS) Data Release 5 (DR5) was performed. The technique was adapted to perform
a space-scale morphological analysis by filtering the galaxy point
distributions with a smoothing Gaussian function, thus giving quantitative
structural information on all size scales between 5 and 250 Mpc. The analysis
was performed on a dozen slices of a volume of space containing many newly
measured galaxies from the SDSS DR5 survey. Using the MST, observational data
were compared to galaxy samples taken from N-body simulations with current best
estimates of cosmological parameters and from random catalogs. By using the
maximal ranking method among MST output functions we also develop a way to
quantify the overall similarity of the observed samples with the simulated
samples.
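The scale filtering described above can be sketched as follows. This is a minimal one-dimensional illustration, not the MST pipeline itself: galaxy positions are smoothed with a Gaussian of width sigma, producing one density field per scale on which morphological statistics could then be evaluated. All names here are illustrative.

```python
import math

def smoothed_density(positions, grid, sigma):
    # Gaussian-smoothed density field of a point distribution:
    #   rho_sigma(x) = sum_i N(x - x_i; sigma), evaluated on `grid`.
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - p) / sigma) ** 2)
                       for p in positions)
            for x in grid]

def multiscale(positions, grid, sigmas):
    # One smoothed field per smoothing scale; in an MST-style analysis the
    # morphological output functions would be computed on each field.
    return {s: smoothed_density(positions, grid, s) for s in sigmas}
```

Because each field integrates to the number of points, increasing sigma spreads the same mass over a wider region, revealing structure on progressively larger scales.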
Confocal microscopy of colloidal particles: towards reliable, optimum coordinates
Over the last decade, the light microscope has become increasingly useful as
a quantitative tool for studying colloidal systems. The ability to obtain
particle coordinates in bulk samples from micrographs is particularly
appealing. In this paper we review and extend methods for optimal image
formation of colloidal samples, which is vital for particle coordinates of the
highest accuracy, and for extracting the most reliable coordinates from these
images. We discuss in depth the accuracy of the coordinates, which is sensitive
to the details of the colloidal system and the imaging system. Moreover, this
accuracy can vary between particles, particularly in dense systems. We
introduce a previously unreported error estimate and use it to develop an
iterative method for finding particle coordinates. This individual-particle
accuracy assessment also allows comparison between particle locations obtained
from different experiments. Though aimed primarily at confocal microscopy
studies of colloidal systems, the methods outlined here should transfer readily
to many other feature extraction problems, especially where features may
overlap one another.
Comment: Accepted by Advances in Colloid and Interface Science
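An iterative coordinate-refinement step of the general kind described above can be sketched like this. This is not the paper's algorithm or its error estimate, just a common baseline: the particle position is repeatedly re-centred on the intensity-weighted centroid of a local window until the update falls below a tolerance. Function and parameter names are hypothetical.

```python
def refine_centre(image, x0, y0, radius, max_iter=20, tol=0.05):
    # Iteratively re-centre a particle estimate on the intensity-weighted
    # centroid of a window of half-width `radius` pixels; stop once the
    # update is below `tol` pixels. `image[y][x]` is a 2D intensity array.
    x, y = float(x0), float(y0)
    for _ in range(max_iter):
        xi, yi = int(round(x)), int(round(y))
        m = mx = my = 0.0
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                yy, xx = yi + dy, xi + dx
                if 0 <= yy < len(image) and 0 <= xx < len(image[0]):
                    w = image[yy][xx]
                    m += w
                    mx += w * (xi + dx)
                    my += w * (yi + dy)
        if m == 0:
            break  # empty window: nothing to centre on
        nx, ny = mx / m, my / m
        shift = ((nx - x) ** 2 + (ny - y) ** 2) ** 0.5
        x, y = nx, ny
        if shift < tol:
            break
    return x, y
```

Iterating matters because the first window is centred on the rough estimate, which biases the centroid; re-centring the window on each new estimate removes most of that bias, in the spirit of the iterative method the abstract describes.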
Efficient generation and optimization of stochastic template banks by a neighboring cell algorithm
Placing signal templates (grid points) as efficiently as possible to cover a
multi-dimensional parameter space is crucial in computing-intensive
matched-filtering searches for gravitational waves, but also in similar
searches in other fields of astronomy. To generate efficient coverings of
arbitrary parameter spaces, stochastic template banks have been advocated,
where templates are placed at random while rejecting those too close to others.
However, in this simple scheme, for each new random point its distance to every
template in the existing bank is computed. This rapidly increasing number of
distance computations can render the acceptance of new templates
computationally prohibitive, particularly for wide parameter spaces or in large
dimensions. This work presents a neighboring cell algorithm that can
dramatically improve the efficiency of constructing a stochastic template bank.
By dividing the parameter space into sub-volumes (cells), for an arbitrary
point an efficient hashing technique is exploited to obtain the index of its
enclosing cell along with the parameters of its neighboring templates. Hence
only distances to these neighboring templates in the bank are computed,
massively lowering the overall computing cost, as demonstrated in simple
examples. Furthermore, we propose a novel method based on this technique to
increase the fraction of covered parameter space solely by directed template
shifts, without adding any templates. As is demonstrated in examples, this
method can be highly effective.
Comment: PRD accepted
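The cell-hashing idea above can be sketched in a few lines. This is a simplified 2D toy with Euclidean distances in the unit square (the real application uses a metric on a multi-dimensional parameter space), and all names are illustrative: each accepted template is stored under the integer index of its enclosing cell, so a new proposal is only compared against templates in its own cell and the adjacent ones.

```python
import math
import random

def cell_of(point, cell_size):
    # Hash a point to the integer index tuple of its enclosing cell.
    return tuple(int(math.floor(c / cell_size)) for c in point)

def neighbours(idx):
    # The cell itself plus its adjacent cells (3^d cells; d = 2 here).
    i, j = idx
    return [(i + di, j + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]

def stochastic_bank(n_proposals, min_dist, cell_size, rng):
    # Place random templates in the unit square, rejecting any proposal
    # closer than `min_dist` to an already-accepted template. Distances
    # are computed only against templates in the neighbouring cells found
    # via the hash map, not against the whole bank.
    assert cell_size >= min_dist  # neighbour cells then cover the radius
    cells = {}  # cell index -> templates placed in that cell
    bank = []
    for _ in range(n_proposals):
        p = (rng.random(), rng.random())
        idx = cell_of(p, cell_size)
        too_close = any(
            math.dist(p, q) < min_dist
            for nb in neighbours(idx)
            for q in cells.get(nb, ())
        )
        if not too_close:
            bank.append(p)
            cells.setdefault(idx, []).append(p)
    return bank
```

With `cell_size >= min_dist`, every template within the rejection radius of a proposal is guaranteed to lie in one of the 3^d neighbouring cells, so the number of distance computations per proposal stays roughly constant instead of growing with the bank size.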
Traction force microscopy with optimized regularization and automated Bayesian parameter selection for comparing cells
Adherent cells exert traction forces on to their environment, which allows
them to migrate, to maintain tissue integrity, and to form complex
multicellular structures. This traction can be measured in a perturbation-free
manner with traction force microscopy (TFM). In TFM, traction is usually
calculated via the solution of a linear system, which is complicated by
undersampled input data, acquisition noise, and large condition numbers for
some methods. Therefore, standard TFM algorithms either employ data filtering
or regularization. However, these approaches require a manual selection of
filter or regularization parameters and consequently exhibit a substantial
degree of subjectivity. This shortcoming is particularly serious when cells
in different conditions are to be compared because optimal noise suppression
needs to be adapted for every situation, which invariably results in systematic
errors. Here, we systematically test the performance of new methods from
computer vision and Bayesian inference for solving the inverse problem in TFM.
We compare two classical schemes, L1- and L2-regularization, with three
previously untested schemes, namely Elastic Net regularization, Proximal
Gradient Lasso, and Proximal Gradient Elastic Net. Overall, we find that
Elastic Net regularization, which combines L1 and L2 regularization,
outperforms all other methods with regard to accuracy of traction
reconstruction. Next, we develop two methods, Bayesian L2 regularization and
Advanced Bayesian L2 regularization, for automatic, optimal L2 regularization.
Using artificial data and experimental data, we show that these methods enable
robust reconstruction of traction without requiring a difficult selection of
regularization parameters specifically for each data set. Thus, Bayesian
methods can mitigate the considerable uncertainty inherent in comparing
cellular traction forces.
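The elastic-net-regularised inverse problem discussed above can be sketched with a generic proximal-gradient (ISTA) solver. This is not the paper's implementation or its Bayesian parameter selection; it only shows the objective the abstract names, min over x of ½‖Gx − u‖² + λ₁‖x‖₁ + (λ₂/2)‖x‖², where G would be the elastic Green's-function matrix and u the measured displacements. Names and the step-size choice are assumptions.

```python
def matvec(A, x):
    # Matrix-vector product for a matrix given as a list of rows.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def prox_elastic_net(v, step, lam1, lam2):
    # Proximal operator of lam1*||x||_1 + (lam2/2)*||x||^2:
    # soft-thresholding followed by a uniform shrinkage.
    out = []
    for vi in v:
        s = max(abs(vi) - step * lam1, 0.0) * (1.0 if vi >= 0 else -1.0)
        out.append(s / (1.0 + step * lam2))
    return out

def elastic_net_ista(G, u, lam1, lam2, step, n_iter=500):
    # Proximal-gradient (ISTA) iteration for the elastic-net-regularised
    # linear inverse problem:
    #   min_x 0.5*||G x - u||^2 + lam1*||x||_1 + (lam2/2)*||x||^2
    Gt = transpose(G)
    x = [0.0] * len(G[0])
    for _ in range(n_iter):
        r = [gi - ui for gi, ui in zip(matvec(G, x), u)]  # residual Gx - u
        grad = matvec(Gt, r)                              # gradient G^T r
        v = [xi - step * gi for xi, gi in zip(x, grad)]   # gradient step
        x = prox_elastic_net(v, step, lam1, lam2)         # proximal step
    return x
```

The L1 term sets small, noise-dominated traction components exactly to zero, while the L2 term keeps the solution stable for ill-conditioned G; combining the two is what the abstract credits for Elastic Net's superior reconstruction accuracy.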