
    HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces

    Nearest neighbor search in large, high-dimensional databases is inherently difficult due to the curse of dimensionality; some flavor of approximation is therefore necessary to solve the problem in practice. In this paper, we propose a novel yet simple indexing scheme, HD-Index, to answer approximate k-nearest neighbor queries over massive high-dimensional databases. HD-Index consists of a set of novel hierarchical structures called RDB-trees, built on the Hilbert keys of database objects. The leaves of the RDB-trees store the distances of database objects to reference objects, thereby allowing efficient pruning using distance filters. In addition to the triangle inequality, we also use the Ptolemaic inequality to produce tighter lower bounds. Experiments on massive (up to billion-scale) high-dimensional (1000+ dimensions) datasets show that HD-Index is effective, efficient, and scalable.

    Comment: PVLDB 11(8):906-919, 2018
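
    The pruning in the RDB-tree leaves hinges on cheap lower bounds computed from precomputed pivot distances. Below is a minimal sketch of the two bounds, assuming a Euclidean (hence Ptolemaic) metric; the function names are illustrative, not the authors' implementation:

    ```python
    def triangle_lower_bound(d_q_p, d_x_p):
        # Triangle inequality: d(q,x) >= |d(q,p) - d(x,p)| for any pivot p.
        return abs(d_q_p - d_x_p)

    def ptolemaic_lower_bound(d_q_p1, d_q_p2, d_x_p1, d_x_p2, d_p1_p2):
        # Ptolemy's inequality (valid in Euclidean spaces):
        # d(q,x) * d(p1,p2) >= |d(q,p1)*d(x,p2) - d(q,p2)*d(x,p1)|.
        return abs(d_q_p1 * d_x_p2 - d_q_p2 * d_x_p1) / d_p1_p2

    def can_prune(q_pivot_dists, x_pivot_dists, pivot_pair_dist, kth_best):
        # Object x is discarded if either bound already exceeds the
        # current k-th nearest distance; no exact distance to x is needed.
        lb_tri = max(abs(dq - dx)
                     for dq, dx in zip(q_pivot_dists, x_pivot_dists))
        lb_pto = ptolemaic_lower_bound(q_pivot_dists[0], q_pivot_dists[1],
                                       x_pivot_dists[0], x_pivot_dists[1],
                                       pivot_pair_dist)
        return max(lb_tri, lb_pto) > kth_best
    ```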

    Multi-Scale Morphological Analysis of SDSS DR5 Survey using the Metric Space Technique

    Following the development and adaptation of the Metric Space Technique (MST), a multi-scale morphological analysis of the Sloan Digital Sky Survey (SDSS) Data Release 5 (DR5) was performed. The technique was adapted to perform a space-scale morphological analysis by filtering the galaxy point distributions with a Gaussian smoothing function, giving quantitative structural information on all size scales between 5 and 250 Mpc. The analysis was performed on a dozen slices of a volume of space containing many newly measured galaxies from the SDSS DR5 survey. Using the MST, the observational data were compared to galaxy samples taken from N-body simulations with current best estimates of the cosmological parameters, and to random catalogs. Using the maximal ranking method among the MST output functions, we also developed a way to quantify the overall similarity between the observed and simulated samples.
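
    In practice, the space-scale filtering amounts to binning the galaxy positions onto a grid and convolving with Gaussians of increasing width before evaluating the MST output functions. A minimal sketch for a 2D slice, with illustrative grid and scale choices rather than the survey's actual pipeline:

    ```python
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def multiscale_density(positions, box_size, grid_n, scales_mpc):
        # Bin galaxy positions (N x 2 array, in Mpc) onto a regular grid.
        density, _ = np.histogramdd(positions, bins=grid_n,
                                    range=[(0, box_size)] * 2)
        cell = box_size / grid_n            # Mpc per grid cell
        # One smoothed map per scale; each map feeds the MST output
        # functions at that scale.
        return {s: gaussian_filter(density, sigma=s / cell)
                for s in scales_mpc}

    # Example: a mock 500 Mpc slice probed at 12 scales from 5 to 250 Mpc.
    rng = np.random.default_rng(0)
    pts = rng.uniform(0, 500, size=(10000, 2))
    maps = multiscale_density(pts, box_size=500, grid_n=256,
                              scales_mpc=np.geomspace(5, 250, 12))
    ```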

    Confocal microscopy of colloidal particles: towards reliable, optimum coordinates

    Over the last decade, the light microscope has become increasingly useful as a quantitative tool for studying colloidal systems. The ability to obtain particle coordinates in bulk samples from micrographs is particularly appealing. In this paper we review and extend methods for optimal image formation of colloidal samples, which is vital for obtaining particle coordinates of the highest accuracy, and for extracting the most reliable coordinates from these images. We discuss in depth the accuracy of the coordinates, which is sensitive to the details of both the colloidal system and the imaging system. Moreover, this accuracy can vary between particles, particularly in dense systems. We introduce a previously unreported error estimate and use it to develop an iterative method for finding particle coordinates. This individual-particle accuracy assessment also allows comparison between particle locations obtained from different experiments. Though aimed primarily at confocal microscopy studies of colloidal systems, the methods outlined here should transfer readily to many other feature extraction problems, especially where features may overlap one another.

    Comment: Accepted by Advances in Colloid and Interface Science
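
    A common form of such an iterative scheme refines each candidate position by recomputing an intensity-weighted centroid in a window centred on the current estimate, with the residual shift doubling as a rough per-particle error. A minimal sketch for a 2D image (not the authors' estimator; the window shape and convergence test are illustrative):

    ```python
    import numpy as np

    def refine_position(img, x0, y0, radius, max_iter=20, tol=0.01):
        # Iteratively recompute the intensity-weighted centroid inside
        # a circular mask centred on the current position estimate.
        Y, X = np.indices(img.shape)
        x, y, shift = float(x0), float(y0), np.inf
        for _ in range(max_iter):
            mask = (X - x) ** 2 + (Y - y) ** 2 <= radius ** 2
            w = img * mask
            total = w.sum()
            if total == 0:          # empty window: give up on this feature
                break
            x_new, y_new = (w * X).sum() / total, (w * Y).sum() / total
            shift = np.hypot(x_new - x, y_new - y)
            x, y = x_new, y_new
            if shift < tol:         # the window has stopped moving
                break
        # The final shift serves as a crude individual-particle accuracy
        # estimate, comparable across different experiments.
        return x, y, shift
    ```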

    Efficient generation and optimization of stochastic template banks by a neighboring cell algorithm

    Placing signal templates (grid points) as efficiently as possible to cover a multi-dimensional parameter space is crucial in computing-intensive matched-filtering searches for gravitational waves, as well as in similar searches in other fields of astronomy. To generate efficient coverings of arbitrary parameter spaces, stochastic template banks have been advocated, where templates are placed at random while rejecting those too close to existing ones. However, in this simple scheme, for each new random point the distance to every template in the existing bank must be computed. This rapidly increasing number of distance computations can render the acceptance of new templates computationally prohibitive, particularly for wide parameter spaces or in large dimensions. This work presents a neighboring cell algorithm that can dramatically improve the efficiency of constructing a stochastic template bank. The parameter space is divided into sub-volumes (cells), and an efficient hashing technique yields, for an arbitrary point, the index of its enclosing cell along with the parameters of the neighboring templates. Hence only distances to these neighboring templates in the bank are computed, massively lowering the overall computing cost, as demonstrated in simple examples. Furthermore, we propose a novel method based on this technique that increases the fraction of covered parameter space solely by directed template shifts, without adding any templates. As demonstrated in examples, this method can be highly effective.

    Comment: Accepted by PRD
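
    The neighboring cell idea is, at heart, a spatial hash: with cubic cells whose side equals the minimal template separation, any conflicting template must lie in the point's own cell or in one of the 3^d adjacent cells. A minimal sketch in flat Euclidean coordinates (the paper works with a general parameter-space metric; names and values here are illustrative):

    ```python
    import numpy as np
    from collections import defaultdict

    def build_stochastic_bank(proposals, min_dist):
        # Spatial hash with cubic cells of side min_dist: a proposal can
        # only be too close to templates in its own or adjacent cells.
        dim = proposals.shape[1]
        offsets = np.array(np.meshgrid(*[[-1, 0, 1]] * dim)).T.reshape(-1, dim)
        cells = defaultdict(list)       # cell index tuple -> templates
        bank = []
        for p in proposals:
            idx = np.floor(p / min_dist).astype(int)
            neighbors = [t for off in offsets
                         for t in cells.get(tuple(idx + off), [])]
            if all(np.linalg.norm(p - t) >= min_dist for t in neighbors):
                bank.append(p)
                cells[tuple(idx)].append(p)
        return np.asarray(bank)

    # Example: cover the unit square with minimal separation 0.05.
    rng = np.random.default_rng(1)
    bank = build_stochastic_bank(rng.random((20000, 2)), min_dist=0.05)
    ```

    Each acceptance test now touches only a handful of neighboring templates instead of scanning the entire bank, which is what keeps large banks tractable.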

    Traction force microscopy with optimized regularization and automated Bayesian parameter selection for comparing cells

    Adherent cells exert traction forces onto their environment, which allows them to migrate, to maintain tissue integrity, and to form complex multicellular structures. This traction can be measured in a perturbation-free manner with traction force microscopy (TFM). In TFM, traction is usually calculated via the solution of a linear system, which is complicated by undersampled input data, acquisition noise, and, for some methods, large condition numbers. Therefore, standard TFM algorithms employ either data filtering or regularization. However, these approaches require a manual selection of filter or regularization parameters and consequently exhibit a substantial degree of subjectivity. This shortcoming is particularly serious when cells in different conditions are to be compared, because the optimal noise suppression needs to be adapted for every situation, which invariably results in systematic errors. Here, we systematically test the performance of new methods from computer vision and Bayesian inference for solving the inverse problem in TFM. We compare two classical schemes, L1- and L2-regularization, with three previously untested schemes, namely Elastic Net regularization, Proximal Gradient Lasso, and Proximal Gradient Elastic Net. Overall, we find that Elastic Net regularization, which combines L1 and L2 regularization, outperforms all other methods with regard to the accuracy of traction reconstruction. Next, we develop two methods, Bayesian L2 regularization and Advanced Bayesian L2 regularization, for automatic, optimal L2 regularization. Using artificial and experimental data, we show that these methods enable robust reconstruction of traction without requiring a difficult selection of regularization parameters for each individual data set. Thus, Bayesian methods can mitigate the considerable uncertainty inherent in comparing cellular traction forces.
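
    The underlying inverse problem has the generic linear form d = G t + noise, where d holds the measured substrate displacements, t the unknown tractions, and G the discretized elastic Green's function. A minimal sketch of Elastic Net reconstruction on a stand-in system using scikit-learn (not the paper's pipeline; the random G merely stands in for a Boussinesq-type kernel):

    ```python
    import numpy as np
    from sklearn.linear_model import ElasticNet

    # Stand-in forward operator and synthetic data; in TFM, G would be
    # the discretized Green's function mapping tractions to displacements.
    rng = np.random.default_rng(2)
    G = rng.normal(size=(400, 200))
    t_true = np.zeros(200)
    t_true[::20] = 1.0                            # sparse traction pattern
    d = G @ t_true + 0.05 * rng.normal(size=400)  # noisy displacements

    # Elastic Net objective: ||d - G t||^2 / (2 n)
    #   + alpha * (l1_ratio * ||t||_1 + 0.5 * (1 - l1_ratio) * ||t||_2^2)
    model = ElasticNet(alpha=0.01, l1_ratio=0.5, fit_intercept=False,
                       max_iter=50000)
    model.fit(G, d)
    t_hat = model.coef_                           # reconstructed traction
    ```

    Here l1_ratio interpolates between pure L2 (ridge) at 0 and pure L1 (lasso) at 1; the Bayesian schemes described in the abstract aim to choose such regularization strengths automatically rather than by hand.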