
    Indexing Metric Spaces for Exact Similarity Search

    With the continued digitalization of societal processes, we are seeing an explosion in available data. This is referred to as big data. In a research setting, three aspects of the data are often viewed as the main sources of challenges when attempting to enable value creation from big data: volume, velocity and variety. Many studies address volume or velocity, while far fewer concern variety. Metric space is ideal for addressing variety because it can accommodate any type of data as long as its associated distance notion satisfies the triangle inequality. To accelerate search in metric space, a collection of indexing techniques for metric data have been proposed. However, existing surveys each offer only narrow coverage, and no comprehensive empirical study of those techniques exists. We offer a survey of all the existing metric indexes that can support exact similarity search, by i) summarizing all the existing partitioning, pruning and validation techniques used for metric indexes, ii) providing the time and storage complexity analysis of index construction, and iii) reporting on a comprehensive empirical comparison of their similarity query processing performance. Here, empirical comparisons are used to evaluate index performance during search, because complexity analysis alone reveals little difference between the indexes in similarity query processing, and query performance depends on pruning and validation abilities that are tied to the data distribution. This article aims at revealing the strengths and weaknesses of different indexing techniques in order to offer guidance on selecting an appropriate indexing technique for a given setting, and at directing future research on metric indexes.
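    The role of the triangle inequality in pruning can be illustrated with a minimal sketch. It is not taken from the surveyed article: the single-pivot table, the function names and the choice of edit distance on strings are all assumptions made for illustration. For a query q, pivot p and object o, the triangle inequality gives |d(q,p) - d(o,p)| <= d(q,o), so any object whose lower bound exceeds the search radius can be discarded without computing d(q,o).

```python
from typing import Callable, List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, a metric on strings (illustrative choice)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def build_pivot_table(objects: Sequence[str], pivot: str,
                      dist: Callable[[str, str], int]) -> List[int]:
    """Precompute d(o, pivot) for every object (hypothetical one-pivot index)."""
    return [dist(o, pivot) for o in objects]


def range_query(objects: Sequence[str], pivot_dists: Sequence[int], pivot: str,
                dist: Callable[[str, str], int], query: str,
                radius: int) -> Tuple[List[str], int]:
    """Return objects within `radius` of `query` and the number of
    distance computations performed at query time."""
    d_qp = dist(query, pivot)
    computed = 1
    matches = []
    for obj, d_op in zip(objects, pivot_dists):
        # Pruning: |d(q,p) - d(o,p)| is a lower bound on d(q,o).
        if abs(d_qp - d_op) > radius:
            continue
        # Validation: the bound was not enough, so compute the real distance.
        computed += 1
        if dist(query, obj) <= radius:
            matches.append(obj)
    return matches, computed


words = ["kitten", "sitting", "mitten", "a", "encyclopedia"]
pivot = "kitten"
table = build_pivot_table(words, pivot, edit_distance)
matches, computed = range_query(words, table, pivot, edit_distance, "bitten", 1)
```

    In this toy run, three of the five candidates are pruned by the lower bound alone. Only the distance function would need to change to index a different data type, which is the sense in which a metric space accommodates variety.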

    SOAP3-dp: Fast, Accurate and Sensitive GPU-based Short Read Aligner

    To tackle the exponentially increasing throughput of Next-Generation Sequencing (NGS), most existing short-read aligners can be configured to favor speed at the expense of accuracy and sensitivity. SOAP3-dp, by leveraging the computational power of both CPU and GPU with optimized algorithms, delivers high speed and sensitivity simultaneously. Compared with widely adopted aligners including BWA, Bowtie2, SeqAlto and GEM, and GPU-based aligners including BarraCUDA and CUSHAW, SOAP3-dp is two to tens of times faster, while maintaining the highest sensitivity and lowest false discovery rate (FDR) on Illumina reads of different lengths. Going beyond its predecessor SOAP3, which does not allow gapped alignment, SOAP3-dp by default tolerates alignment similarity as low as 60 percent. Evaluation on real human genome data demonstrates SOAP3-dp's power to enable more authentic variants and longer Indels to be discovered. Fosmid sequencing shows a 9.1 percent FDR on newly discovered deletions. SOAP3-dp natively supports the BAM file format and uses the same scoring scheme as BWA, which enables it to be integrated into existing analysis pipelines. SOAP3-dp has been deployed on Amazon-EC2, NIH-Biowulf and Tianhe-1A. Comment: 21 pages, 6 figures, submitted to PLoS ONE, additional files available at "https://www.dropbox.com/sh/bhclhxpoiubh371/O5CO_CkXQE". Comments most welcome.

    Genetic Dissection of a QTL Affecting Bone Geometry.

    Parameters of bone geometry such as width, length, and cross-sectional area are major determinants of bone strength. Although these traits are highly heritable, few genes influencing bone geometry have been identified. Here, we dissect a major quantitative trait locus (QTL) influencing femur size. This QTL was originally identified in an F2 cross between the C57BL/6J-hg/hg (HG) and CAST/EiJ strains and was referred to as femur length in high growth mice 2 (Feml2). Feml2 was located on chromosome (Chr.) 9 at ∼20 cM. Here, we show that the HG.CAST-(D9Mit249-D9Mit133)/Ucd congenic strain captures Feml2. In an F2 congenic cross, we fine-mapped the location of Feml2 to an ∼6 Mbp region extending from 57.3 to 63.3 Mbp on Chr. 9. We have identified candidates by mining the complete genome sequence of CAST/EiJ and through allele-specific expression (ASE) analysis of growth plates in C57BL/6J × CAST/EiJ F1 hybrids. Interestingly, we also find that the refined location of Feml2 overlaps a cluster of six independent genome-wide associations for human height. This work provides the foundation for the identification of novel genes affecting bone geometry.

    Computing Real Roots of Real Polynomials ... and now For Real!

    Very recent work introduces an asymptotically fast subdivision algorithm, denoted ANewDsc, for isolating the real roots of a univariate real polynomial. The method combines Descartes' Rule of Signs to test intervals for the existence of roots, Newton iteration to speed up convergence toward clusters of roots, and approximate computation to decrease the required precision. It achieves record bounds on the worst-case complexity for the considered problem, matching the complexity of Pan's method for computing all complex roots and improving upon the complexity of other subdivision methods by several orders of magnitude. In the article at hand, we report on an implementation of ANewDsc on top of the RS root isolator. RS is a highly efficient realization of the classical Descartes method and currently serves as the default real root solver in Maple. We describe crucial design changes within ANewDsc and RS that led to a high-performance implementation without harming the theoretical complexity of the underlying algorithm. With an excerpt of our extensive collection of benchmarks, available online at http://anewdsc.mpi-inf.mpg.de/, we illustrate that the theoretical gain in performance of ANewDsc over other subdivision methods also transfers into practice. These experiments also show that our new implementation outperforms both RS and mature competitors by orders of magnitude on notoriously hard instances with clustered roots. For all other instances, we avoid almost any overhead by integrating additional optimizations and heuristics. Comment: Accepted for presentation at the 41st International Symposium on Symbolic and Algebraic Computation (ISSAC), July 19--22, 2016, Waterloo, Ontario, Canada.
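    As a small illustration of the Descartes' Rule of Signs ingredient mentioned above, the following sketch counts sign variations in a coefficient sequence. It is not the ANewDsc or RS implementation; the function name and the examples are assumptions made for illustration. The rule states that the number of positive real roots, counted with multiplicity, is at most the number of sign variations and differs from it by an even number. Subdivision methods apply the same count to an interval after a change of variable that maps the interval to the positive half-line; that step is omitted here.

```python
from typing import Sequence


def sign_variations(coeffs: Sequence[float]) -> int:
    """Count sign changes in a coefficient list, skipping zero entries.

    `coeffs` lists the polynomial coefficients from the leading term down
    to the constant term (the count is the same in either order).
    """
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6 has three positive roots.
v_cubic = sign_variations([1, -6, 11, -6])

# x^2 + 1 has no real roots, and no sign variations.
v_no_roots = sign_variations([1, 0, 1])

# x^2 - 2x + 2 has no real roots but two variations: the count is only an
# upper bound with matching parity, which is why subdivision is needed.
v_bound_only = sign_variations([1, -2, 2])
```

    A count of 0 certifies that there is no positive root, and a count of 1 certifies exactly one. Larger counts are inconclusive, so a subdivision method splits the interval and tests again.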

    Numerical modelling of metal melt refining process in ladle with rotating impeller and breakwaters

    The paper describes research and development of aluminium melt refining technology in a ladle with rotating impeller and breakwaters, using numerical modelling based on a finite volume/element method. The theoretical aspects of refining technology are outlined. The design of the numerical model is described and discussed. The differences between real process conditions and numerical model limitations are mentioned. Based on the hypothesis and the results of numerical modelling, the most appropriate setting of the numerical model is recommended. The possibilities for monitoring degassing are also explained. The results of numerical modelling make it possible to improve the refining technology of metal melts and to control the final quality under different boundary conditions, such as rotating speed, shape and position of the rotating impeller, breakwaters, and intensity of inert gas blowing through the impeller.