Indexing Metric Spaces for Exact Similarity Search

Author: Chen Lu
Gao Yunjun
Jensen Christian S.
Li Zheng
Miao Xiaoye
Song Xuan
Zhu Yifan
Publication date: 07/05/2020

With the continued digitalization of societal processes, we are seeing an explosion in available data. This is referred to as big data. In a research setting, three aspects of the data are often viewed as the main sources of challenges when attempting to enable value creation from big data: volume, velocity, and variety. Many studies address volume or velocity, while far fewer concern variety. Metric space is ideal for addressing variety because it can accommodate any type of data as long as its associated distance notion satisfies the triangle inequality. To accelerate search in metric space, a collection of indexing techniques for metric data has been proposed. However, existing surveys each offer only narrow coverage, and no comprehensive empirical study of those techniques exists. We offer a survey of all the existing metric indexes that can support exact similarity search, by i) summarizing all the existing partitioning, pruning, and validation techniques used for metric indexes, ii) providing time and storage complexity analyses of index construction, and iii) reporting on a comprehensive empirical comparison of their similarity query processing performance. Empirical comparisons are used to evaluate index performance during search because differences in similarity query processing are hard to see from complexity analysis alone, and query performance depends on pruning and validation abilities, which in turn depend on the data distribution. This article aims to reveal the strengths and weaknesses of different indexing techniques in order to offer guidance on selecting an appropriate indexing technique for a given setting and to direct future research on metric indexes.
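The role the triangle inequality plays in pruning can be illustrated with a minimal sketch. It assumes a single pivot with precomputed object-to-pivot distances; the names `build_pivot_table` and `range_query` are hypothetical and are not taken from the survey, which covers far more elaborate partitioning, pruning, and validation schemes.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Dist = Callable[[T, T], float]


def build_pivot_table(data: Sequence[T], pivot: T, dist: Dist) -> List[float]:
    """Precompute d(o, pivot) for every object o (the 'index')."""
    return [dist(obj, pivot) for obj in data]


def range_query(
    data: Sequence[T],
    pivot: T,
    pivot_dists: Sequence[float],
    dist: Dist,
    query: T,
    radius: float,
) -> Tuple[List[T], int]:
    """Return all objects within `radius` of `query`, plus the number
    of distance computations performed at query time."""
    d_qp = dist(query, pivot)
    computed = 1  # the query-to-pivot distance
    results: List[T] = []
    for obj, d_op in zip(data, pivot_dists):
        # Triangle inequality: |d(q, p) - d(o, p)| <= d(q, o).
        # If this lower bound already exceeds the radius, the object
        # cannot be a result, so its distance is never computed.
        if abs(d_qp - d_op) > radius:
            continue
        computed += 1
        if dist(query, obj) <= radius:
            results.append(obj)
    return results, computed
```

With one-dimensional points `[0.0, 1.0, 2.0, 10.0, 11.0]`, absolute difference as the distance, pivot `0.0`, query `1.5`, and radius `1.0`, the sketch returns the same answer as a linear scan while computing three distances instead of five. The result is exact because pruning only discards objects whose lower bound rules them out.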

arXiv.org e-Print Archive

VBN

Genetic variation in the HLA region is associated with susceptibility to herpes zoster.

Author: Armstrong G
Baldwin E
Borthwick KM
Bottinger E
Burt A
Carlson CS
Carrell DS
Carroll RJ
Comstock BA
Crane PK
Crawford DC
Crosslin DR
de Andrade M
Denny JC
Doheny KF
Hanna DS
Harley JB
Hayes MG
Jarvik GP
Keating B
Kho A
Kim DS
Kuivaniemi H
Kullo IJ
Larson EB
Li R
McCarty CA
Mirel DB
Mukherjee S
Pacheco J
Peissig PL
Pugh E
Ritchie MD
Stallings S
Tromp G
Underwood JG
Verma SS
Publication venue: eScholarship, University of California
Publication date: 01/01/2015

Herpes zoster, commonly referred to as shingles, is caused by the varicella zoster virus (VZV). VZV initially manifests as chicken pox, most commonly in childhood, can remain asymptomatically latent in nerve tissues for many years, and often re-emerges as shingles. Although reactivation may be related to immune suppression, aging, and female sex, most inter-individual variability in re-emergence risk has not been explained to date. We performed a genome-wide association analysis in 22,981 participants (2,280 shingles cases) from the electronic Medical Records and Genomics Network. Using Cox survival and logistic regression, we identified a genomic region in the combined and European ancestry groups that has an age-of-onset effect reaching genome-wide significance (P < 1.0 × 10^-8). This region tags the non-coding gene HCP5 (HLA Complex P5) in the major histocompatibility complex. This gene is an endogenous retrovirus and likely influences viral activity through regulatory functions. Variants in this genetic region are known to be associated with a delay in the development of AIDS in people infected by HIV. Our study provides further evidence that this region may have a critical role in viral suppression and could potentially harbor a clinically actionable variant for the shingles vaccine.

eScholarship - University of California

SOAP3-dp: Fast, Accurate and Sensitive GPU-based Short Read Aligner

Author: Chang Yu
Chi-Man Liu
David W Cheung
Edward Wu
Haoxiang Lin
Hing-Fung Ting
Jianqiao Zhu
Lap-Kei Lee
Ruibang Luo
Ruiqiang Li
Shaoliang Peng
Siu-Ming Yiu
Tak-Wah Lam
Thomas Wong
Wenjuan Zhu
Xiaoqian Zhu
Yingrui Li
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

To tackle the exponentially increasing throughput of Next-Generation Sequencing (NGS), most existing short-read aligners can be configured to favor speed at the expense of accuracy and sensitivity. SOAP3-dp, by leveraging the computational power of both CPU and GPU with optimized algorithms, delivers high speed and sensitivity simultaneously. Compared with widely adopted aligners including BWA, Bowtie2, SeqAlto, and GEM, and GPU-based aligners including BarraCUDA and CUSHAW, SOAP3-dp is two to tens of times faster, while maintaining the highest sensitivity and lowest false discovery rate (FDR) on Illumina reads of different lengths. Unlike its predecessor SOAP3, which does not allow gapped alignment, SOAP3-dp by default tolerates alignment similarity as low as 60 percent. Real data evaluation using the human genome demonstrates SOAP3-dp's power to enable more authentic variants and longer indels to be discovered. Fosmid sequencing shows a 9.1 percent FDR on newly discovered deletions. SOAP3-dp natively supports the BAM file format and provides the same scoring scheme as BWA, which enables it to be integrated into existing analysis pipelines. SOAP3-dp has been deployed on Amazon-EC2, NIH-Biowulf, and Tianhe-1A. Comment: 21 pages, 6 figures, submitted to PLoS ONE, additional files available at "https://www.dropbox.com/sh/bhclhxpoiubh371/O5CO_CkXQE". Comments most welcome.

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

HKU Scholars Hub

FigShare

Genetic Dissection of a QTL Affecting Bone Geometry.

Author: Farber Charles R
Medrano Juan F
Sabik Olivia L
Publication venue: eScholarship, University of California
Publication date: 11/01/2017

Parameters of bone geometry such as width, length, and cross-sectional area are major determinants of bone strength. Although these traits are highly heritable, few genes influencing bone geometry have been identified. Here, we dissect a major quantitative trait locus (QTL) influencing femur size. This QTL was originally identified in an F2 cross between the C57BL/6J-hg/hg (HG) and CAST/EiJ strains and was referred to as femur length in high growth mice 2 (Feml2). Feml2 was located on chromosome (Chr.) 9 at ∼20 cM. Here, we show that the HG.CAST-(D9Mit249-D9Mit133)/Ucd congenic strain captures Feml2. In an F2 congenic cross, we fine-mapped the location of Feml2 to an ∼6 Mbp region extending from 57.3 to 63.3 Mbp on Chr. 9. We have identified candidates by mining the complete genome sequence of CAST/EiJ and through allele-specific expression (ASE) analysis of growth plates in C57BL/6J × CAST/EiJ F1 hybrids. Interestingly, we also find that the refined location of Feml2 overlaps a cluster of six independent genome-wide associations for human height. This work provides the foundation for the identification of novel genes affecting bone geometry.

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Computing Real Roots of Real Polynomials ... and now For Real!

Author: Kobel Alexander
Rouillier Fabrice
Sagraloff Michael
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016

Very recent work introduces an asymptotically fast subdivision algorithm, denoted ANewDsc, for isolating the real roots of a univariate real polynomial. The method combines Descartes' Rule of Signs to test intervals for the existence of roots, Newton iteration to speed up convergence toward clusters of roots, and approximate computation to decrease the required precision. It achieves record bounds on the worst-case complexity for the considered problem, matching the complexity of Pan's method for computing all complex roots and improving upon the complexity of other subdivision methods by several orders of magnitude. In the article at hand, we report on an implementation of ANewDsc on top of the RS root isolator. RS is a highly efficient realization of the classical Descartes method and currently serves as the default real root solver in Maple. We describe crucial design changes within ANewDsc and RS that led to a high-performance implementation without harming the theoretical complexity of the underlying algorithm. With an excerpt of our extensive collection of benchmarks, available online at http://anewdsc.mpi-inf.mpg.de/, we illustrate that the theoretical gain in performance of ANewDsc over other subdivision methods also transfers into practice. These experiments also show that our new implementation outperforms both RS and mature competitors by orders of magnitude for notoriously hard instances with clustered roots. For all other instances, we avoid almost any overhead by integrating additional optimizations and heuristics. Comment: Accepted for presentation at the 41st International Symposium on Symbolic and Algebraic Computation (ISSAC), July 19-22, 2016, Waterloo, Ontario, Canada.
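The basic sign test named in the abstract can be sketched in a few lines. This is only the classical Descartes' Rule of Signs applied to a polynomial's coefficients, not ANewDsc or RS, which apply the rule to transformed polynomials on subintervals and add Newton iteration and approximate arithmetic. The function name `sign_variations` is hypothetical.

```python
from typing import Sequence


def sign_variations(coeffs: Sequence[float]) -> int:
    """Count sign changes in a coefficient sequence, ignoring zeros.

    By Descartes' Rule of Signs, the number of positive real roots
    (counted with multiplicity) is at most this count and differs
    from it by an even number. In particular, a count of 0 means no
    positive root and a count of 1 means exactly one positive root.
    """
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)
```

For example, the coefficients `[1, -3, 2]` of x^2 - 3x + 2 = (x - 1)(x - 2) give two sign variations, matching its two positive roots, while `[1, 0, 1]` for x^2 + 1 gives none. Subdivision methods rely on the counts 0 and 1 being conclusive: an interval is discarded, accepted as isolating, or split further.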

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

MPG.PuRe

Numerical modelling of metal melt refining process in ladle with rotating impeller and breakwaters

Author: Merder Tomasz
Michalek Karel
Pieprzyca Jacek
Saternus Mariola
Sviželová Jana
Tkadlečková Markéta
Walek Josef
Publication venue: Polska Akademia Nauk, Instytut Metalurgii i Inżynierii Materiałowej
Publication date: 01/01/2019

The paper describes the research and development of aluminium melt refining technology in a ladle with a rotating impeller and breakwaters, using numerical modelling based on the finite volume/element method. The theoretical aspects of the refining technology are outlined. The design of the numerical model is described and discussed. The differences between real process conditions and the limitations of the numerical model are mentioned. Based on the hypothesis and the results of numerical modelling, the most appropriate setting of the numerical model is recommended. The possibilities for monitoring degassing are also explained. The results of numerical modelling make it possible to improve the refining technology of metal melts and to control the final quality under different boundary conditions, such as the rotating speed, shape, and position of the rotating impeller, the breakwaters, and the intensity of inert gas blowing through the impeller.

Biblioteka Nauki - repozytorium artykułów

DSpace at VSB Technical University of Ostrava