Indexing Metric Spaces for Exact Similarity Search
With the continued digitalization of societal processes, we are seeing an
explosion in available data. This is referred to as big data. In a research
setting, three aspects of the data are often viewed as the main sources of
challenges when attempting to enable value creation from big data: volume,
velocity and variety. Many studies address volume or velocity, while far fewer
studies address variety. Metric spaces are well suited to addressing variety
because they can accommodate any type of data as long as the associated distance
notion satisfies the triangle inequality. To accelerate search in metric spaces,
a number of indexing techniques for metric data have been proposed.
However, each existing survey offers only narrow coverage, and no
comprehensive empirical study of those techniques exists. We offer a survey of
all the existing metric indexes that can support exact similarity search, by i)
summarizing all the existing partitioning, pruning and validation techniques
used for metric indexes, ii) providing time and storage complexity analyses
of index construction, and iii) reporting on a comprehensive empirical
comparison of their similarity query processing performance. Empirical
comparisons are used to evaluate search performance because complexity
analysis reveals little about differences in similarity query processing, and
query performance depends on pruning and validation abilities that vary with
the data distribution. This article aims to reveal the strengths and
weaknesses of different indexing techniques, in order to offer guidance on
selecting an appropriate technique for a given setting and to direct future
research on metric indexes.
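To make the pruning and validation ideas concrete, here is a minimal sketch of one simple family of metric indexes, a pivot table. The class name `PivotTable`, the choice of pivots, and the use of edit distance on strings are illustrative assumptions, not taken from the survey. The only property relied on is that `dist` is a metric: the triangle inequality gives the lower bound |d(q,p) - d(o,p)| (used to prune) and the upper bound d(q,p) + d(o,p) (used to validate without computing d(q,o)).

```python
import math
from typing import Callable, Generic, List, Sequence, TypeVar

T = TypeVar("T")


def levenshtein(s: str, t: str) -> int:
    """Edit distance between two strings; it is a metric on strings."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution or match
        prev = cur
    return prev[-1]


class PivotTable(Generic[T]):
    """Hypothetical minimal pivot-based metric index for exact range queries."""

    def __init__(self, data: Sequence[T], pivots: Sequence[T],
                 dist: Callable[[T, T], float]) -> None:
        self.data = list(data)
        self.pivots = list(pivots)
        self.dist = dist
        # Precompute d(o, p) for every object o and pivot p at construction time.
        self.table = [[dist(o, p) for p in self.pivots] for o in self.data]

    def range_query(self, q: T, r: float) -> List[T]:
        """Return every object o with dist(q, o) <= r, in insertion order."""
        q_to_pivots = [self.dist(q, p) for p in self.pivots]
        result: List[T] = []
        for obj, row in zip(self.data, self.table):
            pairs = list(zip(q_to_pivots, row))
            # Pruning: the triangle inequality gives d(q, o) >= |d(q, p) - d(o, p)|.
            lower = max((abs(a - b) for a, b in pairs), default=0.0)
            if lower > r:
                continue
            # Validation: d(q, o) <= d(q, p) + d(o, p), so a small sum proves membership.
            upper = min((a + b for a, b in pairs), default=math.inf)
            # Only objects that neither bound settles cost an exact distance computation.
            if upper <= r or self.dist(q, obj) <= r:
                result.append(obj)
        return result
```

For example, indexing `["cat", "car", "cart", "dog", "dot", "door", "card", "care"]` with pivots `"cat"` and `"door"` and calling `range_query("cart", 1)` returns `cat`, `car`, `cart`, `card` and `care`, the same answer as a linear scan. How many exact distance computations the bounds save depends on the pivots and on the data distribution.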
Genetic variation in the HLA region is associated with susceptibility to herpes zoster.
Herpes zoster, commonly referred to as shingles, is caused by the varicella zoster virus (VZV). VZV initially manifests as chicken pox, most commonly in childhood, can remain asymptomatically latent in nerve tissues for many years, and often re-emerges as shingles. Although reactivation may be related to immune suppression, aging and female sex, most inter-individual variability in re-emergence risk has not been explained to date. We performed a genome-wide association analysis in 22,981 participants (2280 shingles cases) from the electronic Medical Records and Genomics Network. Using Cox survival and logistic regression, we identified a genomic region in the combined and European ancestry groups that has an age-of-onset effect reaching genome-wide significance (P < 1.0 × 10^-8). This region tags the non-coding gene HCP5 (HLA Complex P5) in the major histocompatibility complex. This gene is an endogenous retrovirus and likely influences viral activity through regulatory functions. Variants in this genetic region are known to be associated with delayed development of AIDS in people infected with HIV. Our study further suggests that this region may have a critical role in viral suppression and could potentially harbor a clinically actionable variant for the shingles vaccine.
SOAP3-dp: Fast, Accurate and Sensitive GPU-based Short Read Aligner
To tackle the exponentially increasing throughput of Next-Generation
Sequencing (NGS), most existing short-read aligners can be configured to
favor speed at the expense of accuracy and sensitivity. SOAP3-dp, by leveraging
the computational power of both CPU and GPU with optimized algorithms, delivers
high speed and sensitivity simultaneously. Compared with widely adopted
aligners including BWA, Bowtie2, SeqAlto, GEM and GPU-based aligners including
BarraCUDA and CUSHAW, SOAP3-dp is two to tens of times faster, while
maintaining the highest sensitivity and lowest false discovery rate (FDR) on
Illumina reads of different lengths. Unlike its predecessor SOAP3,
which does not allow gapped alignment, SOAP3-dp by default tolerates alignment
similarity as low as 60 percent. Real-data evaluation on the human genome
demonstrates SOAP3-dp's ability to discover more authentic variants and longer
indels. Fosmid sequencing shows a 9.1 percent FDR on newly
discovered deletions. SOAP3-dp natively supports the BAM file format and uses
the same scoring scheme as BWA, which enables it to be integrated into existing
analysis pipelines. SOAP3-dp has been deployed on Amazon-EC2, NIH-Biowulf and
Tianhe-1A.
Comment: 21 pages, 6 figures, submitted to PLoS ONE, additional files
available at "https://www.dropbox.com/sh/bhclhxpoiubh371/O5CO_CkXQE".
Comments most welcome.
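Gapped alignment, which SOAP3-dp supports and SOAP3 does not, allows insertions and deletions to appear in an alignment. As a generic textbook illustration only, the sketch below uses the Smith-Waterman local-alignment recurrence with a linear gap penalty. It is not SOAP3-dp's GPU implementation. The function name and the scoring values (match +2, mismatch -3, gap -5) are hypothetical choices, not the defaults of BWA or SOAP3-dp.

```python
def smith_waterman(read: str, ref: str,
                   match: int = 2, mismatch: int = -3, gap: int = -5) -> int:
    """Best local alignment score of read against ref, with gaps allowed.

    Scoring values are illustrative assumptions, not SOAP3-dp's or BWA's.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    # h[i][j] = best score of a local alignment ending at read[i-1], ref[j-1].
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            # A gap in either sequence costs `gap`; 0 restarts a local alignment.
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best
```

With these assumed scores, `smith_waterman("ACGTTACG", "ACGTACG")` gives 9: seven matches minus one gap. This beats the best ungapped local score of 8, which is the kind of case an ungapped aligner misses.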
Genetic Dissection of a QTL Affecting Bone Geometry.
Parameters of bone geometry such as width, length, and cross-sectional area are major determinants of bone strength. Although these traits are highly heritable, few genes influencing bone geometry have been identified. Here, we dissect a major quantitative trait locus (QTL) influencing femur size. This QTL was originally identified in an F2 cross between the C57BL/6J-hg/hg (HG) and CAST/EiJ strains and was referred to as femur length in high growth mice 2 (Feml2). Feml2 was located on chromosome (Chr.) 9 at ∼20 cM. Here, we show that the HG.CAST-(D9Mit249-D9Mit133)/Ucd congenic strain captures Feml2. In an F2 congenic cross, we fine-mapped the location of Feml2 to an ∼6 Mbp region extending from 57.3 to 63.3 Mbp on Chr. 9. We have identified candidates by mining the complete genome sequence of CAST/EiJ and through allele-specific expression (ASE) analysis of growth plates in C57BL/6J × CAST/EiJ F1 hybrids. Interestingly, we also find that the refined location of Feml2 overlaps a cluster of six independent genome-wide associations for human height. This work provides the foundation for the identification of novel genes affecting bone geometry.
Computing Real Roots of Real Polynomials ... and now For Real!
Very recent work introduces an asymptotically fast subdivision algorithm,
denoted ANewDsc, for isolating the real roots of a univariate real polynomial.
The method combines Descartes' Rule of Signs to test intervals for the
existence of roots, Newton iteration to speed up convergence against clusters
of roots, and approximate computation to decrease the required precision. It
achieves record bounds on the worst-case complexity for the considered problem,
matching the complexity of Pan's method for computing all complex roots and
improving upon the complexity of other subdivision methods by several orders of
magnitude.
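The Descartes test at the core of both RS and ANewDsc can be illustrated with the classical Descartes bisection method. This sketch is not ANewDsc: it has no Newton steps and uses exact rational arithmetic rather than approximate computation. The function names are hypothetical. It assumes a square-free polynomial with rational coefficients, listed in increasing-degree order. For an interval (a, b), the number of sign variations in the coefficients of (x+1)^n p((ax+b)/(x+1)) bounds the number of roots of p in (a, b) and has the same parity. A count of 0 therefore means no root, and a count of 1 means exactly one.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple


def poly_mul(p: Sequence[Fraction], q: Sequence[Fraction]) -> List[Fraction]:
    """Multiply two polynomials given as increasing-degree coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_pow(p: Sequence[Fraction], k: int) -> List[Fraction]:
    out = [Fraction(1)]
    for _ in range(k):
        out = poly_mul(out, p)
    return out


def sign_variations(coeffs: Sequence[Fraction]) -> int:
    """Count sign changes in a coefficient sequence, ignoring zeros."""
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def descartes_bound(p: Sequence[int], a: Fraction, b: Fraction) -> int:
    """Descartes' rule of signs on (x+1)^n * p((a*x + b)/(x + 1)).

    The map x -> (a*x + b)/(x + 1) sends (0, inf) onto (a, b), so positive
    roots of the transformed polynomial are the roots of p in (a, b).
    """
    n = len(p) - 1
    total = [Fraction(0)] * (n + 1)
    for i, c in enumerate(p):
        term = poly_mul(poly_pow([b, a], i), poly_pow([Fraction(1), Fraction(1)], n - i))
        for k, t in enumerate(term):
            total[k] += c * t
    return sign_variations(total)


def evaluate(p: Sequence[int], x: Fraction) -> Fraction:
    result = Fraction(0)
    for c in reversed(p):  # Horner's scheme
        result = result * x + c
    return result


def isolate(p: Sequence[int], a: Fraction, b: Fraction) -> List[Tuple[Fraction, Fraction]]:
    """Isolating intervals for the real roots of square-free p in (a, b)."""
    v = descartes_bound(p, a, b)
    if v == 0:
        return []
    if v == 1:
        return [(a, b)]
    m = (a + b) / 2
    out = isolate(p, a, m)
    if evaluate(p, m) == 0:  # a root exactly at the split point
        out.append((m, m))
    return out + isolate(p, m, b)


def real_roots(p: Sequence[int]) -> List[Tuple[Fraction, Fraction]]:
    # Cauchy's bound: every root z satisfies |z| < 1 + max |c_i / c_n|.
    lead = Fraction(p[-1])
    bound = 1 + max((abs(Fraction(c) / lead) for c in p[:-1]), default=Fraction(0))
    return isolate(p, -bound, bound)
```

For p(x) = x^2 - 2, passed as `[-2, 0, 1]`, the Cauchy bound is 3. `real_roots` splits (-3, 3) once and returns the isolating intervals (-3, 0) and (0, 3). Square-freeness is required for termination: with a repeated root, the sign-variation count near that root never drops to 0 or 1.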
In this article, we report on an implementation of ANewDsc on top of
the RS root isolator. RS is a highly efficient realization of the classical
Descartes method and currently serves as the default real root solver in Maple.
We describe crucial design changes within ANewDsc and RS that led to a
high-performance implementation without harming the theoretical complexity of
the underlying algorithm.
With an excerpt of our extensive collection of benchmarks, available online
at http://anewdsc.mpi-inf.mpg.de/, we illustrate that the theoretical gain in
performance of ANewDsc over other subdivision methods also carries over into
practice. These experiments also show that our new implementation outperforms
both RS and mature competitors by orders of magnitude for notoriously hard
instances with clustered roots. For all other instances, we incur almost no
overhead by integrating additional optimizations and heuristics.
Comment: Accepted for presentation at the 41st International Symposium on
Symbolic and Algebraic Computation (ISSAC), July 19-22, 2016, Waterloo,
Ontario, Canada
Numerical modelling of metal melt refining process in ladle with rotating impeller and breakwaters
The paper describes the research and development of an aluminium melt refining technology in a ladle with a rotating impeller and breakwaters, using numerical modelling based on the finite volume/finite element method. The theoretical aspects of the refining technology are outlined. The design of the numerical model is described and discussed, and the differences between real process conditions and the limitations of the numerical model are noted. Based on the hypothesis and the results of numerical modelling, the most appropriate settings for the numerical model are recommended. The possibilities for monitoring degassing are also explained. The results of numerical modelling make it possible to improve the refining technology of metal melts and to control the final quality under different boundary conditions, such as the rotating speed, shape and position of the impeller, the breakwaters, and the intensity of inert gas blowing through the impeller.