Search CORE

3,428 research outputs found

Recommended from our members

The Variable Markov Oracle: Algorithms for Human Gesture Applications

Author: Dubnov Shlomo
Wang Cheng-i
Publication venue: eScholarship, University of California
Publication date: 01/01/2015
Field of study

This article introduces the Variable Markov Oracle (VMO) data structure for multivariate time series indexing. VMO can identify repetitive fragments and find sequential similarities between observations. VMO can also be viewed as a combination of online clustering algorithms with variable-order Markov constraints. The authors use VMO for gesture query-by-content and gesture following. A probabilistic interpretation of the VMO query-matching algorithm is proposed to find an analogy to the inference problem in a hidden Markov model (HMM). This probabilistic interpretation extends VMO to be not only a data structure but also a model for time series. Query-by-content experiments were conducted on a gesture database that was recorded using a Kinect 3D camera, showing state-of-the-art performance. The query-by-content experiments' results are compared to previous works using HMM and dynamic time warping. Gesture following is described in the context of an interactive dance environment that aims to integrate human movements with computer-generated graphics to create an augmented reality performance

eScholarship - University of California

The Parallelism Motifs of Genomic Data Analysis

Author: Awan Muaaz
Azad Ariful
Brock Benjamin
Buluc Aydin
Egan Rob
Ekanayake Saliya
Ellis Marquita
Georganas Evangelos
Guidi Giulia
Hofmeyr Steven
Oliker Leonid
Selvitopi Oguz
Teodoropol Cristina
Yelick Katherine
Publication venue: 'The Royal Society'
Publication date: 20/01/2020
Field of study

Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing

arXiv.org e-Print Archive

eScholarship - University of California

Indexing arbitrary-length $k$ -mers in sequencing reads

Author: Deorowicz Sebastian
Grabowski Szymon
Kowalski Tomasz
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 13/02/2015
Field of study

We propose a lightweight data structure for indexing and querying collections of NGS reads data in main memory. The data structure supports the interface proposed in the pioneering work by Philippe et al. for counting and locating

k

-mers in sequencing reads. Our solution, PgSA (pseudogenome suffix array), based on finding overlapping reads, is competitive to the existing algorithms in the space use, query times, or both. The main applications of our index include variant calling, error correction and analysis of reads from RNA-seq experiments

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Directory of Open Access Journals

Entropy-scaling search of massive biological data

Author: Berger Bonnie
Daniels Noah M.
Danko David Christian
Yu Y. William
Publication venue: 'Elsevier BV'
Publication date: 01/06/2015
Field of study

Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset's entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres), if the fractal dimension of the dataset is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains---high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve "compressive omics," and the general theory can be readily applied to data science problems outside of biology.Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo

arXiv.org e-Print Archive

Elsevier - Publisher Connector

DSpace@MIT

PubMed Central

Sorting suffixes of a text via its Lyndon Factorization

Author: Mantaci Sabrina
Restivo Antonio
Rosone Giovanna
Sciortino Marinella
Publication venue
Publication date: 01/01/2013
Field of study

The process of sorting the suffixes of a text plays a fundamental role in Text Algorithms. They are used for instance in the constructions of the Burrows-Wheeler transform and the suffix array, widely used in several fields of Computer Science. For this reason, several recent researches have been devoted to finding new strategies to obtain effective methods for such a sorting. In this paper we introduce a new methodology in which an important role is played by the Lyndon factorization, so that the local suffixes inside factors detected by this factorization keep their mutual order when extended to the suffixes of the whole word. This property suggests a versatile technique that easily can be adapted to different implementative scenarios.Comment: Submitted to the Prague Stringology Conference 2013 (PSC 2013

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Palermo