Search CORE

21,107 research outputs found

Entropy-scaling search of massive biological data

Author: Berger Bonnie
Daniels Noah M.
Danko David Christian
Yu Y. William
Publication venue: 'Elsevier BV'
Publication date: 01/06/2015
Field of study

Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset's entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres), if the fractal dimension of the dataset is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains---high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve "compressive omics," and the general theory can be readily applied to data science problems outside of biology.Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo

arXiv.org e-Print Archive

Elsevier - Publisher Connector

DSpace@MIT

PubMed Central

Euclidean distance geometry and applications

Author: Lavor Carlile
Liberti Leo
Maculan Nelson
Mucherino Antonio
Publication venue
Publication date: 02/05/2012
Field of study

Euclidean distance geometry is the study of Euclidean geometry based on the concept of distance. This is useful in several applications where the input data consists of an incomplete set of distances, and the output is a set of points in Euclidean space that realizes the given distances. We survey some of the theory of Euclidean distance geometry and some of the most important applications: molecular conformation, localization of sensor networks and statics.Comment: 64 pages, 21 figure

arXiv.org e-Print Archive

HAL-CentraleSupelec

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

INRIA a CCSD electronic archive server

Repositorio da Producao Cientifica e Intelectual da Unicamp

HAL-Polytechnique

HAL-Rennes 1

Gains in Power from Structured Two-Sample Tests of Means on Graphs

Author: Dudoit Sandrine
Jacob Laurent
Neuvial Pierre
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2010
Field of study

We consider multivariate two-sample tests of means, where the location shift between the two populations is expected to be related to a known graph structure. An important application of such tests is the detection of differentially expressed genes between two patient populations, as shifts in expression levels are expected to be coherent with the structure of graphs reflecting gene properties such as biological process, molecular function, regulation, or metabolism. For a fixed graph of interest, we demonstrate that accounting for graph structure can yield more powerful tests under the assumption of smooth distribution shift on the graph. We also investigate the identification of non-homogeneous subgraphs of a given large graph, which poses both computational and multiple testing problems. The relevance and benefits of the proposed approach are illustrated on synthetic data and on breast cancer gene expression data analyzed in context of KEGG pathways

arXiv.org e-Print Archive

Collection Of Biostatistics Research Archive

Basic Understanding of Condensed Phases of Matter via Packing Models

Author: Torquato Salvatore
Publication venue: 'AIP Publishing'
Publication date: 09/05/2018
Field of study

Packing problems have been a source of fascination for millenia and their study has produced a rich literature that spans numerous disciplines. Investigations of hard-particle packing models have provided basic insights into the structure and bulk properties of condensed phases of matter, including low-temperature states (e.g., molecular and colloidal liquids, crystals and glasses), multiphase heterogeneous media, granular media, and biological systems. The densest packings are of great interest in pure mathematics, including discrete geometry and number theory. This perspective reviews pertinent theoretical and computational literature concerning the equilibrium, metastable and nonequilibrium packings of hard-particle packings in various Euclidean space dimensions. In the case of jammed packings, emphasis will be placed on the "geometric-structure" approach, which provides a powerful and unified means to quantitatively characterize individual packings via jamming categories and "order" maps. It incorporates extremal jammed states, including the densest packings, maximally random jammed states, and lowest-density jammed structures. Packings of identical spheres, spheres with a size distribution, and nonspherical particles are also surveyed. We close this review by identifying challenges and open questions for future research.Comment: 33 pages, 20 figures, Invited "Perspective" submitted to the Journal of Chemical Physics. arXiv admin note: text overlap with arXiv:1008.298

arXiv.org e-Print Archive

Princeton University Open Access Repository

A geometric method for model reduction of biochemical networks with polynomial rate functions

Author: Fröhlich Holger
Grigoriev Dima
Radulescu Ovidiu
Samal Satya Swarup
Weber Andreas
Publication venue
Publication date: 01/01/2015
Field of study

Model reduction of biochemical networks relies on the knowledge of slow and fast variables. We provide a geometric method, based on the Newton polytope, to identify slow variables of a biochemical network with polynomial rate functions. The gist of the method is the notion of tropical equilibration that provides approximate descriptions of slow invariant manifolds. Compared to extant numerical algorithms such as the intrinsic low dimensional manifold method, our approach is symbolic and utilizes orders of magnitude instead of precise values of the model parameters. Application of this method to a large collection of biochemical network models supports the idea that the number of dynamical variables in minimal models of cell physiology can be small, in spite of the large number of molecular regulatory actors

arXiv.org e-Print Archive

MPG.PuRe