
    Causal Compressive Sensing for Gene Network Inference

    We propose a novel framework for studying causal inference of gene interactions using a combination of compressive sensing and Granger causality techniques. The gist of the approach is to discover sparse linear dependencies between time series of gene expressions via a Granger-type elimination method. The method is tested on the Gardner dataset for the SOS network in E. coli, for which both known and unknown causal relationships are discovered.
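The abstract does not spell out the elimination procedure, so the following is only a rough sketch of the general idea, not the authors' method. It swaps in plain matching pursuit (a greedy sparse-recovery routine) to pick the few lagged series that best predict a target series. The function name `sparse_lagged_drivers`, the parameters `max_terms` and `min_gain`, and the synthetic genes `a`, `b`, `c` are all hypothetical.

```python
import random


def center(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sparse_lagged_drivers(series, target, lag=1, max_terms=3, min_gain=0.05):
    """Greedily select lagged series that explain the target (matching pursuit).

    Assumption: a series 'Granger-type' influences the target if its values
    `lag` steps earlier remove at least `min_gain` of the target's variance.
    """
    y = center(series[target][lag:])
    columns = {name: center(vals[:-lag]) for name, vals in series.items()}
    total = dot(y, y)
    if total == 0.0:
        return {}
    residual = y[:]
    drivers = {}
    for _ in range(max_terms):
        best_name, best_coef, best_gain = None, 0.0, 0.0
        for name, col in columns.items():
            energy = dot(col, col)
            if energy == 0.0:
                continue
            coef = dot(col, residual) / energy       # 1-D least-squares fit
            gain = coef * coef * energy / total      # share of variance removed
            if gain > best_gain:
                best_name, best_coef, best_gain = name, coef, gain
        if best_name is None or best_gain < min_gain:
            break                                    # nothing useful left
        drivers[best_name] = drivers.get(best_name, 0.0) + best_coef
        residual = [r - best_coef * c
                    for r, c in zip(residual, columns[best_name])]
    return drivers


# Synthetic data (hypothetical): gene c follows gene a with a one-step delay.
rng = random.Random(0)
n = 300
gene_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
gene_b = [rng.gauss(0.0, 1.0) for _ in range(n)]
gene_c = [0.0] + [0.9 * gene_a[t - 1] + rng.gauss(0.0, 0.1) for t in range(1, n)]
drivers = sparse_lagged_drivers({"a": gene_a, "b": gene_b, "c": gene_c}, target="c")
print(drivers)
```

The target's own past is kept among the candidates, as in a Granger-style comparison, and the `min_gain` threshold is what keeps the recovered dependency set sparse.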

    Graph Kernels

    We present a unified framework to study graph kernels, special cases of which include the random walk (Gärtner et al., 2003; Borgwardt et al., 2005) and marginalized (Kashima et al., 2003, 2004; Mahé et al., 2004) graph kernels. Through reduction to a Sylvester equation we improve the time complexity of kernel computation between unlabeled graphs with n vertices from O(n^6) to O(n^3). We find a spectral decomposition approach even more efficient when computing entire kernel matrices. For labeled graphs we develop conjugate gradient and fixed-point methods that take O(dn^3) time per iteration, where d is the size of the label set. By extending the necessary linear algebra to Reproducing Kernel Hilbert Spaces (RKHS) we obtain the same result for d-dimensional edge kernels, and O(n^4) in the infinite-dimensional case; on sparse graphs these algorithms only take O(n^2) time per iteration in all cases. Experiments on graphs from bioinformatics and other application domains show that these techniques can speed up computation of the kernel by an order of magnitude or more. We also show that certain rational kernels (Cortes et al., 2002, 2003, 2004) when specialized to graphs reduce to our random walk graph kernel. Finally, we relate our framework to R-convolution kernels (Haussler, 1999) and provide a kernel that is close to the optimal assignment kernel of Fröhlich et al. (2006) yet provably positive semi-definite.
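To illustrate why the product-graph structure saves time, here is a minimal sketch of a fixed-point iteration for a random walk kernel between two unlabeled graphs. It assumes the kernel has the form q^T (I - lam * (A ⊗ B))^{-1} p with uniform start and stop weights p and q, unnormalized adjacency matrices, and a decay `lam` small enough for the iteration to converge; these choices and all function names are illustrative, not taken from the paper. The trick is the row-major identity (A ⊗ B) vec(X) = vec(A X B^T), which replaces an n^2 x n^2 matrix-vector product with two n x n matrix products.

```python
def matmul(a, b):
    """Plain-Python product of two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def rw_kernel_fixed_point(adj_a, adj_b, lam=0.05, tol=1e-12, max_iter=1000):
    """Iterate X <- P + lam * A X B^T, never forming the Kronecker product."""
    na, nb = len(adj_a), len(adj_b)
    weight = 1.0 / (na * nb)                 # uniform start/stop weight (assumed)
    p = [[weight] * nb for _ in range(na)]
    x = [row[:] for row in p]
    b_t = transpose(adj_b)
    for _ in range(max_iter):
        axb = matmul(matmul(adj_a, x), b_t)  # O(n^3) work for dense graphs
        new_x = [[p[i][k] + lam * axb[i][k] for k in range(nb)] for i in range(na)]
        delta = max(abs(new_x[i][k] - x[i][k]) for i in range(na) for k in range(nb))
        x = new_x
        if delta < tol:
            break
    return sum(weight * v for row in x for v in row)


def rw_kernel_kronecker(adj_a, adj_b, lam=0.05, tol=1e-12, max_iter=1000):
    """Reference version that builds the full n^2 x n^2 product-graph matrix."""
    na, nb = len(adj_a), len(adj_b)
    n = na * nb
    w = [[adj_a[i][j] * adj_b[k][l] for j in range(na) for l in range(nb)]
         for i in range(na) for k in range(nb)]
    p = [1.0 / n] * n
    x = p[:]
    for _ in range(max_iter):
        new_x = [p[r] + lam * sum(w[r][c] * x[c] for c in range(n)) for r in range(n)]
        delta = max(abs(u - v) for u, v in zip(new_x, x))
        x = new_x
        if delta < tol:
            break
    return sum(v / n for v in x)


triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
k_fast = rw_kernel_fixed_point(triangle, path)
k_slow = rw_kernel_kronecker(triangle, path)
print(k_fast, k_slow)
```

Both routines compute the same quantity; only the first avoids the quadratic blow-up in memory and the higher per-iteration cost.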

    Computational biology in the 21st century

    Computational biologists answer biological and biomedical questions by using computation in support of, or in place of, laboratory procedures, hoping to obtain more accurate answers at a greatly reduced cost. The past two decades have seen unprecedented technological progress with regard to generating biological data; next-generation sequencing, mass spectrometry, microarrays, cryo-electron microscopy, and other high-throughput approaches have led to an explosion of data. However, this explosion is a mixed blessing. On the one hand, the scale and scope of data should allow new insights into genetic and infectious diseases, cancer, basic biology, and even human migration patterns. On the other hand, researchers are generating datasets so massive that it has become difficult to analyze them to discover patterns that give clues to the underlying biological processes. Funding: National Institutes of Health (U.S.) (grant GM108348); Hertz Foundation

    Entropy-scaling search of massive biological data

    Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset's entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres), if the fractal dimension of the dataset is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains: high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve "compressive omics," and the general theory can be readily applied to data science problems outside of biology. Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo
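The covering-hypersphere idea can be sketched as a two-stage search: group the data into clusters of a fixed radius, compare the query against cluster centers first, and only then scan the members of nearby clusters. The code below is a simplified illustration under the assumption of an exact metric (Euclidean distance), where the triangle inequality guarantees that no hit is missed. It is not the implementation behind Ammolite, MICA, or esFragBag, and the names `build_clusters` and `search` are hypothetical.

```python
import math
import random


def build_clusters(points, cluster_radius):
    """Greedy cover: each point joins the first center within cluster_radius."""
    centers, members = [], []
    for p in points:
        for idx, c in enumerate(centers):
            if math.dist(p, c) <= cluster_radius:
                members[idx].append(p)
                break
        else:
            centers.append(p)        # no center is close enough: start a new cluster
            members.append([p])
    return centers, members


def search(query, radius, centers, members, cluster_radius):
    """Coarse pass over centers, then fine pass inside candidate clusters."""
    hits = []
    for c, group in zip(centers, members):
        # Triangle inequality: a member within `radius` of the query implies
        # that its center is within `radius + cluster_radius` of the query.
        if math.dist(query, c) <= radius + cluster_radius:
            hits.extend(p for p in group if math.dist(query, p) <= radius)
    return hits


# Synthetic, well-clustered data (hypothetical): three tight groups of points.
rng = random.Random(1)
hubs = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
points = [(hx + rng.gauss(0, 0.3), hy + rng.gauss(0, 0.3))
          for hx, hy in hubs for _ in range(100)]
centers, members = build_clusters(points, cluster_radius=1.0)

query = (5.2, 4.9)
hits = search(query, 0.5, centers, members, cluster_radius=1.0)
brute = [p for p in points if math.dist(query, p) <= 0.5]
print(len(centers), len(hits), len(brute))
```

The coarse pass costs one distance computation per cluster, which is the quantity the abstract ties to metric entropy; the fine pass stays cheap only when few clusters fall near the query, which is where the low fractal dimension condition comes in.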

    Computational solutions for omics data

    High-throughput experimental technologies are generating increasingly massive and complex genomic data sets. The sheer enormity and heterogeneity of these data threaten to make the arising problems computationally infeasible. Fortunately, powerful algorithmic techniques lead to software that can answer important biomedical questions in practice. In this Review, we sample the algorithmic landscape, focusing on state-of-the-art techniques, the understanding of which will aid the bench biologist in analysing omics data. We spotlight specific examples that have facilitated and enriched analyses of sequence, transcriptomic and network data sets. Funding: National Institutes of Health (U.S.) (Grant GM081871)

    Dietary grape pomace supplementation in dairy cows: Effect on nutritional quality of milk and its derived dairy products

    Grape pomace (GP) is the main solid by-product of winemaking and a rich source of potent bioactive compounds, which may benefit human health through their association with a reduced risk of several chronic diseases. Several studies have proposed the use of GP as a macro-ingredient to obtain economically worthwhile animal feedstuffs naturally enriched in polyphenols and dietary fiber. Moreover, research carried out in this field over the last two decades shows that GP can induce beneficial effects in cow milk and its derived dairy products. First, a general increase in the concentration of polyunsaturated fatty acids (PUFA) was observed, which likely reflects the high content of these compounds in the by-product. Furthermore, an improvement in the oxidative stability of dairy products was observed, presumably as a direct consequence of the high content of bioactive compounds in GP that are credited with strong and well-characterized antioxidant functions. Finally, particularly in ripened cheeses, volatile organic compounds (VOCs) were identified that arise from both lipolytic and proteolytic processes and are commonly associated with pleasant aromatic notes. In conclusion, introducing GP into the diet of lactating cows made it possible to obtain dairy products with improved nutritional properties and high health functionality. Furthermore, the presumed improvement in organoleptic properties may help increase consumer acceptance of the novel products. This review aims to evaluate the effect of dietary GP supplementation on the quality of milk and dairy products from lactating dairy cows.

    Algorithms for Inferring Multiple Microbial Networks

    The interactions among the constituent members of a microbial community play a major role in determining the overall behavior of the community and the abundance levels of its members. These interactions can be modeled using a microbial network: a weighted graph whose nodes represent microbial taxa and whose edges represent pairwise interactions or associations among these taxa, and which can be used to model co-occurrences and/or interactions of the constituent members of the community. A microbial network is typically constructed from a sample-taxa count matrix that is obtained by sequencing multiple biological samples and identifying taxa counts. From large-scale microbiome studies, it is evident that microbial community compositions and interactions are affected by environmental and/or host factors. Thus, it is reasonable to expect that a sample-taxa matrix generated as part of a large study involving multiple environmental or clinical parameters can be associated with more than one microbial network. However, to our knowledge, the microbial network inference methods proposed thus far assume that the sample-taxa matrix is associated with a single network. This dissertation addresses the scenario in which the sample-taxa matrix is associated with K microbial networks and considers the computational problem of inferring K microbial networks from a given sample-taxa matrix. The contributions of this dissertation include 1) new frameworks to generate synthetic sample-taxa count data; 2) novel methods that combine mixture modeling with probabilistic graphical models to infer multiple interaction/association networks from microbial count data; 3) methods for dealing with the compositionality of microbial count data; 4) extensive experiments on real and synthetic data; and 5) new methods for model selection to infer the correct value of K.
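As a point of reference for the single-network baseline described above, the sketch below builds one association network from a sample-taxa count matrix. It assumes a common generic recipe, not the dissertation's method: a centered log-ratio (CLR) transform to address compositionality, followed by thresholded Pearson correlation between taxa. The pseudocount, the threshold, the function names, and the toy counts are all hypothetical.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample (one row of the matrix)."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    denom = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return sum(a * b for a, b in zip(du, dv)) / denom if denom else 0.0


def association_network(matrix, taxa, threshold=0.8):
    """Return weighted edges {(taxon_i, taxon_j): correlation} above threshold."""
    transformed = [clr(row) for row in matrix]   # one CLR vector per sample
    columns = list(zip(*transformed))            # one profile per taxon
    edges = {}
    for i in range(len(taxa)):
        for j in range(i + 1, len(taxa)):
            weight = pearson(columns[i], columns[j])
            if abs(weight) >= threshold:
                edges[(taxa[i], taxa[j])] = weight
    return edges


# Toy sample-taxa count matrix (hypothetical): rows are samples, columns are taxa.
taxa = ["taxon_a", "taxon_b", "taxon_c", "taxon_d"]
counts = [
    [120, 60, 5, 30],
    [200, 95, 9, 10],
    [80, 45, 2, 55],
    [150, 70, 7, 20],
    [60, 35, 1, 70],
]
edges = association_network(counts, taxa)
for pair, weight in edges.items():
    print(pair, round(weight, 3))
```

A multi-network variant in the spirit of the abstract would first assign samples to K groups and then infer one such graph per group; that step is not attempted here.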