981 research outputs found

    Entropy-scaling search of massive biological data

    Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset's entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres), if the fractal dimension of the dataset is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains: high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve "compressive omics," and the general theory can be readily applied to data science problems outside of biology. Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo

    From metagenomics to the metagenome: Conceptual change and the rhetoric of translational genomic research

    As the international genomic research community moves from the tool-making efforts of the Human Genome Project into biomedical applications of those tools, new metaphors are being suggested as useful to understanding how our genes work – and for understanding who we are as biological organisms. In this essay we focus on the Human Microbiome Project as one such translational initiative. The HMP is a new ‘metagenomic’ research effort to sequence the genomes of human microbiological flora, in order to pursue the interesting hypothesis that our ‘microbiome’ plays a vital and interactive role with our human genome in normal human physiology. Rather than describing the human genome as the ‘blueprint’ for human nature, the promoters of the HMP stress the ways in which our primate lineage DNA is interdependent with the genomes of our microbiological flora. They argue that the human body should be understood as an ecosystem with multiple ecological niches and habitats in which a variety of cellular species collaborate and compete, and that human beings should be understood as ‘superorganisms’ that incorporate multiple symbiotic cell species into a single individual with very blurry boundaries. These metaphors carry interesting philosophical messages, but their inspiration is not entirely ideological. Instead, part of their cachet within genome science stems from the ways in which they are rooted in genomic research techniques, in what philosophers of science have called a ‘tools-to-theory’ heuristic. Their emergence within genome science illustrates the complexity of conceptual change in translational research, by showing how it reflects both aspirational and methodological influences

    Microbial Biodiversity and Molecular Approach

    Biodiversity is the variety of species on Earth resulting from billions of years of evolution. Molecular-phylogenetic studies have revealed that the main diversity of life is microbial and that it is distributed among three domains: Archaea, Bacteria, and Eukarya. The functioning of the whole biosphere depends absolutely on the activities of the microbial world. Owing to their versatility, microbes are the major natural providers of ecological services and also play a major role in semi-artificial systems such as sewage treatment plants, landfills, and toxic waste bioremediation. As with other organisms, many pressures and drivers are causing a decrease in microbial biodiversity. Several publications document the effects of chemical pollutants (e.g. polycyclic aromatic hydrocarbons, PAHs), atmospheric pollution, temperature change, and fertilization on microbial community structure. These studies are now possible because sequencing technologies are undergoing a continuing revolution, allowing massive de novo sequencing that produces millions of bases in a single day. Metagenomics, metatranscriptomics, metaproteomics and single-cell sequencing provide a view not only of community structure (species phylogeny, richness, and distribution) but also of the functional (metabolic) potential of a community, because virtually all genes are captured and sequenced. Unfortunately, although microorganisms are very important for the functioning of the whole biosphere, public knowledge, awareness and political action have not yet addressed microbes when biodiversity and its decrease are highlighted. JRC.DDG.H.5-Rural, water and ecosystem resource

    Sequencing effort dictates gene discovery in marine microbial metagenomes

    © 2020 The Authors. Environmental Microbiology published by Society for Applied Microbiology and John Wiley & Sons Ltd. Massive metagenomic sequencing combined with gene prediction methods were previously used to compile the gene catalogue of the ocean and host-associated microbes. Global expeditions conducted over the past 15 years have sampled the ocean to build a catalogue of genes from pelagic microbes. Here we undertook a large sequencing effort of a perturbed Red Sea plankton community to uncover that the rate of gene discovery increases continuously with sequencing effort, with no indication that the retrieved 2.83 million non-redundant (complete) genes predicted from the experiment represented a nearly complete inventory of the genes present in the sampled community (i.e., no evidence of saturation). The underlying reason is the Pareto-like distribution of the abundance of genes in the plankton community, resulting in a very long tail of millions of genes present at remarkably low abundances, which can only be retrieved through massive sequencing. Microbial metagenomic projects retrieve a variable number of unique genes per Tera base-pair (Tbp), with a median value of 14.7 million unique genes per Tbp sequenced across projects. The increase in the rate of gene discovery in microbial metagenomes with sequencing effort implies that there is ample room for new gene discovery in further ocean and holobiont sequencing studies

    Untapped Bounty: Sampling the Seas to Survey Microbial Biodiversity
