7,970 research outputs found

    Packing a Knapsack of Unknown Capacity

    We study the problem of packing a knapsack without knowing its capacity. Whenever we attempt to pack an item that does not fit, the item is discarded; if the item fits, we have to include it in the packing. We show that there is always a policy that packs a value within a factor of 2 of the optimum packing, irrespective of the actual capacity. If all items have unit density, we achieve a factor equal to the golden ratio. Both factors are shown to be best possible. In fact, we obtain the above factors using packing policies that are universal in the sense that they fix a particular order of the items and try to pack the items in this order, independent of the observations made while packing. We give efficient algorithms computing these policies. On the other hand, we show that, for any alpha > 1, the problem of deciding whether a given universal policy achieves a factor of alpha is coNP-complete. If alpha is part of the input, the same problem is shown to be coNP-complete for items with unit densities. Finally, we show that it is coNP-hard to decide, for given alpha, whether a set of items admits a universal policy with factor alpha, even if all items have unit densities.
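
    To make the notion of a universal policy concrete, the sketch below (in Python) executes a fixed item order against several possible capacities; the value-based ordering used here is only a stand-in assumption, not the paper's provably 2-competitive construction.

        # Execute a universal (fixed-order) packing policy against an unknown capacity.
        # The ordering below (by value) is an illustrative assumption, not the
        # paper's 2-competitive or golden-ratio-competitive order.

        def pack_with_universal_policy(items, order, capacity):
            """Try items in the fixed 'order': an item that fits must be packed,
            an item that does not fit is discarded."""
            remaining, value = capacity, 0.0
            for i in order:
                size, val = items[i]
                if size <= remaining:          # fits, so we are forced to include it
                    remaining -= size
                    value += val
            return value

        items = [(4, 6.0), (3, 4.0), (2, 2.5), (1, 1.0)]       # (size, value) pairs
        order = sorted(range(len(items)), key=lambda i: -items[i][1])
        for capacity in (3, 5, 7):                             # capacity is unknown in advance
            print(capacity, pack_with_universal_policy(items, order, capacity))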

    Merging DNA metabarcoding and ecological network analysis to understand and build resilient terrestrial ecosystems

    Summary:
    1. Significant advances in both mathematical and molecular approaches in ecology offer unprecedented opportunities to describe and understand ecosystem functioning. Ecological networks describe interactions between species, the underlying structure of communities, and the function and stability of ecosystems. They provide the ability to assess the robustness of complex ecological communities to species loss, as well as a novel way of guiding restoration. However, empirically quantifying the interactions between entire communities remains a significant challenge.
    2. Concomitantly, advances in DNA sequencing technologies are resolving previously intractable questions in functional and taxonomic biodiversity and provide enormous potential to determine hitherto difficult-to-observe species interactions. Combining DNA metabarcoding approaches with ecological network analysis presents important new opportunities for understanding large-scale ecological and evolutionary processes, as well as providing powerful tools for building ecosystems that are resilient to environmental change.
    3. We propose a novel ‘nested tagging’ metabarcoding approach for the rapid construction of large, phylogenetically structured species-interaction networks. Taking tree–insect–parasitoid ecological networks as an illustration, we show how measures of network robustness, constructed using DNA metabarcoding, can be used to determine the consequences of tree species loss within forests, and of forest habitat loss within wider landscapes. By determining which species and habitats are important to network integrity, we propose new directions for forest management.
    4. Merging metabarcoding with ecological network analysis provides a revolutionary opportunity to construct some of the largest phylogenetically structured species-interaction networks to date, providing new ways to: (i) monitor biodiversity and ecosystem functioning; (ii) assess the robustness of interacting communities to species loss; and (iii) build ecosystems that are more resilient to environmental change.
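
    As a toy illustration of the network-robustness measure mentioned above (not the authors' metabarcoding pipeline), the Python sketch below removes host species from a small bipartite host-parasitoid network in random order and reports the average fraction of parasitoid species that retain at least one host; the example network is invented for illustration.

        import random

        links = {                         # host species -> parasitoid species reared from it
            "oak":   {"p1", "p2"},        # toy data, invented for illustration
            "birch": {"p2", "p3"},
            "pine":  {"p4"},
        }
        all_parasitoids = set().union(*links.values())

        def robustness(links, trials=1000):
            """Average fraction of parasitoid species retaining at least one host,
            averaged over random orders of host removal (area under the
            attack-tolerance curve)."""
            total, hosts = 0.0, list(links)
            for _ in range(trials):
                random.shuffle(hosts)
                remaining = {h: set(ps) for h, ps in links.items()}
                for h in hosts:
                    del remaining[h]
                    alive = set().union(*remaining.values())
                    total += len(alive) / len(all_parasitoids)
            return total / (trials * len(hosts))

        print(round(robustness(links), 3))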

    A model of large-scale proteome evolution

    After the determination of complete sequences, the next step in understanding genome organization involves proteomics. The proteome includes the whole set of protein-protein interactions, and two recent independent studies have shown that its topology displays a number of surprising features shared by other complex networks, both natural and artificial. To understand the origins of this topology and its evolutionary implications, we present a simple model of proteome evolution that is able to reproduce many of the observed statistical regularities reported from the analysis of the yeast proteome. Our results suggest that the observed patterns can be explained by a process of gene duplication and diversification that would evolve proteome networks under a selection pressure favoring robustness against the failure of individual components.
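
    For intuition, here is a minimal sketch (in Python) of a gene duplication-and-divergence growth process of the kind described above; the update rules and parameter values are illustrative assumptions rather than the paper's calibrated model.

        import random

        def duplication_divergence(n_final, delta=0.5, alpha=0.1, seed=1):
            """Grow a protein-interaction graph: duplicate a random node, drop each
            inherited link with probability delta, and occasionally (probability
            alpha) attach the copy to another random node."""
            rng = random.Random(seed)
            graph = {0: {1}, 1: {0}}                            # start from a single interaction
            while len(graph) < n_final:
                new = len(graph)
                template = rng.choice(list(graph))
                graph[new] = set()
                for neighbor in list(graph[template]):          # inherit, then diverge
                    if rng.random() > delta:
                        graph[new].add(neighbor)
                        graph[neighbor].add(new)
                if rng.random() < alpha:                        # rare new interaction
                    other = rng.choice([v for v in graph if v != new])
                    graph[new].add(other)
                    graph[other].add(new)
            return graph

        g = duplication_divergence(500)
        degrees = sorted((len(nbrs) for nbrs in g.values()), reverse=True)
        print(degrees[:10])                                     # inspect the largest degrees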

    Universal Sequencing on an Unreliable Machine

    We consider scheduling on an unreliable machine that may experience unexpected changes in processing speed or even full breakdowns. Our objective is to minimize ∑_j w_j f(C_j) for any nondecreasing, nonnegative, differentiable cost function f. We aim for a universal solution that performs well without adaptation for all cost functions and for any possible machine behavior. We design a deterministic algorithm that finds a universal scheduling sequence with a solution value within 4 times the value of an optimal clairvoyant algorithm that knows the machine behavior in advance. A randomized version of this algorithm attains, in expectation, a ratio of e. We also show that both performance guarantees are best possible for any unbounded cost function. Our algorithms can be adapted to run in polynomial time with slightly increased cost. When jobs have individual release dates, the situation changes drastically: even if all weights are equal, there are instances for which any universal solution is a factor of Ω(log n / log log n) worse than an optimal sequence, for any unbounded cost function. Motivated by this hardness, we study the special case in which the processing time of each job is proportional to its weight. We present a nontrivial algorithm with a small constant performance guarantee.
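
    The sketch below (in Python) only illustrates how the objective from the abstract is evaluated: it computes ∑_j w_j f(C_j) for a fixed job sequence under an arbitrary piecewise-constant speed profile with breakdowns. The Smith-ratio ordering and the profile are illustrative assumptions, not the paper's 4-approximate construction.

        def finish_time(work, profile):
            """Earliest time by which the machine has processed 'work' units,
            given (duration, speed) segments; the last segment is assumed to
            extend forever with positive speed so every job completes."""
            t = done = 0.0
            for i, (duration, speed) in enumerate(profile):
                if i == len(profile) - 1:
                    return t + (work - done) / speed
                if speed > 0 and done + speed * duration >= work:
                    return t + (work - done) / speed
                t += duration
                done += speed * duration

        def objective(order, w, p, profile, f):
            """Sum of w_j * f(C_j) when jobs run in 'order' on the given machine."""
            total_work, obj = 0.0, 0.0
            for j in order:
                total_work += p[j]
                obj += w[j] * f(finish_time(total_work, profile))
            return obj

        w = [3.0, 1.0, 2.0]                                       # job weights
        p = [3.0, 2.0, 4.0]                                       # processing times
        order = sorted(range(3), key=lambda j: p[j] / w[j])       # Smith-like ratio order
        profile = [(4.0, 1.0), (2.0, 0.0), (1.0, 1.0)]            # breakdown, then recovery
        print(objective(order, w, p, profile, lambda c: c ** 2))  # f(C) = C^2 is admissible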

    PhylOTU: a high-throughput procedure quantifies microbial community diversity and resolves novel taxa from metagenomic data.

    Microbial diversity is typically characterized by clustering small subunit ribosomal RNA (SSU-rRNA) sequences into operational taxonomic units (OTUs). Targeted sequencing of environmental SSU-rRNA markers via PCR may fail to detect OTUs due to biases in priming and amplification. Analysis of shotgun-sequenced environmental DNA, known as metagenomics, avoids amplification bias but generates fragmentary, non-overlapping sequence reads that cannot be clustered by existing OTU-finding methods. To circumvent these limitations, we developed PhylOTU, a computational workflow that identifies OTUs from metagenomic SSU-rRNA sequence data through the use of phylogenetic principles and probabilistic sequence profiles. Using simulated metagenomic data, we quantified the accuracy with which PhylOTU clusters reads into OTUs. Comparisons of PCR and shotgun-sequenced SSU-rRNA markers derived from the global open ocean revealed that while PCR libraries identify more OTUs per sequenced residue, metagenomic libraries recover a greater taxonomic diversity of OTUs. In addition, we discover novel species, genera and families in the metagenomic libraries, including OTUs from phyla missed by analysis of PCR sequences. Taken together, these results suggest that PhylOTU enables characterization of a part of the biosphere currently hidden from PCR-based surveys of diversity.
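
    For context, the sketch below (in Python) shows the generic OTU-clustering idea, single-linkage grouping of reads at a fixed distance cutoff; it is not PhylOTU's phylogenetically informed workflow, and the read names and distance matrix are invented placeholders.

        def cluster_otus(names, dist, cutoff=0.03):
            """Union-find single-linkage clustering: reads closer than 'cutoff'
            end up in the same OTU."""
            parent = {n: n for n in names}
            def find(x):
                while parent[x] != x:
                    parent[x] = parent[parent[x]]        # path halving
                    x = parent[x]
                return x
            for (a, b), d in dist.items():
                if d <= cutoff:
                    parent[find(a)] = find(b)
            otus = {}
            for n in names:
                otus.setdefault(find(n), []).append(n)
            return list(otus.values())

        reads = ["r1", "r2", "r3", "r4"]
        distances = {("r1", "r2"): 0.01, ("r1", "r3"): 0.12, ("r1", "r4"): 0.15,
                     ("r2", "r3"): 0.10, ("r2", "r4"): 0.14, ("r3", "r4"): 0.02}
        print(cluster_otus(reads, distances))            # [['r1', 'r2'], ['r3', 'r4']]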

    An Improved Algorithm for Generating Database Transactions from Relational Algebra Specifications

    Alloy is a lightweight modeling formalism based on relational algebra. In prior work with Fisler, Giannakopoulos, Krishnamurthi, and Yoo, we presented a tool, Alchemy, that compiles Alloy specifications into implementations that execute against persistent databases. The foundation of Alchemy is an algorithm for rewriting relational algebra formulas into code for database transactions. In this paper, we report on recent progress in improving the robustness and efficiency of this transformation.
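
    As a generic illustration of mapping relational-algebra operators onto SQL (this is not Alchemy's rewriting algorithm, and the tiny AST encoding below is an assumption), consider the Python sketch:

        def to_sql(expr):
            """expr is a nested tuple: ('rel', name) | ('select', cond, e) |
            ('project', cols, e) | ('join', e1, e2)."""
            def src(e, alias):
                return e[1] if e[0] == "rel" else f"({to_sql(e)}) {alias}"
            op = expr[0]
            if op == "rel":
                return f"SELECT * FROM {expr[1]}"
            if op == "select":
                return f"SELECT * FROM {src(expr[2], 't')} WHERE {expr[1]}"
            if op == "project":
                return f"SELECT {', '.join(expr[1])} FROM {src(expr[2], 't')}"
            if op == "join":
                return f"SELECT * FROM {src(expr[1], 'a')} NATURAL JOIN {src(expr[2], 'b')}"
            raise ValueError(f"unknown operator: {op}")

        query = ("project", ["name"], ("select", "age > 30", ("rel", "Person")))
        print(to_sql(query))
        # SELECT name FROM (SELECT * FROM Person WHERE age > 30) t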

    A computational method for estimating the PCR duplication rate in DNA and RNA-seq experiments.

    Background: PCR amplification is an important step in the preparation of DNA sequencing libraries prior to high-throughput sequencing. PCR amplification introduces redundant reads in the sequence data, and estimating the PCR duplication rate is important to assess the frequency of such reads. Existing computational methods do not distinguish PCR duplicates from "natural" read duplicates that represent independent DNA fragments and therefore over-estimate the PCR duplication rate for DNA-seq and RNA-seq experiments.
    Results: In this paper, we present a computational method to estimate the average PCR duplication rate of high-throughput sequence datasets that accounts for natural read duplicates by leveraging heterozygous variants in an individual genome. Analysis of simulated data and exome sequence data from the 1000 Genomes project demonstrated that our method can accurately estimate the PCR duplication rate on paired-end as well as single-end read datasets which contain a high proportion of natural read duplicates. Further, analysis of exome datasets prepared using the Nextera library preparation method indicated that 45-50% of read duplicates correspond to natural read duplicates, likely due to fragmentation bias. Finally, analysis of RNA-seq datasets from individuals in the 1000 Genomes project demonstrated that 70-95% of read duplicates observed in such datasets correspond to natural duplicates sampled from genes with high expression, and identified outlier samples with a 2-fold greater PCR duplication rate than other samples.
    Conclusions: The method described here is a useful tool for estimating the PCR duplication rate of high-throughput sequence datasets and for assessing the fraction of read duplicates that correspond to natural read duplicates. An implementation of the method is available at https://github.com/vibansal/PCRduplicates.
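
    The key idea described above can be illustrated with a back-of-envelope calculation (in Python); the 50% allele-agreement assumption for natural duplicates and the counts are illustrative, not the paper's exact estimator or its output.

        def estimate_pcr_duplicate_fraction(dup_pairs_at_het, mismatched_pairs):
            """Among read pairs flagged as duplicates that overlap a heterozygous
            variant, pairs carrying different alleles must come from distinct DNA
            fragments (natural duplicates); roughly an equal number of natural
            duplicates carry the same allele by chance, hence the factor of 2."""
            natural = min(2 * mismatched_pairs, dup_pairs_at_het)
            return 1.0 - natural / dup_pairs_at_het

        # e.g. 10,000 duplicate pairs overlapping heterozygous sites,
        # 2,300 of them with discordant alleles
        print(round(estimate_pcr_duplicate_fraction(10_000, 2_300), 3))    # 0.54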