
    A detachment algorithm for inferring a graph from path frequency

    Abstract: Inferring graphs from path frequency has been studied as an important problem with potential applications to drug design and the elucidation of chemical structures. Given a multiple set g of strings of labels with length at most K, the problem asks to find a vertex-labeled graph G that attains a one-to-one correspondence between g and the occurrences of labels along all paths of length at most K in G. In this paper, we prove that the problem with K = 1 can be formulated as a problem of finding a loopless and connected detachment, based on which an efficient algorithm for solving the problem is derived. Our algorithm also solves the problem under the additional constraint that every vertex in an inferred graph is required to have a specified degree.
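To make the problem statement concrete, here is a minimal sketch of the forward direction only: computing the multiset of label strings for K = 1 from a known graph. It is not the paper's detachment algorithm, which solves the inverse problem. The function name `path_frequency_k1`, the input format, and the choice to read each undirected edge in both directions are assumptions made for illustration.

```python
from collections import Counter


def path_frequency_k1(labels, edges):
    """Count label strings along all paths of length at most 1.

    labels: dict mapping each vertex to its label
    edges:  iterable of (u, v) pairs of an undirected, loopless graph
    """
    freq = Counter()
    # Paths of length 0: every vertex contributes its own label.
    for label in labels.values():
        freq[(label,)] += 1
    # Paths of length 1: each edge is read in both directions
    # (an assumed convention, not taken from the abstract).
    for u, v in edges:
        freq[(labels[u], labels[v])] += 1
        freq[(labels[v], labels[u])] += 1
    return freq


# Hypothetical example: a three-vertex path labeled C - C - O.
labels = {1: "C", 2: "C", 3: "O"}
edges = [(1, 2), (2, 3)]
freq = path_frequency_k1(labels, edges)
```

The inference problem described in the abstract runs the other way: given only a multiset like `freq`, find a graph that produces it.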

    REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads.

    Repeat elements are important components of eukaryotic genomes. One limitation in our understanding of repeat elements is that most analyses rely on reference genomes that are incomplete and often contain missing data in highly repetitive regions that are difficult to assemble. To overcome this problem, we develop a new method, REPdenovo, which assembles repeat sequences directly from raw shotgun sequencing data. REPdenovo can construct various types of repeats that are highly repetitive and have low sequence divergence within copies. We show that REPdenovo is substantially better than existing methods in terms of both the number and the completeness of the repeat sequences that it recovers. The key advantage of REPdenovo is that it can reconstruct long repeats from sequence reads. We apply the method to human data and discover a number of potentially new repeat sequences that have been missed by previous repeat annotations. Many of these sequences are incorporated into various parasite genomes, possibly because the filtering process for host DNA involved in the sequencing of the parasite genomes failed to exclude the host-derived repeat sequences. REPdenovo is a powerful new computational tool for annotating genomes and for addressing questions regarding the evolution of repeat families. The software tool, REPdenovo, is available for download at https://github.com/Reedwarbler/REPdenovo

    Segregating Event Streams and Noise with a Markov Renewal Process Model

    DS and MP are supported by EPSRC Leadership Fellowship EP/G007144/1

    Induction of Topological Environment Maps from Sequences of Visited Places

    In this paper we address the problem of topologically mapping environments which contain inherent perceptual aliasing caused by repeated environment structures. We propose an approach that does not use motion or odometric information but only a sequence of deterministic measurements observed by traversing an environment. Our algorithm implements a stochastic local search to build a small map which is consistent with local adjacency information extracted from a sequence of observations. Moreover, local adjacency information is incorporated to disambiguate places which are physically different but appear identical to the robot's senses. Experiments show that the proposed method is capable of mapping environments with a high degree of perceptual aliasing, and that it infers a small map quickly.

    Plausible Mobility: Inferring Movement from Contacts

    We address the difficult question of inferring plausible node mobility based only on information from wireless contact traces. Working with mobility information allows richer protocol simulations, particularly in dense networks, but requires complex set-ups to measure, whereas contact information is easier to measure but only allows for simplistic simulation models. In a contact trace, a lot of node movement information is irretrievably lost, so the original positions and velocities are in general out of reach. We propose a fast heuristic algorithm, inspired by dynamic force-based graph drawing, capable of inferring a plausible movement from any contact trace, and evaluate it on both synthetic and real-life contact traces. Our results reveal that (i) the quality of the inferred mobility is directly linked to the precision of the measured contact trace, and (ii) the simple addition of appropriate anticipation forces between nodes leads to an accurate inferred mobility.
    Comment: 8 pages, 8 figures, 1 tabl

    Inference of population splits and mixtures from genome-wide allele frequency data

    Many aspects of the historical relationships between populations in a species are reflected in genetic data. Inferring these relationships from genetic data, however, remains a challenging task. In this paper, we present a statistical model for inferring the patterns of population splits and mixtures in multiple populations. In this model, the sampled populations in a species are related to their common ancestor through a graph of ancestral populations. Using genome-wide allele frequency data and a Gaussian approximation to genetic drift, we infer the structure of this graph. We applied this method to a set of 55 human populations and a set of 82 dog breeds and wild canids. In both species, we show that a simple bifurcating tree does not fully describe the data; in contrast, we infer many migration events. While some of the migration events that we find have been detected previously, many have not. For example, in the human data we infer that Cambodians trace approximately 16% of their ancestry to a population ancestral to other extant East Asian populations. In the dog data, we infer that both the boxer and basenji trace a considerable fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to domestication, and that East Asian toy breeds (the Shih Tzu and the Pekingese) result from admixture between modern toy breeds and "ancient" Asian breeds. Software implementing the model described here, called TreeMix, is available at http://treemix.googlecode.com
    Comment: 28 pages, 6 figures in main text. Attached supplement is 22 pages, 15 figures. This is an updated version of the preprint available at http://precedings.nature.com/documents/6956/version/