    Developing and applying heterogeneous phylogenetic models with XRate

    Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models that take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models that would previously have been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well-suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package (http://biowiki.org/DART). Comment: 34 pages, 3 figures, glossary of XRate model terminology

    Program Synthesis using Natural Language

    Interacting with computers is a ubiquitous activity for millions of people. Repetitive or specialized tasks often require the creation of small, often one-off, programs. End-users struggle to learn and use the myriad domain-specific languages (DSLs) needed to accomplish these tasks effectively. We present a general framework for constructing program synthesizers that take natural language (NL) inputs and produce expressions in a target DSL. The framework takes as input a DSL definition and training data consisting of NL/DSL pairs. From these it constructs a synthesizer by learning optimal weights and classifiers (using NLP features) that rank the outputs of a keyword-programming-based translation. We applied our framework to three domains: repetitive text editing, an intelligent tutoring system, and flight information queries. On 1200+ English descriptions, the respective synthesizers rank the desired program as the top-1 result for 80% of descriptions and within the top-3 results for 90% of descriptions.

    Data-Driven Shape Analysis and Processing

    Data-driven methods play an increasingly important role in discovering geometric, structural, and semantic relationships between 3D shapes in collections, and in applying this analysis to support intelligent modeling, editing, and visualization of geometric data. In contrast to traditional approaches, a key feature of data-driven approaches is that they aggregate information from a collection of shapes to improve the analysis and processing of individual shapes. In addition, they are able to learn models that reason about properties and relationships of shapes without relying on hard-coded rules or explicitly programmed instructions. We provide an overview of the main concepts and components of these techniques, and discuss their application to shape classification, segmentation, matching, reconstruction, modeling and exploration, as well as scene analysis and synthesis, by reviewing the literature and relating the existing works through both qualitative and numerical comparisons. We conclude our report with ideas that can inspire future research in data-driven shape analysis and processing. Comment: 10 pages, 19 figures

    A simple online competitive adaptation of Lempel-Ziv compression with efficient random access support

    We present a simple adaptation of the Lempel-Ziv '78 (LZ78) compression scheme (IEEE Transactions on Information Theory, 1978) that supports efficient random access to the input string. Namely, given query access to the compressed string, it is possible to efficiently recover any symbol of the input string. The compression algorithm is given as input a parameter $\epsilon > 0$, and with very high probability increases the length of the compressed string by at most a factor of $(1+\epsilon)$. The access time is $O(\log n + 1/\epsilon^2)$ in expectation, and $O(\log n/\epsilon^2)$ with high probability. The scheme relies on sparse transitive-closure spanners. Any (consecutive) substring of the input string can be retrieved at an additive cost in running time equal to the length of the substring. We also formally establish the necessity of modifying LZ78 so as to allow efficient random access. Specifically, we construct a family of strings for which $\Omega(n/\log n)$ queries to the LZ78-compressed string are required in order to recover a single symbol of the input string. The main benefit of the proposed scheme is that it preserves the online nature and simplicity of LZ78, and that for every input string, the length of the compressed string is only a small factor larger than that obtained by running LZ78.
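For context, the random-access problem described above can be illustrated with a minimal sketch of plain, textbook LZ78. This is not the adapted scheme from the abstract: it has no spanners and no $\epsilon$ parameter, and the function names are hypothetical. The helper `symbol_at` recovers one symbol by scanning the phrase list and then following parent pointers, which is the kind of costly access the paper aims to avoid.

```python
def lz78_compress(text):
    """Plain LZ78: return a list of (parent_index, symbol) phrases.

    Index 0 denotes the empty phrase; phrase k (1-based) is the k-th pair.
    """
    dictionary = {"": 0}
    phrases = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            current += ch  # keep extending the longest known phrase
        else:
            phrases.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:
        # Leftover input matches an existing phrase; emit it via its prefix.
        phrases.append((dictionary[current[:-1]], current[-1]))
    return phrases


def lz78_decompress(phrases):
    """Rebuild the input string from the (parent_index, symbol) phrases."""
    table = [""]
    for parent, ch in phrases:
        table.append(table[parent] + ch)
    return "".join(table[1:])


def symbol_at(phrases, position):
    """Naive random access (illustrative assumption, not the paper's method).

    Scans phrases to locate the one covering `position`, then walks
    parent pointers back to the requested offset inside that phrase.
    """
    lengths = [0]  # lengths[k] = length of phrase k
    start = 0      # position in the input where the current phrase begins
    for k, (parent, _) in enumerate(phrases, start=1):
        length = lengths[parent] + 1
        lengths.append(length)
        if position < start + length:
            offset = position - start
            node, depth = k, length
            while depth - 1 > offset:
                node = phrases[node - 1][0]  # step to the parent phrase
                depth -= 1
            return phrases[node - 1][1]
        start += length
    raise IndexError("position out of range")


if __name__ == "__main__":
    sample = "abracadabra"
    encoded = lz78_compress(sample)
    print(encoded)
    print(lz78_decompress(encoded) == sample)
    print("".join(symbol_at(encoded, i) for i in range(len(sample))))
```

In this sketch a single lookup may touch every phrase before it finds the right one, whereas the scheme in the abstract targets $O(\log n + 1/\epsilon^2)$ expected access time.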