
    Learning from graphs with structural variation

    We study the effect of structural variation in graph data on the predictive performance of graph kernels. To this end, we introduce a novel, noise-robust adaptation of the GraphHopper kernel and validate it on benchmark data, obtaining modestly improved predictive performance on a range of datasets. Next, we investigate the performance of the state-of-the-art Weisfeiler-Lehman graph kernel under increasing synthetic structural errors and find that the effect of introducing errors depends strongly on the dataset.
    Comment: Presented at the NIPS 2017 workshop "Learning on Distributions, Functions, Graphs and Groups".

    Structural variation in generated health reports

    We present a natural language generator that produces a range of medical reports on the clinical histories of cancer patients, and discuss the problem of conceptual restatement in generating various textual views of the same conceptual content. We focus on two features of our system: the demand for 'loose paraphrases' between the various reports on a given patient, with a high degree of semantic overlap but some necessary amount of distinctive content; and the requirement for paraphrasing at primarily the discourse level.

    Protein structural variation in computational models and crystallographic data

    Normal mode analysis offers an efficient way of modeling the conformational flexibility of protein structures. Simple models defined by contact topology, known as elastic network models, have been used to model a variety of systems, but the validation is typically limited to individual modes for a single protein. We use anisotropic displacement parameters from crystallography to test the quality of prediction of both the magnitude and directionality of conformational variance. Normal modes from four simple elastic network model potentials and from the CHARMM forcefield are calculated for a data set of 83 diverse, ultrahigh resolution crystal structures. While all five potentials provide good predictions of the magnitude of flexibility, the methods that consider all atoms have a clear edge at prediction of directionality, and the CHARMM potential produces the best agreement. The low-frequency modes from different potentials are similar, but those computed from the CHARMM potential show the greatest difference from the elastic network models. This was illustrated by computing the dynamic correlation matrices from different potentials for a PDZ domain structure. Comparison of normal mode results with anisotropic temperature factors opens the possibility of using ultrahigh resolution crystallographic data as a quantitative measure of molecular flexibility. The comprehensive evaluation demonstrates the costs and benefits of using normal mode potentials of varying complexity. Comparison of the dynamic correlation matrices suggests that a combination of topological and chemical potentials may help identify residues in which chemical forces make large contributions to intramolecular coupling.
    Comment: 17 pages, 4 figures

    Structural Data Recognition with Graph Model Boosting

    This paper presents a novel method for structural data recognition using a large number of graph models. In general, prevalent methods for structural data recognition have two shortcomings: 1) Only a single model is used to capture structural variation. 2) Naive recognition methods are used, such as the nearest neighbor method. In this paper, we propose strengthening the recognition performance of these models as well as their ability to capture structural variation. The proposed method constructs a large number of graph models and trains decision trees using the models. This paper makes two main contributions. The first is a novel graph model that can quickly perform calculations, which allows us to construct several models in a feasible amount of time. The second contribution is a novel approach to structural data recognition: graph model boosting. Comprehensive structural variations can be captured with a large number of graph models constructed in a boosting framework, and a sophisticated classifier can be formed by aggregating the decision trees. Consequently, we can carry out structural data recognition with powerful recognition capability in the face of comprehensive structural variation. The experiments show that the proposed method achieves impressive results and outperforms existing methods on datasets from the IAM graph database repository.
    Comment: 8 pages

    Identifying Structural Variation in Haploid Microbial Genomes from Short-Read Resequencing Data Using Breseq

    Mutations that alter chromosomal structure play critical roles in evolution and disease, including in the origin of new lifestyles and pathogenic traits in microbes. Large-scale rearrangements in genomes are often mediated by recombination events involving new or existing copies of mobile genetic elements, recently duplicated genes, or other repetitive sequences. Most current software programs for predicting structural variation from short-read DNA resequencing data are intended primarily for use on human genomes. They typically disregard information in reads mapping to repeat sequences, and significant post-processing and manual examination of their output is often required to rule out false-positive predictions and precisely describe mutational events. Results: We have implemented an algorithm for identifying structural variation from DNA resequencing data as part of the breseq computational pipeline for predicting mutations in haploid microbial genomes. Our method evaluates the support for new sequence junctions present in a clonal sample from split-read alignments to a reference genome, including matches to repeat sequences. Then, it uses a statistical model of read coverage evenness to accept or reject these predictions. Finally, breseq combines predictions of new junctions and deleted chromosomal regions to output biologically relevant descriptions of mutations and their effects on genes. We demonstrate the performance of breseq on simulated Escherichia coli genomes with deletions generating unique breakpoint sequences, new insertions of mobile genetic elements, and deletions mediated by mobile elements. Then, we reanalyze data from an E. coli K-12 mutation accumulation evolution experiment in which structural variation was not previously identified. Transposon insertions and large-scale chromosomal changes detected by breseq account for approximately 25% of spontaneous mutations in this strain. In all cases, we find that breseq is able to reliably predict structural variation with modest read-depth coverage of the reference genome (>40-fold). Conclusions: Using breseq to predict structural variation should be useful for studies of microbial epidemiology, experimental evolution, synthetic biology, and genetics when a reference genome for a closely related strain is available. In these cases, breseq can discover mutations that may be responsible for important or unintended changes in genomes that might otherwise go undetected.
    Funding: U.S. National Institutes of Health R00-GM087550; U.S. National Science Foundation (NSF) DEB-0515729; NSF BEACON Center for the Study of Evolution in Action DBI-0939454; Cancer Prevention & Research Institute of Texas (CPRIT) RP130124; University of Texas at Austin startup funds; University of Texas at Austin CPRIT Cancer Research Traineeship
    Molecular Bioscience

    Louse (Insecta : Phthiraptera) mitochondrial 12S rRNA secondary structure is highly variable

    Lice are ectoparasitic insects hosted by birds and mammals. Mitochondrial 12S rRNA sequences obtained from lice show considerable length variation and are very difficult to align. We show that the louse 12S rRNA domain III secondary structure displays considerable variation compared to other insects, in both the shape and number of stems and loops. Phylogenetic trees constructed from tree edit distances between louse 12S rRNA structures do not closely resemble trees constructed from sequence data, suggesting that at least some of this structural variation has arisen independently in different louse lineages. Taken together with previous work on mitochondrial gene order and elevated rates of substitution in louse mitochondrial sequences, the structural variation in louse 12S rRNA confirms the highly distinctive nature of molecular evolution in these insects.

    Multi-platform discovery of haplotype-resolved structural variation in human genomes
