Learning from graphs with structural variation
We study the effect of structural variation in graph data on the predictive
performance of graph kernels. To this end, we introduce a novel, noise-robust
adaptation of the GraphHopper kernel and validate it on benchmark data,
obtaining modestly improved predictive performance on a range of datasets.
Next, we investigate the performance of the state-of-the-art Weisfeiler-Lehman
graph kernel under increasing synthetic structural errors and find that the
effect of introducing errors depends strongly on the dataset.
Comment: Presented at the NIPS 2017 workshop "Learning on Distributions,
Functions, Graphs and Groups"
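The kind of experiment this abstract describes can be illustrated with a minimal sketch. It makes several assumptions that are not in the abstract: the Weisfeiler-Lehman kernel is taken to be the plain subtree variant, "structural errors" are modelled as random edge flips, and the names `wl_features`, `wl_kernel` and `flip_edges` are hypothetical. It is not the authors' implementation.

```python
import random
from collections import Counter


def wl_features(adj, labels, iterations=2):
    """Count Weisfeiler-Lehman subtree labels of one graph.

    adj: dict mapping node -> set of neighbouring nodes (undirected).
    labels: dict mapping node -> initial node label.
    """
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        # New label = own label plus the sorted multiset of neighbour labels.
        current = {
            v: (current[v], tuple(sorted(current[u] for u in adj[v])))
            for v in adj
        }
        features.update(current.values())
    return features


def wl_kernel(g1, g2, iterations=2):
    """Dot product of the WL label counts of two (adj, labels) graphs."""
    f1 = wl_features(g1[0], g1[1], iterations)
    f2 = wl_features(g2[0], g2[1], iterations)
    return sum(count * f2[label] for label, count in f1.items())


def flip_edges(adj, n_flips, rng):
    """Assumed noise model: toggle n_flips randomly chosen node pairs."""
    noisy = {v: set(nbrs) for v, nbrs in adj.items()}
    nodes = sorted(noisy)
    for _ in range(n_flips):
        u, v = rng.sample(nodes, 2)
        if v in noisy[u]:
            noisy[u].discard(v)
            noisy[v].discard(u)
        else:
            noisy[u].add(v)
            noisy[v].add(u)
    return noisy


# A path graph 0-1-2-3 in which every node carries the same label.
path_adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
path_labels = {v: "a" for v in path_adj}
clean = (path_adj, path_labels)
noisy = (flip_edges(path_adj, 2, random.Random(0)), path_labels)

print(wl_kernel(clean, clean, iterations=1))
print(wl_kernel(clean, noisy, iterations=1))
```

Comparing the kernel value of a graph with itself against its value with a perturbed copy, for a growing number of flips, gives a rough picture of how sensitive the kernel is to this particular noise model.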
Structural variation in generated health reports
We present a natural language generator that produces a range of medical reports on the clinical histories of
cancer patients, and discuss the problem of conceptual restatement in generating various textual views of the
same conceptual content. We focus on two features of our system: the demand for 'loose paraphrases' between
the various reports on a given patient, with a high degree of semantic overlap but some necessary amount of distinctive content; and the requirement for paraphrasing primarily at the discourse level.
Protein structural variation in computational models and crystallographic data
Normal mode analysis offers an efficient way of modeling the conformational
flexibility of protein structures. Simple models defined by contact topology,
known as elastic network models, have been used to model a variety of systems,
but the validation is typically limited to individual modes for a single
protein. We use anisotropic displacement parameters from crystallography to
test the quality of prediction of both the magnitude and directionality of
conformational variance. Normal modes from four simple elastic network model
potentials and from the CHARMM forcefield are calculated for a data set of 83
diverse, ultrahigh resolution crystal structures. While all five potentials
provide good predictions of the magnitude of flexibility, the methods that
consider all atoms have a clear edge at prediction of directionality, and the
CHARMM potential produces the best agreement. The low-frequency modes from
different potentials are similar, but those computed from the CHARMM potential
show the greatest difference from the elastic network models. This was
illustrated by computing the dynamic correlation matrices from different
potentials for a PDZ domain structure. Comparison of normal mode results with
anisotropic temperature factors opens the possibility of using ultrahigh
resolution crystallographic data as a quantitative measure of molecular
flexibility. The comprehensive evaluation demonstrates the costs and benefits
of using normal mode potentials of varying complexity. Comparison of the
dynamic correlation matrices suggests that a combination of topological and
chemical potentials may help identify residues in which chemical forces make
large contributions to intramolecular coupling.
Comment: 17 pages, 4 figures
Structural Data Recognition with Graph Model Boosting
This paper presents a novel method for structural data recognition using a
large number of graph models. In general, prevalent methods for structural data
recognition have two shortcomings: 1) Only a single model is used to capture
structural variation. 2) Naive recognition methods are used, such as the
nearest neighbor method. In this paper, we propose strengthening the
recognition performance of these models as well as their ability to capture
structural variation. The proposed method constructs a large number of graph
models and trains decision trees using the models. This paper makes two main
contributions. The first is a novel graph model that can quickly perform
calculations, which allows us to construct several models in a feasible amount
of time. The second contribution is a novel approach to structural data
recognition: graph model boosting. Comprehensive structural variations can be
captured with a large number of graph models constructed in a boosting
framework, and a sophisticated classifier can be formed by aggregating the
decision trees. Consequently, we can carry out structural data recognition with
powerful recognition capability in the face of comprehensive structural
variation. The experiments show that the proposed method achieves impressive
results and outperforms existing methods on datasets from the IAM graph
database repository.
Comment: 8 pages
Identifying Structural Variation in Haploid Microbial Genomes from Short-Read Resequencing Data Using Breseq
Mutations that alter chromosomal structure play critical roles in evolution and disease, including in the origin of new lifestyles and pathogenic traits in microbes. Large-scale rearrangements in genomes are often mediated by recombination events involving new or existing copies of mobile genetic elements, recently duplicated genes, or other repetitive sequences. Most current software programs for predicting structural variation from short-read DNA resequencing data are intended primarily for use on human genomes. They typically disregard information in reads mapping to repeat sequences, and significant post-processing and manual examination of their output is often required to rule out false-positive predictions and precisely describe mutational events.
Results: We have implemented an algorithm for identifying structural variation from DNA resequencing data as part of the breseq computational pipeline for predicting mutations in haploid microbial genomes. Our method evaluates the support for new sequence junctions present in a clonal sample from split-read alignments to a reference genome, including matches to repeat sequences. Then, it uses a statistical model of read coverage evenness to accept or reject these predictions. Finally, breseq combines predictions of new junctions and deleted chromosomal regions to output biologically relevant descriptions of mutations and their effects on genes. We demonstrate the performance of breseq on simulated Escherichia coli genomes with deletions generating unique breakpoint sequences, new insertions of mobile genetic elements, and deletions mediated by mobile elements. Then, we reanalyze data from an E. coli K-12 mutation accumulation evolution experiment in which structural variation was not previously identified. Transposon insertions and large-scale chromosomal changes detected by breseq account for approximately 25% of spontaneous mutations in this strain. In all cases, we find that breseq is able to reliably predict structural variation with modest read-depth coverage of the reference genome (>40-fold).
Conclusions: Using breseq to predict structural variation should be useful for studies of microbial epidemiology, experimental evolution, synthetic biology, and genetics when a reference genome for a closely related strain is available. In these cases, breseq can discover mutations that may be responsible for important or unintended changes in genomes that might otherwise go undetected.
Funding: U.S. National Institutes of Health R00-GM087550; U.S. National Science Foundation (NSF) DEB-0515729; NSF BEACON Center for the Study of Evolution in Action DBI-0939454; Cancer Prevention & Research Institute of Texas (CPRIT) RP130124; University of Texas at Austin startup funds; University of Texas at Austin CPRIT Cancer Research Traineeship; Molecular Bioscience
Louse (Insecta : Phthiraptera) mitochondrial 12S rRNA secondary structure is highly variable
Lice are ectoparasitic insects hosted by birds and mammals. Mitochondrial 12S rRNA sequences obtained from lice show considerable length variation and are very difficult to align. We show that the louse 12S rRNA domain III secondary structure displays considerable variation compared to other insects, in both the shape and number of stems and loops. Phylogenetic trees constructed from tree edit distances between louse 12S rRNA structures do not closely resemble trees constructed from sequence data, suggesting that at least some of this structural variation has arisen independently in different louse lineages. Taken together with previous work on mitochondrial gene order and elevated rates of substitution in louse mitochondrial sequences, the structural variation in louse 12S rRNA confirms the highly distinctive nature of molecular evolution in these insects.