428 research outputs found

    Directed acyclic graph kernels for structural RNA analysis

    Background: Recent discoveries of a large variety of important roles for non-coding RNAs (ncRNAs) have been reported by numerous researchers. In order to analyze ncRNAs by kernel methods including support vector machines, we propose stem kernels as an extension of string kernels for measuring the similarities between two RNA sequences from the viewpoint of secondary structures. However, applying stem kernels directly to large data sets of ncRNAs is impractical due to their computational complexity. Results: We have developed a new technique based on directed acyclic graphs (DAGs) derived from base-pairing probability matrices of RNA sequences that significantly increases the computation speed of stem kernels. Furthermore, we propose profile-profile stem kernels for multiple alignments of RNA sequences which utilize base-pairing probability matrices for multiple alignments instead of those for individual sequences. Our kernels outperformed the existing methods with respect to the detection of known ncRNAs and kernel hierarchical clustering. Conclusion: Stem kernels can be utilized as a reliable similarity measure of structural RNAs, and can be used in various kernel-based applications.

    Dagstuhl Reports : Volume 1, Issue 2, February 2011

    Online Privacy: Towards Informational Self-Determination on the Internet (Dagstuhl Perspectives Workshop 11061): Simone Fischer-Hübner, Chris Hoofnagle, Kai Rannenberg, Michael Waidner, Ioannis Krontiris and Michael Marhöfer
    Self-Repairing Programs (Dagstuhl Seminar 11062): Mauro Pezzé, Martin C. Rinard, Westley Weimer and Andreas Zeller
    Theory and Applications of Graph Searching Problems (Dagstuhl Seminar 11071): Fedor V. Fomin, Pierre Fraigniaud, Stephan Kreutzer and Dimitrios M. Thilikos
    Combinatorial and Algorithmic Aspects of Sequence Processing (Dagstuhl Seminar 11081): Maxime Crochemore, Lila Kari, Mehryar Mohri and Dirk Nowotka
    Packing and Scheduling Algorithms for Information and Communication Services (Dagstuhl Seminar 11091): Klaus Jansen, Claire Mathieu, Hadas Shachnai and Neal E. Youn

    Kernel-based visualisation of genes with the gene ontology

    With the development of microarray-based high-throughput technologies for examining genetic and biological information en masse, biologists are now faced with making sense of large lists of genes identified from their biological experiments. There is a vital need for "system biology" approaches which can allow biologists to see new or unanticipated potential relationships which will lead to new hypotheses and eventual new knowledge. Finding and understanding relationships in this data is a problem well suited to visualisation. We augment genes with their associated terms from the Gene Ontology and visualise them using kernel Principal Component Analysis with both specialised linear and Gaussian kernels. Our results show that this method can correctly visualise genes by their functional relationships, and we describe the difference between using the linear and Gaussian kernels on the problem. © 2008, Australian Computer Society, Inc.
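    As a rough illustration of the technique named in this abstract, the sketch below runs kernel PCA with a linear and a Gaussian kernel on a tiny gene-by-term indicator matrix. The matrix `X`, the `gamma` value, and all function names are hypothetical; the paper's specialised kernels are not reproduced here, so this shows only the generic method.

```python
import numpy as np

def linear_kernel(X):
    # Gram matrix of plain dot products between annotation vectors.
    return X @ X.T

def gaussian_kernel(X, gamma=0.5):
    # Gram matrix exp(-gamma * ||x_i - x_j||^2); gamma is an assumed value.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def center(K):
    # Double-centre the Gram matrix (centring in feature space).
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kernel_pca(K, n_components=2):
    # Project the training points onto the leading kernel principal axes.
    vals, vecs = np.linalg.eigh(center(K))
    order = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[order], vecs[:, order]
    return vecs * np.sqrt(np.maximum(vals, 0.0))

# Hypothetical data: 4 genes (rows) annotated with 4 ontology terms (columns).
X = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1]], dtype=float)

emb_linear = kernel_pca(linear_kernel(X))
emb_gauss = kernel_pca(gaussian_kernel(X))
```

    Each row of `emb_linear` or `emb_gauss` gives 2-D coordinates for one gene, which could then be passed to any scatter-plot routine.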

    Graph kernels based on tree patterns for molecules

    Motivated by chemical applications, we revisit and extend a family of positive definite kernels for graphs based on the detection of common subtrees, initially proposed by Ramon et al. (2003). We propose new kernels with a parameter to control the complexity of the subtrees used as features to represent the graphs. This parameter allows one to interpolate smoothly between classical graph kernels based on the count of common walks, on the one hand, and kernels that emphasize the detection of large common subtrees, on the other hand. We also propose two modular extensions to this formulation. The first extension increases the number of subtrees that define the feature space, and the second one removes noisy features from the graph representations. We experimentally validate these new kernels on binary classification tasks that consist of discriminating toxic and non-toxic molecules with support vector machines.

    GraphClust: alignment-free structural clustering of local RNA secondary structures

    Motivation: Clustering according to sequence–structure similarity has now become a generally accepted scheme for ncRNA annotation. Its application to complete genomic sequences as well as whole transcriptomes is therefore desirable but hindered by extremely high computational costs

    The Weight Function in the Subtree Kernel is Decisive

    Tree data are ubiquitous because they model a large variety of situations, e.g., the architecture of plants, the secondary structure of RNA, or the hierarchy of XML files. Nevertheless, the analysis of these non-Euclidean data is difficult per se. In this paper, we focus on the subtree kernel, a convolution kernel for tree data introduced by Vishwanathan and Smola in the early 2000s. More precisely, we investigate the influence of the weight function from a theoretical perspective and in real data applications. We establish on a two-class stochastic model that the performance of the subtree kernel is improved when the weight of leaves vanishes, which motivates the definition of a new weight function, learned from the data and not fixed by the user as is usually done. To this end, we define a unified framework for computing the subtree kernel from ordered or unordered trees that is particularly suitable for tuning parameters. We show through eight real data classification problems the great efficiency of our approach, in particular for small datasets, which also underlines the importance of the weight function. Finally, a visualization tool of the significant features is derived. Comment: 36 pages
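    A minimal sketch of a weighted subtree kernel for unordered trees may help make the role of the weight function concrete. It assumes the common form K(T1, T2) = sum over subtrees s of w(s) * N_s(T1) * N_s(T2), where N_s counts complete subtrees isomorphic to s. The nested-tuple tree encoding, the function names, and the example weights are hypothetical and are not the paper's learned weight function or its framework.

```python
from collections import Counter

def subtree_counts(tree):
    # A tree is a tuple of child trees; a leaf is the empty tuple ().
    # Returns a Counter mapping each canonical complete subtree to its count.
    counts = Counter()

    def visit(node):
        # Sorting the children's canonical forms makes child order irrelevant.
        canon = tuple(sorted(visit(child) for child in node))
        counts[canon] += 1
        return canon

    visit(tree)
    return counts

def height(subtree):
    # Height of a canonical subtree; a leaf has height 0.
    return 0 if not subtree else 1 + max(height(c) for c in subtree)

def subtree_kernel(t1, t2, weight):
    # Sum weight(s) * N_s(t1) * N_s(t2) over subtrees present in both trees.
    c1, c2 = subtree_counts(t1), subtree_counts(t2)
    return sum(weight(s) * c1[s] * c2[s] for s in c1.keys() & c2.keys())

# Two trees that differ only in the order of the root's children.
t1 = ((), ((), ()))
t2 = (((), ()), ())

k_uniform = subtree_kernel(t1, t2, lambda s: 1.0)
k_decay = subtree_kernel(t1, t2, lambda s: 0.5 ** height(s))
k_no_leaves = subtree_kernel(t1, t2, lambda s: 0.0 if not s else 1.0)
```

    With uniform weights the three shared leaves dominate the value (9 of 11), whereas setting the leaf weight to zero leaves only the two larger shared subtrees, which is the kind of effect the abstract attributes to the weight function.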

    09511 Abstracts Collection -- Parameterized complexity and approximation algorithms

    From 14.12.2009 to 17.12.2009, the Dagstuhl Seminar 09511 "Parameterized complexity and approximation algorithms" was held in Schloss Dagstuhl – Leibniz Center for Informatics. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

    An assessment of gene regulatory network inference algorithms

    A conceptual issue regarding gene regulatory network (GRN) inference algorithms is establishing their validity or correctness. In this study, we argue that for this purpose it is useful to conceive these algorithms as estimators of graph-valued parameters of explicit models for gene expression data. On this basis, we perform an assessment of a selection of influential GRN inference algorithms as estimators for two types of models: (i) causal graphs with associated structural equations models (SEMs), and (ii) differential equations models based on the thermodynamics of gene expression. Our findings corroborate that networks of marginal dependence fail in estimating GRNs, but they also suggest that the strength of statistical association as measured by mutual information may be indicative of GRN structure. Also, in simulations, we find that the GRN inference algorithms GENIE3 and TIGRESS outperform competing algorithms. However, more importantly, we also find that many observed patterns hinge on the GRN topology and the assumed data generating mechanism.