1,357 research outputs found

    A weighted q-gram method for glycan structure classification

    Background: Glycobiology pertains to the study of carbohydrate sugar chains, or glycans, in a particular cell or organism. Many computational approaches have been proposed for analyzing these complex glycan structures, which are chains of monosaccharides. The monosaccharides are linked to one another by glycosidic bonds, which can take on a variety of conformations, thus forming branches and resulting in complex tree structures. The q-gram method is one of the recent methods used to understand glycan function based on the classification of their tree structures. This q-gram method assumes that, for a certain q, different q-grams share no similarity among themselves. That is, if two structures have completely different components, then they are completely different. However, from a biological standpoint, this is not the case. In this paper, we propose a weighted q-gram method to measure the similarity among glycans by incorporating the similarity of the geometric structures, monosaccharides and glycosidic bonds among q-grams. In contrast to the traditional q-gram method, our weighted q-gram method admits similarity among q-grams for a certain q. New kernels for glycan structures were thus developed and then applied in SVMs to classify glycans. Results: Two glycan datasets were used to compare the weighted q-gram method and the original q-gram method. The results show that the incorporation of q-gram similarity improves the classification performance for all of the important glycan classes tested. Conclusion: The results in this paper indicate that similarity among q-grams obtained from geometric structure, monosaccharides and glycosidic linkage contributes to the glycan function classification. This is a big step towards the understanding of glycan function based on their complex structures.
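The contrast drawn in this abstract between the original and the weighted q-gram method can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: a q-gram is taken here to be a downward path of q monosaccharides in the glycan tree, glycosidic bond types and geometry are ignored, and the similarity value in `SIM` is a made-up placeholder rather than a value from the paper. All function and variable names are hypothetical.

```python
from collections import Counter

# A glycan is modelled as a rooted tree: (monosaccharide_label, [child_trees]).
# Linkage types are deliberately left out to keep the sketch short.

def downward_paths(tree, q):
    """Collect every downward path of exactly q nodes as a tuple of labels."""
    paths = []

    def extend(node, prefix):
        label, kids = node
        prefix = prefix + (label,)
        if len(prefix) == q:
            paths.append(prefix)
            return
        for kid in kids:
            extend(kid, prefix)

    extend(tree, ())                 # paths that start at this node
    for child in tree[1]:            # paths that start further down
        paths.extend(downward_paths(child, q))
    return paths

def qgram_profile(tree, q):
    """Count how often each q-gram occurs in the tree."""
    return Counter(downward_paths(tree, q))

def plain_kernel(a, b, q):
    """Original idea: only identical q-grams contribute."""
    pa, pb = qgram_profile(a, q), qgram_profile(b, q)
    return sum(count * pb[g] for g, count in pa.items())

def label_sim(x, y, sim):
    """Similarity of two monosaccharides; 1.0 if identical, else a table lookup."""
    if x == y:
        return 1.0
    return sim.get(frozenset((x, y)), 0.0)

def qgram_sim(g, h, sim):
    """Assumed q-gram similarity: product of position-wise label similarities."""
    s = 1.0
    for x, y in zip(g, h):
        s *= label_sim(x, y, sim)
    return s

def weighted_kernel(a, b, q, sim):
    """Weighted idea: different q-grams contribute in proportion to their similarity."""
    pa, pb = qgram_profile(a, q), qgram_profile(b, q)
    return sum(ca * cb * qgram_sim(g, h, sim)
               for g, ca in pa.items()
               for h, cb in pb.items())

# Placeholder similarity (an assumption, not a published value).
SIM = {frozenset(("GlcNAc", "GalNAc")): 0.5}

glycan_a = ("Man", [("GlcNAc", [("Gal", [])]), ("GlcNAc", [])])
glycan_b = ("Man", [("GalNAc", [("Gal", [])])])

print(plain_kernel(glycan_a, glycan_b, 2))          # no shared 2-grams
print(weighted_kernel(glycan_a, glycan_b, 2, SIM))  # similar 2-grams now count
```

With these toy trees the plain kernel sees no common 2-grams, while the weighted variant gives credit for the GlcNAc/GalNAc pairs. Whether such a weighted sum is a valid positive semidefinite kernel depends on the similarity table, which this sketch does not check.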

    Extracting glycan motifs using a biochemically-weighted kernel

    Carbohydrates, or glycans, are among the most abundant and structurally diverse biopolymers and constitute the third major class of biomolecules, following DNA and proteins. However, the study of carbohydrate sugar chains has lagged behind that of DNA and proteins, mainly due to their inherent structural complexity. Their analysis is nonetheless important because they serve various roles in biological processes, including signal transduction and cellular recognition. To shed some light on glycan function based on carbohydrate structure, kernel methods have been developed in the past, in particular to extract potential glycan biomarkers by classifying glycan structures found in different tissue samples. The recently developed weighted q-gram method (LK-method) exhibits good performance on glycan structure classification but has limitations in feature selection: it was unable to extract biologically meaningful features from the data. Therefore, we propose a biochemically-weighted tree kernel (BioLK-method), which is based on a glycan similarity matrix and also incorporates biochemical information of individual q-grams in constructing the kernel matrix. We further applied our new method to the classification and recognition of motifs on publicly available glycan data. Our novel tree kernel (BioLK-method), used with a Support Vector Machine (SVM), is capable of detecting biologically important motifs accurately, while the LK-method failed to do so. Tests on three glycan data sets from the Consortium for Functional Glycomics (CFG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) GLYCAN showed results consistent with the literature. The newly developed BioLK-method also maintains classification performance comparable with that of the LK-method. Our results indicate that incorporating biochemical information of q-grams gives the novel kernel flexibility and capability in feature extraction, which may aid in the prediction of glycan biomarkers.

    Tree Echo State Networks

    In this paper we present the Tree Echo State Network (TreeESN) model, generalizing the paradigm of Reservoir Computing to tree-structured data. TreeESNs exploit an untrained generalized recursive reservoir, exhibiting extreme efficiency for learning in structured domains. In addition, we highlight throughout the paper other characteristics of the approach. First, we discuss the Markovian characterization of reservoir dynamics, extended to the case of tree domains, that is implied by the contractive setting of the TreeESN state transition function. Second, we study two types of state mapping functions to map the tree-structured state of TreeESN into a fixed-size feature representation for classification or regression tasks. The critical role of the relation between the choice of the state mapping function and the Markovian characterization of the task is analyzed and experimentally investigated on both artificial and real-world tasks. Finally, experimental results on benchmark and real-world tasks show that the TreeESN approach, in spite of its efficiency, can achieve results comparable with state-of-the-art, although more complex, neural and kernel-based models for tree-structured data.
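The two ingredients named in this abstract, an untrained recursive reservoir and a state mapping to a fixed-size vector, can be sketched as follows. This is a rough illustration under assumptions, not the paper's model: the abstract does not name the two mapping functions, so the root-state and mean-state choices below are plausible guesses; the small recurrent weights are only a crude stand-in for the contractivity condition; and the trained linear readout is omitted. All names are hypothetical.

```python
import math
import random

def make_reservoir(n_units, n_inputs, scale=0.3, seed=0):
    """Draw fixed random weights; they are never trained."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
            for _ in range(n_units)]
    # Small recurrent weights as a crude way to keep the state map contractive
    # (an assumption; the paper's actual condition is not reproduced here).
    bound = scale / n_units
    w_hat = [[rng.uniform(-bound, bound) for _ in range(n_units)]
             for _ in range(n_units)]
    return w_in, w_hat

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def encode(tree, w_in, w_hat, states):
    """Compute node states bottom-up; append each to `states`; return this node's state.

    A tree is (input_vector, [child_trees]).
    """
    u, children = tree
    child_sum = [0.0] * len(w_hat)
    for child in children:
        x_child = encode(child, w_in, w_hat, states)
        child_sum = [a + b for a, b in zip(child_sum, x_child)]
    pre = [a + b for a, b in zip(matvec(w_in, u), matvec(w_hat, child_sum))]
    x = [math.tanh(p) for p in pre]
    states.append(x)
    return x

def root_mapping(tree, w_in, w_hat):
    """Fixed-size feature vector: the state of the root node."""
    return encode(tree, w_in, w_hat, [])

def mean_mapping(tree, w_in, w_hat):
    """Fixed-size feature vector: the average of all node states."""
    states = []
    encode(tree, w_in, w_hat, states)
    return [sum(column) / len(states) for column in zip(*states)]

W_IN, W_HAT = make_reservoir(n_units=5, n_inputs=2)

# Toy tree with one-hot node labels.
toy_tree = ([1.0, 0.0], [([0.0, 1.0], []),
                         ([1.0, 0.0], [([0.0, 1.0], [])])])

root_features = root_mapping(toy_tree, W_IN, W_HAT)
mean_features = mean_mapping(toy_tree, W_IN, W_HAT)
print(len(root_features), len(mean_features))
```

Either feature vector would then be passed to a trained readout (for example, linear regression), which is the only part that learns. Which mapping suits a task better is the question the abstract raises and is not settled by this sketch.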

    Chemistry-informed Macromolecule Graph Representation for Similarity Computation and Supervised Learning

    Macromolecules are large, complex molecules composed of covalently bonded monomer units, existing in different stereochemical configurations and topologies. As a result of such chemical diversity, representing, comparing, and learning over macromolecules emerge as critical challenges. To address this, we developed a macromolecule graph representation, with monomers and bonds as nodes and edges, respectively. We captured the inherent chemistry of the macromolecule by using molecular fingerprints for node and edge attributes. For the first time, we demonstrated computation of chemical similarity between two macromolecules of varying chemistry and topology, using exact graph edit distances and graph kernels. We also trained graph neural networks for a variety of glycan classification tasks, achieving state-of-the-art results. Our work has two-fold implications: it provides a general framework for representation, comparison, and learning of macromolecules, and it enables quantitative chemistry-informed decision-making and iterative design in the macromolecular chemical space. Comment: Main text: 4 pages, 2 figures, 1 table; Appendix: 18 pages, 25 figures, 3 tables

    Structure of the gut microbiome following colonization with human feces determines colonic tumor burden

    Background: A growing body of evidence indicates that the gut microbiome plays a role in the development of colorectal cancer (CRC). Patients with CRC harbor gut microbiomes that are structurally distinct from those of healthy individuals; however, without the ability to track individuals during disease progression, it has not been possible to observe changes in the microbiome over the course of tumorigenesis. Mouse models have demonstrated that these changes can further promote colonic tumorigenesis. However, these models have relied upon mouse-adapted bacterial populations and so it remains unclear which human-adapted bacterial populations are responsible for modulating tumorigenesis. Results: We transplanted fecal microbiota from three CRC patients and three healthy individuals into germ-free mice, resulting in six structurally distinct microbial communities. Subjecting these mice to a chemically induced model of CRC resulted in different levels of tumorigenesis between mice. Differences in the number of tumors were strongly associated with the baseline microbiome structure in mice, but not with the cancer status of the human donors. Partitioning of baseline communities into enterotypes by Dirichlet multinomial mixture modeling resulted in three enterotypes that corresponded with tumor burden. The taxa most strongly positively correlated with increased tumor burden were members of the Bacteroides, Parabacteroides, Alistipes, and Akkermansia, all of which are Gram-negative. Members of the Gram-positive Clostridiales, including multiple members of Clostridium Group XIVa, were strongly negatively correlated with tumors. Analysis of the inferred metagenome of each community revealed a negative correlation between tumor count and the potential for butyrate production, and a positive correlation between tumor count and the capacity for host glycan degradation. Despite harboring distinct gut communities, all mice underwent conserved structural changes over the course of the model. The extent of these changes was also correlated with tumor incidence. Conclusion: Our results suggest that the initial structure of the microbiome determines susceptibility to colonic tumorigenesis. There appear to be opposing roles for certain Gram-negative (Bacteroidales and Verrucomicrobia) and Gram-positive (Clostridiales) bacteria in tumor susceptibility. Thus, the impact of community structure is potentially mediated by the balance between protective, butyrate-producing populations and inflammatory, mucin-degrading populations. http://deepblue.lib.umich.edu/bitstream/2027.42/109448/1/40168_2014_Article_48.pd

    Immunoglobulin G N-glycan biomarkers for autoimmune diseases: Current state and a glycoinformatics perspective

    The effective treatment of autoimmune disorders can greatly benefit from disease-specific biomarkers that are functionally involved in immune system regulation and can be collected through minimally invasive procedures. In this regard, human serum IgG N-glycans are promising for uncovering disease predisposition and monitoring progression, and for the identification of specific molecular targets for advanced therapies. In particular, the IgG N-glycome in diseased tissues is considered to be disease-dependent; thus, specific glycan structures may be involved in the pathophysiology of autoimmune diseases. This study provides a critical overview of the literature on human IgG N-glycomics, with a focus on the identification of disease-specific glycan alterations. To expedite the establishment of clinically relevant N-glycan biomarkers, the use of advanced computational tools for interpreting clinical data and their relationship with the underlying molecular mechanisms may be critical. Glycoinformatics tools, including artificial intelligence and systems glycobiology approaches, are reviewed for their potential to provide insight into patient stratification and disease etiology. Challenges in the integration of such glycoinformatics approaches in N-glycan biomarker research are critically discussed.

    Computational Biology and Chemistry

    The use of computers and software tools in biochemistry (biology) has led to a deep revolution in basic sciences and medicine. Bioinformatics and systems biology are the direct results of this revolution. With the involvement of computers, software tools, and internet services in scientific disciplines comprising biology and chemistry, new terms, technologies, and methodologies have appeared and become established. Bioinformatic software tools, versatile databases, and easy internet access gave rise to computational biology and chemistry. Today, we have new types of surveys and laboratories, including “in silico studies” and “dry labs”, in which bioinformaticians conduct their investigations to gain invaluable outcomes. These features have led to three-dimensional illustrations of different molecules and complexes that give a better understanding of nature.

    Genome sequence of the squalene-degrading bacterium Corynebacterium terpenotabidum type strain Y-11T (= DSM 44721T)

    Rückert C, Albersmeier A, Al-Dilaimi A, et al. Genome sequence of the squalene-degrading bacterium Corynebacterium terpenotabidum type strain Y-11T (= DSM 44721T). Standards in Genomic Sciences. 2013;9(3):505-513. Corynebacterium terpenotabidum Takeuchi et al. 1999 is a member of the genus Corynebacterium, which contains Gram-positive and non-spore forming bacteria with a high G+C content. C. terpenotabidum was isolated from soil based on its ability to degrade squalene and belongs to the aerobic and non-hemolytic Corynebacteria. It displays tolerance to salts (up to 8%) and is related to Corynebacterium variabile involved in cheese ripening. As this is a type strain of Corynebacterium, this project describing the 2.75 Mbp long chromosome with its 2,369 protein-coding and 72 RNA genes will aid the Genomic Encyclopedia of Bacteria and Archaea project.