Extracting glycan motifs using a biochemically weighted kernel
Carbohydrates, or glycans, are among the most abundant and structurally diverse biopolymers and constitute the third major class of biomolecules, following DNA and proteins. However, the study of carbohydrate sugar chains has lagged behind that of DNA and proteins, mainly because of their inherent structural complexity. Their analysis is nonetheless important because glycans serve various roles in biological processes, including signal transduction and cellular recognition. To shed light on glycan function based on carbohydrate structure, kernel methods have been developed, in particular to extract potential glycan biomarkers by classifying glycan structures found in different tissue samples. The recently developed weighted q-gram method (LK-method) performs well on glycan structure classification but is limited in feature selection: it was unable to extract biologically meaningful features from the data. We therefore propose a biochemically weighted tree kernel (BioLK-method), which is based on a glycan similarity matrix and also incorporates biochemical information of individual q-grams in constructing the kernel matrix. We applied the new method to the classification and recognition of motifs in publicly available glycan data. Used with a Support Vector Machine (SVM), the BioLK-method detected biologically important motifs accurately, whereas the LK-method failed to do so. It was tested on three glycan data sets from the Consortium for Functional Glycomics (CFG) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) GLYCAN database, and the results are consistent with the literature. The BioLK-method also maintains classification performance comparable to that of the LK-method. These results indicate that incorporating biochemical information of q-grams increases the flexibility and feature-extraction capability of the kernel, which may aid in the prediction of glycan biomarkers.
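The abstract does not give the BioLK kernel formula. As a minimal sketch only, assume each glycan has already been reduced to a vector of q-gram counts, that `S` is a symmetric positive semidefinite similarity matrix between q-grams (the LK-style part), and that the "biochemical information" is collapsed into one nonnegative weight per q-gram. That weight vector `w` is a hypothetical simplification, and the function name `biochem_weighted_kernel` is illustrative. The kernel is then K = X W S W Yᵀ with W = diag(w); weighting both sides keeps W S W positive semidefinite, so K remains a valid SVM kernel.

```python
import numpy as np

def biochem_weighted_kernel(X, Y, S, w):
    """Hypothetical BioLK-style kernel sketch (not the published formula).

    X: (n, m) q-gram count matrix for n glycans over m q-grams
    Y: (k, m) q-gram count matrix for k glycans
    S: (m, m) symmetric PSD similarity between q-grams (identity = plain q-gram kernel)
    w: (m,) nonnegative per-q-gram weights standing in for biochemical information
    """
    X, Y, S, w = (np.asarray(a, dtype=float) for a in (X, Y, S, w))
    W = np.diag(w)
    # W @ S @ W is symmetric PSD whenever S is, so K is a valid kernel matrix.
    return X @ W @ S @ W @ Y.T

# Toy example: 2 glycans, 3 q-grams, no cross-q-gram similarity (S = I).
X = [[1, 0, 1],
     [0, 1, 1]]
w = [1.0, 2.0, 1.0]  # hypothetical weights: up-weight the second q-gram
K = biochem_weighted_kernel(X, X, np.eye(3), w)
print(K)
```

With S = I and all weights equal to 1 this reduces to a plain dot product of q-gram count vectors. A precomputed matrix like `K` can be passed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.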
UniCarbKB: building a knowledge platform for glycoproteomics
The UniCarb KnowledgeBase (UniCarbKB; http://unicarbkb.org) offers public access to a growing, curated database of information on the glycan structures of glycoproteins. UniCarbKB is an international effort that aims to further our understanding of structures, pathways and networks involved in glycosylation and glyco-mediated processes by integrating structural, experimental and functional glycoscience information. This initiative builds upon the success of the glycan structure database GlycoSuiteDB, together with the informatics standards introduced by EUROCarbDB, to provide a high-quality, up-to-date resource to support glycomics and glycoproteomics research. UniCarbKB provides comprehensive information on glycan structures and published glycoprotein information, including global and site-specific attachment information. For the first release, over 890 references, 3740 glycan structure entries and 400 glycoproteins have been curated. Further, 598 protein glycosylation sites have been annotated with experimentally confirmed glycan structures from the literature. Among these are 35 glycoproteins, 502 structures and 60 publications previously not included in GlycoSuiteDB. This article provides an update on the transformation of GlycoSuiteDB (featured in previous NAR Database issues and hosted by ExPASy since 2009) to UniCarbKB and its integration with UniProtKB and GlycoMod. Here, we introduce a refactored database, supported by substantial new curated data collections and intuitive user interfaces that improve database searching.
A weighted q-gram method for glycan structure classification
Background: Glycobiology pertains to the study of carbohydrate sugar chains, or glycans, in a particular cell or organism. Many computational approaches have been proposed for analyzing these complex glycan structures, which are chains of monosaccharides. The monosaccharides are linked to one another by glycosidic bonds, which can take on a variety of conformations, thus forming branches and resulting in complex tree structures. The q-gram method is one of these recent methods used to understand glycan function based on the classification of their tree structures. This q-gram method assumes that, for a given q, different q-grams share no similarity with one another; that is, if two structures have completely different components, they are treated as completely different. From a biological standpoint, however, this is not the case. In this paper, we propose a weighted q-gram method that measures similarity among glycans by incorporating the similarity of geometric structures, monosaccharides and glycosidic bonds among q-grams. In contrast to the traditional q-gram method, our weighted q-gram method admits similarity among q-grams for a given q. On this basis, we developed new kernels for glycan structures and applied them in SVMs to classify glycans.
Results: Two glycan datasets were used to compare the weighted q-gram method with the original q-gram method. The results show that incorporating q-gram similarity improves classification performance for all of the important glycan classes tested.
Conclusion: The results indicate that similarity among q-grams obtained from geometric structure, monosaccharides and glycosidic linkages contributes to glycan function classification. This is an important step toward understanding glycan function based on complex glycan structures.
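The contrast between the two methods can be sketched in a few lines. Under the original assumption, two q-grams contribute to the kernel only if they are identical; the weighted method lets partially similar q-grams contribute as well. This is a minimal sketch under stated assumptions, not the authors' implementation: a glycan is a hypothetical dict of monosaccharide labels plus `(parent, child, linkage)` edges, a q-gram is simplified to a downward path of q monosaccharides with their linkages, and `positional_sim` (fraction of matching positions) is a hypothetical stand-in for the paper's similarity built from geometric structure, monosaccharides and glycosidic bonds.

```python
from collections import Counter

def qgrams(labels, edges, q):
    """Count q-grams in a glycan tree (simplified: downward paths only).

    labels: {node_id: monosaccharide}; edges: [(parent, child, linkage)].
    Each q-gram is stored as (mono, link, mono, ..., mono), length 2q - 1.
    """
    children = {}
    for parent, child, link in edges:
        children.setdefault(parent, []).append((child, link))
    counts = Counter()

    def walk(node, path):
        path = path + (labels[node],)
        if len(path) == 2 * q - 1:
            counts[path] += 1
            return
        for child, link in children.get(node, []):
            walk(child, path + (link,))

    for node in labels:  # start a path at every node
        walk(node, ())
    return counts

def exact_sim(a, b):
    # Original q-gram assumption: distinct q-grams share no similarity.
    return 1.0 if a == b else 0.0

def positional_sim(a, b):
    # Hypothetical weighted similarity: fraction of matching monosaccharides
    # and linkages by position. A sum of per-position match kernels, so PSD.
    return sum(x == y for x, y in zip(a, b)) / len(a)

def qgram_kernel(g1, g2, sim):
    # K(x, y) = sum over q-gram pairs (a, b) of count_x(a) * count_y(b) * sim(a, b)
    return sum(c1 * c2 * sim(a, b) for a, c1 in g1.items() for b, c2 in g2.items())

# Two toy chains that differ only in the sialic acid linkage.
labels = {0: "GlcNAc", 1: "Gal", 2: "NeuAc"}
g_a = qgrams(labels, [(0, 1, "b1-4"), (1, 2, "a2-3")], q=2)
g_b = qgrams(labels, [(0, 1, "b1-4"), (1, 2, "a2-6")], q=2)
print(qgram_kernel(g_a, g_b, exact_sim))       # only the shared GlcNAc-b1-4-Gal counts
print(qgram_kernel(g_a, g_b, positional_sim))  # shared q-gram + 2/3 partial match, about 1.667
```

With `exact_sim` the kernel is the plain dot product of q-gram count vectors; swapping in any positive semidefinite q-gram similarity keeps the result usable in an SVM.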
A new software tool for carbohydrate microarray data storage, processing, presentation, and reporting
Publisher Copyright: © 2022 The Author(s). Published by Oxford University Press. This project is supported by Wellcome Trust Biomedical Resource grants (WT099197/Z/12/Z, 108430/Z/15/Z and 218304/Z/19/Z), March of Dimes European Prematurity Research Centre grant 22-FY18-82 and NIH Commons Fund grant 1U01GM125267-01.
Glycan microarrays are essential tools in glycobiology and are widely used for assigning glycan ligands in diverse glycan recognition systems. We have developed a new software tool, the Carbohydrate microArray Analysis and Reporting Tool (CarbArrayART), to address the need for a distributable application for glycan microarray data management. The main features of CarbArrayART include: (i) storage of quantified array data from different array layouts with scan data and array-specific metadata, such as lists of arrayed glycans, array geometry, information on glycan-binding samples, and experimental protocols; (ii) presentation of microarray data as charts, tables, and heatmaps derived from average fluorescence intensity values calculated from the imaging scan data and array geometry, together with filtering and sorting functions based on monosaccharide content and glycan sequences; and (iii) data export for reporting in Word, PDF, and Excel formats, together with metadata compliant with the MIRAGE (Minimum Information Required for A Glycomics Experiment) guidelines. CarbArrayART is designed for routine use in recording, storing, and managing any slide-based glycan microarray experiment. In conjunction with the MIRAGE guidelines, CarbArrayART addresses issues that are critical for glycobiology, namely, clarity of data for evaluating reproducibility and validity.
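CarbArrayART derives average fluorescence intensities from the imaging scan data and the array geometry. The sketch below illustrates that step in its simplest form; it is not CarbArrayART's code. It assumes each glycan is printed as replicate spots, and the data layout (dictionaries keyed by a hypothetical `(row, col)` spot position) and the glycan names are illustrative only. It averages the replicates per glycan, reports their spread, and sorts glycans by mean signal.

```python
from collections import defaultdict
from statistics import mean, stdev

def average_intensities(geometry, scan):
    """Average replicate-spot fluorescence per glycan (illustrative sketch).

    geometry: {(row, col): glycan_id}  -- hypothetical array-layout mapping
    scan:     {(row, col): intensity}  -- hypothetical quantified scan values
    Returns [(glycan_id, mean, sd, n_spots)] sorted by mean, highest first.
    """
    spots = defaultdict(list)
    for position, glycan in geometry.items():
        if position in scan:  # skip spots with no quantified signal
            spots[glycan].append(scan[position])
    rows = []
    for glycan, values in spots.items():
        sd = stdev(values) if len(values) > 1 else 0.0  # stdev needs >= 2 values
        rows.append((glycan, mean(values), sd, len(values)))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows

geometry = {(0, 0): "LNnT", (0, 1): "LNnT", (1, 0): "Man5", (1, 1): "Man5", (1, 2): "Man5"}
scan = {(0, 0): 1000.0, (0, 1): 1200.0, (1, 0): 50.0, (1, 1): 70.0, (1, 2): 60.0}
for row in average_intensities(geometry, scan):
    print(row)
```

Keeping the spot count and standard deviation next to each mean makes replicate variability visible, which matters for the reproducibility concerns that MIRAGE addresses.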
Identification of Genes Required for Neural-Specific Glycosylation Using Functional Genomics
Glycosylation plays crucial regulatory roles in various biological processes such as development, immunity, and neural functions. For example, α1,3-fucosylation, a fucose modification abundant in Drosophila neural cells, is essential for neural development, function, and behavior. However, it remains largely unknown how neural-specific α1,3-fucosylation is regulated. In the present study, we searched for genes involved in the glycosylation of a neural-specific protein using a Drosophila RNAi library. We obtained 109 genes affecting glycosylation, which clustered into nine functional groups. Among them, members of the RNA regulation group were enriched in a secondary screen that identified genes specifically regulating α1,3-fucosylation. Further analyses revealed that an RNA-binding protein, second mitotic wave missing (Swm), upregulates expression of the neural-specific glycosyltransferase FucTA and facilitates export of its mRNA from the nucleus. This first large-scale genetic screen for glycosylation-related genes has revealed a novel mode of fucTA mRNA regulation in neural cells.
Functional evaluation of novel variants of B4GALNT1 in a patient with hereditary spastic paraplegia and the general population
Hereditary spastic paraplegia (HSP) is a heterogeneous group of neurological disorders characterized by progressive spasticity and weakness in the lower limbs. SPG26 is a complicated form of HSP that includes not only weakness in the lower limbs but also cognitive impairment, developmental delay, cerebellar ataxia, dysarthria, and peripheral neuropathy, and is caused by biallelic mutations in the B4GALNT1 (beta-1,4-N-acetylgalactosaminyltransferase 1) gene. The B4GALNT1 gene encodes ganglioside GM2/GD2 synthase (GM2S), which catalyzes the transfer of N-acetylgalactosamine to lactosylceramide, GM3, and GD3 to generate GA2, GM2, and GD2, respectively. The present study aimed to characterize a novel B4GALNT1 variant (NM_001478.5:c.937G>A p.Asp313Asn) detected in a patient with progressive multi-system neurodegeneration, as well as deleterious variants found in the general population in Japan. Peripheral blood T cells from our patient lacked activation-induced ganglioside expression, as assessed by cell-surface cholera toxin binding. Structural predictions suggested that the amino acid substitution p.Asp313Asn impairs binding to the donor substrate UDP-GalNAc. An in vitro enzyme assay demonstrated that the variant protein had no GM2S activity, leading to the diagnosis of HSP26. This is the first case diagnosed with SPG26 in Japan. We then extracted 10 novel missense variants of B4GALNT1, predicted to be deleterious by PolyPhen-2 and SIFT, from the whole-genome reference panel jMorp (8.3KJPN) of the Tohoku Medical Megabank Organization. Functional evaluation of these variants showed that many had perturbed subcellular localization. Five of these variants exhibited no or significantly decreased GM2S activity (less than 10% of wild-type activity), indicating that they are carrier variants for HSP26.
These results provide a basis for molecular analyses of B4GALNT1 variants present in the Japanese population and will help improve the molecular diagnosis of patients suspected of having HSP.
The 2nd DBCLS BioHackathon: interoperable bioinformatics Web services for integrated applications
Background: The interaction between biological researchers and the bioinformatics tools they use is still hampered by incomplete interoperability between such tools. To ensure that interoperability initiatives are effectively deployed, end-user applications need to be aware of, and support, best practices and standards. Here, we report on an initiative in which software developers and genome biologists came together to explore and raise awareness of these issues: BioHackathon 2009.
Results: Developers in attendance came from diverse backgrounds, with experts in Web services, workflow tools, text mining and visualization. Genome biologists provided expertise and exemplar data from the domains of sequence and pathway analysis and glyco-informatics. One goal of the meeting was to evaluate the ability to address real-world use cases in these domains using the tools that the developers represented. This resulted in i) a workflow to annotate 100,000 sequences from an invertebrate species; ii) an integrated system for analyzing transcription factor binding sites (TFBSs) enriched on the basis of differential gene expression data from a microarray experiment; iii) a workflow to enumerate putative physical protein interactions among enzymes in a metabolic pathway using protein structure data; and iv) a workflow to analyze glyco-gene-related diseases by searching for human homologs of glyco-genes in other species, such as fruit flies, and retrieving their phenotype-annotated SNPs.
Conclusions: Beyond deriving prototype solutions for each use case, a second major purpose of the BioHackathon was to highlight areas of insufficiency. We discuss the issues raised by our exploration of the problem/solution space, concluding that there are still problems with the way Web services are modeled and annotated, including: i) the absence of several useful data or analysis functions in the Web service "space"; ii) the lack of documentation of methods; iii) lack of compliance with the SOAP/WSDL specification among and between various programming-language libraries; and iv) incompatibility between various bioinformatics data formats. Although these problems still made it difficult to solve the real-world problems posed to the developers by the biological researchers in attendance, we note the promise of addressing these issues within a semantic framework.
- …