
    The Minimum Information Required for a Glycomics Experiment (MIRAGE) project: improving the standards for reporting glycan microarray-based data

    MIRAGE (Minimum Information Required for A Glycomics Experiment) is an initiative that was created by experts in the fields of glycobiology, glycoanalytics, and glycoinformatics to produce guidelines for reporting results from the diverse types of experiments and analyses used in structural and functional studies of glycans in the scientific literature. As a sequel to the guidelines for sample preparation (Struwe et al. 2016, Glycobiology, 26, 907-910) and mass spectrometry (MS) data (Kolarich et al. 2013, Mol. Cell. Proteomics, 12, 991-995), here we present the first version of guidelines intended to improve the standards for reporting data from glycan microarray analyses. For each of eight areas in the workflow of a glycan microarray experiment, we provide guidelines for the minimal information that should be provided when reporting results. We hope that the MIRAGE glycan microarray guidelines proposed here will gain broad acceptance by the community and will facilitate the interpretation and reproducibility of glycan microarray results, with implications for the comparison of data from different laboratories and the eventual deposition of glycan microarray data in international databases.

    Cross-Platform Microarray Data Normalisation for Regulatory Network Inference

    Background: Inferring gene regulatory networks (GRNs) from time-course microarray data suffers from a dimensionality problem created by the short length of available time series compared with the large number of genes in the network. To overcome this, integrating data from diverse sources is essential. Microarray data from different sources and platforms are publicly available, but integrating them is not straightforward because of platform and experimental differences. Methods: We analyse different normalisation approaches for microarray data integration in the context of reverse engineering quantitative GRN models. We introduce two preprocessing approaches based on existing normalisation techniques and provide a comprehensive comparison of the normalised datasets. Conclusions: The results identify a method combining Loess normalisation and iterative K-means as the best for time-series normalisation in this problem.
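    As a rough illustration of the Loess step named above, the sketch below applies intensity-dependent (MA-plot) normalisation to a two-channel array: it fits M = log2(R/G) against A = 0.5·log2(R·G) with a tricube-weighted local linear regression and subtracts that trend. This is a simplified sketch under stated assumptions, not the authors' method: the function names, the `frac` span parameter, and the omission of lowess robustness iterations are our own choices, and the iterative K-means step is not shown.

```python
import math


def local_linear_fit(x, y, frac=0.5):
    """Tricube-weighted local linear regression evaluated at each x.

    Simplified lowess (assumption): a single pass, no robustness re-weighting.
    """
    n = len(x)
    k = max(2, math.ceil(frac * n))  # neighbourhood size
    fitted = []
    for xi in x:
        # Bandwidth = distance to the k-th nearest point (xi itself included).
        h = sorted(abs(xj - xi) for xj in x)[k - 1] or 1e-12
        w = [(1 - (abs(xj - xi) / h) ** 3) ** 3 if abs(xj - xi) < h else 0.0
             for xj in x]
        sw = sum(w)  # >= 1 because xi always has weight 1
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted


def loess_normalise(red, green, frac=0.5):
    """Return Loess-normalised log-ratios M for paired two-channel intensities."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    trend = local_linear_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]
```

    A dye bias that is exactly linear in A is removed completely; for curved biases, `frac` controls how local the fitted trend is.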

    Human Gene Coexpression Landscape: Confident Network Derived from Tissue Transcriptomic Profiles

    This is an open-access article distributed under the terms of the Creative Commons Attribution License. [Background]: Analysis of gene expression data from genome-wide microarrays is a technique often used in genomic studies to find coexpression patterns and locate groups of co-transcribed genes. However, most studies at a global 'omic' scale do not focus on human samples, and those that do very often include heterogeneous datasets that mix normal and disease-altered samples. Moreover, technical noise in genome-wide expression microarrays is another well-reported problem that is often not addressed with robust statistical methods, and estimates of the errors in the data are not provided. [Methodology/Principal Findings]: Human genome-wide expression data from a controlled set of normal, healthy tissues are used to build a confident human gene coexpression network that avoids both pathological and technical noise. To achieve this, we describe a new method that combines several statistical and computational strategies: robust normalization and expression signal calculation; correlation coefficients obtained by parametric and non-parametric methods; random cross-validations; and estimation of the statistical accuracy and coverage of the data. Together, these methods yield a series of coexpression datasets in which the level of error is measured and can be tuned. To define the errors, true-positive rates are calculated by assignment to biological pathways. The result is a confident human gene coexpression network that includes 3,327 gene nodes and 15,841 coexpression links, and a comparative analysis shows a good improvement over previously published datasets. Further functional analysis of a core subnetwork, validated by two independent methods, shows coherent biological modules that share common transcription factors. The network reveals a map of coexpression clusters organized into well-defined functional constellations.
Two major regions in this network correspond to genes involved in nuclear and mitochondrial metabolism, and investigation of their functional assignment indicates that more than 60% are housekeeping and essential genes. The network displays previously undescribed gene associations and allows some genes without a functional assignment to be placed in a functional context based on their interactions with known gene families. [Conclusions/Significance]: The identification of stable and reliable human gene-to-gene coexpression networks is essential to unravel the interactions and functional correlations between human genes at an omic scale. This work contributes to this aim, and we are making the validated human gene coexpression networks available to the scientific community to allow further analyses of the network or of specific gene associations. The data are available free online at http://bioinfow.dep.usal.es/coexpression/. © 2008 Prieto et al. Funding and grant support was provided by the Ministry of Health, Spanish Government (ISCiii-FIS, MSyC; Project reference PI061153) and by the Ministry of Education, Castilla-Leon Local Government (JCyL; Project reference CSI03A06). Peer reviewed
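    The abstract above combines parametric and non-parametric correlation coefficients to define coexpression links. A minimal sketch of that idea might look as follows, assuming Pearson as the parametric measure, Spearman as the non-parametric one, and a rule (our assumption) that keeps a link only when both absolute coefficients reach a single illustrative threshold. The function names and the 0.8 threshold are hypothetical; the random cross-validation and pathway-based error estimation described by the authors are not shown.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Parametric (Pearson) correlation; 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            r[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Non-parametric (Spearman) correlation: Pearson on the ranks."""
    return pearson(ranks(x), ranks(y))


def coexpression_edges(expr, threshold=0.8):
    """Link two genes when both |Pearson| and |Spearman| reach the threshold.

    expr maps a gene name to its expression profile across samples.
    """
    edges = set()
    for g1, g2 in combinations(sorted(expr), 2):
        p = pearson(expr[g1], expr[g2])
        s = spearman(expr[g1], expr[g2])
        if abs(p) >= threshold and abs(s) >= threshold:
            edges.add((g1, g2))
    return edges
```

    Requiring agreement between the two coefficients is one simple way to discard links driven by a few outlying samples, which inflate Pearson but not Spearman.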

    A Global Characterization and Identification of Multifunctional Enzymes

    Multi-functional enzymes are enzymes that perform multiple physiological functions. Characterizing and identifying multi-functional enzymes is critical for understanding communication and cooperation between different functions and pathways within a complex cellular system or between cells. In the present study, we collected 6,799 literature-reported multi-functional enzymes and systematically characterized them in structural, functional, and evolutionary aspects. Four physicochemical properties, namely charge, polarizability, hydrophobicity, and solvent accessibility, were found to be important for characterizing multi-functional enzymes. Accordingly, a combined support vector machine and random forest model was constructed, with which 6,956 potential novel multi-functional enzymes were identified in the ENZYME database. Moreover, multi-functional enzymes were observed to be unevenly distributed across species, with Bacteria having relatively more multi-functional enzymes than Archaebacteria and Eukaryota. Comparative analysis indicated that multi-functional enzymes underwent fluctuating gene gain and loss during evolution from S. cerevisiae to H. sapiens. Further pathway analyses indicated that most multi-functional enzymes were well preserved in catalyzing several essential cellular processes, for example, the metabolism of carbohydrates, nucleotides, and amino acids. In addition, a database of known multi-functional enzymes and a server for predicting novel multi-functional enzymes were constructed and are freely accessible at http://bioinf.xmu.edu.cn/databases/MFEs/index.htm

    Comparison of evolutionary algorithms in gene regulatory network model inference

    Background: The development of high-throughput technologies that measure gene expression levels has provided a data basis for inferring gene regulatory networks (GRNs), a process also known as reverse engineering of GRNs. However, the nature of these data has made this process very difficult. Several methods currently exist that discover qualitative causal relationships between genes with high accuracy from microarray data, but large-scale quantitative analysis of real biological datasets cannot yet be performed, because existing approaches are not suited to real microarray data, which are noisy and insufficient. Results: This paper analyses several existing evolutionary algorithms for quantitative gene regulatory network modelling. The aim is to present the techniques used and offer a comprehensive comparison of approaches under a common framework. The algorithms are applied to both synthetic and real gene expression data from DNA microarrays, and their ability to reproduce biological behaviour, scalability, and robustness to noise are assessed and compared. Conclusions: We present a comparison framework for assessing evolutionary algorithms used to infer gene regulatory networks. Promising methods are identified, and a platform for developing appropriate model formalisms is established.
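    To make the idea of evolutionary model inference concrete, here is a minimal (1+λ) evolution strategy that fits the weight matrix of a hypothetical linear discrete-time model, x(t+1) = W·x(t), to an expression time series by minimising mean squared error. It is an illustrative sketch only: the model form, the function names, and the mutation settings (`sigma`, `offspring`, `generations`) are our assumptions, and it is not one of the specific algorithms compared in the paper.

```python
import random


def simulate(w, x0, steps):
    """Iterate the hypothetical model x(t+1) = W x(t) from x0."""
    traj = [list(x0)]
    for _ in range(steps):
        x = traj[-1]
        traj.append([sum(wij * xj for wij, xj in zip(row, x)) for row in w])
    return traj


def mse(w, data):
    """Mean squared error between simulated and observed trajectories."""
    sim = simulate(w, data[0], len(data) - 1)
    sq = [(s - d) ** 2 for srow, drow in zip(sim, data)
          for s, d in zip(srow, drow)]
    return sum(sq) / len(sq)


def evolve(data, generations=200, offspring=20, sigma=0.1, seed=0):
    """(1+lambda) evolution strategy with Gaussian mutation and elitism.

    data is a list of time points, each a list of gene expression levels.
    Returns (best weight matrix, its error, error of the starting matrix).
    """
    rng = random.Random(seed)
    n = len(data[0])
    parent = [[0.0] * n for _ in range(n)]  # start from "no regulation"
    parent_err = initial_err = mse(parent, data)
    for _ in range(generations):
        # Each child perturbs every weight of the parent with Gaussian noise.
        children = [[[wij + rng.gauss(0.0, sigma) for wij in row]
                     for row in parent] for _ in range(offspring)]
        best_err, best = min(((mse(c, data), c) for c in children),
                             key=lambda t: t[0])
        # Elitism: the parent is replaced only by a strictly better child.
        if best_err < parent_err:
            parent, parent_err = best, best_err
    return parent, parent_err, initial_err
```

    Because selection is elitist, the returned error never exceeds that of the all-zero starting matrix; on noisy real data, the fitted weights would also need checks for robustness and scalability of the kind the paper compares.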

    TargetMine, an Integrated Data Warehouse for Candidate Gene Prioritisation and Target Discovery

    Prioritising candidate genes for further experimental characterisation is a non-trivial challenge in drug discovery and biomedical research in general. An integrated approach that combines results from multiple data types is best suited for optimal target selection. We developed TargetMine, a data warehouse for efficient target prioritisation. TargetMine utilises the InterMine framework, with new data models such as protein-DNA interactions integrated in a novel way. It enables complicated searches that are difficult to perform with existing tools and it also offers integration of custom annotations and in-house experimental data. We proposed an objective protocol for target prioritisation using TargetMine and set up a benchmarking procedure to evaluate its performance. The results show that the protocol can identify known disease-associated genes with high precision and coverage. A demonstration version of TargetMine is available at http://targetmine.nibio.go.jp/

    Bioinformatics and molecular modeling in glycobiology

    The field of glycobiology is concerned with the study of the structure, properties, and biological functions of the family of biomolecules called carbohydrates. Bioinformatics for glycobiology is a particularly challenging field, because carbohydrates exhibit high structural diversity and their chains are often branched. Significant improvements in experimental analytical methods in recent years have led to a tremendous increase in the amount of carbohydrate structure data generated. Consequently, databases and tools to store, retrieve, and analyze these data efficiently are of fundamental importance to progress in glycobiology. In this review, the various graphical representations and sequence formats of carbohydrates are introduced, and an overview is presented of newly developed databases, the latest developments in sequence alignment and data mining, and tools that support experimental glycan analysis. Finally, structural glycoinformatics and the molecular modeling of carbohydrates, glycoproteins, and protein–carbohydrate interactions are reviewed.

    Comparative Genomics of Gardnerella vaginalis Strains Reveals Substantial Differences in Metabolic and Virulence Potential

    Gardnerella vaginalis is described as a common vaginal bacterial species whose presence correlates strongly with bacterial vaginosis (BV). Here we report the genome sequencing and comparative analyses of three strains of G. vaginalis. Strains 317 (ATCC 14019) and 594 (ATCC 14018) were isolated from the vaginal tracts of women with symptomatic BV, while strain 409-05 was isolated from a healthy, asymptomatic individual with a Nugent score of 9. Substantial genomic rearrangement and heterogeneity were observed, which appear to have resulted from both mobile elements and substantial lateral gene transfer. These genomic differences translated into differences in metabolic potential. All strains are equipped with significant virulence potential, including genes encoding the previously described vaginolysin, pili for cytoadhesion, EPS biosynthetic genes for biofilm formation, and antimicrobial resistance systems. We also observed systems promoting multidrug and lantibiotic extrusion. All G. vaginalis strains possess a large number of genes that may enhance their ability to compete with and exclude other vaginal colonists. These include up to six toxin-antitoxin systems and up to nine additional antitoxins lacking cognate toxins, several of which are clustered within each genome. All strains encode bactericidal toxins, including two lysozyme-like toxins produced uniquely by strain 409-05. Interestingly, the BV isolates encode numerous proteins not found in strain 409-05 that likely increase their pathogenic potential. These include enzymes enabling mucin degradation, a trait previously described as strongly correlated with BV, although commonly attributed to non-G. vaginalis species. Collectively, our results indicate that all three strains are able to thrive in vaginal environments, and that within them the BV isolates are capable of occupying a niche distinct from that of 409-05.
Each strain has significant virulence potential, although genomic and metabolic differences, such as the ability to degrade mucin, indicate that detecting G. vaginalis in the vaginal tract provides only partial information on the physiological potential of the organism.