
    Advances in protein ontology project

    Advances in proteomics and protein expression techniques have led to the elucidation of large amounts of protein data. Various data mining algorithms and mathematical models provide methods for analyzing these data; however, two issues need to be addressed: (1) the need for standard protein data description and exchange formats, so that data can be exchanged across the World Wide Web and read into data mining software in a consistent format, and (2) the elimination of errors that arise from data integration methodologies for complex queries. Protein Ontology is designed to meet these needs by providing a structured protein data specification for Protein Data Representation. Protein Ontology is a standard for representing protein data in a way that helps define data integration and data mining models for Protein Structure and Function. In this paper we summarize the structure of the Protein Ontology we developed earlier, its current applications to various protein families, and its future development.

    Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?

    The organization and mining of malaria genomic and post-genomic data is strongly motivated by the need to predict and characterize new biological targets and new drugs. Biological targets are sought in a biological space built from the genomic data of Plasmodium falciparum, but also using the millions of genomic data from other species. Drug candidates are sought in a chemical space containing the millions of small molecules stored in public and private chemolibraries. Data management should therefore be as reliable and versatile as possible. In this context, we examined five aspects of the organization and mining of malaria genomic and post-genomic data: 1) the comparison of protein sequences, including compositionally atypical malaria sequences, 2) the high-throughput reconstruction of molecular phylogenies, 3) the representation of biological processes, particularly metabolic pathways, 4) versatile methods for integrating genomic data, biological representations and functional profiles obtained from X-omic experiments after drug treatments, and 5) the determination and prediction of protein structures and their molecular docking with drug candidate structures. Progress toward a grid-enabled chemogenomic knowledge space is discussed. Comment: 43 pages, 4 figures, to appear in Malaria Journal

    ImmPort, toward repurposing of open access immunological assay data for translational and clinical research

    Immunology researchers are beginning to explore the possibilities of reproducibility, reuse and secondary analyses of immunology data. Open-access datasets are being used to validate the methods of the original studies, to leverage studies for meta-analysis, and to generate new hypotheses. To promote these goals, the ImmPort data repository was created for the broader research community to explore the wide spectrum of clinical and basic research data and associated findings. The ImmPort ecosystem consists of four components (Private Data, Shared Data, Data Analysis, and Resources) for data archiving, dissemination, analysis, and reuse. To date, more than 300 studies have been made freely available through the ImmPort Shared Data portal, which allows research data to be repurposed to accelerate the translation of new insights into discoveries.

    Applicability of semi-supervised learning assumptions for gene ontology terms prediction

    Gene Ontology (GO) is one of the most important resources in bioinformatics, aiming to provide a unified framework for the biological annotation of genes and proteins across all species. Predicting GO terms is an essential task in bioinformatics, but the number of available labelled proteins is in several cases insufficient for training reliable machine learning classifiers. Semi-supervised learning methods offer a powerful solution that exploits the information contained in unlabelled data to improve the estimates of traditional supervised approaches. However, semi-supervised learning methods must make strong assumptions about the nature of the training data, so the performance of the predictor is highly dependent on these assumptions. This paper analyzes the applicability of semi-supervised learning assumptions to the specific task of GO term prediction, focusing on providing criteria for choosing the most suitable tools for specific GO terms. The results show that semi-supervised approaches significantly outperform traditional supervised methods and that the highest performance is reached when applying the cluster assumption. In addition, it is shown experimentally that the cluster and manifold assumptions are complementary, and an analysis is provided of which GO terms are more likely to be predicted correctly under each assumption. Postprint (published version)

    The Blood Ontology: An ontology in the domain of hematology

    Despite the importance of human blood to clinical practice and research, hematology and blood transfusion data remain scattered across a range of disparate sources. This lack of systematization in the use and definition of terms poses problems for physicians and biomedical professionals. Here we introduce the Blood Ontology, an ongoing initiative designed to serve as a controlled vocabulary for organizing information about blood. The paper describes the scope of the Blood Ontology, its stage of development and some of its anticipated uses.

    Yeast Features: Identifying Significant Features Shared Among Yeast Proteins for Functional Genomics

    Background
High-throughput yeast functional genomics experiments are revealing associations among tens to hundreds of genes under numerous experimental conditions. To fully understand how the identified genes might be involved in the observed system, it is essential to consider the widest possible range of biological annotation. Biologists often start by collating the annotation provided for each protein in databases such as the Saccharomyces Genome Database, manually comparing them for similar features, and empirically assessing their significance. Such tasks can be automated, and the significance can be calculated more precisely using established probability measures.
Results
We developed Yeast Features, an intuitive online tool to help establish the significance of finding a diverse set of shared features among a collection of yeast proteins. A total of 18,786 features from the Saccharomyces Genome Database are considered, including annotation based on the Gene Ontology's molecular function, biological process and cellular compartment, as well as conserved domains, protein-protein and genetic interactions, complexes, metabolic pathways, phenotypes and publications. The significance of shared features is estimated using a hypergeometric probability, and novel options allow the significance estimate to be improved by adding background knowledge of the experimental system. For instance, increased statistical significance is achieved in gene deletion experiments because interactions with essential genes will never be observed. We further demonstrate the tool's utility by suggesting the functional roles of the indirect targets of an aminoglycoside with a known mechanism of action, and the targets of an herbal extract with a previously unknown mode of action. The identification of shared functional features may also be used to propose novel roles for proteins of unknown function, including a role in protein synthesis for YKL075C.
Conclusions
Yeast Features (YF) is an easy-to-use web-based application (http://software.dumontierlab.com/yeastfeatures/) that can identify and prioritize features shared among a set of yeast proteins. This approach is shown to be valuable in the analysis of complex data sets, in which the extracted associations revealed significant functional relationships among the gene products.
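The hypergeometric significance estimate mentioned under Results can be sketched as a one-sided tail probability: given a background of N proteins, K of which carry a feature, how likely is it that a random set of n proteins contains at least k with that feature? This is a minimal, generic sketch and not the Yeast Features implementation; the function name `hypergeometric_upper_tail` and its parameter names are hypothetical, and it does not include the tool's background-knowledge options.

```python
from math import comb


def hypergeometric_upper_tail(N: int, K: int, n: int, k: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    Hypothetical helper, not taken from Yeast Features.
    N: number of proteins in the background population
    K: number of background proteins annotated with the feature
    n: number of proteins in the query set
    k: number of query proteins observed to share the feature
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    total = comb(N, n)  # number of ways to draw the query set
    upper = min(n, K)   # at most this many query proteins can carry the feature
    # Sum the probability of observing exactly i feature-bearing proteins
    # for every i from k up to the maximum possible; math.comb returns 0
    # when asked to choose more items than are available.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / total


if __name__ == "__main__":
    # Illustrative counts only, not data from the paper.
    print(hypergeometric_upper_tail(6000, 50, 20, 5))
```

A smaller p-value means the shared feature is less likely to arise by chance. One way to encode background knowledge, assumed here rather than taken from the paper, is to restrict N and K to the proteins that could actually be observed in the experiment before calling the function.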

    Genome-wide signatures of complex introgression and adaptive evolution in the big cats.

    The great cats of the genus Panthera comprise a recent radiation whose evolutionary history is poorly understood. Their rapid diversification poses challenges to resolving their phylogeny while offering opportunities to investigate the historical dynamics of adaptive divergence. We report the sequence, de novo assembly, and annotation of the jaguar (Panthera onca) genome, a novel genome sequence for the leopard (Panthera pardus), and comparative analyses encompassing all living Panthera species. Demographic reconstructions indicated that all of these species have experienced variable episodes of population decline during the Pleistocene, ultimately leading to small effective population sizes in present-day genomes. We observed pervasive genealogical discordance across Panthera genomes, caused by both incomplete lineage sorting and complex patterns of historical interspecific hybridization. We identified multiple signatures of species-specific positive selection, affecting genes involved in craniofacial and limb development, protein metabolism, hypoxia, reproduction, pigmentation, and sensory perception. There was remarkable concordance between the pathways enriched in genomic segments implicated in interspecies introgression and those implicated in positive selection, suggesting that these processes were connected. We tested this hypothesis by developing exome capture probes targeting ~19,000 Panthera genes and applying them to 30 wild-caught jaguars. We found at least two genes (DOCK3 and COL4A5, both related to optic nerve development) bearing significant signatures of interspecies introgression and within-species positive selection. These findings indicate that post-speciation admixture has contributed genetic material that facilitated the adaptive evolution of big cat lineages.