
    CATHEDRAL: A Fast and Effective Algorithm to Predict Folds and Domain Boundaries from Multidomain Protein Structures

    We present CATHEDRAL, an iterative protocol for determining the location of previously observed protein folds in novel multidomain protein structures. CATHEDRAL builds on the features of a fast secondary-structure–based method (using graph theory) to locate known folds within a multidomain context and a residue-based, double-dynamic programming algorithm, which is used to align members of the target fold groups against the query protein structure to identify the closest relative and assign domain boundaries. To increase the fidelity of the assignments, a support vector machine is used to provide an optimal scoring scheme. Once a domain is verified, it is excised, and the search protocol is repeated in an iterative fashion until all recognisable domains have been identified. We have performed an initial benchmark of CATHEDRAL against other publicly available structure comparison methods using a consensus dataset of domains derived from the CATH and SCOP domain classifications. CATHEDRAL shows superior performance in fold recognition and alignment accuracy when compared with many equivalent methods. If a novel multidomain structure contains a known fold, CATHEDRAL will locate it in 90% of cases, with <1% false positives. For nearly 80% of assigned domains in a manually validated test set, the boundaries were correctly delineated within a tolerance of ten residues. For the remaining cases, previously classified domains were very remotely related to the query chain so that embellishments to the core of the fold caused significant differences in domain sizes and manual refinement of the boundaries was necessary. To put this performance in context, a well-established sequence method based on hidden Markov models was only able to detect 65% of domains, with 33% of the subsequent boundaries assigned within ten residues.
Since, on average, 50% of newly determined protein structures contain more than one domain unit, and typically 90% or more of these domains are already classified in CATH, CATHEDRAL will considerably facilitate the automation of protein structure classification.

    Tv-RIO1 – an atypical protein kinase from the parasitic nematode Trichostrongylus vitrinus

    Background: Protein kinases are key enzymes that regulate a wide range of cellular processes, including cell-cycle progression, transcription, DNA replication and metabolic functions. These enzymes catalyse the transfer of phosphates to serine, threonine and tyrosine residues, thus playing functional roles in reversible protein phosphorylation. There are two main groups, namely eukaryotic protein kinases (ePKs) and atypical protein kinases (aPKs); RIO kinases belong to the latter group. While there is some information about RIO kinases and their roles in animals, nothing is known about them in parasites. This is the first study to characterise a RIO1 kinase from any parasite. Results: A full-length cDNA (Tv-rio-1) encoding a RIO1 protein kinase (Tv-RIO1) was isolated from the economically important parasitic nematode Trichostrongylus vitrinus (Order Strongylida). The uninterrupted open reading frame (ORF) of 1476 nucleotides encoded a protein of 491 amino acids, containing the characteristic RIO1 motif LVHADLSEYNTL. Tv-rio-1 was transcribed at the highest level in the third-stage larva (L3), and a higher level in adult females than in males. Comparison with homologues from other organisms showed that protein Tv-RIO1 had significant homology to related proteins from a range of metazoans and plants. Amino acid sequence identity was most pronounced in the ATP-binding motif, active site and metal binding loop. Phylogenetic analyses of selected amino acid sequence data revealed Tv-RIO1 to be most closely related to the proteins in the species of Caenorhabditis. A structural model of Tv-RIO1 was constructed and compared with the published crystal structure of RIO1 of Archaeoglobus fulgidus (Af-Rio1). Conclusion: This study provides the first insights into the RIO1 protein kinases of nematodes, and a foundation for further investigations into the biochemical and functional roles of this molecule in biological processes in parasitic nematodes.
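    The ORF and protein lengths quoted in this abstract are mutually consistent, as a short check shows. The sketch below assumes that the 1476-nucleotide count includes the terminating stop codon, which the abstract does not state explicitly; the function name `protein_length` is illustrative only.

```python
CODON_LENGTH = 3  # nucleotides per codon


def protein_length(orf_nucleotides: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of the given length.

    Assumption: when includes_stop is True, the final codon is a stop
    codon and does not contribute an amino acid.
    """
    if orf_nucleotides % CODON_LENGTH != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_nucleotides // CODON_LENGTH
    return codons - 1 if includes_stop else codons


# 1476 nt -> 492 codons -> 491 amino acids once the stop codon is excluded
print(protein_length(1476))
```

    Under that assumption the result matches the 491 amino acids reported for Tv-RIO1.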

    Extraction of Transcript Diversity from Scientific Literature

    Transcript diversity generated by alternative splicing and associated mechanisms contributes heavily to the functional complexity of biological systems. The numerous examples of the mechanisms and functional implications of these events are scattered throughout the scientific literature. Thus, it is crucial to have a tool that can automatically extract the relevant facts and collect them in a knowledge base that can aid the interpretation of data from high-throughput methods. We have developed and applied a composite text-mining method for extracting information on transcript diversity from the entire MEDLINE database in order to create a database of genes with alternative transcripts. It contains information on tissue specificity, number of isoforms, causative mechanisms, functional implications, and experimental methods used for detection. We have mined this resource to identify 959 instances of tissue-specific splicing. Our results in combination with those from EST-based methods suggest that alternative splicing is the preferred mechanism for generating transcript diversity in the nervous system. We provide new annotations for 1,860 genes with the potential for generating transcript diversity. We assign the MeSH term “alternative splicing” to 1,536 additional abstracts in the MEDLINE database and suggest new MeSH terms for other events. We have successfully extracted information about transcript diversity and semiautomatically generated a database, LSAT, that can provide a quantitative understanding of the mechanisms behind tissue-specific gene expression. LSAT (Literature Support for Alternative Transcripts) is publicly available at http://www.bork.embl.de/LSAT/

    Overview of BioCreative II gene normalization

    Background: The goal of the gene normalization task is to link genes or gene products mentioned in the literature to biological databases. This is a key step in an accurate search of the biological literature. It is a challenging task, even for the human expert; genes are often described rather than referred to by gene symbol and, confusingly, one gene name may refer to different genes (often from different organisms). For BioCreative II, the task was to list the Entrez Gene identifiers for human genes or gene products mentioned in PubMed/MEDLINE abstracts. We selected abstracts associated with articles previously curated for human genes. We provided 281 expert-annotated abstracts containing 684 gene identifiers for training, and a blind test set of 262 documents containing 785 identifiers, with a gold standard created by expert annotators. Inter-annotator agreement was measured at over 90%. Results: Twenty groups submitted one to three runs each, for a total of 54 runs. Three systems achieved F-measures (balanced precision and recall) between 0.80 and 0.81. Combining the system outputs using simple voting schemes and classifiers obtained improved results; the best composite system achieved an F-measure of 0.92 with 10-fold cross-validation. A 'maximum recall' system based on the pooled responses of all participants gave a recall of 0.97 (with precision 0.23), identifying 763 out of 785 identifiers. Conclusion: Major advances for the BioCreative II gene normalization task include broader participation (20 versus 8 teams) and a pooled system performance comparable to human experts, at over 90% agreement. These results show promise as tools to link the literature with biological databases.
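    The recall figure for the pooled 'maximum recall' system can be reproduced from the counts given above. The sketch below assumes the usual definition of the balanced F-measure as the harmonic mean of precision and recall; the function name `f_measure` is illustrative and is not taken from the BioCreative evaluation software.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Counts quoted in the abstract for the pooled 'maximum recall' system.
found, total = 763, 785
recall = found / total   # fraction of gold-standard identifiers recovered
precision = 0.23         # as reported; the underlying counts are not given

print(round(recall, 2))                        # 0.97
print(round(f_measure(precision, recall), 2))  # 0.37
```

    By this calculation the pooled system would have a balanced F-measure of roughly 0.37, so it is best read as an upper bound on recall rather than as a competitive system in its own right.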

    The Foundational Model of Anatomy Ontology

    Anatomy is the structure of biological organisms. The term also denotes the scientific discipline devoted to the study of anatomical entities and the structural and developmental relations that obtain among these entities during the lifespan of an organism. Anatomical entities are the independent continuants of biomedical reality on which physiological and disease processes depend, and which, in response to etiological agents, can transform themselves into pathological entities. For these reasons, hard copy and in silico information resources in virtually all fields of biology and medicine, as a rule, make extensive reference to anatomical entities. Because of the lack of a generalizable, computable representation of anatomy, developers of computable terminologies and ontologies in clinical medicine and biomedical research represented anatomy from their own more or less divergent viewpoints. The resulting heterogeneity presents a formidable impediment to correlating human anatomy not only across computational resources but also with the anatomy of model organisms used in biomedical experimentation. The Foundational Model of Anatomy (FMA) is being developed to fill the need for a generalizable anatomy ontology, which can be used and adapted by any computer-based application that requires anatomical information. Moreover it is evolving into a standard reference for divergent views of anatomy and a template for representing the anatomy of animals. A distinction is made between the FMA ontology as a theory of anatomy and the implementation of this theory as the FMA artifact. In either sense of the term, the FMA is a spatial-structural ontology of the entities and relations which together form the phenotypic structure of the human organism at all biologically salient levels of granularity. Making use of explicit ontological principles and sound methods, it is designed to be understandable by human beings and navigable by computers. 
The FMA’s ontological structure provides for machine-based inference, enabling powerful computational tools of the future to reason with biomedical data.