203 research outputs found

    Visualising biological data: a semantic approach to tool and database integration

    MOTIVATION: In the biological sciences, the need to analyse vast amounts of information has become commonplace. Such large-scale analyses often involve drawing together data from a variety of different databases, held remotely on the internet or locally on in-house servers. Supporting these tasks are ad hoc collections of data-manipulation tools, scripting languages and visualisation software, which are often combined in arcane ways to create cumbersome systems that have been customised for a particular purpose, and are consequently not readily adaptable to other uses. For many day-to-day bioinformatics tasks, the sizes of current databases, and the scale of the analyses necessary, now demand increasing levels of automation; nevertheless, the unique experience and intuition of human researchers is still required to interpret the end results in any meaningful biological way. Putting humans in the loop requires tools to support real-time interaction with these vast and complex data-sets. Numerous tools do exist for this purpose, but many do not have optimal interfaces, most are effectively isolated from other tools and databases owing to incompatible data formats, and many have limited real-time performance when applied to realistically large data-sets: much of the user's cognitive capacity is therefore focused on controlling the software and manipulating esoteric file formats rather than on performing the research. METHODS: To confront these issues, harnessing expertise in human-computer interaction (HCI), high-performance rendering and distributed systems, and guided by bioinformaticians and end-user biologists, we are building reusable software components that, together, create a toolkit that is both architecturally sound from a computing point of view, and addresses both user and developer requirements. Key to the system's usability is its direct exploitation of semantics, which, crucially, gives individual components knowledge of their own functionality and allows them to interoperate seamlessly, removing many of the existing barriers and bottlenecks from standard bioinformatics tasks. RESULTS: The toolkit, named Utopia, is freely available from http://utopia.cs.man.ac.uk/.

    Effectively incorporating selected multimedia content into medical publications

    Until fairly recently, medical publications have been handicapped by being restricted to non-electronic formats, effectively preventing the dissemination of complex audiovisual and three-dimensional data. However, authors and readers could significantly profit from advances in electronic publishing that permit the inclusion of multimedia content directly into an article. For the first time, the de facto gold standard for scientific publishing, the portable document format (PDF), is used here as a platform to embed a video and an audio sequence of patient data into a publication. Fully interactive three-dimensional models of a face and a schematic representation of a human brain are also part of this publication. We discuss the potential of this approach and its impact on the communication of scientific medical data, particularly with regard to electronic and open access publications. Finally, we emphasise how medical teaching can benefit from this new tool and comment on the future of medical publishing

    Designing a course model for distance-based online bioinformatics training in Africa: the H3ABioNet experience

    Africa is not unique in its need for basic bioinformatics training for individuals from a diverse range of academic backgrounds. However, particular logistical challenges in Africa, most notably access to bioinformatics expertise and internet stability, must be addressed in order to meet this need on the continent. H3ABioNet (www.h3abionet.org), the Pan African Bioinformatics Network for H3Africa, has therefore developed an innovative, free-of-charge "Introduction to Bioinformatics" course, taking these challenges into account as part of its educational efforts to provide on-site training and develop local expertise inside its network. A multiple-delivery-mode learning model was selected for this 3-month course in order to increase access to (mostly) African, expert bioinformatics trainers. The content of the course was developed to include a range of fundamental bioinformatics topics at the introductory level. For the first iteration of the course (2016), classrooms with a total of 364 enrolled participants were hosted at 20 institutions across 10 African countries. To ensure that classroom success did not depend on stable internet, trainers pre-recorded their lectures, and classrooms downloaded and watched these locally during biweekly contact sessions. The trainers were available via video conferencing to take questions during contact sessions, as well as via online "question and discussion" forums outside of contact session time. This learning model, developed for a resource-limited setting, could easily be adapted to other settings.

    The Origin of GPCRs: Identification of Mammalian like Rhodopsin, Adhesion, Glutamate and Frizzled GPCRs in Fungi

    G protein-coupled receptors (GPCRs) in humans are classified into the five main families named Glutamate, Rhodopsin, Adhesion, Frizzled and Secretin according to the GRAFS classification. Previous results show that these mammalian GRAFS families are well represented in the Metazoan lineages, but they have not been shown to be present in Fungi. Here, we systematically mined 79 fungal genomes and provide the first evidence that four of the five main mammalian families of GPCRs, namely Rhodopsin, Adhesion, Glutamate and Frizzled, are present in Fungi and found 142 novel sequences between them. Significantly, we provide strong evidence that the Rhodopsin family emerged from the cAMP receptor family in an event close to the split of Opisthokonts and not in Placozoa, as earlier assumed. The Rhodopsin family then expanded greatly in Metazoans while the cAMP receptor family is found in 3 invertebrate species and lost in the vertebrates. We estimate that the Adhesion and Frizzled families evolved before the split of Unikonts from a common ancestor of all major eukaryotic lineages. Also, the study highlights that the fungal Adhesion receptors do not have N-terminal domains whereas the fungal Glutamate receptors have a broad repertoire of mammalian-like N-terminal domains. Further, mining of the close unicellular relatives of the Metazoan lineage, Salpingoeca rosetta and Capsaspora owczarzaki, obtained a rich group of both the Adhesion and Glutamate families, which in particular provided insight to the early emergence of the N-terminal domains of the Adhesion family. We identified 619 Fungi specific GPCRs across 79 genomes and revealed that Blastocladiomycota and Chytridiomycota phylum have Metazoan-like GPCRs rather than the GPCRs specific for Fungi. Overall, this study provides the first evidence of the presence of four of the five main GRAFS families in Fungi and clarifies the early evolutionary history of the GPCR superfamily

    Fast index based algorithms and software for matching position specific scoring matrices

    BACKGROUND: In biological sequence analysis, position specific scoring matrices (PSSMs) are widely used to represent sequence motifs in nucleotide as well as amino acid sequences. Searching with PSSMs in complete genomes or large sequence databases is a common, but computationally expensive task. RESULTS: We present a new non-heuristic algorithm, called ESAsearch, to efficiently find matches of PSSMs in large databases. Our approach preprocesses the search space, e.g., a complete genome or a set of protein sequences, and builds an enhanced suffix array that is stored on file. This allows the searching of a database with a PSSM in sublinear expected time. Since ESAsearch benefits from small alphabets, we present a variant operating on sequences recoded according to a reduced alphabet. We also address the problem of non-comparable PSSM-scores by developing a method which allows the efficient computation of a matrix similarity threshold for a PSSM, given an E-value or a p-value. Our method is based on dynamic programming and, in contrast to other methods, it employs lazy evaluation of the dynamic programming matrix. We evaluated algorithm ESAsearch with nucleotide PSSMs and with amino acid PSSMs. Compared to the best previous methods, ESAsearch shows speedups of a factor between 17 and 275 for nucleotide PSSMs, and speedups up to factor 1.8 for amino acid PSSMs. Comparisons with the most widely used programs even show speedups by a factor of at least 3.8. Alphabet reduction yields an additional speedup factor of 2 on amino acid sequences compared to results achieved with the 20 symbol standard alphabet. The lazy evaluation method is also much faster than previous methods, with speedups of a factor between 3 and 330. 
CONCLUSION: Our analysis of ESAsearch reveals sublinear runtime in the expected case, and linear runtime in the worst case for sequences not shorter than $|\mathcal{A}|^m + m - 1$, where $m$ is the length of the PSSM and $\mathcal{A}$ a finite alphabet. In practice, ESAsearch shows superior performance over the most widely used programs, especially for DNA sequences. The new algorithm for accurate on-the-fly calculations of thresholds has the potential to replace formerly used approximation approaches. Beyond the algorithmic contributions, we provide a robust, well documented, and easy to use software package, implementing the ideas and algorithms presented in this manuscript.

    Investigation of G72 (DAOA) expression in the human brain

    BACKGROUND: Polymorphisms at the G72/G30 locus on chromosome 13q have been associated with schizophrenia or bipolar disorder in more than ten independent studies. Even though the genetic findings are very robust, the physiological role of the predicted G72 protein has thus far not been resolved. Initial reports suggested G72 as an activator of D-amino acid oxidase (DAO), supporting the glutamate dysfunction hypothesis of schizophrenia. However, these findings have subsequently not been reproduced and reports of endogenous human G72 mRNA and protein expression are extremely limited. In order to better understand the function of this putative schizophrenia susceptibility gene, we attempted to demonstrate G72 mRNA and protein expression in relevant human brain regions. METHODS: The expression of G72 mRNA was studied by northern blotting and semi-quantitative SYBR-Green and Taqman RT-PCR. Protein expression in human tissue lysates was investigated by western blotting using two custom-made specific anti-G72 peptide antibodies. An in-depth in silico analysis of the G72/G30 locus was performed in order to try and identify motifs or regulatory elements that provide insight to G72 mRNA expression and transcript stability. RESULTS: Despite using highly sensitive techniques, we failed to identify significant levels of G72 mRNA in a variety of human tissues (e.g. adult brain, amygdala, caudate nucleus, fetal brain, spinal cord and testis), human cell lines or schizophrenia/control post mortem BA10 samples. Furthermore, using western blotting in combination with sensitive detection methods, we were also unable to detect G72 protein in a number of human brain regions (including cerebellum and amygdala), spinal cord or testis. A detailed in silico analysis provides several lines of evidence that support the apparent low or absent expression of G72. CONCLUSION: Our results suggest that native G72 protein is not normally present in the tissues that we analysed in this study. We also conclude that the lack of demonstrable G72 expression in relevant brain regions does not support a role for G72 in modulation of DAO activity and the pathology of schizophrenia via a DAO-mediated mechanism. In silico analysis suggests that G72 is not robustly expressed and that the transcript is potentially labile. Further studies are required to understand the significance of the G72/30 locus to schizophrenia.

    PhenoFam-gene set enrichment analysis through protein structural information

    BACKGROUND: With the current technological advances in high-throughput biology, the necessity to develop tools that help to analyse the massive amount of data being generated is evident. A powerful method of inspecting large-scale data sets is gene set enrichment analysis (GSEA) and investigation of protein structural features can guide determining the function of individual genes. However, a convenient tool that combines these two features to aid in high-throughput data analysis has not been developed yet. In order to fill this niche, we developed the user-friendly, web-based application, PhenoFam. RESULTS: PhenoFam performs gene set enrichment analysis by employing structural and functional information on families of protein domains as annotation terms. Our tool is designed to analyse complete sets of results from quantitative high-throughput studies (gene expression microarrays, functional RNAi screens, etc.) without prior pre-filtering or hits-selection steps. PhenoFam utilizes Ensembl databases to link a list of user-provided identifiers with protein features from the InterPro database, and assesses whether results associated with individual domains differ significantly from the overall population. To demonstrate the utility of PhenoFam we analysed a genome-wide RNA interference screen and discovered a novel function of plexins containing the cytoplasmic RasGAP domain. Furthermore, a PhenoFam analysis of breast cancer gene expression profiles revealed a link between breast carcinoma and altered expression of PX domain containing proteins. CONCLUSIONS: PhenoFam provides a user-friendly, easily accessible web interface to perform GSEA based on high-throughput data sets and structural-functional protein information, and therefore aids in functional annotation of genes.

    Quantitative sequence-function relationships in proteins based on gene ontology

    BACKGROUND: The relationship between divergence of amino-acid sequence and divergence of function among homologous proteins is complex. The assumption that homologs share function – the basis of transfer of annotations in databases – must therefore be regarded with caution. Here, we present a quantitative study of sequence and function divergence, based on the Gene Ontology classification of function. We determined the relationship between sequence divergence and function divergence in 6828 protein families from the PFAM database. Within families there is a broad range of sequence similarity from very closely related proteins – for instance, orthologs in different mammals – to very distantly-related proteins at the limit of reliable recognition of homology. RESULTS: We correlated the divergence in sequences determined from pairwise alignments, and the divergence in function determined by path lengths in the Gene Ontology graph, taking into account the fact that many proteins have multiple functions. Our results show that, among homologous proteins, the proportion of divergent functions decreases dramatically above a threshold of sequence similarity at about 50% residue identity. For proteins with more than 50% residue identity, transfer of annotation between homologs will lead to an erroneous attribution with a totally dissimilar function in fewer than 6% of cases. This means that for very similar proteins (about 50% identical residues) the chance of completely incorrect annotation is low; however, because of the phenomenon of recruitment, it is still non-zero. CONCLUSION: Our results describe general features of the evolution of protein function, and serve as a guide to the reliability of annotation transfer, based on the closeness of the relationship between a new protein and its nearest annotated relative.