
    The 3rd DBCLS BioHackathon: improving life science data integration with Semantic Web technologies.

    BACKGROUND: BioHackathon 2010 was the third in a series of meetings hosted by the Database Center for Life Science (DBCLS) in Tokyo, Japan. The overall goal of the BioHackathon series is to improve the quality and accessibility of life science research data on the Web by bringing together representatives from public databases, analytical tool providers, and cyber-infrastructure researchers to jointly tackle important challenges in the area of in silico biological research. RESULTS: The theme of BioHackathon 2010 was the 'Semantic Web', and all attendees gathered with the shared goal of producing Semantic Web data from their respective resources, and/or consuming or interacting with those data using their tools and interfaces. We discussed topics including guidelines for designing semantic data and the interoperability of resources, and consequently developed tools and clients for analysis and visualization. CONCLUSION: We provide a meeting report from BioHackathon 2010, in which we describe the discussions, decisions, and breakthroughs made as we moved towards compliance with Semantic Web technologies - from source provider, through middleware, to the end-consumer.
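    As a concrete picture of what "producing Semantic Web data" from a database record can look like, here is a minimal Python sketch using rdflib; the namespace, classes, and the example record are invented for illustration and are not taken from the BioHackathon outputs.

```python
# Minimal sketch of exposing one database record as RDF triples,
# in the spirit of the BioHackathon goal. The ex: namespace and the
# record shown here are illustrative placeholders only.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

EX = Namespace("http://example.org/biodb/")  # hypothetical namespace

g = Graph()
g.bind("ex", EX)

# A single gene record rendered as triples.
gene = URIRef(EX["gene/BSU00010"])
g.add((gene, RDF.type, EX.Gene))
g.add((gene, EX.symbol, Literal("dnaA")))
g.add((gene, EX.organism, Literal("Bacillus subtilis 168")))

# Serialise in Turtle so downstream tools and clients can consume it.
print(g.serialize(format="turtle"))
```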

    Systems biology approaches to a rational drug discovery paradigm

    The published manuscript is available at EurekaSelect via http://www.eurekaselect.com/openurl/content.php?genre=article&doi=10.2174/1568026615666150826114524.
    Prathipati P., Mizuguchi K. Systems biology approaches to a rational drug discovery paradigm. Current Topics in Medicinal Chemistry, 16(9), 1009. https://doi.org/10.2174/1568026615666150826114524

    Conceptualization of Computational Modeling Approaches and Interpretation of the Role of Neuroimaging Indices in Pathomechanisms for Pre-Clinical Detection of Alzheimer Disease

    With swift advancements in next-generation sequencing technologies alongside the voluminous growth of biological data, a diversity of data resources such as databases and web services has been created to facilitate data management, accessibility, and analysis. However, the burden of interoperability between dynamically growing data resources is an increasingly rate-limiting step in biomedicine, specifically concerning neurodegeneration. Over the years, massive investments and technological advancements in dementia research have resulted in large proportions of unmined data. Accordingly, there is an essential need for intelligent as well as integrative approaches to mine available data and substantiate novel research outcomes. Semantic frameworks provide a unique possibility to integrate multiple heterogeneous, high-resolution data resources with semantic integrity, using standardized ontologies and vocabularies for context-specific domains. In this work, (i) the functionality of a semantically structured terminology for mining pathway-relevant knowledge from the literature, called the Pathway Terminology System, is demonstrated, and (ii) a context-specific, high-granularity semantic framework for neurodegenerative diseases, known as NeuroRDF, is presented. Neurodegenerative disorders are especially complex, as they are characterized by widespread manifestations and the potential for dramatic alterations in disease progression over time. Early detection and prediction strategies through clinical pointers can provide promising solutions for the effective treatment of Alzheimer disease (AD). In the current work, we present the importance of bridging the gap between clinical and molecular biomarkers to effectively contribute to dementia research. Moreover, we address the need for a formalized framework, called NIFT, to automatically mine relevant clinical knowledge from the literature for substantiating high-resolution cause-and-effect models.
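    To make the terminology-based mining step concrete, the following is a generic sketch of dictionary-based term tagging in Python; the terms, concept identifiers, and example sentence are invented, and this is not the Pathway Terminology System implementation itself.

```python
# Generic sketch of dictionary-based terminology tagging, one simple way
# to mine pathway-relevant mentions from text. Illustrative only; the
# terminology entries and concept ids below are invented.
import re

# Toy terminology: surface forms mapped to (invented) concept identifiers.
TERMINOLOGY = {
    "amyloid beta": "PTS:0001",
    "tau phosphorylation": "PTS:0002",
    "oxidative stress": "PTS:0003",
}

def tag_terms(text: str) -> list[tuple[str, str, int]]:
    """Return (surface form, concept id, character offset) for each match."""
    hits = []
    for surface, concept in TERMINOLOGY.items():
        for m in re.finditer(re.escape(surface), text, flags=re.IGNORECASE):
            hits.append((surface, concept, m.start()))
    return sorted(hits, key=lambda h: h[2])

sentence = ("Amyloid beta accumulation and tau phosphorylation are "
            "linked to oxidative stress in Alzheimer disease.")
for surface, concept, offset in tag_terms(sentence):
    print(f"{offset:3d}  {concept}  {surface}")
```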

    Data integration strategies for informing computational design in synthetic biology

    PhD thesis. The potential design space for biological systems is complex, vast and multidimensional. Therefore, effective large-scale synthetic biology requires computational design and simulation. By constraining this design space, the time- and cost-efficient design of biological systems can be facilitated. One way in which a tractable design space can be achieved is to use the extensive and growing amount of biological data available to inform the design process. By using existing knowledge, design efforts can be focused on biologically plausible areas of the design space. However, biological data are large, incomplete, heterogeneous, and noisy, and must be integrated in a systematic fashion in order to maximise their benefit. To date, data integration has not been widely applied to design in synthetic biology. The aim of this project is to apply data integration techniques to facilitate the efficient design of novel biological systems. The specific focus is on the development and application of integration techniques for the design of genetic regulatory networks in the model bacterium Bacillus subtilis.

    A dataset was constructed by integrating data from a range of sources in order to capture existing knowledge about B. subtilis 168. The dataset is represented as a computationally accessible, semantically rich network which includes information concerning biological entities and their relationships. Also included are sequence-based features mined from the B. subtilis genome, which are a useful source of parts for synthetic biology. In addition, information about the interactions of these parts has been captured, in order to facilitate the construction of circuits with desired behaviours. This dataset was also modelled in the form of an ontology, providing a formal specification of parts and their interactions. The ontology is a major step towards the unification of the data required for modelling with a range of part catalogues specifically designed for synthetic biology. The data from the ontology are available to existing reasoners for implicit knowledge extraction. The ontology was applied to the automated identification of promoters, operators and coding sequences. Information from the ontology was also used to generate dynamic models of parts.

    The work described here contributed to the development of a formalism called Standard Virtual Parts (SVPs), which aims to represent models of biological parts in a standardised manner. SVPs comprise a mapping between biological parts and modular computational models. A genetic circuit designed at a part-level abstraction can be investigated in detail by analysing a circuit model composed of SVPs. The ontology was used to construct SVPs in the form of standard Systems Biology Markup Language (SBML) models. These models are publicly available from a computationally accessible repository, and include metadata which facilitates the computational composition of SVPs in order to create models of larger biological systems. To test a genetic circuit in vitro or in vivo, the genetic elements necessary to encode the entities in the in silico model, and their associated behaviour, must be derived. Ultimately, this process results in a specification for a synthesisable DNA sequence. For large models, particularly those that are produced computationally, the transformation process is challenging. To automate this process, a model-to-sequence conversion algorithm was developed and implemented as a Java application called MoSeC. Using MoSeC, both CellML and SBML models built with SVPs can be converted into DNA sequences ready to synthesise, as illustrated in the sketch after this abstract.

    Selection of the host bacterial cell for a synthetic genetic circuit is very important. In order not to interfere with the existing cellular machinery, orthogonal parts from other species are used, since these parts are less likely to have undesired interactions with the host. In order to find orthogonal transcription factors (OTFs) and their target binding sequences, a subset of the data from the integrated B. subtilis dataset was used. B. subtilis gene regulatory networks were used to reconstruct regulatory networks in closely related Bacillus species. The resulting system, called BacillusRegNet, stores both experimental data for B. subtilis and homology predictions in other species. BacillusRegNet was mined to extract OTFs and their binding sequences, in order to facilitate the engineering of novel regulatory networks in other Bacillus species. Although the techniques presented here were demonstrated using B. subtilis, they can be applied to any other organism. The approaches and tools developed as part of this project demonstrate the utility of this novel integrated approach to synthetic biology. Funding: EPSRC; NSF; the Newcastle University School of Computing Science.
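    The model-to-sequence step can be pictured as a walk over an ordered list of parts, concatenating their sequence annotations. The Python sketch below illustrates the idea under that assumption; the part catalogue and circuit are invented examples, and MoSeC itself is a Java application operating on SVP-based SBML and CellML models, not this code.

```python
# Minimal sketch of a model-to-sequence conversion in the spirit of MoSeC:
# walk the ordered parts referenced by a circuit model and emit the
# concatenated DNA sequence. Part names and sequences are invented.

# Hypothetical part catalogue: part id -> DNA sequence annotation.
PART_SEQUENCES = {
    "promoter_Pveg": "ATTTTACAAAAAGGTATTGAC",
    "rbs_spoVG":     "AAAGGTGGTGAACTACT",
    "cds_gfp":       "ATGAGTAAAGGAGAAGAACTT",   # truncated for brevity
    "terminator_T1": "CCAGGCATCAAATAAAACGAAAGG",
}

def model_to_sequence(circuit: list[str]) -> str:
    """Concatenate part sequences in circuit order, failing loudly on
    parts that carry no sequence annotation."""
    missing = [p for p in circuit if p not in PART_SEQUENCES]
    if missing:
        raise KeyError(f"no sequence for parts: {missing}")
    return "".join(PART_SEQUENCES[p] for p in circuit)

circuit = ["promoter_Pveg", "rbs_spoVG", "cds_gfp", "terminator_T1"]
print(model_to_sequence(circuit))
```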

    KnetMiner - An integrated data platform for gene mining and biological knowledge discovery

    Hassani-Pak K. KnetMiner - An integrated data platform for gene mining and biological knowledge discovery. Bielefeld: Universität Bielefeld; 2017.
    Discovery of novel genes that control important phenotypes and diseases is one of the key challenges in biological sciences. Now, in the post-genomics era, scientists have access to a vast range of genomes, genotypes, phenotypes and ‘omics data which, when used systematically, can help to gain new insights and make faster discoveries. However, the volume and diversity of such unintegrated data are often seen as a burden that only those with specialist bioinformatics skills, but often only minimal specialist biological knowledge, can penetrate. Therefore, new tools are required to allow researchers to connect, explore and compare large-scale datasets to identify the genes and pathways that control important phenotypes and diseases in plants, animals and humans. KnetMiner, with a silent "K" and standing for Knowledge Network Miner, is a suite of open-source software tools for integrating and visualising large biological datasets. The software mines the myriad databases that describe an organism’s biology to present links between relevant pieces of information, such as genes, biological pathways, phenotypes and publications, with the aim of providing leads for scientists who are investigating the molecular basis of a particular trait. The KnetMiner approach is based on 1) integration of heterogeneous, complex and interconnected biological information into a knowledge graph; 2) text-mining to enrich the knowledge graph with novel relations extracted from the literature; 3) graph queries of varying depths to find paths between genes and evidence nodes; 4) an evidence-based gene rank algorithm that combines graph and information theory; and 5) fast search and interactive knowledge visualisation techniques. Overall, [KnetMiner](http://knetminer.rothamsted.ac.uk) is a publicly available resource that helps scientists trawl diverse biological databases for clues to design better crop varieties and understand diseases. The key strength of KnetMiner is to include the end user in the “interactive” knowledge discovery process, with the goal of supporting human intelligence with machine intelligence.
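    Steps 3 and 4 of the approach above can be illustrated with a toy graph search: find paths between a candidate gene and evidence nodes, then score the gene by how close the evidence lies. The graph and the scoring rule below are invented stand-ins, not the published KnetMiner algorithm.

```python
# Toy sketch of path-based gene ranking: breadth-first search for paths
# between a gene and evidence nodes in a small knowledge graph, with a
# naive score rewarding short paths. Graph and scoring are invented.
from collections import deque

# Undirected toy knowledge graph: node -> neighbours.
GRAPH = {
    "GeneA": ["Pathway1", "Publication7"],
    "Pathway1": ["GeneA", "TraitX"],
    "Publication7": ["GeneA", "TraitX"],
    "TraitX": ["Pathway1", "Publication7"],
}

def shortest_path_length(graph, start, goal, max_depth=4):
    """BFS path length between two nodes, or None if none within max_depth."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if node == goal:
            return depth
        if depth < max_depth:
            for nbr in graph.get(node, []):
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append((nbr, depth + 1))
    return None

def score_gene(graph, gene, evidence_nodes):
    """Naive evidence score: sum of 1/depth over reachable evidence nodes."""
    total = 0.0
    for ev in evidence_nodes:
        d = shortest_path_length(graph, gene, ev)
        if d:
            total += 1.0 / d
    return total

print(score_gene(GRAPH, "GeneA", ["TraitX", "Publication7"]))  # 1.5
```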

    Data Infrastructure for Medical Research

    While we are witnessing rapid growth in data across the sciences and in many applications, this growth is particularly remarkable in the medical domain, be it because of higher-resolution instruments and diagnostic tools (e.g. MRI), new sources of structured data like activity trackers, the widespread use of electronic health records, and many others. The sheer volume of the data is not, however, the only challenge to be faced when using medical data for research. Other crucial challenges include data heterogeneity, data quality, and data privacy. In this article, we review solutions addressing these challenges by discussing the current state of the art in the areas of data integration, data cleaning, data privacy, and scalable data access and processing in the context of medical data. The techniques and tools we present will give practitioners — computer scientists and medical researchers alike — a starting point to understand the challenges and solutions, and ultimately to analyse medical data and gain better and quicker insights.

    Making Linked Data SPARQL with the InterMine Biological Data Warehouse

    InterMine is a system for integrating, analysing, and republishing biological data from multiple sources. It provides access to these data via a web user interface and programmatic web services. However, the precise invocation of services and subsequent exploration of returned data require substantial expertise on the structure of the underlying database. Here, we describe an approach that uses Semantic Web technologies to make InterMine data more broadly accessible and reusable, in accordance with the FAIR principles. We describe a pipeline to extract, transform, and load a Linked Data representation of the InterMine store. We use Docker to bring together SPARQL-aware applications to search, browse, explore, and query the InterMine-based data. Our work therefore extends the interoperability of the InterMine platform, and supports new query functionality across InterMine installations and the network of open Linked Data.
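    A consumer of such a Linked Data representation might query it roughly as in the sketch below; the endpoint URL and the im: vocabulary are placeholders, not the actual InterMine RDF schema or deployment.

```python
# Sketch of querying a Linked Data representation of an InterMine store
# over SPARQL. Endpoint URL and the im: vocabulary are hypothetical.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://localhost:8890/sparql")  # placeholder endpoint
endpoint.setReturnFormat(JSON)
endpoint.setQuery("""
    PREFIX im: <http://example.org/intermine/>   # placeholder vocabulary
    SELECT ?gene ?symbol
    WHERE {
        ?gene a im:Gene ;
              im:symbol ?symbol .
    }
    LIMIT 10
""")

# Print each gene URI with its symbol.
for row in endpoint.queryAndConvert()["results"]["bindings"]:
    print(row["gene"]["value"], row["symbol"]["value"])
```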

    Comparative genomics and metagenomics of bacteria

    The fields of genomics and metagenomics have provided immeasurable support to the advancement of our knowledge of bacterial genetics. Pathogenic bacteria are now routinely sequenced and analyzed to identify the factors causing their virulence and/or antibiotic resistance, as well as their ability to transmit these genetic elements of clinical interest. Commensal bacteria, for their part, are increasingly associated with human health and are studied using metagenomics to counter the difficulties associated with culturing them, given their wide range of metabolic needs. Next-generation sequencing technologies thus enable the mass production of DNA sequences for characterization and comparison purposes, in order to elucidate questions often related to human health. Advances in genomics and metagenomics require bioinformatics software able to manage, and adapt to, the massive and growing quantity of biological data.

    The first two hypotheses of this doctorate concerned the development of efficient and flexible methods for the analysis of bacterial genomes and metagenomes. Several bioinformatics analysis methods were explored and led to the implementation of two software tools to support the research hypotheses: Ray Surveyor and kAAmer. The first research hypothesis was to verify whether it is possible to compare genomes, from their simple content in DNA-sequence k-mers, with results analogous to standard genomic comparisons such as average nucleotide identity or phylogenetic trees, but without requiring sequence alignments. With the Ray Surveyor software and several bacterial genomic and metagenomic analyses, we demonstrated that such comparisons can be obtained using DNA sequences decomposed into k-mers. In the study presenting the results of this hypothesis, we also estimated the genotypic propensity of several bacterial species to phenotypes of clinical interest, using specialized gene databases. The second hypothesis was to test whether it is possible to develop software for protein sequence identification, based on amino acid k-mers, that outperforms existing software, specifically for the identification of proteins with a high degree of homology. This work led to the implementation of kAAmer, a software tool for building protein databases in which sequence search is performed by exact k-mer matching while still supporting sequence alignment. kAAmer proved very efficient for protein sequence search, with performance surpassing even the fastest sequence aligners in most scenarios. kAAmer also offers other useful features, such as the ability to host a database as a permanent service.

    Finally, the third and last research hypothesis aimed to validate whether the two software tools developed during the doctoral project (Ray Surveyor and kAAmer) produce viable results in a metagenomic analysis of the gut microbiota in relation to obesity. Taxonomic and functional profiling was performed with kAAmer, while the de novo comparison of metagenomes was investigated with Ray Surveyor. The results obtained were significant and showed, among other findings, a trend towards a higher relative abundance of the phylum Bacteroidetes and a lower relative abundance of the phyla Firmicutes and Actinobacteria in obese subjects. A multitude of metabolic functions also proved significantly different between the normal and obese metagenome conditions, with particular mention of short-chain fatty acid (SCFA) metabolism, which is known to be associated with obesity.
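    The alignment-free comparison at the heart of the first hypothesis can be pictured with k-mer sets: decompose each genome into overlapping k-mers and compare the resulting sets. The Python sketch below uses Jaccard similarity as a generic stand-in for such a measure; it is not Ray Surveyor's implementation, and the sequences are toy examples.

```python
# Generic sketch of alignment-free genome comparison from k-mer content,
# in the spirit of Ray Surveyor: decompose sequences into k-mers and
# compare the sets. Jaccard similarity is a common stand-in here, not
# necessarily the exact measure used by the tool.

def kmers(sequence: str, k: int = 21) -> set[str]:
    """All overlapping k-mers of a DNA sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def jaccard(a: set[str], b: set[str]) -> float:
    """Shared k-mer fraction between two k-mer sets."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

genome1 = "ACGTACGTGGCTAGCTAGGATCCGATCGAAGCTT" * 3  # toy sequences
genome2 = "ACGTACGTGGCTAGCTAGGATCCGATCGTTGCAA" * 3

print(f"k-mer Jaccard similarity: {jaccard(kmers(genome1), kmers(genome2)):.3f}")
```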