163,909 research outputs found

    Bioinformatics Databases: State of the Art and Research Perspectives

    Bioinformatics or computational biology, i.e. the application of mathematical and computer science methods to solving problems in molecular biology that require large-scale data, computation, and analysis, is a research area currently receiving considerable attention. Databases play an essential role in molecular biology and consequently in bioinformatics. Molecular biology data are often relatively cheap to produce, leading to a proliferation of databases: the number of bioinformatics databases accessible worldwide probably lies between 500 and 1,000. Not only molecular biology data, but also molecular biology literature and literature references are stored in databases. Bioinformatics databases are often very large (e.g. the sequence database GenBank contains more than 4 × 10^6 nucleotide sequences) and in general grow rapidly (e.g. about 8000 abstracts are added every month to the literature database PubMed). Bioinformatics databases are heterogeneous in their data, in their data modeling paradigms, in their management systems, and in the data analysis tools they support. Furthermore, bioinformatics databases are often implemented, queried, updated, and managed using methods rarely applied to other databases. This presentation aims to introduce current bioinformatics databases, stressing the aspects in which they depart from conventional databases. A more detailed survey can be found in [1] upon which thi

    Bioinformatics: Basics, Development, and Future

    Bioinformatics is an interdisciplinary scientific field of the life sciences. Bioinformatics research and application include the analysis of molecular sequence and genomics data; genome annotation, gene/protein prediction, and expression profiling; molecular folding, modeling, and design; building biological networks; development of databases and data management systems; development of software and analysis tools; bioinformatics services and workflow; mining of biomedical literature and text; and bioinformatics education and training. The astronomical accumulation of genomics, proteomics, and metabolomics data, as well as the need for their storage, analysis, annotation, organization, systematization, and integration into biological networks and database systems, were the main driving forces for the emergence and development of bioinformatics. The current critical needs for bioinformatics, among others highlighted in this chapter, however, are to understand the basics and specifics of bioinformatics and to prepare a new generation of scientists and specialists with integrated, interdisciplinary, and multilingual knowledge who can use modern bioinformatics resources powered by sophisticated operating systems, software, and database/networking technologies. In this introductory chapter, I aim to give readers an overall picture of the basics and developments of the bioinformatics field, with some future perspectives, highlighting the chapters published in this book

    Predictive and Personalized Medicine with Systems Biology Solutions

    Systems biology refers to the application of systems engineering and systems science techniques to the understanding of biological systems. At the Indiana Center for Systems Biology and Personalized Medicine (ICSBPM), we are particularly interested in developing systems biology techniques that can help narrow the gaps between basic biomedical research and clinical applications of genome sciences toward predictive and personalized medicine. In the past several years, ICSBPM has developed many critical informatics resources for the systems biology and personalized medicine community. The databases and software tools that we developed have supported systems biology and personalized medicine research communities at the national scale. These tools include: HPD, an integrated human pathway database and analysis tool (Chowbina et al., in BMC Bioinformatics 2009, 10(S11):S5); HAPPI, a human annotated and predicted protein interaction database (Chen et al., in BMC Genomics 2009, 10(S1):S16); HIP2, a Database of Healthy Human Individual's Integrated Plasma Proteome (Saha et al., in BMC Medical Genomics 2008, 1(1):12); PEPPI, a Peptidomic Database of Protein Isoforms (Zhou et al., in BMC Bioinformatics 2010, 11(S6):S7); ProteoLens, a multi-scale network visualization and data mining tool (Huan et al., in BMC Bioinformatics 2008, 9(S9):S5); GeneTerrain, a visual exploration tool for network-organized expression panel biomarker development (You et al., in Information Visualization 2010, 9(1)); and C-Maps, comprehensive molecular connectivity maps between disease-specific proteins and drugs (Li et al., in PLoS Computational Biology, 5(7), e1000450). These tools have been demonstrated to help improve tumor classifications, understand cancer biological systems at the systems scale, tackle biomarker discovery challenges, and facilitate clinical adoption of predictive models developed from computational techniques.
We hope that our experience and resources can cement collaborative translational medicine research towards predictive and personalized medicine applications

    ChlamyCyc - a comprehensive database and web-portal centered on _Chlamydomonas reinhardtii_

    *Background* - The unicellular green alga _Chlamydomonas reinhardtii_ is an important eukaryotic model organism for the study of photosynthesis and growth, as well as flagella development and other cellular processes. In the era of high-throughput technologies there is an imperative need to integrate large-scale data sets from high-throughput experimental techniques using computational methods and database resources to provide comprehensive information about the whole cellular system of a single organism.
*Results* - In the framework of the German Systems Biology initiative GoFORSYS a pathway/genome database and web-portal for _Chlamydomonas reinhardtii_ (ChlamyCyc) was established, which currently features about 270 metabolic pathways with related genes, enzymes, and compound information. ChlamyCyc was assembled using an integrative approach combining the recently published genome sequence, bioinformatics methods, and experimental data from metabolomics and proteomics experiments. We analyzed and integrated a combination of primary and secondary database resources, such as existing genome annotations from JGI, EST collections, orthology information, and MapMan classification.
*Conclusion* - ChlamyCyc provides a curated and integrated systems biology repository that will enable and assist in systematic studies of fundamental cellular processes in _Chlamydomonas reinhardtii_. The ChlamyCyc database and web-portal is freely available at http://chlamycyc.mpimp-golm.mpg.de

    BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments

    Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing (HPC) techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems (SWfMS) and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. This framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives, using queries to a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high performance, reducing the execution time of the case studies by up to 98%. We also show how the application of machine learning techniques can enrich the analysis process

    Novel developments in SBGN-ED and applications

    Systems Biology Graphical Notation (SBGN, http://sbgn.org) [1] is an emerging standard for graphical representations of biochemical and cellular processes studied in systems biology. Three different views (Process Description, Entity Relationship, and Activity Flow) cover several aspects of the represented processes at different levels of detail. SBGN helps to communicate biological knowledge more efficiently and accurately between different research communities in the life sciences. However, to support SBGN, methods and tools for editing, validating, and translating SBGN maps are necessary.
We present methods for these tasks and novel developments in SBGN-ED (www.sbgn-ed.org) [2], a tool that allows users to create all three types of SBGN maps from scratch, to validate these maps for syntactic and semantic correctness, to translate maps from the KEGG database into SBGN, and to export SBGN maps into several file and image formats. SBGN-ED is based on VANTED (Visualization and Analysis of NeTworks containing Experimental Data, http://www.vanted.org) [3].
As applications of SBGN and SBGN-ED, we furthermore present MetaCrop (http://metacrop.ipk-gatersleben.de) [4], a database that summarizes diverse information about metabolic pathways in crop plants, and RIMAS (Regulatory Interaction Maps of Arabidopsis Seed Development, http://rimas.ipk-gatersleben.de) [5], an information portal that provides a comprehensive overview of regulatory pathways and genetic interactions during Arabidopsis embryo and seed development.

[1] Le Novère, N. et al. (2009) The Systems Biology Graphical Notation. Nature Biotechnology, 27, 735-741.
[2] Czauderna, T., Klukas, C., Schreiber, F. (2010) Editing, validating, and translating of SBGN maps. Bioinformatics, 26 (18), 2340-2341.
[3] Junker, B.H., Klukas, C., Schreiber, F. (2006) VANTED: A system for advanced data analysis and visualization in the context of biological networks. BMC Bioinformatics, 7, 109.
[4] Grafahrend-Belau, E., Weise, S., Koschützki, D., Scholz, U., Junker, B.H., Schreiber, F. (2008) MetaCrop - A detailed database of crop plant metabolism. Nucleic Acids Research, 36, D954-D958.
[5] Junker, A., Hartmann, A., Schreiber, F., Bäumlein, H. (2010) An engineer's view on regulation of seed development. Trends in Plant Science, 15(6), 303-307.

    Towards Novel Nonparametric Statistical Methods and Bioinformatics Tools for Clinical and Translational Sciences

    As the field of functional genetics and genomics is beginning to mature, we are confronted with new challenges. The constant drop in price for sequencing and gene expression profiling, as well as the increasing number of genetic and genomic variables that can be measured, make it feasible to address more complex questions. The success with rare diseases caused by single loci or genes has provided us with a proof-of-concept that new therapies can be developed based on functional genomics and genetics. Common diseases, however, typically involve genetic epistasis, genomic pathways, and proteomic patterns. Moreover, to better understand the underlying biological systems, we often need to integrate information from several of these sources. Thus, as the field of clinical research moves toward complex diseases, the demand for modern database systems and advanced statistical methods increases. The traditional statistical methods implemented in most of the bioinformatics tools currently used in the novel field of genetics and functional genomics are based on the linear model and, thus, have shortcomings when applied to nonlinear biological systems. The previous work on partially ordered data (Wittkowski 1988; 1992), when combined with theoretical results (Hoeffding 1948) and computational strategies (Deuchler 1914), has opened a new field of nonparametric statistics. With grid technology, new tools are now feasible when screening for interactions between genetics (Wittkowski, Liu 2002) and functional genomics (Wittkowski, Lee 2004). Having more complex study designs and more specific methods available increases the demand for decision support when selecting appropriate bioinformatics tools. With the advent of rapid prototyping systems for Web-based database applications, we have recently begun to complement previous work on knowledge-based systems with graphical Web-based tools for acquisition of DESIGN and MODEL knowledge. Biostatistics Bioinformatics NIH NCRR ROADMAP

    SYSTOMONAS — an integrated database for systems biology analysis of Pseudomonas

    To provide an integrated bioinformatics platform for a systems biology approach to the biology of pseudomonads in infection and biotechnology, the database SYSTOMONAS (SYSTems biology of pseudOMONAS) was established. Besides our own experimental metabolome, proteome, and transcriptome data, various additional predictions of cellular processes, such as gene-regulatory networks, were stored. Reconstruction of metabolic networks in SYSTOMONAS was achieved via comparative genomics. Broad data integration is realized using SOAP interfaces for the well-established databases BRENDA, KEGG, and PRODORIC. Several tools for the analysis of stored data and for the visualization of the corresponding results are provided, enabling a quick understanding of metabolic pathways, genomic arrangements, or promoter structures of interest. The focus of SYSTOMONAS is on pseudomonads and in particular Pseudomonas aeruginosa, an opportunistic human pathogen. With this database we would like to encourage the Pseudomonas community to elucidate cellular processes of interest using an integrated systems biology strategy. The database is accessible at

    Enhancing GO for the sake of clinical bioinformatics

    Recent work on the quality assurance of the Gene Ontology (GO, Gene Ontology Consortium 2004) from the perspective of both linguistic and ontological organization has made it clear that GO lacks the kind of formalism needed to support logic-based reasoning. At the same time, it is no less clear that GO has proven itself to be an excellent terminological resource that can serve to combine a variety of biomedical database and information systems. Given the strengths of GO, it is worth investigating whether, by overcoming some of its weaknesses from the point of view of formal-ontological principles, we might be able to produce an enhanced version of GO which can come even closer to serving the needs of the various communities of biomedical researchers and practitioners. It is accepted that clinical informatics and bioinformatics need to find common ground if the results of data-intensive biomedical research are to be harvested to the full. It is also widely accepted that no single method will be sufficient to create the needed common framework. We believe that the principles-based approach to life-science data integration and knowledge representation must be one of the methods applied. Indeed, in dealing with the ontological representation of carcinomas, and specifically of colon carcinomas, we have established that, had GO (and related biomedical ontologies) followed some of the basic formal-ontological principles we have identified (Smith et al. 2004, Ceusters et al. 2004), then the effort required to navigate successfully between clinical and bioinformatics systems would have been reduced. We point here to the sources of ontologically-related errors in GO, and also provide arguments as to why and how such errors need to be resolved