Work flows in life science
The introduction of computer science technology in the life science domain has resulted in a new life science discipline called bioinformatics. Bioinformaticians are biologists who know how to apply computer science technology to perform computer-based experiments, also known as in silico or dry-lab experiments. Various tools, such as databases, web applications and scripting languages, are used to design and run in silico experiments. As the size and complexity of these experiments grow, new types of tools are required to design and execute the experiments and to analyse the results. Workflow systems promise to fulfil this role. The bioinformatician composes an experiment by using tools and web services as building blocks and connecting them, often through a graphical user interface. Workflow systems such as Taverna provide uniform access to as many as a few thousand resources. Although workflow systems are intended to make the bioinformaticians' work easier, bioinformaticians experience difficulties in using them. This thesis sets out to identify the problems bioinformaticians experience when using workflow systems and to provide solutions to these problems.
Biases in the Experimental Annotations of Protein Function and their Effect on Our Understanding of Protein Function Space
The ongoing functional annotation of proteins relies upon the work of curators to capture experimental findings from scientific literature and apply them to protein sequence and structure data. However, with the increasing use of high-throughput experimental assays, a small number of experimental studies dominate the functional protein annotations collected in databases. Here we investigate just how prevalent the "few articles -- many proteins" phenomenon is. We examine the experimentally validated annotation of proteins provided by several groups in the GO Consortium, and show that the distribution of proteins per published study is exponential, with 0.14% of articles providing the source of annotations for 25% of the proteins in the UniProt-GOA compilation. Since each of the dominant articles describes the use of an assay that can find only one function or a small group of functions, this leads to substantial biases in what we know about the function of many proteins. Mass spectrometry, microscopy and RNAi experiments dominate high-throughput experiments. Consequently, the functional information derived from these experiments is mostly about the subcellular location of proteins and the participation of proteins in embryonic developmental pathways. For some organisms, the information provided by different studies overlaps substantially. We also show that the information provided by high-throughput experiments is less specific than that provided by low-throughput experiments. Given the experimental techniques available, certain biases in protein function annotation due to high-throughput experiments are unavoidable. Knowing that these biases exist and understanding their characteristics and extent is important for database curators, developers of function annotation programs, and anyone who uses protein function annotation data to plan experiments.
Comment: Accepted to PLoS Computational Biology. Press embargo applies. v4: text corrected for style and supplementary material inserted
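A figure such as "0.14% of articles supply the annotations for 25% of the proteins" can be estimated from a mapping of articles to the proteins they annotate. The sketch below is our own illustration, not the authors' procedure: it assumes a hypothetical input dictionary from article IDs to sets of protein IDs, ranks articles by how many proteins they annotate, and counts how many top-ranked articles are needed before their union covers the target share of proteins. Because proteins annotated by several articles are counted once, this greedy ranking gives an upper bound on the smallest such set of articles rather than the exact minimum.

```python
from typing import Dict, Set


def fraction_of_articles_covering(
    annotations: Dict[str, Set[str]], target_share: float
) -> float:
    """Fraction of articles needed, largest first, to cover `target_share`
    of all annotated proteins (proteins shared by articles count once)."""
    all_proteins: Set[str] = set().union(*annotations.values())
    if not all_proteins:
        return 0.0
    needed = target_share * len(all_proteins)
    covered: Set[str] = set()
    # Rank articles by how many proteins each one annotates, most prolific first.
    ranked = sorted(annotations.values(), key=len, reverse=True)
    for used, proteins in enumerate(ranked, start=1):
        covered |= proteins
        if len(covered) >= needed:
            return used / len(annotations)
    return 1.0  # only reached if target_share > 1


# Hypothetical toy data: one high-throughput screen and seven small studies.
toy = {
    "ht_screen": {"P1", "P2", "P3", "P4", "P5", "P6"},
    "lt_a": {"P7"}, "lt_b": {"P8"}, "lt_c": {"P9"},
    "lt_d": {"P10"}, "lt_e": {"P11"}, "lt_f": {"P12"},
    "lt_g": {"P1"},  # overlaps with the screen
}
print(fraction_of_articles_covering(toy, 0.25))  # 1 of 8 articles -> 0.125
```

In the toy data a single high-throughput article annotates half of the twelve proteins, so it alone clears the 25% threshold, while covering every protein needs seven of the eight articles.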
Making open data work for plant scientists
Despite the clear demand for open data sharing, its implementation within plant science is still limited. This is, at least in part, because open data sharing raises several unanswered questions and challenges to current research practices. In this commentary, some of the challenges encountered by plant researchers at the bench when generating, interpreting, and attempting to disseminate their data are highlighted. The difficulties involved in sharing sequencing, transcriptomics, proteomics, and metabolomics data are reviewed. The benefits and drawbacks of three data-sharing venues currently available to plant scientists are identified and assessed: (i) journal publication; (ii) university repositories; and (iii) community and project-specific databases. It is concluded that community and project-specific databases are the most useful to researchers interested in effective data sharing, since these databases are explicitly created to meet the researchers’ needs, support extensive curation, and embody a heightened awareness of what it takes to make data reusable by others. Such bottom-up and community-driven approaches need to be valued by the research community, supported by publishers, and provided with long-term sustainable support by funding bodies and government. At the same time, these databases need to be linked to generic databases where possible, so that they are discoverable by the majority of researchers and thus promote effective and efficient data sharing. As we look forward to a future that embraces open access to data and publications, it is essential that data policies, data curation, data integration, data infrastructure, and data funding are linked together so as to foster data access and research productivity.
Smart Environments for Collaborative Design, Implementation, and Interpretation of Scientific Experiments
Ambient intelligence promises to enable humans to interact smoothly with their environment, mediated by computer technology. In the literature on ambient intelligence, empirical scientists are not often mentioned. Yet they form an interesting target group for this technology. In this position paper, we describe a project aimed at realising an ambient intelligence environment for face-to-face meetings of researchers with different academic backgrounds involved in molecular biology “omics” experiments. In particular, microarray experiments are a focus of attention because these experiments require multidisciplinary collaboration for their design, analysis, and interpretation. Such an environment is characterised by a high degree of complexity that has to be mitigated by ambient intelligence technology. By experimenting in a real-life setting, we will learn more about life scientists as a user group.
Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences
Increased reliance on computational approaches in the life sciences has revealed grave concerns about how accessible and reproducible computation-reliant results truly are. Galaxy (http://usegalaxy.org), an open, web-based platform for genomic research, addresses these problems. Galaxy automatically tracks and manages data provenance and provides support for capturing the context and intent of computational methods. Galaxy Pages are interactive, web-based documents that provide users with a medium to communicate a complete computational analysis.
Protein Bioinformatics Infrastructure for the Integration and Analysis of Multiple High-Throughput “omics” Data
High-throughput “omics” technologies bring new opportunities for biological and biomedical researchers to ask complex questions and gain new scientific insights. However, the voluminous, complex, and context-dependent data maintained in heterogeneous and distributed environments, together with the lack of well-defined data standards and standardized nomenclature, pose a major challenge. Meeting it requires advanced computational methods and bioinformatics infrastructures for integration, mining, visualization, and comparative analysis to facilitate data-driven hypothesis generation and biological knowledge discovery. In this paper, we present the challenges in high-throughput “omics” data integration and analysis, introduce a protein-centric approach for the systems integration of large and heterogeneous high-throughput “omics” data, including microarray, mass spectrometry, protein sequence, protein structure, and protein interaction data, and use a scientific case study to illustrate how varied “omics” data from different laboratories can be used to make useful connections that could lead to new biological knowledge.
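One way to picture a protein-centric integration is to map each source's own identifiers (array probes, peptides, gene names) onto a shared protein accession and group the evidence under that key. The sketch is a minimal illustration under our own assumptions, not the infrastructure described in the paper: the function `integrate_by_protein`, the record layout, the identifier map and the `PROT_*` accessions are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Record = Tuple[str, str, Dict[str, Any]]  # (source, source-specific ID, payload)


def integrate_by_protein(
    records: List[Record], id_map: Dict[Tuple[str, str], str]
) -> Tuple[Dict[str, Dict[str, List[Dict[str, Any]]]], List[Tuple[str, str]]]:
    """Group records from heterogeneous sources under a shared protein key.

    Records whose (source, ID) pair has no entry in `id_map` are returned
    separately so that mapping gaps stay visible instead of being dropped."""
    merged: Dict[str, Dict[str, List[Dict[str, Any]]]] = defaultdict(
        lambda: defaultdict(list)
    )
    unmapped: List[Tuple[str, str]] = []
    for source, source_id, payload in records:
        accession = id_map.get((source, source_id))
        if accession is None:
            unmapped.append((source, source_id))
            continue
        merged[accession][source].append(payload)
    # Convert the nested defaultdicts to plain dicts for a clean result.
    return {acc: dict(by_source) for acc, by_source in merged.items()}, unmapped


# Hypothetical identifier map and records from three kinds of "omics" data.
id_map = {
    ("microarray", "probe_1"): "PROT_A",
    ("mass_spec", "pep_7"): "PROT_A",
    ("interaction", "geneB"): "PROT_B",
}
records = [
    ("microarray", "probe_1", {"log2_fold_change": 1.5}),
    ("mass_spec", "pep_7", {"spectral_count": 12}),
    ("interaction", "geneB", {"partner": "PROT_A"}),
    ("microarray", "probe_9", {"log2_fold_change": -0.3}),
]
by_protein, missing = integrate_by_protein(records, id_map)
print(sorted(by_protein["PROT_A"]))  # ['mass_spec', 'microarray']
print(missing)                       # [('microarray', 'probe_9')]
```

Keeping unmapped records in a separate list, rather than silently discarding them, reflects the abstract's point that missing standards and nomenclature are themselves part of the integration problem.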
Automatic annotation of bioinformatics workflows with biomedical ontologies
Legacy scientific workflows, and the services within them, often have scarce and unstructured (i.e. textual) descriptions. This makes them difficult to find, share and reuse, dramatically reducing their value to the community. This paper presents an approach to annotating workflows and their subcomponents with ontology terms, in an attempt to describe these artifacts in a structured way. Despite a dearth of even textual descriptions, we automatically annotated 530 myExperiment bioinformatics-related workflows, including more than 2600 workflow-associated services, with relevant ontological terms. Quantitative evaluation of the Information Content of these terms suggests that, in cases where annotation was possible at all, the annotation quality was comparable to that of manually curated bioinformatics resources.
Comment: 6th International Symposium on Leveraging Applications (ISoLA 2014 conference), 15 pages, 4 figures
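The abstract does not define Information Content. A common definition for ontology annotations (Resnik's) is IC(t) = -log p(t), where p(t) is the fraction of all annotations made to term t or to any of its descendants, so general terms score near zero and specific terms score high. The sketch assumes that definition, a hypothetical child-to-parents map for a small ontology fragment, and hypothetical annotation counts; it is not necessarily the measure used in the paper.

```python
import math
from typing import Dict, List, Set


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return `term` plus every term reachable through parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def information_content(
    direct_counts: Dict[str, int], parents: Dict[str, List[str]]
) -> Dict[str, float]:
    """IC(t) = -log p(t), with p(t) the share of annotations to t or its descendants."""
    total = sum(direct_counts.values())
    if total == 0:
        return {}
    propagated: Dict[str, int] = {}
    for term, count in direct_counts.items():
        # Each annotation also counts toward every ancestor, but only once per ancestor.
        for anc in ancestors(term, parents):
            propagated[anc] = propagated.get(anc, 0) + count
    return {t: -math.log(c / total) for t, c in propagated.items() if c > 0}


# Hypothetical ontology fragment (child -> parents) and annotation counts.
parents = {
    "operation": [],
    "alignment": ["operation"],
    "sequence_alignment": ["alignment"],
    "annotation": ["operation"],
}
counts = {"sequence_alignment": 1, "alignment": 1, "annotation": 2}
ic = information_content(counts, parents)
print(round(ic["sequence_alignment"], 3))  # 1.386, i.e. log(4)
```

The root term ends up with an IC of zero because every annotation propagates to it, while the most specific term, annotated least often, gets the highest score.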