A Linked Data Approach to Sharing Workflows and Workflow Results
A bioinformatics analysis pipeline is often highly elaborate, due to the inherent complexity of biological systems and the variety and size of datasets. A digital equivalent of the ‘Materials and Methods’ section in wet laboratory publications would be highly beneficial to bioinformatics, for evaluating evidence and examining data across related experiments, while introducing the potential to find associated resources and integrate them as data and services. We present initial steps towards preserving bioinformatics ‘materials and methods’ by exploiting the workflow paradigm for capturing the design of a data analysis pipeline, and RDF to link the workflow, its component services, run-time provenance, and a personalized biological interpretation of the results. An example shows the reproduction of the unique graph of an analysis procedure, its results, provenance, and personal interpretation of a text mining experiment. It links data from Taverna, myExperiment.org, BioCatalogue.org, and ConceptWiki.org. The approach is relatively ‘light-weight’ and unobtrusive to bioinformatics users
Evaluating the semantic web: a task-based approach
The increased availability of online knowledge has led to the design of several algorithms that solve a variety of tasks by harvesting the Semantic Web, i.e. by dynamically selecting and exploring a multitude of online ontologies. Our hypothesis is that the performance of such novel algorithms implicitly provides an insight into the quality of the used ontologies and thus opens the way to a task-based evaluation of the Semantic Web. We have investigated this hypothesis by studying the lessons learnt about online ontologies when used to solve three tasks: ontology matching, folksonomy enrichment, and word sense disambiguation. Our analysis leads to a suite of conclusions about the status of the Semantic Web, which highlight a number of strengths and weaknesses of the semantic information available online and complement the findings of other analyses of the Semantic Web landscape
rEHR: An R package for manipulating and analysing Electronic Health Record data
Research with structured Electronic Health Records (EHRs) is expanding as data becomes more accessible, analytic methods advance, and the scientific validity of such studies is increasingly accepted. However, data science methodology to enable the rapid searching/extraction, cleaning and analysis of these large, often complex, datasets is less well developed. In addition, commonly used software is inadequate, resulting in bottlenecks in research workflows and in obstacles to increased transparency and reproducibility of the research. Preparing a research-ready dataset from EHRs is a complex and time-consuming task requiring substantial data science skills, even for simple designs. In addition, certain aspects of the workflow are computationally intensive, for example extraction of longitudinal data and matching controls to a large cohort, which may take days or even weeks to run using standard software. The rEHR package simplifies and accelerates the process of extracting ready-for-analysis datasets from EHR databases. It has a simple import function to a database backend that greatly accelerates data access times. A set of generic query functions allows users to extract data efficiently without needing detailed knowledge of SQL queries. Longitudinal data extractions can also be made in a single command, making use of parallel processing. The package also contains functions for cutting data by time-varying covariates, matching controls to cases, unit conversion and construction of clinical code lists. There are also functions to synthesise dummy EHR. The package has been tested with one of the largest primary care EHRs, the Clinical Practice Research Datalink (CPRD), but allows for a common interface to other EHRs. This simplified and accelerated workflow for EHR data extraction results in simpler, cleaner scripts that are more easily debugged, shared and reproduced
A Digital Repository and Execution Platform for Interactive Scholarly Publications in Neuroscience
The CARMEN Virtual Laboratory (VL) is a cloud-based platform which allows neuroscientists to store, share, develop, execute, reproduce and publicise their work. This paper describes new functionality in the CARMEN VL: an interactive publications repository. This new facility allows users to link data and software to publications. This enables other users to examine data and software associated with the publication and execute the associated software within the VL using the same data as the authors used in the publication. The cloud-based architecture and SaaS (Software as a Service) framework allow vast data sets to be uploaded and analysed using software services. Thus, this new interactive publications facility allows others to build on research results through reuse. This aligns with recent developments by funding agencies, institutions, and publishers with a move to open access research. Open access provides reproducibility and verification of research resources and results. Publications and their associated data and software will be assured of long-term preservation and curation in the repository. Further, analysing research data and the evaluations described in publications frequently requires a number of execution stages, many of which are iterative. The VL provides a scientific workflow environment to combine software services into a processing tree. These workflows can also be associated with publications and executed by users. The VL also provides a secure environment where users can decide the access rights for each resource to ensure copyright and privacy restrictions are met
Three Essential Ribonucleases—RNase Y, J1, and III—Control the Abundance of a Majority of Bacillus subtilis mRNAs
Bacillus subtilis possesses three essential enzymes thought to be involved in mRNA decay to varying degrees, namely RNase Y, RNase J1, and RNase III. Using recently developed high-resolution tiling arrays, we examined the effect of depletion of each of these enzymes on RNA abundance over the whole genome. The data are consistent with a model in which the degradation of a significant number of transcripts is dependent on endonucleolytic cleavage by RNase Y, followed by degradation of the downstream fragment by the 5′–3′ exoribonuclease RNase J1. However, many full-size transcripts also accumulate under conditions of RNase J1 insufficiency, compatible with a model whereby RNase J1 degrades transcripts either directly from the 5′ end or very close to it. Although the abundance of a large number of transcripts was altered by depletion of RNase III, this appears to result primarily from indirect transcriptional effects. Lastly, RNase depletion led to the stabilization of many low-abundance potential regulatory RNAs, both in intergenic regions and in the antisense orientation to known transcripts
Global Regulatory Functions of the Staphylococcus aureus Endoribonuclease III in Gene Expression
RNA turnover plays an important role in both virulence and adaptation to stress in the Gram-positive human pathogen Staphylococcus aureus. However, the molecular players and mechanisms involved in these processes are poorly understood. Here, we explored the functions of S. aureus endoribonuclease III (RNase III), a member of the ubiquitous family of double-strand-specific endoribonucleases. To define genomic transcripts that are bound and processed by RNase III, we performed deep sequencing on cDNA libraries generated from RNAs that were co-immunoprecipitated with wild-type RNase III or two different cleavage-defective mutant variants in vivo. Several newly identified RNase III targets were validated by independent experimental methods. We identified various classes of structured RNAs as RNase III substrates and demonstrated that this enzyme is involved in the maturation of rRNAs and tRNAs, regulates the turnover of mRNAs and non-coding RNAs, and autoregulates its synthesis by cleaving within the coding region of its own mRNA. Moreover, we identified a positive effect of RNase III on protein synthesis based on novel mechanisms. RNase III–mediated cleavage in the 5′ untranslated region (5′UTR) enhanced the stability and translation of cspA mRNA, which encodes the major cold-shock protein. Furthermore, RNase III cleaved overlapping 5′UTRs of divergently transcribed genes to generate leaderless mRNAs, which constitutes a novel way to co-regulate neighboring genes. In agreement with recent findings, low abundance antisense RNAs covering 44% of the annotated genes were captured by co-immunoprecipitation with RNase III mutant proteins. Thus, in addition to gene regulation, RNase III is associated with RNA quality control of pervasive transcription. Overall, this study illustrates the complexity of post-transcriptional regulation mediated by RNase III