5 research outputs found

    Ontology-Based Queries over Cancer Data

    The ever-increasing amount of data in biomedical research, and in cancer research in particular, needs to be managed to support efficient data access, exchange and integration. Existing software infrastructures, such as caGrid, support access to distributed information annotated with a domain ontology. However, caGrid's current querying functionality depends on the structure of individual data resources and does not exploit the semantic annotations. In this paper, we present the design and development of an ontology-based querying functionality that consists of (i) the generation of OWL2 ontologies from the underlying data resources’ metadata and (ii) a query rewriting and translation process, based on reasoning, which converts a query at the domain ontology level into queries at the software infrastructure level. We present a detailed analysis of our approach as well as an extensive performance evaluation. While the implementation and evaluation were performed for the caGrid infrastructure, the approach could be applicable to other model- and metadata-driven environments for data sharing.
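    The general idea of reasoning-based query rewriting can be illustrated with a minimal sketch. It assumes a toy class hierarchy (SUBCLASSES) and hypothetical annotations linking ontology concepts to data-resource classes (ANNOTATIONS); none of these names come from caGrid, and the paper's OWL2 generation and reasoner are not modelled. A query for a domain concept is expanded to every concept it subsumes, then translated into the resource-level classes annotated with those concepts.

```python
from collections import deque

# Hypothetical toy ontology: each concept maps to its direct subclasses.
SUBCLASSES = {
    "Neoplasm": ["Carcinoma", "Sarcoma"],
    "Carcinoma": ["Adenocarcinoma"],
    "Sarcoma": [],
    "Adenocarcinoma": [],
}

# Hypothetical annotations: data-resource classes annotated with each concept.
ANNOTATIONS = {
    "Carcinoma": ["ServiceA.Tumor"],
    "Adenocarcinoma": ["ServiceB.Specimen"],
}


def subsumed_concepts(concept):
    """Return the concept plus every concept it subsumes (breadth-first closure)."""
    seen, queue = {concept}, deque([concept])
    while queue:
        current = queue.popleft()
        for child in SUBCLASSES.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def rewrite_query(concept):
    """Rewrite a domain-level concept query into resource-level class queries."""
    targets = set()
    for c in subsumed_concepts(concept):
        targets.update(ANNOTATIONS.get(c, []))
    return sorted(targets)


print(rewrite_query("Neoplasm"))  # ['ServiceA.Tumor', 'ServiceB.Specimen']
```

    Without the subsumption step, a query for "Neoplasm" would match nothing, because no resource is annotated with that concept directly; this is the gap between structure-based and semantics-based querying that the abstract describes.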

    The emergence of openness in open source projects : the case of openEHR

    The meaning of openness in open source is both intrinsically unstable and dynamic, and tends to fluctuate with time and context. We draw on a very particular open-source project, openEHR, which is primarily concerned with building rigorous clinical concepts for use in electronic health records. The case of openEHR shows how openness is a concept that is purposely engaged with, and how, in this process of engagement, the very meaning of ‘open’ matures and evolves within the project. Drawing on rich longitudinal data related to openEHR, we theorise the evolving nature of openness and how this idea emerges through two intertwined processes: maturation and metamorphosis. While metamorphosis allows us to trace and interrogate the mutational evolution in openness, maturation analyses the small, careful changes crafted to build a very particular understanding of openness. Metamorphosis is less managed and controlled, whereas maturation represents highly precise work carried out in a controlled form. Both processes work together in open-source projects and reinforce each other. Our study reveals that openness emerges and evolves in open-source projects, where it can be understood to mean rigour, the ability to participate, open implementation and an open process. Our work contributes to a deeper theorisation of what it means to be an open-source project. The multiple, co-existing meanings of ‘open’ imply that open-source projects evolve in nonlinear ways, where each critical meaning of openness prompts reflective questioning by the community of its continued status and existence.

    Data integration strategies for informing computational design in synthetic biology

    PhD Thesis

    The potential design space for biological systems is complex, vast and multidimensional. Therefore, effective large-scale synthetic biology requires computational design and simulation. By constraining this design space, the time- and cost-efficient design of biological systems can be facilitated. One way in which a tractable design space can be achieved is to use the extensive and growing amount of biological data available to inform the design process. By using existing knowledge, design efforts can be focused on biologically plausible areas of design space. However, biological data is large, incomplete, heterogeneous, and noisy. Data must be integrated in a systematic fashion in order to maximise its benefit. To date, data integration has not been widely applied to design in synthetic biology. The aim of this project is to apply data integration techniques to facilitate the efficient design of novel biological systems. The specific focus is on the development and application of integration techniques for the design of genetic regulatory networks in the model bacterium Bacillus subtilis. A dataset was constructed by integrating data from a range of sources in order to capture existing knowledge about B. subtilis 168. The dataset is represented as a computationally accessible, semantically rich network which includes information concerning biological entities and their relationships. Also included are sequence-based features mined from the B. subtilis genome, which are a useful source of parts for synthetic biology. In addition, information about the interactions of these parts has been captured, in order to facilitate the construction of circuits with desired behaviours. This dataset was also modelled in the form of an ontology, providing a formal specification of parts and their interactions.
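    Integrating several sources into one network of entities and relationships can be sketched as merging (subject, relation, object) triples while recording which sources support each edge. This is a minimal illustration under stated assumptions: the source names, identifiers and relation labels are hypothetical placeholders, and identifier normalisation across sources (a major part of real integration) is assumed to have already happened.

```python
from collections import defaultdict


def integrate(sources):
    """Merge (subject, relation, object) triples from several sources into one
    network, keeping the set of source names that support each edge."""
    network = defaultdict(set)  # (subject, relation, object) -> supporting sources
    for source_name, triples in sources.items():
        for subject, relation, obj in triples:
            network[(subject, relation, obj)].add(source_name)
    return network


# Hypothetical sources and identifiers, for illustration only.
sources = {
    "sourceA": [("tfX", "represses", "geneA"), ("geneA", "encodes", "proteinA")],
    "sourceB": [("tfX", "represses", "geneA"), ("tfX", "binds", "operatorO1")],
}
network = integrate(sources)
for edge, provenance in sorted(network.items()):
    print(edge, sorted(provenance))
```

    Keeping provenance per edge means an interaction reported by several sources is stored once, while the evidence behind it remains inspectable.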
The ontology is a major step towards the unification of the data required for modelling with a range of part catalogues specifically designed for synthetic biology. The data from the ontology are available to existing reasoners for extracting implicit knowledge. The ontology was applied to the automated identification of promoters, operators and coding sequences. Information from the ontology was also used to generate dynamic models of parts. The work described here contributed to the development of a formalism called Standard Virtual Parts (SVPs), which aims to represent models of biological parts in a standardised manner. SVPs comprise a mapping between biological parts and modular computational models. A genetic circuit designed at a part-level abstraction can be investigated in detail by analysing a circuit model composed of SVPs. The ontology was used to construct SVPs in the form of standard Systems Biology Markup Language (SBML) models. These models are publicly available from a computationally accessible repository, and include metadata which facilitates the computational composition of SVPs in order to create models of larger biological systems. To test a genetic circuit in vitro or in vivo, the genetic elements necessary to encode the entities in the in silico model, and their associated behaviour, must be derived. Ultimately, this process results in the specification of a synthesisable DNA sequence. For large models, particularly those that are produced computationally, this transformation is challenging. To automate it, a model-to-sequence conversion algorithm was developed and implemented as a Java application called MoSeC. Using MoSeC, both CellML and SBML models built with SVPs can be converted into DNA sequences ready for synthesis. The choice of host bacterial cell for a synthetic genetic circuit is very important.
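    The core of a model-to-sequence conversion can be sketched as ordering the parts used in a circuit model and concatenating their DNA sequences, failing loudly if any part has no sequence. This is not MoSeC's algorithm (MoSeC is a Java application whose internals are not described here); it assumes a hypothetical linear circuit given as an ordered list of part identifiers, and the part names and sequences below are placeholders rather than real parts.

```python
def model_to_sequence(part_order, part_sequences):
    """Concatenate the DNA sequences of parts in the order they appear in a
    (hypothetical) linear circuit model, normalising to upper case."""
    missing = [p for p in part_order if p not in part_sequences]
    if missing:
        raise KeyError(f"No sequence for parts: {missing}")
    return "".join(part_sequences[p].upper() for p in part_order)


# Placeholder parts and sequences, not real biological parts.
parts = {
    "promoter1": "ttgaca",
    "rbs1": "aggagg",
    "cds1": "ATGAAATAA",
    "terminator1": "gcgcgc",
}
print(model_to_sequence(["promoter1", "rbs1", "cds1", "terminator1"], parts))
# TTGACAAGGAGGATGAAATAAGCGCGC
```

    A real converter also has to handle parts encoded on the reverse strand, spacer sequences and multiple transcriptional units, which this sketch deliberately omits.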
To avoid interfering with the existing cellular machinery, orthogonal parts from other species are used, since these parts are less likely to have undesired interactions with the host. To find orthogonal transcription factors (OTFs) and their target binding sequences, a subset of the data from the integrated B. subtilis dataset was used. B. subtilis gene regulatory networks were used to reconstruct regulatory networks in closely related Bacillus species. The system, called BacillusRegNet, stores both experimental data for B. subtilis and homology predictions in other species. BacillusRegNet was mined to extract OTFs and their binding sequences, in order to facilitate the engineering of novel regulatory networks in other Bacillus species. Although the techniques presented here were demonstrated using B. subtilis, they can be applied to any other organism. The approaches and tools developed as part of this project demonstrate the utility of this novel integrated approach to synthetic biology.

    EPSRC; NSF; The Newcastle University School of Computing Science
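    Reconstructing a regulatory network in a related species by homology can be sketched as transferring each known (regulator, target) pair when both genes have an ortholog in the other species. This is a minimal sketch, not BacillusRegNet's actual procedure: the identifiers and ortholog map are hypothetical, and real predictions would also weigh sequence similarity and binding-site conservation.

```python
def transfer_network(interactions, orthologs):
    """Predict regulatory interactions in a related species: a known
    (regulator, target) pair is transferred only when both genes have an ortholog."""
    predicted = []
    for regulator, target in interactions:
        if regulator in orthologs and target in orthologs:
            predicted.append((orthologs[regulator], orthologs[target]))
    return predicted


# Hypothetical identifiers; a real pipeline would derive orthologs from sequence similarity.
known = [("tfX", "geneA"), ("tfX", "geneB"), ("tfY", "geneC")]
orthologs_in_other_species = {"tfX": "tfX_2", "geneA": "geneA_2", "tfY": "tfY_2"}
print(transfer_network(known, orthologs_in_other_species))  # [('tfX_2', 'geneA_2')]
```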