
    Ontology based data integration in life sciences

    The aim of this thesis is to develop standard and practical approaches for the semantic integration of biological data and services. The thesis considers various scenarios where ontologies may benefit bioinformatics web service development, integration and provenance. In spite of the broad use of ontologies in biology, their usage is usually limited to the definition of taxonomic hierarchies. This thesis examines the utility of ontologies for data integration in the context of semantic web service development. Ontologies that define biological datatypes are very valuable for data integration, especially in a context of continuously changing standards. The thesis evaluates the outdated BioMoby ontology for the generation of modern WS-I and RESTful web services. Another important aspect is the use of ontologies for web service description. The thesis evaluates the W3C WSDL ontology for the description and provenance of bioinformatics web services. Finally, integration with modern workflow execution platforms such as Taverna and Galaxy is also considered. Despite the growing popularity of the JSON format, web services still depend heavily on the XML type system. The OWL2XS tool facilitates semantic web service development by automatically generating an XML Schema from an appropriate OWL 2 datatype ontology. Web service integration is hard to achieve without broad adoption of standards. The BioNemus application automatically generates standards-based web services from BioMoby ontologies. A semantic representation of web service descriptions simplifies their search and annotation. The Semantic Web Services Registry (BioSWR) is based on the W3C WSDL ontology and provides a multifaceted view of web services in different formats: OWL 2, WSDL 1.1, WSDL 2.0 and WADL. To demonstrate the benefits of ontology-based web service descriptions, a BioSWR Taverna OSGi plug-in has been developed. A new, experimental, generic Taverna WSDL library has been used in the Galaxy Gears tool, which allows web services to be integrated into Galaxy workflows. The thesis explores the scope of applying ontologies to the integration of biological data and services, providing a broad set of original tools.
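
The abstract above does not spell out the mapping rules OWL2XS applies, so the following is only a minimal Python sketch of the general idea it describes: turning an OWL 2 datatype class with data properties into an XML Schema complexType. The class name, property names, and the small type table are hypothetical illustrations, not OWL2XS output.

```python
# Minimal sketch (not OWL2XS itself): map a hypothetical OWL 2 datatype class
# with data properties onto an XML Schema complexType, the kind of
# transformation the abstract attributes to OWL2XS.
from dataclasses import dataclass
from xml.sax.saxutils import escape

XSD = {"string": "xs:string", "int": "xs:integer", "float": "xs:double"}

@dataclass
class DataProperty:
    name: str          # OWL data property local name
    range_type: str    # datatype of the property range
    required: bool     # True if the property must appear at least once

def class_to_xsd(class_name: str, properties: list[DataProperty]) -> str:
    """Render one OWL class as an xs:complexType with one element per property."""
    lines = [f'<xs:complexType name="{escape(class_name)}">', "  <xs:sequence>"]
    for p in properties:
        min_occurs = "1" if p.required else "0"
        lines.append(
            f'    <xs:element name="{escape(p.name)}" type="{XSD[p.range_type]}" '
            f'minOccurs="{min_occurs}" maxOccurs="1"/>'
        )
    lines += ["  </xs:sequence>", "</xs:complexType>"]
    return "\n".join(lines)

# Hypothetical biological datatype, in the spirit of a BioMoby-style record.
print(class_to_xsd("AnnotatedSequence", [
    DataProperty("identifier", "string", required=True),
    DataProperty("sequence", "string", required=True),
    DataProperty("length", "int", required=False),
]))
```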

    XMPP for cloud computing in bioinformatics supporting discovery and invocation of asynchronous web services

    Background: Life sciences make heavy use of the web for both data provision and analysis. However, the increasing amount of available data and the diversity of analysis tools call for machine-accessible interfaces in order to be effective. HTTP-based web service technologies, like the Simple Object Access Protocol (SOAP) and REpresentational State Transfer (REST) services, are today the most common technologies for this in bioinformatics. However, these methods have severe drawbacks, including lack of discoverability and the inability for services to send status notifications. Several complementary workarounds have been proposed, but the results are ad-hoc solutions of varying quality that can be difficult to use. Results: We present a novel approach based on the open standard Extensible Messaging and Presence Protocol (XMPP), consisting of an extension (IO Data) that covers discovery, asynchronous invocation, and definition of the data types used by the service. Because XMPP cloud services are capable of asynchronous communication, clients do not have to poll repetitively for status; instead, the service sends the results back to the client upon completion. Implementations for Bioclipse and Taverna are presented, as are various XMPP cloud services in bio- and cheminformatics. Conclusion: XMPP with its extensions is a powerful protocol for cloud services that demonstrates several advantages over traditional HTTP-based web services: 1) services are discoverable without the need for an external registry, 2) asynchronous invocation eliminates the need for ad-hoc solutions like polling, and 3) input and output types defined in the service allow clients to be generated on the fly without the need for an external semantics description. These advantages over existing technologies make XMPP a highly interesting candidate for next-generation online services in bioinformatics.
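
The abstract contrasts XMPP's push-style notifications with HTTP polling. The sketch below illustrates only that contrast, using plain Python asyncio rather than a real XMPP library or the IO Data extension; the service name and payload are hypothetical.

```python
# Illustration of the polling-vs-notification contrast drawn above.
# Generic asyncio code, not the IO Data XMPP extension itself.
import asyncio

async def run_blast(query: str, done: asyncio.Future) -> None:
    """Hypothetical long-running service; notifies the client when finished."""
    await asyncio.sleep(2)                       # pretend to compute
    done.set_result(f"alignment for {query!r}")  # push the result to the client

async def client() -> None:
    loop = asyncio.get_running_loop()
    done: asyncio.Future = loop.create_future()
    task = asyncio.create_task(run_blast("ACGT", done))
    # Instead of a poll-sleep-poll loop against a status URL, the client
    # simply awaits the completion notification, much as an XMPP client
    # would wait for the service's result message.
    result = await done
    await task
    print(result)

asyncio.run(client())
```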

    The DBCLS BioHackathon: standardization and interoperability for bioinformatics web services and workflows. The DBCLS BioHackathon Consortium

    Web services have become a key technology for bioinformatics, since life science databases are globally decentralized and the exponential increase in the amount of available data demands efficient systems that do not require transferring entire databases for every step of an analysis. However, various incompatibilities among database resources and analysis services make it difficult to connect and integrate them into interoperable workflows. To resolve this situation, we invited domain specialists from web service providers, client software developers, Open Bio* projects, the BioMoby project, and researchers from emerging areas where a standard exchange data format is not well established to an intensive collaboration entitled the BioHackathon 2008. The meeting was hosted by the Database Center for Life Science (DBCLS) and the Computational Biology Research Center (CBRC) and was held in Tokyo from February 11th to 15th, 2008. In this report we highlight the work accomplished and the common issues that arose from this event, including the standardization of data exchange formats and services in the emerging fields of glycoinformatics, biological interaction networks, text mining, and phyloinformatics. In addition, common shared object development based on BioSQL, as well as technical challenges in large data management, asynchronous services, and security, are discussed. Consequently, we improved the interoperability of web services in several fields; however, further cooperation among major database centers and continued collaborative efforts between service providers and software developers are still necessary for effective advances in bioinformatics web service technologies.

    Querying and managing OPM-compliant scientific workflow provenance

    Provenance, the metadata that records the derivation history of scientific results, is important in scientific workflows for interpreting, validating, and analyzing the results of scientific computing. Recently, to promote and facilitate interoperability among heterogeneous provenance systems, the Open Provenance Model (OPM) has been proposed and has played an important role in the community. In this dissertation, to efficiently query and manage OPM-compliant provenance, we first propose a provenance collection framework that collects both prospective provenance, which captures an abstract workflow specification as a recipe for future data derivation, and retrospective provenance, which captures past workflow execution and data derivation information. We then propose a relational database-based provenance system, called OPMPROV, that stores, reasons over, and queries prospective and retrospective OPM-compliant provenance. We finally propose OPQL, an OPM-level provenance query language that is defined directly over the OPM model. An OPQL query takes an OPM graph as input and produces an OPM graph as output; therefore, OPQL queries are not tightly coupled to the underlying provenance storage strategies. Our provenance store, provenance collection framework, and provenance query language feature native support of the OPM model.
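
OPQL itself is not shown in the abstract; as a rough illustration of what querying an OPM-style graph can look like, here is a minimal Python sketch using two OPM dependency types (used and wasGeneratedBy) and a hypothetical lineage query. It is not OPMPROV or OPQL code.

```python
# Minimal OPM-style provenance graph: artifacts and processes as nodes,
# 'used' and 'wasGeneratedBy' dependencies as edges. derived_from() is a
# hypothetical stand-in for the kind of lineage query a language like OPQL
# would express over such a graph.
from collections import defaultdict

used = defaultdict(set)     # process -> artifacts it used
was_generated_by = {}       # artifact -> process that generated it

def record(process: str, inputs: list[str], output: str) -> None:
    """Record one step of retrospective provenance."""
    used[process].update(inputs)
    was_generated_by[output] = process

def derived_from(artifact: str) -> set[str]:
    """All upstream artifacts the given artifact transitively depends on."""
    upstream, frontier = set(), [artifact]
    while frontier:
        producer = was_generated_by.get(frontier.pop())
        for src in used.get(producer, ()):
            if src not in upstream:
                upstream.add(src)
                frontier.append(src)
    return upstream

# Hypothetical two-step workflow run.
record("align", inputs=["reads.fastq", "genome.fa"], output="hits.sam")
record("call_variants", inputs=["hits.sam"], output="variants.vcf")
print(derived_from("variants.vcf"))  # {'hits.sam', 'reads.fastq', 'genome.fa'}
```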

    Multi-level Meta-workflows: New Concept for Regularly Occurring Tasks in Quantum Chemistry

    Background: In Quantum Chemistry, many tasks recur frequently, e.g. geometry optimizations, benchmarking series, etc. Here, workflows can help to reduce the time spent on manual job definition and output extraction. These workflows are executed on computing infrastructures and may require large computing and data resources. Scientific workflows hide these infrastructures and the resources needed to run them. It requires significant effort and specific expertise to design, implement and test these workflows. Significance: Many of these workflows are complex and monolithic entities that can be used for particular scientific experiments. Hence, their modification is not straightforward, which makes it almost impossible to share them. To address these issues we propose developing atomic workflows and embedding them in meta-workflows. Atomic workflows deliver a well-defined, research-domain-specific function. Publishing workflows in repositories enables workflow sharing inside and/or among scientific communities. We formally specify atomic and meta-workflows in order to define the data structures used in repositories for uploading and sharing them. Additionally, we present a formal description focused on the orchestration of atomic workflows into meta-workflows. Conclusions: We investigated the operations that represent basic functionalities in Quantum Chemistry, developed the relevant atomic workflows, and combined them into meta-workflows. Having these workflows, we defined the structure of the Quantum Chemistry workflow library and uploaded the workflows to the SHIWA Workflow Repository.
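
The formal specification is not reproduced in the abstract; the following is a minimal Python sketch, under the assumption that an atomic workflow is a reusable domain-specific step and a meta-workflow orchestrates a sequence of them. All names and values are hypothetical.

```python
# Sketch of the atomic-workflow / meta-workflow distinction described above.
# The step names and the dummy energy value are made up; real workflows would
# be stored and shared via a repository such as the SHIWA Workflow Repository.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AtomicWorkflow:
    """A self-contained, domain-specific function (e.g. one QC task)."""
    name: str
    run: Callable[[dict], dict]      # inputs -> outputs

@dataclass
class MetaWorkflow:
    """Orchestrates atomic workflows; each step's output feeds the next."""
    name: str
    steps: list[AtomicWorkflow] = field(default_factory=list)

    def run(self, data: dict) -> dict:
        for step in self.steps:
            data = step.run(data)
        return data

geometry_opt = AtomicWorkflow("geometry_optimization",
                              lambda d: {**d, "geometry": "optimized"})
single_point = AtomicWorkflow("single_point_energy",
                              lambda d: {**d, "energy": -76.4})  # dummy value

meta = MetaWorkflow("benchmark_series", steps=[geometry_opt, single_point])
print(meta.run({"molecule": "H2O"}))
```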

    Scientific Workflows: Moving Across Paradigms

    Modern scientific collaborations have opened up the opportunity to solve complex problems that require both multidisciplinary expertise and large-scale computational experiments. These experiments typically consist of a sequence of processing steps that need to be executed on selected computing platforms. Execution poses a challenge, however, due to (1) the complexity and diversity of applications, (2) the diversity of analysis goals, (3) the heterogeneity of computing platforms, and (4) the volume and distribution of data. A common strategy to make these in silico experiments more manageable is to model them as workflows and to use a workflow management system to organize their execution. This article looks at the overall challenge posed by a new order of scientific experiments and the systems they need to be run on, and examines how this challenge can be addressed by workflows and workflow management systems. It proposes a taxonomy of workflow management system (WMS) characteristics, including aspects previously overlooked. This frames a review of prevalent WMSs used by the scientific community, elucidates their evolution to handle the challenges arising with the emergence of the “fourth paradigm,” and identifies research needed to maintain progress in this area.

    Distributed Management of Grid-based Scientific Workflows

    Grids and service-oriented technologies are emerging as dominant approaches to distributed systems. With the evolution of these technologies, scientific workflows have been introduced as a tool for scientists to assemble highly specialized applications and to exchange large heterogeneous datasets in order to automate and accelerate the accomplishment of complex scientific tasks. Several Scientific Workflow Management Systems (SWfMS) have already been designed to support the specification, execution, and monitoring of scientific workflows. Meanwhile, they still face key challenges from two different perspectives: system usability and system efficiency. From the system usability perspective, current SWfMS are not designed to be simple enough for scientists who have quite limited IT knowledge. Moreover, there is no easy mechanism by which scientists can share and re-use scientific experiments that have already been designed and proven by others. From the perspective of system efficiency, existing SWfMS coordinate and execute workflows in a centralized fashion using a single scheduler and/or workflow enactor. This creates a single point of failure, forms a scalability bottleneck, and enforces centralized fault handling. In addition, they do not consider load balancing while mapping abstract jobs onto computational nodes. Another important challenge arises from the common nature of scientific workflow applications, which need to exchange huge amounts of data during execution. Some available SWfMS use a mediator-based approach for data transfer, where data must first be transferred to a centralized data manager, which is completely inefficient. Other SWfMS apply a peer-to-peer approach via data references. Even this approach is not sufficient for scientific workflows, as a single complex scientific activity can produce an extensive amount of data. In this thesis, we introduce the SWIMS (Scientific Workflow Integration and Management System) framework. It employs Web Services technology to build a distributed management system for data-intensive scientific workflows. The purpose of SWIMS is to overcome the previously mentioned challenges through a set of salient features: i) support for distributed execution and management of workflows, ii) reduction of communication traffic, iii) support for smart re-run, iv) distributed fault handling and load balancing, v) ease of use, and vi) extensive sharing of scientific workflows. We discuss the motivation, design, and implementation of the SWIMS framework. Then, we evaluate it through the Montage application from the astronomy domain.
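
The abstract contrasts mediator-based data transfer with peer-to-peer transfer via data references. The sketch below illustrates that contrast in plain Python only; it is not SWIMS code, and every identifier, path, and payload in it is hypothetical.

```python
# Illustration of the two data-transfer styles contrasted above.
from dataclasses import dataclass

@dataclass
class DataRef:
    """A lightweight reference to data left in place on the producing node."""
    node: str       # node where the data lives
    path: str       # location on that node

def mediator_transfer(payload: bytes, mediator: dict, key: str) -> str:
    """Centralized style: the full payload is shipped to a data manager first."""
    mediator[key] = payload          # every byte crosses the central mediator
    return key

def reference_transfer(node: str, path: str) -> DataRef:
    """Peer-to-peer style: pass only a reference; the consumer pulls directly."""
    return DataRef(node=node, path=path)

central_store: dict = {}
key = mediator_transfer(b"...mosaic bytes...", central_store, "mosaic.fits")
ref = reference_transfer("compute-node-07", "/scratch/montage/mosaic.fits")
print(f"mediator holds {len(central_store[key])} bytes; "
      f"reference points at {ref.node}:{ref.path}")
```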

    Developing Materials Informatics Workbench for Expediting the Discovery of Novel Compound Materials
