6,942 research outputs found

    Integrating Protein Data Resources through Semantic Web Services

    Understanding the function of every protein is one major objective of bioinformatics. Currently, a large amount of information (e.g., sequence, structure and dynamics) is being produced by experiments and predictions that are associated with protein function. Integrating these diverse data about protein sequence, structure, dynamics and other protein features allows further exploration and establishment of the relationships between protein sequence, structure, dynamics and function, and thereby control of the function of target proteins. However, information integration in protein data resources faces challenges at the technology level, in interfacing heterogeneous data formats and standards, and at the application level, in the semantic interpretation of dissimilar data and queries. In this research, a semantic web services infrastructure, called Web Services for Protein data resources (WSP), is proposed for flexible and user-oriented integration of protein data resources. This infrastructure includes a method for modeling protein web services, a service publication algorithm, an efficient service discovery (matching) algorithm, and an optimal service chaining algorithm. Rather than relying on syntactic matching, the matching algorithm discovers services based on their similarity to the requested service. Therefore, users can locate services that semantically match their data requirements even if they are syntactically distinct. Furthermore, WSP supports a workflow-based approach for service integration. The chaining algorithm is used to select and chain services, based on the criteria of service accuracy and data interoperability. The algorithm generates a web services workflow which automatically integrates the results from individual services. A number of experiments are conducted to evaluate the performance of the matching algorithm. The results reveal that the algorithm can discover services with reasonable performance. Also, a composite service, which integrates protein dynamics and conservation, is tested using the WSP infrastructure
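
The idea of discovering services by similarity rather than by syntactic matching can be illustrated with a minimal sketch. The abstract does not specify the similarity measure WSP uses, so this example substitutes a plain Jaccard overlap between sets of ontology concept labels; the service names, concept labels, equal weighting and threshold are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections of concept labels."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty descriptions are treated as identical
    return len(a & b) / len(a | b)


def match_services(request, services, threshold=0.5):
    """Rank services by similarity of their annotated inputs and outputs.

    Assumption: inputs and outputs are weighted equally. This is a stand-in
    measure, not the matching algorithm described in the abstract.
    """
    scored = []
    for name, description in services.items():
        score = (0.5 * jaccard(request["inputs"], description["inputs"])
                 + 0.5 * jaccard(request["outputs"], description["outputs"]))
        if score >= threshold:
            scored.append((name, score))
    # Best match first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical service registry annotated with hypothetical concept labels.
services = {
    "FlexPredictor": {"inputs": {"ProteinSequence"},
                      "outputs": {"ResidueFlexibility"}},
    "ConservationScorer": {"inputs": {"ProteinSequence", "Alignment"},
                           "outputs": {"ResidueConservation"}},
}
request = {"inputs": {"ProteinSequence"}, "outputs": {"ResidueFlexibility"}}
matches = match_services(request, services)
```

Because matching works on concept annotations rather than on service or parameter names, two services with different names but the same annotated concepts would score identically under this sketch.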

    Interoperability and FAIRness through a novel combination of Web technologies

    Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task that does not scale. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs

    Automated syntactic mediation for Web service integration

    As the Web Services and Grid community adopt Semantic Web technology, we observe a shift towards higher-level workflow composition and service discovery practices. While this provides excellent functionality to non-expert users, more sophisticated middleware is required to hide the details of service invocation and service integration. An investigation of a common Bioinformatics use case reveals that the execution of high-level workflow designs requires additional processing to harmonise syntactically incompatible service interfaces. In this paper, we present an architecture to support the automatic reconciliation of data formats in such Web Service workflows. The mediation of data is driven by ontologies that encapsulate the information contained in heterogeneous data structures, supplying a common, conceptual data representation. Data conversion is carried out by a Configurable Mediator component, consuming mappings between XML schemas and OWL ontologies. We describe our system and give examples of our mapping language against the background of a Bioinformatics use case

    CYCLONE Unified Deployment and Management of Federated, Multi-Cloud Applications

    Various Cloud layers have to work in concert in order to manage and deploy complex multi-cloud applications, executing sophisticated workflows for Cloud resource deployment, activation, adjustment, interaction, and monitoring. While there are ample solutions for managing individual Cloud aspects (e.g. network controllers, deployment tools, and application security software), there are no well-integrated suites for managing an entire multi-cloud environment with multiple providers and deployment models. This paper presents the CYCLONE architecture that integrates a number of existing solutions to create an open, unified, holistic Cloud management platform for multi-cloud applications, tailored to the needs of research organizations and SMEs. It discusses major challenges in providing a network and security infrastructure for the Intercloud and concludes with a demonstration of how the architecture is implemented in a real-life bioinformatics use case

    Biodiversity informatics: the challenge of linking data and the role of shared identifiers

    A major challenge facing biodiversity informatics is integrating data stored in widely distributed databases. Initial efforts have relied on taxonomic names as the shared identifier linking records in different databases. However, taxonomic names have limitations as identifiers, being neither stable nor globally unique, and the pace of molecular taxonomic and phylogenetic research means that a lot of information in public sequence databases is not linked to formal taxonomic names. This review explores the use of other identifiers, such as specimen codes and GenBank accession numbers, to link otherwise disconnected facts in different databases. The structure of these links can also be exploited using the PageRank algorithm to rank the results of searches on biodiversity databases. The key to rich integration is a commitment to deploy and reuse globally unique, shared identifiers (such as DOIs and LSIDs), and the implementation of services that link those identifiers
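
The ranking idea mentioned above can be sketched as a small power-iteration PageRank over a graph of identifier links. This is a minimal illustration, not the review's implementation: the identifiers are made up, and the damping factor, iteration count and even redistribution of rank from records without outgoing links are conventional assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    `links` maps each record identifier to the identifiers it links to.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by records with no outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical records: a specimen code, two sequence accessions, one DOI.
links = {
    "specimen:S1": ["genbank:G1"],
    "genbank:G1": ["doi:D1"],
    "genbank:G2": ["doi:D1", "specimen:S1"],
    "doi:D1": [],
}
ranks = pagerank(links)
ordered = sorted(ranks, key=ranks.get, reverse=True)
```

In this toy graph the publication identifier collects links from both sequence records, so it ends up ranked first; search results could then be ordered by such scores.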

    The Semantic Automated Discovery and Integration (SADI) Web service Design-Pattern, API and Reference Implementation

    Background. 
The complexity and inter-related nature of biological data pose a difficult challenge for data and tool integration. There has been a proliferation of interoperability standards and projects over the past decade, none of which has been widely adopted by the bioinformatics community. Recent attempts have focused on the use of semantics to assist integration, and Semantic Web technologies are being welcomed by this community.

Description. 
SADI – Semantic Automated Discovery and Integration – is a lightweight set of fully standards-compliant Semantic Web service design patterns that simplify the publication of services of the type commonly found in bioinformatics and other scientific domains. Using Semantic Web technologies at every level of the Web services “stack”, SADI services consume and produce instances of OWL Classes following a small number of very straightforward best-practices. In addition, we provide codebases that support these best-practices, and plug-in tools to popular developer and client software that dramatically simplify deployment of services by providers, and the discovery and utilization of those services by their consumers.

Conclusions.
SADI Services are fully compliant with, and utilize only foundational Web standards; are simple to create and maintain for service providers; and can be discovered and utilized in a very intuitive way by biologist end-users. In addition, the SADI design patterns significantly improve the ability of software to automatically discover appropriate services based on user-needs, and automatically chain these into complex analytical workflows. We show that, when resources are exposed through SADI, data compliant with a given ontological model can be automatically gathered, or generated, from these distributed, non-coordinating resources - a behavior we have not observed in any other Semantic system. Finally, we show that, using SADI, data dynamically generated from Web services can be explored in a manner very similar to data housed in static triple-stores, thus facilitating the intersection of Web services and Semantic Web technologies

    Context-Aware Information Retrieval for Enhanced Situation Awareness

    In coalition forces, users are increasingly challenged by information overload and by the need to correlate information from heterogeneous sources. Users might need different pieces of information, ranging from information about a single building to the resolution strategy of a global conflict. Sometimes, the time, location and past history of information access can also shape the information needs of users. Information systems need to help users pull together data from disparate sources according to their expressed needs (as represented by system queries), as well as less specific criteria. Information consumers have varying roles, tasks/missions, goals and agendas, knowledge and background, and personal preferences. These factors can be used to shape both the execution of user queries and the form in which retrieved information is packaged. However, full automation of this daunting information aggregation and customization task is not possible with existing approaches. In this paper we present an infrastructure for context-aware information retrieval to enhance situation awareness. The infrastructure provides each user with a customized, mission-oriented system that gives access to the right information from heterogeneous sources in the context of a particular task, plan and/or mission. The approach rests on five intertwined fundamental concepts, namely Workflow, Context, Ontology, Profile and Information Aggregation. The exploitation of this knowledge, using appropriate domain ontologies, will make it feasible to provide contextual assistance in various ways to the work performed according to a user’s task-relevant information requirements. This paper formalizes these concepts and their interrelationships

    Identity in research infrastructure and scientific communication: Report from the 1st IRISC workshop, Helsinki Sep 12-13, 2011

    Motivation for the IRISC workshop came from the observation that identity and digital identification are increasingly important factors in modern scientific research, especially with the now near-ubiquitous use of the Internet as a global medium for dissemination and debate of scientific knowledge and data, and as a platform for scientific collaborations and large-scale e-science activities.

The 1 1/2 day IRISC2011 workshop sought to explore a series of interrelated topics under two main themes: i) unambiguously identifying authors/creators & attributing their scholarly works, and ii) individual identification and access management in the context of identity federations. Specific aims of the workshop included:

• Raising overall awareness of key technical and non-technical challenges, opportunities and developments.
• Facilitating a dialogue, cross-pollination of ideas, collaboration and coordination between diverse – and largely unconnected – communities.
• Identifying & discussing existing/emerging technologies, best practices and requirements for researcher identification.

This report provides background information on key identification-related concepts & projects, describes workshop proceedings and summarizes key workshop findings

    Applications of the ACGT Master Ontology on Cancer

    In this paper we present applications of the ACGT Master Ontology (MO), which is a new terminology resource for a transnational network providing data exchange in oncology, emphasizing the integration of both clinical and molecular data. The development of a new ontology was necessary due to problems with existing biomedical ontologies in oncology. The ACGT MO is a test case for the application of best practices in ontology development. This paper provides an overview of the application of the ontology within the ACGT project thus far