1,861,152 research outputs found

    Integration of linked open data in case-based reasoning systems

    This paper discusses the opportunities of integrating Linked Open Data (LOD) resources into Case-Based Reasoning (CBR) systems. Using the application domain of travel medicine, we exemplify how LOD can be used to fill three of the four knowledge containers a CBR system is based on. The paper also presents the techniques applied for the realization and demonstrates the gain in knowledge-acquisition performance achieved by the use of LOD.
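
The abstract contains no code, but a minimal sketch of the knowledge-acquisition step it describes might look like the following: pulling disease concepts from a LOD source into a CBR vocabulary container. It assumes the public DBpedia endpoint and the SPARQLWrapper library; the query and class choices are illustrative, not the paper's actual setup.

```python
# Minimal sketch: filling a CBR vocabulary container from Linked Open Data.
# Assumes the public DBpedia SPARQL endpoint and the SPARQLWrapper library;
# the chosen class (dbo:Disease) is illustrative, not the paper's setup.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?disease ?label WHERE {
        ?disease a dbo:Disease ;
                 rdfs:label ?label .
        FILTER (lang(?label) = "en")
    }
    LIMIT 50
""")

results = sparql.query().convert()

# Each retrieved concept becomes a vocabulary entry the CBR system can use
# when describing and comparing travel-medicine cases.
vocabulary = {
    row["disease"]["value"]: row["label"]["value"]
    for row in results["results"]["bindings"]
}
print(f"acquired {len(vocabulary)} vocabulary terms from LOD")
```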

    Data Integration for Open Data on the Web

    In this lecture we will discuss the challenges of integrating openly available Web data and how to solve them. First, while we address this topic from the viewpoint of Semantic Web research, not all data is readily available as RDF or Linked Data, so we will give an introduction to the data formats prevalent on the Web, namely the standard formats for publishing and exchanging tabular, tree-shaped, and graph data. Second, not all Open Data is completely open, so we will address issues around licences and terms of usage associated with Open Data, as well as the documentation of data provenance. Third, we will discuss issues of (meta-)data quality associated with Open Data on the Web and how Semantic Web techniques and vocabularies can be used to describe and remedy them. Fourth, we will address the searchability and integration of Open Data and discuss to what extent semantic search can help to overcome the remaining obstacles. We close by briefly summarizing further issues not covered explicitly herein, such as multi-linguality, temporal aspects (archiving, evolution, temporal querying), and whether OWL and RDFS reasoning on top of integrated open data could help.
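
As a concrete illustration of the three data shapes the lecture distinguishes, the sketch below reads tabular, tree-shaped, and graph data with common Python tooling (csv and json from the standard library, rdflib for RDF). The file names are placeholders.

```python
# Sketch of the three Web data shapes: tabular, tree-shaped, and graph.
# File names are placeholders; rdflib is a third-party library.
import csv
import json
from rdflib import Graph

# Tabular data (e.g. CSV, as published on many open data portals)
with open("observations.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Tree-shaped data (e.g. JSON from a Web API)
with open("dataset.json") as f:
    tree = json.load(f)

# Graph data (e.g. RDF published as Linked Data, in Turtle syntax)
graph = Graph().parse("dataset.ttl", format="turtle")

print(len(rows), "rows,", len(graph), "triples")
```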

    Publish-Time Data Integration for Open Data Platforms

    Platforms for publication and collaborative management of data, such as Data.gov or Google Fusion Tables, are a new trend on the web. They manage very large corpora of datasets, but often lack an integrated schema, ontology, or even just common publication standards. This results in inconsistent names for attributes of the same meaning, which hampers the discovery of relationships between datasets as well as their reusability. Existing data integration techniques focus on reuse-time, i.e., they are applied when a user wants to combine a specific set of datasets or integrate them with an existing database. In contrast, this paper investigates a novel method of data integration at publish-time, where the publisher is given suggestions on how to integrate the new dataset with the corpus as a whole, without resorting to a manually created mediated schema or ontology for the platform. We present data-driven algorithms that suggest alternative attribute names for a newly published dataset based on attribute and instance statistics maintained over the corpus. We evaluate the proposed algorithms on real-world corpora drawn from the Open Data platform opendata.socrata.com and from relational data extracted from Wikipedia, and report on the system's response time and on the results of an extensive crowdsourcing-based evaluation of the quality of the generated attribute-name alternatives.
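
The paper's actual algorithms are not reproduced in the abstract; a minimal data-driven sketch in the same spirit might rank attribute names already used in the corpus by how much their observed instance values overlap with the new column's values. The Jaccard scoring below is an illustrative stand-in, not the paper's method.

```python
# Illustrative sketch of publish-time attribute-name suggestion: rank
# corpus attribute names by Jaccard overlap between their observed instance
# values and the new column's values. Not the paper's actual algorithm.

def suggest_names(new_values, corpus_stats, top_k=3):
    """corpus_stats maps attribute name -> set of values seen in the corpus."""
    new_set = set(new_values)
    scored = []
    for name, seen in corpus_stats.items():
        union = new_set | seen
        jaccard = len(new_set & seen) / len(union) if union else 0.0
        scored.append((jaccard, name))
    return [name for _, name in sorted(scored, reverse=True)[:top_k]]

# Toy corpus statistics (hypothetical data for illustration only)
corpus = {
    "state": {"WA", "CA", "NY", "TX"},
    "country": {"US", "DE", "FR"},
    "zip_code": {"98101", "10001"},
}

# A newly published column with the inconsistent header "st" gets
# consistent naming suggestions based on its instance values:
print(suggest_names({"WA", "TX", "FL"}, corpus))  # ['state', ...]
```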

    Exposing WikiPathways as Linked Open Data

    Biology has become a data-intensive science. Discovery of new biological facts increasingly relies on the ability to find and match appropriate biological data, for instance for the functional annotation of genes of interest or for the identification of pathways affected by over-expressed genes. Functional and pathway information about genes and proteins is typically distributed over a variety of databases and the literature.

Pathways are a convenient, easy-to-interpret way to describe known biological interactions. WikiPathways provides community-curated pathways: WikiPathways users integrate their knowledge with facts from the literature and biological databases, and each curated pathway is then reviewed and possibly corrected or enriched. Different tools (e.g. PathVisio and Cytoscape) support the use of WikiPathways knowledge for additional tasks, such as integration with personal data sets.

Data from WikiPathways is increasingly also used for advanced analyses in which it is integrated or compared with other data. Currently, integration with data from different biological sources is mostly done manually. This can be very time-consuming, because the curator often first needs to find the available resources, must learn about their specific content and qualities, and then spends a lot of time technically combining the two.

Semantic Web and Linked Data technologies eliminate the barriers between database silos by relying on a set of standards and best practices for representing and describing data. The architecture of the Semantic Web relies on the architecture of the web itself for integrating and mapping uniform resource identifiers (URIs), coupled with basic inference mechanisms to enable matching concepts and properties across data sources. Semantic Web and Linked Data technologies are increasingly being applied successfully as integration engines for linking biological elements.

Exposing WikiPathways content as Linked Open Data on the Semantic Web enables rapid, semi-automated integration with the growing number of biological resources available in the Linked Open Data cloud, and it also allows very fast queries of WikiPathways itself.

We have harmonised WikiPathways content according to a selected set of vocabularies (BioPAX, ChEMBL, etc.) common to resources already available as Linked Open Data.
WikiPathways content is now available as Linked Open Data for dynamic querying through a SPARQL endpoint: http://semantics.bigcat.unimaas.nl:8000/sparql
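
The abstract gives no example query; a minimal exploratory one against the endpoint named above might look like the following sketch, using the SPARQLWrapper library and assuming the endpoint is reachable. Listing the classes in use avoids assuming any particular vocabulary in advance.

```python
# Exploratory query against the WikiPathways SPARQL endpoint named above:
# list the classes the exposed RDF uses, without presupposing a vocabulary.
# Requires the SPARQLWrapper library; assumes the endpoint is reachable.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://semantics.bigcat.unimaas.nl:8000/sparql")
endpoint.setReturnFormat(JSON)
endpoint.setQuery("SELECT DISTINCT ?type WHERE { ?s a ?type } LIMIT 25")

for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["type"]["value"])
```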

    Machado: open source genomics data integration framework.

    Abstract. Background: Genome projects and multiomics experiments generate huge volumes of data that must be stored, mined, and transformed into useful knowledge, and all this information should remain accessible and, if possible, browsable afterwards. Computational biologists have been dealing with this scenario for more than a decade and have implemented software and databases to meet the challenge. The GMOD (Generic Model Organism Database) project's biological relational database schema, known as Chado, is one of the few successful open source initiatives; it is widely adopted, and many software packages are able to connect to it. Findings: We have been developing an open source software package named Machado, a genomics data integration framework implemented in Python, to enable research groups to both store and visualize genomics data. The framework relies on the Chado database schema and should therefore be very intuitive for current developers to adopt or to run on top of already existing databases. It provides several data-loading tools for genomics and transcriptomics data, as well as for annotation results from tools such as BLAST, InterProScan, OrthoMCL, and LSTrAP. There is an API to connect to JBrowse, and a web visualization tool is implemented using Django views and templates. The Haystack library, integrated with the Elasticsearch engine, was used to implement a Google-like search, i.e., a single auto-complete search box that provides fast results and filters. Conclusion: Machado aims to be a modern object-relational framework that uses the latest Python libraries to produce an effective open source resource for genomics research.
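
Machado's own API is not shown in the abstract; as a taste of the underlying Chado schema it builds on, a direct query might look like the sketch below. The table and column names come from the public Chado schema (sequence features live in the feature table and are typed via cvterm); the connection details are placeholders, and this is not Machado's API.

```python
# Sketch of a direct query against the Chado schema that Machado builds on.
# Chado stores sequence features in the `feature` table, typed via `cvterm`.
# Connection parameters are placeholders; this is not Machado's own API.
import psycopg2

conn = psycopg2.connect(dbname="chado", user="reader", host="localhost")
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT f.uniquename, t.name AS feature_type
        FROM feature f
        JOIN cvterm t ON t.cvterm_id = f.type_id
        WHERE t.name = 'gene'
        LIMIT 10
    """)
    for uniquename, feature_type in cur.fetchall():
        print(uniquename, feature_type)
```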

    MarBEF publishing revisited

    Networking and integration served within a partnership approach and covered with a delicious sauce of free and open access to data and information is MarBEF’s main dish, and it is this recipe that has helped MarBEF successfully bring marine biodiversity research to a European level. Numerous meetings, workshops, training courses and Responsive Mode Projects (RMPs) have brought together many scientists. This integration has created endless possibilities for new initiatives – the MarBEF Publication Series and the MarBEF Open Archive, to mention just two. So, is this having any effect on the way we publish as a network today?

    Taming open/closed string duality with a Losev trick

    A target-space string field theory formulation of the open and closed B-model is provided by giving a Batalin-Vilkovisky quantization of holomorphic Chern-Simons theory with an off-shell gravity background. The target-space expressions for the coefficients of the holomorphic anomaly equation for open strings are obtained. Furthermore, open/closed string duality is proved from a judicious integration over the open string fields. In particular, by restricting to the case of independence from continuous open moduli, the shift formulas of [7] are reproduced and shown therefore to encode the data of a closed string dual.
    Comment: 22 pages, no figures; v2: references and a comment added.

    Open accessibility data interlinking

    This paper presents research on using Linked Open Data to enhance accessibility data for accessible travelling. Open accessibility data is data on the accessibility issues associated with geographical features, which could benefit people with disabilities and their special needs. With the aim of addressing the gap between users’ special needs and available data, this paper presents the results of a survey of open accessibility data retrieved from four different sources in the UK. An ontology-based data integration approach is proposed to interlink these datasets into a linked open accessibility repository, which also links to other resources in the Linked Data Cloud. As a result, this research not only enriches the open accessibility data, but also contributes a novel framework for addressing accessibility information barriers by establishing a linked data repository for publishing, linking, and consuming open accessibility data.
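
A minimal sketch of the interlinking idea follows: load two accessibility datasets as RDF graphs and assert owl:sameAs links between records that appear to describe the same place. The matching rule (shared name and postcode), the schema.org properties, and the file names are illustrative assumptions, not the paper's actual ontology.

```python
# Minimal interlinking sketch: merge two RDF datasets and add owl:sameAs
# links between records that share a (name, postcode) key. The matching
# rule, properties, and file names are illustrative assumptions.
from rdflib import Graph, Namespace
from rdflib.namespace import OWL

g = Graph()
g.parse("source_a.ttl", format="turtle")
g.parse("source_b.ttl", format="turtle")

SCHEMA = Namespace("http://schema.org/")

# Index places by a simple (name, postcode) key; link records that collide.
seen = {}
for place in set(g.subjects(SCHEMA.name, None)):
    name = g.value(place, SCHEMA.name)
    postcode = g.value(place, SCHEMA.postalCode)
    if name is None or postcode is None:
        continue
    key = (str(name).lower(), str(postcode))
    if key in seen:
        g.add((seen[key], OWL.sameAs, place))
    else:
        seen[key] = place

g.serialize("linked_accessibility.ttl", format="turtle")
```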

    Statistical analysis of the owl:sameAs network for aligning concepts in the linking open data cloud

    The massively distributed publication of linked data has brought to the attention of the scientific community both the limitations of classic methods for achieving data integration and the opportunity to push the boundaries of the field by experimenting with that collective enterprise that is the Linking Open Data cloud. While reusing existing ontologies is the preferred choice, exploiting ontology alignments remains a necessary step for easing the burden of integrating heterogeneous data sets. Alignments, even between the most widely used vocabularies, are still poorly supported in today's systems, whereas links between instances are the most widely used means of bridging the gap between different data sets. In this paper we provide an account of our statistical and qualitative analysis of the network of instance-level equivalences in the Linking Open Data cloud (i.e., the sameAs network), used to automatically compute alignments at the conceptual level. Moreover, we explore the effect of ontological information when adapting classical Jaccard methods to the ontology alignment task. Automating this task will make it possible to achieve a clearer conceptual description of the data at the cloud level, while improving the level of integration between datasets.
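
The core idea can be sketched in a few lines: merge instances into sameAs equivalence classes, then compare two classes' extensions with Jaccard similarity. The union-find implementation and the toy data below are illustrative choices, not the paper's pipeline.

```python
# Sketch of concept alignment via the sameAs network: merge instances into
# owl:sameAs equivalence classes (union-find), then score two classes by
# the Jaccard similarity of their canonicalized extensions. Toy data only.

def find(parent, x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x

def union(parent, a, b):
    parent[find(parent, a)] = find(parent, b)

# Hypothetical instance-level equivalences and class extensions
same_as = [("dbpedia:Berlin", "geonames:2950159"),
           ("dbpedia:Paris", "geonames:2988507")]
class_a = {"dbpedia:Berlin", "dbpedia:Paris"}       # e.g. a "City" class
class_b = {"geonames:2950159", "geonames:2988507"}  # e.g. a "P" feature class

parent = {}
for a, b in same_as:
    union(parent, a, b)

# Canonicalize each extension, then compare them set-wise.
ext_a = {find(parent, x) for x in class_a}
ext_b = {find(parent, x) for x in class_b}
jaccard = len(ext_a & ext_b) / len(ext_a | ext_b)
print(f"class alignment score: {jaccard:.2f}")  # 1.00 for this toy data
```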