SPORTAL: Searching for Public SPARQL Endpoints
There are hundreds of SPARQL endpoints on the Web, but finding an endpoint relevant to a client's needs is difficult: each endpoint acts like a black box, often without a description of its content. Herein we briefly describe Sportal: a system that collects metadata about the content of endpoints into a central catalogue over which clients can search. Sportal sends queries to individual endpoints offline to learn about their content, generating a best-effort VoID description for each endpoint. These descriptions can then be searched and queried by clients in the Sportal user interface, for example, to find endpoints that contain instances of a given class, or triples with a given predicate, or to answer more complex requests such as finding endpoints with at least 1,000 images of people. We give a brief overview of Sportal, its design and functionality, and the features that will be demoed at the conference.
SPORTAL: Profiling the Content of Public SPARQL Endpoints
Access to hundreds of knowledge bases has been made available on the Web through public SPARQL endpoints. Unfortunately, few endpoints publish descriptions of their content (e.g., using VoID). It is thus unclear how agents can learn about the content of a given SPARQL endpoint or, relatedly, find SPARQL endpoints with content relevant to their needs. In this paper, the authors investigate the feasibility of a system that gathers information about public SPARQL endpoints by querying them directly about their own content. With the advent of SPARQL 1.1 and features such as aggregates, it is now possible to specify queries whose results would form a detailed profile of the content of the endpoint, comparable with a large subset of VoID. In theory it would thus be feasible to build a rich centralised catalogue describing the content indexed by individual endpoints by issuing them SPARQL (1.1) queries; this catalogue could then be searched and queried by agents looking for endpoints with content they are interested in. In practice, however, the coverage of the catalogue is bounded by the limitations of public endpoints themselves: some may not support SPARQL 1.1, some may return partial responses, some may throw exceptions for expensive aggregate queries, etc. The authors' goal in this paper is thus twofold: (i) using VoID as a bar, to empirically investigate the extent to which public endpoints can describe their own content, and (ii) to build and analyse the capabilities of a best-effort online catalogue of current endpoints based on the (partial) results collected. This publication was supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289, by the Millennium Nucleus Center for Semantic Web Research under Grant NC120004, and by Fondecyt Grant No. 1114090.
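As a rough illustration of the self-description idea in this abstract, the sketch below pairs a few VoID count properties with SPARQL 1.1 aggregate queries and serialises whatever counts came back as a best-effort VoID description. The query texts, the names `VOID_COUNT_QUERIES` and `to_void_turtle`, and the example IRI are assumptions made for illustration; they are not taken from the paper, and sending the queries to an endpoint is left out.

```python
# Illustrative sketch only: query texts and names are assumptions,
# not the actual queries issued by the system described above.

VOID_PREFIX = "@prefix void: <http://rdfs.org/ns/void#> .\n"

# Each VoID count property maps to a SPARQL 1.1 aggregate query returning ?n.
VOID_COUNT_QUERIES = {
    "void:triples": "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }",
    "void:classes": "SELECT (COUNT(DISTINCT ?c) AS ?n) WHERE { ?s a ?c }",
    "void:properties": "SELECT (COUNT(DISTINCT ?p) AS ?n) WHERE { ?s ?p ?o }",
    "void:distinctSubjects": "SELECT (COUNT(DISTINCT ?s) AS ?n) WHERE { ?s ?p ?o }",
    "void:distinctObjects": "SELECT (COUNT(DISTINCT ?o) AS ?n) WHERE { ?s ?p ?o }",
}


def to_void_turtle(dataset_iri, counts):
    """Serialise the counts that were obtained as a Turtle VoID description.

    `counts` maps a VoID property name to an integer, or to None when the
    endpoint failed to answer that query (timeout, no SPARQL 1.1 support).
    Missing or None entries are skipped, so the description is best-effort.
    """
    lines = [f"<{dataset_iri}> a void:Dataset"]
    for prop in VOID_COUNT_QUERIES:
        value = counts.get(prop)
        if value is not None:
            lines.append(f"    {prop} {int(value)}")
    return VOID_PREFIX + " ;\n".join(lines) + " .\n"
```

Calling `to_void_turtle("http://example.org/dataset", {"void:triples": 100, "void:classes": None})` would emit a description containing only the triple count, mirroring the partial results the abstract anticipates from endpoints that reject expensive aggregate queries.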
PrEVIEw: Clustering and Visualising PubMed using Visual Interface
The life sciences domain has been one of the early adopters of the Open Data initiative, and a considerable portion of the Linked Open Data cloud is composed of datasets from Life Sciences Linked Open Data (LSLOD). This deluge of biomedical data and active research over the past decade has resulted in an influx of scientific publications in this domain. The PubMed resource provides access to MEDLINE, NLM's database of citations and abstracts in the biomedical domain. PubMed Central provides links to full-text articles along with publisher web sites and other related resources. In this paper we present PubMed Visual Interface (PrEVIEw), a web-based application to access information related to publications, research topics, authors and institutes through a visual interface. PrEVIEw not only provides useful information for the biomedical research community (e.g., research topics of interest, research collaboration at the personal or institute level), but is also helpful for the working data scientist. We also evaluate the usability of our system using the standard System Usability Scale as well as a custom questionnaire designed specifically to test the usability of the interface. Our overall usability score of 83.69 suggests that the web-based interface is easy to learn, consistent, and adequate for frequent use.
A Roadmap for navigating the Life Sciences Linked Open Data Cloud
Conference paper. Multiple datasets that add high value to biomedical research have been exposed on the web as a part of the Life Sciences Linked Open Data (LSLOD) Cloud. The ability to easily navigate through these datasets is crucial for personalized medicine and the improvement of the drug discovery process. However, navigating these multiple datasets is not trivial, as most of them are only available as isolated SPARQL endpoints with very little vocabulary reuse. The content that is indexed through these endpoints is scarce, making the indexed dataset opaque for users. In this paper, we propose an approach for the creation of an active Linked Life Sciences Data Roadmap, a set of configurable rules which can be used to discover links (roads) between biological entities (cities) in the LSLOD cloud. We have catalogued and linked concepts and properties from 137 public SPARQL endpoints. Our Roadmap is primarily used to dynamically assemble queries retrieving data from multiple SPARQL endpoints simultaneously. We also demonstrate its use in conjunction with other tools for selective SPARQL querying, semantic annotation of experimental datasets and the visualization of the LSLOD cloud. We have evaluated the performance of our approach in terms of the time taken and entity capture. Our approach, if generalized to encompass other domains, can be used for road-mapping the entire LOD cloud. Science Foundation Ireland: Grants SFI/12/RC/2289 and SFI/08/CE/I1380 (Lion 2).
Biofed: federated query processing over life sciences linked open data
Background: Biomedical data, e.g. from knowledge bases and ontologies, is increasingly made available following open linked data principles, at best as RDF triple data. This is a necessary step towards unified access to biological data sets, but it still requires solutions that query multiple endpoints for their heterogeneous data to eventually retrieve all the meaningful information. Suggested solutions are based on query federation approaches, which require the submission of SPARQL queries to endpoints. Due to the size and complexity of available data, these solutions have to be optimised for efficient retrieval times and for users in life sciences research. Last but not least, over time, the reliability of data resources in terms of access and quality has to be monitored. Our solution (BioFed) federates data over 130 SPARQL endpoints in life sciences and tailors query submission according to the provenance information. BioFed has been evaluated against the state-of-the-art solution FedX and forms an important benchmark for the life science domain.
Methods: The efficient cataloguing approach of the federated query processing system 'BioFed', the triple-pattern-wise source selection and the semantic source normalisation form the core of our solution. It gathers and integrates data from newly identified public endpoints for federated access. Basic provenance information is linked to the retrieved data. Last but not least, BioFed makes use of the latest SPARQL standard (i.e., 1.1) to leverage the full benefits for query federation. The evaluation is based on 10 simple and 10 complex queries, which address data in 10 major and very popular data sources (e.g., DrugBank, SIDER).
Results: BioFed is a solution for a single-point-of-access for a large number of SPARQL endpoints providing life science data. It facilitates efficient query generation for data access and provides basic provenance information in combination with the retrieved data. BioFed fully supports SPARQL 1.1 and gives access to the endpoint's availability based on the EndpointData graph. Our evaluation of BioFed against FedX is based on 20 heterogeneous federated SPARQL queries and shows competitive execution performance in comparison to FedX, which can be attributed to the provision of provenance information for the source selection.
Conclusion: Developing and testing federated query engines for life sciences data is still a challenging task. According to our findings, it is advantageous to optimise the source selection. The cataloguing of SPARQL endpoints, including type and property indexing, leads to efficient querying of data resources over the Web of Data. This could be improved even further through the use of ontologies, e.g., for abstract normalisation of query terms.
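To make the idea of triple-pattern-wise source selection over a type and property index more concrete, here is a minimal sketch under stated assumptions. The catalogue layout, the function name `select_sources`, the `ex:` terms and the example.org endpoint URLs are all hypothetical; the abstract does not describe BioFed's actual data structures or algorithm at this level of detail.

```python
# Illustrative sketch only: a toy catalogue and selection rule, not BioFed's
# actual implementation. All endpoint URLs and ex: terms are hypothetical.

RDF_TYPE = "rdf:type"


def is_variable(term):
    """SPARQL variables are written with a leading '?'."""
    return term.startswith("?")


def select_sources(triple_patterns, catalogue):
    """Return, for each triple pattern, the endpoints that may answer it.

    `catalogue` maps an endpoint URL to its indexed predicates and classes:
    {"predicates": set of predicate names, "classes": set of class names}.
    """
    selection = {}
    for pattern in triple_patterns:
        _subject, predicate, obj = pattern
        relevant = []
        for endpoint, index in catalogue.items():
            if is_variable(predicate):
                # Unbound predicate: the index cannot prune this endpoint.
                relevant.append(endpoint)
            elif predicate == RDF_TYPE and not is_variable(obj):
                # Type pattern with a fixed class: consult the class index.
                if obj in index["classes"]:
                    relevant.append(endpoint)
            elif predicate in index["predicates"]:
                # Any other bound predicate: consult the property index.
                relevant.append(endpoint)
        selection[pattern] = relevant
    return selection


EXAMPLE_CATALOGUE = {
    "http://example.org/drugs/sparql": {
        "predicates": {"rdf:type", "ex:target"},
        "classes": {"ex:Drug"},
    },
    "http://example.org/effects/sparql": {
        "predicates": {"rdf:type", "ex:sideEffect"},
        "classes": {"ex:Drug", "ex:SideEffect"},
    },
}
```

With this toy catalogue, the pattern `("?d", "ex:sideEffect", "?e")` is routed only to the second endpoint, whereas `("?d", "rdf:type", "ex:Drug")` is sent to both. Pruning per pattern rather than per query keeps each sub-query away from endpoints that cannot contribute results.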
Twelve-month observational study of children with cancer in 41 countries during the COVID-19 pandemic
Childhood cancer is a leading cause of death. It is unclear whether the COVID-19 pandemic has impacted childhood cancer mortality. In this study, we aimed to establish all-cause mortality rates for childhood cancers during the COVID-19 pandemic and determine the factors associated with mortality.