Exposing The Cancer Genome Atlas (TCGA) as a SPARQL endpoint
Automated discovery of candidate biomarkers from multiple databases has been the central challenge in the Life Sciences in general and in the study of systemic processes such as those documented by The Cancer Genome Atlas (TCGA) in particular. 

The maturation of Semantic Web technologies offers a solution to this problem by allowing queries to be defined by navigating formally represented domains of discourse instantiated by the data.

We address the systems challenge of The Cancer Genome Atlas initiative (http://cancergenome.nih.gov/), which generates a large-scale repository of high-throughput molecular biology data produced and processed at 5 academic facilities across the USA.
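
As an illustration, here is a minimal sketch of how such an endpoint could be queried from Python with SPARQLWrapper; the endpoint URL and the tcga: vocabulary are illustrative assumptions, not the schema the paper describes:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical endpoint URL; the paper's actual endpoint is not given here.
ENDPOINT = "http://example.org/tcga/sparql"

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery("""
    PREFIX tcga: <http://example.org/tcga/schema#>  # illustrative vocabulary
    SELECT ?patient ?gene ?expression
    WHERE {
        ?patient tcga:hasResult ?result .
        ?result  tcga:gene ?gene ;
                 tcga:expressionLevel ?expression .
    }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

# Run the query and print one row per result binding.
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["patient"]["value"], row["gene"]["value"], row["expression"]["value"])
```
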
CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines
Based on the information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and socio-economic perspective.
The technical perspective includes an up-to-date view on content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmarking initiatives that measure the performance of multimedia search engines.
From a socio-economic perspective, we take stock of the impact and legal consequences of these technical advances and point out future directions of research.
Quality Assessment of Linked Datasets using Probabilistic Approximation
With the increasing application of Linked Open Data, assessing the quality of
datasets by computing quality metrics becomes an issue of crucial importance.
For large and evolving datasets, an exact, deterministic computation of the
quality metrics is too time-consuming or expensive. We employ probabilistic
techniques such as Reservoir Sampling, Bloom Filters, and Clustering Coefficient
estimation to implement a broad set of data quality metrics in an
approximate but sufficiently accurate way. Our implementation is integrated into
the comprehensive data quality assessment framework Luzzu. We evaluated its
performance and accuracy on Linked Open Datasets of broad relevance.

Comment: 15 pages, 2 figures. To appear in the ESWC 2015 proceedings.
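
To make the general idea concrete, here is a minimal sketch of one such probabilistic technique, Reservoir Sampling, used to estimate a quality metric from a fixed-size sample of triples; the metric shown is an illustrative stand-in, not one of Luzzu's actual metrics:

```python
import random

def reservoir_sample(stream, k, seed=42):
    """Algorithm R: draw a uniform random sample of k items from a stream,
    using O(k) memory regardless of the stream's length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Replace an existing sample element with probability k/(i+1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

def approx_http_subject_ratio(triples, k=1000):
    """Estimate, from a sample, the share of triples whose subject is an
    HTTP(S) IRI; an illustrative stand-in for a real quality metric."""
    sample = reservoir_sample(triples, k)
    hits = sum(1 for s, p, o in sample if s.startswith(("http://", "https://")))
    return hits / len(sample) if sample else 0.0
```

The approximation error shrinks as the reservoir size k grows, which is the trade-off between accuracy and cost that the paper exploits.
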
Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data
To make digital resources on the web verifiable, immutable, and permanent, we
propose a technique to include cryptographic hash values in URIs. We call them
trusty URIs and we show how they can be used for approaches like
nanopublications to make not only specific resources but their entire reference
trees verifiable. Digital artifacts can be identified not only on the byte
level but on more abstract levels such as RDF graphs, which means that
resources keep their hash values even when presented in a different format. Our
approach sticks to the core principles of the web, namely openness and
decentralized architecture, is fully compatible with existing standards and
protocols, and can therefore be used right away. Evaluation of our reference
implementations shows that these desired properties are indeed accomplished by
our approach, and that it remains practical even for very large files.

Comment: Small error corrected in the text (table data was correct) on page 13: "All average values are below 0.8s (0.03s for batch mode). Using Java in batch mode even requires only 1ms per file."
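
A simplified sketch of the core idea, assuming byte-level hashing only; the actual trusty URI specification defines its own hash modules (e.g., for RDF graphs, which keep their hash across serialization formats) and encoding details that this sketch omits:

```python
import base64
import hashlib

def make_hash_uri(base_uri: str, content: bytes) -> str:
    """Append a URL-safe base64 SHA-256 digest of the content to the URI,
    so the URI itself commits to the bytes it identifies."""
    digest = hashlib.sha256(content).digest()
    suffix = base64.urlsafe_b64encode(digest).decode().rstrip("=")
    return f"{base_uri}.{suffix}"

uri = make_hash_uri("http://example.org/artifact1", b"some immutable content")
print(uri)  # http://example.org/artifact1.<43-character digest>
```
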
Making Digital Artifacts on the Web Verifiable and Reliable
The current Web has no general mechanisms to make digital artifacts, such
as datasets, code, texts, and images, verifiable and permanent. For digital
artifacts that are supposed to be immutable, there is moreover no commonly
accepted method to enforce this immutability. These shortcomings have a serious
negative impact on the ability to reproduce the results of processes that rely
on Web resources, which in turn heavily impacts areas such as science where
reproducibility is important. To solve this problem, we propose trusty URIs
containing cryptographic hash values. We show how trusty URIs can be used for
the verification of digital artifacts, in a manner that is independent of the
serialization format in the case of structured data files such as
nanopublications. We demonstrate how the contents of these files become
immutable, including dependencies on external digital artifacts, thereby
extending the range of verifiability to the entire reference tree. Our approach
sticks to the core principles of the Web, namely openness and decentralized
architecture, and is fully compatible with existing standards and protocols.
Evaluation of our reference implementations shows that these design goals are
indeed accomplished by our approach, and that it remains practical even for
very large files.

Comment: Extended version of conference paper: arXiv:1401.577
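
The verification side of the same idea, again as a simplified byte-level sketch rather than the full trusty URI algorithm:

```python
import base64
import hashlib

def verify_hash_uri(uri: str, content: bytes) -> bool:
    """Recompute the digest of the retrieved bytes and compare it with the
    suffix embedded in the URI (everything after the last dot)."""
    _, _, expected = uri.rpartition(".")
    actual = base64.urlsafe_b64encode(
        hashlib.sha256(content).digest()
    ).decode().rstrip("=")
    return actual == expected

content = b"some immutable content"
suffix = base64.urlsafe_b64encode(
    hashlib.sha256(content).digest()
).decode().rstrip("=")
uri = f"http://example.org/artifact1.{suffix}"
print(verify_hash_uri(uri, content))            # True
print(verify_hash_uri(uri, b"tampered bytes"))  # False
```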