An infrastructure for building semantic web portals
In this paper, we present our KMi semantic web portal infrastructure, which supports two important tasks of semantic web portals, namely metadata extraction and data querying. Central to our infrastructure are three components: i) an automated metadata extraction tool, ASDI, which supports the extraction of high-quality metadata from heterogeneous sources, ii) an ontology-driven question answering tool, AquaLog, which uses the domain-specific ontology and the semantic metadata extracted by ASDI to answer questions posed in natural language, and iii) a semantic search engine, which enhances traditional text-based searching by making use of the underlying ontologies and the extracted metadata. A semantic web portal application has been built to illustrate the use of this infrastructure.
Digital Preservation Services : State of the Art Analysis
Research report funded by the DC-NET project. An overview of the state of the art in service provision for digital preservation and curation. Its focus is on the areas where gaps need to be bridged between e-Infrastructures and efficient, forward-looking digital preservation services. Based on a desktop study and a rapid analysis of some 190 currently available tools and services for digital preservation, the deliverable provides a high-level view of the range of instruments currently on offer to support various functions within a preservation system. European Commission, FP7. Peer-reviewed.
Access to recorded interviews: A research agenda
Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state of the art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed.
A geo-temporal information extraction service for processing descriptive metadata in digital libraries
In the context of digital map libraries, resources are usually described by metadata records that define the relevant subject, location, time-span, format and keywords. With regard to locations and time-spans, metadata records are often incomplete, or they provide information in a way that is not machine-understandable (e.g. textual descriptions). This paper presents techniques for extracting geotemporal information from text, using relatively simple text mining methods that leverage a Web gazetteer service. The idea is to go from human-made geotemporal referencing (i.e. place and period names in textual expressions) to geo-spatial coordinates and time-spans. A prototype system implementing the proposed methods is described in detail. Experimental results demonstrate the efficiency and accuracy of the proposed approaches.
Automatic annotation of bioinformatics workflows with biomedical ontologies
Legacy scientific workflows, and the services within them, often present
scarce and unstructured (i.e. textual) descriptions. This makes it difficult to
find, share and reuse them, thus dramatically reducing their value to the
community. This paper presents an approach to annotating workflows and their
subcomponents with ontology terms, in an attempt to describe these artifacts in
a structured way. Despite a dearth of even textual descriptions, we
automatically annotated 530 myExperiment bioinformatics-related workflows,
including more than 2600 workflow-associated services, with relevant
ontological terms. Quantitative evaluation of the Information Content of these
terms suggests that, in cases where annotation was possible at all, the
annotation quality was comparable to manually curated bioinformatics resources.
Comment: 6th International Symposium on Leveraging Applications (ISoLA 2014
conference), 15 pages, 4 figures
Automatic Metadata Generation using Associative Networks
In spite of its tremendous value, metadata is generally sparse and
incomplete, thereby hampering the effectiveness of digital information
services. Many of the existing mechanisms for the automated creation of
metadata rely primarily on content analysis which can be costly and
inefficient. The automatic metadata generation system proposed in this article
leverages resource relationships generated from existing metadata as a medium
for propagation from metadata-rich to metadata-poor resources. Because of its
independence from content analysis, it can be applied to a wide variety of
resource media types and is shown to be computationally inexpensive. The
proposed method operates through two distinct phases. Occurrence and
co-occurrence algorithms first generate an associative network of repository
resources leveraging existing repository metadata. Second, using the
associative network as a substrate, metadata associated with metadata-rich
resources is propagated to metadata-poor resources by means of a discrete-form
spreading activation algorithm. This article discusses the general framework
for building associative networks, an algorithm for disseminating metadata
through such networks, and the results of an experiment and validation of the
proposed method using a standard bibliographic dataset.
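The two phases described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_network`, `spread`, `propagate_keywords`), the edge weighting (a count of shared metadata values), the deterministic energy split, and the `decay` and `threshold` parameters are all assumptions chosen for illustration, and the toy records are made up.

```python
from collections import defaultdict
from itertools import combinations


def build_network(records):
    """Phase 1 (assumed form): link resources that share a metadata value.

    Edge weight = number of metadata values two resources have in common.
    """
    index = defaultdict(set)  # metadata value -> resources carrying it
    for rid, values in records.items():
        for value in values:
            index[value].add(rid)
    weights = defaultdict(lambda: defaultdict(int))
    for rids in index.values():
        for a, b in combinations(sorted(rids), 2):
            weights[a][b] += 1
            weights[b][a] += 1
    return weights


def spread(weights, source, energy=1.0, decay=0.5, threshold=0.1):
    """Phase 2 (assumed form): spreading activation from one resource.

    Energy is split among neighbours in proportion to edge weight and
    shrinks by `decay` per hop; shares below `threshold` are dropped,
    which guarantees termination.
    """
    received = defaultdict(float)
    frontier = [(source, energy)]
    while frontier:
        node, e = frontier.pop()
        total = sum(weights[node].values())
        if total == 0:
            continue
        for neighbour, w in weights[node].items():
            share = e * decay * w / total
            if share < threshold:
                continue
            received[neighbour] += share
            frontier.append((neighbour, share))
    return received


def propagate_keywords(weights, keywords, **params):
    """Score keywords of metadata-rich resources on metadata-poor ones."""
    scores = defaultdict(lambda: defaultdict(float))
    for rid, kws in keywords.items():
        if not kws:
            continue
        for target, e in spread(weights, rid, **params).items():
            if keywords.get(target):
                continue  # target already has keywords; leave it alone
            for kw in kws:
                scores[target][kw] += e
    return scores


# Hypothetical toy repository: existing metadata used to build the network.
records = {
    "p1": {"author:A", "venue:X"},
    "p2": {"author:A"},
    "p3": {"venue:X", "author:B"},
    "p4": {"author:B"},
}
# Only p1 and p4 are metadata-rich with respect to keywords.
keywords = {"p1": {"metadata"}, "p4": {"ontology"}}

scores = propagate_keywords(build_network(records), keywords)
for rid in sorted(scores):
    print(rid, dict(scores[rid]))
```

In this toy run, p3 shares an author with p4 and a venue with p1, so it receives both keywords, with "ontology" scoring higher because p4 passes all of its energy to its single neighbour. No content analysis is involved, which is the property the abstract emphasizes.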
Automating Metadata Extraction: Genre Classification
A problem that frequently arises in the management and integration of scientific data is the lack of context and semantics that would link data encoded in disparate ways. To bridge the discrepancy, it often helps to mine scientific texts to aid the understanding of the database. Mining relevant text can be significantly aided by the availability of descriptive and semantic metadata. The Digital Curation Centre (DCC) has undertaken research to automate the extraction of metadata from documents in PDF ([22]). Documents may include scientific journal papers, lab notes or even emails. We suggest genre classification as a first step toward automating metadata extraction. The classification method will be built on looking at the documents from five directions: as an object of specific visual format, a layout of strings with characteristic grammar, an object with stylometric signatures, an object with meaning and purpose, and an object linked to previously classified objects and external sources. Some results of experiments in relation to the first two directions are described here; they are meant to be indicative of the promise underlying this multi-faceted approach.