Hybrid XML Retrieval: Combining Information Retrieval and a Native XML Database
This paper investigates the impact of three approaches to XML retrieval:
using Zettair, a full-text information retrieval system; using eXist, a native
XML database; and using a hybrid system that takes full article answers from
Zettair and uses eXist to extract elements from those articles. For the
content-only topics, we undertake a preliminary analysis of the INEX 2003
relevance assessments in order to identify the types of highly relevant
document components. Further analysis identifies two complementary sub-cases of
relevance assessments ("General" and "Specific") and two categories of topics
("Broad" and "Narrow"). We develop a novel retrieval module that for a
content-only topic utilises the information from the resulting answer list of a
native XML database and dynamically determines the preferable units of
retrieval, which we call "Coherent Retrieval Elements". The results of our
experiments show that -- when each of the three systems is evaluated against
different retrieval scenarios (such as different cases of relevance
assessments, different topic categories and different choices of evaluation
metrics) -- the XML retrieval systems exhibit varying behaviour and the best
performance can be reached for different values of the retrieval parameters. In
the case of INEX 2003 relevance assessments for the content-only topics, our
newly developed hybrid XML retrieval system is substantially more effective
than either Zettair or eXist, and yields robust and very effective XML
retrieval. Comment: Postprint version. The editor version can be accessed through the DO
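The two-stage hybrid idea described in the abstract above (rank whole articles first, then extract elements only from the top-ranked articles) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy collection `ARTICLES`, the term-count scoring, and the function names are hypothetical, and the standard-library XML parser merely stands in for the roles that Zettair and eXist play in the paper. It does not reproduce the paper's "Coherent Retrieval Elements" module.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy collection: article id -> XML source.
ARTICLES = {
    "a1": "<article><sec><p>xml retrieval with databases</p></sec>"
          "<sec><p>cooking</p></sec></article>",
    "a2": "<article><sec><p>gardening tips</p></sec></article>",
}


def text_of(elem):
    """Lower-cased text content of an element and all its descendants."""
    return " ".join(elem.itertext()).lower()


def score(text, terms):
    """Toy relevance score: total occurrences of the query terms."""
    words = text.split()
    return sum(words.count(t) for t in terms)


def rank_articles(terms, k=1):
    """Stage 1 (stand-in for a full-text engine): rank whole articles."""
    scored = []
    for aid, xml_src in ARTICLES.items():
        s = score(text_of(ET.fromstring(xml_src)), terms)
        if s > 0:
            scored.append((s, aid))
    # Highest score first; ties broken by article id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [aid for _, aid in scored[:k]]


def extract_elements(article_id, terms, tag="sec"):
    """Stage 2 (stand-in for an XML database): matching elements of one article."""
    root = ET.fromstring(ARTICLES[article_id])
    hits = []
    for i, elem in enumerate(root.iter(tag), start=1):
        if score(text_of(elem), terms) > 0:
            hits.append(f"{article_id}/{tag}[{i}]")
    return hits


def hybrid_search(terms, k=1):
    """Extract elements only from the k best-ranked articles."""
    results = []
    for aid in rank_articles(terms, k):
        results.extend(extract_elements(aid, terms))
    return results
```

Calling `hybrid_search(["xml", "retrieval"])` on the toy collection returns a list of element paths from the best-matching article only, which is the point of the hybrid design: the element-level search is restricted to articles the full-text stage already judged relevant.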
The State-of-the-arts in Focused Search
The continuous influx of text data on the Web requires search engines to improve their ability to retrieve more specific information. The need for results relevant to a user’s topic of interest has moved beyond searching for domain-specific or type-specific documents to more focused results (e.g. document fragments or answers to a query). The introduction of XML provides a standard format for data representation, storage, and exchange. It allows focused search to be carried out at different granularities of a structured document with XML markup. This report reviews the state of the art in focused search, particularly techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking. It concludes by highlighting open problems.
Hypermedia Learning Objects System - On the Way to a Semantic Educational Web
While eLearning systems become more and more popular in daily education,
available applications lack opportunities to structure, annotate and manage
their contents in a high-level fashion. General efforts to remedy these
deficits are made by initiatives to define rich meta data sets and a
Semantic Web layer. In the present paper we introduce Hylos, an online learning
system. Hylos is based on a cellular eLearning Object (ELO) information model
encapsulating meta data conforming to the LOM standard. Content management is
provisioned on this semantic meta data level and allows for variable,
dynamically adaptable access structures. Context-aware multifunctional links
permit systematic navigation depending on the learners' and didactic needs,
thereby exploring the capabilities of the semantic web. Hylos is built upon the
more general Multimedia Information Repository (MIR) and the MIR adaptive
context linking environment (MIRaCLE), its linking extension. MIR is an open
system supporting the standards XML, Corba and JNDI. Hylos benefits from
manageable information structures, sophisticated access logic and high-level
authoring tools like the ELO editor responsible for the semi-manual creation of
meta data and WYSIWYG-like content editing. Comment: 11 pages, 7 figures
Utilizing sub-topical structure of documents for information retrieval.
Text segmentation in natural language processing typically refers to the process of decomposing a document into constituent subtopics. Our work centers on the application of text segmentation techniques within information retrieval (IR) tasks. Examples include scoring a document by combining the retrieval scores of its constituent segments, exploiting the proximity of query terms in documents for ad hoc search, and question answering (QA), where retrieved passages from multiple documents are aggregated and presented to a searcher as a single document. Feedback in the ad hoc IR task is shown to benefit from using sentences extracted from the pseudo-relevant documents, instead of terms, for query expansion. Retrieval effectiveness for the patent prior art search task is enhanced by applying text segmentation to the patent queries. Another aspect of our work involves augmenting text segmentation techniques to produce segments that are more readable and contain fewer unresolved anaphora. This is particularly useful for QA and snippet generation tasks, where the objective is both to aggregate relevant and novel information from multiple documents that satisfies the user's information need and to ensure that the automatically generated content presented to the user is easily readable without reference to the original source document.
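As a rough illustration of scoring a document by combining the retrieval scores of its constituent segments, the sketch below makes several assumptions that are not taken from the abstract: segments are obtained by splitting on blank lines (a stand-in for a real text segmentation algorithm), each segment is scored by normalised query-term frequency, and the document score interpolates the best segment score with the mean segment score using a hypothetical weight `alpha`.

```python
def segment(doc):
    """Split a document into segments on blank lines (assumed segmenter)."""
    return [s.strip() for s in doc.split("\n\n") if s.strip()]


def segment_score(seg, terms):
    """Fraction of the segment's words that are query terms."""
    words = seg.lower().split()
    if not words:
        return 0.0
    return sum(words.count(t) for t in terms) / len(words)


def document_score(doc, terms, alpha=0.5):
    """Interpolate the best segment score with the mean segment score.

    alpha=1.0 ranks a document by its single best segment;
    alpha=0.0 ranks it by the average over all segments.
    """
    scores = [segment_score(s, terms) for s in segment(doc)]
    if not scores:
        return 0.0
    best = max(scores)
    mean = sum(scores) / len(scores)
    return alpha * best + (1 - alpha) * mean
```

With `alpha` close to 1, a long document containing one highly relevant subtopic is not penalised for its unrelated segments, which is one common motivation for segment-level scoring.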
Data Model and Query Constructs for Versatile Web Query Languages
As the Semantic Web is gaining momentum, the need for
truly versatile query languages becomes increasingly apparent. A Web
query language is called versatile if it can access data in different
formats (e.g. XML and RDF) in the same query program. Most query languages
are not versatile: they have not been specifically designed to cope
with both worlds, providing a uniform language and common constructs
to query and transform data in various formats. Moreover, most of them
do not provide a flexible data model that is powerful enough to naturally
convey both Semantic Web data formats (especially RDF and
Topic Maps) and XML. This article highlights challenges related to the
data model and language constructs for querying both standard Web
and Semantic Web data with an emphasis on facilitating sophisticated
reasoning. It is shown that Xcerpt’s data model and querying constructs
are particularly well-suited for the Semantic Web, but that some adjustments
of the Xcerpt syntax allow for even more effective and natural
querying of RDF and Topic Maps.
A Framework for XML-based Integration of Data, Visualization and Analysis in a Biomedical Domain
Biomedical data are becoming increasingly complex and heterogeneous in nature. The data are stored in distributed information systems, using a variety of data models, and are processed by increasingly complex tools that analyze and visualize them. In this paper we present our framework for integrating biomedical research data and tools into a single Web front end. Our framework is applied to the University of Washington’s Human Brain Project. Specifically, we present solutions to four integration tasks: definition of complex mappings from relational sources to XML, distributed XQuery processing, generation of heterogeneous output formats, and the integration of heterogeneous data visualization and analysis tools.
Extended RDF: Computability and Complexity Issues
ERDF stable model semantics is a recently proposed semantics for
ERDF ontologies and a faithful extension of RDFS semantics on RDF graphs.
In this paper, we elaborate on the computability and complexity issues of the
ERDF stable model semantics. Based on the undecidability result of ERDF
stable model semantics, decidability under this semantics cannot be achieved,
unless ERDF ontologies of restricted syntax are considered. Therefore, we
propose a slightly modified semantics for ERDF ontologies, called ERDF
#n-stable model semantics. We show that entailment under this semantics is, in
general, decidable and also extends RDFS entailment. Equivalence statements
between the two semantics are provided. Additionally, we provide algorithms
that compute the ERDF #n-stable models of syntax-restricted and general
ERDF ontologies. Further, we provide complexity results for the ERDF #n-stable
model semantics on syntax-restricted and general ERDF ontologies.
Finally, we provide complexity results for the ERDF stable model semantics
on syntax-restricted ERDF ontologies.
Multilingual search for cultural heritage archives via combining multiple translation resources
Material in Cultural Heritage (CH) archives may be in various languages, requiring a facility for effective multilingual search. The specialised
language often associated with CH content introduces problems for automatic translation to support search applications. The MultiMatch project is focused on enabling
users to interact with CH content across different media types and languages. We present results from a MultiMatch study exploring various translation techniques for
the CH domain. Our experiments examine translation techniques for the English language CLEF 2006 Cross-Language
Speech Retrieval (CL-SR) task using Spanish, French and German queries. Results compare the effectiveness of our query
translation against a monolingual baseline and show improvement when combining a domain-specific translation lexicon with a standard machine translation system.