9,690 research outputs found

    Hybrid XML Retrieval: Combining Information Retrieval and a Native XML Database

    Get PDF
    This paper investigates the impact of three approaches to XML retrieval: using Zettair, a full-text information retrieval system; using eXist, a native XML database; and using a hybrid system that takes full article answers from Zettair and uses eXist to extract elements from those articles. For the content-only topics, we undertake a preliminary analysis of the INEX 2003 relevance assessments in order to identify the types of highly relevant document components. Further analysis identifies two complementary sub-cases of relevance assessments ("General" and "Specific") and two categories of topics ("Broad" and "Narrow"). We develop a novel retrieval module that for a content-only topic utilises the information from the resulting answer list of a native XML database and dynamically determines the preferable units of retrieval, which we call "Coherent Retrieval Elements". The results of our experiments show that -- when each of the three systems is evaluated against different retrieval scenarios (such as different cases of relevance assessments, different topic categories and different choices of evaluation metrics) -- the XML retrieval systems exhibit varying behaviour and the best performance can be reached for different values of the retrieval parameters. In the case of INEX 2003 relevance assessments for the content-only topics, our newly developed hybrid XML retrieval system is substantially more effective than either Zettair or eXist, and yields a robust and a very effective XML retrieval.Comment: Postprint version. The editor version can be accessed through the DO

    The State-of-the-arts in Focused Search

    Get PDF
    The continuous influx of various text data on the Web requires search engines to improve their retrieval abilities for more specific information. The need for relevant results to a user’s topic of interest has gone beyond search for domain or type specific documents to more focused result (e.g. document fragments or answers to a query). The introduction of XML provides a format standard for data representation, storage, and exchange. It helps focused search to be carried out at different granularities of a structured document with XML markups. This report aims at reviewing the state-of-the-arts in focused search, particularly techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking. It is concluded with highlight of open problems

    Hypermedia Learning Objects System - On the Way to a Semantic Educational Web

    Full text link
    While eLearning systems become more and more popular in daily education, available applications lack opportunities to structure, annotate and manage their contents in a high-level fashion. General efforts to improve these deficits are taken by initiatives to define rich meta data sets and a semanticWeb layer. In the present paper we introduce Hylos, an online learning system. Hylos is based on a cellular eLearning Object (ELO) information model encapsulating meta data conforming to the LOM standard. Content management is provisioned on this semantic meta data level and allows for variable, dynamically adaptable access structures. Context aware multifunctional links permit a systematic navigation depending on the learners and didactic needs, thereby exploring the capabilities of the semantic web. Hylos is built upon the more general Multimedia Information Repository (MIR) and the MIR adaptive context linking environment (MIRaCLE), its linking extension. MIR is an open system supporting the standards XML, Corba and JNDI. Hylos benefits from manageable information structures, sophisticated access logic and high-level authoring tools like the ELO editor responsible for the semi-manual creation of meta data and WYSIWYG like content editing.Comment: 11 pages, 7 figure

    Utilizing sub-topical structure of documents for information retrieval.

    Get PDF
    Text segmentation in natural language processing typically refers to the process of decomposing a document into constituent subtopics. Our work centers on the application of text segmentation techniques within information retrieval (IR) tasks. For example, for scoring a document by combining the retrieval scores of its constituent segments, exploiting the proximity of query terms in documents for ad-hoc search, and for question answering (QA), where retrieved passages from multiple documents are aggregated and presented as a single document to a searcher. Feedback in ad hoc IR task is shown to benefit from the use of extracted sentences instead of terms from the pseudo relevant documents for query expansion. Retrieval effectiveness for patent prior art search task is enhanced by applying text segmentation to the patent queries. Another aspect of our work involves augmenting text segmentation techniques to produce segments which are more readable with less unresolved anaphora. This is particularly useful for QA and snippet generation tasks where the objective is to aggregate relevant and novel information from multiple documents satisfying user information need on one hand, and ensuring that the automatically generated content presented to the user is easily readable without reference to the original source document

    Data Model and Query Constructs for Versatile Web Query Languages

    Get PDF
    As the Semantic Web is gaining momentum, the need for truly versatile query languages becomes increasingly apparent. A Web query language is called versatile if it can access in the same query program data in different formats (e.g. XML and RDF). Most query languages are not versatile: they have not been specifically designed to cope with both worlds, providing a uniform language and common constructs to query and transform data in various formats. Moreover, most of them do not provide a flexible data model that is powerful enough to naturally convey both Semantic Web data formats (especially RDF and Topic Maps) and XML. This article highlights challenges related to the data model and language constructs for querying both standard Web and Semantic Web data with an emphasis on facilitating sophisticated reasoning. It is shown that Xcerpt’s data model and querying constructs are particularly well-suited for the Semantic Web, but that some adjustments of the Xcerpt syntax allow for even more effective and natural querying of RDF and Topic Maps

    A Framework for XML-based Integration of Data, Visualization and Analysis in a Biomedical Domain

    Get PDF
    Biomedical data are becoming increasingly complex and heterogeneous in nature. The data are stored in distributed information systems, using a variety of data models, and are processed by increasingly more complex tools that analyze and visualize them. We present in this paper our framework for integrating biomedical research data and tools into a unique Web front end. Our framework is applied to the University of Washington’s Human Brain Project. Specifically, we present solutions to four integration tasks: definition of complex mappings from relational sources to XML, distributed XQuery processing, generation of heterogeneous output formats, and the integration of heterogeneous data visualization and analysis tools

    Extended RDF: Computability and Complexity Issues

    Get PDF
    ERDF stable model semantics is a recently proposed semantics for ERDF ontologies and a faithful extension of RDFS semantics on RDF graphs. In this paper, we elaborate on the computability and complexity issues of the ERDF stable model semantics. Based on the undecidability result of ERDF stable model semantics, decidability under this semantics cannot be achieved, unless ERDF ontologies of restricted syntax are considered. Therefore, we propose a slightly modified semantics for ERDF ontologies, called ERDF #n- stable model semantics. We show that entailment under this semantics is, in general, decidable and also extends RDFS entailment. Equivalence statements between the two semantics are provided. Additionally, we provide algorithms that compute the ERDF #n-stable models of syntax-restricted and general ERDF ontologies. Further, we provide complexity results for the ERDF #nstable model semantics on syntax-restricted and general ERDF ontologies. Finally, we provide complexity results for the ERDF stable model semantics on syntax-restricted ERDF ontologies

    Multilingual search for cultural heritage archives via combining multiple translation resources

    Get PDF
    The linguistic features of material in Cultural Heritage (CH) archives may be in various languages requiring a facility for effective multilingual search. The specialised language often associated with CH content introduces problems for automatic translation to support search applications. The MultiMatch project is focused on enabling users to interact with CH content across different media types and languages. We present results from a MultiMatch study exploring various translation techniques for the CH domain. Our experiments examine translation techniques for the English language CLEF 2006 Cross-Language Speech Retrieval (CL-SR) task using Spanish, French and German queries. Results compare effectiveness of our query translation against a monolingual baseline and show improvement when combining a domain-specific translation lexicon with a standard machine translation system
    corecore