575 research outputs found

    Visual exploration and retrieval of XML document collections with the generic system X2

    This article reports on the XML retrieval system X2, which has been developed at the University of Munich over the last five years. In a typical session with X2, the user first browses a structural summary of the XML database in order to select interesting elements and keywords occurring in documents. Using this intermediate result, queries combining structure and textual references are composed semiautomatically. After query evaluation, the full set of answers is presented in a visual and structured way. X2 largely exploits the structure found in documents, queries and answers to enable new interactive visualization and exploration techniques that support mixed IR and database-oriented querying, thus bridging the gap between these three views on the data to be retrieved. Another salient characteristic of X2, which distinguishes it from other visual query systems for XML, is that it supports various levels of detail in the presentation of answers, as well as techniques for dynamically reordering and grouping retrieved elements once the complete answer set has been computed.

    Projector - a partially typed language for querying XML

    We describe Projector, a language that can be used to perform a mixture of typed and untyped computation against data represented in XML. For some problems, notably when the data is unstructured or semistructured, the most desirable programming model is against the tree structure underlying the document. When this tree structure has been used to model regular data structures, these regular structures themselves are a more desirable programming model. The language Projector, described here in outline, gives both models within a single partially typed algebra and is well suited for hybrid applications, for example when fragments of a known structure are embedded in a document whose overall structure is unknown. Projector is an extension of ECMA-262 (aka JavaScript), and therefore inherits an untyped DOM interface. To this have been added some static typing and a dynamic projection primitive, which can be used to assert the presence of a regular structure modelled within the XML. If this structure does exist, the data is extracted and presented as a typed value within the programming language.

    Collaborative software agents support for the texpros document management system

    This dissertation investigates the use of active rules that are embedded in markup documents. Active rules are used in a markup representation by integrating Collaborative Software Agents with TEXPROS (abbreviation for TEXt PROcessing System) [Liu and Ng 1996] to create a powerful distributed document management system. Such markup documents with embedded active rules are called Active Documents. For fast retrieval purposes, when we need to generate a customized Internet folder organization, we first define the Folder Organization Query Language (FO-QL) to solve data categorization problems. FO-QL defines the folder organization query process that automatically retrieves links of documents deposited into folders and then constructs a folder organization in either a centralized document repository or multiple distributed document repositories. Traditional documents are stored as static data that do not provide any dynamic capabilities for accessing or interacting with the document environment. Markup data and markup rules, being dynamic and distributed, do not merely respond to requests for information but intelligently anticipate, adapt, and actively seek ways to support the computing processes. This feature overcomes the static nature of traditional documents. An Office Automation Definition Language (OADL) with active rules is defined for constructing TEXPROS's dual modeling approach and workflow event representation. Active Documents are such agent-supported OADL documents. With embedded rules and self-describing data features, Active Documents provide the capability for collaborative interactions with software agents. Data transformation and data integration are both data processing problems, but little research has focused on using markup documents to generate a versatile folder organization. Some of the research merely provides manual browsing in a document repository to find the right document. This browsing is time-consuming and unrealistic, especially in multiple document repositories. With FO-QL, one can create a customized folder organization on demand.

    Integrating data warehouses with web data : a survey

    This paper surveys the most relevant research on combining Data Warehouse (DW) and Web data. It studies the XML technologies that are currently being used to integrate, store, query, and retrieve Web data and their application to DWs. The paper reviews different DW distributed architectures and the use of XML languages as an integration tool in these systems. It also introduces the problem of dealing with semistructured data in a DW. It studies Web data repositories, the design of multidimensional databases for XML data sources, and the XML extensions of OnLine Analytical Processing techniques. The paper addresses the application of information retrieval technology in a DW to exploit text-rich document collections. The authors hope that the paper will help to discover the main limitations and opportunities offered by the combination of the DW and Web fields, as well as to identify open research lines.

    CRIS-IR 2006

    The recognition of entities and their relationships in document collections is an important step towards the discovery of latent knowledge as well as towards supporting knowledge management applications. The challenge lies in how to extract and correlate entities in order to answer key knowledge management questions, such as who works with whom, on which projects, with which customers, and in what research areas. The present work proposes a knowledge mining approach supported by information retrieval and text mining tasks, whose core is the correlation of textual elements through the LRD (Latent Relation Discovery) method. Our experiments show that LRD performs better than other correlation methods. We also present an application that demonstrates the approach in knowledge management scenarios. Fundação para a Ciência e a Tecnologia (FCT); Denmark's Electronic Research Library

    The Web as a Resource for Question Answering: Perspectives and Challenges

    The vast amounts of information readily available on the World Wide Web can be effectively used for question answering in two fundamentally different ways. In the federated approach, techniques for handling semistructured data are applied to access Web sources as if they were databases, allowing large classes of common questions to be answered uniformly. In the distributed approach, large-scale text-processing techniques are used to extract answers directly from unstructured Web documents. Because the Web is orders of magnitude larger than any human-collected corpus, question answering systems can capitalize on its unparalleled levels of data redundancy. Analysis of real-world user questions reveals that the federated and distributed approaches complement each other nicely, suggesting a hybrid approach in future question answering systems.