6,917 research outputs found

    Integrating a Query Language for Structured and Semi-Structured Data and IR Techniques

    Get PDF

    Challenging Ubiquitous Inverted Files

    Get PDF
    Stand-alone ranking systems based on highly optimized inverted file structures are generally considered ‘the’ solution for building search engines. Observing various developments in software and hardware, we argue however that IR research faces a complex engineering problem in the quest for more flexible yet efficient retrieval systems. We propose to base the development of retrieval systems on ‘the database approach’: mapping high-level declarative specifications of the retrieval process into efficient query plans. We present the Mirror DBMS as a prototype implementation of a retrieval system based on this approach

    Content-Aware DataGuides for Indexing Large Collections of XML Documents

    Get PDF
    XML is well-suited for modelling structured data with textual content. However, most indexing approaches perform structure and content matching independently, combining the retrieved path and keyword occurrences in a third step. This paper shows that retrieval in XML documents can be accelerated significantly by processing text and structure simultaneously during all retrieval phases. To this end, the Content-Aware DataGuide (CADG) enhances the wellknown DataGuide with (1) simultaneous keyword and path matching and (2) a precomputed content/structure join. Extensive experiments prove the CADG to be 50-90% faster than the DataGuide for various sorts of query and document, including difficult cases such as poorly structured queries and recursive document paths. A new query classification scheme identifies precise query characteristics with a predominant influence on the performance of the individual indices. The experiments show that the CADG is applicable to many real-world applications, in particular large collections of heterogeneously structured XML documents

    Distributed Information Retrieval using Keyword Auctions

    Get PDF
    This report motivates the need for large-scale distributed approaches to information retrieval, and proposes solutions based on keyword auctions

    A Semantic Web of Know-How: Linked Data for Community-Centric Tasks

    Full text link
    This paper proposes a novel framework for representing community know-how on the Semantic Web. Procedural knowledge generated by web communities typically takes the form of natural language instructions or videos and is largely unstructured. The absence of semantic structure impedes the deployment of many useful applications, in particular the ability to discover and integrate know-how automatically. We discuss the characteristics of community know-how and argue that existing knowledge representation frameworks fail to represent it adequately. We present a novel framework for representing the semantic structure of community know-how and demonstrate the feasibility of our approach by providing a concrete implementation which includes a method for automatically acquiring procedural knowledge for real-world tasks.Comment: 6th International Workshop on Web Intelligence & Communities (WIC14), Proceedings of the companion publication of the 23rd International Conference on World Wide Web (WWW 2014

    A Database Approach to Content-based XML retrieval

    Get PDF
    This paper describes a rst prototype system for content-based retrieval from XML data. The system's design supports both XPath queries and complex information retrieval queries based on a language modelling approach to information retrieval. Evaluation using the INEX benchmark shows that it is beneficial if the system is biased to retrieve large XML fragments over small fragments

    Enrichment of the Phenotypic and Genotypic Data Warehouse analysis using Question Answering systems to facilitate the decision making process in cereal breeding programs

    Get PDF
    Currently there are an overwhelming number of scientific publications in Life Sciences, especially in Genetics and Biotechnology. This huge amount of information is structured in corporate Data Warehouses (DW) or in Biological Databases (e.g. UniProt, RCSB Protein Data Bank, CEREALAB or GenBank), whose main drawback is its cost of updating that makes it obsolete easily. However, these Databases are the main tool for enterprises when they want to update their internal information, for example when a plant breeder enterprise needs to enrich its genetic information (internal structured Database) with recently discovered genes related to specific phenotypic traits (external unstructured data) in order to choose the desired parentals for breeding programs. In this paper, we propose to complement the internal information with external data from the Web using Question Answering (QA) techniques. We go a step further by providing a complete framework for integrating unstructured and structured information by combining traditional Databases and DW architectures with QA systems. The great advantage of our framework is that decision makers can compare instantaneously internal data with external data from competitors, thereby allowing taking quick strategic decisions based on richer data.This paper has been partially supported by the MESOLAP (TIN2010-14860) and GEODAS-BI (TIN2012-37493-C03-03) projects from the Spanish Ministry of Education and Competitivity. Alejandro Maté is funded by the Generalitat Valenciana under an ACIF grant (ACIF/2010/298)
    corecore