8 research outputs found

    A comparative user study of two XML query languages

    Query languages analogous to SQL for relational databases have been developed for the efficient processing and retrieval of XML-marked-up data. This thesis examines the suitability of two XML query languages, which differ in their starting points and expressive power, for interactive ad hoc use. The languages compared are XQuery and XIL, both of which were designed under the influence of the SQL query language of relational databases. The languages are examined particularly from the perspective of document-oriented XML information retrieval. Using user experiments that imitate an ad hoc usage situation, the thesis investigates whether XIL, which was designed for ad hoc use, offers its users any benefit compared with the general-purpose XQuery language. The languages are compared in terms of the number of correct answers given in the test situations and the number and nature of the errors made in the answers. Based on the results of the user experiments and on the research literature, development proposals for the XIL language are presented. The results describe the level of performance achievable with the tested languages in a situation where the searcher must formulate the query from memory, without the support of a user interface or documentation. If XIL is to be developed on the basis of the proposals presented in this thesis, the results of the user experiments should be validated with a repeat test in as realistic a user interface as possible. Keywords: information retrieval systems, user experiments, query languages, XML, end-user programming

    Articulating information needs in XML query languages

    Document-centric XML is a mixture of text and structure. With the increased availability of document-centric XML documents comes a need for query facilities in which both structural constraints and constraints on the content of the documents can be expressed. How does the expressiveness of languages for querying XML documents help users to express their information needs? We address this question from both an experimental and a theoretical point of view. Our experimental analysis compares a structure-ignorant retrieval approach with a structure-aware one, using the test suite of the INEX XML retrieval evaluation initiative. Theoretically, we create two mathematical models of users' knowledge of a set of documents and define query languages which exactly fit these models. One of these languages corresponds to an XML version of fielded search, the other to the INEX query language. Our main experimental findings are: First, while structure is used in varying degrees of complexity, two thirds of the queries can be expressed in a fielded-search-like format which does not use the hierarchical structure of the documents. Second, three quarters of the queries use constraints on the context of the elements to be returned; these contextual constraints cannot be captured by ordinary keyword queries. Third, structure is used as a search hint, and not as a strict requirement, when judged against the underlying information need. Fourth, the use of structure in queries functions as a precision-enhancing device.

    Compressing Labels of Dynamic XML Data using Base-9 Scheme and Fibonacci Encoding

    The flexibility and self-describing nature of XML have made it the most common mark-up language used for data representation over the Web. XML data is naturally modelled as a tree, where the structural tree information can be encoded into labels via an XML labelling scheme in order to permit answers to queries without the need to access the original XML files. As the transmission of XML data over the Internet has become widespread, it has also become necessary to have an XML labelling scheme that supports dynamic XML data. For a large-scale and frequently updated XML document, existing dynamic XML labelling schemes still suffer from high growth rates in terms of their label size, which can result in overflow problems and/or ambiguous data/query retrievals. This thesis considers the compression of XML labels. A novel XML labelling scheme, named “Base-9”, has been developed to generate labels that are as compact as possible and yet provide efficient support for queries to both static and dynamic XML data. A Fibonacci prefix-encoding method has been used for the first time to store Base-9’s XML labels in a compressed format, with the intention of minimising the storage space without degrading XML querying performance. The thesis also investigates the compression of XML labels using various existing prefix-encoding methods. This investigation has resulted in the proposal of a novel prefix-encoding method named “Elias-Fibonacci of order 3”, which has achieved the fastest encoding time of all prefix-encoding methods studied in this thesis, whereas Fibonacci encoding was found to require the minimum storage. Unlike current XML labelling schemes, the new Base-9 labelling scheme ensures the generation of short labels even after large, frequent, skewed insertions.
    The experimental results reported herein support the advantages of the short labels generated by combining the Base-9 scheme with Fibonacci encoding for storing, updating, retrieving and querying XML data.
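
As a rough illustration of the prefix-encoding idea mentioned in this abstract, the sketch below implements the standard Fibonacci coding of positive integers in Python. It is not the thesis's implementation: the function names `fib_encode` and `fib_decode_stream` are hypothetical, and how Base-9 label components would be mapped to integers is not described here, so that step is left out. Each codeword ends in `11`, which is what lets codewords be concatenated and split again without separators.

```python
from typing import List


def fib_encode(n: int) -> str:
    """Return the standard Fibonacci codeword for a positive integer n."""
    if n < 1:
        raise ValueError("Fibonacci coding is defined for positive integers")
    # Fibonacci numbers 1, 2, 3, 5, ... up to the largest one not exceeding n.
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    fibs.pop()  # drop the first Fibonacci number that is larger than n
    # Greedy Zeckendorf decomposition: never uses two adjacent Fibonacci numbers.
    bits = ["0"] * len(fibs)
    remainder = n
    for i in range(len(fibs) - 1, -1, -1):
        if fibs[i] <= remainder:
            bits[i] = "1"
            remainder -= fibs[i]
    # Bits run from the smallest Fibonacci number upwards; the extra "1"
    # creates the unique "11" terminator.
    return "".join(bits) + "1"


def fib_decode_stream(bitstring: str) -> List[int]:
    """Split a concatenation of Fibonacci codewords back into integers."""
    values = []
    fibs = [1, 2]
    total, pos, prev = 0, 0, "0"
    for bit in bitstring:
        if bit == "1" and prev == "1":
            # Second consecutive "1": terminator, not a value bit.
            values.append(total)
            total, pos, prev = 0, 0, "0"
            continue
        if pos >= len(fibs):
            fibs.append(fibs[-1] + fibs[-2])
        if bit == "1":
            total += fibs[pos]
        pos += 1
        prev = bit
    return values


if __name__ == "__main__":
    stream = fib_encode(4) + fib_encode(1)
    print(stream)                     # 101111
    print(fib_decode_stream(stream))  # [4, 1]
```

Small integers get short codewords (1 is `11`, 4 is `1011`), which is the general reason such codes suit values that are usually small.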

    Evaluation of effective XML information retrieval

    XML is being adopted as a common storage format in scientific data repositories, digital libraries, and on the World Wide Web. Accordingly, there is a need for content-oriented XML retrieval systems that can efficiently and effectively store, search and retrieve information from XML document collections. Unlike traditional information retrieval systems where whole documents are usually indexed and retrieved as information units, XML retrieval systems typically index and retrieve document components of varying granularity. To evaluate the effectiveness of such systems, test collections where relevance assessments are provided according to an XML-specific definition of relevance are necessary. Such test collections have been built during four rounds of the INitiative for the Evaluation of XML Retrieval (INEX). There are many different approaches to XML retrieval; most approaches either extend full-text information retrieval systems to handle XML retrieval, or use database technologies that incorporate existing XML standards to handle both XML presentation and retrieval. We present a hybrid approach to XML retrieval that combines text information retrieval features with XML-specific features found in a native XML database. Results from our experiments on the INEX 2003 and 2004 test collections demonstrate the usefulness of applying our hybrid approach to different XML retrieval tasks. A realistic definition of relevance is necessary for meaningful comparison of alternative XML retrieval approaches. The three relevance definitions used by INEX since 2002 comprise two relevance dimensions, each based on topical relevance. We perform an extensive analysis of the two INEX 2004 and 2005 relevance definitions, and show that assessors and users find them difficult to understand. We propose a new definition of relevance for XML retrieval, and demonstrate that a relevance scale based on this definition is useful for XML retrieval experiments. 
    Finding the appropriate approach to evaluate XML retrieval effectiveness is the subject of ongoing debate within the XML information retrieval research community. We present an overview of the evaluation methodologies implemented in the current INEX metrics, which reveals that the metrics follow different assumptions and measure different XML retrieval behaviours. We propose a new evaluation metric for XML retrieval and conduct an extensive analysis of the retrieval performance of simulated runs to show what is measured. We compare the evaluation behaviour obtained with the new metric to the behaviours obtained with two of the official INEX 2005 metrics, and demonstrate that the new metric can be used to reliably evaluate XML retrieval effectiveness. To analyse the effectiveness of XML retrieval in different application scenarios, we use evaluation measures in our new metric to investigate the behaviour of XML retrieval approaches under the following two scenarios: the ad-hoc retrieval scenario, exploring the activities carried out as part of the INEX 2005 Ad-hoc track; and the multimedia retrieval scenario, exploring the activities carried out as part of the INEX 2005 Multimedia track. For both application scenarios we show that, although different values for retrieval parameters are needed to achieve the optimal performance, the desired textual or multimedia information can be effectively located using a combination of XML retrieval approaches.