16 research outputs found

    Building simulated queries for known-item topics: an analysis using six European languages

    There has been increased interest in the use of simulated queries for evaluation and estimation purposes in Information Retrieval. However, there are still many unaddressed issues regarding their usage and impact on evaluation because their quality, in terms of retrieval performance, is unlike that of real queries. In this paper, we focus on methods for building simulated known-item topics and compare their quality against real known-item topics. Using existing generation models as our starting point, we explore factors which may influence the generation of the known-item topic. Informed by this detailed analysis (on six European languages), we propose a model with improved document and term selection properties, showing that simulated known-item topics can be generated that are comparable to real known-item topics. This is a significant step towards validating the potential usefulness of simulated queries, both for evaluation purposes and because building models of querying behavior provides a deeper insight into the querying process, so that better retrieval mechanisms can be developed to support the user.

    An overview of the linguistic resources used in cross-language question answering systems in CLEF Conference

    The development of the Semantic Web requires great economic and human effort. Consequently, it is very useful to create mechanisms and tools that facilitate its expansion. From the standpoint of information retrieval (hereafter IR), access to the contents of the Semantic Web can be favored by the use of natural language, as it is much simpler and faster for users to engage in their habitual form of expression. The growing popularity of the Internet and the wide availability of web information resources for general audiences are a fairly recent phenomenon, although man's need to overcome the language barrier and communicate with others is as old as the history of mankind. The World Wide Web (WWW), together with the growing globalization of companies and organizations and the increase in the non-English-speaking audience, creates demand for tools that allow users to obtain information from a wide range of resources. Yet the underlying linguistic restrictions are often overlooked by researchers and designers. Against this background, a key characteristic to be evaluated in terms of the efficiency of IR systems is their capacity to allow users to find a corpus of documents in different languages, and to provide the relevant information despite limited linguistic competence in the target language.

    Developing a distributed electronic health-record store for India

    The DIGHT project is addressing the problem of building a scalable and highly available information store for the Electronic Health Records (EHRs) of the more than one billion citizens of India.

    Towards privacy-aware identity management

    The overall goal of the PRIME project (Privacy and Identity Management for Europe) is the development of a privacy-enhanced identity management system that allows users to control the release of their personal information. The PRIME architecture includes an Access Control component allowing the enforcement of protection requirements on personally identifiable information (PII).

    Learning to select for information retrieval

    The effective ranking of documents in search engines is based on various document features, such as the frequency of the query terms in each document, the length, or the authoritativeness of each document. In order to obtain a better retrieval performance, instead of using a single or a few features, there is a growing trend to create a ranking function by applying a learning to rank technique on a large set of features. Learning to rank techniques aim to generate an effective document ranking function by combining a large number of document features. Different ranking functions can be generated by using different learning to rank techniques or on different document feature sets. While the generated ranking function may be uniformly applied to all queries, several studies have shown that different ranking functions favour different queries, and that the retrieval performance can be significantly enhanced if an appropriate ranking function is selected for each individual query. This thesis proposes Learning to Select (LTS), a novel framework that selectively applies an appropriate ranking function on a per-query basis, regardless of the given query's type and the number of candidate ranking functions. In the learning to select framework, the effectiveness of a ranking function for an unseen query is estimated from the available neighbouring training queries. The proposed framework employs a classification technique (e.g. k-nearest neighbour) to identify neighbouring training queries for an unseen query by using a query feature. In particular, a divergence measure (e.g. Jensen-Shannon), which determines the extent to which a document ranking function alters the scores of an initial ranking of documents for a given query, is proposed for use as a query feature. The ranking function which performs the best on the identified training query set is then chosen for the unseen query. 
The proposed framework is thoroughly evaluated on two different TREC retrieval tasks (namely, Web search and adhoc search tasks) and on two large standard LETOR feature sets, which contain as many as 64 document features, deriving conclusions concerning the key components of LTS, namely the query feature and the identification of neighbouring queries components. Two different types of experiments are conducted. The first one is to select an appropriate ranking function from a number of candidate ranking functions. The second one is to select multiple appropriate document features from a number of candidate document features, for building a ranking function. Experimental results show that our proposed LTS framework is effective in both selecting an appropriate ranking function and selecting multiple appropriate document features, on a per-query basis. In addition, the retrieval performance is further enhanced when increasing the number of candidates, suggesting the robustness of the learning to select framework. This thesis also demonstrates how the LTS framework can be deployed to other search applications. These applications include the selective integration of a query independent feature into a document weighting scheme (e.g. BM25), the selective estimation of the relative importance of different query aspects in a search diversification task (the goal of the task is to retrieve a ranked list of documents that provides a maximum coverage for a given query, while avoiding excessive redundancy), and the selective application of an appropriate resource for expanding and enriching a given query for document search within an enterprise. The effectiveness of the LTS framework is observed across these search applications, and on different collections, including a large scale Web collection that contains over 50 million documents. This suggests the generality of the proposed learning to select framework. 
The main contributions of this thesis are the introduction of the LTS framework and the proposed use of divergence measures as query features for identifying similar queries. In addition, this thesis draws insights from a large set of experiments, involving four different standard collections, four different search tasks and large document feature sets. These experiments illustrate the effectiveness, robustness and generality of the LTS framework in tackling various retrieval applications.
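
The per-query selection step described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a single scalar query feature (a Jensen-Shannon divergence between two normalised score distributions), absolute difference as the neighbour distance, and mean effectiveness over the neighbours as the selection criterion. The names `js_divergence`, `normalise` and `select_ranker`, and the toy training data, are hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions."""
    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 whenever a > 0 here.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def normalise(scores):
    """Turn non-negative document scores into a probability distribution."""
    total = sum(scores)
    return [s / total for s in scores]


def select_ranker(query_feature, training, k=3):
    """Pick the ranking function that performs best on the k nearest training queries.

    training: list of (feature, {ranker_name: effectiveness}) pairs (assumed layout).
    """
    neighbours = sorted(training, key=lambda t: abs(t[0] - query_feature))[:k]
    names = neighbours[0][1].keys()
    mean_eff = {
        name: sum(t[1][name] for t in neighbours) / len(neighbours)
        for name in names
    }
    return max(mean_eff, key=mean_eff.get)


# Toy data (assumed): feature value and per-ranker effectiveness for four training queries.
training = [
    (0.05, {"A": 0.4, "B": 0.2}),
    (0.10, {"A": 0.5, "B": 0.3}),
    (0.80, {"A": 0.1, "B": 0.6}),
    (0.90, {"A": 0.2, "B": 0.7}),
]

# Query feature: how much a candidate ranker changes the initial score distribution.
base_scores = normalise([3.0, 2.0, 1.0])
candidate_scores = normalise([1.0, 2.0, 3.0])
feature = js_divergence(base_scores, candidate_scores)
chosen = select_ranker(feature, training, k=2)
```

In this sketch, an unseen query whose feature lies near the low-divergence training queries is assigned ranker "A", and one near the high-divergence queries is assigned ranker "B". A fuller version would likely use one feature per candidate ranking function and a tuned value of k.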

    Proceedings of the Zoco'08 / JISBD Workshop on Web Application Integration: XIII Jornadas de Ingeniería del Software y Bases de Datos (13th Conference on Software Engineering and Databases), Gijón, 7 to 10 October 2008

    Ministerio de Educación y Ciencia TIN2007-64119; Junta de Andalucía P07-TIC-0260