145,195 research outputs found
Foundations research in information retrieval inspired by quantum theory
In the information age information is useless unless it can be found and used, search engines in our time thereby form a crucial component of research. For something so crucial, information retrieval (IR), the formal discipline investigating search, can be a confusing area of study. There is an underlying difficulty, with the very definition of information retrieval, and weaknesses in its operational method, which prevent it being called a 'science'. The work in this thesis aims to create a formal definition for search, scientific methods for evaluation and comparison of different search strategies, and methods for dealing with the uncertainty associated with user interactions; so that one has the necessary formal foundation to be able to perceive IR as "search science".
The key problems restricting a science of search pertain to the ambiguity in the current way in which search scenarios and concepts are specified. This especially affects evaluation of search systems since according to the traditional retrieval approach, evaluations are not repeatable, and thus not collectively verifiable. This is mainly due to the dependence on the method of user studies currently dominating evaluation methodology. This evaluation problem is related to the problem of not being able to formally define the users in user studies. The problem of defining users relates in turn to one of the main retrieval-specific motivations of the thesis, which can be understood by noticing that uncertainties associated with the interpretation of user interactions are collectively inscribed in a relevance concept, the representation and use of which defines the overall character of a retrieval model. Current research is limited in its understanding of how to best model relevance, a key factor restricting extensive formalization of the IR discipline as a whole. Thus, the problems of defining search systems and search scenarios are the principle issues preventing formal comparisons of systems and scenarios, in turn limiting the strength of experimental evaluation. Alternative models of search are proposed that remove the need for ambiguous relevance concepts and instead by arguing for use of simulation as a normative evaluation strategy for retrieval, some new concepts are introduced that can be employed in judging effectiveness of search systems. Included are techniques for simulating search, techniques for formal user modelling and techniques for generating measures of effectiveness for search models.
The problems of evaluation and of defining users are generalized by proposing that they are related to the need for an unified framework for defining arbitrary search concepts, search systems, user models, and evaluation strategies. It is argued that this framework depends on a re-interpretation of the concept of search accommodating the increasingly embedded and implicit nature of search on modern operating systems, internet and networks. The re-interpretation of the concept of search is approached by considering a generalization of the concept of ostensive retrieval producing definitions of search, information need, user and system that (formally) accommodates the perception of search as an abstract process that can be physical and/or computational.
The feasibility of both the mathematical formalism and physical conceptualizations of quantum theory (QT) are investigated for the purpose of modelling the this abstract search process as a physical process. Techniques for representing a search process by the Hilbert space formalism in QT are presented from which techniques are proposed for generating measures for effectiveness that combine static information such as term weights, and dynamically changing information such as probabilities of relevance. These techniques are used for deducing methods for modelling information need change. In mapping the 'macro level search' process to 'micro level physics' some generalizations were made to the use and interpretation of basic QT concepts such the wave function description of state and reversible evolution of states corresponding to the first and second postulates of quantum theory respectively. Several ways of expressing relevance (and other retrieval concepts) within the derived framework are proposed arguing that the increase in modelling power by use of QT provides effective ways to characterize this complex concept.
Mapping the mathematical formalism of search to that of quantum theory presented insightful perspectives about the nature of search. However, differences between the operational semantics of quantum theory and search restricted the usefulness of the mapping. In trying to resolve these semantic differences, a semi-formal framework was developed that is mid-way between a programmatic language, a state-based language resembling the way QT models states, and a process description language. By using this framework, this thesis attempts to intimately link the theory and practice of information retrieval and the evaluation of the retrieval process. The result is a novel, and useful way for formally discussing, modelling and evaluating search concepts, search systems and search processes
Searching Health Information in Question-Answering Systems
Question-Answering Systems (QA Systems) can be viewed as a new alternative to the more familiar Information Retrieval Systems. These systems try to offer detailed, understandable answers to factual questions, in order to retrieve a collection of documents related to a particular search (Jackson & Schilder, 2005). The authors carry out a study to evaluate the quality and efficiency of open- and restricted-domain QA systems as sources for physicians and users in general through one monolingual evaluation and another multilingual. Their objective led them to use definition-type questions in order to evaluate QA systems and determine if they are useful to retrieve medical information. In addition, they analyze and evaluate the results obtained, and identify the source or sources used by the systems and their procedure (Olvera-Lobo & Gutiérrez-Artacho, 2010, 2011)
Searching Health Information in Question-Answering Systems
Question-Answering Systems (QA Systems) can be viewed as a new alternative to the more familiar Information Retrieval Systems. These systems try to offer detailed, understandable answers to factual questions, in order to retrieve a collection of documents related to a particular search (Jackson & Schilder, 2005). The authors carry out a study to evaluate the quality and efficiency of open- and restricted-domain QA systems as sources for physicians and users in general through one monolingual evaluation and another multilingual. Their objective led them to use definition-type questions in order to evaluate QA systems and determine if they are useful to retrieve medical information. In addition, they analyze and evaluate the results obtained, and identify the source or sources used by the systems and their procedure (Olvera-Lobo & Gutiérrez-Artacho, 2010, 2011)
DCU and ISI@INEX 2010: Ad-hoc and data-centric tracks
We describe the participation of Dublin City University (DCU)and the Indian Statistical Institute (ISI) in INEX 2010. The main contributions of this paper are: i) a simplified version of Hierarchical Language Model (HLM) which involves scoring XML elements with a combined probability of generating the given query from itself and the top level article node, is shown to outperform the baselines of Language Model (LM) and Vector Space Model (VSM) scoring of XML elements; ii) the Expectation Maximization (EM) feedback in LM is shown to be the most effective on the domain specic collection of IMDB; iii) automated removal of sentences indicating aspects of irrelevance from the narratives
of INEX ad-hoc topics is shown to improve retrieval eectiveness
A Hybrid Model for Document Retrieval Systems.
A methodology for the design of document retrieval systems is presented. First, a composite index term weighting model is developed based on term frequency statistics, including document frequency, relative frequency within document and relative frequency within collection, which can be adjusted by selecting various coefficients to fit into different indexing environments. Then, a composite retrieval model is proposed to process a user\u27s information request in a weighted Phrase-Oriented Fixed-Level Expression (POFLE), which may apply more than Boolean operators, through two phases. That is, we have a search for documents which are topically relevant to the information request by means of a descriptor matching mechanism, which incorporate a partial matching facility based on a structurally-restricted relationship imposed by indexing model, and is more general than matching functions of the traditional Boolean model and vector space model, and then we have a ranking of these topically relevant documents, by means of two types of heuristic-based selection rules and a knowledge-based evaluation function, in descending order of a preference score which predicts the combined effect of user preference for quality, recency, fitness and reachability of documents
Robust question answering
A Question Answering (QA) system should provide a short and precise answer to a question in natural language, by searching a large knowledge base consisting of natural language text. The sources of the
knowledge base are widely available, for written natural language text is a preferential form of human communication. The information ranges from the more traditional edited texts, for example encyclopaedias or newspaper articles, to text obtained by modern automatic processes, as automatic speech recognizers.
The work developed in the present thesis focuses on the Portuguese language and open domain question answering, meaning that neither the questions nor the texts are restricted to a specific area, and it aims to
address both types of written text. Since information retrieval is essential for a QA system, a careful analysis of the current state-of-the-art in information retrieval and question answering components was conducted.
A complete, efficient and robust question answering system is developed in this thesis, consisting of new modules for information retrieval and question answering, that is competitive with current QA systems. The
system was evaluated at the Portuguese monolingual task of QA@CLEF 2008 and achieved the 3rd place in 6 Portuguese participants and 5th place among the 21 participants of 11 languages.
The system was also tested in Question Answering over Speech Transcripts (QAST), but outside the official evaluation QAST of QA@CLEF, since Portuguese was not among the available languages for this task. For
that reason, an entire test environment consisting of a corpus of transcribed broadcast news and a matching question set was built in the scope of this work, so that experiments could be made. The system proved to
be robust in the presence of automatically transcribed data, with results in line with the best reported at QAST.info:eu-repo/semantics/publishedVersio
MIR task and evaluation techniques
Existing tasks in MIREX have traditionally focused on low-level MIR tasks working with flat (usually DSP-only) ground-truth. These evaluation techniques, however, can not evaluate the increasing number of algorithms that utilize relational data and are not currently utilizing the state of the art in evaluating ranked or ordered output. This paper summarizes the state of the art in evaluating relational ground-truth. These components are then synthesized into novel evaluation techniques that are then applied to 14 concrete music document retrieval tasks, demonstrating how these evaluation techniques can be applied in a practical context
- …