212 research outputs found

    Web Query Reformulation via Joint Modeling of Latent Topic Dependency and Term Context

    An important way to improve users’ satisfaction in Web search is to help them issue more effective queries. One such approach is query reformulation, which generates new queries from the current query issued by the user. A common procedure is to generate candidate queries first and then apply a scoring method to assess them. Most existing methods are context based: they rely heavily on the context relations of terms in past queries and cannot detect or maintain the semantic consistency of queries. In this article, we propose a graphical model to score queries. The proposed model exploits a latent topic space, automatically derived from the query log, to detect the semantic dependency of terms in a query and the dependency among topics. The graphical model also captures the term context in the query history through skip-bigram and n-gram language models. In addition, our model can easily be extended to consider users’ past search interests when reformulating queries for different users. For candidate query generation, we investigate a social tagging data resource, Delicious bookmarks, to generate addition and substitution patterns that supplement the patterns generated from query log data.
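The term-context part of the abstract above can be illustrated with a small sketch. This is not the authors' graphical model: it omits the latent topic space entirely and only shows how skip-bigram statistics from a query log might score candidate reformulations. The toy query log, the function names (`skip_bigrams`, `train`, `score`), the `max_skip` window, and the add-one smoothing are all assumptions made for illustration.

```python
from collections import Counter
import math


def skip_bigrams(terms, max_skip=2):
    """Return ordered term pairs separated by at most max_skip intervening terms."""
    pairs = []
    for i, left in enumerate(terms):
        for j in range(i + 1, min(i + 2 + max_skip, len(terms))):
            pairs.append((left, terms[j]))
    return pairs


def train(query_log, max_skip=2):
    """Count skip-bigrams over a (hypothetical) list of past queries."""
    pair_counts = Counter()
    left_counts = Counter()
    vocab = set()
    for query in query_log:
        terms = query.lower().split()
        vocab.update(terms)
        for left, right in skip_bigrams(terms, max_skip):
            pair_counts[(left, right)] += 1
            left_counts[left] += 1
    return pair_counts, left_counts, len(vocab)


def score(query, model, max_skip=2):
    """Average add-one-smoothed log probability of the query's skip-bigrams."""
    pair_counts, left_counts, vocab_size = model
    pairs = skip_bigrams(query.lower().split(), max_skip)
    if not pairs:
        return 0.0
    total = 0.0
    for left, right in pairs:
        prob = (pair_counts[(left, right)] + 1) / (left_counts[left] + vocab_size)
        total += math.log(prob)
    return total / len(pairs)


# Toy query log, invented for this example.
log = [
    "cheap flights to paris",
    "cheap hotels in paris",
    "flights to london",
    "cheap flights london",
]
model = train(log)
candidates = ["cheap flights paris", "paris flights cheap"]
ranked = sorted(candidates, key=lambda q: score(q, model), reverse=True)
```

With this log, the candidate whose term order matches the history ranks first. A model of the kind the abstract describes would combine such a term-context score with topic-level evidence.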

    Information retrieval (Part I): Introduction


    Bubble World - A Novel Visual Information Retrieval Technique

    With the tremendous growth of published electronic information sources in the last decade, and the unprecedented reliance on this information to succeed in day-to-day operations, comes the expectation of finding the right information at the right time. Sentential interfaces are currently the only viable solution for searching through large infospheres of unstructured information; however, the simplistic nature of their interaction model and the lack of cognitive amplification they provide severely limit the performance of the interface. Visual information retrieval systems are emerging as possible replacements for the more traditional interfaces, but many lack the cognitive framework to support the knowledge crystallization process found to be essential in information retrieval. This work introduces a novel visual information retrieval technique crafted from two distinct design genres: (1) the cognitive strategies the human mind uses to solve problems and (2) observed interaction patterns with existing information retrieval systems. Based on the cognitive and interaction framework developed in this research, a functional prototype information retrieval system, called Bubble World, has been created to demonstrate that significant performance gains can be achieved using this technique when compared to more traditional text-based interfaces. Bubble World does this by transforming the internal mental representation of the information retrieval problem into an efficient external view and then, through visual cues, providing cognitive amplification at key stages of the information retrieval process. Additionally, Bubble World provides the interaction model and the mechanisms to incorporate complex search schemas into the retrieval process, either manually or automatically through the use of predefined ontological models.

    Formal concept matching and reinforcement learning in adaptive information retrieval

    The superiority of the human brain in information retrieval (IR) tasks seems to come firstly from its ability to read and understand the concepts, ideas or meanings central to documents, in order to reason out the usefulness of documents to information needs, and secondly from its ability to learn from experience and be adaptive to the environment. In this work we attempt to incorporate these properties into the development of an IR model to improve document retrieval. We investigate the applicability of concept lattices, which are based on the theory of Formal Concept Analysis (FCA), to the representation of documents. This allows the use of more elegant representation units, as opposed to keywords, in order to better capture concepts/ideas expressed in natural language text. We also investigate the use of a reinforcement learning strategy to learn and improve document representations, based on the information present in query statements and user relevance feedback. Features or concepts of each document/query, formulated using FCA, are weighted separately with respect to the documents they are in, and organised into separate concept lattices according to a subsumption relation. Furthermore, each concept lattice is encoded in a two-layer neural network structure known as a Bidirectional Associative Memory (BAM), for efficient manipulation of the concepts in the lattice representation. This avoids implementation drawbacks faced by other FCA-based approaches. Retrieval of a document for an information need is based on concept matching between concept lattice representations of a document and a query. The learning strategy works by making the similarity of relevant documents stronger and non-relevant documents weaker for each query, depending on the relevance judgements of the users on retrieved documents. 
Our approach is radically different to existing FCA-based approaches in the following respects: concept formulation; weight assignment to object-attribute pairs; the representation of each document in a separate concept lattice; and encoding concept lattices in BAM structures. Furthermore, in contrast to the traditional relevance feedback mechanism, our learning strategy makes use of relevance feedback information to enhance document representations, thus making the document representations dynamic and adaptive to the user interactions. The results obtained on the CISI, CACM and ASLIB Cranfield collections are presented and compared with published results. In particular, the performance of the system is shown to improve significantly as the system learns from experience. The School of Computing, University of Plymouth, UK
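For readers unfamiliar with Formal Concept Analysis, the following sketch shows what a formal concept is on a tiny document-term context. It is a generic textbook-style illustration, not the thesis's method: that work builds a separate weighted lattice per document and encodes it in a BAM, none of which is modelled here. The toy context and the function names (`common_attributes`, `common_objects`, `formal_concepts`) are assumptions, and the enumeration is deliberately naive (exponential in the number of attributes).

```python
from itertools import combinations


def common_attributes(objects, context):
    """Attributes shared by every object in the set (all attributes if the set is empty)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def common_objects(attrs, context):
    """Objects that have every attribute in attrs."""
    return {obj for obj, has in context.items() if attrs <= has}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every attribute subset."""
    all_attrs = sorted(set().union(*context.values()))
    concepts = set()
    for size in range(len(all_attrs) + 1):
        for subset in combinations(all_attrs, size):
            extent = common_objects(set(subset), context)
            intent = common_attributes(extent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


# Toy context: documents (objects) and the index terms (attributes) they contain.
context = {
    "d1": {"retrieval", "query"},
    "d2": {"retrieval", "lattice"},
    "d3": {"retrieval", "query", "feedback"},
}
concepts = formal_concepts(context)
```

Each resulting pair is a maximal set of documents together with exactly the terms they all share. Ordering the pairs by inclusion of their extents gives the concept lattice, which is the subsumption relation the abstract refers to.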

    Term selection in information retrieval

    Systems trained on linguistically annotated data achieve strong performance for many language processing tasks. This encourages the idea that annotations can improve any language processing task if applied in the right way. However, despite widespread acceptance and availability of highly accurate parsing software, it is not clear that ad hoc information retrieval (IR) techniques using annotated documents and requests consistently improve search performance compared to techniques that use no linguistic knowledge. In many cases, retrieval gains made using language processing components, such as part-of-speech tagging and head-dependent relations, are offset by significant negative effects. This results in a minimal positive, or even negative, overall impact for linguistically motivated approaches compared to approaches that do not use any syntactic or domain knowledge. In some cases, it may be that syntax does not reveal anything of practical importance about document relevance. Yet without a convincing explanation for why linguistic annotations fail in IR, the intuitive appeal of search systems that ‘understand’ text can result in the repeated application, and mis-application, of language processing to enhance search performance. This dissertation investigates whether linguistics can improve the selection of query terms by better modelling the alignment process between natural language requests and search queries. It is the most comprehensive work on the utility of linguistic methods in IR to date. Term selection in this work focuses on identification of informative query terms of 1-3 words that both represent the semantics of a request and discriminate between relevant and non-relevant documents. Approaches to word association are discussed with respect to linguistic principles, and evaluated with respect to semantic characterization and discriminative ability. 
Analysis is organised around three theories of language that emphasize different structures for the identification of terms: phrase structure theory, dependency theory and lexicalism. The structures identified by these theories play distinctive roles in the organisation of language. Evidence is presented regarding the value of different methods of word association based on these structures, and the effect of method and term combinations. Two highly effective, novel methods for the selection of terms from verbose queries are also proposed and evaluated. The first method focuses on the semantic phenomenon of ellipsis with a discriminative filter that leverages diverse text features. The second method exploits a term ranking algorithm, PhRank, that uses no linguistic information and relies on a network model of query context. The latter focuses queries so that 1-5 terms in an unweighted model achieve better retrieval effectiveness than weighted IR models that use up to 30 terms. In addition, unlike models that use a weighted distribution of terms or subqueries, the concise terms identified by PhRank are interpretable by users. Evaluation with newswire and web collections demonstrates that PhRank-based query reformulation significantly improves the performance of verbose queries, by up to 14% compared to highly competitive IR models, and is at least as good for short keyword queries with the same models. Results illustrate that linguistic processing may help with the selection of word associations but does not necessarily translate into improved IR performance. Statistical methods are necessary to overcome the limits of syntactic parsing and word adjacency measures for ad hoc IR. As a result, probabilistic frameworks that discover, and make use of, many forms of linguistic evidence may deliver small improvements in IR effectiveness, but methods that use simple features can be substantially more efficient and equally, or more, effective. 
Various explanations for this finding are suggested, including the probabilistic nature of grammatical categories, a lack of homomorphism between syntax and semantics, the impact of lexical relations, variability in collection data, and systemic effects in language systems.

    Evolutionary techniques for automatic knowledge extraction (Técnicas evolutivas para la extracción automática de conocimiento)

    This line of research proposes the design, development and evaluation of automatic techniques for knowledge extraction that can cope with search within large information spaces. To this end, it first proposes solving a problem of general interest: automatic query reformulation. An automatic solution to this problem could be used in a variety of applications, such as monitoring a topic of interest, specifying thematic trackers over social networks, identifying entities and relations between entities in large document corpora, or collecting material for thematic portals. Given the characteristics of the problem (high dimensionality of the search space, lack of optimal substructure, the possibility of exploiting multiple solutions), evolutionary computation appears well suited to it. A first contribution of this line of research is to consider incorporating Boolean operators and other kinds of modifiers into the reformulated queries, together with diversity control, both intended as mechanisms for making queries more expressive and therefore better able to express the concepts of interest. The second contribution is to propose a suitable evaluation framework for the methodology developed, along with its study and comparison against other techniques. Finally, the last contribution addresses the application of the developed methods in specific domains such as bioinformatics (e.g. to identify interactions between biological entities) or social networks (e.g. to perform opinion mining by means of thematic trackers). Track: Agents and Intelligent Systems. Red de Universidades con Carreras en Informática (RedUNCI)