36 research outputs found

    Wikipedia-Based Semantic Enhancements for Information Nugget Retrieval

    When the objective of an information retrieval task is to return a nugget rather than a document, query terms that exist in a document often will not be used in the most relevant nugget in the document for the query. In this thesis, a new method of query expansion is proposed based on the Wikipedia link structure surrounding the most relevant articles selected either automatically or by human assessors for the query. Evaluated with the Nuggeteer automatic scoring software, which we show to have a high correlation with human assessor scores for the ciQA 2006 topics, an increase in the F-scores is found from the TREC Complex Interactive Question Answering task when integrating this expansion into an already high-performing baseline system. In addition, the method for finding synonyms using Wikipedia is evaluated using more common synonym detection tasks.

    Formulating Complex Queries Using Templates

    While many users have relatively general information needs, users who are familiar with a certain topic may have more specific or complex information needs. Such users already have some knowledge of a subject and its concepts, and they need to find information on a specific aspect of a certain entity, such as its cause, its effect, or its relationships with other entities. To help resolve this kind of complex information need, we investigated the effectiveness of topic-independent query templates as a tool for assisting users in articulating their information needs. A set of query templates, written in fill-in-the-blank form, was designed to represent general semantic relationships between concepts, such as cause-effect and problem-solution. To conduct the research, we designed a control interface with a single query textbox and an experimental interface with the query templates. A user study was performed with 30 users. The Okapi information retrieval system was used to retrieve documents in response to the users’ queries. The analysis in this paper indicates that while users found the template-based query formulation less easy to use, the queries written using templates performed better than the queries written using the control interface with one query textbox. Our analysis of a group of users and some specific topics demonstrates that the experimental interface tended to help users create more detailed search queries, and that the users were able to think about different aspects of their complex information needs and fill in many templates. In the future, an interesting research direction would be to tune the templates, adapting them to users’ specific query requests and avoiding showing non-relevant templates to users by automatically selecting related templates from a larger set of templates.

    Take your time first, time your search later: How college students perceive time in Web searching

    This study explores people's perception of time during their Web searches. Time is a major component of the context for information behavior, but in empirical studies it has been implied rather than investigated explicitly. The data were collected from Web search experiments in which participants were asked to conduct searches on three given tasks under differing search time conditions. The paper reports on findings drawn primarily from the exit interviews of 45 undergraduate and graduate students on their perception of time in Web searching. Study results indicate that at the beginning of their searching activity, participants did not explicitly consider temporal issues. However, these issues usually surface with the passage of time, especially when searches fail to go as planned. Perception of time is closely entangled with familiarity and difficulty of search task. In general, participants enjoyed spending time in searching and were not excessively concerned about time constraints. On the other hand, participants' affective experiences were sometimes caused by temporal issues. In conclusion, the study results indicate that temporal issues interlace with other contextual and affective factors in the process of Web searching. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/78317/1/1450460253_ftp.pd

    Similarity measures and diversity rankings for query-focused sentence extraction

    Query-focused sentence extraction generally refers to an extractive approach to select a set of sentences that responds to a specific information need. It is one of the major approaches employed in multi-document summarization, focused summarization, and complex question answering. The major advantage of most extractive methods over natural language processing (NLP) intensive methods is that they are relatively simple, theoretically sound (drawing upon several supervised and unsupervised learning techniques), and often produce equally strong empirical performance. Many research areas, including information retrieval and text mining, have recently moved toward extractive query-focused sentence generation, as its outputs have great potential to support everyday information seeking activities. In particular, as more information has been created and stored online, extractive summarization systems may quickly utilize several ubiquitous resources, such as Google search results and social media, to extract summaries that answer users' queries. This thesis explores how the performance of sentence extraction tasks can be improved to create higher quality outputs. Specifically, two major areas are investigated. First, we examine the issue of natural language variation, which affects the similarity judgment of sentences. As sentences are much shorter than documents, they generally contain fewer occurring words. Moreover, the similarity notions of sentences differ from those of documents, as sentences tend to be very specific in meaning. Thus many document-level similarity measures are unlikely to perform well at this level. In this work, we address these issues in two application domains. First, we present a hybrid method, utilizing both unsupervised and supervised techniques, to compute the similarity of interrogative sentences for factoid question reuse. Next, we propose a novel structural similarity measure based on sentence semantics for paraphrase identification and textual entailment recognition tasks. The empirical evaluations suggest the effectiveness of the proposed methods in improving the accuracy of sentence similarity judgments. Furthermore, we examine the effects of the proposed similarity measure in two specific sentence extraction tasks, focused summarization and complex question answering. In conjunction with the proposed similarity measure, we also explore the issues of novelty, redundancy, and diversity in sentence extraction. To that end, we present a novel approach to promote diversity of extracted sets of sentences based on the negative endorsement principle. Negative-signed edges are employed to represent a redundancy relation between sentence nodes in graphs. Then, sentences are reranked according to the long-term negative endorsements from a random walk. Additionally, we propose a unified centrality ranking and diversity ranking based on the aforementioned principle. The results from a comprehensive evaluation confirm that the proposed methods perform competitively compared to many state-of-the-art methods. Ph.D., Information Science -- Drexel University, 201

    Finding Answers to Definition Questions Using Web Knowledge Bases

    PACLIC 23 / City University of Hong Kong / 3-5 December 200

    Information fusion for automated question answering

    Until recently, research efforts in automated Question Answering (QA) have mainly focused on getting a good understanding of questions to retrieve correct answers. This includes deep parsing, lookups in ontologies, question typing and machine learning of answer patterns appropriate to question forms. In contrast, I have focused on the analysis of the relationships between answer candidates as provided in open domain QA on multiple documents. I argue that such candidates have intrinsic properties, partly regardless of the question, and those properties can be exploited to provide better quality and more user-oriented answers in QA. Information fusion refers to the technique of merging pieces of information from different sources. In QA over free text, it is motivated by the frequency with which different answer candidates are found in different locations, leading to a multiplicity of answers. The reason for such multiplicity is, in part, the massive amount of data used for answering, and also its unstructured and heterogeneous content: besides ambiguities in user questions leading to heterogeneity in extractions, systems have to deal with redundancy, granularity and possible contradictory information. Hence the need for answer candidate comparison. While frequency has proved to be a significant characteristic of a correct answer, I evaluate the value of other relationships characterizing answer variability and redundancy. Partially inspired by recent developments in multi-document summarization, I redefine the concept of "answer" within an engineering approach to QA based on the Model-View-Controller (MVC) pattern of user interface design. An "answer model" is a directed graph in which nodes correspond to entities projected from extractions and edges convey relationships between such nodes. The graph represents the fusion of information contained in the set of extractions. Different views of the answer model can be produced, capturing the fact that the same answer can be expressed and presented in various ways: picture, video, sound, written or spoken language, or a formal data structure. Within this framework, an answer is a structured object contained in the model and retrieved by a strategy to build a particular view depending on the end user's (or task's) requirements. I describe shallow techniques to compare entities and enrich the model by discovering four broad categories of relationships between entities in the model: equivalence, inclusion, aggregation and alternative. Quantitatively, answer candidate modeling improves answer extraction accuracy. It also proves to be more robust to incorrect answer candidates than traditional techniques. Qualitatively, models provide meta-information encoded by relationships that allow shallow reasoning to help organize and generate the final output.
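    The "answer model" described in this abstract (a directed graph whose nodes are entities and whose edges carry one of four relationship categories) can be pictured with a minimal Python sketch. This is an illustration only, not the thesis's implementation: the class name `AnswerModel`, its methods, the use of extraction counts as node support, and the idea of pooling support across equivalent nodes are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Relation(Enum):
    # The four relationship categories named in the abstract.
    EQUIVALENCE = "equivalence"
    INCLUSION = "inclusion"
    AGGREGATION = "aggregation"
    ALTERNATIVE = "alternative"


@dataclass
class AnswerModel:
    """Hypothetical answer model: entity nodes plus typed directed edges."""
    support: dict = field(default_factory=dict)  # entity -> extraction count
    edges: dict = field(default_factory=dict)    # (source, target) -> Relation

    def add_extraction(self, entity):
        # Each extraction of an entity raises that node's support by one.
        self.support[entity] = self.support.get(entity, 0) + 1

    def relate(self, source, target, relation):
        # Make sure both endpoints exist as nodes, then record the edge.
        self.support.setdefault(source, 0)
        self.support.setdefault(target, 0)
        self.edges[(source, target)] = relation

    def fused_support(self):
        # Assumed fusion rule: pool support over nodes linked by
        # EQUIVALENCE edges, using a small union-find.
        parent = {entity: entity for entity in self.support}

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        for (source, target), relation in self.edges.items():
            if relation is Relation.EQUIVALENCE:
                parent[find(source)] = find(target)

        totals = {}
        for entity, count in self.support.items():
            root = find(entity)
            totals[root] = totals.get(root, 0) + count
        return totals


model = AnswerModel()
for candidate in ["JFK", "JFK", "John F. Kennedy", "Lyndon Johnson"]:
    model.add_extraction(candidate)
model.relate("JFK", "John F. Kennedy", Relation.EQUIVALENCE)
model.relate("John F. Kennedy", "Lyndon Johnson", Relation.ALTERNATIVE)
fused = model.fused_support()
```

    In this toy example the two surface forms of the same entity are merged, so their combined frequency outweighs the alternative candidate; the other three relationship types are stored but not used by the assumed fusion rule.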