12 research outputs found

    Question Answering System for Yioop

    Get PDF
    Yioop is an open source search engine developed and managed by Dr. Christopher Pollett. Currently, Yioop returns the search results of the query in the form of list of URLs, just like other search engines (Google, Bing, DuckDuckGo, etc.) This paper created a new module for Yioop. This new module, known as the Question-Answering (QA) System, takes the search queries in the form of natural language questions and returns results in the form of a short answer that is appropriate to the question asked. This feature is achieved by implementing various functionalities of Natural Language Processing (NLP). By using NLP, the new Question-Answering (QA) System attempts to extract the necessary information from the query provided by the user and provides an appropriate answer from the available data

    Finding answers to questions, in text collections or web, in open domain or specialty domains

    Get PDF
    International audienceThis chapter is dedicated to factual question answering, i.e. extracting precise and exact answers to question given in natural language from texts. A question in natural language gives more information than a bag of word query (i.e. a query made of a list of words), and provides clues for finding precise answers. We will first focus on the presentation of the underlying problems mainly due to the existence of linguistic variations between questions and their answerable pieces of texts for selecting relevant passages and extracting reliable answers. We will first present how to answer factual question in open domain. We will also present answering questions in specialty domain as it requires dealing with semi-structured knowledge and specialized terminologies, and can lead to different applications, as information management in corporations for example. Searching answers on the Web constitutes another application frame and introduces specificities linked to Web redundancy or collaborative usage. Besides, the Web is also multilingual, and a challenging problem consists in searching answers in target language documents other than the source language of the question. For all these topics, we present main approaches and the remaining problems

    Answer Re-ranking with bilingual LDA and social QA forum corpus

    Get PDF
    One of the most important tasks for AI is to find valuable information from the Web. In this research, we develop a question answering system for retrieving answers based on a topic model, bilingual latent Dirichlet allocation (Bi-LDA), and knowledge from social question answering (SQA) forum, such as Yahoo! Answers. Regarding question and answer pairs from a SQA forum as a bilingual corpus, a shared topic over question and answer documents is assigned to each term so that the answer re-ranking system can infer the correlation of terms between questions and answers. A query expansion approach based on the topic model obtains a 9% higher top-150 mean reciprocal rank (MRR@150) and a 16% better geometric mean rank as compared to a simple matching system via Okapi/BM25. In addition, this thesis compares the performance in several experimental settings to clarify the factor of the result

    Integrating Web-based and corpus-based techniques for question answering

    No full text
    MIT CSAIL's entry in this year's TREC Question Answering track focused on integrating Web-based techniques with more traditional strategies based on document retrieval and named-entity detection. We believe that achieving high performance in the question answering task requires a combination of multiple strategies designed to capitalize on di#erent characteristics of various resources. The system we deployed for the TREC evaluation last year relied exclusively on the World Wide Web to answer factoid questions (Lin et al., 2002). The advantages that the Web o#ers are well known and have been exploited by previous systems (Brill et al., 2001; Clarke et al., 2001; Dumais et al., 2002). The immense amount of freely available unstructured text provides data redundancy, which can be leveraged with simple pattern matching techniques involving the expected answer formulations. In many ways, we can utilize huge quantities of data to overcome many thorny problems in natural langu

    Improvements to the complex question answering models

    Get PDF
    x, 128 leaves : ill. ; 29 cmIn recent years the amount of information on the web has increased dramatically. As a result, it has become a challenge for the researchers to find effective ways that can help us query and extract meaning from these large repositories. Standard document search engines try to address the problem by presenting the users a ranked list of relevant documents. In most cases, this is not enough as the end-user has to go through the entire document to find out the answer he is looking for. Question answering, which is the retrieving of answers to natural language questions from a document collection, tries to remove the onus on the end-user by providing direct access to relevant information. This thesis is concerned with open-domain complex question answering. Unlike simple questions, complex questions cannot be answered easily as they often require inferencing and synthesizing information from multiple documents. Hence, we considered the task of complex question answering as query-focused multi-document summarization. In this thesis, to improve complex question answering we experimented with both empirical and machine learning approaches. We extracted several features of different types (i.e. lexical, lexical semantic, syntactic and semantic) for each of the sentences in the document collection in order to measure its relevancy to the user query. We have formulated the task of complex question answering using reinforcement framework, which to our best knowledge has not been applied for this task before and has the potential to improve itself by fine-tuning the feature weights from user feedback. We have also used unsupervised machine learning techniques (random walk, manifold ranking) and augmented semantic and syntactic information to improve them. Finally we experimented with question decomposition where instead of trying to find the answer of the complex question directly, we decomposed the complex question into a set of simple questions and synthesized the answers to get our final result

    Answer extraction for simple and complex questions

    Get PDF
    xi, 214 leaves : ill. (some col.) ; 29 cm. --When a user is served with a ranked list of relevant documents by the standard document search engines, his search task is usually not over. He has to go through the entire document contents to find the precise piece of information he was looking for. Question answering, which is the retrieving of answers to natural language questions from a document collection, tries to remove the onus on the end-user by providing direct access to relevant information. This thesis is concerned with open-domain question answering. We have considered both simple and complex questions. Simple questions (i.e. factoid and list) are easier to answer than questions that have complex information needs and require inferencing and synthesizing information from multiple documents. Our question answering system for simple questions is based on question classification and document tagging. Question classification extracts useful information (i.e. answer type) about how to answer the question and document tagging extracts useful information from the documents, which is used in finding the answer to the question. For complex questions, we experimented with both empirical and machine learning approaches. We extracted several features of different types (i.e. lexical, lexical semantic, syntactic and semantic) for each of the sentences in the document collection in order to measure its relevancy to the user query. One hill climbing local search strategy is used to fine-tune the feature-weights. We also experimented with two unsupervised machine learning techniques: k-means and Expectation Maximization (EM) algorithms and evaluated their performance. For all these methods, we have shown the effects of different kinds of features

    Learning of a multilingual bitaxonomy of Wikipedia and its application to semantic predicates

    Get PDF
    The ability to extract hypernymy information on a large scale is becoming increasingly important in natural language processing, an area of the artificial intelligence which deals with the processing and understanding of natural language. While initial studies extracted this type of information from textual corpora by means of lexico-syntactic patterns, over time researchers moved to alternative, more structured sources of knowledge, such as Wikipedia. After the first attempts to extract is-a information fromWikipedia categories, a full line of research gave birth to numerous knowledge bases containing information which, however, is either incomplete or irremediably bound to English. To this end we put forward MultiWiBi, the first approach to the construction of a multilingual bitaxonomy which exploits the inner connection between Wikipedia pages and Wikipedia categories to induce a wide-coverage and fine-grained integrated taxonomy. A series of experiments show state-of-the-art results against all the available taxonomic resources available in the literature, also with respect to two novel measures of comparison. Another dimension where existing resources usually fall short is their degree of multilingualism. While knowledge is typically language agnostic, currently resources are able to extract relevant information only in languages providing highquality tools. In contrast, MultiWiBi does not leave any language behind: we show how to taxonomize Wikipedia in an arbitrary language and in a way that is fully independent of additional resources. At the core of our approach lies, in fact, the idea that the English version of Wikipedia can be linguistically exploited as a pivot to project the taxonomic information extracted from English to any other Wikipedia language in order to have a bitaxonomy in a second, arbitrary language; as a result, not only concepts which have an English equivalent are covered, but also those concepts which are not lexicalized in the source language. We also present the impact of having the taxonomized encyclopedic knowledge offered by MultiWiBi embedded into a semantic model of predicates (SPred) which crucially leverages Wikipedia to generalize collections of related noun phrases to infer a probability distribution over expected semantic classes. We applied SPred to a word sense disambiguation task and show that, when MultiWiBi is plugged in to replace an internal component, SPred’s generalization power increases as well as its precision and recall. Finally, we also published MultiWiBi as linked data, a paradigm which fosters interoperability and interconnection among resources and tools through the publication of data on the Web, and developed a public interface which lets the users navigate through MultiWiBi’s taxonomic structure in a graphical, captivating manner
    corecore