    A Comparative analysis: QA evaluation questions versus real-world queries

    This paper presents a comparative analysis of user queries to a web search engine, questions to a Q&A service (answers.com), and questions employed in question answering (QA) evaluations at TREC and CLEF. The analysis shows that user queries to search engines contain mostly content words (i.e. keywords) but lack structure words (i.e. stopwords) and capitalization. Thus, they resemble natural language input after case folding and stopword removal. In contrast, topics for QA evaluation and questions to answers.com mainly consist of fully capitalized and syntactically well-formed questions. Classification experiments using a naïve Bayes classifier show that stopwords play an important role in determining the expected answer type. A classification based on stopwords is considerably more accurate (47.5% accuracy) than a classification based on all query words (40.1% accuracy) or on content words (33.9% accuracy). To simulate user input, questions are preprocessed by case folding and stopword removal. Additional classification experiments aim at reconstructing the syntactic wh-word frame of a question, i.e. the embedding of the interrogative word. Results indicate that this part of questions can be reconstructed with moderate accuracy (25.7%), but for a classification problem with a much larger number of classes compared to classifying queries by expected answer type (2096 classes vs. 130 classes). Furthermore, eliminating stopwords can lead to multiple reconstructed questions with a different or with the opposite meaning (e.g. if negations or temporal restrictions are included). In conclusion, question reconstruction from short user queries can be seen as a new realistic evaluation challenge for QA systems.
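The stopword-based classification idea above can be sketched with a minimal naive Bayes classifier that keeps only stopwords as features. The stopword list, training questions, and answer-type labels below are invented toy examples, not the TREC/CLEF data used in the paper.

```python
from collections import Counter
import math

# Hypothetical stopword list for illustration.
STOPWORDS = {"who", "what", "when", "where", "how", "is", "the", "of",
             "did", "was", "in", "many", "much", "a"}

def stopword_features(question):
    """Keep only stopwords, mirroring the paper's stopword-only setup."""
    return [w for w in question.lower().split() if w in STOPWORDS]

class NaiveBayes:
    """Minimal multinomial naive Bayes with add-one smoothing."""
    def fit(self, docs, labels):
        self.classes = set(labels)
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for words, label in zip(docs, labels):
            self.counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, words):
        def log_prob(c):
            total = sum(self.counts[c].values()) + len(self.vocab)
            lp = math.log(self.prior[c] / sum(self.prior.values()))
            for w in words:
                lp += math.log((self.counts[c][w] + 1) / total)
            return lp
        return max(self.classes, key=log_prob)

# Toy training set of (question, expected answer type) pairs.
train = [
    ("who wrote hamlet", "PERSON"),
    ("who is the president of france", "PERSON"),
    ("when did the war end", "DATE"),
    ("when was the telephone invented", "DATE"),
    ("how many people live in tokyo", "NUMBER"),
    ("how much did the painting cost", "NUMBER"),
]
docs = [stopword_features(q) for q, _ in train]
labels = [c for _, c in train]
clf = NaiveBayes().fit(docs, labels)
```

Even with content words discarded, the remaining stopword pattern ("who …", "when did …", "how many …") carries enough signal to separate these toy answer types, which is the effect the paper measures at scale.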

    Rule Generation Based On Structural Clustering For Automatic Question Answering

    In rule-based methods for Question-Answering (QA) research, typical rule discovery techniques are based on structural pattern overlapping and lexical information. These usually result in rules that may require further interpretation and rules that may be redundant. To address these issues, an automatic structural rule generation algorithm is presented via clustering, where a center-sentence-based clustering method is designed to automatically generate rules for QA systems.
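The abstract does not spell out the clustering algorithm, so the following is a hedged sketch of the center-sentence idea under simplified assumptions: questions are grouped by a coarse structural signature (here, just the leading wh-word), and the sentence with the highest average word overlap with its group is taken as the cluster center from which a rule pattern could be generalized.

```python
from collections import defaultdict

def signature(sentence):
    """Crude structural key: the leading wh-word (an assumption, not
    the paper's actual structural representation)."""
    return sentence.lower().split()[0]

def overlap(a, b):
    """Jaccard word overlap between two sentences."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def cluster_and_center(sentences):
    """Group sentences by signature; pick the most central one per group."""
    clusters = defaultdict(list)
    for s in sentences:
        clusters[signature(s)].append(s)
    centers = {}
    for key, group in clusters.items():
        centers[key] = max(group,
                           key=lambda s: sum(overlap(s, t) for t in group))
    return clusters, centers

questions = [
    "who wrote the novel",
    "who painted the portrait",
    "where is the museum located",
    "where is the station located",
    "where is the library",
]
clusters, centers = cluster_and_center(questions)
```

A real system would use richer structural patterns (parse fragments, POS sequences) rather than the first token, but the center-selection step has the same shape.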

    The Web as a Resource for Question Answering: Perspectives and Challenges

    The vast amounts of information readily available on the World Wide Web can be effectively used for question answering in two fundamentally different ways. In the federated approach, techniques for handling semistructured data are applied to access Web sources as if they were databases, allowing large classes of common questions to be answered uniformly. In the distributed approach, large-scale text-processing techniques are used to extract answers directly from unstructured Web documents. Because the Web is orders of magnitude larger than any human-collected corpus, question answering systems can capitalize on its unparalleled levels of data redundancy. Analysis of real-world user questions reveals that the federated and distributed approaches complement each other nicely, suggesting a hybrid approach in future question answering systems.
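The data-redundancy argument can be illustrated with a small sketch: when the same fact appears in many Web snippets, simple frequency voting over candidate n-grams tends to surface the answer without any deep analysis. The snippets below are invented stand-ins; a real system would fetch them from a search engine.

```python
from collections import Counter
import re

def candidate_answers(snippets, question_words, n=2):
    """Count n-grams across snippets, ignoring words from the question,
    so that frequently repeated candidates accumulate votes."""
    votes = Counter()
    ignore = {w.lower() for w in question_words}
    for text in snippets:
        tokens = [t for t in re.findall(r"[a-z]+", text.lower())
                  if t not in ignore]
        for i in range(len(tokens) - n + 1):
            votes[" ".join(tokens[i:i + n])] += 1
    return votes

# Invented snippets standing in for search-engine results.
snippets = [
    "Mount Everest is the tallest mountain on Earth.",
    "The tallest mountain in the world is Mount Everest.",
    "Everest, the world's tallest mountain, sits in the Himalayas.",
]
question = "what is the tallest mountain".split()
votes = candidate_answers(snippets, question)
best, count = votes.most_common(1)[0]
```

Because "mount everest" recurs across independent snippets while noise bigrams do not, redundancy alone ranks it first; this is the leverage the distributed approach gets from the Web's size.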

    A rule-based question answering system on relevant documents of Indonesian Quran translation

    This paper presents the development of a question answering (QA) system that combines two different architectures, one based on retrieving relevant documents and the other on a rule-based method, both of which contribute to answer extraction. Based on the test results of previous research, each method can complement the other to improve system performance. The QA system is intended to gather information from the Indonesian Quran Translation. The new architecture first retrieves documents relevant to the query keywords and subsequently applies the rule-based method to those documents to extract answer candidates. Initial results indicate that the system is still limited by the retrieved relevant documents and delivers only 60% correct answers, which does not improve on the previous system that used the rule-based method alone.
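The two-stage architecture described above — keyword-based document retrieval followed by rule-based answer extraction — can be sketched as follows. The documents, rules, and question types here are invented placeholders, not the Indonesian Quran Translation data or the paper's actual rule set.

```python
import re

def relevant_documents(docs, keywords):
    """Stage 1: keep documents sharing at least one keyword with the query."""
    kws = {k.lower() for k in keywords}
    return [d for d in docs if kws & set(d.lower().split())]

# Stage 2: hand-written (hypothetical) extraction rules,
# as (question type, regex capturing the answer) pairs.
RULES = [
    ("who", re.compile(r"by ([A-Z][a-z]+)")),
    ("when", re.compile(r"in (\d{4})")),
]

def answer(question, docs):
    qwords = question.lower().split()
    for qtype, pattern in RULES:
        if qtype in qwords:
            for doc in relevant_documents(docs, qwords):
                match = pattern.search(doc)
                if match:
                    return match.group(1)
    return None

# Invented example corpus.
docs = [
    "The treaty was signed in 1648 after long negotiations.",
    "The letter was written by Amir during the journey.",
]
```

The sketch also shows the limitation the abstract reports: if stage 1 fails to retrieve the right document, the rules in stage 2 never see the answer, capping overall accuracy.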

    Linguistische Aufbereitung von Personendaten und Repräsentation durch ein Question Answering System

    Within the 'Wortschatz' project at Universität Leipzig, information about well-known public figures was collected and maintained using online magazines and other sources. A question answering system was built on top of these data that returns automatically generated answers to given questions. A basic prerequisite was the creation of a new database structure and the transformation of the existing data into a suitable format. Name categories such as first names, surnames, nicknames, etc., as well as ways of unifying person entries, had to be recognized. A method for automatic sentence generation was also developed, based on the processing of external data and a grammar of a predefined form. It turned out that the proposed system works largely language-independently and can be applied to more than just person data.
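The sentence-generation step can be sketched as template rendering over attribute-value records about a person. The field names, templates, and example record below are invented for illustration, not the Wortschatz project's actual schema or grammar.

```python
# Hypothetical grammar: one template per known attribute.
TEMPLATES = {
    "born": "{name} was born in {born}.",
    "profession": "{name} works as a {profession}.",
    "nickname": '{name} is also known as "{nickname}".',
}

def generate_sentences(record):
    """Render one sentence for each attribute present in the record."""
    sentences = []
    for field, template in TEMPLATES.items():
        if field in record:
            sentences.append(template.format(**record))
    return sentences

# Invented person record (attribute-value pairs, database-style).
person = {"name": "Anna Schmidt", "born": "Leipzig", "profession": "pianist"}
```

Because the templates are data, not code, swapping in a grammar for another language or another entity type (the language independence the abstract notes) only requires replacing the template table.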

    Open-domain surface-based question answering system

    This paper considers a surface-based question answering system for an open-domain solution. It analyzes the progress that has been made in this area so far and describes a methodology for answering questions using information retrieved from a very large collection of text. The proposed solution is based on indexing techniques and surface-based natural language processing that identify paragraphs from which an answer can be extracted. Although this approach would not solve all the problems associated with this task, the objective is to provide a solution that is feasible, achieves reasonable accuracy, and can return an answer within an acceptable time limit. Various techniques are discussed, including question analysis, question reformulation, term extraction, answer extraction, and other methods for answer pinpointing. Beyond this, further research in question answering is identified, especially in the area of handling answers that require reasoning.
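The paragraph-identification step described above can be sketched as a tiny inverted index that maps terms to paragraphs and ranks paragraphs by how many question terms they contain. The corpus and question are toy examples, and term-overlap scoring stands in for whatever ranking the paper actually uses.

```python
from collections import defaultdict

def build_index(paragraphs):
    """Inverted index: term -> set of paragraph ids."""
    index = defaultdict(set)
    for i, p in enumerate(paragraphs):
        for term in set(p.lower().split()):
            index[term].add(i)
    return index

def rank_paragraphs(index, question, top_k=1):
    """Score paragraphs by the number of question terms they contain."""
    scores = defaultdict(int)
    for term in set(question.lower().split()):
        for i in index.get(term, ()):
            scores[i] += 1
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy corpus of candidate paragraphs.
paragraphs = [
    "The Eiffel Tower was completed in 1889 in Paris.",
    "The Colosseum in Rome dates from the first century.",
    "Paris is the capital of France.",
]
index = build_index(paragraphs)
best = rank_paragraphs(index, "when was the eiffel tower completed")
```

Downstream answer extraction then only has to inspect the few top-ranked paragraphs rather than the whole collection, which is what keeps response time within an acceptable limit.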