    Finding Answers to Complex Questions

    In this chapter, we motivate one potential type of future QA system that handles questions more complex than simple factoid questions and provides answers together with their supporting context. Our approach is based on the issues we faced when developing and delivering a QA system to handle real-time questions in the domain of Reusable Launch Vehicles (RLVs), within the larger field of aerospace engineering. This particular domain, the actual users of the system, and the questions they asked all demanded a change in our question-answering strategy. First, the chapter presents background on the project that provided the context, along with a description of the system that was deployed. Next, it analyzes the questions users put to the system and discusses the implications that this analysis and the user evaluation study had for our design of a future QA system.

    Empirical selection of NLP-driven document representations for text categorization

    Text categorization is the task of assigning predefined labels to textual documents. Current research in this field has focused on word-based document representations, known as bag-of-words (BOW), combined with strong statistical learners. Few studies have explored more complex representations driven by Natural Language Processing (NLP) and based on phrases, proper names, and word senses, and none of them produced definitive results on the benefits of these features for text categorization problems. This dissertation extensively studies NLP-driven document representations captured at many different levels of language processing and shows that they improve text categorization. A methodology, called the Empirical Selection Methodology for NLP-driven document representations, was developed to select a document representation for each category in the categorization problem. A highly configurable software system was developed to create document representations and carry out experiments. The methodology was tested on two widely used text categorization evaluation datasets, and the results showed that statistical learners generalize better with the help of NLP-driven document representations.
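    The per-category selection idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's methodology: the representation names are hypothetical, adjacent word pairs stand in for the phrase features an NLP pipeline would produce, and the per-category validation scores (for example, F1 on held-out data) are assumed to have been computed already.

```python
from collections import Counter


def bow_features(text):
    """Bag-of-words: lowercase whitespace tokens with their counts."""
    return Counter(text.lower().split())


def bow_plus_phrase_features(text):
    """Words plus adjacent word pairs.

    Hypothetical stand-in for an NLP-driven phrase representation; a real
    pipeline would take phrases from a chunker or parser, not raw bigrams.
    """
    tokens = text.lower().split()
    features = Counter(tokens)
    features.update(f"{a}_{b}" for a, b in zip(tokens, tokens[1:]))
    return features


# Candidate representations the selection step chooses among (assumed names).
REPRESENTATIONS = {
    "bow": bow_features,
    "bow+phrases": bow_plus_phrase_features,
}


def select_per_category(validation_scores):
    """Pick, for each category, the representation with the best validation score.

    validation_scores maps category -> {representation name -> score}.
    max() returns the first maximal entry, so ties go to the first name listed.
    """
    return {
        category: max(scores, key=scores.get)
        for category, scores in validation_scores.items()
    }
```

    Once a representation is chosen for a category, that category's binary classifier would be trained on the features produced by `REPRESENTATIONS[choice]`, so different categories can end up using different representations.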


    In this study, automatic indexing and retrieval experiments were carried out on a TREC-like corpus prepared for Turkish, using Lemur, an open-source toolkit designed for information retrieval research. Three retrieval models supported by Lemur, most notably the language modeling approach to information retrieval, were investigated together with language-specific preprocessing techniques. Our experiments showed that language-specific preprocessing improved retrieval performance for all retrieval models. In addition, the best performance for Turkish was obtained with the language modeling approach.
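    To make the language modeling approach concrete, the sketch below ranks documents by query likelihood with Dirichlet smoothing, one common instance of that approach. It is written from scratch and does not use Lemur's API; the smoothing method, the value of mu, and the fixed-prefix stemmer used as "language-specific preprocessing" are all assumptions rather than the study's settings.

```python
import math
from collections import Counter


def preprocess(text, prefix_len=5):
    """Lowercase, split on whitespace, and truncate each token to a fixed prefix.

    Fixed-prefix truncation is one simple stemming heuristic for Turkish; the
    prefix length of 5 is an assumption here, not a setting from the study.
    Note: str.lower() maps "I" to "i", not to the Turkish dotless "ı".
    """
    return [token[:prefix_len] for token in text.lower().split()]


def rank_query_likelihood(query, documents, mu=2000.0):
    """Rank documents by query likelihood with Dirichlet smoothing.

    score(Q, D) = sum over q in Q of log((tf(q, D) + mu * P(q | C)) / (|D| + mu))
    Returns document indices, best first.
    """
    doc_tokens = [preprocess(doc) for doc in documents]
    collection = Counter()
    for tokens in doc_tokens:
        collection.update(tokens)
    collection_len = sum(collection.values())
    if collection_len == 0:
        return list(range(len(documents)))  # nothing to score against

    scores = []
    for tokens in doc_tokens:
        tf = Counter(tokens)
        score = 0.0
        for term in preprocess(query):
            p_collection = collection[term] / collection_len
            if p_collection == 0.0:
                continue  # term never occurs in the collection: skip it
            score += math.log((tf[term] + mu * p_collection) / (len(tokens) + mu))
        scores.append(score)
    # sorted() is stable, so tied documents keep their original order
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
```

    With prefix truncation, inflected forms such as "kitaplar" and "kitaplarım" both reduce to "kitap", which is the kind of match that language-specific preprocessing is meant to recover in an agglutinative language.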

    Leveraging one-class SVM and semantic analysis to detect anomalous content

    Experiments were conducted to test several hypotheses about methods for improving document classification for the malicious insider threat problem within the Intelligence Community (IC). Bag-of-words (BOW) representations of documents were compared to Natural Language Processing (NLP) based representations in both standard and one-class classification problems, using the Support Vector Machine (SVM) algorithm. Results show that the NLP features significantly improved classifier performance over the BOW approach in terms of both precision and recall, while using far fewer features. The one-class algorithm using NLP features proved robust when tested on new domains.
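    The one-class setting can be sketched with scikit-learn, assuming it is installed. The model is trained only on documents considered normal for an analyst's profile and flags anything outside it. This sketch uses TF-IDF weighted BOW features only; the NLP-based features from the study are not reproduced, and the linear kernel and the value of nu are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import OneClassSVM


def fit_one_class_model(normal_documents, nu=0.1):
    """Fit a one-class SVM on documents assumed to be in scope for an analyst.

    Features are TF-IDF weighted bag-of-words. nu (an upper bound on the
    fraction of training documents treated as outliers) is an assumed value.
    """
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(normal_documents)
    model = OneClassSVM(kernel="linear", nu=nu)
    model.fit(X)
    return vectorizer, model


def flag_anomalies(vectorizer, model, documents):
    """Return True for documents the model places outside the learned profile."""
    predictions = model.predict(vectorizer.transform(documents))  # +1 inlier, -1 outlier
    return [bool(p == -1) for p in predictions]
```

    Only in-profile documents are needed for training, which suits the insider threat setting, where examples of malicious access are scarce.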

    MetaExtract: an NLP system to automatically assign metadata

    We have developed MetaExtract, a system that automatically assigns Dublin Core + GEM metadata using extraction techniques from our natural language processing research. MetaExtract consists of three distinct processes: the eQuery and HTML-based Extraction modules and a Keyword Generator module. We conducted a Web-based survey in which users evaluated the quality of each metadata element. Only two of the elements, Title an
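    The following is a rough sketch of what an HTML-based extraction step and a frequency-based keyword generator might look like, using only the Python standard library. It is not MetaExtract's implementation: the parsing rules, the small stopword list, and the mapping onto Dublin Core-style fields (title, description, subject) are hypothetical, and the eQuery module is not modeled.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Small illustrative stopword list; a real keyword generator would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"}


class PageParser(HTMLParser):
    """Collects the <title>, named <meta> tags, and visible body text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.text = []
        self._skip = None  # tag whose contents are excluded from body text

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "script", "style"):
            self._skip = tag
        elif tag == "meta":
            attributes = dict(attrs)
            if attributes.get("name") and attributes.get("content"):
                self.meta[attributes["name"].lower()] = attributes["content"]

    def handle_endtag(self, tag):
        if tag == self._skip:
            self._skip = None

    def handle_data(self, data):
        if self._skip == "title":
            self.title += data
        elif self._skip is None:
            self.text.append(data)


def extract_metadata(html, num_keywords=5):
    """Map a page to a few Dublin Core-style fields (hypothetical field mapping)."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.text).lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return {
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
        "subject": [word for word, _ in counts.most_common(num_keywords)],
    }
```

    Structural cues such as `<title>` and `<meta>` give high-precision values when a page provides them, while the keyword generator falls back on the body text when it does not.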

    Improved Document Representation for Classification Tasks For The Intelligence Community

    h monitors and detects anomalies in social behavior, and Composite, Role-based Monitoring, which analyzes insider activity based on organizational, application, and operating system roles (DelZoppo et al., 2004). According to Subject Matter Experts (SMEs) from the IC, analysts operate within a mission-based context, focused mainly on specific topics of interest (TOIs) and geo-political areas of interest (AOIs) that are assigned based on their expertise and experience. The information that analysts access or produce ranges from news articles to analyst reports, official documents, email communications, query logs, etc., and the role and task assigned to an analyst dictate their TOIs/AOIs, communication patterns, the intelligence products and information systems they need, and the intelligence work products they create. Within this mission-focused context, our hypothesis is that NLP-based semantic analysis of text, combined with ML-based text categorization of features pro

    What Do You Mean? Finding Answers to Complex Questions

    This paper describes ongoing research on, and issues faced in, handling real-time questions in the domain of Reusable Launch Vehicles (aerospace engineering). The question-answering system described in this paper is used in a collaborative learning environment with real users and live questions. The paper presents an analysis of these more complex questions, as well as research on including the user in the question-answering process by implementing a question negotiation module based on the traditional reference interview.
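    A question negotiation step can be pictured as a clarification turn before retrieval, in the spirit of a reference interview. The toy sketch below assumes a hand-built ambiguity lexicon; the terms, their readings, and the prompt wording are hypothetical and are not taken from the paper's module.

```python
import re

# Hypothetical ambiguity lexicon: term -> candidate readings (illustrative only).
AMBIGUOUS_TERMS = {
    "cost": ["development cost", "per-flight operating cost"],
    "engine": ["main engine", "orbital maneuvering engine"],
}


def negotiate(question, lexicon=AMBIGUOUS_TERMS):
    """Return clarifying questions for ambiguous terms, in order of appearance.

    An empty list means no clarification is needed and the question can go
    straight to retrieval.
    """
    prompts, seen = [], set()
    for token in re.findall(r"[a-z0-9-]+", question.lower()):
        if token in lexicon and token not in seen:
            seen.add(token)
            prompts.append(f'By "{token}", do you mean ' + " or ".join(lexicon[token]) + "?")
    return prompts


def refine(question, term, reading):
    """Rewrite the question with the reading the user chose (whole word, any case)."""
    return re.sub(rf"\b{re.escape(term)}\b", reading, question, flags=re.IGNORECASE)
```

    For example, "What is the cost of an RLV?" would prompt the user to choose between the two readings of "cost", and the refined question would then be passed on to answer retrieval.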