19 research outputs found

    The TechQA Dataset

    Full text link
    We introduce TechQA, a domain-adaptation question answering dataset for the technical support domain. The TechQA corpus highlights two real-world issues from the automated customer support domain. First, it contains actual questions posed by users on a technical forum, rather than questions generated specifically for a competition or a task. Second, it has a real-world size (600 training, 310 dev, and 490 evaluation question/answer pairs), reflecting the cost of creating large labeled datasets with actual data. Consequently, TechQA is meant to stimulate research in domain adaptation rather than to serve as a resource for building QA systems from scratch. The dataset was obtained by crawling the IBM Developer and IBM DeveloperWorks forums for questions with accepted answers that appear in a published IBM Technote, a technical document that addresses a specific technical issue. We also release the collection of 801,998 Technotes that were publicly available as of April 4, 2019 as a companion resource that can be used for pretraining, i.e., to learn representations of the IT domain language. Comment: Long version of conference paper to be submitted.
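    As a rough illustration of how such a release might be consumed, the sketch below (Python) iterates over TechQA-style question/answer records. The file name and the field names QUESTION_TEXT, ANSWER, and DOCUMENT are illustrative assumptions, not the published schema.

        import json

        # Minimal sketch of reading TechQA-style records for domain-adaptation
        # experiments. Field names are assumptions, not the released schema.
        def load_qa_pairs(path):
            with open(path, encoding="utf-8") as f:
                records = json.load(f)
            for rec in records:
                yield {
                    "question": rec["QUESTION_TEXT"],  # forum question body
                    "answer": rec["ANSWER"],           # answer span from the Technote
                    "technote_id": rec["DOCUMENT"],    # ID of the answering Technote
                }

        # Example: the training split should yield 600 pairs per the abstract.
        pairs = list(load_qa_pairs("training_Q_A.json"))  # hypothetical file name
        print(f"{len(pairs)} question/answer pairs loaded")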

    Speech Communication

    Get PDF
    Contains reports on five research projects. Funding: C.J. Lebel Fellowship; National Institutes of Health (Grant 5 T32 NS07040); National Institutes of Health (Grant 5 R01 NS04332); National Science Foundation (Grant IST 80-17599); U.S. Navy, Naval Electronic Systems Command (Contract N00039-85-C-0254); U.S. Navy, Naval Electronic Systems Command (Contract N00039-85-C-0341); U.S. Navy, Naval Electronic Systems Command (Contract N00039-85-C-0290).

    Speech Communication

    Get PDF
    Contains table of contents for Part IV, table of contents for Section 1, and reports on five research projects. Funding: Apple Computer, Inc.; C.J. Lebel Fellowship; National Institutes of Health (Grant T32-NS07040); National Institutes of Health (Grant R01-NS04332); National Institutes of Health (Grant R01-NS21183); National Institutes of Health (Grant P01-NS23734); U.S. Navy, Naval Electronic Systems Command (Contract N00039-85-C-0254); U.S. Navy, Office of Naval Research (Contract N00014-82-K-0727).

    Speech Communication

    Get PDF
    Contains reports on five research projects. Funding: C.J. Lebel Fellowship; National Institutes of Health (Grant 5 T32 NS07040); National Institutes of Health (Grant 5 R01 NS04332); National Institutes of Health (Grant 5 R01 NS21183); National Institutes of Health (Grant 5 P01 NS13126); National Institutes of Health (Grant 1 P01-NS23734); National Science Foundation (Grant BNS 8418733); U.S. Navy, Naval Electronic Systems Command (Contract N00039-85-C-0254); U.S. Navy, Naval Electronic Systems Command (Contract N00039-85-C-0341); U.S. Navy, Naval Electronic Systems Command (Contract N00039-85-C-0290); National Institutes of Health (Grant R01-NS21183), subcontract with Boston University; National Institutes of Health (Grant 1 P01-NS23734), subcontract with the Massachusetts Eye and Ear Infirmary.

    Creating word-level language models for large-vocabulary handwriting recognition

    No full text

    Confidence-Scoring Post-Processing for Off-Line Handwritten-Character Recognition Verification

    No full text
    We apply confidence-scoring techniques to verify the output of an off-line handwritten-character recognizer. We evaluate a variety of scoring functions, including likelihood ratios and estimated posterior probabilities of correctness, in a post-processing mode to generate confidence scores. Using the post-processor in conjunction with a neural-net-based recognizer on mixed-case letters, receiver-operating-characteristic (ROC) curves reveal that our post-processor correctly rejects 90% of recognizer errors while falsely rejecting only 18.6% of correctly recognized letters. For isolated-digit recognition, we achieve a correct-rejection rate of 95% while keeping false rejection down to 8.7%.
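    A minimal sketch of the posterior-based rejection idea on synthetic data: take the recognizer's top class posterior as the confidence score, reject outputs below a threshold, and sweep the threshold to trace the two quantities quoted above (correct rejection of recognizer errors vs. false rejection of correct outputs). The recognizer outputs here are randomly generated stand-ins, not the paper's model.

        import numpy as np

        def rejection_rates(posteriors, predictions, labels, threshold):
            # Confidence score: the estimated posterior of the predicted class.
            confidence = posteriors.max(axis=1)
            reject = confidence < threshold
            errors = predictions != labels
            correct_rejection = reject[errors].mean()   # recognizer errors caught
            false_rejection = reject[~errors].mean()    # good outputs discarded
            return correct_rejection, false_rejection

        # Toy stand-in for recognizer posteriors over 26 letter classes.
        rng = np.random.default_rng(0)
        posteriors = rng.dirichlet(np.ones(26), size=1000)
        predictions = posteriors.argmax(axis=1)
        labels = rng.integers(0, 26, size=1000)
        for t in (0.2, 0.4, 0.6, 0.8):
            cr, fr = rejection_rates(posteriors, predictions, labels, t)
            print(f"threshold={t:.1f}  correct-rejection={cr:.2%}  false-rejection={fr:.2%}")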

    Classifier combination techniques applied to coreference resolution

    No full text
    This paper examines the applicability of classifier combination approaches, such as bagging and boosting, to coreference resolution. To the best of our knowledge, this is the first effort to apply such techniques to coreference resolution. We provide experimental evidence indicating that the accuracy of a coreference engine can potentially be increased by bagging and boosting, without any additional features or training data. We implement and evaluate combination techniques at the mention, entity, and document levels, and also address issues specific to coreference resolution, such as entity alignment.
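    A minimal sketch of the combination idea using scikit-learn's stock bagging and boosting ensembles over a weak base learner. The synthetic binary task stands in for coreference mention-pair classification, since the paper's features and data are not described here; the estimator keyword assumes scikit-learn >= 1.2.

        from sklearn.datasets import make_classification
        from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeClassifier

        # Stand-in for mention-pair feature vectors labeled coreferent / not.
        X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

        base = DecisionTreeClassifier(max_depth=3, random_state=0)
        for name, clf in [
            ("bagging", BaggingClassifier(estimator=base, n_estimators=50, random_state=0)),
            ("boosting", AdaBoostClassifier(estimator=base, n_estimators=50, random_state=0)),
        ]:
            scores = cross_val_score(clf, X, y, cv=5)
            print(f"{name}: mean accuracy {scores.mean():.3f}")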