
    Resource Constrained Structured Prediction

    We study the problem of structured prediction under test-time budget constraints. We propose a novel approach applicable to a wide range of structured prediction problems in computer vision and natural language processing. Our approach adaptively generates computationally costly features at test time in order to reduce the computational cost of prediction while maintaining prediction performance. We show that training the adaptive feature generation system can be reduced to a series of structured learning problems, resulting in efficient training with existing structured learning algorithms. This framework provides theoretical justification for several existing heuristic approaches found in the literature. We evaluate our proposed adaptive system on two structured prediction tasks, optical character recognition (OCR) and dependency parsing, and show a strong reduction in feature costs without degrading accuracy.
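    The abstract above turns on the idea of acquiring costly features only when they are needed. The snippet below is a minimal illustrative sketch of that idea, not the paper's method: a toy binary classifier computes a hypothetical costly feature only when the cheap-feature margin is low and a test-time budget remains. All feature extractors, weights, and thresholds are invented for the example.

    # Illustrative sketch (not the paper's algorithm): compute a costly feature
    # for an input only when the cheap-feature margin is low and the remaining
    # test-time budget allows it. Feature extractors, weights, and the margin
    # threshold are hypothetical placeholders for a toy binary task.

    def cheap_features(x):
        # Cheap surrogate features, e.g. simple string statistics.
        return [len(x), x.count("a")]

    def costly_features(x):
        # Stand-in for an expensive feature (e.g. a rich shape descriptor in OCR).
        return [sum(ord(c) for c in x) % 7]

    def score(features, weights):
        return sum(f * w for f, w in zip(features, weights))

    def predict_with_budget(inputs, budget, margin_threshold=0.5):
        w_cheap, w_costly = [0.3, -0.2], [0.8]   # hypothetical learned weights
        total_cost, predictions = 0, []
        for x in inputs:
            feats, weights = cheap_features(x), list(w_cheap)
            # Acquire the costly feature only when the cheap score is uncertain
            # and the budget has not been exhausted.
            if abs(score(feats, weights)) < margin_threshold and total_cost < budget:
                feats += costly_features(x)
                weights += w_costly
                total_cost += 1
            predictions.append(1 if score(feats, weights) > 0 else 0)
        return predictions, total_cost

    print(predict_with_budget(["cat", "banana", "ab"], budget=1))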

    Parsing of Spoken Language under Time Constraints

    Spoken language applications in natural dialogue settings place serious requirements on the choice of processing architecture. Especially under adverse phonetic and acoustic conditions, parsing procedures have to be developed which not only analyse the incoming speech in a time-synchronous and incremental manner but are also able to schedule their resources according to the varying conditions of the recognition process. Depending on the actual degree of local ambiguity, the parser has to select among the available constraints in order to narrow down the search space with as little effort as possible. A parsing approach based on constraint satisfaction techniques is discussed. It provides important characteristics of the desired real-time behaviour and attempts to mimic some of the attention-focussing capabilities of the human speech comprehension mechanism. Comment: 19 pages, LaTeX
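    As a rough illustration of scheduling constraints by cost according to the current degree of ambiguity, the sketch below prunes part-of-speech hypotheses word by word, applying cheaper constraints first and stopping once ambiguity falls below a limit. It is not the parser described above; the lexicon, constraints, and costs are made up.

    # Illustrative sketch (not the parser described above): prune part-of-speech
    # hypotheses word by word, applying cheaper constraints first and more
    # expensive ones only while ambiguity stays above a limit. The lexicon,
    # constraints, and costs are invented for the example.

    LEXICON = {
        "time": {"NOUN", "VERB"},
        "flies": {"NOUN", "VERB"},
        "fast": {"ADJ", "ADV", "VERB"},
    }

    def rule_sentence_needs_verb(tags_per_word):
        # If exactly one word can still be a VERB, fix it to VERB.
        candidates = [t for t in tags_per_word if "VERB" in t]
        if len(candidates) == 1:
            candidates[0].intersection_update({"VERB"})

    def rule_no_two_verbs(tags_per_word):
        # Drop a VERB reading if the preceding word can only be a VERB.
        for prev, cur in zip(tags_per_word, tags_per_word[1:]):
            if prev == {"VERB"}:
                cur.discard("VERB")

    # (cost, constraint) pairs, cheapest first.
    CONSTRAINTS = [(1, rule_sentence_needs_verb), (3, rule_no_two_verbs)]

    def parse_incrementally(words, ambiguity_limit=4):
        tags = []
        for word in words:                      # time-synchronous: word by word
            tags.append(set(LEXICON.get(word, {"NOUN"})))
            for _, constraint in sorted(CONSTRAINTS, key=lambda c: c[0]):
                ambiguity = 1
                for t in tags:
                    ambiguity *= len(t)
                if ambiguity <= ambiguity_limit:   # already narrow enough: stop
                    break
                constraint(tags)
        return tags

    print(parse_incrementally(["time", "flies", "fast"]))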

    Acquiring Word-Meaning Mappings for Natural Language Interfaces

    This paper focuses on a system, WOLFIE (WOrd Learning From Interpreted Examples), that acquires a semantic lexicon from a corpus of sentences paired with semantic representations. The lexicon learned consists of phrases paired with meaning representations. WOLFIE is part of an integrated system that learns to transform sentences into representations such as logical database queries. Experimental results are presented demonstrating WOLFIE's ability to learn useful lexicons for a database interface in four different natural languages. The usefulness of the lexicons learned by WOLFIE is compared to that of lexicons acquired by a similar system, with results favorable to WOLFIE. A second set of experiments demonstrates WOLFIE's ability to scale to larger and more difficult, albeit artificially generated, corpora. In natural language acquisition, it is difficult to gather the annotated data needed for supervised learning; however, unannotated data is fairly plentiful. Active learning methods attempt to select for annotation and training only the most informative examples, and therefore are potentially very useful in natural language applications. However, most results to date for active learning have only considered standard classification tasks. To reduce annotation effort while maintaining accuracy, we apply active learning to semantic lexicons. We show that active learning can significantly reduce the number of annotated examples required to achieve a given level of performance.
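    The pool-based uncertainty-sampling loop below is a minimal sketch of the active-learning idea mentioned in the abstract, not WOLFIE itself: it repeatedly asks an annotator (the oracle) for the sentence the current lexicon covers least well. The confidence score, toy corpus, and oracle are all hypothetical.

    # Illustrative sketch (not WOLFIE): pool-based active learning with
    # uncertainty sampling over a toy corpus.

    def lexicon_confidence(sentence, lexicon):
        # Confidence is the fraction of words already covered by the lexicon.
        words = sentence.split()
        return sum(1 for w in words if w in lexicon) / len(words)

    def active_learning(pool, oracle, rounds=3):
        lexicon = {}
        for _ in range(rounds):
            if not pool:
                break
            # Query the sentence the current lexicon is least confident about.
            query = min(pool, key=lambda s: lexicon_confidence(s, lexicon))
            pool.remove(query)
            for phrase, meaning in oracle(query):   # annotator supplies pairs
                lexicon[phrase] = meaning
        return lexicon

    # Toy corpus and a fake annotator that maps each word to a placeholder meaning.
    pool = ["what rivers cross texas", "show capitals", "list states"]
    oracle = lambda sentence: [(w, w.upper()) for w in sentence.split()]
    print(active_learning(pool, oracle))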

    Evaluation of the NLP Components of the OVIS2 Spoken Dialogue System

    The NWO Priority Programme Language and Speech Technology is a 5-year research programme aiming at the development of spoken language information systems. In the Programme, two alternative natural language processing (NLP) modules are developed in parallel: a grammar-based (conventional, rule-based) module and a data-oriented (memory-based, stochastic, DOP) module. In order to compare the NLP modules, a formal evaluation has been carried out three years after the start of the Programme. This paper describes the evaluation procedure and the evaluation results. The grammar-based component performs much better than the data-oriented one in this comparison. Comment: Proceedings of CLIN 9

    From Query to Usable Code: An Analysis of Stack Overflow Code Snippets

    Enriched by natural language texts, Stack Overflow code snippets are an invaluable code-centric knowledge base of small units of source code. Besides being useful for software developers, these annotated snippets can potentially serve as the basis for automated tools that provide working code solutions to specific natural language queries. With the goal of developing automated tools from the Stack Overflow snippets and their surrounding text, this paper investigates the following questions: (1) How usable are the Stack Overflow code snippets? and (2) When using text search engines to match on the natural language questions and answers around the snippets, what percentage of the top results contain usable code snippets? A total of 3M code snippets are analyzed across four languages: C#, Java, JavaScript, and Python. Python and JavaScript proved to be the languages for which the most code snippets are usable. Conversely, Java and C# proved to be the languages with the lowest usability rate. Further qualitative analysis of usable Python snippets shows the characteristics of the answers that solve the original question. Finally, we use Google search to investigate the alignment of usability and the natural language annotations around code snippets, and explore how to make snippets in Stack Overflow an adequate base for future automatic program generation. Comment: 13th IEEE/ACM International Conference on Mining Software Repositories, 11 pages
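    One rough proxy for the usability checks discussed above is whether a snippet even parses. The sketch below is not the paper's analysis pipeline; it simply classifies made-up Python snippets as usable if they parse as valid Python.

    # Illustrative sketch (not the paper's pipeline): treat a Python snippet as
    # "usable" if it at least parses, a rough proxy for snippet usability.
    # The snippets are made-up examples.

    import ast

    def is_usable_python(snippet):
        """Return True if the snippet parses as a valid Python module."""
        try:
            ast.parse(snippet)
            return True
        except SyntaxError:
            return False

    snippets = [
        "for i in range(3):\n    print(i)",   # parses: usable
        "if x = 1: print(x)",                 # syntax error: not usable
    ]
    usable = [s for s in snippets if is_usable_python(s)]
    print(f"{len(usable)}/{len(snippets)} snippets usable")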