17,985 research outputs found

    Acquiring Correct Knowledge for Natural Language Generation

    Natural language generation (NLG) systems are computer software systems that produce texts in English and other human languages, often from non-linguistic input data. NLG systems, like most AI systems, need substantial amounts of knowledge. However, our experience in two NLG projects suggests that it is difficult to acquire correct knowledge for NLG systems; indeed, every knowledge acquisition (KA) technique we tried had significant problems. In general terms, these problems were due to the complexity, novelty, and poorly understood nature of the tasks our systems attempted, and were worsened by the fact that people write so differently. In particular, corpus-based KA approaches suffered because it was impossible to assemble a sizable corpus of high-quality, consistent, manually written texts in our domains; and structured expert-oriented KA techniques suffered because experts disagreed and because we could not get enough information about special and unusual cases to build robust systems. We believe that such problems are likely to affect many other NLG systems as well. In the long term, we hope that new KA techniques may emerge to help NLG system builders. In the shorter term, we believe that understanding how individual KA techniques can fail, and using a mixture of different KA techniques with different strengths and weaknesses, can help developers acquire NLG knowledge that is mostly correct.

    Effects of corpus-based instruction on phraseology in learner English

    This study analyses the effects of data-driven learning (DDL) on the phraseology used by 223 students of English at an Italian university. The students studied the genre of opinion survey reports through paper-based and hands-on exploration of a reference corpus. They then wrote their own reports, and a learner corpus of these texts was compiled. A contrastive interlanguage analysis approach (Granger, 2002) was adopted to compare the phraseology of key items in the learner corpus with that found in the reference corpus. Comparison is also made with a learner corpus of reports produced by a previous cohort of students who had not used the reference corpus. Students who had done DDL tasks used a wider range of genre-appropriate phraseology and produced fewer stock phrases than those who had not. The study also finds evidence that students use more phrases encountered in paper-based concordancing tasks than in hands-on tasks. Unlike previous DDL studies, the present study bases its observations of how a specific text type is learned through DDL on comparison with both a control learner corpus and an expert corpus. The study also considers the use of DDL with a large class.

    Token-based typology and word order entropy: A study based on universal dependencies

    The present paper discusses the benefits and challenges of token-based typology, which takes into account the frequencies of words and constructions in language use. This approach makes it possible to introduce new criteria for language classification that would be difficult or impossible to achieve with the traditional, type-based approach. This point is illustrated by several quantitative studies of word order variation, which can be measured as entropy at different levels of granularity. I argue that this variation can be explained by general functional mechanisms and pressures that manifest themselves in language use, such as optimization of processing (including avoidance of ambiguity) and grammaticalization of predictable units occurring in chunks. The case studies are based on multilingual corpora parsed with the Universal Dependencies annotation scheme.
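    The abstract says word order variation "can be measured as entropy" over UD-parsed corpora but does not give the exact procedure. The sketch below assumes one plausible granularity: for a single dependency relation (here `obj`), count how often the dependent precedes versus follows its head in CoNLL-U data, then take the Shannon entropy of that distribution in bits. The function names `order_counts` and `entropy`, the labels "dependent-first"/"head-first", and the two toy sentences are hypothetical illustrations, not the paper's method or data.

```python
import math
from collections import Counter


def order_counts(conllu_text, deprel="obj"):
    """Count dependent-first vs head-first tokens for one UD relation.

    Hypothetical helper. Assumes standard 10-column CoNLL-U and skips
    comments, multiword tokens (IDs like "3-4") and empty nodes ("5.1").
    """
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # multiword token range or empty node
        if cols[7].split(":")[0] != deprel:
            continue  # compare the universal relation, ignoring subtypes
        dep_id, head_id = int(cols[0]), int(cols[6])
        counts["dependent-first" if dep_id < head_id else "head-first"] += 1
    return counts


def entropy(counts):
    """Shannon entropy in bits: H = sum over orders of p * log2(1 / p)."""
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values() if c)


def row(i, form, head, deprel):
    # Build a CoNLL-U token line; columns the sketch does not use are "_".
    return "\t".join([str(i), form, "_", "_", "_", "_", str(head), deprel, "_", "_"])


# Two invented sentences: one verb-object order, one object-fronted order.
sample = "\n".join([
    "# text = she reads books",
    row(1, "she", 2, "nsubj"),
    row(2, "reads", 0, "root"),
    row(3, "books", 2, "obj"),
    "",
    "# text = books she reads",
    row(1, "books", 3, "obj"),
    row(2, "she", 3, "nsubj"),
    row(3, "reads", 0, "root"),
])

c = order_counts(sample)
print(entropy(c))  # → 1.0
```

    An even split between the two orders gives the maximum of 1 bit, while a relation that always appears in the same order gives 0 bits; finer or coarser granularities would change only what is counted, not the entropy formula.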

    Re-imagining French lexicography: The Dictionnaire vivant de la langue française

    The Dictionnaire vivant de la langue française (DVLF), developed by The ARTFL Project at the University of Chicago, represents an experimental, interactive, and community-based approach to French lexicography. The DVLF gives the public broad access to a wide variety of linguistic tools and resources, with the goal of changing how users interact with dictionaries and providing better descriptions of emergent word use. In this article we describe the history of the DVLF and survey similar community-oriented electronic dictionaries. We then present the dictionary's many features, including the variety of its definitions and its mechanisms for user interaction. The article concludes with a discussion of ARTFL's plans for the future development of the DVLF.