Acquiring Correct Knowledge for Natural Language Generation
Natural language generation (NLG) systems are computer software systems that
produce texts in English and other human languages, often from non-linguistic
input data. NLG systems, like most AI systems, need substantial amounts of
knowledge. However, our experience in two NLG projects suggests that it is
difficult to acquire correct knowledge for NLG systems; indeed, every knowledge
acquisition (KA) technique we tried had significant problems. In general terms,
these problems were due to the complexity, novelty, and poorly understood
nature of the tasks our systems attempted, and were worsened by the fact that
people write so differently. This meant in particular that corpus-based KA
approaches suffered because it was impossible to assemble a sizable corpus of
high-quality consistent manually written texts in our domains; and structured
expert-oriented KA techniques suffered because experts disagreed and because we
could not get enough information about special and unusual cases to build
robust systems. We believe that such problems are likely to affect many other
NLG systems as well. In the long term, we hope that new KA techniques may
emerge to help NLG system builders. In the shorter term, we believe that
understanding how individual KA techniques can fail, and using a mixture of
different KA techniques with different strengths and weaknesses, can help
developers acquire NLG knowledge that is mostly correct.
Effects of corpus-based instruction on phraseology in learner English
This study analyses the effects of data-driven learning (DDL) on the phraseology used by 223 English students at an Italian university. The students studied the genre of opinion survey reports through paper-based and hands-on exploration of a reference corpus. They then wrote their own report and a learner corpus of these texts was compiled. A contrastive interlanguage analysis approach (Granger, 2002) was adopted to compare the phraseology of key items in the learner corpus with that found in the reference corpus. Comparison is also made with a learner corpus of reports produced by a previous cohort of students who had not used the reference corpus. Students who had done DDL tasks used a wider range of genre-appropriate phraseology and produced a lower number of stock phrases than those who had not. The study also finds evidence that students use more phrases encountered in paper-based concordancing tasks than in hands-on tasks. Unlike in previous DDL studies, observations of the learning of a specific text-type through DDL in the present study are based on the comparison with both a control learner corpus and an expert corpus. The study also considers the use of DDL with a large class size.
Token-based typology and word order entropy: A study based on universal dependencies
The present paper discusses the benefits and challenges of token-based typology, which takes into account the frequencies of words and constructions in language use. This approach makes it possible to introduce new criteria for language classification, which would be difficult or impossible to achieve with the traditional, type-based approach. This point is illustrated by several quantitative studies of word order variation, which can be measured as entropy at different levels of granularity. I argue that this variation can be explained by general functional mechanisms and pressures, which manifest themselves in language use, such as optimization of processing (including avoidance of ambiguity) and grammaticalization of predictable units occurring in chunks. The case studies are based on multilingual corpora, which have been parsed using the Universal Dependencies annotation scheme.
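As a rough illustration of what "word order variation measured as entropy" can mean, the sketch below computes the Shannon entropy of observed order labels for a single construction. It is not the paper's method or data: the function name `order_entropy`, the labels "VO" and "OV", and the token counts are all hypothetical, chosen only to show that a rigid order yields entropy near 0 bits and an evenly split order yields 1 bit.

```python
from collections import Counter
from math import log2


def order_entropy(orders):
    """Shannon entropy, in bits, of a sequence of observed order labels."""
    counts = Counter(orders)
    total = sum(counts.values())
    # Sum -p * log2(p) over each distinct order label that was observed.
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Hypothetical token counts for verb-object order; not data from the paper.
rigid = ["VO"] * 98 + ["OV"] * 2       # one order dominates
flexible = ["VO"] * 50 + ["OV"] * 50   # both orders equally frequent

print(f"rigid:    {order_entropy(rigid):.3f} bits")
print(f"flexible: {order_entropy(flexible):.3f} bits")
```

In a token-based setting the labels would presumably be extracted from each head-dependent pair in a parsed corpus, so the same calculation could be repeated per dependency relation or per lexical item to vary the granularity.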
Re-imagining French lexicography: The dictionnaire vivant de la langue française
The Dictionnaire vivant de la langue française (DVLF), developed by The ARTFL
Project at the University of Chicago, represents an experimental, interactive, and
community-based approach to French lexicography. The DVLF enables broad
public access to a wide variety of linguistic tools and resources, with the goal of
changing user interaction with dictionaries and providing better descriptions of
emergent word use. In this article we describe the history of the DVLF and provide
a survey of similar community-oriented electronic dictionaries. We then proceed
to a presentation of the dictionary’s many features, including the variety of
its definitions and mechanisms for user interaction. The article concludes with
a discussion of ARTFL’s plans for the future development of the DVLF.