Acquiring Correct Knowledge for Natural Language Generation
Natural language generation (NLG) systems are computer software systems that
produce texts in English and other human languages, often from non-linguistic
input data. NLG systems, like most AI systems, need substantial amounts of
knowledge. However, our experience in two NLG projects suggests that it is
difficult to acquire correct knowledge for NLG systems; indeed, every knowledge
acquisition (KA) technique we tried had significant problems. In general terms,
these problems were due to the complexity, novelty, and poorly understood
nature of the tasks our systems attempted, and were worsened by the fact that
people write so differently. This meant in particular that corpus-based KA
approaches suffered because it was impossible to assemble a sizable corpus of
high-quality consistent manually written texts in our domains; and structured
expert-oriented KA techniques suffered because experts disagreed and because we
could not get enough information about special and unusual cases to build
robust systems. We believe that such problems are likely to affect many other
NLG systems as well. In the long term, we hope that new KA techniques may
emerge to help NLG system builders. In the shorter term, we believe that
understanding how individual KA techniques can fail, and using a mixture of
different KA techniques with different strengths and weaknesses, can help
developers acquire NLG knowledge that is mostly correct.
Acquiring Word-Meaning Mappings for Natural Language Interfaces
This paper focuses on a system, WOLFIE (WOrd Learning From Interpreted
Examples), that acquires a semantic lexicon from a corpus of sentences paired
with semantic representations. The lexicon learned consists of phrases paired
with meaning representations. WOLFIE is part of an integrated system that
learns to transform sentences into representations such as logical database
queries. Experimental results are presented demonstrating WOLFIE's ability to
learn useful lexicons for a database interface in four different natural
languages. The usefulness of the lexicons learned by WOLFIE is compared to
those acquired by a similar system, with results favorable to WOLFIE. A second
set of experiments demonstrates WOLFIE's ability to scale to larger and more
difficult, albeit artificially generated, corpora. In natural language
acquisition, it is difficult to gather the annotated data needed for supervised
learning; however, unannotated data is fairly plentiful. Active learning
methods attempt to select for annotation and training only the most informative
examples, and therefore are potentially very useful in natural language
applications. However, most results to date for active learning have only
considered standard classification tasks. To reduce annotation effort while
maintaining accuracy, we apply active learning to semantic lexicons. We show
that active learning can significantly reduce the number of annotated examples
required to achieve a given level of performance.
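The pool-based selection loop this abstract describes can be sketched as follows. This is a minimal, generic illustration of uncertainty sampling; the uncertainty measure, the toy predictor, and the example sentences are invented for illustration and are not part of WOLFIE.

```python
# Pool-based uncertainty sampling: pick the unannotated examples the current
# model is least sure about, so annotation effort goes where it helps most.

def uncertainty(prob):
    """Distance from a confident decision; a probability of 0.5 is maximally uncertain."""
    return 1.0 - abs(prob - 0.5) * 2.0

def select_most_informative(pool, predict_prob, k=1):
    """Return the k examples from the pool with the most uncertain predictions."""
    ranked = sorted(pool, key=lambda x: uncertainty(predict_prob(x)), reverse=True)
    return ranked[:k]

# Hypothetical stand-in for a learned model: probability that a sentence
# is a database query, based on a few cue words.
def toy_predict(sentence):
    cue_words = {"list", "show", "count"}
    hits = sum(w in cue_words for w in sentence.lower().split())
    return min(1.0, 0.2 + 0.3 * hits)

pool = ["show all flights", "hello there", "count the rivers in texas"]
chosen = select_most_informative(pool, toy_predict, k=1)
```

In a lexicon-acquisition setting, `predict_prob` would be replaced by the learner's own confidence in its analysis of each unannotated sentence.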
Acquiring and Using Limited User Models in NLG
It is a truism of NLG that good knowledge of the reader can improve the quality of generated texts, and many NLG systems have been developed that exploit detailed user models when generating texts. Unfortunately, it is very difficult in practice to obtain detailed information about users. In this paper we describe our experiences in acquiring and using limited user models for NLG in four different systems, each of which took a different approach to this issue. One general conclusion is that it is useful if imperfect user models are understandable to users or domain experts, and indeed perhaps can be directly edited by them; this agrees with recent thinking about user models in other applications such as intelligent tutoring systems (Kay, 2001).
Universal Grammar: Wittgenstein versus Chomsky
Daniele Moyal-Sharrock, "Universal Grammar: Wittgenstein versus Chomsky", in M. A. Peters and J. Stickney, eds., A Companion to Wittgenstein on Education: Pedagogical Investigations (Singapore: Springer Verlag, 2017), ISBN: 9789811031342. The motivations for the claim that language is innate are, for many, quite straightforward. The innateness of language is seen as the only way to solve the so-called 'logical problem of language acquisition': the mismatch between linguistic input and linguistic output. In this paper, I begin by unravelling several strands of the nativist argument, offering replies as I go along. I then give an outline of Wittgenstein's view of language acquisition, showing how it renders otiose the problems posed by nativists like Chomsky, not least by means of Wittgenstein's own brand of grammar which, unlike Chomsky's, does not reside in the brain, but in our practices. Peer reviewed.
The adaptive advantage of symbolic theft over sensorimotor toil: Grounding language in perceptual categories
Using neural nets to simulate learning and the genetic algorithm to simulate evolution in a toy world of mushrooms and mushroom-foragers, we place two ways of acquiring categories into direct competition with one another: In (1) "sensorimotor toil," new categories are acquired through real-time, feedback-corrected, trial and error experience in sorting them. In (2) "symbolic theft," new categories are acquired by hearsay from propositions: boolean combinations of symbols describing them. In competition, symbolic theft always beats sensorimotor toil. We hypothesize that this is the basis of the adaptive advantage of language. Entry-level categories must still be learned by toil, however, to avoid an infinite regress (the "symbol grounding problem"). Changes in the internal representations of categories must take place during the course of learning by toil. These changes can be analyzed in terms of the compression of within-category similarities and the expansion of between-category differences. These allow regions of similarity space to be separated, bounded and named, and then the names can be combined and recombined to describe new categories, grounded recursively in the old ones. Such compression/expansion effects, called "categorical perception" (CP), have previously been reported with categories acquired by sensorimotor toil; we show that they can also arise from symbolic theft alone. The picture of natural language and its origins that emerges from this analysis is that of a powerful hybrid symbolic/sensorimotor capacity, infinitely superior to its purely sensorimotor precursors, but still grounded in and dependent on them. It can spare us from untold time and effort learning things the hard way, through direct experience, but it remains anchored in and translatable into the language of experience.
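The compression/expansion analysis this abstract describes can be made concrete with a small sketch: compare mean within-category and between-category distances of internal representations before and after learning. The one-dimensional representation values below are invented purely for illustration.

```python
# Categorical perception as distance changes in representation space:
# within-category distances shrink (compression) while between-category
# distances grow (expansion) over the course of learning.

def mean_pairwise_dist(points):
    """Mean distance between all pairs within one category."""
    pairs = [abs(a - b) for i, a in enumerate(points) for b in points[i + 1:]]
    return sum(pairs) / len(pairs)

def mean_between_dist(cat_a, cat_b):
    """Mean distance between members of two different categories."""
    return sum(abs(a - b) for a in cat_a for b in cat_b) / (len(cat_a) * len(cat_b))

# Toy 1-D internal representations (illustrative values, not model output):
before_a, before_b = [0.2, 0.4, 0.6], [0.5, 0.7, 0.9]   # categories overlap
after_a, after_b = [0.25, 0.3, 0.35], [0.75, 0.8, 0.85]  # after learning

compression = mean_pairwise_dist(after_a) < mean_pairwise_dist(before_a)
expansion = mean_between_dist(after_a, after_b) > mean_between_dist(before_a, before_b)
```

Both effects holding at once is what allows a clean boundary to be drawn and named between the two regions of similarity space.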
An Infrastructure for acquiring high quality semantic metadata
Because metadata that underlies semantic web applications is gathered from distributed and heterogeneous data sources, it is important to ensure its quality (i.e., reduce duplicates, spelling errors, ambiguities). However, current infrastructures that acquire and integrate semantic data have only marginally addressed the issue of metadata quality. In this paper we present our metadata acquisition infrastructure, ASDI, which pays special attention to ensuring that high quality metadata is derived. Central to the architecture of ASDI is a verification engine that relies on several semantic web tools to check the quality of the derived data. We tested our prototype in the context of building a semantic web portal for our lab, KMi. An experimental evaluation comparing the automatically extracted data against manual annotations indicates that the verification engine enhances the quality of the extracted semantic metadata.
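One of the quality problems the abstract names, duplicate metadata entries, can be checked with a simple near-duplicate detector. This is a generic sketch, not ASDI's actual verification engine; the normalisation step and similarity threshold are illustrative assumptions.

```python
# Flag likely duplicate metadata entries by comparing normalised strings
# with a similarity ratio; pairs above the threshold are reported for review.
from difflib import SequenceMatcher

def normalise(entry):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(entry.lower().split())

def find_near_duplicates(entries, threshold=0.9):
    """Return pairs of entries whose normalised forms are highly similar."""
    flagged = []
    norm = [normalise(e) for e in entries]
    for i in range(len(norm)):
        for j in range(i + 1, len(norm)):
            if SequenceMatcher(None, norm[i], norm[j]).ratio() >= threshold:
                flagged.append((entries[i], entries[j]))
    return flagged

entries = ["Knowledge Media Institute", "Knowledge  Media institute", "Image Annotation"]
dupes = find_near_duplicates(entries)
```

A real pipeline would add further checks (spelling, ambiguity resolution) and would route flagged pairs to a human or to additional semantic tools rather than merging them automatically.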
Image annotation with Photocopain
Photo annotation is a resource-intensive task, yet is increasingly essential as image archives and personal photo collections grow in size. There is an inherent conflict in the process of describing and archiving personal experiences, because casual users are generally unwilling to expend large amounts of effort on creating the annotations which are required to organise their collections so that they can make best use of them. This paper describes the Photocopain system, a semi-automatic image annotation system which combines information about the context in which a photograph was captured with information from other readily available sources in order to generate outline annotations for that photograph that the user may further extend or amend.