
    Textual Stylistic Variation: Choices, Genres and Individuals

    This chapter argues for more informed target metrics for the statistical processing of stylistic variation in text collections. Much as operationalized relevance proved a useful goal to strive for in information retrieval, research in textual stylistics, whether application oriented or philologically inclined, needs goals formulated in terms of pertinence, relevance, and utility — notions that agree with reader experience of text. Differences readers are aware of are mostly based on utility — not on textual characteristics per se. Mostly, readers report stylistic differences in terms of genres. Genres, while vague and undefined, are well-established and talked about: very early on, readers learn to distinguish genres. This chapter discusses variation given by genre, and contrasts it to variation occasioned by individual choice.

    Vocabulary size influences spontaneous speech in native language users: Validating the use of automatic speech recognition in individual differences research

    Previous research has shown that vocabulary size affects performance on laboratory word production tasks. Individuals who know many words show faster lexical access and retrieve more words belonging to pre-specified categories than individuals who know fewer words. The present study examined the relationship between receptive vocabulary size and speaking skills as assessed in a natural sentence production task. We asked whether measures derived from spontaneous responses to everyday questions correlate with the size of participants’ vocabulary. Moreover, we assessed the suitability of automatic speech recognition for the analysis of participants’ responses in complex language production data. We found that vocabulary size predicted indices of spontaneous speech: Individuals with a larger vocabulary produced more words and had a higher speech-silence ratio compared to individuals with a smaller vocabulary. Importantly, these relationships were reliably identified using manual and automated transcription methods. Taken together, our results suggest that spontaneous speech elicitation is a useful method to investigate natural language production and that automatic speech recognition can alleviate the burden of labor-intensive speech transcription.
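
    As a rough illustration of the indices mentioned above, the sketch below derives a word count and a speech-silence ratio from time-aligned ASR output and correlates both with a vocabulary score. The transcript format, the toy participant data, and the use of a Pearson correlation are assumptions for illustration, not the study's actual pipeline.

```python
# Minimal sketch: deriving spontaneous-speech indices from time-aligned ASR
# output and relating them to receptive vocabulary size. The segment format
# (word, start_s, end_s) and the example data are illustrative assumptions.
from scipy.stats import pearsonr

def speech_indices(segments, recording_length_s):
    """Return (word_count, speech_silence_ratio) for one spoken response."""
    word_count = len(segments)
    speech_time = sum(end - start for _, start, end in segments)
    silence_time = max(recording_length_s - speech_time, 1e-9)
    return word_count, speech_time / silence_time

# Hypothetical per-participant data: ASR segments, recording length in
# seconds, and a receptive vocabulary score from a vocabulary test.
participants = [
    ([("the", 0.0, 0.2), ("cat", 0.2, 0.6), ("sat", 0.7, 1.0)], 5.0, 40),
    ([("well", 0.0, 0.3), ("I", 0.4, 0.5), ("think", 0.5, 0.9),
      ("that", 1.0, 1.2), ("maybe", 1.2, 1.7)], 5.0, 65),
    ([("yes", 0.0, 0.4)], 5.0, 25),
]

word_counts, ratios, vocab = [], [], []
for segments, length_s, vocab_score in participants:
    n_words, ratio = speech_indices(segments, length_s)
    word_counts.append(n_words)
    ratios.append(ratio)
    vocab.append(vocab_score)

# Correlate each spontaneous-speech index with vocabulary size.
print("words ~ vocabulary:", pearsonr(vocab, word_counts))
print("speech/silence ~ vocabulary:", pearsonr(vocab, ratios))
```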

    When a 'sport' is a person and other issues for NMT of novels


    A Corpus-Based, Pilot Study of Lexical Stress Variation in American English

    Phonological free variation describes the phenomenon of there being more than one pronunciation for a word without any change in meaning (e.g. because, schedule, vehicle). The term also applies to words that exhibit different stress patterns (e.g. academic, resources, comparable) with no change in meaning or grammatical category. A corpus-based analysis of free variation is a useful tool for testing the validity of surveys of speakers' pronunciation preferences for certain variants. The current paper presents the results of a corpus-based pilot study of American English, in an attempt to replicate Mompéan's 2009 study of British English.
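
    A corpus-based check of this kind reduces to tallying variant frequencies and comparing them with survey figures. The sketch below is a minimal illustration under assumed data: the annotated tokens, the survey percentages, and the variant labels ("initial" vs. "second" stress) are made-up stand-ins, not the study's materials or results.

```python
# Minimal sketch of the corpus tally behind such a comparison: count stress
# variants for target words and compare corpus proportions with survey
# preferences. All data below are illustrative assumptions.
from collections import Counter

# Hypothetical corpus tokens annotated with the stress variant produced.
tokens = [
    ("comparable", "initial"), ("comparable", "second"), ("comparable", "initial"),
    ("resources", "initial"), ("resources", "second"), ("resources", "second"),
]

# Hypothetical survey preferences: proportion preferring initial stress.
survey = {"comparable": 0.70, "resources": 0.45}

for word, survey_share in survey.items():
    counts = Counter(variant for w, variant in tokens if w == word)
    total = sum(counts.values())
    corpus_share = counts["initial"] / total if total else float("nan")
    print(f"{word}: corpus {corpus_share:.2f} vs survey {survey_share:.2f} (n={total})")
```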

    Three Approaches to Generating Texts in Different Styles

    Natural Language Generation (NLG) systems generate texts in English and other human languages from non-linguistic input data. Usually there are a large number of possible texts that can communicate the input data, and NLG systems must choose one of these. We argue that style can be used by NLG systems to choose between possible texts, and explore how this can be done by (1) explicit stylistic parameters, (2) imitating a genre style, and (3) imitating an individual’s style.
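
    A minimal sketch of approach (1), choosing between candidate texts with explicit stylistic parameters, is given below. The two style features (mean sentence length and pronoun rate), the candidate texts, and the target parameter values are illustrative assumptions rather than the systems discussed in the paper.

```python
# Minimal sketch: use explicit stylistic parameters to choose among candidate
# texts that convey the same content. Features and targets are assumptions.
import re

def style_features(text):
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s]
    words = text.split()
    mean_len = len(words) / max(len(sentences), 1)
    pronoun_rate = sum(w.lower() in {"i", "you", "we"} for w in words) / max(len(words), 1)
    return {"mean_sentence_length": mean_len, "pronoun_rate": pronoun_rate}

def choose_text(candidates, target):
    """Pick the candidate whose style features are closest to the target."""
    def distance(text):
        feats = style_features(text)
        return sum((feats[k] - target[k]) ** 2 for k in target)
    return min(candidates, key=distance)

candidates = [
    "Rain is expected tomorrow. Temperatures will stay low.",
    "You should expect rain tomorrow, and it will stay cold, so wrap up warm.",
]
# Hypothetical "formal" style target: short sentences, no personal pronouns.
formal_target = {"mean_sentence_length": 6.0, "pronoun_rate": 0.0}
print(choose_text(candidates, formal_target))
```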

    Individual and Domain Adaptation in Sentence Planning for Dialogue

    One of the biggest challenges in the development and deployment of spoken dialogue systems is the design of the spoken language generation module. This challenge arises from the need for the generator to adapt to many features of the dialogue domain, user population, and dialogue context. A promising approach is trainable generation, which uses general-purpose linguistic knowledge that is automatically adapted to the features of interest, such as the application domain, individual user, or user group. In this paper we present and evaluate a trainable sentence planner for providing restaurant information in the MATCH dialogue system. We show that trainable sentence planning can produce complex information presentations whose quality is comparable to the output of a template-based generator tuned to this domain. We also show that our method easily supports adapting the sentence planner to individuals, and that the individualized sentence planners generally perform better than models trained and tested on a population of individuals. Previous work has documented and utilized individual preferences for content selection, but to our knowledge, these results provide the first demonstration of individual preferences for sentence planning operations, affecting the content order, discourse structure and sentence structure of system responses. Finally, we evaluate the contribution of different feature sets, and show that, in our application, n-gram features often do as well as features based on higher-level linguistic representations.
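
    The ranking idea can be pictured with a small sketch: train a scorer on rated candidate responses using only n-gram counts, then use it to pick among unseen candidates for the same content. The tiny rated examples and the ridge regressor below are stand-in assumptions, not the paper's trained sentence planner or its feature sets.

```python
# Minimal sketch: score candidate realisations of the same content with a
# model trained on human ratings, using only word n-gram features. The rated
# data and the regressor are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Ridge

# Hypothetical training data: candidate system responses with human ratings.
rated_texts = [
    ("Babbo has good decor. Babbo has good service. Babbo is expensive.", 2.0),
    ("Babbo, which is expensive, has good decor and good service.", 4.5),
    ("Babbo has good decor and good service, but it is expensive.", 5.0),
]
texts, ratings = zip(*rated_texts)

# Word uni- and bigram counts as the only features.
vectoriser = CountVectorizer(ngram_range=(1, 2))
X = vectoriser.fit_transform(texts)
ranker = Ridge(alpha=1.0).fit(X, ratings)

# Rank unseen candidate plans for the same content and pick the best.
candidates = [
    "Uguale has good food. Uguale has good service. Uguale is cheap.",
    "Uguale, which is cheap, has good food and good service.",
]
scores = ranker.predict(vectoriser.transform(candidates))
print(max(zip(scores, candidates)))
```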

    Can children with speech difficulties process an unfamiliar accent?

    This study explores the hypothesis that children identified as having phonological processing problems may have particular difficulty in processing a different accent. Children with speech difficulties (n = 18) were compared with matched controls on four measures of auditory processing. First, an accent auditory lexical decision task was administered. In one condition, the children made lexical decisions about stimuli presented in their own accent (London). In the second condition, the stimuli were spoken in an unfamiliar accent (Glaswegian). The results showed that the children with speech difficulties had a specific deficit on the unfamiliar accent. Performance on the other auditory discrimination tasks revealed additional deficits at lower levels of input processing. The wider clinical implications of the findings are considered.

    Automatic generation of large-scale paraphrases

    Research on paraphrase has mostly focussed on lexical or syntactic variation within individual sentences. Our concern is with larger-scale paraphrases, from multiple sentences or paragraphs to entire documents. In this paper we address the problem of generating paraphrases of large chunks of texts. We ground our discussion through a worked example of extending an existing NLG system to accept as input a source text, and to generate a range of fluent semantically-equivalent alternatives, varying not only at the lexical and syntactic levels, but also in document structure and layout.
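
    One way to picture document-level paraphrase generation is as the composition of independent choices at the lexical, document-structure, and layout levels; the sketch below enumerates such combinations for a two-sentence source. The source text, the substitution, the reordering, and the layout options are illustrative assumptions, not the NLG system the paper extends.

```python
# Minimal sketch: produce many document-level variants by composing choices
# at the lexical, sentence-order and layout levels. All inputs are assumed.
from itertools import product

sentences = [
    "The patient's blood pressure was high.",
    "Medication was prescribed to reduce it.",
]

# Level 1: lexical choices (a single synonym substitution, where it applies).
lexical_options = [
    [s, s.replace("high", "elevated")] if "high" in s else [s]
    for s in sentences
]

# Level 2: document structure (keep the order or reverse it, where acceptable).
orderings = [lambda xs: xs, lambda xs: list(reversed(xs))]

# Level 3: layout (running paragraph vs. bulleted list).
layouts = [
    lambda xs: " ".join(xs),
    lambda xs: "\n".join("- " + x for x in xs),
]

paraphrases = []
for lexical_choice in product(*lexical_options):
    for order in orderings:
        for layout in layouts:
            paraphrases.append(layout(order(list(lexical_choice))))

print(len(paraphrases), "variants generated; one example:")
print(paraphrases[-1])
```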

    Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

    This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them. Published in the Journal of AI Research (JAIR), volume 61, pp. 75-170; 118 pages, 8 figures, 1 table.
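
    The core tasks such surveys cover are conventionally organised as a pipeline (content determination, text structuring, aggregation, lexicalisation, referring expression generation, and surface realisation). The sketch below compresses that pipeline into three placeholder stages over a toy weather input; the stage implementations and the input format are assumptions for illustration, not any system from the survey.

```python
# Minimal sketch of a pipeline-style NLG architecture: core tasks applied in
# sequence to non-linguistic input. Stage bodies are trivial placeholders.
def content_determination(data):
    # Decide which input facts to express.
    return [fact for fact in data if fact["report"]]

def text_structuring(messages):
    # Order the selected messages (here: by a simple priority field).
    return sorted(messages, key=lambda m: m["priority"])

def lexicalisation_and_realisation(messages):
    # Map each message onto words and produce surface sentences.
    return " ".join(f"{m['subject']} {m['verb']} {m['object']}." for m in messages)

def generate(data):
    return lexicalisation_and_realisation(text_structuring(content_determination(data)))

# Hypothetical non-linguistic input: weather facts with reporting flags.
weather = [
    {"report": True, "priority": 1, "subject": "Temperatures", "verb": "will reach", "object": "30 degrees"},
    {"report": True, "priority": 2, "subject": "Winds", "verb": "will stay", "object": "light"},
    {"report": False, "priority": 3, "subject": "Humidity", "verb": "is", "object": "60 percent"},
]
print(generate(weather))
```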