112 research outputs found

    Spin: Lexical Semantics, Transitivity, and the Identification of Implicit Sentiment

    Current interest in automatic sentiment analysis is motivated by a variety of information requirements. The vast majority of work in sentiment analysis has been specifically targeted at detecting subjective statements and mining opinions. This dissertation focuses on a different but related problem that to date has received relatively little attention in NLP research: detecting implicit sentiment, or spin, in text. This text classification task is distinguished from other sentiment analysis work in that there is no assumption that the documents to be classified with respect to sentiment are necessarily overt expressions of opinion. They rather are documents that might reveal a perspective. This dissertation describes a novel approach to the identification of implicit sentiment, motivated by ideas drawn from the literature on lexical semantics and argument structure, supported and refined through psycholinguistic experimentation. A relationship predictive of sentiment is established for components of meaning that are thought to be drivers of verbal argument selection and linking and to be arbiters of what is foregrounded or backgrounded in discourse. In computational experiments employing targeted lexical selection for verbs and nouns, a set of features reflective of these components of meaning is extracted for the terms. As observable proxies for the underlying semantic components, these features are exploited using machine learning methods for text classification with respect to perspective. After initial experimentation with manually selected lexical resources, the method is generalized to require no manual selection or hand tuning of any kind. The robustness of this linguistically motivated method is demonstrated by successfully applying it to three distinct text domains under a number of different experimental conditions, obtaining the best classification accuracies yet reported for several sentiment classification tasks. 
A novel graph-based classifier combination method is introduced which further improves classification accuracy by integrating statistical classifiers with models of inter-document relationships.

    Identifying indicators of vulnerability from short speech segments using acoustic and textual features

    To protect vulnerable people in telemarketing, organisations first have to investigate speech recordings to identify them. Typically, the investigation is conducted manually, so the procedure is costly and time-consuming. With an automatic vulnerability detection system, more vulnerable people can be identified and protected. A standard telephone conversation lasts around 5 minutes, so the detection system is expected to be able to identify a potentially vulnerable speaker from speech segments. Due to the complexity of the vulnerability definition and the lack of annotated vulnerability examples, this paper addresses the detection problem as three classification tasks: age classification, accent classification and patient/non-patient classification, utilising publicly available datasets. In the proposed system, we trained three sub-models using acoustic and textual features, one for each sub-task. Each trained model was evaluated on multiple datasets and achieved competitive results compared to a strong baseline (i.e. in-dataset accuracy).

    The Association between Reading Comprehension and Prerequisite Skills for Children in Poverty

    The purpose of this study was to investigate the relationship between reading comprehension and the prerequisite skills typically assessed by a school-based speech pathologist, with a focus on children raised in poverty. Based upon previous studies, three hypotheses were developed. First, children from low socioeconomic status (SES) backgrounds would not perform as well as children from the standardization sample on norm-referenced language tests, although the language tests would predict reading comprehension. Second, decoding would not be as good a predictor of reading comprehension for children from low SES backgrounds as it is in typically developing children from middle SES backgrounds because of differences in language ability. Third, processing-dependent measures (working memory) would be more predictive of reading comprehension than nonverbal IQ testing for children from low SES backgrounds. Twenty-six children between 7 and 10 years of age participated in this study. All of the participants were from low SES homes, were receiving instruction in a mainstream classroom, and did not have an Individualized Education Program. None of the children had a history of hearing, visual, neurological, emotional or behavioral problems. All participants were tested to evaluate reading comprehension, word decoding, receptive vocabulary, receptive and expressive language, narrative skills, cognitive ability and working memory. Children in this study performed significantly worse than the standardization sample on all four language measures, which included the Peabody Picture Vocabulary Test, 4th edition, the Clinical Evaluation of Language Fundamentals, 4th edition, and the Narrative Scoring Scheme and Subordinate Index scores of the narrative task analyzed using the Systematic Analysis of Language Transcripts. 
Participants achieved similar scores on the Word Identification and Word Attack subtests of the Woodcock Reading Mastery Test, Revised, when compared to the standardization sample, while the average score on Passage Comprehension was below the expected mean. There were positive correlations between reading comprehension and decoding and language skills. Regression analyses showed that both decoding and language scores accounted for significant independent variance in reading comprehension beyond either decoding or language alone. In addition, all cognitive scores were significantly correlated with reading comprehension. Specifically, the word recall task of the Competing Language Processing Test explained a much higher proportion of the variance in Passage Comprehension than the Test of Nonverbal Intelligence, 3rd edition and the Nonword Repetition Task. These results are consistent with previous studies that show that poverty or low SES has a negative effect on language skills and that children from low SES families are more likely to experience limited language and cognitive stimulation from the home environment. Results show that these children have smaller vocabulary sizes, less complex syntactic knowledge and less sophisticated knowledge of story structure than normative populations. However, they are able to decode within the normal range of ability. Hierarchical multiple regression analyses showed that language played a more important role in passage comprehension than shown in previous studies, indicating that their poor reading comprehension skills are likely due to weaknesses in language skills. Study results also support evidence that verbal working memory is associated with language and reading comprehension, demonstrating that both verbal working memory and reading comprehension require efficient allocation of limited resources for storage and processing. 
The results of this study suggest that language intervention should be initiated in an effort to improve reading comprehension for children in poverty.

    The language of instruction: Compensating for challenge in lectures.

    Recent studies have used Coh-Metrix, an automated text analyzer, to assess differences in language characteristics across different genres and academic disciplines (Graesser, McNamara, & Kulikowich, 2011; McNamara, Graesser, McCarthy, & Cai, 2014). Coh-Metrix analyzes text on many constructs at different levels, including Word Concreteness (vs. abstractness), Narrativity (vs. informational), Deep Cohesion, Referential Cohesion, and Syntactic Simplicity. In previous research, texts in the natural sciences had lower Narrativity and Word Concreteness than texts in the language arts, but were higher in Syntactic Simplicity and Referential Cohesion. This pattern suggests a form of compensation in which difficulty on one dimension (e.g., Word Concreteness) is compensated for by increasing text ease on another dimension (e.g., Syntactic Simplicity). In the present study, we provide a further test of this compensation idea by analyzing oral language use across humanities and natural science lectures. We demonstrate that decreases in Word Concreteness across lectures are associated with increases in Narrativity, Deep Cohesion, and Syntactic Simplicity. In addition, within lectures, decreases in Word Concreteness are associated with increases in Syntactic Simplicity. Compensatory mechanisms are discussed in this article at different levels of language and discourse.

    Re-examining Phonological and Lexical Correlates of Second Language Comprehensibility: The Role of Rater Experience

    Few researchers and teachers would disagree that some linguistic aspects of second language (L2) speech are more crucial than others for successful communication. Underlying this idea is the assumption that communicative success can be broadly defined in terms of speakers’ ability to convey the intended meaning to the interlocutor, which is frequently captured through a listener-based rating of comprehensibility or ease of understanding (e.g. Derwing & Munro, 2009; Levis, 2005). Previous research has shown that communicative success – for example, as defined through comprehensible L2 speech – depends on several linguistic dimensions of L2 output, including its segmental and suprasegmental pronunciation, fluency-based characteristics, lexical and grammatical content, as well as discourse structure (e.g. Field, 2005; Hahn, 2004; Kang et al., 2010; Trofimovich & Isaacs, 2012). Our chief objective in the current study was to explore the L2 comprehensibility construct from a language assessment perspective (e.g. Isaacs & Thomson, 2013), by targeting rater experience as a possible source of variance influencing the degree to which raters use various characteristics of speech in judging L2 comprehensibility. In keeping with this objective, we asked the following question: To what extent do the linguistic aspects of L2 speech that contribute to comprehensibility ratings depend on raters’ experience?

    What to talk about, and how: studies on prominence and patterns of coreference

    The concept of prominence has been variously defined, and it overlaps with other ideas in both theoretical and cognitive linguistics, such as activation, emphasis, or accessibility. Moreover, prominence has an important role in the interpretation and production of language, influencing what anaphoric patterns are produced and/or seen as most likely, and what referring expressions are chosen to express coreference. This thesis presents psycholinguistic, crosslinguistic studies on prominence and coreference, grouping them into two parts: one on the surface form and repercussions of prominence, the other on prominence as seen in different components of meaning. The first study, on English, surveys how prominence is expressed in cleft constructions by extracting emphasis markers and "formal" features within clefts from two corpora at different registers, exploring the patterns in which syntactic marking, graphical emphasis markers, and the variants of contraction, pronoun and complementiser are used in synergy to express prominence. The second study uses the same cleft structure in Italian, and focusses on two factors affecting prominence: information structure and sentence boundary. It then analyses the next-mention choices that writers make, and how these choices are realised with referring expressions. Moving to prominence in smaller linguistic components, the studies in the third section analyse event and entity coreference in English, French, German, Italian, and Spanish, using different referring expressions and features of the verb (aspect and causative-inchoative alternation) as proxies to manipulate the prominence of entities versus the events in which they are involved. 
Finally, the fourth and last section investigates number conceptualisation in named entities in the same five languages: in coreference, speakers have to choose whether to index the entity according to its morphosyntactic or notional number, marking agreement on the pronoun accordingly. The prominence of grammatical and semantic number in the speakers' indexing of referents is shown to change crosslinguistically and with the formality of a text, as well as with features of the entity. Overall, the results of this research show a varied interplay between prominence and patterns of coreference, with different manifestations at different levels of linguistic structure and results that can sometimes be extended crosslinguistically.

    Evaluating Information Presentation Strategies for Spoken Dialogue Systems

    Institute for Communicating and Collaborative Systems
    A common task for spoken dialogue systems (SDS) is to help users select a suitable option (e.g., flight, hotel, restaurant) from the set of options available. When the number of options is small, they can simply be presented sequentially. However, as the number of options increases, the system must have strategies for helping users browse the space of available options. In this thesis, I compare two approaches to information presentation in SDS: (1) the summarize and refine (SR) approach (Polifroni et al., 2003; Polifroni, 2008), in which the summaries are generated by clustering the options based on attributes that lead to the smallest number of clusters, and (2) the user-model based summarize and refine (UMSR) approach (Demberg, 2005; Demberg and Moore, 2006), which employs a user model to cluster options based on attributes that are relevant to the user and uses coherence markers (e.g., connectives, discourse cues, adverbials) to highlight the trade-offs among the presented items. Prior work has shown that users prefer approaches to information presentation that take the user’s preferences into account (e.g., Komatani et al., 2003; Walker et al., 2004; Demberg and Moore, 2006). However, due to the complexity of building a working end-to-end SDS, these studies employed an “overhearer” evaluation methodology, in which participants read or listened to pre-prepared dialogues, thus limiting evaluation criteria to users’ perceptions (e.g., informativeness, overview of options, and so on). In order to examine whether users prefer presentations based on UMSR when they are actively interacting with a dialogue system, and to measure the effectiveness and efficiency of the two approaches, I compared them in a Wizard-of-Oz experiment. I found that in terms of both task success and dialogue efficiency the UMSR approach was superior to the SR approach. 
In addition, I found that users also preferred presentations based on UMSR in the interactive mode. SDS are typically developed for situations in which the user’s hands and eyes are busy. I hypothesized that the benefits of pointing out relationships among options (i.e., trade-offs) in information presentation messages outweigh the costs of processing more complex sentences. To test this hypothesis, I performed two dual-task experiments comparing the two approaches to information presentation in terms of their effect on cognitive load. Again, participants performed better with presentations based on the UMSR algorithm in terms of both dialogue efficiency and task success, and I found no detrimental effect on performance of the primary task. Finally, I hypothesized that one of the main reasons why UMSR is more efficient is that it uses coherence markers to highlight relations (e.g., trade-offs) between options and attributes. To test this hypothesis, I performed an eye-tracking experiment in which participants read presentations with and without these linguistic devices, and answered evaluation and comparison questions to measure differences in item recall. In addition, I used reading times to examine comprehension differences between the two information presentation strategies. I found that the linguistic devices used in UMSR indeed facilitated item recall, with no penalty in terms of comprehension cost. Thus, in this thesis I showed that an approach to information presentation that employs a user model and uses linguistic devices such as coherence markers to highlight trade-offs among the presented items improves information browsing. User studies demonstrated that this finding also applies to situations where users are performing another demanding task simultaneously.

    Metacognitive Reading Strategies Used by Native Speakers of Arabic When Reading Academic Texts in Arabic and English

    Curriculum and Instruction

    Max Planck Institute for Psycholinguistics: Annual report 1996
