
    Automatic speech-based emotion recognition

    The main objective of affective computing is the study and creation of computer systems that can detect human affects. For speech-based emotion recognition, universal features offering the best performance for all languages have not yet been found. In this thesis, a speech-based emotion recognition system using a novel set of features is created. Support vector machines are used as classifiers in the offline system on the Surrey Audio-Visual Expressed Emotion database, the Berlin Database of Emotional Speech, the Polish Emotional Speech database and the Serbian emotional speech database. Average emotion recognition rates of 80.21%, 88.6%, 75.42% and 93.41% are achieved, respectively, with a total of 87 features. The online system, which uses Random Forests as its classifier, consists of two models trained on reduced versions of the first and second databases, with the first model trained on only male samples and the second trained on both male and female samples. The main purpose of the online system was to test the features’ usability in real-life scenarios and to explore the effects of gender in speech-based emotion recognition. To test the online system, two female and two male non-native English speakers recorded emotionally spoken sentences, which were used as inputs to the trained models. Averaging over all emotions and speakers per model, the features offer better performance than random guessing, achieving 28% emotion recognition in both models. The average recognition rate for female speakers was 19% in the first model and 29% in the second. For male speakers, the rates were 36% and 28%, respectively. These results show how having more training samples for a particular gender affects emotion recognition rates in a trained model.
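    The per-gender figures in this abstract can be checked against the reported overall rate with a short calculation. The sketch below is only an arithmetic illustration: it assumes that female and male speakers carry equal weight (two speakers of each gender, each taken to contribute the same number of utterances), which the abstract does not state explicitly. The dictionary keys and the `mean_rate` helper are hypothetical names chosen for this example.

```python
# Per-gender recognition rates (%) as reported in the abstract.
rates = {
    "model_1_male_only": {"female": 19, "male": 36},
    "model_2_both_genders": {"female": 29, "male": 28},
}


def mean_rate(per_gender):
    # Assumption: each gender is weighted equally (two female and two male
    # speakers, each taken to contribute the same number of utterances).
    return sum(per_gender.values()) / len(per_gender)


# Unweighted mean of the female and male rates for each model.
overall = {name: mean_rate(per_gender) for name, per_gender in rates.items()}

for name, value in overall.items():
    print(f"{name}: {value:.1f}%")
```

    Under that assumption the two models average 27.5% and 28.5%, both close to the 28% reported for each model, while the gap between female and male rates narrows from 17 points in the male-only model to 1 point in the mixed model.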

    The Processing of Accented Speech

    This thesis examines the processing of accented speech in both infants and adults. Accents provide a natural and reasonably consistent form of inter-speaker variation in the speech signal, but it is not yet clear exactly what processes are used to normalise this form of variation, or when and how those processes develop. Two adult studies use ERP data to examine differences between the online processing of regional- and foreign-accented speech as compared to a baseline consisting of the listeners’ home accent. These studies demonstrate that the two types of accents recruit normalisation processes which are qualitatively, and not just quantitatively, different. This provides support for the hypothesis that foreign and regional accents require different mechanisms to normalise accent-based variation (Adank et al., 2009; Floccia et al., 2009), rather than for the hypothesis that different types of accents are normalised according to their perceptual distance from the listener’s own accent (Clarke & Garrett, 2004). They also provide support for the Abstract entry approach to lexical storage of variant forms, which suggests that variant forms undergo a process of prelexical normalisation, allowing access to a canonical lexical entry (Pallier et al., 2001), rather than for the Exemplar-based approach, which suggests that variant word-forms are individually represented in the lexicon (Johnson, 1997). Two further studies examine how infants segment words from continuous speech when presented with accented speakers. The first of these includes a set of behavioural experiments, which highlight some methodological issues in the existing literature and offer some potential explanations for conflicting evidence about the age at which infants are able to segment speech. The second uses ERP data to investigate segmentation within and across accents, and provides neurophysiological evidence that 11-month-olds are able to distinguish newly-segmented words at the auditory level even within a foreign accent, or across accents, but that they are more able to treat new word-forms as word-like in a familiar accent than in a foreign accent.

    A Cognitive Science Reasoning in Recognition of Emotions in Audio-Visual Speech

    In this report we summarize the state of the art of speech emotion recognition from the signal processing point of view. On the basis of multi-corpus experiments with machine-learning classifiers, we observe that existing approaches to supervised machine learning lead to database-dependent classifiers, which cannot be applied to multi-language speech emotion recognition without additional training because they discriminate the emotion classes according to the training language used. As there are experimental results showing that humans can perform language-independent categorisation, we drew a parallel between machine recognition and the cognitive process and tried to discover the sources of these divergent results. The analysis suggests that the main difference is that speech perception allows the extraction of language-independent features, although language-dependent features are incorporated at all levels of the speech signal and play a strong discriminative role in human perception. Based on several results in related domains, we suggest that, in addition, the cognitive process of emotion recognition is based on categorisation, assisted by some hierarchical structure of the emotional categories that exists in the cognitive space of all humans. We propose a strategy for developing language-independent machine emotion recognition, based on the identification of language-independent speech features and the use of additional information from visual (expression) features.

    L2 Accentedness and Language Self-Esteem in Foreign Language Learning

    Accentedness is associated with listeners’ evaluative judgements, which might affect an L2 speaker’s construction of an image about linguistic self-worth and competence, described as language (L2) self-esteem. This line of inquiry is pursued in the study presented in this paper, which investigates the relationship between L2 self-esteem and the extent to which a learner’s L2 pronunciation differs from a listener’s representation of it – accentedness. The results show that the level of L2 self-esteem correlates with accentedness, and the direction of this correlation is negative (r = -.51). The findings also reveal that the L2 self-esteem levels of the participants whose accentedness is closer to native-like are significantly higher than those of the individuals with strongly accented speech

    Phonetic convergence in the speech of Polish learners of English

    This dissertation examines variability in the phonetic performance of L2 users of English and concentrates on speech convergence as a result of exposure to native and non-native pronunciation. The term speech convergence refers to a process during which speakers adapt their linguistic behaviour according to who they are talking or listening to. Previous studies show that the phenomenon may take place both in a speaker’s L1 (e.g. Giles, 1973; Coupland, 1984; Gregory and Webster, 1996; Pardo, 2006; Babel, 2010) and L2 (e.g. Beebe, 1977; Berkowitz, 1986; Lewandowski, 2012; Rojczyk, 2013; Trofimovich and Kennedy, 2014). Speech convergence can be subdivided into three types of linguistic behaviour: convergence (the process of making one’s speech more similar to that of another person), divergence (the process of moving away from the speech of another person) and maintenance (the process of maintaining one’s default linguistic behaviour in spite of exposure to the speech of another person). The dissertation consists of four chapters; the first two provide theoretical background, the next two describe the study and its findings. Chapter One is concerned with previous research on speech convergence. The chapter reviews the methodology and approaches used in previous work and discusses the range of factors that may affect convergence strategies. Chapter Two provides an overview of relevant studies in the field of L2 phonetics. It describes the structure and formation of the L2 sound system and the numerous social-psychological, linguistic and psycholinguistic variables that may influence L2 phonetic performance. Chapter Three describes the study on speech convergence in the pronunciation of Polish learners of English, i.e. the aims, hypotheses, methodology and results. In Chapter Four, the results of the study on phonetic convergence in the speech of Polish learners of English are analysed and discussed.
    The phenomenon of speech convergence has been explored under different names and with the use of various frameworks and methodological procedures. Some researchers refer to the process as accommodation and investigate it by analysing spontaneous conversational data (e.g. Giles, 1973; Bourhis and Giles, 1977; Coupland, 1984; Gregory and Webster, 1996). Other researchers use the term imitation and examine the phenomenon in socially minimal, laboratory-based settings (e.g. Goldinger, 1998; Shockley et al., 2004; Delvaux and Soquet, 2007; Nielsen, 2011). Irrespective of terminological and methodological differences, the results of previous studies on phonetic convergence indicate that the process is conditioned by a variety of linguistic (e.g. Mitterer and Ernestus, 2008; Babel, 2009; Brouwer et al., 2010; Nielsen, 2011) and social-psychological factors (Giles, 1973; Bilous and Krauss, 1988; Gregory and Webster, 1996; Pardo, 2006; Babel, 2009; Yu et al., 2013). Research on L2 acquisition and non-native pronunciation shows that the development of the L2 sound system is a complex and dynamic process. It has been argued that the productions of L2 users are generated by interlanguage (IL), an independent linguistic system that encompasses elements of the learner’s L1 and L2 but does not correspond exactly to either the native language (NL) or the target language (TL) (e.g. Selinker, 1972; 1992). Importantly, previous findings indicate that the phonetic performance of non-native speakers is influenced not only by their L1 and L2 sound systems but also by a range of psycholinguistic (e.g. Flege, 1987; Flege et al., 2003) and social-psychological factors (e.g. Taylor et al., 1971; Zuengler, 1982; Gatbonton et al., 2011). The process of adapting one’s pronunciation as a result of exposure to another person’s speech has been detected in the productions of L2 users (e.g. Beebe, 1977; Berkowitz, 1986; Lewandowski, 2012; Rojczyk, 2013; Trofimovich and Kennedy, 2014).
    As in the case of L1 speech convergence, previous studies show that the magnitude of L2 speech convergence may depend upon a variety of social-psychological and linguistic variables. An interesting aspect of L2 phonetic convergence that has not yet been thoroughly explored is how pronunciation shifts upon exposure to the speech of native speakers of the target language (TL) compare with pronunciation shifts upon exposure to the speech of other learners. The aim of the study was to address this issue by investigating and comparing L2 convergence strategies upon exposure to native and non-native pronunciation. The study concentrated on the phonetic performance of advanced Polish learners of English, who were exposed to two pronunciation varieties: Polish-accented English and native English. The participants were 38 native speakers of Polish, majoring in English Studies and recruited from the University of Lodz. The subjects listened to pre-recorded productions provided by two model talkers/interlocutors: a native speaker of Standard Southern British English and a native speaker of Polish (a qualified phonetician imitating a heavy Polish accent in English). The phonetic variables under investigation were the following: aspiration in word-initial /p t k/, pre-voicing in word-initial /b d g/, and vowel duration as a cue for consonant voicing in English /æ e ɪ iː/. The experimental procedure consisted of several phases. First, the informants were instructed to identify the target words in an auditory naming task (baseline condition). Next, they were asked to listen to pre-recorded English words provided by the two model talkers/interlocutors and to identify the words by saying them out loud (imitation condition). Finally, the subjects were required to read the target words for the two model talkers/interlocutors to listen to at a later time (accommodation condition).
    Following the production stage of the experiment, the participants completed a questionnaire whose purpose was to gauge attitudes towards native and foreign-accented English. Three hypotheses were formulated to be tested in the course of the study. Hypothesis 1 predicted that convergence strategies following exposure to native and non-native English would vary as a function of model talker/interlocutor. Hypothesis 2 predicted that convergence strategies following exposure to native and non-native English would be affected by the subjects’ attitudes towards native and Polish-accented English. Hypothesis 3 predicted that convergence strategies following exposure to native and non-native English would differ as a function of phonetic context (place of articulation and vowel category). Acoustic and statistical analysis of the data revealed that the subjects modified their linguistic behaviour following exposure to the speech of the model talkers/interlocutors, which corroborates the claim that L2 speech convergence phenomena are present in non-native pronunciation. Hypothesis 1 was partially supported by the results of the study. It was found that speech behaviour following exposure to native and non-native English varied as a function of model talker/interlocutor in all but two instances (accommodation on pre-voicing and imitation of vowel duration). The results suggest that when using a second language, speakers may use different convergence strategies depending on the native/non-native status of the model talker or interlocutor. Hypothesis 2 was partially supported by the data. The results indicate that a strong preference for target-like pronunciation may prompt learners to converge towards native speech and diverge from foreign-accented speech. However, the factor does not seem to operate if a learner has not succeeded in mastering a given TL pronunciation feature, i.e. the impact of attitudinal factors on the magnitude of convergence in non-native pronunciation appears to be conditioned by the stage of acquisition of a given TL phonetic feature. Hypothesis 3 was not borne out by the results obtained in the study. It was found that convergence strategies following exposure to native and non-native English did not vary depending on phonetic context. Overall, the findings of the study provide support for the claim that the process of speech convergence operates in L2 pronunciation and imply that certain social-psychological and psycholinguistic factors may have an impact on learners’ convergence strategies.

    Native Speaker Response to Non-Native Accent: A Review of Recent Research

    Research has generally shown that without early exposure, non-native speakers cannot achieve a native-like accent in a foreign language (Gass & Selinker, 2001, p. 336). Differences in pronunciation, stress, rhythm, and intonation remain. Nevertheless, accent has been shown to affect how native speakers (NSs) evaluate non-native speakers (NNSs). This single speech characteristic has been openly cited as justification for much broader judgments about individuals. Lippi-Green (1997), for example, highlights several cases in the U.S. in which NNSs lost jobs due to their accents, such as that of an Indian woman (who had studied English for over 20 years) deemed unfit for a librarian’s position because of her “‘heavy accent’” and “‘speech patterns’” (p. 153). Matsuda (1991) reports on U.S. doctors who lost their malpractice insurance because the company felt accent would prevent them from successfully defending themselves in a lawsuit (p. 1346)

    Listeners’ perceptions of the certainty and honesty of a speaker are associated with a common prosodic signature

    The success of human cooperation crucially depends on mechanisms enabling individuals to detect unreliability in their conspecifics. Yet, how such epistemic vigilance is achieved from naturalistic sensory inputs remains unclear. Here we show that listeners’ perceptions of the certainty and honesty of other speakers from their speech are based on a common prosodic signature. Using a data-driven method, we separately decode the prosodic features driving listeners’ perceptions of a speaker’s certainty and honesty across pitch, duration and loudness. We find that these two kinds of judgments rely on a common prosodic signature that is perceived independently from individuals’ conceptual knowledge and native language. Finally, we show that listeners extract this prosodic signature automatically, and that this impacts the way they memorize spoken words. These findings shed light on a unique auditory adaptation that enables human listeners to quickly detect and react to unreliability during linguistic interactions

    Negative vaccine voices in Swedish social media

    Vaccinations are one of the most significant interventions in public health, but vaccine hesitancy creates concerns for a portion of the population in many countries, including Sweden. Since discussions on vaccine hesitancy often take place on social networking sites, data from Swedish social media are used to study and quantify the sentiment among the discussants on the vaccination-or-not topic during phases of the COVID-19 pandemic. Of all the posts analyzed, a majority showed a stronger negative sentiment, which prevailed throughout the examined period, with some spikes, attributable to certain vaccine-related events, distinguishable in the results. Sentiment analysis can be a valuable tool to track public opinions regarding the use, efficacy, safety, and importance of vaccination.