11 research outputs found

    Analysis of the acoustic properties of basic emotions simulated in early bilingual Spanish-Basque people

    The aim of this paper is to analyze the acoustic properties of simulated basic emotions produced by early bilingual speakers of Basque and Spanish. The emotions chosen for this purpose were joy, sadness, and anger, together with a neutral sentence. Although numerous studies describe the acoustics of emotions in Spanish from different points of view, we believe that our work offers a new perspective, since it takes into account bilingualism and language contact, in this case between Spanish and Basque. It should also be noted that, although this situation was addressed in general terms in an earlier paper (Gaminde, 2010), it had never before been extended to the whole area of the language

    Altered Speech: A case-study of identity-driven speech in a Dissociative Identity Disorder system

    The field of sociolinguistics has long been interested in how speech differs across groups. These studies have been focused on how demographic factors like class, race, and geographical region alter speech patterns. However, more recently, the agency of individuals to use language as a tool to construct a certain identity or persona has been highlighted (e.g., Podesva 2007; Eckert 1989; Eckert 2008). These studies are limited due to the nature of their methods, relying on either one individual with a limited scope of characteristics or on a larger group of people with many different variables at play other than identity. The present study aims to address these limitations by centering on a set of unique participants that allow for a more controlled study and larger scope of interest. Specifically, this paper examines identity’s role in the sociolinguistic variation of pitch, speech quality, speech rate, and distinct accent markers within one individual with multiple identities (a person with Dissociative Identity Disorder). Despite the clear linguistic differences that have been noted by many studying Dissociative Identity Disorder (DID), there have not been any studies that focus on the phonetic or phonological variables that differ in a single system. Through an examination of these variables, we propose that various elements of personal identity (including gender, age, and sexuality), as well as the alter’s function within the system, are what drive the linguistic decisions they make

    Let’s talk about pain and opioids: Low pitch and creak in medical consultations

    In recent years, the opioid crisis in the United States has sparked significant discussion on doctor-patient interactions concerning chronic pain treatments, but little to no attention has been given to investigating the vocal aspects of patient talk. This exploratory sociolinguistic study intends to fill this knowledge gap by employing prosodic discourse analysis to examine context-specific linguistic features used by the interlocutors of two distinct medical interactions. We found that patients employed both low pitch and creak as linguistic resources when describing chronic pain, narrating symptoms, and requesting opioids. The situational use of both features informs us about the linguistic ways in which patients frame fraught issues like chronic pain in light of the current opioid crisis. This study expands the breadth of phonetic analysis within the domain of discourse analysis, serving to illuminate discussions surrounding the illocutionary role of the lower vocal tract in expressing emotions

    Speaker identification in conditions of emotional speech

    The phenomenon of emotional speech has rarely been modeled in speaker recognition research to date...

    Bag-of-words representations for computer audition

    Computer audition is omnipresent in everyday life, in applications ranging from personalised virtual agents to health care. From a technical point of view, the goal is to robustly classify the content of an audio signal in terms of a defined set of labels, such as the acoustic scene, a medical diagnosis, or, in the case of speech, what is said or how it is said. Typical approaches employ machine learning (ML), which means that task-specific models are trained by means of examples. Despite recent successes in neural network-based end-to-end learning, which takes the raw audio signal as input, models relying on hand-crafted acoustic features are still superior in some domains, especially for tasks where data is scarce. One major issue, nevertheless, is that a sequence of acoustic low-level descriptors (LLDs) cannot be fed directly into many ML algorithms, as they require a static, fixed-length input. Moreover, even for dynamic classifiers, summarising the LLDs over a temporal block can be beneficial. However, the type of instance-level representation has a fundamental impact on the performance of the model. In this thesis, the so-called bag-of-audio-words (BoAW) representation is investigated as an alternative to the standard approach of statistical functionals. BoAW is an unsupervised method of representation learning, inspired by the bag-of-words method in natural language processing, which forms a histogram of the terms present in a document. The toolkit openXBOW is introduced, enabling systematic learning and optimisation of these feature representations, unified across arbitrary modalities of numeric or symbolic descriptors. A number of experiments on BoAW are presented and discussed, focussing on a large number of potential applications and corresponding databases, ranging from emotion recognition in speech to medical diagnosis. 
The evaluations include a comparison of different acoustic LLD sets and configurations of the BoAW generation process. The key findings are that BoAW features are a meaningful alternative to statistical functionals, offering certain benefits, while being able to preserve the advantages of functionals, such as data-independence. Furthermore, it is shown that both representations are complementary and their fusion improves the performance of a machine listening system.
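
    The bag-of-audio-words idea summarised in the abstract above can be illustrated with a minimal sketch. This is not the openXBOW implementation: the function names `learn_codebook` and `bag_of_audio_words` are hypothetical, and building the codebook by random sampling of frames is an assumption made here for brevity (clustering such as k-means is a common alternative). The sketch only shows how a variable-length sequence of LLD frames becomes a fixed-length histogram.

```python
import numpy as np


def learn_codebook(llds, size, seed=0):
    """Pick `size` distinct training frames to serve as "audio words".

    llds: array of shape (n_frames, n_features).
    Random sampling is an illustrative assumption, not the thesis's method.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(llds), size=size, replace=False)
    return llds[idx]


def bag_of_audio_words(llds, codebook):
    """Map a sequence of LLD frames to a fixed-length term-frequency histogram."""
    # Squared Euclidean distance from every frame to every audio word,
    # computed by broadcasting: result has shape (n_frames, n_words).
    dist = ((llds[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Assign each frame to its nearest audio word.
    nearest = dist.argmin(axis=1)
    # Count assignments per word, then normalise so recordings of
    # different durations are comparable.
    counts = np.bincount(nearest, minlength=len(codebook))
    return counts / counts.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    train = rng.normal(size=(200, 13))       # stand-in for real LLD frames
    codebook = learn_codebook(train, size=8)
    clip = rng.normal(size=(50, 13))          # a short "recording"
    print(bag_of_audio_words(clip, codebook).shape)
```

    Whatever the number of frames in a recording, the resulting vector has one entry per audio word, which is what allows it to be passed to classifiers that expect a static input.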

    The Perception of Emotion from Acoustic Cues in Natural Speech

    Knowledge of human perception of emotional speech is imperative for the development of emotion in speech recognition systems and emotional speech synthesis. Owing to the fact that there is a growing trend towards research on spontaneous, real-life data, the aim of the present thesis is to examine human perception of emotion in naturalistic speech. Although there are many available emotional speech corpora, most contain simulated expressions. Therefore, there remains a compelling need to obtain naturalistic speech corpora that are appropriate and freely available for research. In that regard, our initial aim was to acquire suitable naturalistic material and examine its emotional content based on listener perceptions. A web-based listening tool was developed to accumulate ratings based on large-scale listening groups. The emotional content present in the speech material was demonstrated by performing perception tests on conveyed levels of Activation and Evaluation. As a result, labels were determined that signified the emotional content, and thus contribute to the construction of a naturalistic emotional speech corpus. In line with the literature, the ratings obtained from the perception tests suggested that Evaluation (or hedonic valence) is not identified as reliably as Activation is. Emotional valence can be conveyed through both semantic and prosodic information, for which the meaning of one may serve to facilitate, modify, or conflict with the meaning of the other—particularly with naturalistic speech. The subsequent experiments aimed to investigate this concept by comparing ratings from perception tests of non-verbal speech with verbal speech. The method used to render non-verbal speech was low-pass filtering, and for this, suitable filtering conditions were determined by carrying out preliminary perception tests. The results suggested that nonverbal naturalistic speech provides sufficiently discernible levels of Activation and Evaluation. 
It appears that the perception of Activation and Evaluation is affected by low-pass filtering, but that the effect is relatively small. Moreover, the results suggest that there is a similar trend in agreement levels between verbal and non-verbal speech. To date it still remains difficult to determine unique acoustical patterns for hedonic valence of emotion, which may be due to inadequate labels or the incorrect selection of acoustic parameters. This study has implications for the labelling of emotional speech data and the determination of salient acoustic correlates of emotion

    Investigating the phonetic and linguistic features used by speakers to communicate an intent to harm

    This research aims to examine the phonetic and linguistic features which can be associated with a threatening intent. At present, there is a range of threat assessment resources and descriptions in legal cases which provide insight surrounding the content and production of threatening language. However, the veracity of these descriptions has not been thoroughly explored in empirical research. Through the examination of authentic and simulated threatening language data, this research provides a broad overview of the usage of phonetic and linguistic features to convey a threatening intent to harm. A set of 10 authentic speech recordings where a direct (or explicitly worded) threat was present were analysed in relation to a sample of non-threatening speech. In addition, simulated threatening and non-threatening speech and texts were collected from 41 participants under experimental conditions. These threatening and non-threatening data were compared with respect to mean fundamental frequency, intensity, articulation rate and changes to vocal tract features and vocal settings. The simulated data were also examined for the use of lexical features which have previously been associated with the actualisation of harm. The results of these analyses suggest that there is no compelling evidence to support the assertion of a 'threatening tone of voice'. There were, however, tendencies for these speakers to raise their mean fundamental frequency, intensity and articulation rate during threatening speech production relative to their non-threatening speech. There was also evidence to suggest that a number of lexical features used by these participants also corresponded to previous examinations of authentic threatening texts. It is suggested that on the basis of these findings, the production of threatening language is a considerably more complex and varied behaviour than might be expected. 
These findings have notable implications for the development of threat assessment tools, and for the description of a 'threatening manner' in legal contexts

    Prosodic organisation of utterances of sympathy in English speech (an experimental phonetic study) (dissertation)

    The dissertation is devoted to a comprehensive study of the prosodic organisation of utterances of sympathy as realised in spoken English