272 research outputs found

    Two-Staged Acoustic Modeling Adaption for Robust Speech Recognition by the Example of German Oral History Interviews

    In automatic speech recognition, often little training data is available for specific challenging tasks, but training of state-of-the-art automatic speech recognition systems requires large amounts of annotated speech. To address this issue, we propose a two-staged approach to acoustic modeling that combines noise and reverberation data augmentation with transfer learning to robustly address challenges such as difficult acoustic recording conditions, spontaneous speech, and speech of elderly people. We evaluate our approach using the example of German oral history interviews, where a relative average reduction of the word error rate by 19.3% is achieved. Comment: Accepted for IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, July 201

    Exploring the possibilities of Thomson’s fourth paradigm transformation—The case for a multimodal approach to digital oral history?

    This article seeks to reorientate ‘digital oral history’ towards a new research paradigm, Multimodal Digital Oral History (MDOH), and in so doing it seeks to build upon Alistair Thomson’s (Thomson, A., 2007, Four paradigm transformations in oral history. Oral History Review, 34(1): 49–70.) characterization of a ‘dizzying digital revolution’ and paradigmatic transformation in oral history (OH). Calling for a recalibration of the current dominance of the textual transcript, and for active engagement with the oral, aural, and sonic affordances of both retro-digitized and born digital OH (DOH) collections, we call for a re-orientation of the digital from passive to generative and self-reflexive in the human–machine study of spoken word recordings. First, we take stock of the field of DOH as it is currently conceived and the ways in which it has or has not answered calls for a return to the orality of the interview by digital means. Secondly, we address the predominant trend of working with transcriptions in digital analysis of spoken word recordings and the tools being used by oral historians. Thirdly, we ask about the emerging possibilities—tools and experimental methodologies—for sonic analysis of spoken word collections within and beyond OH, looking to intersections with digital humanities, sociolinguistics, and sound studies. Lastly, we consider ethical questions and practicalities concomitant with data-driven methods, analyses and technologies like AI for the study of sonic research artefacts, reflections that dovetail with digital hermeneutics and digital tool criticism and point towards a new MDOH departure, a sub-field that has potential to inform the many fields that seek patterns in audio, audio-visual, and post-textual materials, serially and at scale

    Automatic Detection of Dementia and related Affective Disorders through Processing of Speech and Language

    In 2019, dementia has become a trillion-dollar disorder. Alzheimer’s disease (AD) is a type of dementia in which the main observable symptom is a decline in cognitive functions, notably memory, as well as language and problem-solving. Experts agree that early detection is crucial to effectively develop and apply interventions and treatments, underlining the need for effective and pervasive assessment and screening tools. The goal of this thesis is to explore how computational techniques can be used to process speech and language samples produced by patients suffering from dementia or related affective disorders, to the end of automatically detecting them in large populations using machine learning models. A strong focus is laid on the detection of early stage dementia (MCI), as most clinical trials today focus on intervention at this level. To this end, novel automatic and semi-automatic analysis schemes for a speech-based cognitive task, i.e., verbal fluency, are explored and evaluated to be an appropriate screening task. Due to a lack of available patient data in most languages, world-first multilingual approaches to detecting dementia are introduced in this thesis. Results are encouraging and clear benefits on a small French dataset become visible. Lastly, the task of detecting those people with dementia who also suffer from an affective disorder called apathy is explored. Since they are more likely to progress to a later stage of dementia faster, it is crucial to identify them. These are the first experiments that consider this task using solely speech and language as inputs. Results are again encouraging, whether using only speech or only language data elicited using emotional questions. Overall, strong results encourage further research in establishing speech-based biomarkers for early detection and monitoring of these disorders to improve patients’ lives.

    Del lenguaje oral al lenguaje escrito: la transcripción como documento de archivo

    Transcripts of sound and audiovisual documents held in archival institutions present challenges when moving from oral language to written language. Among these documents, oral history interviews present the most difficulties in this change of medium and code because both their form and their content are relevant. The present work analyses what conditions a transcription must meet to be an archival document, what guidelines can be followed to produce it correctly, and what possibilities the use of technology raises in the case of automatic transcription

    Pan European Voice Conference - PEVOC 11

    The Pan European VOice Conference (PEVOC) was founded in 1995, so in 2015 it celebrates its 20th anniversary: an important milestone that clearly reflects the strength of the scientific community's interest in the topics of this conference. The most significant themes of PEVOC are singing pedagogy and art, but also occupational voice disorders, neurology, rehabilitation, and image and video analysis. PEVOC takes place in a different European city every two years (www.pevoc.org). The PEVOC 11 conference includes a symposium of the Collegium Medicorum Theatri (www.comet collegium.com

    Paradoxes of interactivity: perspectives for media theory, human-computer interaction, and artistic investigations

    Current findings from anthropology, genetics, prehistory, cognitive and neuroscience indicate that human nature is grounded in a co-evolution of tool use, symbolic communication, social interaction and cultural transmission. Digital information technology has recently entered as a new tool in this co-evolution, and will probably have the strongest impact on shaping the human mind in the near future. A common effort from the humanities, the sciences, art and technology is necessary to understand this ongoing co-evolutionary process. Interactivity is a key for understanding the new relationships formed by humans with social robots as well as interactive environments and wearables underlying this process. Of special importance for understanding interactivity are human-computer and human-robot interaction, as well as media theory and New Media Art. "Paradoxes of Interactivity" brings together reflections on "interactivity" from different theoretical perspectives, the interplay of science and art, and recent technological developments for artistic applications, especially in the realm of sound

    Paradoxes of Interactivity

    Current findings from anthropology, genetics, prehistory, cognitive and neuroscience indicate that human nature is grounded in a co-evolution of tool use, symbolic communication, social interaction and cultural transmission. Digital information technology has recently entered as a new tool in this co-evolution, and will probably have the strongest impact on shaping the human mind in the near future. A common effort from the humanities, the sciences, art and technology is necessary to understand this ongoing co-evolutionary process. Interactivity is a key for understanding the new relationships formed by humans with social robots as well as interactive environments and wearables underlying this process. Of special importance for understanding interactivity are human-computer and human-robot interaction, as well as media theory and New Media Art. »Paradoxes of Interactivity« brings together reflections on »interactivity« from different theoretical perspectives, the interplay of science and art, and recent technological developments for artistic applications, especially in the realm of sound