
6,204 research outputs found

Towards Automatic Speech Identification from Vocal Tract Shape Dynamics in Real-time MRI

Author: Fels Sidney, Saha Pramit, Srungarapu Praneeth
Publication date: 29/07/2018

Vocal tract configurations play a vital role in generating distinguishable speech sounds by modulating the airflow and creating different resonant cavities in speech production. They contain abundant information that can be utilized to better understand the underlying speech production mechanism. As a step towards automatic mapping of vocal tract shape geometry to acoustics, this paper employs effective video action recognition techniques, like Long-term Recurrent Convolutional Networks (LRCN) models, to identify different vowel-consonant-vowel (VCV) sequences from dynamic shaping of the vocal tract. Such a model typically combines a CNN-based deep hierarchical visual feature extractor with recurrent networks, which ideally makes the network spatio-temporally deep enough to learn the sequential dynamics of a short video clip for video classification tasks. We use a database consisting of 2D real-time MRI of vocal tract shaping during VCV utterances by 17 speakers. The comparative performances of this class of algorithms under various parameter settings and for various classification tasks are discussed. Interestingly, the results show a marked difference in the model performance in the context of speech classification with respect to generic sequence or video classification tasks.

Comment: To appear in the INTERSPEECH 2018 Proceedings

arXiv.org e-Print Archive

Crossref

Functional organization of human sensorimotor cortex for speech articulation.

Author: Bouchard Kristofer E, Chang Edward F, Johnson Keith, Mesgarani Nima
Publication venue: eScholarship, University of California
Publication date: 01/01/2013

Speaking is one of the most complex actions that we perform, but nearly all of us learn to do it effortlessly. Production of fluent speech requires the precise, coordinated movement of multiple articulators (for example, the lips, jaw, tongue and larynx) over rapid time scales. Here we used high-resolution, multi-electrode cortical recordings during the production of consonant-vowel syllables to determine the organization of speech sensorimotor cortex in humans. We found speech-articulator representations that are arranged somatotopically on ventral pre- and post-central gyri, and that partially overlap at individual electrodes. These representations were coordinated temporally as sequences during syllable production. Spatial patterns of cortical activity showed an emergent, population-level representation, which was organized by phonetic features. Over tens of milliseconds, the spatial patterns transitioned between distinct representations for different consonants and vowels. These results reveal the dynamic organization of speech sensorimotor cortex during the generation of multi-articulator movements that underlies our ability to speak.

CiteSeerX

Crossref

PubMed Central

eScholarship - University of California

Using multimedia interfaces for speech therapy

Author: George J., Gnanayutham Paul
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

Portsmouth University Research Portal (Pure)

Rapid dynamic speech imaging at 3 Tesla using combination of a custom vocal tract coil, variable density spirals and manifold regularization

Author: Ahmed Abdul Haseeb, Alam Wahidul, Howard David, Jacob Mathews, Kruger Stanley, Lingala Sajan Goud, Meyer David, Rusho Rushdi Zahid, Titze Ingo
Publication date: 06/09/2022

Purpose: To improve dynamic speech imaging at 3 Tesla. Methods: A novel scheme combining a 16-channel vocal tract coil, variable density spirals (VDS), and manifold regularization was developed. Short readout duration spirals (1.3 ms long) were used to minimize sensitivity to off-resonance. The manifold model leveraged similarities between frames sharing similar vocal tract postures without explicit motion binning. Reconstruction was posed as a SENSE-based non-local soft weighted temporal regularization scheme. The self-navigating capability of VDS was leveraged to learn the structure of the manifold. Our approach was compared against low-rank and finite difference reconstruction constraints on two volunteers performing repetitive and arbitrary speaking tasks. Blinded image quality evaluation in the categories of alias artifacts, spatial blurring, and temporal blurring was performed by three experts in voice research. Results: We achieved a spatial resolution of 2.4 mm²/pixel and a temporal resolution of 17.4 ms/frame for single slice imaging, and 52.2 ms/frame for concurrent 3-slice imaging. Implicit motion binning of the manifold scheme for both repetitive and fluent speaking tasks was demonstrated. The manifold scheme provided superior fidelity in modeling articulatory motion compared to low rank and temporal finite difference schemes. This was reflected by higher image quality scores in spatial and temporal blurring categories. Our technique exhibited faint alias artifacts, but offered a reduced interquartile range of scores compared to other methods in the alias artifact category. Conclusion: Synergistic combination of a custom vocal-tract coil, variable density spirals and manifold regularization enables robust dynamic speech imaging at 3 Tesla.

Comment: 30 pages, 10 figures
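
The temporal resolutions reported in this abstract can be restated as frame rates with simple arithmetic. The sketch below is only a unit conversion on the two quoted figures (17.4 ms/frame and 52.2 ms/frame); the function and variable names are illustrative and do not come from the paper.

```python
# Frame-rate arithmetic on the temporal resolutions quoted in the abstract.
# Names below are illustrative assumptions, not taken from the paper.
SINGLE_SLICE_MS = 17.4  # ms per frame, single-slice imaging
THREE_SLICE_MS = 52.2   # ms per frame, concurrent 3-slice imaging


def frames_per_second(ms_per_frame: float) -> float:
    """Convert a per-frame duration in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame


single_fps = frames_per_second(SINGLE_SLICE_MS)
three_fps = frames_per_second(THREE_SLICE_MS)
slice_ratio = THREE_SLICE_MS / SINGLE_SLICE_MS

print(f"single slice: {single_fps:.1f} frames/s")  # 57.5 frames/s
print(f"three slices: {three_fps:.1f} frames/s")   # 19.2 frames/s
print(f"per-frame time ratio: {slice_ratio:.2f}")  # 3.00
```

The 3-slice frame time is three times the single-slice frame time, so the per-slice acquisition time is the same in both modes.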

arXiv.org e-Print Archive

On the Similarities Between Native, Non-native and Translated Texts

Author: Nisioi Sergiu, Ordan Noam, Rabinovich Ella, Wintner Shuly
Publication date: 01/01/2016

We present a computational analysis of three language varieties: native, advanced non-native, and translation. Our goal is to investigate the similarities and differences between non-native language productions and translations, contrasting both with native language. Using a collection of computational methods we establish three main results: (1) the three types of texts are easily distinguishable; (2) non-native language and translations are closer to each other than each of them is to native language; and (3) some of these characteristics depend on the source or native language, while others do not, reflecting, perhaps, unified principles that similarly affect translations and non-native language.

Comment: ACL2016, 12 pages

arXiv.org e-Print Archive

Crossref

A Formal Framework for Linguistic Annotation

Author: Bird Steven, Liberman Mark
Publication date: 01/01/1999

'Linguistic annotation' covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions -- audio, video and/or physiological recordings -- or it may be textual. The added notations may include transcriptions of all sorts (from phonetic features to discourse structures), part-of-speech and sense tagging, syntactic analysis, 'named entity' identification, co-reference annotation, and so on. While there are several ongoing efforts to provide formats and tools for such annotations and to publish annotated linguistic databases, the lack of widely accepted standards is becoming a critical problem. Proposed standards, to the extent they exist, have focussed on file formats. This paper focuses instead on the logical structure of linguistic annotations. We survey a wide variety of existing annotation formats and demonstrate a common conceptual core, the annotation graph. This provides a formal framework for constructing, maintaining and searching linguistic annotations, while remaining consistent with many alternative data structures and file formats.

Comment: 49 pages
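
To make the idea of an annotation graph more concrete, here is a minimal sketch. It assumes a simplified reading of the abstract: nodes that may carry a time offset into the recording, and labeled, typed arcs between them. The class names, fields, and example timings are all hypothetical and are not the authors' formalism or file format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    ident: int
    time: Optional[float] = None  # seconds into the recording; None if unanchored


@dataclass(frozen=True)
class Arc:
    start: Node
    end: Node
    kind: str   # annotation type, e.g. "word" or "phone"
    label: str  # annotation content


@dataclass
class AnnotationGraph:
    arcs: List[Arc] = field(default_factory=list)

    def add(self, start: Node, end: Node, kind: str, label: str) -> Arc:
        """Attach a labeled arc between two nodes and return it."""
        arc = Arc(start, end, kind, label)
        self.arcs.append(arc)
        return arc

    def of_kind(self, kind: str) -> List[Arc]:
        """All arcs of one annotation type, in insertion order."""
        return [a for a in self.arcs if a.kind == kind]

    def overlapping(self, t: float) -> List[Arc]:
        """Arcs whose start and end are both time-anchored and span time t."""
        return [
            a for a in self.arcs
            if a.start.time is not None
            and a.end.time is not None
            and a.start.time <= t < a.end.time
        ]


# Illustrative data only: one word aligned with two phones (made-up timings).
n0, n1, n2 = Node(0, 0.0), Node(1, 0.2), Node(2, 0.35)
graph = AnnotationGraph()
graph.add(n0, n2, "word", "she")
graph.add(n0, n1, "phone", "sh")
graph.add(n1, n2, "phone", "iy")

print([a.label for a in graph.overlapping(0.25)])  # ['she', 'iy']
```

Because different annotation layers share the same nodes, a search such as "which labels are active at time t" works across layers without caring how each layer was stored on disk.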

arXiv.org e-Print Archive

CiteSeerX

ScholarlyCommons@Penn

A Python-based Brain-Computer Interface Package for Neural Data Analysis

Author: Anowar Md Hasan
Publication venue: ScholarWorks @ UTRGV
Publication date: 01/12/2020

Anowar, Md Hasan, A Python-based Brain-Computer Interface Package for Neural Data Analysis. Master of Science (MS), December, 2020, 70 pp., 4 tables, 23 figures, 74 references. Although a growing amount of research has been dedicated to neural engineering, only a handful of software packages are available for brain signal processing. Popular brain-computer interface packages depend on commercial software products such as MATLAB. Moreover, almost every brain-computer interface package is designed for a specific neuro-biological signal; there is no single Python-based package that supports motor imagery, sleep, and stimulated brain signal analysis. The need for a brain-computer interface package that can serve as a free alternative to commercial software has motivated me to develop a toolbox using the Python platform. In this thesis, the structure of MEDUSA, a brain-computer interface toolbox, is presented. The features of the toolbox are demonstrated with publicly available data sources. The MEDUSA toolbox provides a valuable tool to biomedical engineers and computational neuroscience researchers.

Scholarworks@UTRGV Univ. of Texas RioGrande Valley

Augmented Reality

Publication venue: IntechOpen
Publication date: 20/04/2021

Augmented Reality (AR) is a natural development from virtual reality (VR), which was developed several decades earlier. AR complements VR in many ways. Because the user can see both the real and virtual objects simultaneously, AR is far more intuitive, but it is not completely detached from human factors and other restrictions. AR applications take less time and effort to build because the entire virtual scene and environment need not be constructed. In this book, several new and emerging application areas of AR are presented and divided into three sections. The first section contains applications in outdoor and mobile AR, such as construction, restoration, security and surveillance. The second section deals with AR in medicine, biology, and the human body. The third and final section contains a number of new and useful applications in daily living and learning.

Directory of Open Access Books (DOAB)

A Practical and Configurable Lip Sync Method for Games

Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2013

Crossref