6,204 research outputs found
Towards Automatic Speech Identification from Vocal Tract Shape Dynamics in Real-time MRI
Vocal tract configurations play a vital role in generating distinguishable
speech sounds, by modulating the airflow and creating different resonant
cavities in speech production. They contain abundant information that can be
utilized to better understand the underlying speech production mechanism. As a
step towards automatic mapping of vocal tract shape geometry to acoustics, this
paper employs effective video action recognition techniques, like Long-term
Recurrent Convolutional Networks (LRCN) models, to identify different
vowel-consonant-vowel (VCV) sequences from dynamic shaping of the vocal tract.
Such a model typically combines a CNN based deep hierarchical visual feature
extractor with Recurrent Networks, that ideally makes the network
spatio-temporally deep enough to learn the sequential dynamics of a short video
clip for video classification tasks. We use a database consisting of 2D
real-time MRI of vocal tract shaping during VCV utterances by 17 speakers. The
comparative performances of this class of algorithms under various parameter
settings and for various classification tasks are discussed. Interestingly, the
results show a marked difference in the model performance in the context of
speech classification with respect to generic sequence or video classification
tasks.Comment: To appear in the INTERSPEECH 2018 Proceeding
Functional organization of human sensorimotor cortex for speech articulation.
Speaking is one of the most complex actions that we perform, but nearly all of us learn to do it effortlessly. Production of fluent speech requires the precise, coordinated movement of multiple articulators (for example, the lips, jaw, tongue and larynx) over rapid time scales. Here we used high-resolution, multi-electrode cortical recordings during the production of consonant-vowel syllables to determine the organization of speech sensorimotor cortex in humans. We found speech-articulator representations that are arranged somatotopically on ventral pre- and post-central gyri, and that partially overlap at individual electrodes. These representations were coordinated temporally as sequences during syllable production. Spatial patterns of cortical activity showed an emergent, population-level representation, which was organized by phonetic features. Over tens of milliseconds, the spatial patterns transitioned between distinct representations for different consonants and vowels. These results reveal the dynamic organization of speech sensorimotor cortex during the generation of multi-articulator movements that underlies our ability to speak
Rapid dynamic speech imaging at 3 Tesla using combination of a custom vocal tract coil, variable density spirals and manifold regularization
Purpose: To improve dynamic speech imaging at 3 Tesla.
Methods: A novel scheme combining a 16-channel vocal tract coil, variable
density spirals (VDS), and manifold regularization was developed. Short readout
duration spirals (1.3 ms long) were used to minimize sensitivity to
off-resonance. The manifold model leveraged similarities between frames sharing
similar vocal tract postures without explicit motion binning. Reconstruction
was posed as a SENSE-based non-local soft weighted temporal regularization
scheme. The self-navigating capability of VDS was leveraged to learn the
structure of the manifold. Our approach was compared against low-rank and
finite difference reconstruction constraints on two volunteers performing
repetitive and arbitrary speaking tasks. Blinded image quality evaluation in
the categories of alias artifacts, spatial blurring, and temporal blurring were
performed by three experts in voice research.
Results: We achieved a spatial resolution of 2.4mm2/pixel and a temporal
resolution of 17.4 ms/frame for single slice imaging, and 52.2 ms/frame for
concurrent 3-slice imaging. Implicit motion binning of the manifold scheme for
both repetitive and fluent speaking tasks was demonstrated. The manifold scheme
provided superior fidelity in modeling articulatory motion compared to low rank
and temporal finite difference schemes. This was reflected by higher image
quality scores in spatial and temporal blurring categories. Our technique
exhibited faint alias artifacts, but offered a reduced interquartile range of
scores compared to other methods in alias artifact category.
Conclusion: Synergistic combination of a custom vocal-tract coil, variable
density spirals and manifold regularization enables robust dynamic speech
imaging at 3 Tesla.Comment: 30 pages, 10 figure
On the Similarities Between Native, Non-native and Translated Texts
We present a computational analysis of three language varieties: native,
advanced non-native, and translation. Our goal is to investigate the
similarities and differences between non-native language productions and
translations, contrasting both with native language. Using a collection of
computational methods we establish three main results: (1) the three types of
texts are easily distinguishable; (2) non-native language and translations are
closer to each other than each of them is to native language; and (3) some of
these characteristics depend on the source or native language, while others do
not, reflecting, perhaps, unified principles that similarly affect translations
and non-native language.Comment: ACL2016, 12 page
A Formal Framework for Linguistic Annotation
`Linguistic annotation' covers any descriptive or analytic notations applied
to raw language data. The basic data may be in the form of time functions --
audio, video and/or physiological recordings -- or it may be textual. The added
notations may include transcriptions of all sorts (from phonetic features to
discourse structures), part-of-speech and sense tagging, syntactic analysis,
`named entity' identification, co-reference annotation, and so on. While there
are several ongoing efforts to provide formats and tools for such annotations
and to publish annotated linguistic databases, the lack of widely accepted
standards is becoming a critical problem. Proposed standards, to the extent
they exist, have focussed on file formats. This paper focuses instead on the
logical structure of linguistic annotations. We survey a wide variety of
existing annotation formats and demonstrate a common conceptual core, the
annotation graph. This provides a formal framework for constructing,
maintaining and searching linguistic annotations, while remaining consistent
with many alternative data structures and file formats.Comment: 49 page
A Python-based Brain-Computer Interface Package for Neural Data Analysis
Anowar, Md Hasan, A Python-based Brain-Computer Interface Package for Neural Data Analysis. Master of Science (MS), December, 2020, 70 pp., 4 tables, 23 figures, 74 references.
Although a growing amount of research has been dedicated to neural engineering, only a handful of software packages are available for brain signal processing. Popular brain-computer interface packages depend on commercial software products such as MATLAB. Moreover, almost every brain-computer interface software is designed for a specific neuro-biological signal; there is no single Python-based package that supports motor imagery, sleep, and stimulated brain signal analysis. The necessity to introduce a brain-computer interface package that can be a free alternative for commercial software has motivated me to develop a toolbox using the python platform. In this thesis, the structure of MEDUSA, a brain-computer interface toolbox, is presented. The features of the toolbox are demonstrated with publicly available data sources. The MEDUSA toolbox provides a valuable tool to biomedical engineers and computational neuroscience researchers
Augmented Reality
Augmented Reality (AR) is a natural development from virtual reality (VR), which was developed several decades earlier. AR complements VR in many ways. Due to the advantages of the user being able to see both the real and virtual objects simultaneously, AR is far more intuitive, but it's not completely detached from human factors and other restrictions. AR doesn't consume as much time and effort in the applications because it's not required to construct the entire virtual scene and the environment. In this book, several new and emerging application areas of AR are presented and divided into three sections. The first section contains applications in outdoor and mobile AR, such as construction, restoration, security and surveillance. The second section deals with AR in medical, biological, and human bodies. The third and final section contains a number of new and useful applications in daily living and learning
- …