
    Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed

    Speechreading, or lipreading, is the technique of understanding and extracting phonetic features from a speaker's visual cues, such as the movement of the lips, face, teeth, and tongue. It has a wide range of multimedia applications, for instance in surveillance, Internet telephony, and as an aid for people with hearing impairments. However, most work in speechreading has been limited to generating text from silent videos. Recently, research has begun to venture into generating (audio) speech from silent video sequences, but there have been no developments thus far in dealing with divergent views and poses of a speaker. Thus, although multiple camera feeds of a user's speech may be available, they have not yet been exploited to handle different poses. To this end, this paper presents the first multi-view speechreading and reconstruction system. This work extends the boundaries of multimedia research by putting forth a model that leverages silent video feeds from multiple cameras recording the same subject to generate intelligible speech for a speaker. Initial results confirm the usefulness of exploiting multiple camera views in building an efficient speechreading and reconstruction system, and further indicate the optimal placement of cameras for maximum speech intelligibility. Finally, the paper lays out various applications of the proposed system, focusing on its potential impact not just in the security arena but in many other multimedia analytics problems.
    Comment: 2018 ACM Multimedia Conference (MM '18), October 22-26, 2018, Seoul, Republic of Korea

    Human scalp potentials reflect a mixture of decision-related signals during perceptual choices

    Single-unit animal studies have consistently reported decision-related activity mirroring a process of temporal accumulation of sensory evidence toward a fixed internal decision boundary. To date, our understanding of how response patterns seen in single-unit data manifest at the macroscopic level of brain activity obtained from human neuroimaging remains limited. Here, we use single-trial analysis of human electroencephalography data to show that population responses on the scalp can capture choice-predictive activity that builds up gradually over time, with a rate proportional to the amount of sensory evidence, consistent with the properties of a drift-diffusion-like process as characterized by computational modeling. Interestingly, at the time of choice, scalp potentials continue to appear parametrically modulated by the amount of sensory evidence rather than converging to a fixed decision boundary as predicted by our model. We show that trial-to-trial fluctuations in these response-locked signals exert leverage on behavior independent of the rate of evidence accumulation earlier in the trial. These results suggest that, in addition to accumulator signals, population responses on the scalp reflect the influence of other decision-related signals that continue to covary with the amount of evidence at the time of choice.
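    The drift-diffusion-like accumulation process described in this abstract can be illustrated with a minimal simulation (a sketch with assumed parameter values, not the authors' fitted model): noisy evidence accumulates at a rate proportional to stimulus strength until it crosses a fixed decision boundary.

```python
import random

def drift_diffusion_trial(drift, boundary=1.0, noise=1.0, dt=0.001, max_t=5.0, rng=None):
    """Accumulate noisy evidence until a decision boundary is crossed.

    Returns (choice, reaction_time); choice is +1/-1 for the upper/lower
    boundary, or None if neither boundary is reached within max_t seconds.
    """
    rng = rng or random.Random()
    x, t = 0.0, 0.0
    while t < max_t:
        # Euler-Maruyama step: deterministic drift plus Gaussian diffusion noise.
        x += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
        if x >= boundary:
            return +1, t
        if x <= -boundary:
            return -1, t
    return None, t

# Larger drift (stronger sensory evidence) yields faster, more accurate choices.
rng = random.Random(0)
trials = [drift_diffusion_trial(2.0, rng=rng) for _ in range(200)]
accuracy = sum(1 for choice, _ in trials if choice == +1) / len(trials)
```

    Note that in this standard model the accumulator sits at the boundary at the time of choice regardless of evidence strength; the abstract's finding is precisely that scalp potentials deviate from that prediction.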

    Improving Phoneme to Viseme Mapping for Indonesian Language

    The lip-synchronization technology of animation can run automatically through a phoneme-to-viseme map. Since the complexity of the facial muscles causes the shape of the mouth to vary greatly, phoneme-to-viseme mapping poses persistent challenges. One of them is the vowel-allophone problem: because vowel allophones resemble each other, many researchers cluster them into a single class. This paper examines whether vowel allophones should be treated as distinct variables in a phoneme-to-viseme map. As the proposed method, vowel allophones are pre-processed by extracting formant-frequency features, which are then compared with a t-test to determine whether their differences are significant. The pre-processing results are then used as the initial reference data when building phoneme-to-viseme maps. This research was conducted on maps and allophones of the Indonesian language. The maps built this way are compared with other maps using an HMM-based method, measured by word correctness and accuracy. The results show that viseme mapping preceded by allophone pre-processing yields more accurate maps than the alternatives.
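    The formant-based significance check can be sketched as follows. The formant values below are hypothetical illustrations, and Welch's t statistic stands in for the paper's unspecified t-test variant:

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    va, vb = variance(a), variance(b)          # sample variances
    se = math.sqrt(va / len(a) + vb / len(b))  # standard error of the mean difference
    return (mean(a) - mean(b)) / se

# Hypothetical F1 formant measurements (Hz) for two allophones of one vowel:
allophone_closed = [390, 410, 402, 395, 408, 399]
allophone_open   = [550, 565, 542, 558, 549, 561]

t = welch_t(allophone_closed, allophone_open)
# A |t| far above ~2 suggests the allophones are acoustically distinct, so
# merging them into one viseme class would discard real mouth-shape variation.
```

    In the paper's pipeline, allophone pairs whose formant features differ significantly would be kept as separate entries when the phoneme-to-viseme map is built.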

    The Rescaled Polya Urn: local reinforcement and chi-squared goodness of fit test

    Motivated by recent studies of big samples, this work aims at constructing a parametric model characterized by the following features: (i) a "local" reinforcement, i.e. a reinforcement mechanism based mainly on the most recent observations; (ii) a random, persistent fluctuation of the predictive mean; and (iii) long-term convergence of the empirical mean to a deterministic limit, together with a chi-squared goodness-of-fit result. This triple purpose is achieved by introducing a new variant of the Eggenberger-Polya urn, which we call the "Rescaled" Polya urn. We provide a complete asymptotic characterization of this model, pointing out that, for a certain choice of the parameters, it has properties different from those typically exhibited by the other urn models in the literature. Therefore, beyond the possible statistical application, this work could be interesting for those concerned with stochastic processes with reinforcement.
    Comment: 28 pages, 1 figure
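    For context, the classic Eggenberger-Polya scheme that the paper modifies can be simulated in a few lines. This is a sketch of the classic urn only, not the rescaled variant: in the classic urn the empirical mean converges to a random (Beta-distributed) limit, which is exactly what feature (iii) of the rescaled model replaces with a deterministic limit.

```python
import random

def polya_urn(n_draws, red=1, black=1, reinforcement=1, rng=None):
    """Classic Eggenberger-Polya urn: each draw returns `reinforcement`
    extra balls of the drawn color, so the whole history is reinforcing
    (unlike the 'local' reinforcement of the rescaled variant)."""
    rng = rng or random.Random()
    draws = []
    for _ in range(n_draws):
        drew_red = rng.random() < red / (red + black)
        if drew_red:
            red += reinforcement
        else:
            black += reinforcement
        draws.append(1 if drew_red else 0)
    return draws

rng = random.Random(42)
draws = polya_urn(10_000, rng=rng)
empirical_mean = sum(draws) / len(draws)  # a random limit; rerun with other seeds
```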

    Orbit Characterization, Stabilization and Composition on 3D Underactuated Bipedal Walking via Hybrid Passive Linear Inverted Pendulum Model

    A Hybrid passive Linear Inverted Pendulum (H-LIP) model is proposed for characterizing, stabilizing, and composing periodic orbits for 3D underactuated bipedal walking. Specifically, Period-1 (P1) and Period-2 (P2) orbits are geometrically characterized in the state space of the H-LIP. Stepping controllers are designed for global stabilization of the orbits, and valid ranges of the gains and their optimality are derived. The optimal stepping controller is used to create and stabilize the walking of bipedal robots. An actuated Spring-loaded Inverted Pendulum (aSLIP) model and the underactuated robot Cassie are used for illustration. Both aSLIP walking with P1 or P2 orbits and Cassie walking with all 3D compositions of the P1 and P2 orbits can be smoothly generated and stabilized from a stepping-in-place motion. This approach provides a perspective and a methodology toward continuous gait generation and stabilization for 3D underactuated walking robots.
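    The step-to-step idea behind such stepping controllers can be sketched with a plain one-dimensional LIP and an illustrative one-step deadbeat foot-placement rule. The parameters and the controller below are assumptions for illustration, not the H-LIP gains or orbit characterizations derived in the paper:

```python
import math

G, Z0, T = 9.81, 0.8, 0.4        # gravity, CoM height, step duration (assumed)
LAM = math.sqrt(G / Z0)          # LIP natural frequency
C, S = math.cosh(LAM * T), math.sinh(LAM * T)

def lip_step(p, v):
    """Evolve the LIP state (CoM position relative to stance foot, CoM
    velocity) over one step of duration T under x'' = (g/z0) * x."""
    return C * p + (S / LAM) * v, LAM * S * p + C * v

def deadbeat_foot_placement(p_end, v_end, v_des):
    """Choose the step length so the pre-impact velocity of the *next*
    step equals v_des (a one-step deadbeat rule on velocity)."""
    p_next = (v_des - C * v_end) / (LAM * S)  # required post-impact position
    return p_end - p_next                     # step length u

# Start off the desired orbit and let the stepping controller recover.
p, v, v_des = 0.05, 0.6, 0.3
for _ in range(5):
    p_end, v_end = lip_step(p, v)
    u = deadbeat_foot_placement(p_end, v_end, v_des)
    p, v = p_end - u, v_end                   # velocity is continuous at impact

_, v_pre = lip_step(p, v)                     # pre-impact velocity on the orbit
```

    With this rule the velocity error is removed in a single step; the paper instead derives families of stabilizing gains and selects among them for optimality.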