
A silent speech system based on permanent magnet articulography and direct synthesis

Author: Bai Jie
Cheah Lam A.
Ell Stephen R.
Gilbert James M.
Gonzalez Jose A.
Green Phil D.
Moore Roger K.
Publication venue: 'Elsevier BV'
Publication date: 14/03/2016

In this paper we present a silent speech interface (SSI) system aimed at restoring speech communication for individuals who have lost their voice due to laryngectomy or diseases affecting the vocal folds. In the proposed system, articulatory data captured from the lips and tongue using permanent magnet articulography (PMA) are converted into audible speech using a speaker-dependent transformation learned from simultaneous recordings of PMA and audio signals acquired before laryngectomy. The transformation is represented using a mixture of factor analysers, which is a generative model that allows us to efficiently model non-linear behaviour and perform dimensionality reduction at the same time. The learned transformation is then deployed during normal usage of the SSI to restore the acoustic speech signal associated with the captured PMA data. The proposed system is evaluated using objective quality measures and listening tests on two databases containing PMA and audio recordings for normal speakers. Results show that it is possible to reconstruct speech from articulator movements captured by an unobtrusive technique without an intermediate recognition step. The SSI is capable of producing speech of sufficient intelligibility and naturalness that the speaker is clearly identifiable, but problems remain in scaling up the process to function consistently for phonetically rich vocabularies

Repository@Hull - Worktribe

Effectiveness of a Faith-placed Cardiovascular Health Promotion Intervention for Rural Adults

Author: Carnahan Leslie R
Chakraborty Apurba
Geller Stacie
Khare Manorama M
Molina Yamile
Risser Heather
Zimmermann Kristine
Publication venue: Digital Scholarship@UNLV
Publication date: 08/02/2020

Introduction: Cardiovascular disease (CVD) is the leading cause of mortality in the US. Further, rural US adults experience disproportionately high CVD prevalence and mortality compared to non-rural adults. Cardiovascular risk-reduction interventions for rural adults have shown short-term effectiveness, but long-term maintenance of outcomes remains a challenge. Faith organizations offer promise as collaborative partners for translating evidence-based interventions to reduce CVD. Methods: We adapted and implemented a collaborative, faith-placed, CVD risk-reduction intervention in rural Illinois. We used a quasi-experimental, pre-post design to compare changes in diet and physical activity among participants. Intervention components included Heart Smart for Women (HSFW), an evidence-based program implemented weekly for 12 weeks, followed by Heart Smart Maintenance (HSM), implemented monthly for two years. Participants engaged in HSFW only, HSM only, or both. We used regression and generalized estimating equations models to examine changes in outcomes after one year. Results: Among participants who completed both baseline and one-year surveys (n = 131), HSFW+HSM participants had significantly higher vegetable consumption (p = .007) and combined fruit/vegetable consumption (p = .01) compared to the HSM-only group at one year. We found no differences in physical activity. Conclusion: Improving and maintaining CVD-risk behaviors is a persistent challenge in rural populations. Advancing research to improve our understanding of effective translation of CVD risk-reduction interventions in rural populations is critical

University of Nevada, Las Vegas Repository

Multichannel Attention Network for Analyzing Visual Behavior in Public Speaking

Author: Guha Tanaya
Sharma Gaurav
Sharma Rahul
Publication date: 21/07/2017

Public speaking is an important aspect of human communication and interaction. The majority of computational work on public speaking concentrates on analyzing the spoken content, and the verbal behavior of the speakers. While the success of public speaking largely depends on the content of the talk, and the verbal behavior, non-verbal (visual) cues, such as gestures and physical appearance also play a significant role. This paper investigates the importance of visual cues by estimating their contribution towards predicting the popularity of a public lecture. For this purpose, we constructed a large database of more than 1800 TED talk videos. As a measure of popularity of the TED talks, we leverage the corresponding (online) viewers' ratings from YouTube. Visual cues related to facial and physical appearance, facial expressions, and pose variations are extracted from the video frames using convolutional neural network (CNN) models. Thereafter, an attention-based long short-term memory (LSTM) network is proposed to predict the video popularity from the sequence of visual features. The proposed network achieves state-of-the-art prediction accuracy indicating that visual cues alone contain highly predictive information about the popularity of a talk. Furthermore, our network learns a human-like attention mechanism, which is particularly useful for interpretability, i.e. how attention varies with time, and across different visual cues by indicating their relative importance

arXiv.org e-Print Archive

Crossref

Warwick Research Archives Portal Repository

Perception of categories: from coding efficiency to reaction times

Author: Bonnasse-Gahot Laurent
Nadal Jean-Pierre
Publication venue: 'Elsevier BV'
Publication date: 23/02/2011

Reaction times in perceptual tasks are the subject of many experimental and theoretical studies. With the neural decision-making process as main focus, most of these works concern discrete (typically binary) choice tasks, implying the identification of the stimulus as an exemplar of a category. Here we address issues specific to the perception of categories (e.g. vowels, familiar faces, ...), making a clear distinction between identifying a category (an element of a discrete set) and estimating a continuous parameter (such as a direction). We exhibit a link between optimal Bayesian decoding and coding efficiency, the latter being measured by the mutual information between the discrete category set and the neural activity. We characterize the properties of the best estimator of the likelihood of the category, when this estimator takes its inputs from a large population of stimulus-specific coding cells. Adopting the diffusion-to-bound approach to model the decisional process, this allows us to relate analytically the bias and variance of the diffusion process underlying decision making to macroscopic quantities that are behaviorally measurable. A major consequence is the existence of a quantitative link between reaction times and discrimination accuracy. The resulting analytical expression of mean reaction times during an identification task accounts for empirical facts, both qualitatively (e.g. more time is needed to identify a category from a stimulus at the boundary compared to a stimulus lying within a category), and quantitatively (working on published experimental data on phoneme identification tasks)
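
The qualitative claim in the abstract above (stimuli near a category boundary take longer to identify) can be illustrated with the generic textbook diffusion-to-bound model. This is only a sketch under stated assumptions, not the paper's own expression: it assumes a one-dimensional diffusion starting midway between two symmetric bounds, and the names `drift`, `bound` and `noise_sd` are illustrative. A weak drift stands in for an ambiguous stimulus near the boundary.

```python
import math

def mean_decision_time(drift, bound, noise_sd=1.0):
    """Mean first-passage time of a diffusion with symmetric bounds at +/- bound.

    Standard closed form for a process starting at 0:
    E[T] = (bound / drift) * tanh(drift * bound / noise_sd**2).
    """
    if drift == 0:
        # Limit of the expression above as drift -> 0.
        return (bound / noise_sd) ** 2
    return (bound / drift) * math.tanh(drift * bound / noise_sd ** 2)

def prob_upper_bound(drift, bound, noise_sd=1.0):
    """Probability of reaching the upper bound first (the 'correct' choice
    when drift is positive)."""
    return 1.0 / (1.0 + math.exp(-2.0 * drift * bound / noise_sd ** 2))

# Assumed illustrative values: weak drift for a stimulus near the category
# boundary, strong drift for a stimulus well inside a category.
for label, drift in [("near boundary", 0.1), ("within category", 2.0)]:
    t = mean_decision_time(drift, bound=1.0)
    p = prob_upper_bound(drift, bound=1.0)
    print(f"{label}: mean decision time {t:.3f}, accuracy {p:.3f}")
```

In this toy model the weak-drift case is both slower and less accurate, which is the kind of link between reaction time and discrimination accuracy that the abstract describes.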

arXiv.org e-Print Archive

Crossref


Comparing the cost-effectiveness of methods for estimating population density for primates in the Amazon rainforest Peru

Author: Bowles Matthew David
Publication date: 21/05/2015

With increasingly extreme fluctuations in flood levels in the Amazon basin (Malhi et al. 2008, Marengo et al. 2012, Bodmer et al. 2014), the future of its fauna is becoming more uncertain. It is therefore essential that effective monitoring is in place to detect drops in population before irreversible damage is done. In developing countries such as those situated in the Amazon basin, funding for conservation is very limited (Danielsen et al. 2003); it is therefore vital that cost-effective methods of monitoring the wildlife of the Amazon are found. Three surveying techniques for monitoring primates are compared in this thesis to find the most cost-effective method of estimating population densities of primate species local to the Amazon basin: terrestrial transects, aquatic transects and audio-playback point counts. Data was collected in the Pacaya-Samiria National Reserve using these methods over a period of four months, from January to May 2014. For both terrestrial and aquatic transects, transect lines were traversed and data was recorded every time an individual or group of the 7 primate species was spotted. Audio-playback point counts were used to record data for red howler monkeys (Alouatta seniculus) and brown capuchin monkeys (Cebus apella). This was done by mimicking primate vocalisations at a point and recording any resultant responses or sightings of the species under observation. The survey techniques were compared with regard to three qualities: precision, ability to react to change, and cost. On average over all 7 species of primate, aquatic transects produced the most precise estimates of population density, with a mean estimation CV% (percentage coefficient of variation) of 36.35% in comparison to the 47.3% averaged by terrestrial transects. Both methods failed to produce precise results for the two rarest species present, the monk saki monkey (Pithecia monachus) and the white-fronted capuchin monkey (Cebus albifrons).
Aquatic transects were also shown to react to sudden change in population levels. For the two species Alouatta seniculus and Cebus apella, aquatic transects once again gained the most precise results on average, with a mean estimation CV% value of 20.05% in comparison to the 31.08% of terrestrial transects and 36.35% for audio-playback point counts. The estimates created using audio-playback point counts used considerably less time and resources than the other two methods for single species, and the method was also shown to be the quickest to react to immediate changes in population densities. Thus it was concluded that audio-playback point counts can produce relatively precise estimates that react to population changes at low cost. However, only one species can be observed at a time using audio-playback point counts; when observing multiple species at one time, it was concluded that aquatic transects are by far the cheapest survey technique and the method that produces precise estimates more consistently. I would therefore recommend that, for a monitoring effort covering several primate species at one time in the Amazon basin, aquatic transects be used, as they are the most cost-effective overall. However, if a single species is the monitoring target, perhaps as an indicator species or because the primate is of most concern, then audio-playback point counts should be used, as it is possible to gain relatively precise results at a minimal cost. I would also like to suggest that the use of audio-playback point counts be tested on rarer primate species in future, as neither terrestrial transects nor aquatic transects could produce a useful estimate in a combined effort of 104 half days. If audio-playback point counts could be used to get good estimates for rare primate species, then monitoring strategies could be developed combining the use of audio-playback point counts and aquatic transects to gain precise density estimates for all primate species in an area whilst keeping costs low.
A generic decision tree is presented at the end of this thesis as a guideline to cost-effective primate monitoring for any seasonally flooding rainforest study site
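
The precision figures in the abstract above are reported as CV%, a percentage coefficient of variation: a measure of spread divided by the mean, times 100. The abstract does not say exactly how the thesis computed it, so the following is only a minimal sketch that assumes the sample standard deviation of repeated density estimates is used. The function name and the numbers are hypothetical and not taken from the thesis.

```python
import statistics

def cv_percent(estimates):
    """Percentage coefficient of variation: 100 * sample std dev / mean."""
    mean = statistics.mean(estimates)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * statistics.stdev(estimates) / mean

# Hypothetical density estimates (individuals per km^2) from repeated surveys;
# illustrative values only.
illustrative_estimates = [10.0, 12.0, 8.0, 10.0]
print(f"CV% = {cv_percent(illustrative_estimates):.2f}")
```

A lower CV% means the estimates cluster more tightly around their mean, which is the sense in which the abstract calls one survey method more precise than another.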

Sussex Research Online

Proceedings of the 2011 New York Workshop on Computer, Earth and Space Science

Author: Naud Catherine
Way Michael J.
Publication date: 11/04/2011

The purpose of the New York Workshop on Computer, Earth and Space Sciences is to bring together the New York area's finest Astronomers, Statisticians, Computer Scientists, Space and Earth Scientists to explore potential synergies between their respective fields. The 2011 edition (CESS2011) was a great success, and we would like to thank all of the presenters and participants for attending. This year was also special as it included authors from the upcoming book titled "Advances in Machine Learning and Data Mining for Astronomy". Over two days, the latest advanced techniques used to analyze the vast amounts of information now available for the understanding of our universe and our planet were presented. These proceedings attempt to provide a small window into what the current state of research is in this vast interdisciplinary field and we'd like to thank the speakers who spent the time to contribute to this volume. Comment: Author lists modified. 82 pages. Workshop Proceedings from CESS 2011 in New York City, Goddard Institute for Space Studies

arXiv.org e-Print Archive

CERN Document Server

Acoustical Ranging Techniques in Embedded Wireless Sensor Networked Devices

Author: Jha Sanjay
Kottege Navinda
Kusy Branislav
Misra Prasant
Ostry Diethelm
Publication date: 01/01/2013

Location sensing provides endless opportunities for a wide range of applications in GPS-obstructed environments, where there is typically a need for a higher degree of accuracy. In this article, we focus on robust range estimation, an important prerequisite for fine-grained localization. Motivated by the promise of acoustics in delivering high ranging accuracy, we present the design, implementation and evaluation of acoustic (both ultrasound and audible) ranging systems. We distill the limitations of acoustic ranging, and present efficient signal designs and detection algorithms to overcome the challenges of coverage, range, accuracy/resolution, tolerance to the Doppler effect, and audible intensity. We evaluate our proposed techniques experimentally on TWEET, a low-power platform purpose-built for acoustic ranging applications. Our experiments demonstrate an operational range of 20 m (outdoor) and an average accuracy of 2 cm in the ultrasound domain. Finally, we present the design of an audible-range acoustic tracking service that combines the benefits of a near-inaudible acoustic broadband chirp with an approximately twofold increase in Doppler tolerance to achieve better performance
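
The basic idea behind acoustic ranging with a chirp can be sketched as a time-of-flight estimate: correlate the received signal against the known chirp, take the lag of the correlation peak as the delay, and multiply by the speed of sound. This is a generic illustration, not the signal design or detector from the article. It assumes a noise-free one-way path with synchronized emitter and receiver, a speed of sound of 343 m/s, and made-up chirp parameters, and every name in it is hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 degrees C

def linear_chirp(n_samples, f0_hz, f1_hz, sample_rate_hz):
    """Linear frequency sweep from f0_hz to f1_hz over n_samples samples."""
    duration_s = n_samples / sample_rate_hz
    sweep_rate = (f1_hz - f0_hz) / duration_s
    return [
        math.sin(2.0 * math.pi * (f0_hz * t + 0.5 * sweep_rate * t * t))
        for t in (i / sample_rate_hz for i in range(n_samples))
    ]

def estimate_delay_samples(received, template):
    """Lag (in samples) at which the template best matches the received signal."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(received) - len(template) + 1):
        score = sum(r * t for r, t in zip(received[lag:], template))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def range_from_delay(delay_samples, sample_rate_hz):
    """One-way distance in metres for a given propagation delay."""
    return SPEED_OF_SOUND_M_S * delay_samples / sample_rate_hz

# Illustrative set-up: the chirp arrives 140 samples after emission.
sample_rate_hz = 48_000
chirp = linear_chirp(256, 2_000.0, 6_000.0, sample_rate_hz)
received = [0.0] * 140 + chirp + [0.0] * 50

delay = estimate_delay_samples(received, chirp)
estimated_range_m = range_from_delay(delay, sample_rate_hz)
print(f"delay = {delay} samples, range = {estimated_range_m:.3f} m")
```

At a given sample rate, one sample of delay corresponds to a fixed distance step (about 7 mm at 48 kHz under the assumed speed of sound), which is one reason sampling rate and detection quality bound the achievable resolution.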

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments

Author: Alameda-Pineda Xavier
Ban Yutong
Girin Laurent
Horaud Radu
Li Xiaofei
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 26/02/2019

We address the problem of online localization and tracking of multiple moving speakers in reverberant environments. The paper has the following contributions. We use the direct-path relative transfer function (DP-RTF), an inter-channel feature that encodes acoustic information robust against reverberation, and we propose an online algorithm well suited for estimating DP-RTFs associated with moving audio sources. Another crucial ingredient of the proposed method is its ability to properly assign DP-RTFs to audio-source directions. Towards this goal, we adopt a maximum-likelihood formulation and we propose to use an exponentiated gradient (EG) to efficiently update source-direction estimates starting from their currently available values. The problem of multiple speaker tracking is computationally intractable because the number of possible associations between observed source directions and physical speakers grows exponentially with time. We adopt a Bayesian framework and we propose a variational approximation of the posterior filtering distribution associated with multiple speaker tracking, as well as an efficient variational expectation-maximization (VEM) solver. The proposed online localization and tracking method is thoroughly evaluated using two datasets that contain recordings performed in real environments. Comment: IEEE Journal of Selected Topics in Signal Processing, 201
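
The exponentiated gradient (EG) update mentioned in the abstract above can be sketched in its generic form: each weight is multiplied by the exponential of the negative scaled gradient and the result is renormalized, so the weights stay non-negative and sum to one. This is a minimal illustration of the general technique, not the paper's localization objective. The quadratic loss, the `target` vector and the step size are assumptions chosen only to give a runnable example.

```python
import math

def eg_step(weights, gradient, step_size):
    """One exponentiated-gradient step on the probability simplex."""
    unnormalized = [
        w * math.exp(-step_size * g) for w, g in zip(weights, gradient)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Hypothetical toy problem: minimize 0.5 * ||w - target||^2 over the simplex,
# starting from uniform weights over four candidate directions.
target = [0.7, 0.1, 0.1, 0.1]
weights = [0.25, 0.25, 0.25, 0.25]

for _ in range(200):
    gradient = [w - t for w, t in zip(weights, target)]
    weights = eg_step(weights, gradient, step_size=0.5)

print([round(w, 3) for w in weights])
```

Because the update is multiplicative, it starts from the currently available weights and needs no separate projection step, which fits the abstract's description of updating estimates "starting from their currently available values".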

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server