A silent speech system based on permanent magnet articulography and direct synthesis
In this paper we present a silent speech interface (SSI) system aimed at restoring speech communication for individuals who have lost their voice due to laryngectomy or diseases affecting the vocal folds. In the proposed system, articulatory data captured from the lips and tongue using permanent magnet articulography (PMA) are converted into audible speech using a speaker-dependent transformation learned from simultaneous recordings of PMA and audio signals acquired before laryngectomy. The transformation is represented using a mixture of factor analysers, which is a generative model that allows us to efficiently model non-linear behaviour and perform dimensionality reduction at the same time. The learned transformation is then deployed during normal usage of the SSI to restore the acoustic speech signal associated with the captured PMA data. The proposed system is evaluated using objective quality measures and listening tests on two databases containing PMA and audio recordings for normal speakers. Results show that it is possible to reconstruct speech from articulator movements captured by an unobtrusive technique without an intermediate recognition step. The SSI is capable of producing speech of sufficient intelligibility and naturalness that the speaker is clearly identifiable, but problems remain in scaling up the process to function consistently for phonetically rich vocabularies.
Effectiveness of a Faith-placed Cardiovascular Health Promotion Intervention for Rural Adults
Introduction: Cardiovascular disease (CVD) is the leading cause of mortality in the US. Further, rural US adults experience disproportionately high CVD prevalence and mortality compared to non-rural. Cardiovascular risk-reduction interventions for rural adults have shown short-term effectiveness, but long-term maintenance of outcomes remains a challenge. Faith organizations offer promise as collaborative partners for translating evidence-based interventions to reduce CVD.
Methods: We adapted and implemented a collaborative, faith-placed, CVD risk-reduction intervention in rural Illinois. We used a quasi-experimental, pre-post design to compare changes in diet and physical activity among participants. Intervention components included Heart Smart for Women (HSFW), an evidence-based program implemented weekly for 12 weeks, followed by Heart Smart Maintenance (HSM), implemented monthly for two years. Participants engaged in HSFW only, HSM only, or both. We used regression and generalized estimating equations models to examine changes in outcomes after one year.
Results: Among participants who completed both baseline and one-year surveys (n = 131), HSFW+HSM participants had significantly higher vegetable consumption (p = .007) and combined fruit/vegetable consumption (p = .01) compared to the HSM-only group at one year. We found no differences in physical activity.
Conclusion: Improving and maintaining CVD-risk behaviors is a persistent challenge in rural populations. Advancing research to improve our understanding of effective translation of CVD risk-reduction interventions in rural populations is critical.
Multichannel Attention Network for Analyzing Visual Behavior in Public Speaking
Public speaking is an important aspect of human communication and
interaction. The majority of computational work on public speaking concentrates
on analyzing the spoken content, and the verbal behavior of the speakers. While
the success of public speaking largely depends on the content of the talk, and
the verbal behavior, non-verbal (visual) cues, such as gestures and physical
appearance also play a significant role. This paper investigates the importance
of visual cues by estimating their contribution towards predicting the
popularity of a public lecture. For this purpose, we constructed a large
database of more than TED talk videos. As a measure of popularity of the
TED talks, we leverage the corresponding (online) viewers' ratings from
YouTube. Visual cues related to facial and physical appearance, facial
expressions, and pose variations are extracted from the video frames using
convolutional neural network (CNN) models. Thereafter, an attention-based long
short-term memory (LSTM) network is proposed to predict the video popularity
from the sequence of visual features. The proposed network achieves
state-of-the-art prediction accuracy indicating that visual cues alone contain
highly predictive information about the popularity of a talk. Furthermore, our
network learns a human-like attention mechanism, which is particularly useful
for interpretability, i.e. how attention varies with time, and across different
visual cues by indicating their relative importance.
Perception of categories: from coding efficiency to reaction times
Reaction-times in perceptual tasks are the subject of many experimental and
theoretical studies. With the neural decision making process as main focus,
most of these works concern discrete (typically binary) choice tasks, implying
the identification of the stimulus as an exemplar of a category. Here we
address issues specific to the perception of categories (e.g. vowels, familiar
faces, ...), making a clear distinction between identifying a category (an
element of a discrete set) and estimating a continuous parameter (such as a
direction). We exhibit a link between optimal Bayesian decoding and coding
efficiency, the latter being measured by the mutual information between the
discrete category set and the neural activity. We characterize the properties
of the best estimator of the likelihood of the category, when this estimator
takes its inputs from a large population of stimulus-specific coding cells.
Adopting the diffusion-to-bound approach to model the decisional process, this
allows us to relate analytically the bias and variance of the diffusion process
underlying decision making to macroscopic quantities that are behaviorally
measurable. A major consequence is the existence of a quantitative link between
reaction times and discrimination accuracy. The resulting analytical expression
of mean reaction times during an identification task accounts for empirical
facts, both qualitatively (e.g. more time is needed to identify a category from
a stimulus at the boundary compared to a stimulus lying within a category), and
quantitatively (working on published experimental data on phoneme
identification tasks).
Comparing the cost-effectiveness of methods for estimating population density for primates in the Amazon rainforest Peru
With increasingly extreme fluctuations in flood levels in the Amazon basin (Malhi et al. 2008, Marengo et al. 2012, Bodmer et al. 2014), the future of its fauna is becoming more uncertain. It is therefore essential that effective monitoring is in place to detect drops in population before irreversible damage is done. In developing countries such as those situated in the Amazon basin, funding for conservation is very limited (Danielsen et al. 2003); it is therefore vital that cost-effective methods of monitoring the wildlife of the Amazon are found. Three surveying techniques for monitoring primates are compared in this thesis to find the most cost-effective method of estimating population densities of primate species local to the Amazon basin: terrestrial transects, aquatic transects and audio-playback point counts. Data was collected in the Pacaya-Samiria National Reserve using these methods over a period of four months, from January to May 2014.
For both terrestrial and aquatic transects, transect lines were traversed and data was recorded every time an individual or group of any of the 7 primate species was spotted. Audio-playback point counts were used to record data for red howler monkeys (Alouatta seniculus) and brown capuchin monkeys (Cebus apella). This was done by mimicking primate vocalisations at a point and recording any resultant responses or sightings of the species under observation. The survey techniques were compared with regard to three qualities: precision, ability to react to change, and cost.
On average over all 7 primate species, aquatic transects produced the most precise estimates of population density, with a mean estimation CV% (percentage coefficient of variation) of 36.35% in comparison to the 47.3% averaged by terrestrial transects. Both methods failed to produce precise results for the two rarest species present, the monk saki monkey (Pithecia monachus) and the white-fronted capuchin monkey (Cebus albifrons). Aquatic transects were also shown to react to sudden changes in population levels. For the two species Alouatta seniculus and Cebus apella, aquatic transects once again gained the most precise results on average, with a mean estimation CV% value of 20.05% in comparison to the 31.08% of terrestrial transects and 36.35% for audio-playback point counts. For single species, the estimates created using audio-playback point counts required considerably less time and fewer resources than the other two methods, and this method was also shown to be the quickest to react to immediate changes in population densities. Thus it was concluded that audio-playback point counts can produce relatively precise estimates that react to population changes at low cost. However, only one species can be observed at a time using audio-playback point counts; when observing multiple species at one time, it was concluded that aquatic transects are by far the cheapest survey technique and the method that produces precise estimates more consistently.
I would therefore recommend that aquatic transects be used when monitoring several primate species at one time in the Amazon basin, as this method is the most cost-effective overall. However, if a single species is the monitoring target, perhaps as an indicator species or because the primate is of most concern, then audio-playback point counts should be used, as it is possible to gain relatively precise results at minimal cost. I would also suggest that audio-playback point counts be tested on rarer primate species in future, as neither terrestrial transects nor aquatic transects could produce a useful estimate in a combined effort of 104 half days. If audio-playback point counts could be used to get good estimates for rare primate species, then monitoring strategies could be developed that combine audio-playback point counts and aquatic transects to gain precise density estimates for all primate species in an area whilst keeping costs low. A generic decision tree is presented at the end of this thesis as a guideline to cost-effective primate monitoring for any seasonally flooding rainforest study site.
Proceedings of the 2011 New York Workshop on Computer, Earth and Space Science
The purpose of the New York Workshop on Computer, Earth and Space Sciences is
to bring together the New York area's finest Astronomers, Statisticians,
Computer Scientists, Space and Earth Scientists to explore potential synergies
between their respective fields. The 2011 edition (CESS2011) was a great
success, and we would like to thank all of the presenters and participants for
attending. This year was also special as it included authors from the upcoming
book titled "Advances in Machine Learning and Data Mining for Astronomy". Over
two days, the latest advanced techniques used to analyze the vast amounts of
information now available for the understanding of our universe and our planet
were presented. These proceedings attempt to provide a small window into what
the current state of research is in this vast interdisciplinary field and we'd
like to thank the speakers who spent the time to contribute to this volume.
Comment: Author lists modified. 82 pages. Workshop Proceedings from CESS 2011
in New York City, Goddard Institute for Space Studies
Acoustical Ranging Techniques in Embedded Wireless Sensor Networked Devices
Location sensing provides endless opportunities for a wide range of applications in GPS-obstructed environments,
where, typically, there is a need for a higher degree of accuracy. In this article, we focus on robust range
estimation, an important prerequisite for fine-grained localization. Motivated by the promise of acoustics in
delivering high ranging accuracy, we present the design, implementation and evaluation of acoustic (both
ultrasound and audible) ranging systems. We distill the limitations of acoustic ranging and present efficient
signal designs and detection algorithms to overcome the challenges of coverage, range, accuracy/resolution,
tolerance to the Doppler effect, and audible intensity. We evaluate our proposed techniques experimentally on
TWEET, a low-power platform purpose-built for acoustic ranging applications. Our experiments demonstrate
an operational range of 20 m (outdoor) and an average accuracy of 2 cm in the ultrasound domain. Finally,
we present the design of an audible-range acoustic tracking service that combines the benefits of a near-inaudible
acoustic broadband chirp and an approximately twofold increase in Doppler tolerance to achieve better performance.
Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments
We address the problem of online localization and tracking of multiple moving
speakers in reverberant environments. The paper has the following
contributions. We use the direct-path relative transfer function (DP-RTF), an
inter-channel feature that encodes acoustic information robust against
reverberation, and we propose an online algorithm well suited for estimating
DP-RTFs associated with moving audio sources. Another crucial ingredient of the
proposed method is its ability to properly assign DP-RTFs to audio-source
directions. Towards this goal, we adopt a maximum-likelihood formulation and we
propose to use an exponentiated gradient (EG) to efficiently update
source-direction estimates starting from their currently available values. The
problem of multiple speaker tracking is computationally intractable because the
number of possible associations between observed source directions and physical
speakers grows exponentially with time. We adopt a Bayesian framework and we
propose a variational approximation of the posterior filtering distribution
associated with multiple speaker tracking, as well as an efficient variational
expectation-maximization (VEM) solver. The proposed online localization and
tracking method is thoroughly evaluated using two datasets that contain
recordings performed in real environments.
Comment: IEEE Journal of Selected Topics in Signal Processing, 201