33,535 research outputs found

    Visual units and confusion modelling for automatic lip-reading

    Get PDF
    Automatic lip-reading (ALR) is a challenging task because the visual speech signal is known to be missing some important information, such as voicing. We propose an approach to ALR that acknowledges that this information is missing but assumes that it is substituted or deleted in a systematic way that can be modelled. We describe a system that learns such a model and then incorporates it into decoding, which is realised as a cascade of weighted finite-state transducers. Our results show a small but statistically significant improvement in recognition accuracy. We also investigate the issue of suitable visual units for ALR, and show that visemes are sub-optimal, not but because they introduce lexical ambiguity, but because the reduction in modelling units entailed by their use reduces accuracy

    Science is perception: what can our sense of smell tell us about ourselves and the world around us?

    Get PDF
    Human sensory processes are well understood: hearing, seeing, perhaps even tasting and touch—but we do not understand smell—the elusive sense. That is, for the others we know what stimuli causes what response, and why and how. These fundamental questions are not answered within the sphere of smell science; we do not know what it is about a molecule that … smells. I report, here, the status quo theories for olfaction, highlighting what we do not know, and explaining why dismissing the perception of the input as ‘too subjective’ acts as a roadblock not conducive to scientific inquiry. I outline the current and new theory that conjectures a mechanism for signal transduction based on quantum mechanical phenomena, dubbed the ‘swipe card’, which is perhaps controversial but feasible. I show that such lines of thinking may answer some questions, or at least pose the right questions. Most importantly, I draw links and comparisons as to how better understanding of how small (10’s of atoms) molecules can interact so specially with large (10 000’s of atoms) proteins in a way that is so integral to healthy living. Repercussions of this work are not just important in understanding a basic scientific tool used by us all, but often taken for granted, it is also a step closer to understanding generic mechanisms between drug and receptor, for example

    A Virtual Conversational Agent for Teens with Autism: Experimental Results and Design Lessons

    Full text link
    We present the design of an online social skills development interface for teenagers with autism spectrum disorder (ASD). The interface is intended to enable private conversation practice anywhere, anytime using a web-browser. Users converse informally with a virtual agent, receiving feedback on nonverbal cues in real-time, and summary feedback. The prototype was developed in consultation with an expert UX designer, two psychologists, and a pediatrician. Using the data from 47 individuals, feedback and dialogue generation were automated using a hidden Markov model and a schema-driven dialogue manager capable of handling multi-topic conversations. We conducted a study with nine high-functioning ASD teenagers. Through a thematic analysis of post-experiment interviews, identified several key design considerations, notably: 1) Users should be fully briefed at the outset about the purpose and limitations of the system, to avoid unrealistic expectations. 2) An interface should incorporate positive acknowledgment of behavior change. 3) Realistic appearance of a virtual agent and responsiveness are important in engaging users. 4) Conversation personalization, for instance in prompting laconic users for more input and reciprocal questions, would help the teenagers engage for longer terms and increase the system's utility

    Improved Contrast Sensitivity DVS and its Application to Event-Driven Stereo Vision

    Get PDF
    This paper presents a new DVS sensor with one order of magnitude improved contrast sensitivity over previous reported DVSs. This sensor has been applied to a bio-inspired event-based binocular system that performs 3D event-driven reconstruction of a scene. Events from two DVS sensors are matched by using precise timing information of their ocurrence. To improve matching reliability, satisfaction of epipolar geometry constraint is required, and simultaneously available information on the orientation is used as an additional matching constraint.Ministerio de Economía y Competitividad PRI-PIMCHI-2011-0768Ministerio de Economía y Competitividad TEC2009-10639-C04-01Junta de Andalucía TIC-609

    Deep Learning for Audio Signal Processing

    Full text link
    Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e. audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified.Comment: 15 pages, 2 pdf figure
    corecore