
    Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed

    Speechreading, or lipreading, is the technique of understanding speech and extracting phonetic features from a speaker's visual features, such as the movement of the lips, face, teeth and tongue. It has a wide range of multimedia applications, such as surveillance, Internet telephony, and aids for people with hearing impairments. However, most work in speechreading has been limited to generating text from silent videos. Recently, research has started venturing into generating (audio) speech from silent video sequences, but there have been no developments thus far in dealing with divergent views and poses of a speaker. Thus, although multiple camera feeds of a speaker may be available, they have not been used to deal with the different poses. To this end, this paper presents the world's first multi-view speech reading and reconstruction system. This work extends the boundaries of multimedia research by putting forth a model that leverages silent video feeds from multiple cameras recording the same subject to generate intelligible speech for a speaker. Initial results confirm the usefulness of exploiting multiple camera views in building an efficient speech reading and reconstruction system. They further show the optimal placement of cameras that would lead to the maximum intelligibility of speech. Finally, the paper lays out various innovative applications for the proposed system, focusing on its potentially prodigious impact not just in the security arena but in many other multimedia analytics problems. Comment: 2018 ACM Multimedia Conference (MM '18), October 22-26, 2018, Seoul, Republic of Korea

    Continuous Authentication for Voice Assistants

    Voice has become an increasingly popular User Interaction (UI) channel, mainly contributing to the ongoing trend of wearables, smart vehicles, and home automation systems. Voice assistants such as Siri, Google Now and Cortana have become everyday fixtures, especially in scenarios where touch interfaces are inconvenient or even dangerous to use, such as driving or exercising. Nevertheless, the open nature of the voice channel makes voice assistants difficult to secure and exposes them to various attacks, as demonstrated by security researchers. In this paper, we present VAuth, the first system that provides continuous and usable authentication for voice assistants. We design VAuth to fit in various widely adopted wearable devices, such as eyeglasses, earphones/buds and necklaces, where it collects the body-surface vibrations of the user and matches them with the speech signal received by the voice assistant's microphone. VAuth guarantees that the voice assistant executes only the commands that originate from the voice of the owner. We have evaluated VAuth with 18 users and 30 voice commands and find that it achieves almost perfect matching accuracy with a false positive rate below 0.1%, regardless of VAuth's position on the body and the user's language, accent or mobility. VAuth successfully thwarts different practical attacks, such as replay attacks, mangled voice attacks, or impersonation attacks. It also has low energy and latency overheads and is compatible with most existing voice assistants.

    Integrating user-centred design in the development of a silent speech interface based on permanent magnetic articulography

    Abstract: A new wearable silent speech interface (SSI) based on Permanent Magnetic Articulography (PMA) was developed with the involvement of end users in the design process. Hence, desirable features such as appearance, portability, ease of use and light weight were integrated into the prototype. The aim of this paper is to describe the challenges faced and the design considerations addressed during the development. Evaluations of both hardware and speech recognition performance are presented here. The new prototype shows comparable performance to its predecessor in terms of speech recognition accuracy (i.e. ~95% word accuracy and ~75% sequence accuracy), but significantly improved appearance, portability and hardware features in terms of miniaturization and cost.

    We need to talk about silence: Re-examining silence in International Relations theory

    The critique of silence in International Relations theory has been long-standing and sustained. However, despite the lasting popularity of the term, little effort has been made to unpack the implications of existing definitions and their uses, and of attempts to rid the worlds of theory and practice of silences. This article seeks to fill this vacuum by conducting a twofold exercise: a review and revision of the conceptualisation of silence current in the literature; and a review of the implications of attempts to eliminate silence from the worlds of theory and practice. Through the discussion, the article suggests that we deepen and broaden our understanding of silence while simultaneously accepting that a degree of silence will be a permanent feature of theory and practice in international politics. Finally, the conclusion illustrates the possibilities for analysis and theory opened by these arguments through an exploration of how they may be used to interpret and address recent events in Yemen.

    Reverse production effect: Children recognize novel words better when they are heard rather than produced

    This is the peer reviewed version of the following article: Tania S. Zamuner, Stephanie Strahm, Elizabeth Morin-Lessard, and Michael P. A. Page, 'Reverse production effect: children recognize novel words better when they are heard rather than produced', Developmental Science, which has been published in final form at DOI 10.1111/desc.12636. Under embargo until 15 November 2018. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving. This research investigates the effect of production on 4.5- to 6-year-old children's recognition of newly learned words. In Experiment 1, children were taught four novel words in a produced or heard training condition during a brief training phase. In Experiment 2, children were taught eight novel words, and this time training condition was in a blocked design. Immediately after training, children were tested on their recognition of the trained novel words using a preferential looking paradigm. In both experiments, children recognized novel words that were produced and heard during training, but demonstrated better recognition for items that were heard. These findings are opposite to previous results reported in the literature with adults and children. Our results show that benefits of speech production for word learning are dependent on factors such as task complexity and the developmental stage of the learner. Peer reviewed. Final Accepted Version.