Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed
Speechreading or lipreading is the technique of understanding and getting
phonetic features from a speaker's visual features such as movement of lips,
face, teeth and tongue. It has a wide range of multimedia applications such as
in surveillance, Internet telephony, and as an aid to people with hearing
impairments. However, most work in speechreading has been limited to
text generation from silent videos. Recently, research has started venturing
into generating (audio) speech from silent video sequences but there have been
no work thus far on handling divergent views and poses of a speaker. Thus,
although multiple camera feeds of a speaker are often available, these feeds
have not been used to deal with different poses. To this end, this paper
presents the world's first multi-view speechreading and reconstruction system.
This work pushes the boundaries of multimedia research by putting forth a model
which leverages silent video feeds from multiple cameras recording the same
subject to generate intelligible speech for a speaker. Initial results confirm
the usefulness of
exploiting multiple camera views in building an efficient speech reading and
reconstruction system. The paper further identifies the optimal placement of
cameras for maximum speech intelligibility. Finally, it lays out various
innovative applications of the proposed system, focusing on its potentially
prodigious impact not just in the security arena but in many other multimedia
analytics problems.

Comment: 2018 ACM Multimedia Conference (MM '18), October 22--26, 2018, Seoul,
Republic of Korea
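The multi-view pipeline described above can be sketched at a very high level. Everything below is illustrative, not the paper's method: the feature dimensions, the late-fusion-by-averaging strategy, and the linear decoder `W` are assumptions standing in for the learned components of the actual system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-view lip-region feature sequences (frames x feature_dim)
# extracted from three cameras recording the same utterance.
views = [rng.standard_normal((75, 32)) for _ in range(3)]

def fuse_views(feature_seqs):
    """Late fusion: average per-frame visual features across camera views."""
    return np.stack(feature_seqs, axis=0).mean(axis=0)

fused = fuse_views(views)           # shape (75, 32)

# Stand-in linear decoder mapping fused visual features to per-frame audio
# features (e.g. mel-spectrogram frames); a real system would learn this map.
W = rng.standard_normal((32, 26))
audio_features = fused @ W          # shape (75, 26)

print(fused.shape, audio_features.shape)
```

A real reconstruction model would replace the averaging and the linear map with learned networks, but the sketch shows where the multiple camera feeds enter the pipeline.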
Continuous Authentication for Voice Assistants
Voice has become an increasingly popular User Interaction (UI) channel,
mainly contributing to the ongoing trend of wearables, smart vehicles, and home
automation systems. Voice assistants such as Siri, Google Now and Cortana, have
become our everyday fixtures, especially in scenarios where touch interfaces
are inconvenient or even dangerous to use, such as driving or exercising.
Nevertheless, the open nature of the voice channel makes voice assistants
difficult to secure and exposed to various attacks as demonstrated by security
researchers. In this paper, we present VAuth, the first system that provides
continuous and usable authentication for voice assistants. We design VAuth to
fit in various widely-adopted wearable devices, such as eyeglasses,
earphones/buds and necklaces, where it collects the body-surface vibrations of
the user and matches them with the speech signal received by the voice
assistant's microphone. VAuth guarantees that the voice assistant executes only
the commands that originate from the voice of the owner. We evaluated
VAuth with 18 users and 30 voice commands and found it to achieve almost
perfect matching accuracy with less than 0.1% false positive rate, regardless
of VAuth's position on the body and the user's language, accent or mobility.
VAuth successfully thwarts various practical attacks, such as replay
attacks, mangled-voice attacks, and impersonation attacks. It also has low
energy and latency overheads and is compatible with most existing voice
assistants.
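The core check VAuth performs, comparing body-surface vibrations against the microphone signal, can be illustrated with a toy similarity test. This is a simplification rather than the paper's actual algorithm: the synthetic signals, the Pearson-correlation measure, and the 0.8 threshold below are all assumptions made for the sketch.

```python
import numpy as np

def normalized_correlation(a, b):
    """Pearson correlation between two equal-length 1-D signals."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float(np.dot(a, b) / len(a))

def accept_command(vibration, microphone, threshold=0.8):
    """Accept a voice command only if the wearable's vibration pickup
    is strongly correlated with what the assistant's microphone heard."""
    return normalized_correlation(vibration, microphone) >= threshold

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 8000)
# Toy voiced signal: a 180 Hz carrier with a slow amplitude envelope.
speech = np.sin(2 * np.pi * 180 * t) * np.abs(np.sin(2 * np.pi * 3 * t))
owner_vibration = speech + 0.1 * rng.standard_normal(t.size)  # correlated pickup
attacker_audio = rng.standard_normal(t.size)                  # unrelated audio

print(accept_command(owner_vibration, speech))         # owner speaking: accepted
print(accept_command(owner_vibration, attacker_audio)) # injected audio: rejected
```

The real system must also cope with the very different frequency responses of an accelerometer and a microphone, which is why this correlation test is only a conceptual stand-in.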
Integrating user-centred design in the development of a silent speech interface based on permanent magnetic articulography
Abstract: A new wearable silent speech interface (SSI) based on Permanent Magnetic Articulography (PMA) was developed with the involvement of end users in the design process. Hence, desirable features such as appearance, portability, ease of use and light weight were integrated into the prototype. The aim of this paper is to address the challenges faced and the design considerations addressed during the development. Evaluations of both hardware and speech recognition performance are presented here. The new prototype shows a comparable performance with its predecessor in terms of speech recognition accuracy (i.e. ~95% word accuracy and ~75% sequence accuracy), but significantly improved appearance, portability and hardware features in terms of miniaturization and cost.
We need to talk about silence: Re-examining silence in International Relations theory
The critique of silence in International Relations theory has been long-standing and sustained. However, despite the lasting popularity of the term, little effort has been made to unpack the implications of existing definitions and their uses, and of attempts to rid the worlds of theory and practice of silences. This article seeks to fill this vacuum by conducting a twofold exercise: a review and revision of the conceptualisation of silence current in the literature; and a review of the implications of attempts to eliminate silence from the worlds of theory and practice. Through the discussion, the article suggests that we deepen and broaden our understanding of silence while simultaneously accepting that a degree of silence will be a permanent feature of theory and practice in international politics. Finally, the conclusion illustrates the possibilities for analysis and theory opened by these arguments through an exploration of how they may be used to interpret and address recent events in Yemen.
Reverse production effect: Children recognize novel words better when they are heard rather than produced
This is the peer reviewed version of the following article: Tania S. Zamuner, Stephanie Strahm, Elizabeth Morin-Lessard, and Michael P. A. Page, 'Reverse production effect: children recognize novel words better when they are heard rather than produced', Developmental Science, which has been published in final form at DOI 10.1111/desc.12636. Under embargo until 15 November 2018. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving.

This research investigates the effect of production on 4.5- to 6-year-old children's recognition of newly learned words. In Experiment 1, children were taught four novel words in a produced or heard training condition during a brief training phase. In Experiment 2, children were taught eight novel words, and this time training condition was in a blocked design. Immediately after training, children were tested on their recognition of the trained novel words using a preferential looking paradigm. In both experiments, children recognized novel words that were produced and heard during training, but demonstrated better recognition for items that were heard. These findings are opposite to previous results reported in the literature with adults and children. Our results show that the benefits of speech production for word learning depend on factors such as task complexity and the developmental stage of the learner.