MobiBits: Multimodal Mobile Biometric Database
This paper presents a novel database comprising representations of five
different biometric characteristics, collected in a mobile, unconstrained or
semi-constrained setting with three different mobile devices, including
characteristics previously unavailable in existing datasets, namely hand
images, thermal hand images, and thermal face images, all acquired with a
mobile, off-the-shelf device. In addition to collecting the data, we perform
an extensive set of experiments with existing commercial and academic
biometric solutions, providing insight into the benchmark recognition
performance achievable with these data. To our knowledge, this is the first
mobile biometric database to include samples of biometric traits such as
thermal hand images and thermal face images. We hope that this contribution
will be a valuable addition to the existing databases and will enable new
experiments and studies in the field of mobile authentication. The MobiBits
database is made publicly available to the research community at no cost for
non-commercial purposes.
Comment: Submitted to the BIOSIG 2018 conference on June 18, 2018; accepted
for publication on July 20, 2018.
Efficient Invariant Features for Sensor Variability Compensation in Speaker Recognition
In this paper, we investigate the use of invariant features for speaker recognition. Owing to their characteristics, these features are introduced to cope with the difficult and challenging problem of sensor variability, a source of performance degradation inherent in speaker recognition systems. Our experiments show: (1) the effectiveness of these features in matched cases; (2) the benefit of combining these features with mel-frequency cepstral coefficients (MFCCs) to exploit their discriminative power under uncontrolled conditions (mismatched cases). Consequently, the proposed invariant features yield a performance improvement, demonstrated by a reduction in the equal error rate and the minimum decision cost function compared to GMM-UBM speaker recognition systems based on MFCC features.
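As a rough illustration of the baseline this paper compares against, the following sketch builds a GMM-UBM verifier on MFCC features using librosa and scikit-learn. File names are placeholders, and the speaker model is merely warm-started from the UBM rather than MAP-adapted, so this is a simplification of the standard pipeline; the paper's invariant features would be concatenated with (or substituted for) the MFCC vectors before training.

```python
# Minimal GMM-UBM verification sketch; file names are placeholders.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_features(path, n_mfcc=20):
    """Return frame-level MFCC vectors (frames x coefficients)."""
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

# 1) Universal background model trained on pooled background speech.
ubm_frames = np.vstack([mfcc_features(p) for p in ["bg1.wav", "bg2.wav"]])
ubm = GaussianMixture(n_components=64, covariance_type="diag").fit(ubm_frames)

# 2) Speaker model: warm-started from the UBM means (true MAP adaptation
#    of the UBM is omitted here for brevity).
spk = GaussianMixture(n_components=64, covariance_type="diag",
                      means_init=ubm.means_).fit(mfcc_features("enroll.wav"))

# 3) Verification score: average per-frame log-likelihood ratio.
test = mfcc_features("test.wav")
llr = spk.score(test) - ubm.score(test)   # above threshold -> accept
print(f"log-likelihood ratio: {llr:.3f}")
```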
SALSA: A Novel Dataset for Multimodal Group Behavior Analysis
Studying free-standing conversational groups (FCGs) in unstructured social
settings (e.g., a cocktail party) is gratifying due to the wealth of information
available at the group (mining social networks) and individual (recognizing
native behavioral and personality traits) levels. However, analyzing social
scenes involving FCGs is also highly challenging due to the difficulty in
extracting behavioral cues such as target locations, their speaking activity
and head/body pose, owing to crowdedness and the presence of extreme occlusions. To
this end, we propose SALSA, a novel dataset facilitating multimodal and
Synergetic sociAL Scene Analysis, and make two main contributions to research
on automated social interaction analysis: (1) SALSA records social interactions
among 18 participants in a natural, indoor environment for over 60 minutes,
under poster-presentation and cocktail-party contexts, presenting
difficulties in the form of low-resolution images, lighting variations,
numerous occlusions, reverberations, and interfering sound sources; (2) to
alleviate these problems, we facilitate multimodal analysis by recording the
social interplay using four static surveillance cameras and sociometric badges
worn by each participant, comprising microphone, accelerometer, Bluetooth,
and infrared sensors. In addition to raw data, we also provide annotations
concerning individuals' personality as well as their position, head, body
orientation and F-formation information over the entire event duration. Through
extensive experiments with state-of-the-art approaches, we show (a) the
limitations of current methods and (b) how the recorded multiple cues
synergetically aid automatic analysis of social interactions. SALSA is
available at http://tev.fbk.eu/salsa.
Comment: 14 pages, 11 figures
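The kind of multimodal analysis SALSA enables can be sketched as a timestamp-alignment problem between the visual annotations and the badge streams. The snippet below is hypothetical: the file names and column schema are assumptions for illustration, not the dataset's actual format.

```python
# Hypothetical alignment of SALSA-style streams; file names and column
# schema are assumptions for illustration, not the dataset's real format.
import pandas as pd

video = pd.read_csv("annotations.csv")   # t, person_id, x, y, head_pose, ...
badges = pd.read_csv("badges.csv")       # t, person_id, audio_energy, ir_hit, ...

video = video.sort_values("t")
badges = badges.sort_values("t")

# Nearest-timestamp join per participant (badges sample slower than video).
fused = pd.merge_asof(video, badges, on="t", by="person_id",
                      direction="nearest", tolerance=0.5)

# Crude speaking-activity cue: badge audio energy above its 80th percentile.
fused["speaking"] = fused["audio_energy"] > fused["audio_energy"].quantile(0.8)
print(fused.head())
```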
DolphinAttack: Inaudible Voice Commands
Speech recognition (SR) systems such as Siri or Google Now have become an
increasingly popular human-computer interaction method, and have turned various
systems into voice controllable systems (VCS). Prior work on attacking VCS
shows that hidden voice commands that are incomprehensible to people can
control such systems. Hidden voice commands, though hidden, are nonetheless audible. In
this work, we design a completely inaudible attack, DolphinAttack, that
modulates voice commands on ultrasonic carriers (e.g., f > 20 kHz) to achieve
inaudibility. By leveraging the nonlinearity of the microphone circuits, the
modulated low frequency audio commands can be successfully demodulated,
recovered, and more importantly interpreted by the speech recognition systems.
We validate DolphinAttack on popular speech recognition systems, including
Siri, Google Now, Samsung S Voice, Huawei HiVoice, Cortana and Alexa. By
injecting a sequence of inaudible voice commands, we show a few
proof-of-concept attacks, which include activating Siri to initiate a FaceTime
call on an iPhone, activating Google Now to switch the phone to airplane mode,
and even manipulating the navigation system in an Audi automobile. We propose
hardware and software defense solutions. We validate that it is feasible to
detect DolphinAttack by classifying audio using a support vector machine
(SVM), and suggest re-designing voice controllable systems to be resilient to
inaudible voice command attacks.
Comment: 15 pages, 17 figures
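The core signal-processing step of the attack, amplitude-modulating a baseband voice command onto an ultrasonic carrier, can be sketched in a few lines. Parameters and file names below are illustrative; a real attack additionally requires an ultrasonic-capable transducer and careful tuning per target device.

```python
# Amplitude-modulating a baseband command onto a 25 kHz carrier; the
# quadratic term of a nonlinear microphone front end, followed by the
# system's own low-pass filtering, recovers the baseband at the receiver.
import numpy as np
from scipy.io import wavfile

fs = 192_000                 # output rate high enough for an ultrasonic carrier
fc = 25_000                  # carrier frequency, above the audible range

rate, cmd = wavfile.read("command.wav")   # hypothetical mono voice command
cmd = cmd.astype(np.float64)
cmd /= np.max(np.abs(cmd))                # normalize to [-1, 1]
cmd = np.repeat(cmd, fs // rate)          # crude upsampling, fine for a sketch

t = np.arange(len(cmd)) / fs
am = (1.0 + 0.8 * cmd) * np.cos(2 * np.pi * fc * t)   # standard AM, depth 0.8

wavfile.write("ultrasound.wav", fs,
              (am / np.max(np.abs(am)) * 32767).astype(np.int16))
```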
Fingerprinting Smart Devices Through Embedded Acoustic Components
The widespread use of smart devices gives rise to both security and privacy
concerns. Fingerprinting smart devices can assist in authenticating physical
devices, but it can also jeopardize privacy by allowing remote identification
without user awareness. We propose a novel fingerprinting approach that uses
the microphones and speakers of smartphones to uniquely identify an individual
device. During fabrication, subtle imperfections arise in device microphones
and speakers which induce anomalies in produced and received sounds. We exploit
this observation to fingerprint smart devices through playback and recording of
audio samples. We use audio-metric tools to analyze and explore different
acoustic features and analyze their ability to successfully fingerprint smart
devices. Our experiments show that it is even possible to fingerprint devices
that have the same vendor and model; we were able to accurately distinguish
over 93% of all recorded audio clips from 15 different units of the same model.
Our study identifies the prominent acoustic features capable of fingerprinting
devices with a high success rate and examines the effect of background noise
and other variables on fingerprinting accuracy.
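A minimal version of such a fingerprinting pipeline might extract summary acoustic features from each recorded clip and train an SVM over device labels. The feature set, paths, and labels below are assumptions for illustration, not the authors' exact configuration.

```python
# Hypothetical pipeline: summary acoustic features per clip, SVM over
# device labels. Paths, labels, and features are illustrative assumptions.
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def clip_features(path):
    """Summarize one clip with means of a few spectral descriptors."""
    y, sr = librosa.load(path, sr=44100)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
    return np.concatenate([mfcc.mean(axis=1),
                           centroid.mean(axis=1),
                           rolloff.mean(axis=1)])

# (path, device_id) pairs recorded from identical handsets; extend to the
# paper's 15 units with several takes per device.
clips = [("dev0_take0.wav", 0), ("dev0_take1.wav", 0),
         ("dev1_take0.wav", 1), ("dev1_take1.wav", 1)]
X = np.array([clip_features(p) for p, _ in clips])
labels = np.array([d for _, d in clips])

Xtr, Xte, ytr, yte = train_test_split(X, labels, test_size=0.5, stratify=labels)
clf = SVC(kernel="rbf", C=10.0).fit(Xtr, ytr)
print("held-out accuracy:", clf.score(Xte, yte))
```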
Mobile phones: a trade-off between speech intelligibility and exposure to noise levels and to radio-frequency electromagnetic fields
When making phone calls, cellphone and smartphone users are exposed to radio-frequency (RF) electromagnetic fields (EMFs) and sound pressure simultaneously. Speech intelligibility during mobile phone calls is related to the sound pressure level of speech relative to potential background sounds, and also to the RF-EMF exposure, since signal quality is correlated with RF-EMF strength. Additionally, speech intelligibility, sound pressure level, and exposure to RF-EMFs depend on how the call is made (on speaker, held at the ear, or with a headset). This study determines the relationship between speech intelligibility, sound exposure, and exposure to RF-EMFs. To this aim, the transmitted RF-EMF power was recorded during phone calls made by 53 subjects in three different, controlled exposure scenarios: calling with the phone at the ear, calling in speaker mode, and calling with a headset. This emitted power is directly proportional to the exposure to RF-EMFs and is translated into specific absorption rate using numerical simulations. Simultaneously, sound pressure levels were recorded and speech intelligibility was assessed during each phone call. The results show that exposure to RF-EMFs, quantified as the specific absorption in the head, is reduced when speaker mode or a headset is used, in comparison to calling next to the ear. Additionally, personal exposure to sound pressure is also found to be highest in the condition where the phone is held next to the ear. On the other hand, speech intelligibility is found to be best when calling with the phone next to the ear, in comparison to the other studied conditions, when background noise is present.
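Since the abstract states that exposure is directly proportional to the recorded transmit power, translating power into specific absorption rate (SAR) within one fixed call scenario reduces to linear scaling against a simulated reference point. The reference values below are purely illustrative, not numbers from the study.

```python
# Linear SAR scaling within one call scenario; reference values below are
# purely illustrative, not numbers from the study.
REF_SAR = 1.0         # W/kg, simulated SAR at the reference transmit power
REF_POWER_MW = 250.0  # mW, transmit power assumed in that simulation

def scaled_sar(measured_power_mw: float) -> float:
    """SAR scaled linearly with the phone's recorded emitted power."""
    return REF_SAR * measured_power_mw / REF_POWER_MW

# Example: a call whose average recorded transmit power was 50 mW.
print(f"estimated SAR: {scaled_sar(50.0):.3f} W/kg")
```

Comparing across scenarios (ear, speaker, headset) additionally requires scenario-specific simulations, since the phone's distance from the head changes the reference SAR itself.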