
    Audio Splicing Detection and Localization Based on Acquisition Device Traces

    In recent years, the multimedia forensic community has put great effort into developing solutions to assess the integrity and authenticity of multimedia objects, focusing especially on manipulations applied by means of advanced deep learning techniques. However, in addition to complex forgeries such as deepfakes, very simple yet effective manipulation techniques that do not involve any state-of-the-art editing tools still exist and remain dangerous. This is the case of audio splicing for speech signals, i.e., concatenating and combining multiple speech segments obtained from different recordings of a person in order to create a new fake speech. Indeed, by simply adding a few words to an existing speech we can completely alter its meaning. In this work, we address the overlooked problem of detecting and localizing audio splicing across different models of acquisition devices. Our goal is to determine whether an audio track under analysis is pristine or whether it has been manipulated by splicing in one or multiple segments obtained from different device models. Moreover, if a recording is detected as spliced, we identify where in the temporal dimension the modification has been introduced. The proposed method is based on a Convolutional Neural Network (CNN) that extracts model-specific features from the audio recording. After extracting the features, we determine whether there has been a manipulation through a clustering algorithm. Finally, we identify the point where the modification has been introduced through a distance-measuring technique. The proposed method can detect and localize multiple splicing points within a recording.
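
    A minimal sketch of the detection and localization stage described above, assuming per-frame device-model embeddings have already been extracted by the CNN; the clustering choice (KMeans), the frame-level distance criterion, and the threshold value are illustrative assumptions rather than the authors' exact configuration:

        import numpy as np
        from sklearn.cluster import KMeans

        def detect_and_localize_splicing(frame_embeddings, distance_threshold=0.5):
            # Illustrative splicing analysis on per-frame device-model embeddings.
            # frame_embeddings: (n_frames, d) array, one CNN embedding per audio frame.
            # Returns (is_spliced, splice_frame_indices). Threshold is an assumed value.
            X = np.asarray(frame_embeddings)

            # Cluster frames into two groups; if the two cluster centres are well
            # separated, the recording likely mixes material from different device models.
            km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
            separation = np.linalg.norm(km.cluster_centers_[0] - km.cluster_centers_[1])
            is_spliced = separation > distance_threshold

            splice_points = []
            if is_spliced:
                # Localize splices where the embedding distance between consecutive
                # frames peaks, i.e. where the device-model signature changes.
                d = np.linalg.norm(np.diff(X, axis=0), axis=1)
                splice_points = [i + 1 for i in np.where(d > distance_threshold)[0]]
            return is_spliced, splice_points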

    Spike encoding techniques for IoT time-varying signals benchmarked on a neuromorphic classification task

    Spiking Neural Networks (SNNs), known for their potential to enable low energy consumption and computational cost, can bring significant advantages to the realm of embedded machine learning for edge applications. However, input coming from standard digital sensors must be encoded into spike trains before it can be processed with neuromorphic computing technologies. We present here a detailed comparison of available spike encoding techniques for the translation of time-varying signals into the event-based signal domain, tested on two different datasets, both acquired through commercially available digital devices: the Free Spoken Digit dataset (FSD), consisting of 8-kHz audio files, and the WISDM dataset, composed of 20-Hz recordings of human activity captured through mobile and wearable inertial sensors. We propose a complete pipeline to benchmark these encoding techniques by performing time-dependent signal classification with a Spiking Convolutional Neural Network (sCNN), including a signal preprocessing step consisting of a bank of filters inspired by the human cochlea, feature extraction by producing a sonogram, transfer learning via an equivalent ANN, and model compression schemes aimed at resource optimization. The resulting performance comparison and analysis provide a practical tool that lets developers select the most suitable coding method based on the type of data and the desired processing algorithms, and further expand the applicability of neuromorphic computational paradigms to the embedded sensor systems widely employed in the IoT and industrial domains.
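
    One classic family of encoders compared in such benchmarks is delta (send-on-delta) modulation. The sketch below is a generic illustration of that idea, not the paper's implementation; the threshold value and the sine-wave usage example are assumptions:

        import numpy as np

        def delta_encode(signal, threshold=0.1):
            # Send-on-delta spike encoding: emit a +1/-1 spike whenever the signal
            # moves more than `threshold` away from the last encoded value.
            spikes = np.zeros(len(signal), dtype=np.int8)
            baseline = signal[0]
            for t, x in enumerate(signal):
                if x - baseline >= threshold:        # upward change -> positive spike
                    spikes[t] = 1
                    baseline = x
                elif baseline - x >= threshold:      # downward change -> negative spike
                    spikes[t] = -1
                    baseline = x
            return spikes

        # Usage example (assumed): encode one second of a 2 Hz sine sampled at 100 Hz.
        t = np.linspace(0.0, 1.0, 100)
        spike_train = delta_encode(np.sin(2 * np.pi * 2 * t), threshold=0.05)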

    Survey and Systematization of Secure Device Pairing

    Secure Device Pairing (SDP) schemes have been developed to facilitate secure communications among smart devices, both personal mobile devices and Internet of Things (IoT) devices. Comparison and assessment of SDP schemes is troublesome, because each scheme makes different assumptions about out-of-band channels and adversary models, and is driven by its particular use cases. A conceptual model that facilitates meaningful comparison among SDP schemes has been missing; we provide such a model. In this article, we survey and analyze a wide range of SDP schemes described in the literature, including a number that have been adopted as standards. A system model and consistent terminology for SDP schemes are built on the foundation of this survey, and are then used to classify existing SDP schemes into a taxonomy that, for the first time, enables their meaningful comparison and analysis. The existing SDP schemes are analyzed using this model, revealing common systemic security weaknesses among the surveyed schemes that should become priority areas for future SDP research, such as improving the integration of privacy requirements into the design of SDP schemes. Our results allow SDP scheme designers to create schemes that are more easily comparable with one another, and help prevent the weaknesses common to the current generation of SDP schemes from persisting. Comment: 34 pages, 5 figures, 3 tables, accepted at IEEE Communications Surveys & Tutorials 2017 (Volume: PP, Issue: 99)

    DSP.Ear: Leveraging co-processor support for continuous audio sensing on smartphones

    The rapidly growing adoption of sensor-enabled smartphones has greatly fueled the proliferation of applications that use phone sensors to monitor user behavior. A central sensor among these is the microphone, which enables, for instance, the detection of valence in speech or the identification of speakers. Deploying several of these applications on a mobile device to continuously monitor the audio environment allows for the acquisition of a diverse range of sound-related contextual inferences. However, the cumulative processing burden critically impacts the phone battery. To address this problem, we propose DSP.Ear - an integrated sensing system that takes advantage of the latest low-power DSP co-processor technology in commodity mobile devices to enable the continuous and simultaneous operation of multiple established algorithms that perform complex audio inferences. The system extracts emotions from voice, estimates the number of people in a room, identifies the speakers, and detects commonly found ambient sounds, while critically incurring little overhead to the device battery. This is achieved through a series of pipeline optimizations that allow the computation to remain largely on the DSP. Through detailed evaluation of our prototype implementation we show that, by exploiting a smartphone's co-processor, DSP.Ear achieves a 3 to 7 times increase in battery lifetime compared to a solution that uses only the phone's main processor. In addition, DSP.Ear is 2 to 3 times more power efficient than a naive DSP solution without optimizations. We further analyze a large-scale dataset from 1320 Android users to show that in about 80-90% of the daily usage instances DSP.Ear is able to sustain a full day of operation (even in the presence of other smartphone workloads) with a single battery charge. This work was supported by Microsoft Research through its PhD Scholarship Program. This is the author's accepted manuscript. The final version is available from ACM in the proceedings of the ACM Conference on Embedded Networked Sensor Systems: http://dl.acm.org/citation.cfm?id=2668349
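
    The general pattern behind such pipeline optimizations is to gate expensive inference behind cheap, always-on admission filters so that most frames never reach the costly stages. The sketch below illustrates that pattern only; the energy threshold and the placeholder classifier are assumptions and do not reflect DSP.Ear's actual pipeline:

        import numpy as np

        def frame_energy(frame):
            # Cheap per-frame energy estimate, standing in for work a low-power DSP would do.
            return float(np.mean(np.asarray(frame, dtype=np.float64) ** 2))

        def process_audio(frames, energy_threshold=1e-4, classify=lambda f: "speech"):
            # Illustrative admission filter (assumed values): silent frames are dropped
            # early so the heavier classifiers (emotion, speaker count, speaker id, ...)
            # run only on frames that are worth analysing.
            results = []
            for frame in frames:
                if frame_energy(frame) < energy_threshold:
                    continue
                results.append(classify(frame))
            return results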

    A Survey of Positioning Systems Using Visible LED Lights

    As the Global Positioning System (GPS) cannot provide satisfactory performance in indoor environments, indoor positioning technology, which utilizes indoor wireless signals instead of GPS signals, has grown rapidly in recent years. Meanwhile, visible light communication (VLC) using light devices such as light emitting diodes (LEDs) has been deemed a promising candidate in heterogeneous wireless networks that may collaborate with radio frequency (RF) wireless networks. In particular, light fidelity has great potential for deployment in future indoor environments because of its high throughput and security advantages. This paper provides a comprehensive study of a novel positioning technology based on visible white LED lights, which has attracted much attention from both academia and industry. The essential characteristics and principles of this system are discussed in depth, and relevant positioning algorithms and designs are classified and elaborated. The paper undertakes a thorough investigation of current LED-based indoor positioning systems and compares their performance along many dimensions, such as test environment, accuracy, and cost. It also presents hybrid indoor positioning systems that combine VLC with other systems (e.g., inertial sensors and RF systems), and reviews and classifies outdoor VLC positioning applications for the first time. Finally, the paper surveys major advances as well as open issues, challenges, and future research directions in VLC positioning systems. Peer reviewed.
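
    A widely used class of algorithms covered by such surveys is received-signal-strength (RSS) positioning with a Lambertian LED channel model: the measured optical power from each LED gives a distance estimate, and several distances are then trilaterated. The sketch below is a generic illustration under simplifying assumptions (receiver facing straight up, LEDs pointing straight down, known ceiling height); all parameter values are assumptions:

        import numpy as np

        def distance_from_rss(p_rx, p_tx, h, m=1, area=1e-4):
            # Lambertian model with aligned transmitter and receiver:
            #   P_rx = P_tx * (m + 1) * area / (2 * pi * d**2) * (h / d)**(m + 1)
            # solved for the LED-to-receiver distance d.
            return (p_tx * (m + 1) * area * h ** (m + 1) / (2 * np.pi * p_rx)) ** (1.0 / (m + 3))

        def trilaterate(led_xy, dists, h):
            # Least-squares horizontal position from three or more LED anchors
            # mounted at height h above the receiver plane.
            r = np.sqrt(np.maximum(np.asarray(dists) ** 2 - h ** 2, 0.0))  # horizontal ranges
            (x0, y0), r0 = led_xy[0], r[0]
            A, b = [], []
            for (xi, yi), ri in zip(led_xy[1:], r[1:]):
                A.append([2 * (xi - x0), 2 * (yi - y0)])
                b.append(r0 ** 2 - ri ** 2 + xi ** 2 - x0 ** 2 + yi ** 2 - y0 ** 2)
            sol, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
            return sol  # estimated (x, y)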

    Acoustic-channel attack and defence methods for personal voice assistants

    Personal Voice Assistants (PVAs) are increasingly used as an interface to digital environments. Voice commands are used to interact with phones, smart homes or cars. In the US alone the number of smart speakers such as Amazon’s Echo and Google Home has grown by 78% to 118.5 million, and 21% of the US population own at least one device. Given the increasing dependency of society on PVAs, their security and privacy have become a major concern of users, manufacturers and policy makers. Consequently, a steep increase in research efforts addressing the security and privacy of PVAs can be observed in recent years. While some security and privacy research applicable to the PVA domain predates their recent increase in popularity, and many new research strands have emerged, research dedicated specifically to PVA security and privacy is still lacking. The most important interaction interface between users and a PVA is the acoustic channel, and studies of acoustic-channel-related security and privacy are therefore desirable and required. The aim of the work presented in this thesis is to improve the understanding of security and privacy issues of PVA usage related to the acoustic channel, to propose principles and solutions for key usage scenarios to mitigate potential security threats, and to present a novel type of dangerous attack which can be launched using a PVA alone. The five core contributions of this thesis are: (i) a taxonomy is built for the research domain of PVA security and privacy issues related to the acoustic channel. An extensive overview of the state of the art is provided, describing a comprehensive research map for PVA security and privacy. This taxonomy also shows where the contributions of this thesis lie; (ii) work has emerged aiming to generate adversarial audio inputs which sound harmless to humans but can trick a PVA into recognising harmful commands. Most of this work has focused on the attack side, and little work exists on how to defend against this type of attack. A defence method against white-box adversarial commands is proposed and implemented as a prototype. It is shown that a defence Automatic Speech Recognition (ASR) system can work in parallel with the PVA's main one, and adversarial audio input is detected if the difference in the speech decoding results between both ASRs surpasses a threshold. It is demonstrated that an ASR that differs in architecture and/or training data from the PVA's main ASR is usable as a protection ASR; (iii) PVAs continuously monitor conversations, which may be transported to a cloud back end where they are stored, processed and perhaps even passed on to other service providers. A user has limited control over this process when a PVA is triggered without the user's intent or when a PVA belongs to someone else. A user is unable to control the recording behaviour of surrounding PVAs, unable to signal privacy requirements and unable to track conversation recordings. An acoustic tagging solution is proposed that aims to embed additional information into acoustic signals processed by PVAs. A user employs a tagging device which emits an acoustic signal when PVA activity is assumed. Any active PVA will embed this tag into its recorded audio stream. The tag may signal to a cooperating PVA or back-end system that a user has not given recording consent. The tag may also be used to trace when and where a recording was taken if necessary. A prototype tagging device based on PocketSphinx is implemented. Using a Google Home Mini as the PVA, it is demonstrated that the device can tag conversations and that the tagging signal can be retrieved from conversations stored in the Google back-end system; (iv) acoustic tagging provides users with the capability to signal their consent to the back-end PVA service, and a further solution for protecting user privacy, inspired by Denial of Service (DoS), is proposed as well. Although PVAs are very helpful, they also continuously monitor conversations. When a PVA detects a wake word, the immediately following conversation is recorded and transported to a cloud system for further analysis. An active protection mechanism is proposed: reactive jamming. A Protection Jamming Device (PJD) is employed to observe conversations. Upon detection of a PVA wake word the PJD emits an acoustic jamming signal. The PJD must detect the wake word faster than the PVA such that the jamming signal still prevents wake word detection by the PVA. An evaluation of the effectiveness of different jamming signals and of the overlap between wake words and jamming signals is carried out. 100% jamming success can be achieved with an overlap of at least 60%, at a negligible false-positive rate; (v) acoustic components (speakers and microphones) on a PVA can potentially be re-purposed to achieve acoustic sensing. This has significant security and privacy implications due to the key role of PVAs in digital environments. The first active acoustic side-channel attack is proposed. Speakers are used to emit human-inaudible acoustic signals and the echo is recorded via microphones, turning the acoustic system of a smartphone into a sonar system. The echo signal can be used to profile user interaction with the device. For example, a victim's finger movement can be monitored to steal Android unlock patterns. The number of candidate unlock patterns that an attacker must try to authenticate herself to a Samsung S4 phone can be reduced by up to 70% using this novel, unnoticeable acoustic side channel.
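
    The defence ASR idea in contribution (ii) boils down to comparing the decoding results of two recognisers and flagging the input when they diverge too much. Below is a minimal sketch of that comparison step, assuming both transcripts are already available; the word-level edit distance metric and the threshold value are assumptions, not the thesis' exact parameters:

        def word_edit_distance(ref, hyp):
            # Levenshtein distance between two word sequences.
            r, h = ref.split(), hyp.split()
            d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
            for i in range(len(r) + 1):
                d[i][0] = i
            for j in range(len(h) + 1):
                d[0][j] = j
            for i in range(1, len(r) + 1):
                for j in range(1, len(h) + 1):
                    cost = 0 if r[i - 1] == h[j - 1] else 1
                    d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            return d[len(r)][len(h)]

        def looks_adversarial(main_transcript, protection_transcript, threshold=0.3):
            # Flag the audio as adversarial when the main and protection ASRs
            # disagree on more than `threshold` of the words (assumed value).
            dist = word_edit_distance(main_transcript, protection_transcript)
            norm = max(len(main_transcript.split()), len(protection_transcript.split()), 1)
            return dist / norm > threshold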

    Sensor-Based Covert Channels on Mobile Devices

    Smartphones have become ubiquitous in our daily activities, with billions of active users worldwide. The wide range of functionalities of modern mobile devices is enriched by many embedded sensors. These sensors, accessible by third-party mobile applications, pose novel security and privacy threats to the users of the devices. Numerous research works demonstrate that user keystrokes, location, or even speech can be inferred from sensor measurements. Furthermore, the sensors themselves can be susceptible to external physical interference, which can lead to attacks on systems that rely on sensor data. In this dissertation, we investigate how the reaction of sensors in mobile devices to malicious physical interference can be exploited to establish covert communication channels between otherwise isolated devices or processes. We present multiple covert channels that use the sensors' reaction to electromagnetic and acoustic interference to transmit sensitive data from nearby devices with no dedicated equipment or hardware modifications. In addition, these covert channels can also transmit information between applications within a mobile device, breaking the logical isolation enforced by the operating system. Furthermore, we discuss how sensor-based covert channels can affect the privacy of end users by tracking their activities on two different devices or across two different applications on the same device. Finally, we present a framework that automatically identifies covert channels based on physical interference between hardware components of mobile devices. In our experimental evaluation, we confirm previously known covert channels on smartphones and discover novel sources of cross-component interference that can be used to establish covert channels. By focusing on mobile platforms in this work, we aim to show that it is of crucial importance to consider physical covert channels when assessing the security of systems that rely on sensors, and we advocate holistic approaches that can proactively identify and estimate the corresponding security and privacy risks.
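
    As an illustration of how such a channel could carry data, the sketch below decodes an on-off-keyed bit stream from raw sensor samples: the transmitter toggles a physical stimulus (for example an acoustic tone) per bit interval and the receiver thresholds the sensor's signal energy per slot. This is a generic assumption about one possible realisation, not the dissertation's implementation, and all parameter values are illustrative:

        import numpy as np

        def decode_ook(sensor_samples, sample_rate, bit_rate, threshold):
            # Split the sensor trace into bit-length slots and threshold the AC
            # energy in each slot to recover the transmitted bits.
            samples_per_bit = int(sample_rate / bit_rate)
            bits = []
            for start in range(0, len(sensor_samples) - samples_per_bit + 1, samples_per_bit):
                window = np.asarray(sensor_samples[start:start + samples_per_bit], dtype=float)
                energy = float(np.mean((window - window.mean()) ** 2))
                bits.append(1 if energy > threshold else 0)
            return bits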

    Sensing, interpreting, and anticipating human social behaviour in the real world

    Low-level nonverbal social signals like glances, utterances, facial expressions and body language are central to human communicative situations and have been shown to be connected to important high-level constructs, such as emotions, turn-taking, rapport, or leadership. A prerequisite for the creation of social machines that are able to support humans in e.g. education, psychotherapy, or human resources is the ability to automatically sense, interpret, and anticipate human nonverbal behaviour. While promising results have been shown in controlled settings, automatically analysing unconstrained situations, e.g. in daily-life settings, remains challenging. Furthermore, anticipation of nonverbal behaviour in social situations is still largely unexplored. The goal of this thesis is to move closer to the vision of social machines in the real world. It makes fundamental contributions along the three dimensions of sensing, interpreting and anticipating nonverbal behaviour in social interactions. First, robust recognition of low-level nonverbal behaviour lays the groundwork for all further analysis steps. Advancing human visual behaviour sensing is especially relevant as the current state of the art is still not satisfactory in many daily-life situations. While many social interactions take place in groups, current methods for unsupervised eye contact detection can only handle dyadic interactions. We propose a novel unsupervised method for multi-person eye contact detection by exploiting the connection between gaze and speaking turns. Furthermore, we make use of mobile device engagement to address the problem of calibration drift that occurs in daily-life usage of mobile eye trackers. Second, we improve the interpretation of social signals in terms of higher-level social behaviours. In particular, we propose the first dataset and method for emotion recognition from bodily expressions of freely moving, unaugmented dyads. Furthermore, we are the first to study low rapport detection in group interactions, as well as to investigate a cross-dataset evaluation setting for the emergent leadership detection task. Third, human visual behaviour is special because it functions as a social signal and also determines what a person is seeing at a given moment in time. Being able to anticipate human gaze opens up the possibility for machines to more seamlessly share attention with humans, or to intervene in a timely manner if humans are about to overlook important aspects of the environment. We are the first to propose methods for the anticipation of eye contact in dyadic conversations, as well as in the context of mobile device interactions during daily life, thereby paving the way for interfaces that are able to proactively intervene and support interacting humans. As nonverbal signals, gaze, facial expressions, body language, and prosody play a central role in human communication. Numerous studies have linked them to important concepts such as emotions, turn-taking, leadership, or the quality of the relationship between two people. For machines to effectively support humans in their daily social lives, automatic methods for sensing, interpreting, and anticipating nonverbal behaviour are necessary.
Although previous research has achieved encouraging results in controlled studies, the automatic analysis of nonverbal behaviour in less controlled situations remains a challenge. Moreover, the anticipation of nonverbal behaviour in social situations has hardly been investigated. The goal of this thesis is to bring the vision of automatically understanding social situations a step closer to reality. This work makes important contributions to the automatic sensing of human visual behaviour in everyday situations. Although many social interactions take place in groups, unsupervised methods for eye contact detection have so far existed only for dyadic interactions. We present a new approach to eye contact detection in groups that requires no manual annotations by exploiting the statistical connection between gaze and speaking behaviour. Daily activities are challenging for mobile eye tracking devices, as shifts of these devices can degrade their calibration. In this work, we use user behaviour on mobile devices to correct the effect of such shifts. Beyond sensing, this work also improves the interpretation of social signals. We publish the first dataset and the first method for emotion recognition in dyadic interactions without the use of specialised equipment. We also present the first study on the automatic detection of low rapport in group interactions, and carry out the first cross-dataset evaluation for the detection of emergent leadership. Finally, we present the first approaches to the anticipation of gaze behaviour in social interactions. Gaze behaviour is special in that it serves both as a social signal and to direct visual perception. The ability to anticipate gaze behaviour therefore gives machines the opportunity both to integrate more seamlessly into social interactions and to warn people when they are in danger of overlooking important aspects of their environment. We present methods for anticipating gaze behaviour in the context of interaction with mobile devices during daily activities, as well as during dyadic interactions via video telephony.
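
    A minimal sketch of the idea behind the unsupervised multi-person eye contact method, namely exploiting the connection between gaze and speaking turns: gaze directions are clustered and each cluster is assigned to the interlocutor whose speaking activity overlaps with it most, since listeners tend to look at the current speaker. The clustering choice and the data layout are assumptions, not the thesis' exact method:

        import numpy as np
        from sklearn.cluster import KMeans

        def label_gaze_clusters(gaze_dirs, speaking, n_people):
            # gaze_dirs: (n_frames, d) gaze direction estimates of the wearer.
            # speaking : (n_frames, n_people) binary matrix, 1 if person p speaks at frame t.
            # Returns a per-frame array mapping each frame to the person likely looked at.
            gaze_dirs = np.asarray(gaze_dirs)
            speaking = np.asarray(speaking)
            clusters = KMeans(n_clusters=n_people, n_init=10, random_state=0).fit_predict(gaze_dirs)
            frame_labels = np.full(len(gaze_dirs), -1)
            for c in range(n_people):
                in_cluster = clusters == c
                # Assign the cluster to the person who speaks most often while
                # the wearer's gaze falls into this cluster.
                frame_labels[in_cluster] = int(np.argmax(speaking[in_cluster].sum(axis=0)))
            return frame_labels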