
    SALSA: A Novel Dataset for Multimodal Group Behavior Analysis

    Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., a cocktail party) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels. However, analyzing social scenes involving FCGs is also highly challenging, as behavioral cues such as target locations, speaking activity and head/body pose are difficult to extract owing to crowdedness and the presence of extreme occlusions. To this end, we propose SALSA, a novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis, and make two main contributions to research on automated social interaction analysis: (1) SALSA records social interactions among 18 participants in a natural, indoor environment for over 60 minutes, under poster-presentation and cocktail-party contexts presenting difficulties in the form of low-resolution images, lighting variations, numerous occlusions, reverberations and interfering sound sources; (2) to alleviate these problems, we facilitate multimodal analysis by recording the social interplay using four static surveillance cameras and sociometric badges worn by each participant, comprising microphone, accelerometer, Bluetooth and infrared sensors. In addition to raw data, we also provide annotations of individuals' personality as well as their position, head and body orientation, and F-formation information over the entire event duration. Through extensive experiments with state-of-the-art approaches, we show (a) the limitations of current methods and (b) how the recorded multiple cues synergetically aid automatic analysis of social interactions. SALSA is available at http://tev.fbk.eu/salsa
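    To make the recording setup concrete, one annotated instant of such a dataset can be pictured as the Python structure below. All field names and types here are hypothetical illustrations of the cues listed in the abstract, not SALSA's actual file format:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class BadgeSample:
    """One sociometric-badge reading (hypothetical fields, not the
    dataset's actual on-disk schema)."""
    audio_energy: float                     # microphone level
    accel_xyz: Tuple[float, float, float]   # 3-axis accelerometer
    bluetooth_ids: List[int]                # badges detected in radio range
    infrared_ids: List[int]                 # badges seen by the IR sensor

@dataclass
class SalsaFrame:
    """A time-aligned view of one annotated instant: four camera views
    plus per-participant ground truth, as listed in the abstract."""
    timestamp: float
    camera_paths: List[str]                 # 4 static surveillance views
    positions: List[Tuple[float, float]]    # (x, y) per participant
    head_orientations: List[float]          # per participant
    body_orientations: List[float]          # per participant
    f_formations: List[List[int]]           # groups of participant indices
    badges: List[BadgeSample]               # one sample per 18 participants
```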

    Development of an intelligent automated recognition system for real-time characterization of road surface conditions using a multi-sensor approach

    The role of a road weather service is to issue forecasts and warnings to road users about the state of the pavement, making it possible to anticipate hazardous driving conditions, particularly in winter. It is therefore important to determine the road surface condition at all times. The objective of this project is to develop an automated multi-sensor detection system for the real-time characterization of road surface conditions (snow, ice, wet, dry). This thesis focuses on the development of a deep-learning method for fusing image and sound data based on Dempster-Shafer theory. The direct measurements used to acquire the data for training the fusion model were made with two low-cost, commercially available sensors. The first sensor is a camera that records video of the road surface. The second is a microphone that records the tire-road interaction noise characteristic of each surface condition. The system is ultimately intended to run on a single-board computer that acquires, processes and disseminates the information in real time, in order to alert road maintenance services as well as road users. More precisely, the system works as follows: 1) a deep learning architecture classifies each surface condition from the video frames, producing class probabilities; 2) a second deep learning architecture classifies each surface condition from the audio, also producing class probabilities; 3) the probabilities from the two architectures are then fed into the fusion model to obtain the final decision. To keep the system lightweight and inexpensive, it was built on architectures combining compactness and accuracy, namely SqueezeNet for images and M5 for sound. During validation, the system demonstrated good surface-condition detection performance, notably 87.9% for black ice and 97% for slush
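    As a rough illustration of step 3, Dempster's rule can combine the two classifiers' outputs when each softmax vector is treated as a mass function over the four singleton hypotheses. This is a minimal sketch under that assumption, not the thesis's exact fusion model (its basic belief assignments may be constructed differently):

```python
import numpy as np

# Surface-condition hypotheses from the abstract: snow, ice, wet, dry.
CLASSES = ["snow", "ice", "wet", "dry"]

def dempster_combine(m_image: np.ndarray, m_sound: np.ndarray) -> np.ndarray:
    """Dempster's rule of combination for two mass functions whose focal
    elements are all singletons: the joint mass of each class is the
    product of the two masses, renormalized by 1 - K, where K is the
    mass assigned to conflicting (disjoint) pairs of hypotheses."""
    joint = m_image * m_sound      # agreement mass per class
    nonconflict = joint.sum()      # equals 1 - K
    if np.isclose(nonconflict, 0.0):
        raise ValueError("Totally conflicting sources; rule undefined.")
    return joint / nonconflict

# Example: softmax outputs of an image CNN (SqueezeNet-like) and an
# audio CNN (M5-like), treated here directly as singleton masses.
p_image = np.array([0.70, 0.20, 0.05, 0.05])
p_sound = np.array([0.55, 0.30, 0.10, 0.05])
fused = dempster_combine(p_image, p_sound)
print(dict(zip(CLASSES, fused.round(3))))  # final decision: argmax
```

    With singleton-only masses the rule reduces to an elementwise product renormalized by the non-conflicting mass 1 - K, so strong disagreement between the two sensors shrinks the denominator and sharpens the class on which they agree.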

    Signal processing techniques for robust sound event recognition

    The computational analysis of acoustic scenes is today a topic of major interest, with a growing community focused on designing machines capable of identifying and understanding the sounds produced in our environment, much as humans perform this task. Although this field has not reached the industrial popularity of related audio domains such as speech recognition or music analysis, applications designed to identify the occurrence of sounds in a given scenario are rapidly increasing. These applications are usually limited to a set of sound classes, which must be defined beforehand. To train sound classification models, representative sets of sound events are recorded and used as training data. However, the acoustic conditions present during the collection of training examples may not coincide with the conditions during application testing. Background noise, overlapping sound events or weakly segmented data, among others, may substantially affect the audio, lowering the actual performance of the learned models. To avoid such situations, machine learning systems have to be designed with the ability to generalize to data collected under conditions different from those seen during training. Traditionally, the techniques used for the computational understanding of sound events have been inspired by similar domains such as music or speech, so the features selected to represent acoustic events come from those specific domains. Most of the contributions of this thesis concern how such features can be suitably applied to sound event recognition, proposing specific methods to adapt the features extracted both within classical recognition approaches and modern end-to-end convolutional neural networks. The objective of this thesis is therefore to develop novel signal processing techniques that increase the robustness of the features representing acoustic events to adverse conditions, reducing the mismatch between training and test conditions in model learning. To achieve this objective, we first analyze the importance of classical feature sets such as Mel-frequency cepstral coefficients (MFCCs) or the energies extracted from log-mel filterbanks, as well as the impact of noise, reverberation or segmentation errors in diverse scenarios. We show that the performance of both classical and deep learning-based approaches is severely affected by these factors, and we propose novel signal processing techniques designed to improve their robustness by means of a non-linear transformation of feature vectors along the temporal axis. This transformation is based on the so-called event trace, which can be interpreted as an indicator of the temporal activity of the event within the feature space. Finally, we propose the use of the energy envelope as a target for event detection, which implies a change from a classification-based approach to a regression-oriented one
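    As a reference point for the classical front-ends named above, the sketch below extracts log-mel filterbank energies and MFCCs with librosa. The band counts and frame sizes are illustrative choices, not the thesis's configuration:

```python
import librosa

def classical_features(wav_path: str, sr: int = 22050):
    """Extract the two classical feature sets mentioned in the abstract:
    log-mel filterbank energies and MFCCs."""
    y, sr = librosa.load(wav_path, sr=sr, mono=True)
    # 40-band mel filterbank energies on 1024-sample frames, in dB.
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=1024, hop_length=512, n_mels=40)
    log_mel = librosa.power_to_db(mel)                 # (40, n_frames)
    # 13 MFCCs obtained by a DCT of the same log-mel representation.
    mfcc = librosa.feature.mfcc(S=log_mel, n_mfcc=13)  # (13, n_frames)
    return log_mel, mfcc
```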

    State of the art of audio- and video-based solutions for AAL

    Working Group 3: Audio- and Video-based AAL Applications. It is a matter of fact that Europe is facing more and more crucial challenges regarding health and social care due to demographic change and the current economic context. The recent COVID-19 pandemic has stressed this situation even further, highlighting the need for action. Active and Assisted Living (AAL) technologies come as a viable approach to help face these challenges, thanks to their high potential for enabling remote care and support. Broadly speaking, AAL refers to the use of innovative and advanced Information and Communication Technologies to create supportive, inclusive and empowering applications and environments that enable older, impaired or frail people to live independently and stay active longer in society. AAL capitalizes on the growing pervasiveness and effectiveness of sensing and computing facilities to supply persons in need with smart assistance, responding to their needs for autonomy, independence, comfort, security and safety. The application scenarios addressed by AAL are complex, due to the inherent heterogeneity of the end-user population, their living arrangements, and their physical conditions or impairments. Despite aiming at diverse goals, AAL systems should share some common characteristics. They are designed to provide support in daily life in an invisible, unobtrusive and user-friendly manner. Moreover, they are conceived to be intelligent, able to learn and adapt to the requirements and requests of the assisted people, and to synchronise with their specific needs. Nevertheless, to ensure the uptake of AAL in society, potential users must be willing to use AAL applications and to integrate them into their daily environments and lives. In this respect, video- and audio-based AAL applications have several advantages in terms of unobtrusiveness and information richness. Indeed, cameras and microphones are far less obtrusive than the hindrance other wearable sensors may cause to one's activities. In addition, a single camera placed in a room can record most of the activities performed there, replacing many other non-visual sensors. Currently, video-based applications are effective in recognising and monitoring the activities, movements, and overall conditions of the assisted individuals, as well as in assessing their vital parameters (e.g., heart rate, respiratory rate). Similarly, audio sensors have the potential to become one of the most important modalities for interaction with AAL systems, as they have a large sensing range, do not require physical presence at a particular location, and are physically intangible. Moreover, relevant information about individuals' activities and health status can be derived from processing audio signals (e.g., speech recordings). Nevertheless, as the other side of the coin, cameras and microphones are often perceived as the most intrusive technologies from the viewpoint of the privacy of the monitored individuals, owing to the richness of the information they convey and the intimate settings where they may be deployed. Solutions able to ensure privacy preservation by context and by design, as well as to meet high legal and ethical standards, are in high demand. After the review of the current state of play and the discussion in GoodBrother, we may claim that the first solutions in this direction are starting to appear in the literature.
    A multidisciplinary debate among experts and stakeholders is paving the way towards AAL that ensures ergonomics, usability, acceptance and privacy preservation. The DIANA, PAAL, and VisuAAL projects are examples of this fresh approach. This report provides the reader with a review of the most recent advances in audio- and video-based monitoring technologies for AAL. It has been drafted as a collective effort of WG3 to supply an introduction to AAL, its evolution over time, and its main functional and technological underpinnings. In this respect, the report contributes to the field with the outline of a new generation of ethics-aware AAL technologies and a proposal for a novel comprehensive taxonomy of AAL systems and applications. Moreover, the report allows non-technical readers to gather an overview of the main components of an AAL system and how these function and interact with the end-users. The report illustrates the state of the art of the most successful AAL applications and functions based on audio and video data, namely (i) lifelogging and self-monitoring, (ii) remote monitoring of vital signs, (iii) emotional state recognition, (iv) food intake monitoring, activity and behaviour recognition, (v) activity and personal assistance, (vi) gesture recognition, (vii) fall detection and prevention, (viii) mobility assessment and frailty recognition, and (ix) cognitive and motor rehabilitation. For these application scenarios, the report illustrates the state of play in terms of scientific advances, available products and research projects. The open challenges are also highlighted. The report ends with an overview of the challenges, hindrances and opportunities posed by the uptake of AAL technologies in real-world settings. In this respect, it illustrates current procedural and technological approaches to acceptability, usability and trust in AAL technology, surveying strategies for co-design, for privacy preservation in video and audio data, for transparency and explainability in data processing, and for data transmission and communication. User acceptance and ethical considerations are also debated. Finally, the potential of the silver economy is overviewed.

    Robust Speech Recognition for Adverse Environments
