67,824 research outputs found

    First impressions: A survey on vision-based apparent personality trait analysis

    Get PDF
    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Personality analysis has been widely studied in psychology, neuropsychology, and signal processing fields, among others. From the past few years, it also became an attractive research area in visual computing. From the computational point of view, by far speech and text have been the most considered cues of information for analyzing personality. However, recently there has been an increasing interest from the computer vision community in analyzing personality from visual data. Recent computer vision approaches are able to accurately analyze human faces, body postures and behaviors, and use these information to infer apparent personality traits. Because of the overwhelming research interest in this topic, and of the potential impact that this sort of methods could have in society, we present in this paper an up-to-date review of existing vision-based approaches for apparent personality trait recognition. We describe seminal and cutting edge works on the subject, discussing and comparing their distinctive features and limitations. Future venues of research in the field are identified and discussed. Furthermore, aspects on the subjectivity in data labeling/evaluation, as well as current datasets and challenges organized to push the research on the field are reviewed.Peer ReviewedPostprint (author's final draft

    Listening for Sirens: Locating and Classifying Acoustic Alarms in City Scenes

    Get PDF
    This paper is about alerting acoustic event detection and sound source localisation in an urban scenario. Specifically, we are interested in spotting the presence of horns, and sirens of emergency vehicles. In order to obtain a reliable system able to operate robustly despite the presence of traffic noise, which can be copious, unstructured and unpredictable, we propose to treat the spectrograms of incoming stereo signals as images, and apply semantic segmentation, based on a Unet architecture, to extract the target sound from the background noise. In a multi-task learning scheme, together with signal denoising, we perform acoustic event classification to identify the nature of the alerting sound. Lastly, we use the denoised signals to localise the acoustic source on the horizon plane, by regressing the direction of arrival of the sound through a CNN architecture. Our experimental evaluation shows an average classification rate of 94%, and a median absolute error on the localisation of 7.5{\deg} when operating on audio frames of 0.5s, and of 2.5{\deg} when operating on frames of 2.5s. The system offers excellent performance in particularly challenging scenarios, where the noise level is remarkably high.Comment: 6 pages, 9 figure

    Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition

    Get PDF
    Aerial scene recognition is a fundamental task in remote sensing and has recently received increased interest. While the visual information from overhead images with powerful models and efficient algorithms yields considerable performance on scene recognition, it still suffers from the variation of ground objects, lighting conditions etc. Inspired by the multi-channel perception theory in cognition science, in this paper, for improving the performance on the aerial scene recognition, we explore a novel audiovisual aerial scene recognition task using both images and sounds as input. Based on an observation that some specific sound events are more likely to be heard at a given geographic location, we propose to exploit the knowledge from the sound events to improve the performance on the aerial scene recognition. For this purpose, we have constructed a new dataset named AuDio Visual Aerial sceNe reCognition datasEt (ADVANCE). With the help of this dataset, we evaluate three proposed approaches for transferring the sound event knowledge to the aerial scene recognition task in a multimodal learning framework, and show the benefit of exploiting the audio information for the aerial scene recognition. The source code is publicly available for reproducibility purposes.Comment: ECCV 202

    Visual to Sound: Generating Natural Sound for Videos in the Wild

    Full text link
    As two of the five traditional human senses (sight, hearing, taste, smell, and touch), vision and sound are basic sources through which humans understand the world. Often correlated during natural events, these two modalities combine to jointly affect human perception. In this paper, we pose the task of generating sound given visual input. Such capabilities could help enable applications in virtual reality (generating sound for virtual scenes automatically) or provide additional accessibility to images or videos for people with visual impairments. As a first step in this direction, we apply learning-based methods to generate raw waveform samples given input video frames. We evaluate our models on a dataset of videos containing a variety of sounds (such as ambient sounds and sounds from people/animals). Our experiments show that the generated sounds are fairly realistic and have good temporal synchronization with the visual inputs.Comment: Project page: http://bvision11.cs.unc.edu/bigpen/yipin/visual2sound_webpage/visual2sound.htm
    corecore