
    SALSA: A Novel Dataset for Multimodal Group Behavior Analysis

    Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., a cocktail party) is gratifying due to the wealth of information available at the group level (mining social networks) and the individual level (recognizing behavioral and personality traits). However, analyzing social scenes involving FCGs is also highly challenging, as crowdedness and extreme occlusions make it difficult to extract behavioral cues such as target locations, speaking activity and head/body pose. To this end, we propose SALSA, a novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis, and make two main contributions to research on automated social interaction analysis: (1) SALSA records social interactions among 18 participants in a natural indoor environment for over 60 minutes, under poster-presentation and cocktail-party contexts that present difficulties in the form of low-resolution images, lighting variations, numerous occlusions, reverberation and interfering sound sources; (2) to alleviate these problems, we facilitate multimodal analysis by recording the social interplay using four static surveillance cameras and sociometric badges worn by each participant, comprising a microphone, accelerometer, Bluetooth and infrared sensors. In addition to the raw data, we provide annotations of each individual's personality as well as their position, head and body orientation, and F-formation membership over the entire event duration. Through extensive experiments with state-of-the-art approaches, we show (a) the limitations of current methods and (b) how the recorded multiple cues synergetically aid automatic analysis of social interactions. SALSA is available at http://tev.fbk.eu/salsa. Comment: 14 pages, 11 figures
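The F-formation annotations mentioned above lend themselves to a simple geometric treatment. A minimal sketch (not the dataset's own tooling) of Hough-style o-space voting, a common F-formation detection heuristic: each person casts a vote one stride ahead along their body orientation, and co-located votes form a group. The input format, stride and cell sizes are all hypothetical.

```python
import math
from collections import defaultdict

def detect_f_formations(people, stride=0.75, cell=0.5):
    """Cluster people into candidate F-formations by voting for o-space centres.

    people: list of (x, y, body_orientation_rad) tuples (hypothetical format,
    coordinates in metres). Each person votes `stride` metres ahead along
    their body orientation; votes landing in the same grid cell are grouped.
    """
    votes = defaultdict(list)
    for i, (x, y, theta) in enumerate(people):
        cx = x + stride * math.cos(theta)
        cy = y + stride * math.sin(theta)
        votes[(round(cx / cell), round(cy / cell))].append(i)
    # two or more people voting for the same cell form a candidate F-formation
    return [group for group in votes.values() if len(group) >= 2]

# Two people facing each other ~1.5 m apart, plus one outsider facing away
people = [(0.0, 0.0, 0.0), (1.5, 0.0, math.pi), (5.0, 5.0, 0.0)]
print(detect_f_formations(people))  # → [[0, 1]]
```

A grid-quantized vote space keeps the sketch dependency-free; a real implementation would use a proper accumulator or mean-shift clustering and tolerate noisy head/body pose estimates.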

    Continuous Estimation of Emotions in Speech by Dynamic Cooperative Speaker Models

    Automatic emotion recognition from speech has recently focused on predicting time-continuous dimensions (e.g., arousal and valence) of spontaneous and realistic expressions of emotion, as found in real-life interactions. However, the automatic prediction of such emotions poses several challenges, such as the subjectivity involved in defining a gold standard from a pool of raters and the scarcity of training data. In this work, we introduce a novel emotion recognition system based on an ensemble of single-speaker regression models (SSRMs). The emotion estimate is obtained by combining a subset of the initial pool of SSRMs, selecting those that are most concordant with one another. The proposed approach allows speakers to be added to or removed from the ensemble without re-building the entire machine learning system. The simplicity of this aggregation strategy, the flexibility afforded by the modular architecture, and the promising results obtained on the RECOLA database highlight the potential of the proposed method in real-life scenarios, in particular in web-based applications.
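The concordance-based selection described above can be sketched with Lin's concordance correlation coefficient (CCC), a standard agreement measure in this task. This is an illustrative reconstruction, not the paper's implementation; the model names, `top_k` parameter and toy signals are hypothetical.

```python
import numpy as np

def ccc(a, b):
    """Lin's concordance correlation coefficient between two prediction series."""
    cov = np.mean((a - a.mean()) * (b - b.mean()))
    return 2 * cov / (a.var() + b.var() + (a.mean() - b.mean()) ** 2)

def ensemble_predict(predictions, top_k=2):
    """Average the top_k single-speaker models that agree most with the rest.

    predictions: dict mapping model name -> 1-D array of time-continuous
    emotion estimates (e.g. arousal) for the same test clip.
    """
    names = list(predictions)
    # mean pairwise CCC against all other models measures each model's concordance
    scores = {
        n: np.mean([ccc(predictions[n], predictions[m]) for m in names if m != n])
        for n in names
    }
    chosen = sorted(names, key=scores.get, reverse=True)[:top_k]
    return np.mean([predictions[n] for n in chosen], axis=0), chosen

t = np.linspace(0, 2 * np.pi, 100)
preds = {"spk1": np.sin(t), "spk2": np.sin(t) + 0.1, "spk3": -np.sin(t)}
fused, chosen = ensemble_predict(preds, top_k=2)
print(sorted(chosen))  # the two concordant models; the outlier is discarded
```

Because selection only recomputes pairwise agreement, adding or dropping a speaker's model means updating one row of scores rather than retraining anything, which is the modularity the abstract emphasizes.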

    Feasibility of Ocean Acoustic Waveguide Remote Sensing (OAWRS) of Atlantic Cod with Seafloor Scattering Limitations

    Recently reported declines in the population of Atlantic cod have led to calls for additional survey methods for stock assessments. In combination with conventional line-transect methods, which may have ambiguities in sampling fish populations, Ocean Acoustic Waveguide Remote Sensing (OAWRS) has been shown to have potential for providing accurate stock assessments (Makris, N.C., et al. Science 2009, 323, 1734–1737; 54th Northeast Regional Stock Assessment Workshop (54th SAW), US Department of Commerce, Northeast Fisheries Science Center, 2012). OAWRS technology enables instantaneous wide-area sensing of fish aggregations over thousands of square kilometers. The ratio of the intensity of scattered returns from fish to that from the seafloor in any resolution cell typically determines the maximum fish detection range of OAWRS, which in turn is a function of fish population density, scattering amplitude and depth distribution, as well as the level of seafloor scattering. Given oceanographic parameters such as bathymetry, sound speed structure and attenuation, we find that a Rayleigh–Born volume scattering approach can be used to efficiently and accurately estimate seafloor scattering over wide areas. From hundreds of OAWRS measurements of seafloor scattering, we determine the Rayleigh–Born scattering amplitude of the seafloor, which we find has an f^2.4 frequency dependence below roughly 2 kHz in typical continental shelf environments along the US northeast coast. We then find that it is possible to robustly detect cod aggregations at frequencies at and near swim bladder resonance for observed spawning configurations along the US northeast coast, roughly the two-octave range 150–600 Hz for water depths up to roughly 100 m. This frequency range is also optimal for long-range ocean acoustic waveguide propagation, because it enables multimodal acoustic waveguide propagation with minimal acoustic absorption and forward scattering losses. As the sensing frequency moves away from the resonance peak, OAWRS detection of cod becomes increasingly less optimal, due to a rapid decrease in cod scattering amplitude. In other environments where cod depth may be greater, the optimal frequencies for cod detection are expected to increase with swim bladder resonance frequency.
    Funding: National Oceanographic Partnership Program (U.S.); United States Office of Naval Research; United States National Oceanic and Atmospheric Administration.
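The frequency argument above can be sketched numerically. In this toy comparison, only the f^2.4 seafloor dependence comes from the abstract; the swim-bladder resonance is modeled as a generic damped-oscillator curve, and the resonance frequency, Q factor and all scale constants are hypothetical, not fitted cod parameters.

```python
import numpy as np

def seafloor_level(f, k=1.0):
    """Relative seafloor scattering intensity; the reported dependence is
    f**2.4 below ~2 kHz (the constant k is hypothetical)."""
    return k * f ** 2.4

def fish_level(f, f0=300.0, q=3.0, s=1e7):
    """Toy swim-bladder resonance curve (damped-oscillator shape);
    f0, q and the scale s are illustrative only."""
    return s / ((1.0 - (f / f0) ** 2) ** 2 + (f / (q * f0)) ** 2)

# fish-to-seafloor intensity ratio in dB across a few sensing frequencies
freqs = np.array([150.0, 300.0, 600.0, 1200.0])
ratio_db = 10.0 * np.log10(fish_level(freqs) / seafloor_level(freqs))
```

With these assumed constants the ratio peaks at the 300 Hz resonance and falls off steeply above it, since the seafloor term keeps growing as f^2.4 while the fish return collapses away from resonance, mirroring the reported loss of detectability away from the swim-bladder peak.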