29,455 research outputs found
SALSA: A Novel Dataset for Multimodal Group Behavior Analysis
Studying free-standing conversational groups (FCGs) in unstructured social
settings (e.g., cocktail party ) is gratifying due to the wealth of information
available at the group (mining social networks) and individual (recognizing
native behavioral and personality traits) levels. However, analyzing social
scenes involving FCGs is also highly challenging due to the difficulty in
extracting behavioral cues such as target locations, their speaking activity
and head/body pose due to crowdedness and presence of extreme occlusions. To
this end, we propose SALSA, a novel dataset facilitating multimodal and
Synergetic sociAL Scene Analysis, and make two main contributions to research
on automated social interaction analysis: (1) SALSA records social interactions
among 18 participants in a natural, indoor environment for over 60 minutes,
under the poster presentation and cocktail party contexts presenting
difficulties in the form of low-resolution images, lighting variations,
numerous occlusions, reverberations and interfering sound sources; (2) To
alleviate these problems we facilitate multimodal analysis by recording the
social interplay using four static surveillance cameras and sociometric badges
worn by each participant, comprising the microphone, accelerometer, bluetooth
and infrared sensors. In addition to raw data, we also provide annotations
concerning individuals' personality as well as their position, head, body
orientation and F-formation information over the entire event duration. Through
extensive experiments with state-of-the-art approaches, we show (a) the
limitations of current methods and (b) how the recorded multiple cues
synergetically aid automatic analysis of social interactions. SALSA is
available at http://tev.fbk.eu/salsa.Comment: 14 pages, 11 figure
Monocular tracking of the human arm in 3D: real-time implementation and experiments
We have developed a system capable of tracking a human arm in 3D and in real time. The system is based on a previously developed algorithm for 3D tracking which requires only a monocular view and no special markers on the body. In this paper we describe our real-time system and the insights gained from real-time experimentation
DecideNet: Counting Varying Density Crowds Through Attention Guided Detection and Density Estimation
In real-world crowd counting applications, the crowd densities vary greatly
in spatial and temporal domains. A detection based counting method will
estimate crowds accurately in low density scenes, while its reliability in
congested areas is downgraded. A regression based approach, on the other hand,
captures the general density information in crowded regions. Without knowing
the location of each person, it tends to overestimate the count in low density
areas. Thus, exclusively using either one of them is not sufficient to handle
all kinds of scenes with varying densities. To address this issue, a novel
end-to-end crowd counting framework, named DecideNet (DEteCtIon and Density
Estimation Network) is proposed. It can adaptively decide the appropriate
counting mode for different locations on the image based on its real density
conditions. DecideNet starts with estimating the crowd density by generating
detection and regression based density maps separately. To capture inevitable
variation in densities, it incorporates an attention module, meant to
adaptively assess the reliability of the two types of estimations. The final
crowd counts are obtained with the guidance of the attention module to adopt
suitable estimations from the two kinds of density maps. Experimental results
show that our method achieves state-of-the-art performance on three challenging
crowd counting datasets.Comment: CVPR 201
Forecasting People Trajectories and Head Poses by Jointly Reasoning on Tracklets and Vislets
In this work, we explore the correlation between people trajectories and
their head orientations. We argue that people trajectory and head pose
forecasting can be modelled as a joint problem. Recent approaches on trajectory
forecasting leverage short-term trajectories (aka tracklets) of pedestrians to
predict their future paths. In addition, sociological cues, such as expected
destination or pedestrian interaction, are often combined with tracklets. In
this paper, we propose MiXing-LSTM (MX-LSTM) to capture the interplay between
positions and head orientations (vislets) thanks to a joint unconstrained
optimization of full covariance matrices during the LSTM backpropagation. We
additionally exploit the head orientations as a proxy for the visual attention,
when modeling social interactions. MX-LSTM predicts future pedestrians location
and head pose, increasing the standard capabilities of the current approaches
on long-term trajectory forecasting. Compared to the state-of-the-art, our
approach shows better performances on an extensive set of public benchmarks.
MX-LSTM is particularly effective when people move slowly, i.e. the most
challenging scenario for all other models. The proposed approach also allows
for accurate predictions on a longer time horizon.Comment: Accepted at IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE
INTELLIGENCE 2019. arXiv admin note: text overlap with arXiv:1805.0065
Capturing Dialogue State Variable Dependencies with an Energy-based Neural Dialogue State Tracker
Dialogue state tracking requires the population and maintenance of a multi-slot frame representation of the dialogue state. Frequently, dialogue state tracking systems assume independence between slot values within a frame. In this paper we argue that treating the prediction of each slot value as an independent prediction task may ignore important associations between the slot values, and, consequently, we argue that treating dialogue state tracking as a structured prediction problem can help to improve dialogue state tracking performance. To support this argument, the research presented in this paper is structured into three stages: (i) analyzing variable dependencies in dialogue data; (ii) applying an energy-based methodology to model dialogue state tracking as a structured prediction task; and (iii) evaluating the impact of inter-slot relationships on model performance. Overall, we demonstrate that modelling the associations between target slots with an energy-based formalism improves dialogue state tracking performance in a number of ways
- …