3,181 research outputs found

    SALSA: A Novel Dataset for Multimodal Group Behavior Analysis

    Get PDF
    Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., cocktail party ) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels. However, analyzing social scenes involving FCGs is also highly challenging due to the difficulty in extracting behavioral cues such as target locations, their speaking activity and head/body pose due to crowdedness and presence of extreme occlusions. To this end, we propose SALSA, a novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis, and make two main contributions to research on automated social interaction analysis: (1) SALSA records social interactions among 18 participants in a natural, indoor environment for over 60 minutes, under the poster presentation and cocktail party contexts presenting difficulties in the form of low-resolution images, lighting variations, numerous occlusions, reverberations and interfering sound sources; (2) To alleviate these problems we facilitate multimodal analysis by recording the social interplay using four static surveillance cameras and sociometric badges worn by each participant, comprising the microphone, accelerometer, bluetooth and infrared sensors. In addition to raw data, we also provide annotations concerning individuals' personality as well as their position, head, body orientation and F-formation information over the entire event duration. Through extensive experiments with state-of-the-art approaches, we show (a) the limitations of current methods and (b) how the recorded multiple cues synergetically aid automatic analysis of social interactions. SALSA is available at http://tev.fbk.eu/salsa.Comment: 14 pages, 11 figure

    Human Motion Trajectory Prediction: A Survey

    Full text link
    With growing numbers of intelligent autonomous systems in human environments, the ability of such systems to perceive, understand and anticipate human behavior becomes increasingly important. Specifically, predicting future positions of dynamic agents and planning considering such predictions are key tasks for self-driving vehicles, service robots and advanced surveillance systems. This paper provides a survey of human motion trajectory prediction. We review, analyze and structure a large selection of work from different communities and propose a taxonomy that categorizes existing methods based on the motion modeling approach and level of contextual information used. We provide an overview of the existing datasets and performance metrics. We discuss limitations of the state of the art and outline directions for further research.Comment: Submitted to the International Journal of Robotics Research (IJRR), 37 page

    Análise de multidões usando coerência de vizinhança local

    Get PDF
    Large numbers of crowd analysis methods using computer vision have been developed in the past years. This dissertation presents an approach to explore characteristics inherent to human crowds – proxemics, and neighborhood relationship – with the purpose of extracting crowd features and using them for crowd flow estimation and anomaly detection and localization. Given the optical flow produced by any method, the proposed approach compares the similarity of each flow vector and its neighborhood using the Mahalanobis distance, which can be obtained in an efficient manner using integral images. This similarity value is then used either to filter the original optical flow or to extract features that describe the crowd behavior in different resolutions, depending on the radius of the personal space selected in the analysis. To show that the extracted features are indeed relevant, we tested several classifiers in the context of abnormality detection. More precisely, we used Recurrent Neural Networks, Dense Neural Networks, Support Vector Machines, Random Forest and Extremely Random Trees. The two developed approaches (crowd flow estimation and abnormality detection) were tested on publicly available datasets involving human crowded scenarios and compared with state-of-the-art methods.Métodos para análise de ambientes de multidões são amplamente desenvolvidos na área de visão computacional. Esta tese apresenta uma abordagem para explorar características inerentes às multidões humanas - comunicação proxêmica e relações de vizinhança - para extrair características de multidões e usá-las para estimativa de fluxo de multidões e detecção e localização de anomalias. Dado o fluxo óptico produzido por qualquer método, a abordagem proposta compara a similaridade de cada vetor de fluxo e sua vizinhança usando a distância de Mahalanobis, que pode ser obtida de maneira eficiente usando imagens integrais. Esse valor de similaridade é então utilizado para filtrar o fluxo óptico original ou para extrair informações que descrevem o comportamento da multidão em diferentes resoluções, dependendo do raio do espaço pessoal selecionado na análise. Para mostrar que as características são realmente relevantes, testamos vários classificadores no contexto da detecção de anormalidades. Mais precisamente, usamos redes neurais recorrentes, redes neurais densas, máquinas de vetores de suporte, floresta aleatória e árvores extremamente aleatórias. As duas abordagens desenvolvidas (estimativa do fluxo de multidões e detecção de anormalidades) foram testadas em conjuntos de dados públicos, envolvendo cenários de multidões humanas e comparados com métodos estado-da-arte

    Visual Analysis of Extremely Dense Crowded Scenes

    Get PDF
    Visual analysis of dense crowds is particularly challenging due to large number of individuals, occlusions, clutter, and fewer pixels per person which rarely occur in ordinary surveillance scenarios. This dissertation aims to address these challenges in images and videos of extremely dense crowds containing hundreds to thousands of humans. The goal is to tackle the fundamental problems of counting, detecting and tracking people in such images and videos using visual and contextual cues that are automatically derived from the crowded scenes. For counting in an image of extremely dense crowd, we propose to leverage multiple sources of information to compute an estimate of the number of individuals present in the image. Our approach relies on sources such as low confidence head detections, repetition of texture elements (using SIFT), and frequency-domain analysis to estimate counts, along with confidence associated with observing individuals, in an image region. Furthermore, we employ a global consistency constraint on counts using Markov Random Field which caters for disparity in counts in local neighborhoods and across scales. We tested this approach on crowd images with the head counts ranging from 94 to 4543 and obtained encouraging results. Through this approach, we are able to count people in images of high-density crowds unlike previous methods which are only applicable to videos of low to medium density crowded scenes. However, the counting procedure just outputs a single number for a large patch or an entire image. With just the counts, it becomes difficult to measure the counting error for a query image with unknown number of people. For this, we propose to localize humans by finding repetitive patterns in the crowd image. Starting with detections from an underlying head detector, we correlate them within the image after their selection through several criteria: in a pre-defined grid, locally, or at multiple scales by automatically finding the patches that are most representative of recurring patterns in the crowd image. Finally, the set of generated hypotheses is selected using binary integer quadratic programming with Special Ordered Set (SOS) Type 1 constraints. Human Detection is another important problem in the analysis of crowded scenes where the goal is to place a bounding box on visible parts of individuals. Primarily applicable to images depicting medium to high density crowds containing several hundred humans, it is a crucial pre-requisite for many other visual tasks, such as tracking, action recognition or detection of anomalous behaviors, exhibited by individuals in a dense crowd. For detecting humans, we explore context in dense crowds in the form of locally-consistent scale prior which captures the similarity in scale in local neighborhoods with smooth variation over the image. Using the scale and confidence of detections obtained from an underlying human detector, we infer scale and confidence priors using Markov Random Field. In an iterative mechanism, the confidences of detections are modified to reflect consistency with the inferred priors, and the priors are updated based on the new detections. The final set of detections obtained are then reasoned for occlusion using Binary Integer Programming where overlaps and relations between parts of individuals are encoded as linear constraints. Both human detection and occlusion reasoning in this approach are solved with local neighbor-dependent constraints, thereby respecting the inter-dependence between individuals characteristic to dense crowd analysis. In addition, we propose a mechanism to detect different combinations of body parts without requiring annotations for individual combinations. Once human detection and localization is performed, we then use it for tracking people in dense crowds. Similar to the use of context as scale prior for human detection, we exploit it in the form of motion concurrence for tracking individuals in dense crowds. The proposed method for tracking provides an alternative and complementary approach to methods that require modeling of crowd flow. Simultaneously, it is less likely to fail in the case of dynamic crowd flows and anomalies by minimally relying on previous frames. The approach begins with the automatic identification of prominent individuals from the crowd that are easy to track. Then, we use Neighborhood Motion Concurrence to model the behavior of individuals in a dense crowd, this predicts the position of an individual based on the motion of its neighbors. When the individual moves with the crowd flow, we use Neighborhood Motion Concurrence to predict motion while leveraging five-frame instantaneous flow in case of dynamically changing flow and anomalies. All these aspects are then embedded in a framework which imposes hierarchy on the order in which positions of individuals are updated. The results are reported on eight sequences of medium to high density crowds and our approach performs on par with existing approaches without learning or modeling patterns of crowd flow. We experimentally demonstrate the efficacy and reliability of our algorithms by quantifying the performance of counting, localization, as well as human detection and tracking on new and challenging datasets containing hundreds to thousands of humans in a given scene

    Detection and Simulation of Dangerous Human Crowd Behavior

    Get PDF
    Tragically, gatherings of large human crowds quite often end in crowd disasters such as the recent catastrophe at the Loveparade 2010. In the past, research on pedestrian and crowd dynamics focused on simulation of pedestrian motion. As of yet, however, there does not exist any automatic system which can detect hazardous situations in crowds, thus helping to prevent these tragic incidents. In the thesis at hand, we analyze pedestrian behavior in large crowds and observe characteristic motion patterns. Based on our findings, we present a computer vision system that detects unusual events and critical situations from video streams and thus alarms security personnel in order to take necessary actions. We evaluate the system’s performance on synthetic, experimental as well as on real-world data. In particular, we show its effectiveness on the surveillance videos recorded at the Loveparade crowd stampede. Since our method is based on optical flow computations, it meets two crucial prerequisites in video surveillance: Firstly, it works in real-time and, secondly, the privacy of the people being monitored is preserved. In addition to that, we integrate the observed motion patterns into models for simulating pedestrian motion and show that the proposed simulation model produces realistic trajectories. We employ this model to simulate large human crowds and use techniques from computer graphics to render synthetic videos for further evaluation of our automatic video surveillance system
    • …
    corecore