
    Expanded Parts Model for Semantic Description of Humans in Still Images

    We introduce an Expanded Parts Model (EPM) for recognizing human attributes (e.g., young, short hair, wearing a suit) and actions (e.g., running, jumping) in still images. An EPM is a collection of part templates learnt discriminatively to explain specific scale-space regions of the images (in human-centric coordinates). This contrasts with current models, which consist of a mixture of relatively few 'average' templates. An EPM uses only a subset of the parts to score an image, and scores the image sparsely in space, i.e., it ignores redundant and random background. To learn our model, we propose an algorithm that automatically mines parts and learns the corresponding discriminative templates, together with their respective locations, from a large number of candidate parts. We validate our method on three recent challenging datasets of human attributes and actions, obtaining convincing qualitative and state-of-the-art quantitative results. Comment: Accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
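    The sparse scoring idea (scoring an image from only the best-responding subset of part templates and ignoring the rest) can be illustrated with a toy sketch; the part responses and the selection rule below are hypothetical, and the actual EPM additionally enforces spatial constraints among the selected parts that this sketch omits:

    ```python
    import numpy as np

    def epm_score(part_responses, k):
        """Score an image from its part-template responses using only
        the k strongest parts, ignoring the rest (sparse scoring)."""
        responses = np.sort(np.asarray(part_responses, dtype=float))
        return float(responses[-k:].mean())  # average of the k best responses

    # Six hypothetical part responses; score using the best three.
    # Weak responses (background-like parts) do not affect the score.
    print(epm_score([0.1, 0.9, 0.2, 0.8, 0.05, 0.7], k=3))
    ```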

    Calibration-free Pedestrian Partial Pose Estimation Using a High-mounted Kinect

    The analysis of human behavior has undergone rapid development during the last decades, from entertainment systems to professional applications such as Human Robot Interaction (HRI), Advanced Driver Assistance Systems (ADAS) and Pedestrian Protection Systems (PPS). This thesis addresses the problem of recognizing pedestrians and estimating their body orientation in 3D, motivated by the fact that knowing a person's orientation is beneficial for both analyzing and predicting their behavior. A new method is proposed for detection and orientation estimation, in which a pedestrian detection module and an orientation estimation module are integrated sequentially. For pedestrian detection, a cascade classifier is designed that automatically draws a bounding box around each detected pedestrian in the image. Regions are then extracted from a 3D point cloud and given to a discrete orientation classifier to estimate the orientation of the pedestrian's torso. This classification is based on a coarse, rasterized depth image simulating a virtual camera placed directly above the detected pedestrian, and uses a support vector machine trained to distinguish 10 discrete orientations (30-degree increments). To evaluate the performance of our orientation estimation approach, we built a new benchmark database containing 764 point clouds. These data were captured with a Microsoft Kinect camera for 30 different participants, and the ground truth on orientation was established with a marker-based Vicon motion capture system. Finally, we demonstrate the improvements brought by our approach: pedestrians are detected with an accuracy of 95.29%, and body orientation is estimated (within a 30-degree interval) with an accuracy of 88.88%. We hope these results can serve as a starting point for future research.
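    Quantizing a continuous body orientation into 30-degree classes, as done when training the SVM above, can be sketched as follows; the function name is hypothetical, and note that a full circle at 30-degree increments yields 12 bins, whereas the thesis reports training on 10 discrete orientations:

    ```python
    def orientation_bin(angle_deg, bin_size_deg=30):
        """Map a continuous orientation in degrees to the index of the
        nearest discrete class (bin 0 centred on 0 degrees)."""
        n_bins = 360 // bin_size_deg
        return int(round((angle_deg % 360) / bin_size_deg)) % n_bins

    print(orientation_bin(0))    # 0
    print(orientation_bin(92))   # 3  (nearest class centre is 90 degrees)
    print(orientation_bin(359))  # 0  (wraps around to the 0-degree class)
    ```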

    CENTRIST3D: a spatio-temporal descriptor for anomaly detection in crowd videos

    Advisor: Hélio Pedrini. Master's dissertation (mestrado), Universidade Estadual de Campinas, Instituto de Computação. Abstract: Crowd abnormality detection is a field of study with a wide range of applications, where surveillance of areas of interest, such as airports, banks, parks, stadiums and subways, is one of the most important purposes. In general, surveillance systems require well-trained personnel to watch long video recordings in search of abnormal events, which demands high concentration and dedication. This approach tends to be ineffective, since humans are prone to failure under fatigue and repetition: they have natural limits to their capacity for observation, and their performance is tightly related to physical and psychological factors, which can negatively affect the quality of recognition. Crowds tend to behave in complex ways, possibly changing orientation and speed rapidly, and are subject to partial or total occlusion; consequently, techniques based on pedestrian tracking or background segmentation generally present higher error rates. Anomaly itself is a subjective concept, open to different interpretations depending on the context of the application. Two main contributions are presented in this work. First, we evaluate the effectiveness of the CENsus TRansform hISTogram (CENTRIST) descriptor, originally designed for scene categorization, in the context of crowd abnormality detection. Then, we propose CENTRIST3D, a modified version of CENTRIST that uses spatio-temporal information to improve the discrimination of anomalous events. Our method creates histograms of spatio-temporal features from successive video frames by extracting histograms of the Volumetric Census Transform from a spatial representation built with a modified Spatial Pyramid Matching algorithm. We validate both descriptors on three public datasets: the University of California San Diego (UCSD) Anomaly Detection Dataset, the Violent Flows Dataset, and the University of Minnesota (UMN) Dataset. Compared to other works in the literature, CENTRIST3D achieved satisfactory results on the Violent Flows and UMN datasets, but below-expected performance on the UCSD dataset, indicating that our method is better suited to scenes with abrupt changes in motion and texture. Finally, we provide evidence that CENTRIST3D is an efficient descriptor to compute: it requires little computational time, is easily parallelizable, and achieves frame rates suitable for real-time applications. Mestrado em Ciência da Computação (Mestre em Ciência da Computação). 1406874159166/2015-2. CAPES, CNPq.
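    The building block of CENTRIST is the Census Transform: each pixel is replaced by an 8-bit code recording comparisons with its eight neighbours, and the descriptor is the 256-bin histogram of these codes. A minimal 2D sketch follows (using one common sign convention for the comparison); CENTRIST3D extends the comparison to a volumetric spatio-temporal neighbourhood, which this sketch does not attempt:

    ```python
    import numpy as np

    def census_transform(img):
        """8-bit census code per inner pixel: bit b is set when the
        b-th neighbour is >= the centre pixel (one common convention)."""
        h, w = img.shape
        centre = img[1:h-1, 1:w-1]
        code = np.zeros((h - 2, w - 2), dtype=np.uint8)
        offsets = [(-1,-1), (-1,0), (-1,1), (0,-1), (0,1), (1,-1), (1,0), (1,1)]
        for bit, (dy, dx) in enumerate(offsets):
            neighbour = img[1+dy:h-1+dy, 1+dx:w-1+dx]
            code |= (neighbour >= centre).astype(np.uint8) << bit
        return code

    def centrist(img):
        """CENTRIST descriptor: 256-bin histogram of census codes."""
        return np.bincount(census_transform(img).ravel(), minlength=256)

    # On a constant image every neighbour comparison succeeds, so all
    # nine inner pixels of a 5x5 image receive the code 255.
    flat = np.full((5, 5), 7, dtype=np.uint8)
    print(centrist(flat)[255])  # 9
    ```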

    Automatic visual detection of human behavior: a review from 2000 to 2014

    Due to advances in information technology (e.g., digital video cameras, ubiquitous sensors), the automatic detection of human behaviors from video has become an active research topic. In this paper, we perform a systematic literature review on this topic covering the period from 2000 to 2014, including a selection of 193 papers retrieved from six major scientific publishers. The selected papers were classified into three main subjects: detection techniques, datasets and applications. The detection techniques were divided into four categories (initialization, tracking, pose estimation and recognition). The list of datasets includes eight examples (e.g., Hollywood action). Finally, several application areas were identified, including human detection, abnormal activity detection, action recognition, player modeling and pedestrian detection. Our analysis provides a road map to guide future research on designing automatic visual human behavior detection systems. This work is funded by the Portuguese Foundation for Science and Technology (FCT - Fundação para a Ciência e a Tecnologia) under research grant SFRH/BD/84939/2012.

    Soft computing and non-parametric techniques for effective video surveillance systems

    This thesis proposes several interconnected objectives for the design of a video surveillance system intended to operate under a wide range of conditions. First, an evaluation technique for the detector and tracking system is proposed, based on a minimal reference (ground truth). This technique answers the demand for fast and easy adjustment of the system to different environments. The thesis also proposes an optimization technique based on Evolutionary Strategies and the combination of fitness functions in several steps; the objective is to obtain detector and tracking parameters that perform well across an ample range of possible situations. Finally, it proposes a classifier in which a non-parametric statistical technique models the distribution of input data regardless of the source generating those data. Short-term detectable activities are chosen that follow a time pattern that can easily be modeled by Hidden Markov Models (HMMs). The proposal consists of a modification of the Baum-Welch algorithm to model the emission probabilities of the HMM by means of a non-parametric technique based on kernel density estimation (KDE).
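    The core of the last proposal, replacing a parametric emission model with a kernel density estimate, can be sketched as follows. This is a 1D Gaussian KDE with a fixed, hand-picked bandwidth; in the thesis the resulting density would stand in for the HMM emission probabilities inside a modified Baum-Welch loop, which is not reproduced here:

    ```python
    import math

    def kde_emission(x, samples, bandwidth=1.0):
        """Gaussian kernel density estimate of the emission density at x,
        given the observations assigned to a hidden state."""
        scale = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
        return scale * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                           for s in samples)

    # The density peaks near the data and decays away from it.
    obs = [-0.5, 0.0, 0.5]
    print(kde_emission(0.0, obs) > kde_emission(3.0, obs))  # True
    ```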

    Vision-Based Intersection Monitoring: Behavior Analysis & Safety Issues

    The main objective of my dissertation is to provide a vision-based system that automatically understands traffic patterns and analyzes intersections. The system leverages existing traffic cameras to provide behavior and safety analysis of intersection participants. The first step is a robust detection and tracking system for the vehicles and pedestrians in intersection videos. Appearance-based and motion-based detectors are evaluated on test videos, and publicly available datasets are prepared and evaluated. Based on the evaluation results, a contextual fusion method is proposed for detecting pedestrians and a motion-based technique for detecting vehicles. The detections are fed to a tracking system that combines bipartite-graph matching with an enhanced optical flow tracker. The enhanced optical flow tracker handles partial occlusion and cooperates with the detection module to provide long-term tracks of vehicles and pedestrians. The system evaluation shows 13% and 43% improvements in the tracking of vehicles and pedestrians, respectively, when both participants are addressed by the proposed framework. Finally, the trajectories are assessed to provide a comprehensive analysis of the behavior and safety of intersection participants, including vehicles and pedestrians. Several important applications are addressed, such as turning movement counts, pedestrian crossing counts, turning speed, waiting time, queue length, and surrogate safety measurements. The contribution of the proposed methods is shown through comparison with ground truth for each application, and heat maps illustrate the benefits of the proposed system through a visual depiction of intersection usage.
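    Associating new detections with existing tracks, the step that precedes the optical-flow tracking described above, can be illustrated with a greedy nearest-neighbour matcher over 1D centre positions. This is a deliberately simplified stand-in for the dissertation's bipartite-graph formulation, and all names here are hypothetical:

    ```python
    def associate(track_pos, det_pos, max_dist):
        """Greedily match each track to its nearest unmatched detection,
        closest pairs first, skipping pairs farther apart than max_dist."""
        pairs = sorted((abs(t - d), ti, di)
                       for ti, t in enumerate(track_pos)
                       for di, di_d in ((di, d) for di, d in enumerate(det_pos))
                       if True
                       for d in [di_d])
        used_t, used_d, matches = set(), set(), []
        for dist, ti, di in pairs:
            if dist <= max_dist and ti not in used_t and di not in used_d:
                used_t.add(ti)
                used_d.add(di)
                matches.append((ti, di))
        return sorted(matches)

    # Track 0 at x=10 grabs the detection at x=9; track 1 at x=50 takes x=52.
    print(associate([10.0, 50.0], [52.0, 9.0], max_dist=5.0))  # [(0, 1), (1, 0)]
    ```

    A real system would use a cost matrix over 2D boxes and an optimal assignment (e.g., the Hungarian algorithm) rather than this greedy pass.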

    Multi-Cue Pedestrian Recognition

    This thesis addresses the problem of detecting complex, deformable objects in an arbitrary, cluttered environment in sequences of video images. Often, no single best technique exists for such a challenging problem, as different approaches possess different characteristics with regard to detection accuracy, processing speed, or the kind of errors made. Therefore, multi-cue approaches are pursued in this thesis. By combining multiple detection methods, each utilizing a different aspect of the video images, we seek to gain detection accuracy, robustness, and computational efficiency. The first part of this thesis deals with texture classification. In a comparative study, various combinations of feature extraction and classification methods, some of which are novel, are examined with respect to classification performance and processing speed, and the relation to the training sample size is analyzed. The integration of shape matching and texture classification is investigated. A pose-specific mixture-of-experts architecture is proposed, where shape matching yields a probabilistic assignment of a texture pattern to a set of distinct pose clusters, each handled by a specialized texture classifier, the local expert. The reduced appearance variability that each local expert needs to cope with leads to improved classification performance. A slight further performance gain could be achieved by shape normalization. The second multi-cue approach deals with cascade systems that employ a sequence of fast-to-complex system modules in order to gain computational efficiency. Three optimization techniques are examined that adjust system parameters so as to optimize three performance measures: detection rate, false positive rate, and processing cost.
    A combined application of two techniques, a novel fast sequential optimization scheme based on ROC (receiver operating characteristics) frontier following, followed by an iterative gradient descent optimization method, is found to work best. The third method investigated is a Bayesian combination of multiple visual cues. An integrated object detection and tracking framework based on particle filtering is presented. A novel object representation combines mixture models of shape and texture, the former based on a generative point distribution model, the latter on discriminative texture classifiers. The associated observation density function integrates the three visual cues shape, texture, and depth. All methods are extensively evaluated on the problem of detecting pedestrians in urban environments from within a moving vehicle. Large data sets consisting of tens of thousands of video images have been recorded in order to obtain statistically meaningful results.
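    The fast-to-complex cascade idea can be sketched as follows; the stage functions and thresholds below are hypothetical placeholders for the real system modules, whose thresholds are what the optimization techniques above would tune:

    ```python
    def cascade_detect(window, stages):
        """Run stages in order from cheap to expensive; reject the window
        as soon as any stage's score falls below its threshold."""
        for score_fn, threshold in stages:
            if score_fn(window) < threshold:
                return False  # early rejection saves the cost of later stages
        return True  # survived every stage: report a detection

    stages = [(lambda w: w["edge"], 0.3),      # cheap edge-based stage
              (lambda w: w["texture"], 0.6)]   # expensive texture stage
    print(cascade_detect({"edge": 0.9, "texture": 0.7}, stages))  # True
    print(cascade_detect({"edge": 0.1, "texture": 0.9}, stages))  # False
    ```

    Raising a stage's threshold lowers its false positive rate and processing cost but also its detection rate, which is exactly the three-way trade-off the optimization methods balance.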

    Survey on Emotional Body Gesture Recognition

    Automatic emotion recognition has become a trending research topic in the past decade. While works based on facial expressions or speech abound, recognizing affect from body gestures remains a less explored topic. We present a new comprehensive survey hoping to boost research in the field. We first introduce emotional body gestures as a component of what is commonly known as "body language" and discuss general aspects such as gender differences and cultural dependence. We then define a complete framework for automatic emotional body gesture recognition: we introduce person detection and discuss static and dynamic body pose estimation methods, both in RGB and 3D. We then review the recent literature on representation learning and emotion recognition from images of emotionally expressive gestures, and discuss multi-modal approaches that combine speech or face with body gestures for improved emotion recognition. While pre-processing methodologies (e.g., human detection and pose estimation) are nowadays mature technologies fully developed for robust large-scale analysis, we show that for emotion recognition the quantity of labelled data is scarce: there is no agreement on clearly defined output spaces, and the representations are shallow and largely based on naive geometric features.