Vehicle speed estimation by license plate detection and tracking
Speed control systems are used in most countries to enforce speed limits and, consequently, to prevent accidents. Most such systems are based on intrusive technologies that require complex installation and maintenance and usually cause traffic disturbance. In this work, we propose a non-intrusive video-based system for vehicle speed estimation. The proposed system detects moving vehicles using an optimized motion detector. We apply a specialized text detector to locate the vehicle's license plate region, in which stable features are selected for tracking. The tracked features are then filtered and rectified for perspective distortion. Vehicle speed is estimated by comparing the trajectory of the tracked features to known real-world measurements. For our tests, we used almost five hours of video recorded in different conditions, captured by a single low-cost camera positioned at a height of 5.5 meters. The recorded videos contain more than 8,000 vehicles, in three different road lanes, with associated ground-truth speeds obtained from an inductive loop detector. We compared our license plate detector with three other state-of-the-art text detectors, and our approach showed the best performance on our dataset, attaining a precision of 0.93 and a recall of 0.87. Vehicle speeds were estimated with an average error of -0.5 km/h, staying inside the +2/-3 km/h limit determined by regulatory authorities in several countries in over 96.0% of the cases. (Funding: CNPq.)
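To make the speed estimation step concrete, the sketch below shows how a trajectory of tracked license-plate features could be rectified with a homography computed from known road-plane measurements and converted into a speed estimate. The calibration points, frame rate, and function names are illustrative assumptions, not the authors' exact implementation.

```python
import cv2
import numpy as np

# Hypothetical calibration: four image points whose real-world positions
# on the road plane (in metres) are known, e.g. from lane markings.
image_pts = np.float32([[420, 850], [880, 850], [980, 1050], [320, 1050]])
world_pts = np.float32([[0.0, 0.0], [3.5, 0.0], [3.5, 5.0], [0.0, 5.0]])

# Homography mapping image coordinates to road-plane metres.
H, _ = cv2.findHomography(image_pts, world_pts)

def estimate_speed(track, fps):
    """Estimate vehicle speed (km/h) from a trajectory of tracked
    license-plate features, given as an (N, 2) array of image points,
    one per frame."""
    pts = np.float32(track).reshape(-1, 1, 2)
    rectified = cv2.perspectiveTransform(pts, H).reshape(-1, 2)
    # Distance travelled on the road plane between first and last frame.
    distance_m = np.linalg.norm(rectified[-1] - rectified[0])
    elapsed_s = (len(track) - 1) / fps
    return 3.6 * distance_m / elapsed_s  # m/s -> km/h
```

For instance, a plate tracked over 15 frames of a 30 FPS video would be scored with estimate_speed(track, fps=30.0).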
Machine Learning for Human Action Recognition and Pose Estimation based on 3D Information
3D human action recognition is a challenging task due to the complexity of human movements and to the variety of poses and actions performed by distinct subjects. Recent technologies based on depth sensors can provide 3D human skeletons at low computational cost, which is useful information for action recognition. However, such low-cost sensors are restricted to controlled environments and frequently output noisy data. Meanwhile, convolutional neural networks (CNNs) have shown significant improvements on both action recognition and 3D human pose estimation from RGB images. Despite being closely related problems, the two tasks are frequently handled separately in the literature. In this work, we analyze the problem of 3D human action recognition in two scenarios: first, we explore spatial and temporal features from human skeletons, which are aggregated by a shallow metric learning approach. In the second scenario, we show not only that precise 3D poses are beneficial to action recognition, but also that both tasks can be efficiently performed by a single deep neural network that still achieves state-of-the-art results. Additionally, we demonstrate that end-to-end optimization using poses as an intermediate constraint leads to significantly higher accuracy on the action task than separate learning. Finally, we propose a new scalable architecture for simultaneous real-time 3D pose estimation and action recognition, which offers a range of performance vs. speed trade-offs with a single multimodal and multitask training procedure.
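As a rough sketch of the end-to-end multitask idea (pose used as an intermediate constraint for action recognition), the toy PyTorch model below predicts 3D joints from a shared backbone and feeds them to an action head. Layer sizes, module names, and loss weights are hypothetical and greatly simplified with respect to the thesis architecture.

```python
import torch
import torch.nn as nn

class PoseActionNet(nn.Module):
    """Toy single-frame multitask model: a shared backbone predicts 3D
    joint coordinates, which the action head consumes as an intermediate
    representation. Layer sizes are illustrative placeholders."""
    def __init__(self, num_joints=17, num_actions=60, feat_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.pose_head = nn.Linear(feat_dim, num_joints * 3)
        self.action_head = nn.Linear(num_joints * 3, num_actions)

    def forward(self, frames):
        feats = self.backbone(frames)
        pose = self.pose_head(feats)            # intermediate 3D pose
        action_logits = self.action_head(pose)  # action from the pose
        return pose, action_logits

def multitask_loss(pose, action_logits, pose_gt, action_gt, w_pose=1.0):
    # Pose supervision acts as an intermediate constraint on the
    # representation used by the action classifier.
    pose_loss = nn.functional.mse_loss(pose, pose_gt)
    action_loss = nn.functional.cross_entropy(action_logits, action_gt)
    return w_pose * pose_loss + action_loss
```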
Learning features combination for human action recognition from skeleton sequences
Human action recognition is a challenging task due to the complexity of human movements and to the variability of the same actions performed by distinct subjects. Recent technologies provide a skeletal representation of the human body, extracted in real time from depth maps, which is highly discriminant information for efficient action recognition. In this context, we present a new framework for human action recognition from skeleton sequences. We propose extracting sets of spatial and temporal local features from subgroups of joints, which are aggregated by a robust method based on the VLAD algorithm and a pool of clusters. Several feature vectors are then combined by a metric learning method inspired by the LMNN algorithm, with the objective of improving classification accuracy using the nonparametric k-NN classifier. We evaluated our method on three public datasets: MSR-Action3D, UTKinect-Action3D, and Florence 3D Actions. The proposed framework outperforms state-of-the-art methods in all experiments.
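A minimal sketch of the aggregation-and-classification stage is given below, using a standard VLAD encoding over a KMeans codebook and a plain k-NN classifier. The codebook size and descriptor shapes are assumptions, and the LMNN-inspired metric learning step is omitted for brevity.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def vlad_encode(local_feats, codebook):
    """Aggregate a set of local descriptors (N, D) into a single VLAD
    vector using a fitted KMeans codebook with K centres."""
    assignments = codebook.predict(local_feats)
    K, D = codebook.cluster_centers_.shape
    vlad = np.zeros((K, D))
    for k in range(K):
        members = local_feats[assignments == k]
        if len(members):
            # Sum of residuals to the assigned cluster centre.
            vlad[k] = (members - codebook.cluster_centers_[k]).sum(axis=0)
    vlad = vlad.ravel()
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))  # power normalisation
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad

# Hypothetical usage: train_descs is a list of (N_i, D) arrays of local
# spatio-temporal skeleton features, one per sequence.
# codebook = KMeans(n_clusters=32).fit(np.vstack(train_descs))
# X = np.stack([vlad_encode(d, codebook) for d in train_descs])
# clf = KNeighborsClassifier(n_neighbors=5).fit(X, train_labels)
```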
SSP-Net: Scalable sequential pyramid networks for real-time 3D human pose regression
In this paper we propose a highly scalable convolutional neural network, trainable end-to-end, for real-time 3D human pose regression from still RGB images. We call this approach Scalable Sequential Pyramid Networks (SSP-Net), as it is trained with refined supervision at multiple scales in a sequential manner. Our network requires a single training procedure and is capable of producing its best predictions at 120 frames per second (FPS), or acceptable predictions at more than 200 FPS when cut at test time. We show that the proposed regression approach is invariant to the size of feature maps, allowing our method to perform multi-resolution intermediate supervision and to reach results comparable to the state of the art with very low-resolution feature maps. We demonstrate the accuracy and effectiveness of our method through extensive experiments on two of the most important publicly available datasets for 3D pose estimation, Human3.6M and MPI-INF-3DHP. Additionally, we provide relevant insights about our decisions on the network architecture and show its flexibility to meet the best precision-speed compromise.
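The multi-scale intermediate-supervision idea can be sketched as follows: every stage of a sequential pyramid emits a supervised prediction, so the network can be cut after any stage at test time to trade accuracy for speed. The toy PyTorch model below uses hypothetical layer sizes and is not the published SSP-Net definition.

```python
import torch
import torch.nn as nn

class TinyPyramid(nn.Module):
    """Toy sequential pyramid: every stage refines the features and emits
    a pose prediction (e.g. joint heatmaps) that is supervised during
    training, enabling early exits at inference."""
    def __init__(self, num_joints=17, channels=128, num_stages=4):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 7, stride=2, padding=3)
        self.stages = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_stages)
        )
        self.heads = nn.ModuleList(
            nn.Conv2d(channels, num_joints, 1) for _ in range(num_stages)
        )

    def forward(self, x, cut_at=None):
        feats = torch.relu(self.stem(x))
        outputs = []
        for i, (stage, head) in enumerate(zip(self.stages, self.heads)):
            feats = torch.relu(stage(feats))
            outputs.append(head(feats))       # intermediate prediction
            if cut_at is not None and i + 1 == cut_at:
                break                         # early exit for speed
        return outputs

def pyramid_loss(outputs, target):
    # Refined supervision: every intermediate prediction is penalised,
    # so the network still predicts well when cut at test time.
    return sum(nn.functional.mse_loss(o, target) for o in outputs)
```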