Detection of Motorcycles in Urban Traffic Using Video Analysis: A Review

Authors: Branch John W.; Espinosa Jorge E.; Velastin Carroza Sergio Alejandro
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 09/06/2020

Motorcycles are Vulnerable Road Users (VRUs) and, together with bicycles and pedestrians, they are the traffic actors most affected by accidents in urban areas. Automatic video processing for urban surveillance cameras has the potential to detect and track these road users effectively. The present review focuses on algorithms used for detection and tracking of motorcycles, using the surveillance infrastructure provided by CCTV cameras. Given the importance of the results achieved by Deep Learning in the field of computer vision, the use of such techniques for detection and tracking of motorcycles is also reviewed. The paper ends by describing the performance measures generally used and the publicly available datasets (introducing the Urban Motorbike Dataset (UMD) with quantitative evaluation results for different detectors), discussing the challenges ahead, and presenting a set of conclusions with proposed future work in this evolving area.

Universidad Carlos III de Madrid e-Archivo

3D Object Representations for Recognition.

Author: Xiang Yu

Object recognition from images is a longstanding and challenging problem in computer vision. The main challenge is that the appearance of objects in images is affected by a number of factors, such as illumination, scale, camera viewpoint, intra-class variability, occlusion, truncation, and so on. How to handle all these factors in object recognition is still an open problem. In this dissertation, I present my efforts in building 3D object representations for object recognition. Compared to 2D appearance-based object representations, 3D object representations can capture the 3D nature of objects and better handle viewpoint variation, occlusion and truncation in object recognition. I introduce three new 3D object representations: the 3D aspect part representation, the 3D aspectlet representation and the 3D voxel pattern representation. These representations are built to handle different challenging factors in object recognition. The 3D aspect part representation is able to capture the appearance change of object categories due to viewpoint transformation. The 3D aspectlet representation and the 3D voxel pattern representation are designed to handle occlusions between objects in addition to viewpoint change. Based on these representations, we propose new object recognition methods and conduct experiments on benchmark datasets to verify the advantages of our methods. Furthermore, we introduce PASCAL3D+, a new large-scale dataset for 3D object recognition built by aligning objects in images with 3D CAD models. We also propose two novel methods to tackle object co-detection and multiview object tracking using our 3D aspect part representation, and a novel Convolutional Neural Network-based approach for object detection using our 3D voxel pattern representation. In order to track multiple objects in videos, we introduce a new online multi-object tracking framework based on Markov Decision Processes. Lastly, I conclude the dissertation and discuss future steps for 3D object recognition.

PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/120836/1/yuxiang_1.pd

Deep Blue Documents at the University of Michigan

Towards human interaction analysis

Author: Gavari Maedeh Aghaei
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2013

Modeling and recognizing human behaviors in a visual surveillance task is receiving increasing attention from computer vision and machine learning researchers. Such a system should deal in particular with detecting when interactions between people occur and classifying the type of interaction. In this work we study a flexible model for detecting human interactions. This is done by detecting the people in the scene and retrieving their corresponding pose and position sequentially in each frame of the video. To achieve this goal, our work relies on a robust object detection algorithm, based on discriminatively trained part-based models, to detect the human bodies in videos. We apply a Gaussian Mixture Model-based method for background subtraction and human segmentation. The output of the segmentation method, which is the labeled human body, is combined with the background subtraction result to obtain a bounding box around each person in the images and so improve human body pose detection. To obtain more precise pose detection models, we trained the algorithm on a large, challenging but reliable dataset (PASCAL 2010). Our method is applied to a home-made database comprising depth data from Kinect sensors. Once the label, pose and position of each person have been obtained for every image sequence, an understanding of human motion follows naturally, which is an important step towards human interaction analysis.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Learning Birds-Eye View Representations for Autonomous Driving

Author: Roddick Thomas
Publication venue: University of Cambridge
Publication date: 19/03/2021

Over the past few years, progress towards the ambitious goal of widespread fully-autonomous vehicles on our roads has accelerated dramatically. This progress has been spurred largely by the success of highly accurate LiDAR sensors, as well as the use of detailed high-resolution maps, which together allow a vehicle to navigate its surroundings effectively. Often, however, one or both of these resources may be unavailable, whether due to cost, sensor failure, or the need to operate in an unmapped environment. The aim of this thesis is therefore to demonstrate that it is possible to build detailed three-dimensional representations of traffic scenes using only 2D monocular camera images as input. Such an approach faces many challenges, most notably that 2D images do not provide explicit 3D structure. We overcome this limitation by applying a combination of deep learning and geometry to transform image-based features into an orthographic birds-eye view representation of the scene, allowing algorithms to reason in a metric, 3D space. This approach is applied to solving two challenging perception tasks central to autonomous driving. The first part of this thesis addresses the problem of monocular 3D object detection, which involves determining the size and location of all objects in the scene. Our solution was based on a novel convolutional network architecture that processed features in both the image and birds-eye view perspectives. Results on the KITTI dataset showed that this network outperformed existing works at the time, and although more recent works have improved on these results, we conducted extensive analysis to find that our solution performed well in many difficult edge-case scenarios such as objects close to or distant from the camera. In the second part of the thesis, we consider the related problem of semantic map prediction. This consists of estimating a birds-eye view map of the world visible from a given camera, encoding both static elements of the scene such as pavement and road layout, and dynamic objects such as vehicles and pedestrians. This was accomplished using a second network that built on the experience from the previous work and achieved convincing performance on two real-world driving datasets. By formulating the maps as occupancy grid maps (a widely used representation from robotics), we were able to demonstrate how predictions could be accumulated across multiple frames, and that doing so further improved the robustness of the maps produced by our system.

Toyota Motors Europ
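The accumulation of occupancy predictions across frames mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard log-odds Bayesian update from robotics and per-frame grids that are already aligned to a common reference frame; the function name `fuse_occupancy` and the clamping constant are hypothetical, and the thesis itself may fuse predictions differently.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(value):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-value))


def fuse_occupancy(frames, prior=0.5, eps=1e-6):
    """Fuse aligned per-frame occupancy probability grids.

    frames: list of 2D lists of probabilities in [0, 1], all the same shape.
    Assumes each frame is an independent observation of the same cells
    (an illustrative simplification, not a claim about the thesis).
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    prior_l = logit(prior)
    acc = [[prior_l] * cols for _ in range(rows)]
    for grid in frames:
        for r in range(rows):
            for c in range(cols):
                # Clamp to avoid infinite log-odds at exactly 0 or 1.
                p = min(max(grid[r][c], eps), 1.0 - eps)
                acc[r][c] += logit(p) - prior_l
    return [[sigmoid(v) for v in row] for row in acc]


if __name__ == "__main__":
    frame_a = [[0.7, 0.5], [0.7, 0.2]]
    frame_b = [[0.7, 0.5], [0.3, 0.2]]
    for row in fuse_occupancy([frame_a, frame_b]):
        print([round(p, 3) for p in row])
```

Cells that several frames agree on move towards 0 or 1, while conflicting or uninformative observations stay near the prior, which is one way repeated predictions can make a map more robust.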

Apollo (Cambridge)

Sensor fusion in driving assistance systems

Author: Ponz Vila Aurelio
Publication date: 01/01/2017

International Mention in the doctoral degree.

Life in developed and developing countries is highly dependent on road and urban motor transport. This activity involves a high cost for its active and passive users in terms of pollution and accidents, which are largely attributable to the human factor. New developments in safety and driving assistance, called Advanced Driving Assistance Systems (ADAS), are intended to improve safety in transportation and, in the mid-term, lead to autonomous driving. ADAS, like human driving, are based on sensors that provide information about the environment, and sensor reliability is crucial for ADAS applications in the same way that sensing abilities are crucial for human driving. One way to improve sensor reliability is Sensor Fusion: developing novel strategies for modeling the driving environment with several sensors and obtaining enhanced information from the combination of the available data. The present thesis is intended to offer a novel solution for obstacle detection and classification in automotive applications using sensor fusion with two sensors that are widely available on the market: the visible spectrum camera and the laser scanner. Cameras and lasers are commonly used sensors in the scientific literature, increasingly affordable and ready to be deployed in real-world applications. The proposed solution provides detection and classification of some obstacles commonly present on the road, such as pedestrians and cyclists. Novel approaches for detection and classification have been explored in this thesis, from classification of point cloud clusters obtained from the laser scanner, to domain adaptation techniques for synthetic image dataset creation, including intelligent cluster extraction and ground detection and removal from point clouds.

Official Doctoral Programme in Electrical, Electronic and Automatic Engineering. President: Cristina Olaverri Monreal. Secretary: Arturo de la Escalera Hueso. Member: José Eugenio Naranjo Hernánde

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Perception and Prediction in Multi-Agent Urban Traffic Scenarios for Autonomous Driving

Author: Bhattacharyya Prarthana
Publication venue: University of Waterloo
Publication date: 15/09/2023

In multi-agent urban scenarios, autonomous vehicles navigate an intricate network of interactions with a variety of agents, necessitating advanced perception modeling and trajectory prediction. Research to improve perception modeling and trajectory prediction in autonomous vehicles is fundamental to enhancing safety and efficiency in complex driving scenarios. Better data association for 3D multi-object tracking ensures consistent identification and tracking of multiple objects over time, which is crucial in crowded urban environments to avoid mis-identifications that can lead to unsafe maneuvers or collisions. Effective context modeling for 3D object detection aids in interpreting complex scenes, dealing with challenges such as noisy or missing points in sensor data and occlusions. It enables the system to infer properties of partially observed or obscured objects, enhancing the robustness of the autonomous system in varying conditions. Furthermore, improved trajectory prediction of surrounding vehicles allows an autonomous vehicle to anticipate future actions of other road agents and adapt accordingly, which is crucial in scenarios like merging lanes, making unprotected turns, or navigating intersections. In essence, these research directions are key to mitigating risks in autonomous driving and facilitating seamless interaction with other road users. In Part I, we address the task of improving perception modeling for AV systems. Concretely, our contributions are: (i) FANTrack introduces a novel application of Convolutional Neural Networks (CNNs) for real-time 3D Multi-object Tracking (MOT) in autonomous driving, addressing challenges such as a varying number of targets, track fragmentation, and noisy detections, thereby enhancing the accuracy of perception capabilities for safe and efficient navigation. (ii) FANTrack proposes to leverage both visual and 3D bounding box data, utilizing Siamese networks and hard-mining, to enhance the similarity functions used in data association for 3D MOT. (iii) SA-Det3D introduces a globally-adaptive Full Self-Attention (FSA) module for enhanced feature extraction in 3D object detection, overcoming the limitations of traditional convolution-based techniques by facilitating adaptive context aggregation from entire point-cloud data, thereby bolstering perception modeling in autonomous driving. (iv) SA-Det3D also introduces the Deformable Self-Attention (DSA) module, a scalable adaptation for global context assimilation in large-scale point-cloud datasets, designed to select and focus on the most informative regions, thereby improving the quality of feature descriptors and perception modeling in autonomous driving. In Part II, we focus on the task of improving trajectory prediction of surrounding agents. Concretely, our contributions are: (i) SSL-Lanes introduces a self-supervised learning approach for motion forecasting in autonomous driving that enhances accuracy and generalizability without compromising inference speed or model simplicity, utilizing pseudo-labels from pretext tasks for learning transferable motion patterns. (ii) The second contribution in SSL-Lanes is the design of comprehensive experiments to demonstrate that SSL-Lanes can yield more generalizable and robust trajectory predictions than traditional supervised learning approaches. (iii) SSL-Interactions presents a new framework that utilizes pretext tasks to enhance interaction modeling for trajectory prediction in autonomous driving. (iv) SSL-Interactions advances the prediction of agent trajectories in interaction-centric scenarios by creating a curated dataset that explicitly labels meaningful interactions, thus enabling the effective training of a predictor utilizing pretext tasks and enhancing the modeling of agent-agent interactions in autonomous driving environments.
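To make the data-association step in multi-object tracking concrete, here is a minimal sketch of what happens once similarity scores between existing tracks and new detections are available. It is not the FANTrack method: the abstract describes similarity functions learned with Siamese networks, whereas this sketch takes the scores as given and substitutes a simple greedy matcher. The function name `greedy_associate` and the threshold value are hypothetical.

```python
def greedy_associate(similarity, threshold=0.5):
    """Greedily match tracks (rows) to detections (columns).

    similarity[t][d] is an assumed, precomputed score in [0, 1].
    Pairs are accepted in order of decreasing score; each track and each
    detection is used at most once, and pairs below threshold are ignored.
    Returns (matches, unmatched_tracks, unmatched_detections).
    """
    candidates = sorted(
        (
            (score, t, d)
            for t, row in enumerate(similarity)
            for d, score in enumerate(row)
            if score >= threshold
        ),
        reverse=True,
    )
    matches, used_tracks, used_dets = [], set(), set()
    for _score, t, d in candidates:
        if t in used_tracks or d in used_dets:
            continue
        matches.append((t, d))
        used_tracks.add(t)
        used_dets.add(d)
    n_dets = len(similarity[0]) if similarity else 0
    unmatched_tracks = [t for t in range(len(similarity)) if t not in used_tracks]
    unmatched_dets = [d for d in range(n_dets) if d not in used_dets]
    return sorted(matches), unmatched_tracks, unmatched_dets


if __name__ == "__main__":
    scores = [
        [0.9, 0.2, 0.1],
        [0.8, 0.7, 0.3],
    ]
    print(greedy_associate(scores))
```

Unmatched detections would typically start new tracks and unmatched tracks would be candidates for termination, which is how a tracker copes with a varying number of targets.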

University of Waterloo's Institutional Repository