203 research outputs found
3D Object Detection and High-Resolution Traffic Parameters Extraction Using Low-Resolution LiDAR Data
Traffic volume data collection is a crucial aspect of transportation
engineering and urban planning, as it provides vital insights into traffic
patterns, congestion, and infrastructure efficiency. Traditional manual methods
of traffic data collection are both time-consuming and costly. However, the
emergence of modern technologies, particularly Light Detection and Ranging
(LiDAR), has revolutionized the process by enabling efficient and accurate data
collection. Despite the benefits of using LiDAR for traffic data collection,
previous studies have identified two major limitations that have impeded its
widespread adoption. These are the need for multiple LiDAR systems to obtain
complete point cloud information of objects of interest, as well as the
labor-intensive process of annotating 3D bounding boxes for object detection
tasks. In response to these challenges, the current study proposes an
innovative framework that alleviates the need for multiple LiDAR systems and
simplifies the laborious 3D annotation process. To achieve this goal, the study
employed a single LiDAR system, that aims at reducing the data acquisition cost
and addressed its accompanying limitation of missing point cloud information by
developing a Point Cloud Completion (PCC) framework to fill in missing point
cloud information using point density. Furthermore, we also used zero-shot
learning techniques to detect vehicles and pedestrians, as well as proposed a
unique framework for extracting low to high features from the object of
interest, such as height, acceleration, and speed. Using the 2D bounding box
detection and extracted height information, this study is able to generate 3D
bounding boxes automatically without human intervention.Comment: 19 pages, 11 figures. This paper has been submitted for consideration
for presentation at the 103rd Annual Meeting of the Transportation Research
Board, January 202
A Comprehensive Review on Computer Vision Analysis of Aerial Data
With the emergence of new technologies in the field of airborne platforms and
imaging sensors, aerial data analysis is becoming very popular, capitalizing on
its advantages over land data. This paper presents a comprehensive review of
the computer vision tasks within the domain of aerial data analysis. While
addressing fundamental aspects such as object detection and tracking, the
primary focus is on pivotal tasks like change detection, object segmentation,
and scene-level analysis. The paper provides the comparison of various hyper
parameters employed across diverse architectures and tasks. A substantial
section is dedicated to an in-depth discussion on libraries, their
categorization, and their relevance to different domain expertise. The paper
encompasses aerial datasets, the architectural nuances adopted, and the
evaluation metrics associated with all the tasks in aerial data analysis.
Applications of computer vision tasks in aerial data across different domains
are explored, with case studies providing further insights. The paper
thoroughly examines the challenges inherent in aerial data analysis, offering
practical solutions. Additionally, unresolved issues of significance are
identified, paving the way for future research directions in the field of
aerial data analysis.Comment: 112 page
New horizons in the study of child language acquisition
URL to paper on conference site.Naturalistic longitudinal recordings of child development promise to reveal fresh perspectives on fundamental questions of language acquisition. In a pilot effort, we have recorded 230,000 hours of audio-video recordings spanning the first three years of one child's life at home. To study a corpus of this scale and richness, current methods of developmental cognitive science are inadequate. We are developing new methods for data analysis and interpretation that combine pattern recognition algorithms with interactive user interfaces and data visualization. Preliminary speech analysis reveals surprising levels of linguistic fine-tuning by caregivers that may provide crucial support for word learning. Ongoing analyses of the corpus aim to model detailed aspects of the child's language development as a function of learning mechanisms combined with lifetime experience. Plans to collect similar corpora from more children based on a transportable recording system are underway.National Science Foundation (U.S.)MIT Center for Future BankingMassachusetts Institute of Technology. Media LaboratoryUnited States. Office of Naval ResearchUnited States. Dept. of Defens
Application of 2D Homography for High Resolution Traffic Data Collection using CCTV Cameras
Traffic cameras remain the primary source data for surveillance activities
such as congestion and incident monitoring. To date, State agencies continue to
rely on manual effort to extract data from networked cameras due to limitations
of the current automatic vision systems including requirements for complex
camera calibration and inability to generate high resolution data. This study
implements a three-stage video analytics framework for extracting
high-resolution traffic data such vehicle counts, speed, and acceleration from
infrastructure-mounted CCTV cameras. The key components of the framework
include object recognition, perspective transformation, and vehicle trajectory
reconstruction for traffic data collection. First, a state-of-the-art vehicle
recognition model is implemented to detect and classify vehicles. Next, to
correct for camera distortion and reduce partial occlusion, an algorithm
inspired by two-point linear perspective is utilized to extracts the region of
interest (ROI) automatically, while a 2D homography technique transforms the
CCTV view to bird's-eye view (BEV). Cameras are calibrated with a two-layer
matrix system to enable the extraction of speed and acceleration by converting
image coordinates to real-world measurements. Individual vehicle trajectories
are constructed and compared in BEV using two time-space-feature-based object
trackers, namely Motpy and BYTETrack. The results of the current study showed
about +/- 4.5% error rate for directional traffic counts, less than 10% MSE for
speed bias between camera estimates in comparison to estimates from probe data
sources. Extracting high-resolution data from traffic cameras has several
implications, ranging from improvements in traffic management and identify
dangerous driving behavior, high-risk areas for accidents, and other safety
concerns, enabling proactive measures to reduce accidents and fatalities.Comment: 25 pages, 9 figures, this paper was submitted for consideration for
presentation at the 102nd Annual Meeting of the Transportation Research
Board, January 202
Lidar-based scene understanding for autonomous driving using deep learning
With over 1.35 million fatalities related to traffic accidents worldwide, autonomous driving was foreseen at the beginning of this century as a feasible solution to improve security in our roads. Nevertheless, it is meant to disrupt our transportation paradigm, allowing to reduce congestion, pollution, and costs, while increasing the accessibility, efficiency, and reliability of the transportation for both people and goods. Although some advances have gradually been transferred into commercial vehicles in the way of Advanced Driving Assistance Systems (ADAS) such as adaptive cruise control, blind spot detection or automatic parking, however, the technology is far from mature. A full understanding of the scene is actually needed so that allowing the vehicles to be aware of the surroundings, knowing the existing elements of the scene, as well as their motion, intentions and interactions.
In this PhD dissertation, we explore new approaches for understanding driving scenes from 3D LiDAR point clouds by using Deep Learning methods. To this end, in Part I we analyze the scene from a static perspective using independent frames to detect the neighboring vehicles. Next, in Part II we develop new ways for understanding the dynamics of the scene. Finally, in Part III we apply all the developed methods to accomplish higher level challenges such as segmenting moving obstacles while obtaining their rigid motion vector over the ground.
More specifically, in Chapter 2 we develop a 3D vehicle detection pipeline based on a multi-branch deep-learning architecture and propose a Front (FR-V) and a Bird’s Eye view (BE-V) as 2D representations of the 3D point cloud to serve as input for training our models. Later on, in Chapter 3 we apply and further test this method on two real uses-cases, for pre-filtering moving
obstacles while creating maps to better localize ourselves on subsequent days, as well as for vehicle tracking. From the dynamic perspective, in Chapter 4 we learn from the 3D point cloud a novel dynamic feature that resembles optical flow from RGB images. For that, we develop a new approach to leverage RGB optical flow as pseudo ground truth for training purposes but allowing the use of only 3D LiDAR data at inference time. Additionally, in Chapter 5 we explore the benefits of combining classification and regression learning problems to face the optical flow estimation task in a joint coarse-and-fine manner. Lastly, in Chapter 6 we gather the previous methods and demonstrate that with these independent tasks we can guide the learning of higher challenging problems such as segmentation and motion estimation of moving vehicles from our own moving perspective.Con más de 1,35 millones de muertes por accidentes de tráfico en el mundo, a principios de siglo se predijo que la conducción autónoma sería una solución viable para mejorar la seguridad en nuestras carreteras. Además la conducción autónoma está destinada a cambiar nuestros paradigmas de transporte, permitiendo reducir la congestión del tráfico, la contaminación y el coste, a la vez que aumentando la accesibilidad, la eficiencia y confiabilidad del transporte tanto de personas como de mercancías. Aunque algunos avances, como el control de crucero adaptativo, la detección de puntos ciegos o el estacionamiento automático, se han transferido gradualmente a vehículos comerciales en la forma de los Sistemas Avanzados de Asistencia a la Conducción (ADAS), la tecnología aún no ha alcanzado el suficiente grado de madurez. Se necesita una comprensión completa de la escena para que los vehículos puedan entender el entorno, detectando los elementos presentes, así como su movimiento, intenciones e interacciones. En la presente tesis doctoral, exploramos nuevos enfoques para comprender escenarios de conducción utilizando nubes de puntos en 3D capturadas con sensores LiDAR, para lo cual empleamos métodos de aprendizaje profundo. Con este fin, en la Parte I analizamos la escena desde una perspectiva estática para detectar vehículos. A continuación, en la Parte II, desarrollamos nuevas formas de entender las dinámicas del entorno. Finalmente, en la Parte III aplicamos los métodos previamente desarrollados para lograr desafíos de nivel superior, como segmentar obstáculos dinámicos a la vez que estimamos su vector de movimiento sobre el suelo. Específicamente, en el Capítulo 2 detectamos vehículos en 3D creando una arquitectura de aprendizaje profundo de dos ramas y proponemos una vista frontal (FR-V) y una vista de pájaro (BE-V) como representaciones 2D de la nube de puntos 3D que sirven como entrada para entrenar nuestros modelos. Más adelante, en el Capítulo 3 aplicamos y probamos aún más este método en dos casos de uso reales, tanto para filtrar obstáculos en movimiento previamente a la creación de mapas sobre los que poder localizarnos mejor en los días posteriores, como para el seguimiento de vehículos. Desde la perspectiva dinámica, en el Capítulo 4 aprendemos de la nube de puntos en 3D una característica dinámica novedosa que se asemeja al flujo óptico sobre imágenes RGB. Para ello, desarrollamos un nuevo enfoque que aprovecha el flujo óptico RGB como pseudo muestras reales para entrenamiento, usando solo information 3D durante la inferencia. Además, en el Capítulo 5 exploramos los beneficios de combinar los aprendizajes de problemas de clasificación y regresión para la tarea de estimación de flujo óptico de manera conjunta. Por último, en el Capítulo 6 reunimos los métodos anteriores y demostramos que con estas tareas independientes podemos guiar el aprendizaje de problemas de más alto nivel, como la segmentación y estimación del movimiento de vehículos desde nuestra propia perspectivaAmb més d’1,35 milions de morts per accidents de trànsit al món, a principis de segle es va
predir que la conducció autònoma es convertiria en una solució viable per millorar la seguretat
a les nostres carreteres. D’altra banda, la conducció autònoma està destinada a canviar els
paradigmes del transport, fent possible així reduir la densitat del trànsit, la contaminació i
el cost, alhora que augmentant l’accessibilitat, l’eficiència i la confiança del transport tant de
persones com de mercaderies. Encara que alguns avenços, com el control de creuer adaptatiu,
la detecció de punts cecs o l’estacionament automàtic, s’han transferit gradualment a vehicles
comercials en forma de Sistemes Avançats d’Assistència a la Conducció (ADAS), la tecnologia
encara no ha arribat a aconseguir el grau suficient de maduresa. És necessària, doncs, una
total comprensió de l’escena de manera que els vehicles puguin entendre l’entorn, detectant els
elements presents, així com el seu moviment, intencions i interaccions.
A la present tesi doctoral, explorem nous enfocaments per tal de comprendre les diferents
escenes de conducció utilitzant núvols de punts en 3D capturats amb sensors LiDAR, mitjançant
l’ús de mètodes d’aprenentatge profund. Amb aquest objectiu, a la Part I analitzem l’escena des
d’una perspectiva estàtica per a detectar vehicles. A continuació, a la Part II, desenvolupem
noves formes d’entendre les dinàmiques de l’entorn. Finalment, a la Part III apliquem els
mètodes prèviament desenvolupats per a aconseguir desafiaments d’un nivell superior, com, per
exemple, segmentar obstacles dinàmics al mateix temps que estimem el seu vector de moviment
respecte al terra.
Concretament, al Capítol 2 detectem vehicles en 3D creant una arquitectura d’aprenentatge
profund amb dues branques, i proposem una vista frontal (FR-V) i una vista d’ocell (BE-V)
com a representacions 2D del núvol de punts 3D que serveixen com a punt de partida per
entrenar els nostres models. Més endavant, al Capítol 3 apliquem i provem de nou aquest
mètode en dos casos d’ús reals, tant per filtrar obstacles en moviment prèviament a la creació
de mapes en els quals poder localitzar-nos millor en dies posteriors, com per dur a terme
el seguiment de vehicles. Des de la perspectiva dinàmica, al Capítol 4 aprenem una nova
característica dinàmica del núvol de punts en 3D que s’assembla al flux òptic sobre imatges
RGB. Per a fer-ho, desenvolupem un nou enfocament que aprofita el flux òptic RGB com pseudo
mostres reals per a entrenament, utilitzant només informació 3D durant la inferència. Després,
al Capítol 5 explorem els beneficis que s’obtenen de combinar els aprenentatges de problemes
de classificació i regressió per la tasca d’estimació de flux òptic de manera conjunta. Finalment,
al Capítol 6 posem en comú els mètodes anteriors i demostrem que mitjançant aquests processos
independents podem abordar l’aprenentatge de problemes més complexos, com la segmentació
i estimació del moviment de vehicles des de la nostra pròpia perspectiva
Label-efficient learning of LiDAR-based perception models for autonomous driving
Deep learning on 3D LiDAR point clouds is in its infancy stages, with room to grow and improve, especially in the context of automated driving systems. A considerable amount of research has been pointed at this particular application very lately as a means to boost the performance and reliability of self-driving cars. However, the quantity of data needed to supervise perception point cloud-based models is extremely large and costly to annotate. This thesis studies, evaluates and compares state-of-the-art detection networks and label efficient learning techniques, shedding some light on how to train perception models on point clouds with less annotated data
Detecting semantic concepts in digital photographs: low-level features vs. non-homogeneous data fusion
Semantic concepts, such as faces, buildings, and other real world objects, are the most preferred instrument that humans use to navigate through and retrieve visual content from large multimedia databases. Semantic annotation of visual content in large collections is therefore essential if ease of access and use is to be ensured. Classification of images into broad categories such as indoor/outdoor, building/non-building, urban/landscape, people/no-people, etc., allows us to obtain the semantic labels without the full knowledge of all objects in the scene.
Inferring the presence of high-level semantic concepts from low-level visual features is a research
topic that has been attracting a significant amount of interest lately. However, the power of lowlevel visual features alone has been shown to be limited when faced with the task of semantic scene classification in heterogeneous, unconstrained, broad-topic image collections. Multi-modal fusion or combination of information from different modalities has been identified as one possible way of overcoming the limitations of single-mode approaches. In the field of digital photography, the incorporation of readily available camera metadata, i.e. information about the image capture conditions stored in the EXIF header of each image, along with the GPS information, offers a way to move towards a better understanding of the imaged scene.
In this thesis we focus on detection of semantic concepts such as artificial text in video and large buildings in digital photographs, and examine how fusion of low-level visual features with selected camera metadata, using a Support Vector Machine as an integration device, affects the performance of the building detector in a genuine personal photo collection. We implemented two approaches to detection of buildings that combine content-based and the context-based information, and an approach to indoor/outdoor classification based exclusively on camera metadata. An outdoor detection rate of 85.6% was obtained using camera metadata only. The first approach to building detection, based on simple edge orientation-based features extracted at three different scales, has been tested on a dataset of 1720 outdoor images, with a classification accuracy of 88.22%. The second approach integrates the edge orientation-based features with the camera metadata-based features, both at the feature and at the decision level. The fusion approaches have been evaluated using an unconstrained dataset of 8000 genuine consumer photographs. The experiments demonstrate that the fusion approaches outperform the visual features-only approach by of 2-3% on average regardless of the operating point chosen, while all the performance measures are approximately 4% below the upper limit of performance. The early fusion approach consistently improves all performance measures
Dashcam-Enabled Deep Learning Applications for Airport Runway Pavement Distress Detection
23-8193Pavement distress detection plays a vital role in ensuring the safety and longevity of runway infrastructure. This project presents a comprehensive approach to automate distress detection and geolocation on runway pavement using state-of-the-art deep learning techniques. A Faster R-CNN model is trained to accurately identify and classify various distress types, including longitudinal and transverse cracking, weathering, rutting, and depression. The developed model is deployed on a dataset of high-resolution dashcam images captured along the runway, allowing for real-time detection of distresses. Geolocation techniques are employed to accurately map the distresses onto the runway pavement in real-world coordinates. The system implementation and deployment are discussed, emphasizing the importance of a seamless integration into existing infrastructure. The developed distress detection system offers significant benefits to the Utah Department of Transportation (UDOT) by enabling proactive maintenance planning, optimizing resource allocation, and enhancing runway management capabilities. Future potential for advanced distress analysis, integration with other data sources, and continuous model improvement are also explored. The project showcases the potential of low-cost dashcam solutions combined with deep learning for efficient and cost-effective runway distress detection and management
Progress toward an understanding of cortical computation
The additional data, perspectives, questions, and criticisms contributed by the commentaries strengthen our view that local cortical processors coordinate their activity with the context in which it occurs using contextual fields and synchronized population codes. We therefore predict that whereas the specialization of function has been the keynote of this century the coordination of function will be the keynote of the next
- …