Detection of Motorcycles in Urban Traffic Using Video Analysis: A Review
Motorcycles are Vulnerable Road Users (VRUs). Together with bicycles and pedestrians, they are the traffic actors most affected by accidents in urban areas. Automatic video processing of urban surveillance camera footage has the potential to detect and track these road users effectively. The present review focuses on algorithms for the detection and tracking of motorcycles using the surveillance infrastructure provided by CCTV cameras. Given the importance of the results achieved by deep learning in computer vision, the use of such techniques for motorcycle detection and tracking is also reviewed. The paper ends by describing the performance measures generally used and the publicly available datasets (introducing the Urban Motorbike Dataset (UMD), with quantitative evaluation results for different detectors), discussing the challenges ahead, and presenting a set of conclusions with proposed future work in this evolving area.
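The review describes the performance measures generally used for detectors. As a hedged illustration only (the review's own evaluation protocol may differ), the sketch below computes precision and recall for a single image by greedily matching detections to ground-truth boxes at an intersection-over-union (IoU) threshold. The function names, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are assumptions made for the example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2), x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(dets: List[Tuple[Box, float]], gts: List[Box], thr: float = 0.5):
    """Greedy matching: higher-scoring detections claim ground-truth boxes first.

    Each ground-truth box can be matched at most once; unmatched detections are
    false positives and unmatched ground-truth boxes are false negatives.
    """
    matched = set()
    tp = 0
    for box, _score in sorted(dets, key=lambda d: d[1], reverse=True):
        best_j, best_iou = -1, thr
        for j, g in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, g)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    fp, fn = len(dets) - tp, len(gts) - tp
    precision = tp / (tp + fp) if dets else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    return precision, recall
```

Benchmark protocols such as average precision repeat this matching across score thresholds (and sometimes several IoU thresholds); the single-threshold version above only shows the core matching step.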
Learning visual representations with deep neural networks for intelligent transportation systems problems
This thesis focuses on two major problems in the area of intelligent transportation systems (ITS): vehicle counting in traffic congestion scenes, and the simultaneous detection and viewpoint estimation of the objects in a scene.
Regarding the counting problem, this work first focuses on the design of deep neural network architectures able to learn deep multi-scale representations that accurately estimate object counts by means of density maps. It also addresses the problem of object scale introduced by the strong perspective typically present in object counting. Furthermore, building on the success of deep hourglass networks in object counting, this work proposes a new type of deep hourglass network with self-managed short-cut connections. The proposed models are evaluated on the most widely used public datasets and achieve results equal to or better than the state of the art at the time they were published.
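Counting by density maps rests on one property: the ground-truth map places a unit of mass at every annotated object, so its sum equals the object count, and a network trained to regress the map yields a count by summing its output. The sketch below builds such a ground-truth map with a fixed Gaussian kernel. The kernel size, `sigma`, and the function names are assumptions for illustration; the thesis's actual target construction (for example, how it treats perspective) may differ.

```python
import numpy as np


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Normalised 2-D Gaussian kernel (size should be odd)."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def density_map(shape, points, size: int = 15, sigma: float = 4.0) -> np.ndarray:
    """Ground-truth density map: one unit of mass per annotated (x, y) point."""
    h, w = shape
    dmap = np.zeros((h, w))
    k = gaussian_kernel(size, sigma)
    r = size // 2
    for x, y in points:
        x, y = int(x), int(y)
        # Clip the kernel at the image border ...
        x0, x1 = max(0, x - r), min(w, x + r + 1)
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        patch = k[y0 - (y - r):y1 - (y - r), x0 - (x - r):x1 - (x - r)]
        # ... and renormalise so that each object still contributes exactly 1.
        dmap[y0:y1, x0:x1] += patch / patch.sum()
    return dmap
```

The estimated count for an image is then simply `density_map(...).sum()` for the ground truth, or the sum of the network's predicted map at inference time.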
For the second part, a thorough comparative study of the problem of simultaneous object detection and pose estimation is carried out. The trade-off between object localization and pose estimation is exposed: a detector ideally needs a viewpoint-invariant representation, whereas a pose estimator needs a discriminative one. Therefore, three new deep neural network architectures are proposed in which the object detection and pose estimation problems are progressively decoupled. In addition, the question of whether the pose should be expressed as a discrete or a continuous value is addressed. Although both offer similar performance, the results show that continuous approaches are more sensitive to the bias of the dominant viewpoint of the object category. A detailed comparative analysis is carried out on the two main datasets, PASCAL3D+ and ObjectNet3D. All the proposed models achieve competitive results on both datasets.
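To make the discrete-versus-continuous question concrete, the sketch below shows two common ways of encoding an azimuth angle as a learning target: as one of `n_bins` classes, or as the continuous pair (cos θ, sin θ) decoded with `atan2`. These encodings and all names are illustrative assumptions; the thesis's exact parametrisations are not given here.

```python
import math


def to_bin(azimuth_deg: float, n_bins: int) -> int:
    """Discrete target: class index, with bin 0 centred on 0 degrees."""
    width = 360.0 / n_bins
    return int(((azimuth_deg + width / 2.0) % 360.0) // width)


def from_bin(b: int, n_bins: int) -> float:
    """Decode a class index to its bin centre in degrees."""
    return (b * 360.0 / n_bins) % 360.0


def to_continuous(azimuth_deg: float):
    """Continuous target: (cos, sin) avoids the 0/360 degree discontinuity."""
    r = math.radians(azimuth_deg)
    return math.cos(r), math.sin(r)


def from_continuous(c: float, s: float) -> float:
    """Decode a (possibly unnormalised) (cos, sin) prediction to degrees."""
    return math.degrees(math.atan2(s, c)) % 360.0


def angular_error(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)
```

With 8 bins, an object at 350 degrees is assigned to bin 0 and decoded as 0 degrees (a 10-degree quantisation error), whereas the continuous encoding recovers 350 degrees exactly; the discrete variant trades this quantisation error for a classification loss.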
Latent Dependency Mining for Solving Regression Problems in Computer Vision
Regression-based frameworks, which learn a direct mapping between low-level imagery features and continuous vector- or scalar-valued labels, have been widely exploited in computer vision, e.g. in crowd counting, age estimation and human pose estimation. In the last decade, computer vision researchers have devoted many efforts to better regression fitting. Nevertheless, solving these computer vision problems with regression frameworks has remained a formidable challenge due to 1) feature variation and 2) imbalanced and sparse data. On the one hand, large feature variation can be caused by changes in extrinsic conditions (e.g. images taken under different lighting conditions and viewing angles) and also in intrinsic conditions (e.g. the different aging processes of different persons in age estimation, and inter-object occlusion in crowd density estimation). On the other hand, imbalanced and sparse data distributions can also have an important effect on regression performance. These two challenges are related, in the sense that the feature inconsistency problem is compounded by sparse and imbalanced training data and vice versa, so they need to be tackled jointly in modelling and explicitly in representation. Firstly, this thesis mines an intermediary feature representation that concatenates spatially localised features, so that information is shared between neighbouring localised cells in the frames. Secondly, it introduces the cumulative attribute concept, constructed for learning a regression model by exploiting the latent cumulative dependent nature of the label space in regression, with applications to facial age and crowd density estimation. Thirdly, it demonstrates the effectiveness of a discriminative structured-output regression framework that learns the inherent latent correlation between the elements of the output variables, applied to 2D human upper-body pose estimation. The effectiveness of the proposed regression frameworks for crowd counting, age estimation, and human pose estimation is validated on public benchmarks.
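The cumulative attribute idea can be illustrated by how a scalar label is re-encoded. In the sketch below, an integer label y (an age or a crowd count) in a range of size `K` becomes a binary vector whose first y entries are 1, so neighbouring labels share most of their attributes and the Hamming distance between two codes equals the label difference; a one-hot code, by contrast, puts every pair of distinct labels at the same distance. This is a minimal sketch under those assumptions: the function names and the clip-and-sum decoding are illustrative, and the thesis's regression from features to this attribute space, and from attributes back to the label, is not reproduced.

```python
import numpy as np


def cumulative_attributes(y: int, K: int) -> np.ndarray:
    """Encode label y in [0, K] as a_k = 1 for k < y, else 0 (length K)."""
    return (np.arange(K) < y).astype(float)


def decode(a: np.ndarray) -> float:
    """Recover a label estimate from (possibly noisy) attribute predictions."""
    return float(np.clip(a, 0.0, 1.0).sum())


def one_hot(y: int, K: int) -> np.ndarray:
    """Baseline encoding for comparison: no notion of label proximity."""
    v = np.zeros(K)
    v[y] = 1.0
    return v
```

For example, the codes for labels 30 and 33 differ in exactly 3 attributes, while their one-hot codes differ in 2 positions, just as 30 and 90 would. It is this shared structure across nearby labels that makes the encoding useful when training data are sparse or imbalanced.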
Bounding Box-Free Instance Segmentation Using Semi-Supervised Learning for Generating a City-Scale Vehicle Dataset
Vehicle classification is a hot computer vision topic, with studies ranging from ground-view to top-view imagery. In remote sensing, top-view images make it possible to understand city patterns, vehicle concentration, traffic management, and more. However, there are some difficulties when aiming for pixel-wise classification: (a) most vehicle classification studies use object detection methods, and most publicly available datasets are designed for this task, (b) creating instance segmentation datasets is laborious, and (c) traditional instance segmentation methods underperform on this task because the objects are small. Thus, the present research objectives are: (1) to propose a novel semi-supervised iterative learning approach using GIS software, (2) to propose a box-free instance segmentation approach, and (3) to provide a city-scale vehicle dataset. The iterative learning procedure consisted of: (1) labeling a small number of vehicles, (2) training on those samples, (3) using the model to classify the entire image, (4) converting the image prediction into a polygon shapefile, (5) correcting some areas with errors and including them in the training data, and (6) repeating until the results are satisfactory. To separate instances, we considered vehicle interiors and vehicle borders, and the DL model was U-Net with an EfficientNet-B7 backbone. When the borders are removed, each vehicle interior becomes isolated, allowing for unique object identification. To recover the deleted 1-pixel borders, we proposed a simple method to expand each prediction. The results show better pixel-wise metrics than Mask R-CNN (82% versus 67% IoU). In the per-object analysis, the overall accuracy, precision, and recall were greater than 90%. This pipeline applies to any remote sensing target and is very efficient both for segmentation and for generating datasets.
Comment: 38 pages, 10 figures, submitted to journal
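The abstract separates instances by predicting interiors and borders, dropping the borders so that interiors become isolated, and then expanding each prediction to recover the 1-pixel border. The authors' exact expansion method is not described here, so the sketch below is a hypothetical stand-in: 4-connected component labelling of the interior mask, followed by a one-pixel growth of each label into unlabelled neighbours. Both function names are illustrative, and a real pipeline would likely restrict growth to pixels predicted as border rather than to all background.

```python
from collections import deque

import numpy as np


def label_interiors(interior: np.ndarray) -> np.ndarray:
    """4-connected component labelling of a boolean 'vehicle interior' mask."""
    h, w = interior.shape
    labels = np.zeros((h, w), dtype=int)
    next_label = 0
    for i in range(h):
        for j in range(w):
            if interior[i, j] and labels[i, j] == 0:
                next_label += 1
                labels[i, j] = next_label
                queue = deque([(i, j)])
                while queue:  # breadth-first flood fill of one instance
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and interior[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = next_label
                            queue.append((ny, nx))
    return labels


def expand_one_pixel(labels: np.ndarray) -> np.ndarray:
    """Grow every instance by one pixel (4-connectivity) into unlabelled pixels.

    Existing labels are never overwritten; if two instances compete for the
    same background pixel, the neighbour direction checked first wins.
    """
    h, w = labels.shape
    padded = np.pad(labels, 1)  # zero border, so shifts never wrap around
    out = labels.copy()
    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        # neigh[y, x] == labels[y + dy, x + dx], or 0 outside the image
        neigh = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        out = np.where((out == 0) & (neigh > 0), neigh, out)
    return out
```

Because the neighbour labels are always read from the original `labels` array, each instance grows by exactly one pixel per call, which matches the 1-pixel border width mentioned in the abstract.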
Real Time Fusion of Radioisotope Direction Estimation and Visual Object Tracking
Research into discovering prohibited nuclear material plays an integral role in providing security against terrorism. Although many diverse methods contribute to defense, there is a capability gap in localizing moving sources. This thesis introduces a real-time radioisotope tracking algorithm, assisted by visual object tracking methods, to fill this gap. The proposed algorithm estimates the carrier likelihood for objects in its field of view and is designed to assist a pedestrian agent wearing a backpack detector. The complex, crowded urban environments in which this algorithm must function, combined with the size and weight limitations of a pedestrian system, make designing a functioning algorithm challenging. The contribution of this thesis is threefold. First, a generalized directional estimator is proposed. Second, two state-of-the-art methods, one for visual object detection and one for visual object tracking, are combined into a single tracking algorithm. Third, those outputs are fused to produce a real-time radioisotope tracking algorithm. This algorithm is designed for use with the backpack detector built by the IDEAS for WIND research group. This setup takes advantage of recent advances in detector, camera, and computer technologies to meet the challenging physical limitations. The directional estimator uses gradient boosting regression to predict radioisotope direction with a variance of 50 degrees when trained on a simple laboratory dataset. Under conditions similar to those of other state-of-the-art methods, its accuracy is comparable. YOLOv3 and SiamFC were chosen by evaluating advanced visual tracking methods in terms of speed and efficiency across multiple architectures, and in terms of accuracy on datasets such as the Visual Object Tracking (VOT) Challenge and Common Objects in Context (COCO). The resulting tracking algorithm operates in real time.
The outputs of direction estimation and visual tracking are fused using sequential Bayesian inference to predict carrier likelihood. Lab trials evaluated by hand on visual and nuclear data, together with a synthesized challenge dataset built from visual data of the Boston Marathon attack, indicate that this prototype system advances the state of the art towards the localization of a moving source.
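A minimal sketch of the sequential Bayesian fusion step, under assumptions not stated in the abstract: each visually tracked object i has a bearing φ_i from the camera, the directional estimator reports a measured bearing θ, and the measurement likelihood for "object i is the carrier" is a Gaussian in the wrapped angular difference θ − φ_i with a hypothetical spread `sigma`. The posterior from one step becomes the prior for the next. The function names, the Gaussian likelihood, and the fixed set of tracks are all illustrative simplifications.

```python
import math
from typing import Dict


def angular_diff(a: float, b: float) -> float:
    """Signed difference a - b wrapped to [-180, 180) degrees."""
    return (a - b + 180.0) % 360.0 - 180.0


def bayes_update(prior: Dict[str, float], bearings: Dict[str, float],
                 measured: float, sigma: float) -> Dict[str, float]:
    """One recursive update of P(object is the carrier | measurements so far).

    prior:    current carrier probability per tracked object (sums to 1)
    bearings: bearing of each tracked object from visual tracking, in degrees
    measured: radioisotope direction estimate for this time step, in degrees
    sigma:    assumed spread of the direction estimate, in degrees
    """
    post = {}
    for obj, p in prior.items():
        d = angular_diff(measured, bearings[obj])
        post[obj] = p * math.exp(-0.5 * (d / sigma) ** 2)  # prior x likelihood
    z = sum(post.values())
    if z == 0.0:  # measurement inconsistent with every track: keep the prior
        return dict(prior)
    return {obj: v / z for obj, v in post.items()}
```

In use, the update would run once per frame with the tracker's current bearings; handling tracks that appear or disappear (adding or removing hypotheses) is left out of this sketch.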
Automatic counting of mounds on UAV images using computer vision and machine learning
Site preparation by mounding is a commonly used silvicultural treatment that improves tree growth conditions by mechanically creating planting microsites called mounds. Following site preparation, an important planning step is to count the number of mounds, which provides forest managers with an estimate of the number of seedlings required for a given plantation block. In the forest industry, mounds are generally counted through manual field surveys by forestry workers, which is costly and error-prone, especially for large areas. To address this issue, we present a novel framework that exploits advances in Unmanned Aerial Vehicle (UAV) imaging and computer vision to accurately estimate the number of mounds on a planting block. The proposed framework comprises two main components. First, we exploit a visual recognition method based on a deep learning algorithm for multiple object detection by pixel-based segmentation. This enables a preliminary count of visible mounds and of other objects frequently seen on the forest floor (e.g., trees, debris, accumulations of water), which is used to characterize the planting block. Second, since visual recognition can be limited by several perturbation factors (e.g., mound erosion, occlusion), we employ a machine learning estimation function that predicts the final number of mounds based on the local block properties extracted in the first stage. We evaluate the proposed framework on a new UAV dataset representing numerous planting blocks with varying features. The proposed method outperformed manual counting methods in terms of relative counting precision, indicating that it has the potential to be advantageous and efficient in challenging situations.
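The second stage maps per-block properties from the segmentation stage to a corrected mound count. The abstract does not name the estimation function, so the sketch below swaps in ordinary least-squares linear regression purely to show the data flow. The choice of features (for example, detected mound count, tree count, debris or water fraction per block) and both function names are hypothetical.

```python
import numpy as np


def fit_count_corrector(features: np.ndarray, true_counts: np.ndarray) -> np.ndarray:
    """Fit weights w so that [features, 1] @ w approximates the true counts.

    features:    (n_blocks, n_features) block properties from the first stage
    true_counts: (n_blocks,) reference mound counts for training blocks
    """
    X = np.hstack([features, np.ones((features.shape[0], 1))])  # add a bias column
    w, *_ = np.linalg.lstsq(X, true_counts, rcond=None)
    return w


def predict_counts(features: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Predict final mound counts for new blocks with the fitted weights."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return X @ w
```

A linear model is only the simplest choice here. A nonlinear regressor fitted on the same block features would fit into the same two-stage structure, with segmentation supplying the features and the regressor correcting for erosion and occlusion.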