925 research outputs found

    Monocular Camera Viewpoint-Invariant Vehicular Traffic Segmentation and Classification Utilizing Small Datasets

    Get PDF
    The work presented here develops a computer vision framework that is view angle independent for vehicle segmentation and classification from roadway traffic systems installed by the Virginia Department of Transportation (VDOT). An automated technique for extracting a region of interest is discussed to speed up the processing. The VDOT traffic videos are analyzed for vehicle segmentation using an improved robust low-rank matrix decomposition technique. It presents a new and effective thresholding method that improves segmentation accuracy and simultaneously speeds up the segmentation processing. Size and shape physical descriptors from morphological properties and textural features from the Histogram of Oriented Gradients (HOG) are extracted from the segmented traffic. Furthermore, a multi-class support vector machine classifier is employed to categorize different traffic vehicle types, including passenger cars, passenger trucks, motorcycles, buses, and small and large utility trucks. It handles multiple vehicle detections through an iterative k-means clustering over-segmentation process. The proposed algorithm reduced the processed data by an average of 40%. Compared to recent techniques, it showed an average improvement of 15% in segmentation accuracy, and it is 55% faster than the compared segmentation techniques on average. Moreover, a comparative analysis of 23 different deep learning architectures is presented. The resulting algorithm outperformed the compared deep learning algorithms for the quality of vehicle classification accuracy. Furthermore, the timing analysis showed that it could operate in real-time scenarios

    Traffic sign recognition based on human visual perception.

    Get PDF
    This thesis presents a new approach, based on human visual perception, for detecting and recognising traffic signs under different viewing conditions. Traffic sign recognition is an important issue within any driver support system as it is fundamental to traffic safety and increases the drivers' awareness of situations and possible decisions that are ahead. All traffic signs possess similar visual characteristics, they are often the same size, shape and colour. However shapes may be distorted when viewed from different viewing angles and colours are affected by overall luminosity and the presence of shadows. Human vision can identify traffic signs correctly by ignoring this variance of colours and shapes. Consequently traffic sign recognition based on human visual perception has been researched during this project. In this approach two human vision models are adopted to solve the problems above: Colour Appearance Model (CIECAM97s) and Behavioural Model of Vision (BMV). Colour Appearance Model (CIECAM97s) is used to segment potential traffic signs from the image background under different weather conditions. Behavioural Model of Vision (BMV) is used to recognize the potential traffic signs. Results show that segmentation based on CIECAM97s performs better than, or comparable to, other perceptual colour spaces in terms of accuracy. In addition, results illustrate that recognition based on BMV can be used in this project effectively to detect a certain range of shape transformations. Furthermore, a fast method of distinguishing and recognizing the different weather conditions within images has been developed. The results show that 84% recognition rate can be achieved under three weather and different viewing conditions

    Context-unsupervised adversarial network for video sensors

    Get PDF
    This paper is an extended version of our conference paper: Pardàs, M. and Canet, G. Refinement Network for unsupervised on the scene Foreground Segmentation. In Proceedings of the 2020 28th European Signal Processing Conference (EUSIPCO), Amsterdam, The Netherlands , 18–21 January 2021.Foreground object segmentation is a crucial first step for surveillance systems based on networks of video sensors. This problem in the context of dynamic scenes has been widely explored in the last two decades, but it still has open research questions due to challenges such as strong shadows, background clutter and illumination changes. After years of solid work based on statistical background pixel modeling, most current proposals use convolutional neural networks (CNNs) either to model the background or to make the foreground/background decision. Although these new techniques achieve outstanding results, they usually require specific training for each scene, which is unfeasible if we aim at designing software for embedded video systems and smart cameras. Our approach to the problem does not require specific context or scene training, and thus no manual labeling. We propose a network for a refinement step on top of conventional state-of-the-art background subtraction systems. By using a statistical technique to produce a rough mask, we do not need to train the network for each scene. The proposed method can take advantage of the specificity of the classic techniques, while obtaining the highly accurate segmentation that a deep learning system provides. We also show the advantage of using an adversarial network to improve the generalization ability of the network and produce more consistent results than an equivalent non-adversarial network. The results provided were obtained by training the network on a common database, without fine-tuning for specific scenes. Experiments on the unseen part of the CDNet database provided 0.82 a F-score, and 0.87 was achieved for LASIESTA databases, which is a database unrelated to the training one. On this last database, the results outperformed by 8.75% those available in the official table. The results achieved for CDNet are well above those of the methods not based on CNNs, and according to the literature, among the best for the context-unsupervised CNNs systems.This work has been supported by the Spanish Research Agency (AEI) under project PID2020-116907RB-I00.Peer ReviewedPostprint (published version

    Selected topics in video coding and computer vision

    Get PDF
    Video applications ranging from multimedia communication to computer vision have been extensively studied in the past decades. However, the emergence of new applications continues to raise questions that are only partially answered by existing techniques. This thesis studies three selected topics related to video: intra prediction in block-based video coding, pedestrian detection and tracking in infrared imagery, and multi-view video alignment.;In the state-of-art video coding standard H.264/AVC, intra prediction is defined on the hierarchical quad-tree based block partitioning structure which fails to exploit the geometric constraint of edges. We propose a geometry-adaptive block partitioning structure and a new intra prediction algorithm named geometry-adaptive intra prediction (GAIP). A new texture prediction algorithm named geometry-adaptive intra displacement prediction (GAIDP) is also developed by extending the original intra displacement prediction (IDP) algorithm with the geometry-adaptive block partitions. Simulations on various test sequences demonstrate that intra coding performance of H.264/AVC can be significantly improved by incorporating the proposed geometry adaptive algorithms.;In recent years, due to the decreasing cost of thermal sensors, pedestrian detection and tracking in infrared imagery has become a topic of interest for night vision and all weather surveillance applications. We propose a novel approach for detecting and tracking pedestrians in infrared imagery based on a layered representation of infrared images. Pedestrians are detected from the foreground layer by a Principle Component Analysis (PCA) based scheme using the appearance cue. To facilitate the task of pedestrian tracking, we formulate the problem of shot segmentation and present a graph matching-based tracking algorithm. Simulations with both OSU Infrared Image Database and WVU Infrared Video Database are reported to demonstrate the accuracy and robustness of our algorithms.;Multi-view video alignment is a process to facilitate the fusion of non-synchronized multi-view video sequences for various applications including automatic video based surveillance and video metrology. In this thesis, we propose an accurate multi-view video alignment algorithm that iteratively aligns two sequences in space and time. To achieve an accurate sub-frame temporal alignment, we generalize the existing phase-correlation algorithm to 3-D case. We also present a novel method to obtain the ground-truth of the temporal alignment by using supplementary audio signals sampled at a much higher rate. The accuracy of our algorithm is verified by simulations using real-world sequences

    SINet: A Scale-insensitive Convolutional Neural Network for Fast Vehicle Detection

    Full text link
    Vision-based vehicle detection approaches achieve incredible success in recent years with the development of deep convolutional neural network (CNN). However, existing CNN based algorithms suffer from the problem that the convolutional features are scale-sensitive in object detection task but it is common that traffic images and videos contain vehicles with a large variance of scales. In this paper, we delve into the source of scale sensitivity, and reveal two key issues: 1) existing RoI pooling destroys the structure of small scale objects, 2) the large intra-class distance for a large variance of scales exceeds the representation capability of a single network. Based on these findings, we present a scale-insensitive convolutional neural network (SINet) for fast detecting vehicles with a large variance of scales. First, we present a context-aware RoI pooling to maintain the contextual information and original structure of small scale objects. Second, we present a multi-branch decision network to minimize the intra-class distance of features. These lightweight techniques bring zero extra time complexity but prominent detection accuracy improvement. The proposed techniques can be equipped with any deep network architectures and keep them trained end-to-end. Our SINet achieves state-of-the-art performance in terms of accuracy and speed (up to 37 FPS) on the KITTI benchmark and a new highway dataset, which contains a large variance of scales and extremely small objects.Comment: Accepted by IEEE Transactions on Intelligent Transportation Systems (T-ITS

    Artificial Intelligence for Multimedia Signal Processing

    Get PDF
    Artificial intelligence technologies are also actively applied to broadcasting and multimedia processing technologies. A lot of research has been conducted in a wide variety of fields, such as content creation, transmission, and security, and these attempts have been made in the past two to three years to improve image, video, speech, and other data compression efficiency in areas related to MPEG media processing technology. Additionally, technologies such as media creation, processing, editing, and creating scenarios are very important areas of research in multimedia processing and engineering. This book contains a collection of some topics broadly across advanced computational intelligence algorithms and technologies for emerging multimedia signal processing as: Computer vision field, speech/sound/text processing, and content analysis/information mining

    Multiple approaches for assessing mangrove biophysical and biochemical variables using in situ and remote sensing techniques

    Get PDF
    Mangrove forests are important ecosystems and play a key role in maintaining the equilibrium in coastal lagoons and estuaries. However, in recent years, there has been a considerable loss of mangrove extension due to anthropogenic activities. Recent studies suggest that multiple in situ and remote sensing approaches must be carried out to understand the dynamics in these complex ecosystems. Therefore, the objective for this PhD dissertation is to develop multiple techniques for monitoring the seasonal biophysical and biochemical conditions of the mangrove forests. Particular objectives will include: i. Test the feasibility of using a Chlorophyll Content Index from a CCM-200 unit as an estimator of the variation of leaf pigments (chlorophyll-a, chlorophyll-b) content for a range of mangrove species. ii. Assess changes in chlorophyll-a, leaf area, leaf length, and Leaf Area Index between the dry and rainy seasons in a variety of mangrove classes. iii. Assess the seasonal importance of in situ hyperspectral measurements (e.g. 450-1000 nm) for chlorophyll-a determination in a variety of mangrove species. And finally, iv. Determine whether an object-based image analysis approach can provide an accurate classification of mangroves from spaceborne Synthetic Aperture Radar data. The results from these studies could provide reliable information regarding seasonal ecological assessments of mangrove forests using in situ and remote sensing methods
    • …
    corecore