16 research outputs found

    Enhancing Multi-Object Detection in Video Content: Exploring Hybrid Techniques and Method Combinations for Improved Classification Accuracy

    The classification of objects within video content holds significant importance, particularly in the context of automated visual surveillance systems. Object classification refers to the procedure of categorizing objects into predefined and semantically meaningful groups based on their features. While humans find object classification in videos straightforward, machines face complexity and challenges in this task due to factors such as object size, occlusion, scaling, lighting conditions, and more. Consequently, the demand for analyzing video sequences has spurred the development of various techniques for object classification. This paper proposes hybrid techniques for multi-object detection. The experimental analysis focused on a vehicles-openimages dataset containing 627 different categories of vehicles. The results emphasize the profound impact of method combinations on image classification accuracy. Two primary methods, wavelet transformation and Principal Component Analysis (PCA), were employed alongside Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). The evaluation encompassed performance metrics including accuracy, precision, recall, specificity, and F1 score. In the analysis, the "Wavelet + RNN" combination consistently achieved the highest scores across all performance metrics, including accuracy (96.76%), precision (96.76%), recall (86.32%), F1 score (87.12%), and specificity (87.43%). In addition, the hybrid classifiers were applied to image classification of different vehicle categories. In this per-category analysis, "Wavelet + RNN" emerges as the standout performer, consistently achieving high accuracy percentages across all object categories, ranging from 82.87% for identifying People to 90.12% for recognizing Trucks.
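    As a rough illustration of the kind of pipeline described, the sketch below (a minimal, assumed reading of "Wavelet + RNN") extracts a single-level 2D Haar wavelet decomposition with PyWavelets and feeds the approximation band, row by row, into a GRU classifier; the layer sizes, class count, and preprocessing are illustrative assumptions, not the paper's configuration.

```python
# Minimal "Wavelet + RNN" hybrid classifier sketch (PyWavelets + PyTorch).
# All sizes and the 5-class head are illustrative assumptions.
import numpy as np
import pywt
import torch
import torch.nn as nn

def wavelet_features(image: np.ndarray) -> torch.Tensor:
    """Single-level 2D Haar DWT; the approximation band becomes a row sequence."""
    cA, (cH, cV, cD) = pywt.dwt2(image.astype(np.float32), "haar")
    return torch.from_numpy(cA.astype(np.float32))   # shape (H/2, W/2)

class WaveletRNNClassifier(nn.Module):
    def __init__(self, width: int, num_classes: int = 5, hidden: int = 128):
        super().__init__()
        self.rnn = nn.GRU(input_size=width, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, width) -- each DWT row is one RNN time step
        _, h = self.rnn(x)
        return self.head(h[-1])

# Usage on one synthetic 128x128 grayscale image:
img = np.random.rand(128, 128)
feats = wavelet_features(img).unsqueeze(0)           # (1, 64, 64)
model = WaveletRNNClassifier(width=feats.shape[-1])
logits = model(feats)                                # (1, num_classes)
```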

    Real time motion estimation using a neural architecture implemented on GPUs

    This work describes a neural-network-based architecture that represents and estimates object motion in videos. The architecture addresses multiple computer vision tasks such as image segmentation, object representation and characterization, motion analysis, and tracking. The use of a neural network architecture allows for the simultaneous estimation of global and local motion and the representation of deformable objects, and it avoids the problem of finding corresponding features while tracking moving objects. Due to the parallel nature of neural networks, the architecture has been implemented on GPUs, which allows the system to meet a set of requirements such as time-constraint management, robustness, high processing speed, and re-configurability. Experiments are presented that demonstrate the validity of our architecture for solving problems of mobile-agent tracking and motion analysis. This work was partially funded by the Spanish Government DPI2013-40534-R grant and Valencian Government GV/2013/005 grant.
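    The paper's neural architecture is not reproduced here, but the minimal sketch below shows one GPU-friendly way to estimate global translational motion between two frames: phase correlation written as pure tensor operations, which run on a GPU simply by placing the tensors on a CUDA device. The function and test frames are illustrative assumptions.

```python
# Global motion (integer-pixel shift) via phase correlation in PyTorch.
# Illustrative sketch, not the paper's neural architecture.
import torch

def global_translation(f1: torch.Tensor, f2: torch.Tensor):
    """Peak of the inverse FFT of the normalized cross-power spectrum
    gives the circular shift taking f2 to f1."""
    R = torch.fft.fft2(f1) * torch.conj(torch.fft.fft2(f2))
    corr = torch.fft.ifft2(R / (R.abs() + 1e-8)).real
    idx = int(torch.argmax(corr))
    h, w = f1.shape
    dy, dx = idx // w, idx % w
    if dy > h // 2: dy -= h   # wrap into signed range
    if dx > w // 2: dx -= w
    return dy, dx

# Synthetic check: shift a random frame by (3, 5) pixels.
a = torch.rand(64, 64)                    # move tensors to .cuda() for GPU use
b = torch.roll(a, shifts=(3, 5), dims=(0, 1))
print(global_translation(b, a))           # prints (3, 5)
```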

    Boosting video tracking performance by means of Tabu Search in Intelligent Visual Surveillance Systems

    In this paper, we present a fast and efficient technique for the data association problem applied to visual tracking systems. The visual tracking process is formulated as a combinatorial hypothesis search with a heuristic evaluation function taking into account structural and specific information such as distance, shape, and color. We introduce a Tabu Search algorithm that performs a search on an indirect space. A novel problem formulation allows us to transform any solution into the real search space, which is needed for fitness calculation, in linear time. This formulation and the use of auxiliary structures yield a fast transformation from the blob-to-track assignment space to the space of real track shapes and positions (while calculating fitness in an incremental fashion), which is key to producing efficient and fast results. Previous approaches are based on statistical techniques or on evolutionary algorithms; these techniques are quite efficient and robust, although they cannot converge as fast as our approach. This work was supported in part by Projects CICYT TIN2008-06742-C02-02/TSI, CICYT TEC2008-06732-C02-02/TEC, CAM CONTEXTS (S2009/TIC-1485) and DPS2008-07029-C02-02.
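    A generic Tabu Search skeleton for blob-to-track assignment could look like the sketch below; it keeps a FIFO tabu list of recent moves and remembers the best solution seen. The paper's indirect-space encoding and incremental fitness computation are not reproduced, and the toy distances are invented for illustration.

```python
# Generic Tabu Search skeleton for blob-to-track assignment (illustrative).
def tabu_search(initial, neighbors, fitness, iters=500, tenure=20):
    best = current = initial
    best_f = fitness(best)
    tabu = []                                  # FIFO tabu list of recent moves
    for _ in range(iters):
        candidates = [(m, s) for m, s in neighbors(current) if m not in tabu]
        if not candidates:
            break
        move, current = max(candidates, key=lambda ms: fitness(ms[1]))
        tabu.append(move)
        if len(tabu) > tenure:
            tabu.pop(0)
        if fitness(current) > best_f:          # keep the global best
            best, best_f = current, fitness(current)
    return best, best_f

# Toy usage: 3 blobs, 2 tracks; fitness favors assigning each blob to the
# nearest track (distances are made up).
dist = [[1.0, 4.0], [3.0, 0.5], [2.0, 2.5]]
def fitness(assign): return -sum(dist[b][t] for b, t in enumerate(assign))
def neighbors(assign):
    for b in range(len(assign)):
        for t in range(2):
            if t != assign[b]:
                new = list(assign); new[b] = t
                yield ((b, t), tuple(new))     # move = (blob, new track)
print(tabu_search((0, 0, 0), neighbors, fitness))  # -> ((0, 1, 0), -3.5)
```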

    Deep learning of appearance affinity for multi-object tracking and re-identification: a comparative view

    Recognizing the identity of a query individual in a surveillance sequence is the core of Multi-Object Tracking (MOT) and Re-Identification (Re-Id) algorithms. Both tasks can be addressed by measuring the appearance affinity between people observations with a deep neural model. Nevertheless, the differences in their specifications and, consequently, in the characteristics and constraints of the available training data for each task give rise to the necessity of employing different learning approaches for each of them. This article offers a comparative view of the Double-Margin-Contrastive and the Triplet loss functions, and analyzes the benefits and drawbacks of applying each of them to learn an appearance affinity model for tracking and re-identification. A batch of experiments has been conducted, and the results support the hypothesis drawn from the presented study: the Triplet loss function is more effective than the Contrastive one when a Re-Id model is learned and, conversely, in the MOT domain, the Contrastive loss can better discriminate between pairs of images rendering the same person or not. This research was funded by the Spanish Government through the CICYT projects (TRA2016-78886-C3-1-R and RTI2018-096036-B-C21), Universidad Carlos III of Madrid through (PEAVAUTO-CM-UC3M), the Comunidad de Madrid through SEGVAUTO-4.0-CM (P2018/EMT-4362), and the Ministerio de Educación, Cultura y Deporte para la Formación de Profesorado Universitario (FPU14/02143).
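    For reference, the two losses compared in the article are commonly written as in the PyTorch sketch below; the margin values are illustrative placeholders, not the ones used in the study.

```python
# Double-margin contrastive loss vs. triplet loss (PyTorch sketch).
import torch
import torch.nn.functional as F

def double_margin_contrastive(a, b, same, m_pos=0.5, m_neg=1.5):
    """Pulls same-identity pairs inside m_pos and pushes different-identity
    pairs beyond m_neg; 'same' is a 0/1 float tensor per pair."""
    d = F.pairwise_distance(a, b)
    pos = same * F.relu(d - m_pos) ** 2
    neg = (1.0 - same) * F.relu(m_neg - d) ** 2
    return (pos + neg).mean()

def triplet(anchor, positive, negative, margin=0.3):
    """Anchor must sit closer to the positive than to the negative by 'margin'
    (equivalent in spirit to torch.nn.TripletMarginLoss)."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

# Embeddings would come from the appearance model; random ones shown here.
e = lambda: torch.randn(8, 128)
print(double_margin_contrastive(e(), e(), torch.randint(0, 2, (8,)).float()))
print(triplet(e(), e(), e()))
```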

    Exploring entropy measurements to identify multi-occupancy in activities of daily living

    Human Activity Recognition (HAR) is the process of automatically detecting human actions from data collected from different types of sensors. Research related to HAR has devoted particular attention to monitoring and recognizing the activities of a single occupant in a home environment, in which it is assumed that only one person is present at any given time. Recognition of the activities is then used to identify any abnormalities within the routine activities of daily living. Despite this assumption in the published literature, living environments are commonly occupied by more than one person and/or accompanied by pet animals. In this paper, a novel method based on different entropy measures, including Approximate Entropy (ApEn), Sample Entropy (SampEn), and Fuzzy Entropy (FuzzyEn), is explored to detect and identify a visitor in a home environment. The research focuses mainly on the scenario in which another individual visits the main occupier, so that it is not possible to distinguish between their movement activities. The goal of this research is to assess whether entropy measures can be used to detect and identify the visitor in a home environment. Once the presence of the main occupier is distinguished from others, the existing activity recognition and abnormality detection processes can be applied to the main occupier. The proposed method is tested and validated using two different datasets. The results obtained from the experiments show that the proposed method can detect and identify a visitor in a home environment with a high degree of accuracy based on the data collected from the occupancy sensors.
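    As an illustration of the kind of measure involved, below is a standard (simplified) Sample Entropy implementation; applying it to per-interval occupancy-sensor event counts is an assumed usage, not the paper's exact pipeline.

```python
# Simplified Sample Entropy: SampEn = -ln(A/B), where B counts template pairs
# of length m within tolerance r (Chebyshev distance) and A does the same for
# length m+1, with self-matches excluded.
import numpy as np

def sample_entropy(x, m=2, r=None):
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * np.std(x)          # common convention, not a fixed rule
    def count(mm):
        t = np.array([x[i:i + mm] for i in range(len(x) - mm + 1)])
        total = 0
        for i in range(len(t) - 1):
            d = np.max(np.abs(t[i + 1:] - t[i]), axis=1)  # Chebyshev distance
            total += np.sum(d <= r)
        return total
    B, A = count(m), count(m + 1)
    return -np.log(A / B) if A > 0 and B > 0 else np.inf

# A regular signal scores lower than an irregular one:
t = np.arange(500)
print(sample_entropy(np.sin(0.1 * t)))        # low: predictable dynamics
print(sample_entropy(np.random.rand(500)))    # higher: irregular dynamics
```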

    Collaborative Solutions to Visual Sensor Networks

    Visual sensor networks (VSNs) merge computer vision, image processing, and wireless sensor network disciplines to solve problems in multi-camera applications over large surveillance areas. Although potentially powerful, VSNs also present unique challenges that could hinder their practical deployment, owing to camera-specific features including the much higher data rate, the directional sensing characteristics, and the existence of visual occlusions. In this dissertation, we first present a collaborative approach for target localization in VSNs. Traditionally, the problem is solved by localizing targets at the intersections of the back-projected 2D cones of each target. However, the existence of visual occlusions among targets would generate many false alarms. Instead of resolving the uncertainty about target existence at the intersections, we identify and study the non-occupied areas in the 2D cones and generate a so-called certainty map of target non-existence. We also propose distributed integration of local certainty maps by following a dynamic itinerary in which the entire map is progressively clarified. The accuracy of target localization is affected by the existence of faulty nodes in VSNs. Therefore, we present the design of a fault-tolerant localization algorithm that not only accurately localizes targets but also detects faults in camera orientations, tolerates these errors, and corrects them before they cascade. Based on the locations of detected targets in the fault-tolerated final certainty map, we construct a generative image model that estimates the camera orientations, detects inaccuracies, and corrects them. In order to ensure the visual coverage required to accurately localize targets or tolerate faulty nodes, we need to calculate the coverage before deploying sensors. Therefore, we derive a closed-form solution for coverage estimation based on a certainty-based detection model that takes the directional sensing of cameras and the existence of visual occlusions into account. The effectiveness of the proposed collaborative and fault-tolerant target localization algorithms, in terms of localization accuracy as well as fault detection and correction performance, has been validated through results obtained from both simulation and real experiments. In addition, the conducted simulations show strong consistency with the results of the theoretical closed-form solution for visual coverage estimation, especially when the boundary effect is considered.
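    A toy version of the certainty-map idea could be sketched as follows: each camera clears ("certainly empty") the grid cells inside its 2D viewing cone whose bearings match no reported detection, and maps from several cameras are fused by applying the update sequentially. The geometry, the angular slack, and the omission of occlusion reasoning are illustrative simplifications, not the dissertation's algorithm.

```python
# Toy certainty-map update for one camera (illustrative simplification).
import numpy as np

def update_certainty_map(cert, cam_xy, heading, fov, detections):
    """Set cells to 0 ('certainly no target') if they lie inside the camera's
    viewing cone but on no detection bearing; 'detections' holds bearings (rad)."""
    H, W = cert.shape
    ys, xs = np.mgrid[0:H, 0:W]
    bearing = np.arctan2(ys - cam_xy[1], xs - cam_xy[0])
    rel = (bearing - heading + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi]
    in_cone = np.abs(rel) <= fov / 2
    on_detection = np.zeros_like(in_cone)
    for db in detections:
        diff = (bearing - db + np.pi) % (2 * np.pi) - np.pi
        on_detection |= np.abs(diff) < 0.05                  # ~3 deg of slack
    cert[in_cone & ~on_detection] = 0
    return cert

# Fuse two cameras; the remaining 1-cells are where targets may still exist.
cert = np.ones((100, 100))
cert = update_certainty_map(cert, (10, 50), 0.0, np.pi / 3, [0.2])
cert = update_certainty_map(cert, (90, 50), np.pi, np.pi / 3, [])
```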

    Vehicle Tracking in Occlusion and Clutter

    Vehicle tracking in environments containing occlusion and clutter is an active research area. The problem of tracking vehicles through such environments presents a variety of challenges, including vehicle track initialization, tracking an unknown number of targets, and variations in real-world lighting, scene conditions, and camera vantage. Scene clutter and target occlusion present additional challenges. A stochastic framework is proposed that allows vehicle tracks to be identified from a sequence of images. The work focuses on the identification of vehicle tracks present in transportation scenes, namely vehicle movements at intersections. The framework combines background subtraction and motion-history-based approaches to deal with the segmentation problem. The tracking problem is solved using a Markov Chain Monte Carlo Data Association (MCMCDA) method. The method introduces the novel notion of discrete, independent regions in the MCMC scoring function. Results are presented which show that the framework is capable of tracking vehicles in scenes containing multiple vehicles that occlude one another and that are occluded by foreground scene objects.
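    The segmentation front end described (background subtraction feeding blobs to a data-association stage) could be prototyped with OpenCV roughly as below; the video path, morphology kernel, and area threshold are illustrative, and the MCMCDA stage itself is not shown.

```python
# Background-subtraction front end for intersection footage (OpenCV sketch).
import cv2

cap = cv2.VideoCapture("intersection.mp4")     # hypothetical input video
bg = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = bg.apply(frame)                                  # foreground mask
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # remove speckle
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    blobs = [cv2.boundingRect(c) for c in contours
             if cv2.contourArea(c) > 200]      # (x, y, w, h) candidate vehicles
    # 'blobs' would feed the data-association (tracking) stage each frame
cap.release()
```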