
    Online real-time crowd behavior detection in video sequences

    Automatically detecting events in crowded scenes is a challenging task in computer vision. A number of offline approaches have been proposed for crowd behavior detection; however, the offline assumption limits their application in real-world video surveillance systems. In this paper, we propose an online, real-time method for detecting events in crowded video sequences. The proposed approach combines visual feature extraction with image segmentation and works without a training phase. A quantitative experimental evaluation has been carried out on multiple publicly available video sequences, containing data from various crowd scenarios and different types of events, to demonstrate the effectiveness of the approach.

    Visual Crowd Analysis: Open Research Problems

    Over the last decade, there has been a remarkable surge in interest in automated crowd monitoring within the computer vision community. Modern deep-learning approaches have made it possible to develop fully-automated vision-based crowd-monitoring applications. However, despite the magnitude of the issue at hand, the significant technological advancements, and the consistent interest of the research community, there are still numerous challenges that need to be overcome. In this article, we delve into six major areas of visual crowd analysis, emphasizing the key developments in each of these areas. We outline the crucial unresolved issues that must be tackled in future works, in order to ensure that the field of automated crowd monitoring continues to progress and thrive. Several surveys related to this topic have been conducted in the past. Nonetheless, this article thoroughly examines and presents a more intuitive categorization of works, while also depicting the latest breakthroughs within the field, incorporating more recent studies carried out within the last few years in a concise manner. By carefully choosing prominent works with significant contributions in terms of novelty or performance gains, this paper presents a more comprehensive exposition of advancements in the current state-of-the-art. Comment: Accepted in AI Magazine, published by Wiley Periodicals LLC on behalf of the Association for the Advancement of Artificial Intelligence.

    Enhanced tracking and recognition of moving objects by reasoning about spatio-temporal continuity.

    A framework for the logical and statistical analysis and annotation of dynamic scenes containing occlusion and other uncertainties is presented. This framework consists of three elements: an object tracker module, an object recognition/classification module, and a logical consistency, ambiguity and error reasoning engine. The principle behind the object tracker and object recognition modules is to reduce error by increasing ambiguity (by merging objects in close proximity and presenting multiple hypotheses). The reasoning engine deals with error, ambiguity and occlusion in a unified framework to produce a hypothesis that satisfies fundamental constraints on the spatio-temporal continuity of objects. Our algorithm finds a globally consistent model of an extended video sequence that is maximally supported by a voting function based on the output of a statistical classifier. The system results in an annotation that is significantly more accurate than what would be obtained by frame-by-frame evaluation of the classifier output. The framework has been implemented and applied successfully to the analysis of team sports with a single camera. Key words: Visua
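
The contrast between frame-by-frame labelling and a voting function over an extended sequence can be illustrated with a minimal sketch. This is not the authors' reasoning engine: it assumes a single tracked object, at least one frame, and hypothetical per-frame classifier scores, and the function names `label_per_frame` and `label_by_voting` are invented for illustration.

```python
from collections import defaultdict


def label_per_frame(frame_scores):
    """Pick the best-scoring label independently in every frame."""
    return [max(scores, key=scores.get) for scores in frame_scores]


def label_by_voting(frame_scores):
    """Pick one label for the whole track by summing scores over all frames.

    Assumes the tracked object keeps a single identity for the sequence,
    a simplified stand-in for a spatio-temporal continuity constraint.
    """
    votes = defaultdict(float)
    for scores in frame_scores:
        for label, score in scores.items():
            votes[label] += score
    return max(votes, key=votes.get)


# Hypothetical classifier output for one tracked player over three frames.
track = [
    {"A": 0.60, "B": 0.40},
    {"A": 0.45, "B": 0.55},  # a noisy frame, e.g. partial occlusion
    {"A": 0.70, "B": 0.30},
]
per_frame = label_per_frame(track)
voted = label_by_voting(track)
```

In this toy input the per-frame labels flip identity in the noisy middle frame, while the voted label stays consistent across the track.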

    Monocular Camera Viewpoint-Invariant Vehicular Traffic Segmentation and Classification Utilizing Small Datasets

    The work presented here develops a computer vision framework that is view angle independent for vehicle segmentation and classification from roadway traffic systems installed by the Virginia Department of Transportation (VDOT). An automated technique for extracting a region of interest is discussed to speed up the processing. The VDOT traffic videos are analyzed for vehicle segmentation using an improved robust low-rank matrix decomposition technique. It presents a new and effective thresholding method that improves segmentation accuracy and simultaneously speeds up the segmentation processing. Size and shape physical descriptors from morphological properties and textural features from the Histogram of Oriented Gradients (HOG) are extracted from the segmented traffic. Furthermore, a multi-class support vector machine classifier is employed to categorize different traffic vehicle types, including passenger cars, passenger trucks, motorcycles, buses, and small and large utility trucks. It handles multiple vehicle detections through an iterative k-means clustering over-segmentation process. The proposed algorithm reduced the processed data by an average of 40%. Compared to recent techniques, it showed an average improvement of 15% in segmentation accuracy, and it is 55% faster than the compared segmentation techniques on average. Moreover, a comparative analysis of 23 different deep learning architectures is presented. The resulting algorithm outperformed the compared deep learning algorithms in vehicle classification accuracy. Furthermore, the timing analysis showed that it could operate in real-time scenarios.

    Recognizing Objects And Reasoning About Their Interactions

    The task of scene understanding involves recognizing the different objects present in the scene, segmenting the scene into meaningful regions, as well as obtaining a holistic understanding of the activities taking place in the scene. Each of these problems has received considerable interest within the computer vision community. We present contributions to two aspects of visual scene understanding. First we explore multiple methods of feature selection for the problem of object detection. We demonstrate the use of Principal Component Analysis to detect avifauna in field observation videos. We improve on existing approaches by making robust decisions based on regional features and by a feature selection strategy that chooses different features in different parts of the image. We then demonstrate the use of Partial Least Squares to detect vehicles in aerial and satellite imagery. We propose two new feature sets: Color Probability Maps are used to capture the color statistics of vehicles and their surroundings, and Pairs of Pixels are used to capture the structural characteristics of objects. A powerful feature selection analysis based on Partial Least Squares is employed to deal with the resulting high dimensional feature space (almost 70,000 dimensions). We also propose an Incremental Multiple Kernel Learning (IMKL) scheme to detect vehicles in a traffic surveillance scenario. Obtaining task and scene specific datasets of visual categories is far more tedious than obtaining a generic dataset of the same classes. Our IMKL approach initializes on a generic training database and then tunes itself to the classification task at hand. Second, we develop a video understanding system for scene elements, such as bus stops, crosswalks, and intersections, that are characterized more by qualitative activities and geometry than by intrinsic appearance.
The domain models for scene elements are not learned from a corpus of video but are instead naturally elicited by humans and represented as probabilistic logic rules within a Markov Logic Network framework. Human-elicited models, however, represent object interactions as they occur in the 3D world rather than describing their appearance projection in some specific 2D image plane. We bridge this gap by recovering qualitative scene geometry to analyze object interactions in the 3D world and then reasoning about scene geometry, occlusions and common-sense domain knowledge using a set of meta-rules.

    A Comprehensive Review of Vehicle Detection Techniques Under Varying Moving Cast Shadow Conditions Using Computer Vision and Deep Learning

    The design of a vision-based traffic analytic system for urban traffic video scenes has great potential in the context of Intelligent Transportation Systems (ITS). Such systems offer useful traffic-related insights at much lower cost than their conventional sensor-based counterparts. However, the problem remains challenging to date because of complexity factors such as camera hardware constraints, camera movement, object occlusion, object speed, object resolution, traffic flow density, and lighting conditions. ITS has many applications, including but not limited to queue estimation, speed detection, and the detection of various anomalies. All of these applications depend primarily on sensing vehicle presence to form a basis for analysis. Moving cast shadows of vehicles are one of the major problems affecting vehicle detection, as they can cause detection and tracking inaccuracies. Therefore, it is exceedingly important to distinguish dynamic objects from their moving cast shadows for accurate vehicle detection and recognition. This paper provides an in-depth comparative analysis of different traffic paradigm-focused conventional and state-of-the-art shadow detection and removal algorithms. To date, there has been only one survey that highlights shadow removal methodologies specifically for the traffic paradigm. In this paper, a total of 70 research papers containing results of urban traffic scenes have been shortlisted from the last three decades to give a comprehensive overview of the work done in this area. The study reveals that the preferable way to make a comparative evaluation is to use the existing Highway I, II, and III datasets, which are frequently used for qualitative or quantitative analysis of shadow detection or removal algorithms.
Furthermore, the paper not only provides cues to solve moving cast shadow problems, but also suggests that even after the advent of Convolutional Neural Network (CNN)-based vehicle detection methods, the problems caused by moving cast shadows persist. Therefore, this paper proposes a hybrid approach which uses a combination of conventional and state-of-the-art techniques as a pre-processing step for shadow detection and removal before using a CNN for vehicle detection. The results indicate a significant improvement in vehicle detection accuracy after using the proposed approach.

    Scene and crowd analysis using synthetic data generation with 3D quality improvements and deep network architectures

    In this thesis, scene analysis, mainly focusing on vision-based techniques, has been explored. Vision-based scene analysis techniques have a wide range of applications, from surveillance and security to agriculture. A vision sensor can provide rich information about the environment such as colour, depth, shape, size and much more. This information can be further processed to gain in-depth knowledge of the scene, such as the type of environment, objects and distances. Hence, this thesis initially covers the background on human detection, in particular pedestrian and crowd detection methods, and introduces various vision-based techniques used in human detection. This is followed by a detailed analysis of the use of synthetic data to improve the performance of state-of-the-art Deep Learning techniques, and a multi-purpose synthetic data generation tool is proposed. The tool is a real-time graphics simulator which generates multiple types of synthetic data applicable for pedestrian detection, crowd density estimation, image segmentation, depth estimation, and 3D pose estimation. In the second part of the thesis, a novel technique has been proposed to improve the quality of the synthetic data. Inter-reflection, also known as global illumination, is a naturally occurring phenomenon and is a major problem for 3D scene generation from an image. Thus, the proposed methods utilised a reverted ray-tracing technique to reduce the effect of the inter-reflection problem and increase the quality of the generated data. In addition, a method to improve the quality of the density map is discussed in the following chapter. The density map is the most commonly used technique to estimate crowds. However, the current procedure used to generate the map is not content-aware, i.e., the density map does not highlight the humans’ heads according to their size in the image.
Thus, a novel method to generate a content-aware density map was proposed, and it was demonstrated that the use of such maps can elevate the performance of an existing Deep Learning architecture. In the final part, a Deep Learning architecture has been proposed to estimate the crowd in the wild. The architecture tackled challenging aspects such as perspective distortion by implementing several techniques, such as pyramid-style inputs, a scale aggregation method and a self-attention mechanism, to estimate a crowd density map, and it achieved state-of-the-art results at the time.
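
The general idea of a size-dependent density map can be sketched as follows. This is an illustration under stated assumptions and not the thesis's method: each annotated head is assumed to come with an integer pixel position inside the image and an approximate head size, the Gaussian width is assumed to scale linearly with that size through a hypothetical `scale` factor, and the names `density_map` and `crowd_count` are invented here.

```python
import math


def density_map(heads, width, height, scale=0.3):
    """Build a density map from (x, y, head_size_px) annotations.

    Each head contributes a Gaussian blob whose spread grows with the head
    size. Every blob is normalised to sum to 1 over the pixels it covers,
    so the whole map sums to the number of heads.
    """
    grid = [[0.0] * width for _ in range(height)]
    for cx, cy, size in heads:
        sigma = max(scale * size, 0.5)  # assumed linear size-to-sigma rule
        radius = int(math.ceil(3 * sigma))
        weights = {}
        total = 0.0
        for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                w = math.exp(-d2 / (2 * sigma ** 2))
                weights[(x, y)] = w
                total += w
        # Normalise after clipping so heads near the border still count as 1.
        for (x, y), w in weights.items():
            grid[y][x] += w / total
    return grid


def crowd_count(grid):
    """Estimate the crowd size as the integral (sum) of the density map."""
    return sum(sum(row) for row in grid)


# Hypothetical annotations: a small distant head, a large near head,
# and a head in the image corner.
heads = [(5, 5, 4), (20, 10, 12), (0, 0, 6)]
grid = density_map(heads, width=32, height=16)
count = crowd_count(grid)
```

Summing the map recovers the number of annotated heads, while small heads produce sharper and higher peaks than large ones.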