
    Separation and contrast enhancement of overlapping cast shadow components using polarization

    Shadow is an inseparable aspect of all natural scenes. When there are multiple light sources or multiple reflections, several different shadows may overlap at the same location and create complicated patterns. Shadows are a potentially good source of information about a scene if the shadow regions can be properly identified and segmented. However, shadow region identification and segmentation is a difficult task, and improperly identified shadows often interfere with machine vision tasks like object recognition and tracking. We propose here a new shadow separation and contrast enhancement method based on the polarization of light. Polarization information of the scene captured by our polarization-sensitive camera is shown to separate shadows from different light sources effectively. Such shadow separation is almost impossible to realize with conventional, polarization-insensitive imaging.
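The polarization cue this abstract relies on can be sketched with the standard Stokes-parameter formula for the degree of linear polarization (DoLP), computed from three exposures through a linear polarizer at 0°, 45° and 90°. The function names and the threshold below are illustrative assumptions, not taken from the paper:

```python
import math

def dolp(i0, i45, i90):
    """Degree of linear polarization from intensities measured through
    a linear polarizer at 0, 45 and 90 degrees (Stokes parameters)."""
    s0 = i0 + i90          # total intensity
    s1 = i0 - i90          # horizontal vs. vertical component
    s2 = 2 * i45 - s0      # diagonal component
    return math.sqrt(s1 * s1 + s2 * s2) / s0 if s0 else 0.0

def polarized_mask(img0, img45, img90, threshold=0.2):
    """Mark pixels whose light is strongly polarized; such a map can
    separate shadow components that plain intensity cannot.
    The threshold value is an assumption for illustration."""
    return [[dolp(a, b, c) > threshold for a, b, c in zip(r0, r45, r90)]
            for r0, r45, r90 in zip(img0, img45, img90)]
```

An unpolarized pixel (equal intensity at every polarizer angle) yields a DoLP of 0, while a fully polarized one approaches 1.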

    Detecting, Tracking, And Recognizing Activities In Aerial Video

    In this dissertation, we address the problem of detecting humans and vehicles, tracking them in crowded scenes, and finally determining their activities in aerial video. Even though this is a well-explored problem in the field of computer vision, many challenges still remain when one is presented with realistic data. These challenges include large camera motion, strong scene parallax, fast object motion, large object density, strong shadows, and insufficiently large action datasets. Therefore, we propose a number of novel methods based on exploiting scene constraints from the imagery itself to aid in the detection and tracking of objects. We show, via experiments on several datasets, that superior performance is achieved with the use of the proposed constraints. First, we tackle the problem of detecting moving, as well as stationary, objects in scenes that contain parallax and shadows. We do this on both regular aerial video and the new and challenging domain of wide area surveillance. This problem poses several challenges: large camera motion, strong parallax, a large number of moving objects, a small number of pixels on target, single-channel data, and a low video frame rate. We propose a method for detecting moving and stationary objects that overcomes these challenges, and evaluate it on the CLIF and VIVID datasets. In order to find moving objects, we use median background modelling, which requires few frames to obtain a workable model, and is very robust when there is a large number of moving objects in the scene while the model is being constructed. We then remove false detections from parallax and registration errors using gradient information from the background image. Relying merely on motion to detect objects in aerial video may not be sufficient to provide complete information about the observed scene.
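The per-pixel temporal-median background model described above can be sketched as follows; the difference threshold is an illustrative assumption, not a value from the dissertation:

```python
from statistics import median

def median_background(frames):
    """Per-pixel temporal median over a short frame stack; robust even
    when many objects move while the model is being built, as few frames
    are needed for a workable model."""
    return [[median(f[r][c] for f in frames)
             for c in range(len(frames[0][0]))]
            for r in range(len(frames[0]))]

def moving_mask(frame, background, threshold=10):
    """Pixels that differ from the median model by more than threshold
    are flagged as moving. The threshold is an illustrative value."""
    return [[abs(p - b) > threshold for p, b in zip(fr, br)]
            for fr, br in zip(frame, background)]
```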
    First of all, objects that are permanently stationary may be of interest as well, for example to determine how long a particular vehicle has been parked at a certain location. Secondly, moving vehicles that are being tracked through the scene may sometimes stop and remain stationary at traffic lights and railroad crossings. These prolonged periods of non-motion make it very difficult for the tracker to maintain the identities of the vehicles. Therefore, there is a clear need for a method that can detect stationary pedestrians and vehicles in UAV imagery. This is a challenging problem due to the small number of pixels on the target, which makes it difficult to distinguish objects from background clutter, and results in a much larger search space. We propose a method for constraining the search based on a number of geometric constraints obtained from the metadata. Specifically, we obtain the orientation of the ground plane normal, the orientation of the shadows cast by out-of-plane objects in the scene, and the relationship between object heights and the size of their corresponding shadows. We utilize the above information in a geometry-based shadow and ground plane normal blob detector, which provides an initial estimation of the locations of shadow-casting out-of-plane (SCOOP) objects in the scene. These SCOOP candidate locations are then classified as either human or clutter using a combination of wavelet features and a Support Vector Machine. Additionally, we combine regular SCOOP and inverted SCOOP candidates to obtain vehicle candidates. We show impressive results on sequences from the VIVID and CLIF datasets, and provide comparative quantitative and qualitative analysis. We also show that we can extend the SCOOP detection method to automatically estimate the orientation of the shadow in the image without relying on metadata. This is useful in cases where metadata is either unavailable or erroneous.
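For a locally flat ground plane, the metadata-derived relationship between object height and cast-shadow length that the SCOOP detector exploits reduces to simple trigonometry with the sun elevation angle. This helper is a hypothetical illustration, not code from the dissertation:

```python
import math

def object_height(shadow_length, sun_elevation_deg):
    """Height of a shadow-casting out-of-plane (SCOOP) object, from the
    length of its cast shadow and the sun elevation angle taken from
    metadata, assuming a locally flat ground plane."""
    return shadow_length * math.tan(math.radians(sun_elevation_deg))
```

At a 45° sun elevation, for instance, the cast shadow is as long as the object is tall.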
    Simply detecting objects in every frame does not provide sufficient understanding of the nature of their existence in the scene. It may be necessary to know how the objects have travelled through the scene over time and which areas they have visited. Hence, there is a need to maintain the identities of the objects across different time instances. The task of object tracking can be very challenging in videos that have low frame rate, high density, and a very large number of objects, as is the case in the WAAS data. Therefore, we propose a novel method for tracking a large number of densely moving objects in an aerial video. In order to keep the complexity of the tracking problem manageable when dealing with a large number of objects, we divide the scene into grid cells, solve the tracking problem optimally within each cell using bipartite graph matching, and then link the tracks across the cells. Besides tractability, grid cells also allow us to define a set of local scene constraints, such as road orientation and object context. We use these constraints as part of the cost function to solve the tracking problem; this allows us to track fast-moving objects in low frame rate videos. In addition to moving through the scene, the humans that are present may be performing individual actions that should be detected and recognized by the system. A number of different approaches exist for action recognition in both aerial and ground level video. One of the requirements for the majority of these approaches is the existence of a sizeable dataset of examples of a particular action from which a model of the action can be constructed. Such a luxury is not always possible in aerial scenarios since it may be difficult to fly a large number of missions to observe a particular event multiple times. Therefore, we propose a method for recognizing human actions in aerial video from as few examples as possible (a single example in the extreme case).
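Within one grid cell, the track-to-detection assignment that the abstract solves by bipartite graph matching can be sketched with a brute-force optimal matcher. This is fine for the handful of objects inside a single cell; the simple distance cost here is a stand-in for the thesis's constraint-based cost function, and the names are illustrative:

```python
from itertools import permutations

def match_detections(tracks, detections):
    """Optimal one-to-one assignment between existing tracks and new
    detections inside one grid cell, minimising total cost (here the
    Manhattan distance between positions). Assumes there are at least
    as many detections as tracks; brute force stands in for an
    efficient bipartite matching algorithm."""
    cost = [[abs(t[0] - d[0]) + abs(t[1] - d[1]) for d in detections]
            for t in tracks]
    best = min(permutations(range(len(detections)), len(tracks)),
               key=lambda p: sum(cost[i][j] for i, j in enumerate(p)))
    return list(enumerate(best))   # (track index, detection index) pairs
```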
    We use the bag-of-words action representation and a one-vs-all multi-class classification framework. We assume that most of the classes have many examples, and construct Support Vector Machine models for each class. Then, we use the Support Vector Machines that were trained for classes with many examples to improve the decision function of the Support Vector Machine that was trained using few examples, via late weighted fusion of decision values.
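The late weighted fusion of decision values can be sketched as a weighted sum; the weights here are illustrative placeholders, whereas the thesis derives its contribution from the data-rich classes:

```python
def fused_decision(few_shot_score, related_scores, weights):
    """Blend the decision value of the SVM trained on few examples with
    decision values from SVMs trained on data-rich classes (late
    weighted fusion). Weight values are an assumption for illustration."""
    assert len(related_scores) == len(weights)
    return few_shot_score + sum(w * s for w, s in zip(weights, related_scores))
```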

    A Comprehensive Review of Vehicle Detection Techniques Under Varying Moving Cast Shadow Conditions Using Computer Vision and Deep Learning

    Design of a vision-based traffic analytic system for urban traffic video scenes has great potential in the context of Intelligent Transportation Systems (ITS). It offers useful traffic-related insights at much lower cost than conventional sensor-based counterparts. However, it remains a challenging problem to this day due to complexity factors such as camera hardware constraints, camera movement, object occlusion, object speed, object resolution, traffic flow density, and lighting conditions. ITS has many applications, including but not limited to queue estimation, speed detection, and detection of different anomalies. All of these applications primarily depend on sensing vehicle presence to form a basis for analysis. Moving cast shadows of vehicles are one of the major problems that affect vehicle detection, as they can cause detection and tracking inaccuracies. Therefore, it is exceedingly important to distinguish dynamic objects from their moving cast shadows for accurate vehicle detection and recognition. This paper provides an in-depth comparative analysis of different traffic-paradigm-focused conventional and state-of-the-art shadow detection and removal algorithms. To date, there has been only one survey which highlights shadow removal methodologies particularly for the traffic paradigm. In this paper, a total of 70 research papers containing results on urban traffic scenes have been shortlisted from the last three decades to give a comprehensive overview of the work done in this area. The study reveals that the preferable way to make a comparative evaluation is to use the existing Highway I, II, and III datasets, which are frequently used for qualitative or quantitative analysis of shadow detection or removal algorithms.
    Furthermore, the paper not only provides cues to solve moving cast shadow problems, but also shows that even after the advent of Convolutional Neural Network (CNN)-based vehicle detection methods, the problems caused by moving cast shadows persist. Therefore, this paper proposes a hybrid approach which uses a combination of conventional and state-of-the-art techniques as a pre-processing step for shadow detection and removal before using a CNN for vehicle detection. The results indicate a significant improvement in vehicle detection accuracy after using the proposed approach.
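One of the conventional techniques such a pre-processing step could draw on is the classic HSV-space test for moving cast shadows: a shadow darkens the background while barely changing hue and saturation. All threshold values below are illustrative assumptions, not taken from the survey:

```python
def is_cast_shadow(pixel_hsv, bg_hsv, alpha=0.4, beta=0.95,
                   tau_s=0.15, tau_h=0.1):
    """HSV test for a moving cast shadow at one pixel: the value ratio
    against the background model must fall in [alpha, beta] (darkened
    but not black), while hue and saturation stay nearly unchanged.
    All thresholds are illustrative assumptions."""
    h, s, v = pixel_hsv
    bh, bs, bv = bg_hsv
    return (alpha <= v / bv <= beta
            and abs(s - bs) <= tau_s
            and abs(h - bh) <= tau_h)
```

Pixels flagged this way would be removed from the foreground mask before the CNN-based vehicle detector runs.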

    Effective moving cast shadow detection for monocular color traffic image sequences

    For an accurate scene analysis using monocular color traffic image sequences, a robust segmentation of moving vehicles from the stationary background is generally required. However, the presence of moving cast shadows may lead to inaccurate vehicle segmentation and, as a result, to further erroneous scene analysis. We propose an effective method for the detection of moving cast shadows. By observing the characteristics of cast shadows in the luminance, chrominance, gradient density, and geometry domains, a combined probability map, called a shadow confidence score (SCS), is obtained. From the edge map of the input image, each edge pixel is examined to determine whether it belongs to the vehicle region based on its neighboring SCSs. The cast shadow is identified as those regions with high SCSs which lie outside the convex hull of the selected vehicle edge pixels. The proposed method is tested on 100 vehicle images taken under different lighting conditions (sunny and cloudy), viewing angles (roadside and overhead), vehicle sizes (small, medium, and large), and colors (similar to the road and not). The results indicate that an average error rate of around 14% is obtained, while the lowest error rate is around 3% for large vehicles.
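As a minimal sketch of how the four cue maps might be fused into a single shadow confidence score, a product rule over per-cue probabilities is one simple option; the paper's exact combination rule may differ, and the threshold is an assumption:

```python
def shadow_confidence(luminance_p, chrominance_p, gradient_p, geometry_p):
    """One pixel's shadow confidence score (SCS): fuse the luminance,
    chrominance, gradient-density and geometry cue probabilities.
    A product rule assumes the cues are independent."""
    return luminance_p * chrominance_p * gradient_p * geometry_p

def high_scs_pixels(cue_maps, threshold=0.5):
    """Pixels whose combined SCS exceeds a threshold (illustrative
    value); the paper then keeps those lying outside the convex hull
    of the vehicle edge pixels."""
    return [[shadow_confidence(*cues) > threshold for cues in zip(*rows)]
            for rows in zip(*cue_maps)]
```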

    Cast shadow segmentation using invariant colour features

    Shadows are integral parts of natural scenes and one of the elements contributing to the naturalness of synthetic scenes. In many image analysis and interpretation applications, shadows interfere with fundamental tasks such as object extraction and description. For this reason, shadow segmentation is an important step in image analysis. In this paper, we propose a new cast shadow segmentation algorithm for both still and moving images. The proposed technique exploits spectral and geometrical properties of shadows in a scene to perform this task. The presence of a shadow is first hypothesized with an initial and simple evidence based on the fact that shadows darken the surface which they are cast upon. The validity of detected regions as shadows is further verified by making use of more complex hypotheses on color invariance and geometric properties of shadows. Finally, an information integration stage confirms or rejects the initial hypothesis for every detected region. Simulation results show that the proposed algorithm is robust and efficient in detecting shadows for a large class of scenes.
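The two-stage hypothesis described above — darkening first, colour invariance second — can be sketched with the c1c2c3 photometric-invariant colour features. The ratio and tolerance values are illustrative assumptions, not the paper's:

```python
import math

def c1c2c3(r, g, b):
    """Photometric-invariant colour features: the angle of each channel
    against the maximum of the other two. Roughly stable under shadow."""
    return (math.atan2(r, max(g, b)),
            math.atan2(g, max(r, b)),
            math.atan2(b, max(r, g)))

def shadow_hypothesis(pixel, background, dark_ratio=0.6, tol=0.05):
    """Hypothesise a shadow where the pixel darkens the background but
    keeps nearly the same invariant colour. Thresholds illustrative."""
    darker = sum(pixel) < dark_ratio * sum(background)
    same_colour = all(abs(a - b) < tol
                      for a, b in zip(c1c2c3(*pixel), c1c2c3(*background)))
    return darker and same_colour
```

A uniformly dimmed grey patch passes both tests, while a patch that is only slightly darker than the background is rejected at the first stage.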

    Monitoring opencast mine restorations using Unmanned Aerial System (UAS) imagery

    Other support: Joan-Cristian Padró is a recipient of the FI-DGR scholarship grant (2016B_00410). Xavier Pons is a recipient of the ICREA Academia Excellence in Research Grant (2016-2020). Open-pit mining is still an unavoidable activity, but it can become unsustainable without the restoration of degraded sites. Monitoring the restoration after extractive activities is a legal requirement for mining companies and public administrations in many countries, involving financial provisions for environmental liabilities. The objective of this contribution is to present a rigorous, low-cost and easy-to-use application of Unmanned Aerial Systems (UAS) for supporting opencast mining and restoration monitoring, complementing inspections with very high (<10 cm) spatial resolution multispectral imagery, and improving restoration documentation with detailed land cover maps. The potential of UAS as a tool to control restoration works is presented in a calcareous quarry that has undergone different post-mining restoration actions in the last 20 years, representing 4 reclaimed stages. We used a small (<2 kg) drone equipped with a multispectral sensor, along with field spectroradiometer measurements that were used to radiometrically correct the UAS sensor data. Imagery was processed with photogrammetric and Remote Sensing and Geographical Information Systems software, resulting in spectral information, vegetation and soil indices, structural information and land cover maps. Spectral data and land cover classification, which were validated through ground-truth plots, aided in the detection and quantification of mine waste dumping, bare soil and other land cover extents. Moreover, plant formations and vegetation development were evaluated, allowing a quantitative, yet visual and intuitive, comparison with the surrounding reference systems.
    The protocol resulting from this research constitutes a pipeline solution intended for implementation by public administrations and private companies for precisely evaluating restoration dynamics in an expedient manner at a very affordable budget. Furthermore, the proposed solution prevents subjective interpretations by providing objective data, integrating new technologies at the service of scientists, environmental managers and decision makers.
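A typical vegetation index computed from such multispectral UAS bands is NDVI, which supports the kind of vegetation-development assessment described above. Shown here per pixel; the band names are generic, and the exact indices used in the study may differ:

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index from the near-infrared
    and red reflectance of one pixel; values near 1 indicate dense,
    healthy vegetation, values near 0 indicate bare soil."""
    return (nir - red) / (nir + red) if (nir + red) else 0.0
```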

    Neural network approach to the classification of urban images

    Over the past few years considerable research effort has been devoted to the study of pattern recognition methods applied to the classification of remotely sensed images. Neural network methods have been widely explored, and shown to be generally superior to conventional statistical methods. However, the classification of objects shown in greylevel high-resolution images of urban areas presents significant difficulties. This thesis presents the results of work aimed at reducing some of these difficulties. High-resolution greylevel aerial images are used as the raw material, and methods of processing using neural networks are presented. If a per-pixel approach were used there would be only one input neuron, the pixel greylevel, which would not provide a sufficient basis for successful object identification. The use of spatial neighbourhoods providing an m x m input vector centred on each pixel is investigated; this method takes into account the texture of the pixel's neighbourhood. The pixel's neighbourhood could be considered to contain more than textural information. Second-order methods using mean greylevel, Laplacian and variance values derived from the pixel neighbourhood are developed to provide the neural network with a three-neuron input vector. This method provides the neural network with additional information, improving the strength of the relationship between the input and output neurons, and therefore reducing the training time and improving the classification accuracy. A third method using a hierarchical set of two or more neural networks is proposed as a method of identifying the high-level objects in the images. The methods were applied to representative data sets and the results were compared with manually classified images to quantify the results. Classification accuracy varied from 69% with a window of raw pixel values to 84% with a three-neuron input vector of second-order values.
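The three-neuron input vector of second-order values — mean greylevel, Laplacian and variance of the pixel's neighbourhood — can be sketched for a 3 x 3 window (the window size and exact Laplacian stencil are assumptions for illustration):

```python
def second_order_features(image, r, c):
    """Three-neuron input vector for pixel (r, c): mean greylevel,
    4-neighbour Laplacian, and variance over its 3x3 neighbourhood.
    Assumes (r, c) is at least one pixel away from the image border."""
    patch = [image[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
    mean = sum(patch) / 9
    variance = sum((p - mean) ** 2 for p in patch) / 9
    laplacian = (image[r - 1][c] + image[r + 1][c] + image[r][c - 1]
                 + image[r][c + 1] - 4 * image[r][c])
    return mean, laplacian, variance
```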

    Multi-sensor human action recognition with particular application to tennis event-based indexing

    The ability to automatically classify human actions and activities using visual sensors or by analysing body-worn sensor data has been an active research area for many years. Only recently, with advancements in both fields and the ubiquitous nature of low-cost sensors in our everyday lives, has automatic human action recognition become a reality. While traditional sports coaching systems rely on manual indexing of events from a single modality, such as visual or inertial sensors, this thesis investigates the possibility of capturing and automatically indexing events from multimodal sensor streams. In this work, we detail a novel approach to infer human actions by fusing multimodal sensors to improve recognition accuracy. State-of-the-art visual action recognition approaches are also investigated. Firstly we apply these action recognition detectors to basic human actions in a non-sporting context. We then perform action recognition to infer tennis events in a tennis court instrumented with cameras and inertial sensing infrastructure. The system proposed in this thesis can use either visual or inertial sensors to automatically recognise the main tennis events during play. A complete event retrieval system is also presented to allow coaches to build advanced queries, which existing sports coaching solutions cannot facilitate, without an inordinate amount of manual indexing. The event retrieval interface is evaluated against a leading commercial sports coaching tool in terms of both usability and efficiency.
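The multimodal fusion the thesis describes can be sketched at the score level: per-event confidences from the camera and the inertial sensor are blended and the best-scoring event label wins. The event names and the weight are illustrative assumptions, not the thesis's actual fusion scheme:

```python
def fuse_modalities(visual_scores, inertial_scores, w_visual=0.6):
    """Score-level fusion of per-event confidences from the visual and
    inertial modalities; returns the winning event label. The modality
    weight is an illustrative assumption."""
    fused = {e: w_visual * visual_scores.get(e, 0.0)
                + (1 - w_visual) * inertial_scores.get(e, 0.0)
             for e in set(visual_scores) | set(inertial_scores)}
    return max(fused, key=fused.get)
```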