562 research outputs found

    Unsupervised Deep Single-Image Intrinsic Decomposition using Illumination-Varying Image Sequences

    Machine-learning-based Single Image Intrinsic Decomposition (SIID) methods decompose a captured scene into its albedo and shading images by using the knowledge of a large set of known and realistic ground truth decompositions. Collecting and annotating such a dataset is an approach that cannot scale to sufficient variety and realism. We free ourselves from this limitation by training on unannotated images. Our method leverages the observation that two images of the same scene but with different lighting provide useful information on their intrinsic properties: by definition, albedo is invariant to lighting conditions, and cross-combining the estimated albedo of a first image with the estimated shading of a second one should lead back to the second one's input image. We transcribe this relationship into a siamese training scheme for a deep convolutional neural network that decomposes a single image into albedo and shading. The siamese setting allows us to introduce a new loss function including such cross-combinations, and to train solely on (time-lapse) images, removing the need for any ground truth annotations. As a result, our method has the advantages of i) exploiting the time-varying information of image sequences in the (pre-computed) training step, ii) not requiring ground truth data to train on, and iii) being able to decompose single images of unseen scenes at runtime. To demonstrate and evaluate our work, we additionally propose a new rendered dataset containing illumination-varying scenes and a set of quantitative metrics to evaluate SIID algorithms. Despite its unsupervised nature, our results compete with state-of-the-art methods, including supervised and non-data-driven methods. Comment: To appear in Pacific Graphics 201
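
    The cross-combination idea in this abstract can be illustrated with a small sketch. It assumes the usual multiplicative image model (image = albedo × shading, per pixel) and a mean-squared-error penalty; the abstract does not give the exact loss, so the function names, the flat pixel lists, and the equal weighting of terms are all illustrative assumptions rather than the authors' formulation.

```python
def combine(albedo, shading):
    """Recompose an image as the per-pixel product of albedo and shading."""
    return [a * s for a, s in zip(albedo, shading)]


def mse(x, y):
    """Mean squared error between two equally sized pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def cross_combination_loss(img1, img2, albedo1, shading1, albedo2, shading2):
    """Hypothetical loss for two images of one scene under different lighting.

    Self terms: each image should be rebuilt from its own albedo and shading.
    Cross terms: albedo is lighting-invariant, so the albedo estimated from
    one image combined with the shading of the other should rebuild the other.
    """
    self_term = (mse(combine(albedo1, shading1), img1)
                 + mse(combine(albedo2, shading2), img2))
    cross_term = (mse(combine(albedo1, shading2), img2)
                  + mse(combine(albedo2, shading1), img1))
    return self_term + cross_term


# Toy scene: one shared albedo, two lighting conditions.
true_albedo = [0.2, 0.5, 0.8]
shading_a = [1.0, 0.5, 0.25]
shading_b = [0.5, 1.0, 0.5]
image_a = combine(true_albedo, shading_a)
image_b = combine(true_albedo, shading_b)

perfect = cross_combination_loss(image_a, image_b,
                                 true_albedo, shading_a, true_albedo, shading_b)
# An albedo estimate that absorbed some lighting is penalised by the cross terms.
wrong_albedo = [0.4, 0.5, 0.8]
imperfect = cross_combination_loss(image_a, image_b,
                                   true_albedo, shading_a, wrong_albedo, shading_b)
```

    In this toy case a consistent decomposition gives zero loss, while an albedo estimate that differs between the two images yields a positive loss.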

    Gaussian mixture model classifiers for detection and tracking in UAV video streams.

    Masters Degree. University of KwaZulu-Natal, Durban. Manual visual surveillance systems are subject to a high degree of human error and operator fatigue. The automation of such systems often employs detectors, trackers and classifiers as fundamental building blocks. Detection, tracking and classification are especially useful and challenging in Unmanned Aerial Vehicle (UAV) based surveillance systems. Previous solutions have addressed these challenges via complex classification methods. This dissertation proposes less complex Gaussian Mixture Model (GMM) based classifiers that can simplify the process: data is represented as a reduced set of model parameters, and classification is performed in the low-dimensional parameter space. The specification and adoption of GMM-based classifiers on the UAV visual tracking feature space formed the principal contribution of the work. This methodology can be generalised to other feature spaces. This dissertation presents two main contributions in the form of submissions to ISI-accredited journals. In the first paper, objectives are demonstrated with a vehicle detector incorporating a two-stage GMM classifier, applied to a single feature space, namely Histogram of Oriented Gradients (HoG). The second paper demonstrates objectives with a vehicle tracker using colour histograms (in RGB and HSV), with GMM classifiers and a Kalman filter. The proposed works are comparable to related works, with testing performed on benchmark datasets. In the tracking domain for such platforms, tracking alone is insufficient. Adaptive detection and classification can assist in search space reduction, building of knowledge priors and improved target representations. Results show that the proposed approach improves performance and robustness. Findings also indicate potential further enhancements such as a multi-mode tracker with global and local tracking based on a combination of both papers.
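
    As a rough illustration of GMM-based classification, the sketch below uses the common likelihood-based variant: each class has its own mixture, and a sample is assigned to the class whose mixture gives it the highest log-likelihood. This is not the dissertation's parameter-space method, which the abstract does not detail. The one-dimensional feature, the class labels, and the hand-set mixture parameters are assumptions made for the example; in practice the parameters would be fitted, for instance with expectation-maximisation.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_log_likelihood(x, components):
    """Log-likelihood of x under a mixture given as (weight, mean, variance) triples."""
    return math.log(sum(w * gaussian_pdf(x, m, v) for w, m, v in components))


def classify(x, class_models):
    """Return the label whose mixture assigns x the highest log-likelihood."""
    return max(class_models, key=lambda label: gmm_log_likelihood(x, class_models[label]))


# Hypothetical, hand-set models over a single scalar feature.
MODELS = {
    "vehicle": [(0.6, 2.0, 0.5), (0.4, 4.0, 1.0)],
    "background": [(1.0, -1.0, 1.0)],
}

label_near_vehicle_modes = classify(3.0, MODELS)
label_near_background_mode = classify(-1.5, MODELS)
```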

    Cortical Dynamics of Navigation and Steering in Natural Scenes: Motion-Based Object Segmentation, Heading, and Obstacle Avoidance

    Visually guided navigation through a cluttered natural scene is a challenging problem that animals and humans accomplish with ease. The ViSTARS neural model proposes how primates use motion information to segment objects and determine heading for purposes of goal approach and obstacle avoidance in response to video inputs from real and virtual environments. The model produces trajectories similar to those of human navigators. It does so by predicting how computationally complementary processes in cortical areas MT-/MSTv and MT+/MSTd compute object motion for tracking and self-motion for navigation, respectively. The model retina responds to transients in the input stream. Model V1 generates a local speed and direction estimate. This local motion estimate is ambiguous due to the neural aperture problem. Model MT+ interacts with MSTd via an attentive feedback loop to compute accurate heading estimates in MSTd that quantitatively simulate properties of human heading estimation data. Model MT- interacts with MSTv via an attentive feedback loop to compute accurate estimates of speed, direction and position of moving objects. This object information is combined with heading information to produce steering decisions wherein goals behave like attractors and obstacles behave like repellers. These steering decisions lead to navigational trajectories that closely match human performance. National Science Foundation (SBE-0354378, BCS-0235398); Office of Naval Research (N00014-01-1-0624); National Geospatial Intelligence Agency (NMA201-01-1-2016
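
    The attractor/repeller description of steering can be made concrete with a deliberately simplified rule. The sketch below is not the ViSTARS model: it assumes headings and directions are angles in radians, that the goal pulls the heading toward it in proportion to the angular offset, and that each obstacle pushes the heading away with a strength that decays with angular distance. The function name and the gains `k_goal`, `k_obstacle`, and `decay` are hypothetical.

```python
import math


def turn_rate(heading, goal_angle, obstacle_angles,
              k_goal=1.0, k_obstacle=2.0, decay=3.0):
    """Signed turning rate for a simplified attractor/repeller steering rule.

    The goal term drives the heading toward goal_angle (an attractor).
    Each obstacle term drives the heading away from the obstacle direction
    (a repeller) and fades as the obstacle moves away from the heading.
    """
    rate = -k_goal * (heading - goal_angle)
    for obstacle in obstacle_angles:
        offset = heading - obstacle
        rate += k_obstacle * offset * math.exp(-decay * abs(offset))
    return rate


# Goal at +0.5 rad, no obstacles: turn toward the goal (positive rate).
toward_goal = turn_rate(0.0, 0.5, [])
# Goal straight ahead, obstacle slightly at -0.1 rad: turn away from it (positive rate).
away_from_obstacle = turn_rate(0.0, 0.0, [-0.1])
# Mirror case, obstacle at +0.1 rad: turn the other way (negative rate).
away_mirrored = turn_rate(0.0, 0.0, [0.1])
```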

    Shadow removal utilizing multiplicative fusion of texture and colour features for surveillance image

    Automated surveillance systems often identify shadows as parts of a moving object, which jeopardizes subsequent image processing tasks such as object identification and tracking. In this thesis, an improved shadow elimination method for an indoor surveillance system is presented. The developed method is a fusion of several image processing methods. First, the image is segmented using the Statistical Region Merging algorithm to obtain the segmented potential shadow regions. Next, multiple shadow identification features, which include Normalized Cross-Correlation, Local Color Constancy and Hue-Saturation-Value shadow cues, are applied to the images to generate feature maps. These feature maps are used for identifying and removing cast shadows according to the segmented regions. The video dataset used is the Autonomous Agents for On-Scene Networked Incident Management dataset, which covers both indoor and outdoor video scenes. The benchmarking result indicates that the developed method is on par with several commonly used shadow detection methods. The developed method yields a mean score of 85.17% for the video sequence in which the strongest shadow is present and a mean score of 89.93% for the video having the most complex textured background. This research contributes to the development and improvement of a functioning shadow elimination method that is able to cope with image noise and various illumination changes.
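
    One of the cues named in this abstract, Normalized Cross-Correlation (NCC), can be sketched as follows. The example assumes the non-mean-subtracted form of NCC on grey-level patches given as flat lists, and assumes that a cast shadow darkens a background patch roughly uniformly, so the shadowed patch stays highly correlated with the background while being darker. The helper names and the 0.95 threshold are illustrative assumptions, not values from the thesis.

```python
import math


def ncc(background, frame):
    """Normalized cross-correlation of two equally sized grey-level patches."""
    numerator = sum(b * f for b, f in zip(background, frame))
    denominator = math.sqrt(sum(b * b for b in background) * sum(f * f for f in frame))
    return numerator / denominator if denominator else 0.0


def looks_like_shadow(background, frame, ncc_min=0.95):
    """Flag a patch as a shadow candidate: darker than, yet similar to, the background."""
    darker = sum(frame) < sum(background)
    return darker and ncc(background, frame) >= ncc_min


background_patch = [10, 20, 30, 40]
shadowed_patch = [5, 10, 15, 20]   # background scaled by 0.5: same texture, darker
object_patch = [40, 10, 35, 5]     # different texture, as a moving object would show

shadow_flag = looks_like_shadow(background_patch, shadowed_patch)
object_flag = looks_like_shadow(background_patch, object_patch)
```

    A single cue like this is easily fooled by dark objects on textureless surfaces, which is one reason to fuse several cues, as the abstract describes.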

    Neuromorphic perception for greenhouse technology using event-based sensors

    Event-Based Cameras (EBCs), unlike conventional cameras, feature independent pixels that asynchronously generate outputs upon detecting changes in their field of view. Short calculations are performed on each event to mimic the brain. The output is a sparse sequence of events with high temporal precision. Conventional computer vision algorithms do not leverage these properties. Thus, a new paradigm has been devised. While event cameras are very efficient in representing sparse sequences of events with high temporal precision, many approaches are challenged in applications where a large amount of spatio-temporally rich information must be processed in real time. In reality, most tasks in everyday life take place in complex and uncontrollable environments, which require sophisticated models and intelligent reasoning. Typical hard problems in real-world scenes are detecting various non-uniform objects or navigating in an unknown and complex environment. In addition, colour perception is an essential property for distinguishing objects in natural scenes. Colour is a new aspect of event-based sensors, which work fundamentally differently from standard cameras, measuring per-pixel brightness changes per colour filter asynchronously rather than measuring "absolute" brightness at a constant rate. This thesis explores neuromorphic event-based processing methods for high-noise and cluttered environments with imbalanced classes. A fully event-driven processing pipeline was developed for agricultural applications to perform fruit detection and classification and to unlock the outstanding properties of event cameras. The nature of features in such data was explored, and methods to represent and detect features were demonstrated. A framework for detecting and classifying features was developed and evaluated on the N-MNIST and Dynamic Vision Sensor (DVS) gesture datasets. The same network was evaluated on laboratory-recorded and real-world data with various internal variations for fruit detection, such as overlap and variation in size and appearance. In addition, a method to handle highly imbalanced data was developed. We examined the characteristics of spatio-temporal patterns for each colour filter to help expand our understanding of this novel data and explored their applications in classification tasks where colours were more relevant features than shapes and appearances. The results presented in this thesis demonstrate the potential and efficacy of event-based systems by demonstrating the applicability of colour event data and the viability of event-driven classification.

    Advances in video motion analysis research for mature and emerging application areas


    Feature-based image patch classification for moving shadow detection

    Moving object detection is a first step towards many computer vision applications, such as human interaction and tracking, video surveillance, and traffic monitoring systems. Accurate estimation of the target object's size and shape is often required before higher-level tasks (e.g., object tracking or recognition) can be performed. However, these properties can be derived only when the foreground object is detected precisely. Background subtraction is a common technique to extract foreground objects from image sequences. The purpose of background subtraction is to detect changes in pixel values within a given frame. The main problem with background subtraction and other related object detection techniques is that cast shadows tend to be misclassified as either parts of the foreground objects (if objects and their cast shadows are bonded together) or independent foreground objects (if objects and shadows are separated). The reason for this phenomenon is the presence of similar characteristics between the target object and its cast shadow, i.e., shadows have similar motion, attitude, and intensity changes as the moving objects that cast them. Detecting shadows of moving objects is challenging because of problematic situations related to shadows, for example, chromatic shadows, shadow color blending, foreground-background camouflage, nontextured surfaces and dark surfaces. Various methods for shadow detection have been proposed in the literature to address these problems. Many of these methods use general-purpose image feature descriptors to detect shadows. These feature descriptors may be effective in distinguishing shadow points from the foreground object in a specific problematic situation; however, such methods often fail to distinguish shadow points from the foreground object in other situations. In addition, many of these moving shadow detection methods require prior knowledge of the scene conditions and/or impose strong assumptions, which make them excessively restrictive in practice. The aim of this research is to develop an efficient method capable of addressing possible environmental problems associated with shadow detection while simultaneously improving the overall accuracy and detection stability. In this research study, possible problematic situations for dynamic shadows are addressed and discussed in detail. On the basis of the analysis, a robust method, including change detection and shadow detection, is proposed to address these environmental problems. A new set of two local feature descriptors, namely, binary patterns of local color constancy (BPLCC) and light-based gradient orientation (LGO), is introduced to address the identified problematic situations by incorporating intensity, color, texture, and gradient information. The feature vectors are concatenated in a column-by-column manner to construct one dictionary for the objects and another dictionary for the shadows. A new sparse representation framework is then applied to find the nearest neighbor of the test image segment by computing a weighted linear combination of the reference dictionary. Image segment classification is then performed based on the similarity between the test image and the sparse representations of the two classes. The performance of the proposed framework on common shadow detection datasets is evaluated, and the method shows improved performance compared with state-of-the-art methods in terms of the shadow detection rate, discrimination rate, accuracy, and stability. By achieving these significant improvements, the proposed method demonstrates its ability to handle various problems associated with image processing and accomplishes the aim of this thesis.
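
    The background subtraction step described in this abstract, and the reason cast shadows end up in the foreground mask, can be seen in a minimal sketch. It assumes a static grey-level background model and a fixed per-pixel threshold; both the function name and the threshold value are illustrative assumptions, and the thesis's own change detection method is not specified in the abstract.

```python
def foreground_mask(background, frame, threshold=30):
    """Mark pixels whose grey level differs from the background by more than threshold."""
    return [abs(f - b) > threshold for b, f in zip(background, frame)]


# Four pixels of a uniform background.
background_row = [100, 100, 100, 100]
# Pixel 1 is darkened by a cast shadow, pixel 2 is covered by a bright object.
frame_row = [100, 60, 200, 100]

mask = foreground_mask(background_row, frame_row)
```

    Both the shadowed pixel and the object pixel exceed the threshold, so plain background subtraction cannot tell them apart; a separate shadow detection stage is needed to remove the shadow from the mask.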