
    Grounding deep models of visual data

    Deep models are state-of-the-art for many computer vision tasks, including object classification, action recognition, and captioning. As Artificial Intelligence systems that utilize deep models become ubiquitous, it is also becoming crucial to explain why they make certain decisions: grounding model decisions. In this thesis, we study: 1) Improving Model Classification. We show that utilizing web action images along with videos when training for action recognition yields significant performance boosts for convolutional models. Without explicit grounding, labeled web action images tend to contain discriminative action poses, which highlight discriminative portions of a video’s temporal progression. 2) Spatial Grounding. We visualize spatial evidence of deep model predictions using a discriminative top-down attention mechanism called Excitation Backprop. We show how such visualizations are equally informative for correct and incorrect model predictions, and highlight the shift of focus when different training strategies are adopted. 3) Spatial Grounding for Improving Model Classification at Training Time. We propose a guided dropout regularizer for deep networks based on the evidence of a network prediction. This approach penalizes the neurons that are most relevant for the model’s prediction. By dropping such high-saliency neurons, the network is forced to learn alternative paths in order to maintain loss minimization. We demonstrate better generalization ability, increased utilization of network neurons, and higher resilience to network compression. 4) Spatial Grounding for Improving Model Classification at Test Time. We propose Guided Zoom, an approach that utilizes spatial grounding to make more informed predictions at test time. Guided Zoom compares the evidence used to make a preliminary decision with the evidence of correctly classified training examples to ensure evidence-prediction consistency, and otherwise refines the prediction. We demonstrate accuracy gains for fine-grained classification. 5) Spatiotemporal Grounding. We devise a formulation that simultaneously grounds evidence in space and time, in a single pass, using top-down saliency. We visualize the spatiotemporal cues that contribute to a deep recurrent neural network’s classification/captioning output. Based on these spatiotemporal cues, we are able to localize segments within a video that correspond to a specific action, or to a phrase from a caption, without explicitly optimizing/training for these tasks.
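    The guided dropout idea in item 3 can be sketched in a few lines. The following is a minimal PyTorch sketch, not the thesis's implementation: it scores neuron relevance with a simple gradient-times-activation heuristic as a stand-in for Excitation Backprop, and the function name and drop_frac parameter are invented for illustration.

```python
# Minimal sketch of saliency-guided dropout (illustrative, not the
# thesis's exact method: relevance here is gradient * activation,
# whereas the thesis scores neurons with Excitation Backprop).
import torch

def guided_dropout(features: torch.Tensor, logits: torch.Tensor,
                   target: torch.Tensor, drop_frac: float = 0.1) -> torch.Tensor:
    """Zero out the feature units most relevant to the current prediction.

    features: (batch, channels) activations inside the network graph
    logits:   (batch, num_classes) computed from `features`
    target:   (batch,) ground-truth class indices
    """
    # Relevance of each unit to the target-class score.
    score = logits.gather(1, target.unsqueeze(1)).sum()
    grads = torch.autograd.grad(score, features, retain_graph=True)[0]
    saliency = (grads * features).clamp(min=0)        # (batch, channels)

    # Drop the top-k most salient units per example.
    k = max(1, int(drop_frac * features.size(1)))
    topk = saliency.topk(k, dim=1).indices
    mask = torch.ones_like(features)
    mask.scatter_(1, topk, 0.0)
    # With its strongest evidence removed, the network must learn
    # alternative paths to keep the loss low.
    return features * mask
```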

    Advances in Object and Activity Detection in Remote Sensing Imagery

    The recent revolution in deep learning has enabled considerable development in the fields of object and activity detection. Visual object detection aims to find objects of target classes with precise localisation in an image and to assign each object instance a corresponding class label. Activity recognition, in turn, aims to determine the actions or activities of an agent or group of agents from sensor or video observation data. Detecting, identifying, tracking, and understanding the behaviour of objects through images and videos taken by various cameras is a very important and challenging problem. Together, object and activity recognition in imaging data captured by remote sensing platforms is a highly dynamic and challenging research topic. During the last decade, there has been significant growth in the number of publications in this field. In particular, many researchers have proposed methods across application domains to identify objects and their specific behaviours from air- and spaceborne imagery. This Special Issue includes papers that explore novel and challenging topics for object and activity detection in remote sensing images and videos acquired by diverse platforms.
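    For concreteness, the detection task described above (bounding boxes plus a class label per instance) looks like the following at inference time. This is an illustrative sketch using torchvision's public Faster R-CNN API, not a method from the Special Issue; the image path is a placeholder.

```python
# Illustrative object-detection inference with a pretrained model.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
image = to_tensor(Image.open("scene.jpg").convert("RGB"))  # placeholder path

with torch.no_grad():
    pred = model([image])[0]  # dict with "boxes", "labels", "scores"

for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
    if score > 0.5:  # keep only confident detections
        print(label.item(), round(score.item(), 2), box.tolist())
```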

    Multispectral persistent surveillance

    The goal of a persistent surveillance system is to track everything that moves, all of the time, over the entire area of interest. The thrust of this thesis is to identify and improve upon the motion detection and object association aspects of this challenge by adding spectral information to the equation. Traditional motion detection and tracking systems rely primarily on single-band grayscale video, while more recent research has focused on sensor fusion, specifically combining visible and IR data sources. A further challenge in covering an entire area of responsibility (AOR) is the limited sensor field of view, which can be overcome either by adding more sensors or by multi-tasking a single sensor over multiple areas at a reduced frame rate. As an essential tool for sensor design and mission development, a trade study was conducted to measure the potential advantages of adding spectral bands of information to a single sensor with the intention of reducing sensor frame rates. Traditional motion detection and object association algorithms were modified to evaluate system performance using five spectral bands (visible through thermal IR), while adjusting frame rate as a second variable. The goal of this research was to evaluate system performance as a function of the number of bands and frame rate; performance surfaces were generated to assess relative performance across both variables.
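    A minimal sketch of multiband motion detection by frame differencing follows, assuming frames arrive as co-registered (bands, H, W) arrays. The function name, threshold value, and the per-band OR fusion rule are illustrative assumptions, not the thesis's exact algorithm.

```python
# Multiband frame differencing with a simple cross-band fusion rule.
import numpy as np

def detect_motion(prev: np.ndarray, curr: np.ndarray,
                  thresh: float = 25.0) -> np.ndarray:
    """Return a binary motion mask fused across spectral bands.

    prev, curr: (bands, H, W) co-registered consecutive frames.
    """
    diff = np.abs(curr.astype(np.float32) - prev.astype(np.float32))
    per_band = diff > thresh          # (bands, H, W) boolean masks
    return per_band.any(axis=0)       # a pixel moves if any band says so
```

    With this kind of structure, adding or removing bands and subsampling frames become two knobs that can be swept independently, which is exactly the two-variable trade space (number of bands versus frame rate) the thesis evaluates.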

    Automatically Detecting Changes and Anomalies in Unmanned Aerial Vehicle Images

    The use of unmanned aerial vehicles (UAVs) in civil aviation is growing quickly, enabling new scenarios, especially in environmental monitoring and public surveillance services. So far, Earth observation has been carried out only through satellite images, which are limited in resolution and suffer from important barriers such as cloud occlusion. Microdrone solutions providing video streaming capabilities are already available on the marketplace, but they are limited to altitudes of a few hundred feet. In contrast, UAVs equipped with high-quality cameras can fly at altitudes of a few thousand feet and can fill the gap between satellite observations and ground sensors. New needs for data processing therefore arise, spanning from computer vision algorithms to sensor and mission management. This paper presents a solution for automatically detecting changes in images acquired at different times by patrolling UAVs flying over the same targets (but not necessarily along the same path or at the same altitude). Change detection in multi-temporal images is a prerequisite for land cover inspection, which, in turn, sets up the basis for detecting potentially dangerous or threatening situations.
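    Because the two acquisitions need not share a flight path or altitude, the images must be registered before they can be compared. A minimal sketch of that pipeline, using OpenCV with placeholder file names and an illustrative threshold (this is not the paper's exact algorithm), follows.

```python
# Register two UAV images from different passes, then difference them.
import cv2
import numpy as np

ref = cv2.imread("pass1.png", cv2.IMREAD_GRAYSCALE)  # placeholder paths
new = cv2.imread("pass2.png", cv2.IMREAD_GRAYSCALE)

# Match ORB features between the two passes.
orb = cv2.ORB_create(2000)
k1, d1 = orb.detectAndCompute(ref, None)
k2, d2 = orb.detectAndCompute(new, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
matches = sorted(matches, key=lambda m: m.distance)[:200]

# Estimate the homography mapping the new image into the reference frame.
src = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Warp, then threshold the absolute difference to get a change mask.
warped = cv2.warpPerspective(new, H, (ref.shape[1], ref.shape[0]))
change_mask = cv2.absdiff(ref, warped) > 40
```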

    Recent Advances in mmWave-Radar-Based Sensing, Its Applications, and Machine Learning Techniques: A Review

    Human gesture detection, obstacle detection, collision avoidance, parking aids, automotive driving, medical, meteorological, industrial, agriculture, defense, space, and other relevant fields have all benefited from recent advancements in mmWave radar sensor technology. A mmWave radar has several advantages that set it apart from other types of sensors: it can operate in bright, dazzling, or no-light conditions; it permits better antenna miniaturization than other traditional radars; and it offers better range resolution. Moreover, as more data sets have been made available, there has been a significant increase in the potential for incorporating radar data into different machine learning methods for various applications. This review focuses on key performance metrics in mmWave-radar-based sensing, detailed applications, and machine learning techniques used with mmWave radar for a variety of tasks. The article starts with a discussion of the various working bands of mmWave radars, then moves on to the types of mmWave radars and their key specifications, mmWave radar data interpretation, applications in various domains, and, finally, machine learning algorithms applied to radar data for various applications. Our review serves as a practical reference for beginners developing mmWave-radar-based applications by utilizing machine learning techniques.
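    As a concrete taste of mmWave radar data interpretation, the first step in most FMCW processing chains is a range FFT over the beat signal of each chirp. The sketch below simulates one chirp for a single point target; all parameter values (chirp slope, sample rate, target range) are illustrative assumptions, not tied to any sensor in the review.

```python
# Range estimation from one simulated FMCW chirp via the range FFT.
import numpy as np

c = 3e8            # speed of light, m/s
slope = 30e12      # chirp slope, Hz/s (illustrative)
fs = 5e6           # ADC sample rate, Hz (illustrative)
n = 256            # samples per chirp

# Simulate the beat signal for a point target at 12 m.
r_true = 12.0
f_beat = 2 * r_true * slope / c                 # beat frequency, Hz
t = np.arange(n) / fs
beat = np.exp(2j * np.pi * f_beat * t)

# Range profile: FFT of the beat signal; the peak bin maps back to range.
spectrum = np.abs(np.fft.fft(beat))
peak = np.argmax(spectrum[: n // 2])
r_est = peak * fs / n * c / (2 * slope)         # ~12 m
print(f"estimated range: {r_est:.1f} m")
```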

    Choosing Your Poison: Optimizing Simulator Visual System Selection as a Function of Operational Tasks

    Although current simulator visual systems can achieve extremely high levels of realism, they do not completely replicate the experience of a pilot sitting in the cockpit, looking at the outside world. Some differences in experience are due to visual artifacts: perceptual features that would not be present in a naturally viewed scene. Others are due to features that are missing from the simulated scene. In this paper, these differences will be defined and discussed. The significance of these differences will be examined as a function of several particular operational tasks. A framework to facilitate the choice of visual system characteristics based on operational task requirements will be proposed.
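    One simple way to operationalize such a framework is a weighted-score matrix: each task weights the visual characteristics it depends on, and each candidate system is scored against those weights. The sketch below is purely illustrative; the tasks, characteristics, scores, and weights are invented, not taken from the paper.

```python
# Illustrative weighted-score matching of visual systems to tasks.
TASKS = {  # weight of each visual characteristic per operational task
    "low-level flight": {"field_of_view": 0.5, "resolution": 0.2, "latency": 0.3},
    "aerial refueling": {"field_of_view": 0.2, "resolution": 0.5, "latency": 0.3},
}

SYSTEMS = {  # normalized 0-1 scores for each candidate visual system
    "dome display":       {"field_of_view": 0.9, "resolution": 0.4, "latency": 0.6},
    "collimated display": {"field_of_view": 0.5, "resolution": 0.8, "latency": 0.7},
}

def best_system(task: str) -> str:
    """Pick the system with the highest task-weighted score."""
    weights = TASKS[task]
    score = lambda sys: sum(weights[k] * SYSTEMS[sys][k] for k in weights)
    return max(SYSTEMS, key=score)

print(best_system("low-level flight"))   # -> dome display
```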

    Aerospace medicine and biology: A continuing bibliography with indexes (supplement 356)

    This bibliography lists 192 reports, articles, and other documents introduced into the NASA Scientific and Technical Information System during November 1991. Subject coverage includes: aerospace medicine and psychology, life support systems and controlled environments, safety equipment, exobiology and extraterrestrial life, and flight crew behavior and performance.