Weakly Supervised Object Localization with Multi-fold Multiple Instance Learning
Object category localization is a challenging problem in computer vision.
Standard supervised training requires bounding box annotations of object
instances. This time-consuming annotation process is sidestepped in weakly
supervised learning. In this case, the supervised information is restricted to
binary labels that indicate the absence/presence of object instances in the
image, without their locations. We follow a multiple-instance learning approach
that iteratively trains the detector and infers the object locations in the
positive training images. Our main contribution is a multi-fold multiple
instance learning procedure, which prevents training from prematurely locking
onto erroneous object locations. This procedure is particularly important when
using high-dimensional representations, such as Fisher vectors and
convolutional neural network features. We also propose a window refinement
method, which improves the localization accuracy by incorporating an objectness
prior. We present a detailed experimental evaluation using the PASCAL VOC 2007
dataset, which verifies the effectiveness of our approach.
Comment: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
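The multi-fold relocalization idea described in this abstract can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: a ridge-regression scorer stands in for their Fisher-vector/CNN detector, and the window features, fold count, and initialization (the full-image window) are all illustrative assumptions. The key point is that windows in each positive image are relocalized only by a detector trained on the *other* folds.

```python
import numpy as np

def multifold_mil(pos_windows, neg_feats, n_folds=3, n_iters=5):
    """Multi-fold multiple-instance learning sketch.

    pos_windows: list over positive images, each an (n_windows, d) array
    of candidate-window features. neg_feats: (m, d) negative features.
    Returns the index of the selected window in each positive image.
    """
    n_pos = len(pos_windows)
    # initialize every positive image with window 0 (e.g. the full image)
    selected = [0] * n_pos
    folds = np.arange(n_pos) % n_folds
    for _ in range(n_iters):
        for k in range(n_folds):
            train = [i for i in range(n_pos) if folds[i] != k]
            # ridge-regression "detector" fit on currently selected windows
            X = np.vstack([pos_windows[i][selected[i]] for i in train]
                          + [neg_feats])
            y = np.array([1.0] * len(train) + [-1.0] * len(neg_feats))
            w = np.linalg.solve(X.T @ X + 1e-3 * np.eye(X.shape[1]), X.T @ y)
            # relocalize ONLY in the held-out fold k: the detector never saw
            # these images, so it cannot lock onto its own earlier mistakes
            for i in np.where(folds == k)[0]:
                selected[i] = int(np.argmax(pos_windows[i] @ w))
    return selected
```

In a standard (single-fold) MIL loop the same detector both trains on and relocalizes every image, which is what lets it prematurely lock onto erroneous locations, especially with high-dimensional features.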
Improvements to MLE Algorithm for Localizing Radiation Sources with a Distributed Detector Network
Maximum Likelihood Estimation (MLE) is a widely used method for the localization of radiation sources using distributed detector networks. While robust, MLE is computationally intensive, requiring an exhaustive search over parameter space. To mitigate the computational load of MLE, many techniques have been presented, including iterative and multi-resolution methods.
In this work, we present two ways to improve the MLE localization of radiation sources. First, we present a method to mitigate the pitfalls of a standard multi-resolution algorithm. Our method expands the search region of each layer before performing the MLE search. Doing so allows the multi-resolution algorithm to correct an incorrect selection made in a prior layer. We test our proposed method against single-resolution MLE and standard multi-resolution MLE algorithms, and find that grid expansion yields a general decrease in localization error with only a negligible increase in computation time over the standard multi-resolution algorithm.
Second, we present a method to perform the MLE localization without prior knowledge of the background radiation intensity. We estimate the source and background intensities using linear regression (LR) and then use these estimates to initialize the intensity parameter search for MLE. We test this method using single-resolution, multi-resolution, and multi-resolution with grid expansion MLE algorithms and compare performance to MLE algorithms that don't use the LR initialization method. We found that using the LR estimates to initialize the intensity parameter search caused a marginal increase in both localization error and computation time for the tested algorithms. The technique is only beneficial in the case of an unknown background intensity.
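The multi-resolution search with grid expansion can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' code: a Poisson count model with inverse-square falloff, known source and background intensities (the paper's LR initialization for unknown intensities is omitted), and toy grid sizes. The `expand` factor is the paper's key idea: each layer's refined window grows beyond the winning coarse cell so a wrong pick in a prior layer can still be corrected.

```python
import numpy as np

def neg_log_like(src_xy, intensity, background, det_xy, counts):
    # Poisson model: expected count decays with inverse-square distance
    d2 = np.sum((det_xy - src_xy) ** 2, axis=1)
    lam = intensity / np.maximum(d2, 1e-6) + background
    return np.sum(lam - counts * np.log(lam))  # up to a constant

def mle_multires(det_xy, counts, intensity, background,
                 lo=(0.0, 0.0), hi=(10.0, 10.0),
                 n=8, layers=4, expand=1.5):
    """Coarse-to-fine grid search for the source position.

    Each layer evaluates an n x n grid over [lo, hi], then shrinks the
    window around the best point, EXPANDED by `expand` grid cells.
    """
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    for _ in range(layers):
        xs = np.linspace(lo[0], hi[0], n)
        ys = np.linspace(lo[1], hi[1], n)
        grid = [(x, y) for x in xs for y in ys]
        best = min(grid, key=lambda p: neg_log_like(
            np.array(p), intensity, background, det_xy, counts))
        # expanded window around the winner for the next, finer layer
        half = expand * (hi - lo) / (n - 1)
        lo, hi = np.array(best) - half, np.array(best) + half
    return np.array(best)
```

With `expand = 1.0` this reduces to a standard multi-resolution search confined to the winning cell; values above 1 trade a slightly larger per-layer grid coverage for robustness to an early wrong selection.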
Object Localization, Segmentation, and Classification in 3D Images
We address the problem of identifying objects of interest in 3D images as a set of related tasks: localization of objects within a scene, segmentation of observed object instances from other scene elements, classification of detected objects into semantic categories, and estimation of the 3D pose of detected objects within the scene. The increasing availability of 3D sensors motivates us to leverage large amounts of 3D data to train machine learning models to address these tasks in 3D images. Leveraging recent advances in deep learning has allowed us to develop models capable of addressing these tasks and of optimizing them jointly, reducing the errors that propagate when the tasks are solved independently.
Methods for efficient object categorization, detection, scene recognition, and image search
In the past few years there has been tremendous growth in the use of digital images. Users can now access millions of photos, which creates the need for methods that can efficiently and effectively search the visual information of interest. In this thesis, we propose methods to learn image representations that compactly represent a large collection of images, enabling accurate image recognition with linear classification models, which offer the advantage of being efficient to both train and test. The entries of our descriptors are the outputs of a set of basis classifiers evaluated on the image, which capture the presence or absence of a set of high-level visual concepts. We propose two different techniques to automatically discover the visual concepts and learn the basis classifiers from a given labeled dataset of pictures, producing descriptors that are highly discriminative for the original categories of the dataset. We empirically show that these descriptors can encode new unseen pictures and produce state-of-the-art results in conjunction with cheap linear classifiers. We describe several strategies to aggregate the outputs of basis classifiers evaluated on multiple subwindows of the image in order to handle cases where the photo contains multiple objects and large amounts of clutter. We extend this framework to the task of object detection, where the goal is to spatially localize an object within an image. We use the outputs of a collection of detectors trained in an offline stage as features for new detection problems, showing results competitive with the current state of the art. Since generating rich manual annotations for an image dataset is a crucial limitation of modern methods in object localization and detection, in this thesis we also propose a method to automatically generate training data for an object detector in a weakly supervised fashion, yielding considerable savings in human annotation effort.
We show that our automatically generated regions can be used to train object detectors with recognition results remarkably close to those obtained by training on manually annotated bounding boxes.
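The descriptor construction described here can be sketched in a few lines. This is an illustrative reading, not the thesis code: linear basis classifiers and max-pooling over subwindows are assumptions (the abstract mentions several aggregation strategies without specifying them), and all dimensions are toy values.

```python
import numpy as np

def classeme_descriptor(feat, basis_W, basis_b):
    """Image descriptor = scores of a bank of basis classifiers.

    feat: (d,) low-level image feature; basis_W: (k, d) classifier
    weights; basis_b: (k,) biases. Each of the k entries estimates the
    presence or absence of one high-level visual concept.
    """
    return basis_W @ feat + basis_b

def aggregate_over_windows(window_feats, basis_W, basis_b):
    """Max-pool basis-classifier scores over image subwindows.

    A concept is scored as present if ANY subwindow fires on it, which
    helps when the photo contains multiple objects or heavy clutter.
    """
    scores = window_feats @ basis_W.T + basis_b  # (n_windows, k)
    return scores.max(axis=0)
```

The resulting k-dimensional descriptor is what a cheap linear classifier is then trained on, which is why train and test stay efficient even for large collections.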
EGO-TOPO: Environment Affordances from Egocentric Video
First-person video naturally brings the use of a physical environment to the
forefront, since it shows the camera wearer interacting fluidly in a space
based on his intentions. However, current methods largely separate the observed
actions from the persistent space itself. We introduce a model for environment
affordances that is learned directly from egocentric video. The main idea is to
gain a human-centric model of a physical space (such as a kitchen) that
captures (1) the primary spatial zones of interaction and (2) the likely
activities they support. Our approach decomposes a space into a topological map
derived from first-person activity, organizing an ego-video into a series of
visits to the different zones. Further, we show how to link zones across
multiple related environments (e.g., from videos of multiple kitchens) to
obtain a consolidated representation of environment functionality. On
EPIC-Kitchens and EGTEA+, we demonstrate our approach for learning scene
affordances and anticipating future actions in long-form video.
Comment: Published in CVPR 2020, project page:
http://vision.cs.utexas.edu/projects/ego-topo
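The decomposition of an ego-video into zone visits can be illustrated with a toy sketch. This is only the topological bookkeeping, under strong simplifying assumptions: per-frame features are precomputed, and frames are assigned to zone prototypes by a plain cosine-similarity threshold (the paper learns this localization; the threshold and feature dimensions here are hypothetical).

```python
import numpy as np

def build_topo_map(frame_feats, sim_thresh=0.8):
    """Group frames into visits to spatial zones and link the zones.

    frame_feats: sequence of per-frame feature vectors. Returns the
    ordered list of zone visits and the set of undirected edges between
    consecutively visited zones (the topological map).
    """
    zones = []      # one prototype feature per discovered zone
    edges = set()   # undirected links between consecutively visited zones
    visits = []     # ordered sequence of zone visits
    prev = None
    for f in frame_feats:
        f = f / np.linalg.norm(f)
        sims = [float(z @ f) for z in zones]
        if sims and max(sims) >= sim_thresh:
            z = int(np.argmax(sims))      # revisit an existing zone
        else:
            zones.append(f)               # discover a new zone
            z = len(zones) - 1
        if prev is not None and prev != z:
            edges.add((min(prev, z), max(prev, z)))
        if prev != z:
            visits.append(z)
        prev = z
    return visits, edges
```

Linking zones across multiple related environments (e.g. several kitchens) would amount to matching prototypes between the graphs produced per video, which is where the consolidated functional representation comes from.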
GNSS Shadow Matching: The Challenges Ahead
GNSS shadow matching is a new technique that uses 3D mapping to improve positioning accuracy in dense urban areas from tens of meters to within five meters, potentially less. This paper presents the first comprehensive review of shadow matching’s error sources and proposes a program of research and development to take the technology from proof of concept to a robust, reliable and accurate urban positioning product. A summary of the state of the art is also included. Error sources in shadow matching may be divided into six categories: initialization, modelling, propagation, environmental complexity, observation, and algorithm approximations. Performance is also affected by the environmental geometry and it is sometimes necessary to handle solution ambiguity. For each error source, the cause and its impact on the position solution are explained. Examples are presented, where available, and improvements to the shadow-matching algorithms to mitigate each error are proposed. Methods of accommodating quality control within shadow matching are then proposed, including uncertainty determination, ambiguity detection, and outlier detection. This is followed by a discussion of how shadow matching could be integrated with conventional ranging-based GNSS and other navigation and positioning technologies. This includes a brief review of methods to enhance ranging-based GNSS using 3D mapping. Finally, the practical engineering challenges of shadow matching are assessed, including the system architecture, efficient GNSS signal prediction and the acquisition of 3D mapping data.
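The core scoring step of shadow matching can be sketched as follows, under the simplifying assumption of hard (boolean) visibility decisions: each candidate position is scored by how well the satellite visibility predicted from the 3D city model agrees with which satellites the receiver actually tracks. The array shapes and the simple fraction-matched score are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def shadow_match(predicted_vis, observed):
    """Score candidate positions against observed satellite visibility.

    predicted_vis: (n_candidates, n_sats) boolean array, True where a
    satellite has predicted line-of-sight from that candidate position
    according to the 3D building model. observed: (n_sats,) boolean
    array, True for satellites the receiver actually tracks.
    Returns (index of best candidate, per-candidate match scores).
    """
    # score = fraction of satellites whose predicted visibility matches
    scores = (predicted_vis == observed).mean(axis=1)
    return int(np.argmax(scores)), scores
```

In practice the score can tie across disjoint street segments, which is the solution-ambiguity problem the paper highlights, and measured signal-to-noise ratios are used to soften the boolean visibility decisions; both refinements sit on top of this basic matching step.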