915 research outputs found

    Weakly Supervised Object Localization with Multi-fold Multiple Instance Learning

    Object category localization is a challenging problem in computer vision. Standard supervised training requires bounding box annotations of object instances. This time-consuming annotation process is sidestepped in weakly supervised learning. In this case, the supervised information is restricted to binary labels that indicate the absence/presence of object instances in the image, without their locations. We follow a multiple-instance learning approach that iteratively trains the detector and infers the object locations in the positive training images. Our main contribution is a multi-fold multiple instance learning procedure, which prevents training from prematurely locking onto erroneous object locations. This procedure is particularly important when using high-dimensional representations, such as Fisher vectors and convolutional neural network features. We also propose a window refinement method, which improves the localization accuracy by incorporating an objectness prior. We present a detailed experimental evaluation using the PASCAL VOC 2007 dataset, which verifies the effectiveness of our approach. Comment: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
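    The multi-fold procedure can be made concrete with a small sketch: the positive images are split into folds, a detector is trained on the currently selected windows of all folds but one, and objects are re-localized only in the held-out fold, so the detector never re-scores windows of images it was trained on. The linear-SVM detector, the window representations, and all names below are assumptions for exposition, not the paper's exact pipeline.

```python
# Minimal sketch of multi-fold multiple-instance learning for weakly
# supervised localization (illustrative, not the authors' implementation).
import numpy as np
from sklearn.svm import LinearSVC

def multifold_mil(pos_windows, neg_windows, n_folds=10, n_iters=5):
    """pos_windows: list (one per positive image) of (n_windows, d) arrays.
    neg_windows: (m, d) array of candidate windows from negative images."""
    n_pos = len(pos_windows)
    folds = np.array_split(np.random.permutation(n_pos), n_folds)
    # Initialize each positive image's selected window (here: window 0,
    # e.g. the full image).
    selected = {i: 0 for i in range(n_pos)}
    for _ in range(n_iters):
        new_selected = {}
        for held_out in folds:
            held = set(int(i) for i in held_out)
            train_idx = [i for i in range(n_pos) if i not in held]
            X = np.vstack([pos_windows[i][selected[i]] for i in train_idx]
                          + [neg_windows])
            y = np.array([1] * len(train_idx) + [0] * len(neg_windows))
            clf = LinearSVC(C=1.0).fit(X, y)
            # Re-localize only the held-out fold: the detector never scores
            # windows it was trained on, which is what prevents premature
            # locking onto erroneous object locations.
            for i in held:
                new_selected[i] = int(np.argmax(
                    clf.decision_function(pos_windows[i])))
        selected = new_selected
    return selected
```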

    Improvements to MLE Algorithm for Localizing Radiation Sources with a Distributed Detector Network

    Maximum Likelihood Estimation (MLE) is a widely used method for localizing radiation sources with distributed detector networks. While robust, MLE is computationally intensive, requiring an exhaustive search over the parameter space. To mitigate this computational load, many techniques have been presented, including iterative and multi-resolution methods. In this work, we present two ways to improve MLE localization of radiation sources. First, we present a method to mitigate the pitfalls of a standard multi-resolution algorithm: our method expands the search region of each layer before performing the MLE search, allowing the multi-resolution algorithm to correct an incorrect selection made in a prior layer. We test our proposed method against single-resolution and standard multi-resolution MLE algorithms, and find that grid expansion generally decreases localization error at a negligible increase in computation time over the standard multi-resolution algorithm. Second, we present a method to perform MLE localization without prior knowledge of the background radiation intensity. We estimate the source and background intensities using linear regression (LR) and then use these estimates to initialize the intensity parameter search for MLE. We test this method with single-resolution, multi-resolution, and multi-resolution-with-grid-expansion MLE algorithms and compare performance to MLE algorithms that don't use the LR initialization. We find that using the LR estimates to initialize the intensity parameter search causes only a marginal increase in both localization error and computation time for the tested algorithms; the technique is beneficial only when the background intensity is unknown.
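    A toy sketch of a coarse-to-fine MLE search with grid expansion may make the idea concrete: each refinement layer searches a window slightly larger than the best coarse cell, so a wrong coarse selection can still be corrected. The inverse-square Poisson detector model and every parameter below are illustrative assumptions, not the authors' exact formulation.

```python
# Hedged sketch of multi-resolution MLE with grid expansion for 2D source
# localization (assumed detector model: background + inverse-square source).
import numpy as np
from scipy.stats import poisson

def log_likelihood(x, y, intensity, bkg, det_xy, counts):
    d2 = (det_xy[:, 0] - x) ** 2 + (det_xy[:, 1] - y) ** 2 + 1e-9
    rate = bkg + intensity / d2          # expected counts per detector
    return poisson.logpmf(counts, rate).sum()

def mle_multires(det_xy, counts, intensity, bkg, lo=0.0, hi=100.0,
                 n_layers=4, grid=8, expand=1.5):
    """Coarse-to-fine grid search over source position; each layer's search
    window is grown by `expand` before refining."""
    cx = cy = (lo + hi) / 2.0
    half = (hi - lo) / 2.0
    for _ in range(n_layers):
        xs = np.linspace(cx - half, cx + half, grid)
        ys = np.linspace(cy - half, cy + half, grid)
        _, cx, cy = max((log_likelihood(x, y, intensity, bkg, det_xy, counts),
                         x, y) for x in xs for y in ys)
        # Shrink to roughly one grid cell, then expand so that neighbouring
        # cells stay reachable and a wrong coarse pick can be corrected.
        half = (half / grid) * expand
    return cx, cy
```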

    Object Localization, Segmentation, and Classification in 3D Images

    We address the problem of identifying objects of interest in 3D images as a set of related tasks: localizing objects within a scene, segmenting observed object instances from other scene elements, classifying detected objects into semantic categories, and estimating the 3D pose of detected objects within the scene. The increasing availability of 3D sensors motivates us to leverage large amounts of 3D data to train machine learning models for these tasks. Leveraging recent advances in deep learning has allowed us to develop models that address these tasks and optimize them jointly, reducing the errors that propagate when each task is solved independently.
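    As a rough illustration of what joint optimization over these tasks can look like, here is a minimal multi-task sketch with a shared 3D backbone and per-task heads (segmentation omitted for brevity). The architecture, loss weights, and all names are assumptions for exposition, not the models described in this work.

```python
# Illustrative multi-task sketch: shared 3D encoder, joint loss over
# localization, classification, and pose (toy stand-in architecture).
import torch
import torch.nn as nn

class Joint3DModel(nn.Module):
    def __init__(self, n_classes, feat_dim=256):
        super().__init__()
        # Shared encoder over a voxelized 3D input.
        self.backbone = nn.Sequential(
            nn.Conv3d(1, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(4), nn.Flatten(),
            nn.Linear(32 * 4 ** 3, feat_dim), nn.ReLU())
        self.loc_head = nn.Linear(feat_dim, 6)          # 3D box centre + size
        self.cls_head = nn.Linear(feat_dim, n_classes)  # semantic category
        self.pose_head = nn.Linear(feat_dim, 4)         # orientation quaternion

    def forward(self, vox):
        f = self.backbone(vox)
        return self.loc_head(f), self.cls_head(f), self.pose_head(f)

def joint_loss(outputs, targets, w=(1.0, 1.0, 0.5)):
    loc, cls, pose = outputs
    # Optimizing the tasks together penalizes their errors jointly instead
    # of letting mistakes compound across independently trained stages.
    return (w[0] * nn.functional.smooth_l1_loss(loc, targets["box"])
            + w[1] * nn.functional.cross_entropy(cls, targets["label"])
            + w[2] * nn.functional.mse_loss(pose, targets["pose"]))
```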

    Methods for efficient object categorization, detection, scene recognition, and image search

    In the past few years there has been tremendous growth in the use of digital images. Users can now access millions of photos, which creates the need for methods that can efficiently and effectively search the visual information of interest. In this thesis, we propose methods to learn image representations that compactly represent a large collection of images, enabling accurate image recognition with linear classification models, which offer the advantage of being efficient to both train and test. The entries of our descriptors are the outputs of a set of basis classifiers evaluated on the image, which capture the presence or absence of a set of high-level visual concepts. We propose two different techniques to automatically discover the visual concepts and learn the basis classifiers from a given labeled dataset of pictures, producing descriptors that are highly discriminative for the original categories of the dataset. We empirically show that these descriptors can encode new, unseen pictures and produce state-of-the-art results in conjunction with cheap linear classifiers. We describe several strategies to aggregate the outputs of basis classifiers evaluated on multiple subwindows of the image in order to handle cases where the photo contains multiple objects and large amounts of clutter. We extend this framework to the task of object detection, where the goal is to spatially localize an object within an image. We use the outputs of a collection of detectors trained in an offline stage as features for new detection problems, showing results competitive with the current state of the art. Since generating rich manual annotations for an image dataset is a crucial limitation of modern methods in object localization and detection, in this thesis we also propose a method to automatically generate training data for an object detector in a weakly supervised fashion, yielding considerable savings in human annotation effort. We show that our automatically generated regions can be used to train object detectors with recognition results remarkably close to those obtained by training on manually annotated bounding boxes.
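    The descriptor construction can be sketched compactly: an image is encoded as the vector of scores of pre-trained basis (concept) classifiers, and a cheap linear model is then trained on those compact descriptors. The feature extractor and the basis classifiers below are placeholders assumed to be trained elsewhere; this illustrates the idea, not the thesis's implementation.

```python
# Sketch of a basis-classifier descriptor: each entry is one concept
# classifier's score, and recognition uses a fast linear model on top.
import numpy as np
from sklearn.svm import LinearSVC

def encode(low_level_feats, basis_classifiers):
    """Map one image's raw feature vector (d,) to a descriptor of
    concept scores (n_basis,)."""
    return np.array([clf.decision_function(low_level_feats[None])[0]
                     for clf in basis_classifiers])

def train_recognizer(images_feats, labels, basis_classifiers):
    # Stack one compact descriptor per image; a linear classifier on these
    # is efficient to both train and test.
    X = np.vstack([encode(f, basis_classifiers) for f in images_feats])
    return LinearSVC(C=1.0).fit(X, labels)
```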

    EGO-TOPO: Environment Affordances from Egocentric Video

    First-person video naturally brings the use of a physical environment to the forefront, since it shows the camera wearer interacting fluidly in a space based on their intentions. However, current methods largely separate the observed actions from the persistent space itself. We introduce a model for environment affordances that is learned directly from egocentric video. The main idea is to gain a human-centric model of a physical space (such as a kitchen) that captures (1) the primary spatial zones of interaction and (2) the likely activities they support. Our approach decomposes a space into a topological map derived from first-person activity, organizing an ego-video into a series of visits to the different zones. Further, we show how to link zones across multiple related environments (e.g., from videos of multiple kitchens) to obtain a consolidated representation of environment functionality. On EPIC-Kitchens and EGTEA+, we demonstrate our approach for learning scene affordances and anticipating future actions in long-form video. Comment: Published in CVPR 2020; project page: http://vision.cs.utexas.edu/projects/ego-topo
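    One way to picture the topological-map construction: frames whose visual features closely match an existing zone are treated as a revisit, otherwise a new zone node is created, and transitions between consecutive zones become graph edges. The similarity threshold and feature function below are illustrative assumptions, not the paper's actual method.

```python
# Rough sketch: organize an egocentric video into a graph of interaction
# zones, where nodes are zones and edges are observed transitions.
import networkx as nx
import numpy as np

def build_zone_graph(frame_feats, sim_thresh=0.8):
    """frame_feats: (T, d) array of L2-normalized per-frame features."""
    graph = nx.Graph()
    zones = []                       # one representative feature per zone
    prev_zone = None
    for f in frame_feats:
        sims = [float(f @ z) for z in zones]
        if sims and max(sims) > sim_thresh:
            zone = int(np.argmax(sims))   # revisit of an existing zone
        else:
            zone = len(zones)             # new zone discovered
            zones.append(f)
            graph.add_node(zone)
        if prev_zone is not None and prev_zone != zone:
            graph.add_edge(prev_zone, zone)  # transition between zones
        prev_zone = zone
    return graph, zones
```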

    GNSS Shadow Matching: The Challenges Ahead

    GNSS shadow matching is a new technique that uses 3D mapping to improve positioning accuracy in dense urban areas from tens of meters to within five meters, potentially less. This paper presents the first comprehensive review of shadow matching’s error sources and proposes a program of research and development to take the technology from proof of concept to a robust, reliable, and accurate urban positioning product. A summary of the state of the art is also included. Error sources in shadow matching may be divided into six categories: initialization, modelling, propagation, environmental complexity, observation, and algorithm approximations. Performance is also affected by the environmental geometry, and it is sometimes necessary to handle solution ambiguity. For each error source, the cause and its impact on the position solution are explained. Examples are presented, where available, and improvements to the shadow-matching algorithms to mitigate each error are proposed. Methods of accommodating quality control within shadow matching are then proposed, including uncertainty determination, ambiguity detection, and outlier detection. This is followed by a discussion of how shadow matching could be integrated with conventional ranging-based GNSS and other navigation and positioning technologies, including a brief review of methods to enhance ranging-based GNSS using 3D mapping. Finally, the practical engineering challenges of shadow matching are assessed, including the system architecture, efficient GNSS signal prediction, and the acquisition of 3D mapping data.
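    A toy sketch of the core shadow-matching scoring step may be useful: candidate positions are ranked by how well the satellite visibility predicted from a 3D city model agrees with the signals actually observed. The crude building-boundary occlusion test and all names below are assumptions for illustration; real systems predict visibility from full 3D building models.

```python
# Toy shadow-matching sketch: pick the candidate position whose predicted
# sky view best matches the observed satellite signals.
import numpy as np

def predicted_visible(candidate, sat_az_el, buildings):
    """buildings[candidate]: list of ((az_lo, az_hi), min_elevation) pairs;
    a satellite is predicted blocked if its elevation falls below the
    building boundary in its azimuth direction."""
    vis = []
    for az, el in sat_az_el:
        blocked = any(lo <= az <= hi and el < min_el
                      for (lo, hi), min_el in buildings[candidate])
        vis.append(not blocked)
    return np.array(vis)

def shadow_match(candidates, sat_az_el, observed, buildings):
    """observed: boolean array, True where a satellite was received with a
    strong signal. Returns the best-agreeing candidate position."""
    scores = [np.sum(predicted_visible(c, sat_az_el, buildings) == observed)
              for c in candidates]
    return candidates[int(np.argmax(scores))]
```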