
    Object Counting and Localization: A Statistical Approach

    Scene understanding is fundamental to many computer vision applications such as autonomous driving, robot navigation, and human-machine interaction; visual object counting and localization are important building blocks of scene understanding. In this dissertation, we present: (1) a framework that employs doubly stochastic Poisson (Cox) processes to estimate the number of instances of an object in an image and (2) a Bayesian model that localizes multiple instances of an object using counts from image sub-regions. Poisson processes are well suited to modeling events that occur randomly in space, such as the location of objects in an image or the enumeration of objects in a scene. The proposed algorithm selects a subset of bounding boxes in the image domain, then queries them for the presence of the object of interest by running a pre-trained convolutional neural network (CNN) classifier. The resulting observations are then aggregated, and a posterior distribution over the intensity of a Cox process is computed. This intensity function is summed, providing an estimate of the number of instances of the object over the entire image. Despite the flexibility and versatility of Poisson processes, their application to large datasets is limited, as their computational complexity and storage requirements do not easily scale with image size, typically requiring O(n^3) computation time and O(n^2) storage, where n is the number of observations. To mitigate this problem, we employ Kronecker algebra, which takes advantage of the tensor-product structure of covariance matrices. As the likelihood is non-Gaussian, the Laplace approximation is used for inference, employing conjugate gradient and Newton's method. Our approach then achieves close-to-linear performance, requiring only O(n^{3/2}) computation time and O(n) memory.
We demonstrate the counting results on both simulated data and real-world datasets, comparing the results with state-of-the-art counting methods. We then extend this framework by noting that most object detection and classification systems rely upon region proposal networks or upon classifying the "objectness" of specific sub-windows to help detect potential object locations within an image. We use our Cox model to convert such region proposals to a well-defined Poisson intensity. This output can be used as-is to directly estimate object counts, or can be plugged into pre-existing object detection frameworks to improve their counting and detection performance. This remapping does not require the original network to be re-trained: the parameters of the model can be estimated analytically from the training data. Furthermore, we consider the problem of quickly localizing multiple instances of an object by asking questions of the form "How many instances are there in this set?" while obtaining noisy answers. We evaluate the performance of the partitioning policy using the expected entropy of the posterior distribution after a fixed number of questions with noisy answers. We derive a lower bound for the value of this problem and study a specific policy, named the dyadic policy. We show that this policy achieves a value no more than twice this lower bound when answers are noise-free, and we show a more general constant-factor approximation guarantee for the noisy setting. We present an empirical evaluation of this policy on simulated data for the problem of detecting multiple instances of the same object in an image. Finally, we present experiments on localizing multiple objects simultaneously in real images.
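The proposal-to-intensity remapping described above can be sketched as follows. This is a minimal illustration, not the dissertation's actual model: the uniform score spreading and the linear remapping parameters `a` and `b` are hypothetical stand-ins for the analytically estimated parameters the abstract refers to.

```python
import numpy as np

def proposals_to_count(boxes, scores, image_shape, cell=16):
    """Map region-proposal scores to a coarse intensity grid and sum
    the grid to estimate the object count (integral of the intensity).
    Illustrative sketch only; the remapping parameters are placeholders."""
    H, W = image_shape
    grid = np.zeros((H // cell, W // cell))
    for (x0, y0, x1, y1), s in zip(boxes, scores):
        # spread each proposal's score uniformly over the cells it covers
        cells = grid[y0 // cell:(y1 // cell) + 1, x0 // cell:(x1 // cell) + 1]
        cells += s / max(cells.size, 1)
    a, b = 0.9, 0.0  # hypothetical remapping parameters (fit on training data)
    intensity = np.clip(a * grid + b, 0.0, None)
    return intensity.sum()  # summed intensity = estimated instance count
```

A single confident proposal then contributes roughly `a` times its score to the estimated count, and overlapping proposals accumulate in the cells they share.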

    Cox Processes for Counting by Detection

    In this work, doubly stochastic Poisson (Cox) processes and convolutional neural network (CNN) classifiers are used to estimate the number of instances of an object in an image. Poisson processes are well suited to modeling events that occur randomly in space, such as the location of objects in an image or the enumeration of objects in a scene. The proposed algorithm selects a subset of bounding boxes in the image domain, then queries them for the presence of the object of interest by running a pre-trained CNN classifier. The resulting observations are then aggregated, and a posterior distribution over the intensity of a Cox process is computed. This intensity function is summed, providing an estimate of the number of instances of the object over the entire image. Despite the flexibility and versatility of Cox processes, their application to large datasets is limited, as their computational complexity and storage requirements do not easily scale with image size, typically requiring O(n^3) computation time and O(n^2) storage, where n is the number of observations. To mitigate this problem, we employ Kronecker algebra, which takes advantage of direct-product structures. As the likelihood is non-Gaussian, the Laplace approximation is used for inference, employing conjugate gradient and Newton's method. Our approach then achieves close-to-linear performance, requiring only O(n^{3/2}) computation time and O(n) memory. Results are presented on simulated data and on images from the publicly available MS COCO dataset. We compare our counting results with the state-of-the-art detection method, Faster R-CNN, and demonstrate superior performance.
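The Kronecker trick the abstract relies on can be illustrated with the standard identity (A ⊗ B) vec(X) = vec(A X Bᵀ): a matrix-vector product with a Kronecker-structured covariance never needs the full n × n matrix. This is a generic sketch of that identity, not the paper's inference code.

```python
import numpy as np

def kron_matvec(A, B, x):
    """Compute (A kron B) @ x without forming the Kronecker product.
    For A (p x p) and B (q x q) with n = p*q, this costs O(n(p+q)) time
    and O(n) memory, versus O(n^2) time and memory for the explicit
    product -- the structure exploited inside conjugate-gradient steps."""
    p, q = A.shape[0], B.shape[0]
    X = x.reshape(p, q)          # row-major reshape matches np.kron's ordering
    return (A @ X @ B.T).reshape(-1)

# sanity check against the explicit Kronecker product on a small case
rng = np.random.default_rng(0)
A, B = rng.standard_normal((3, 3)), rng.standard_normal((4, 4))
x = rng.standard_normal(12)
assert np.allclose(kron_matvec(A, B, x), np.kron(A, B) @ x)
```

Embedding this matvec in a conjugate-gradient solver is what yields the near-linear scaling quoted in the abstract: the expensive step per iteration drops from a dense n × n multiply to two small matrix products.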

    Bayesian Group Testing Under Sum Observations: A Parallelizable Two-Approximation for Entropy Loss

    We consider the problem of group testing with sum observations and noiseless answers, in which we aim to locate multiple objects by querying the number of objects in each of a sequence of chosen sets. We study a probabilistic setting with entropy loss, in which we assume a joint Bayesian prior density on the locations of the objects and seek to choose the queried sets to minimize the expected entropy of the Bayesian posterior distribution after a fixed number of questions. We present a new non-adaptive policy, called the dyadic policy, and show that it is optimal among non-adaptive policies and within a factor of two of optimal among adaptive policies. This policy is quick to compute, its non-adaptive nature makes it easy to parallelize, and our bounds show that it performs well even when compared with adaptive policies. We also study an adaptive greedy policy, which maximizes the one-step expected reduction in entropy, and show that it performs at least as well as the dyadic policy, offering greater query efficiency but reduced parallelism. Numerical experiments demonstrate that both procedures outperform a divide-and-conquer benchmark policy from the literature, called sequential bifurcation, and show how these procedures may be applied to a stylized computer vision problem.
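A minimal sketch of the dyadic questions on a one-dimensional domain: question k asks how many objects fall in the union of dyadic intervals whose bin index has bit k set. This only illustrates how the query sets are formed and answered noiselessly; the actual policy is evaluated against a Bayesian prior over locations, which is omitted here.

```python
def dyadic_answers(locations, levels):
    """Noise-free answers for dyadic questions on [0, 1).
    Question k (most-significant bit first) queries the set of points
    whose dyadic bin index has bit k equal to 1.  All questions are
    fixed in advance, so they can be asked (and answered) in parallel."""
    n_bins = 2 ** levels
    bins = [int(x * n_bins) for x in locations]
    return [sum((b >> (levels - 1 - k)) & 1 for b in bins)
            for k in range(levels)]

# three objects at 0.1, 0.6, 0.8 fall in bins 0, 4 and 6 (of 8)
print(dyadic_answers([0.1, 0.6, 0.8], levels=3))  # [2, 1, 0]
```

Because every question is specified up front, the policy is non-adaptive: all sums can be measured simultaneously, which is the parallelism advantage the abstract highlights over the greedy adaptive policy.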

    Evaluation of a perineal access device for MRI-guided prostate interventions

    This paper describes a perineal access tool for MRI-guided prostate interventions and evaluates it using a phantom study. The development of this device has been driven by clinical need and close collaboration. The device seamlessly fits into the workflow of MRI-guided prostate procedures such as cryoablation and biopsies. It promises a significant reduction in procedure time, accurate needle placement, fewer insertions, and potentially better patient outcomes. The current embodiment includes a frame which is placed next to the perineum and incorporates both visual and MRI-visible markers. These markers are automatically detected both in MRI and by a pair of stereo cameras (optical head), allowing for automatic optical registration. The optical head illuminates the procedure area and can track instruments and ultrasound probes. The frame has a window to access the perineum. Multiple swappable grids may be placed in this window depending on the application. It is also possible to entirely remove the grid for freehand procedures. All the components are designed to be used inside the MRI suite. To test this system, we built a custom phantom with MRI-visible targets and planned 21 needle insertions with three grid types using the SCENERGY software. With an average insertion depth of about 85 mm, the average error of needle tip placement was 2.74 mm. We estimated the error by manually segmenting the needle tip in post-insertion MRIs of the phantom and comparing that to the plan.
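The reported placement error is presumably the mean Euclidean distance between each planned target and the tip position segmented from the post-insertion MRI. A sketch of that computation, assuming tip positions are given as 3-D coordinates in millimeters:

```python
import numpy as np

def placement_errors(planned, segmented):
    """Per-insertion Euclidean needle-tip placement error (mm) between
    planned targets and tips segmented from post-insertion MRI, plus the
    average reported across all insertions."""
    planned = np.asarray(planned, dtype=float)
    segmented = np.asarray(segmented, dtype=float)
    errors = np.linalg.norm(planned - segmented, axis=1)  # one error per needle
    return errors.mean(), errors

# two hypothetical insertions with tip offsets of 1 mm and 2 mm
mean_err, errs = placement_errors([[0, 0, 85], [10, 5, 84]],
                                  [[1, 0, 85], [10, 5, 86]])
```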