
    Complexity of Representation and Inference in Compositional Models with Part Sharing

    This paper performs a complexity analysis of a class of serial and parallel compositional models of multiple objects and shows that they enable efficient representation and rapid inference. Compositional models are generative and represent objects in a hierarchically distributed manner in terms of parts and subparts, which are constructed recursively by part-subpart compositions. Parts are represented more coarsely at higher levels of the hierarchy, so that the upper levels give coarse summary descriptions (e.g., there is a horse in the image) while the lower levels represent the details (e.g., the positions of the legs of the horse). This hierarchically distributed representation obeys the executive summary principle, meaning that a high-level executive only requires a coarse summary description and can, if necessary, get more details by consulting lower-level executives. The parts and subparts are organized in terms of hierarchical dictionaries, which enable part sharing between different objects and thereby allow efficient representation of many objects.

    The first main contribution of this paper is to show that compositional models can be mapped onto a parallel visual architecture similar to that used by bio-inspired visual models such as deep convolutional networks, but more explicit in terms of representation, hence enabling part detection as well as object detection, and suitable for complexity analysis. Inference algorithms can be run on this architecture to exploit the gains from part sharing and executive summary. Effectively, this compositional architecture enables us to perform exact inference simultaneously over a large class of generative models of objects.

    The second contribution is an analysis of the complexity of compositional models in terms of computation time (for serial computers) and number of nodes (e.g., "neurons") for parallel computers. In particular, we compute the complexity gains from part sharing and executive summary and their dependence on how the dictionary scales with the level of the hierarchy. We explore three regimes of scaling behavior, in which the dictionary size (i) increases exponentially with the level of the hierarchy, (ii) is determined by an unsupervised compositional learning algorithm applied to real data, or (iii) decreases exponentially with scale. This analysis shows that in some regimes the use of shared parts enables algorithms which can perform inference in time linear in the number of levels for an exponential number of objects. In other regimes part sharing has little advantage for serial computers but can enable linear processing on parallel computers.

    This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216, and also by ARO 62250-CS.
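
    The scaling argument can be made concrete with a toy cost count. The sketch below compares serial inference cost with and without part sharing for an assumed hierarchy; the branching factor, dictionary sizes, and position subsampling are illustrative assumptions, not figures from the paper.

        # Toy cost count: serial inference operations for an assumed
        # part-subpart hierarchy. Branching factor, dictionary sizes and
        # position subsampling are illustrative assumptions only.

        def cost_with_sharing(dict_sizes, positions, branching=2):
            # Each shared dictionary element is evaluated once per retained
            # position, no matter how many objects use it.
            return sum(d * p * branching for d, p in zip(dict_sizes, positions))

        def cost_without_sharing(num_objects, positions, branching=2):
            # Each object re-evaluates its own part at every retained
            # position and level.
            return num_objects * sum(p * branching for p in positions)

        levels = 5
        # Executive summary: retained positions are subsampled at higher levels.
        positions = [4 ** (levels - l) for l in range(1, levels + 1)]
        # Regime (i): the dictionary grows exponentially with level.
        dict_sizes = [8 * 2 ** l for l in range(levels)]
        num_objects = dict_sizes[-1]  # one top-level element per object

        print("with sharing   :", cost_with_sharing(dict_sizes, positions))
        print("without sharing:", cost_without_sharing(num_objects, positions))

    Even in this toy setting the shared cost tracks the total dictionary size rather than the number of objects, which is the gap the paper's analysis quantifies level by level.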

    Internal structure and dynamics of extragalactic relativistic jets

    Radio-loud AGN typically manifest powerful relativistic jets extending up to millions of light years, often showing superluminal motions organised in a complex kinematic pattern. A number of physical models are still competing to explain the jet structure and kinematics revealed by radio images obtained with the very long baseline interferometry (VLBI) technique. Robust measurements of the longitudinal and transverse velocity fields in the jets would provide crucial information for these models. This is a difficult task, particularly for transversely resolved jets in objects like 3C 273 and M87. To address this task, we have developed a new wavelet-based image segmentation and evaluation (WISE) technique for identifying significant structural patterns (SSP) of smooth, transversely resolved flows and obtaining a velocity field from cross-correlation of these regions in multi-epoch observations. Detection of individual SSPs is performed using wavelet decomposition and multiscale segmentation of the observed structure. The cross-correlation algorithm combines structural information on different scales of the wavelet decomposition, providing a robust and reliable identification of related SSPs in multi-epoch images. A stacked cross-correlation (SCC) method is also introduced to recover multiple velocity components from partially overlapping, optically thin emitting regions. Tests performed on simulated images of jets revealed excellent performance of WISE. The algorithm enables recovering structural evolution on scales down to 0.25 FWHM of the image point spread function (PSF). It also performs well on sparse or irregular sets of observations, providing robust identification of structural displacements as large as 3 PSF sizes. WISE was then tested on astronomical images by applying it to several image sequences obtained as part of the MOJAVE long-term monitoring program of extragalactic jets with VLBI observations. The analysis focused in particular on the prominent radio jets in the quasar 3C 273 and the radio galaxies 3C 120 and 3C 111. The results showed the robustness and fidelity of WISE compared with the "standard" procedure of using multiple Gaussian components to represent the observed structure. These tests also demonstrated the excellent efficiency of the method, with WISE analysis taking only a few minutes of computing time to recover the same structural information that required weeks of model-fitting effort. Going beyond global one-dimensional kinematic analysis, WISE revealed transverse structure in the jet of 3C 273, with three distinct flow lines clearly present inside the jet and evolving in a regular fashion, suggesting a pattern that may arise as a result of the Kelvin-Helmholtz (K-H) instability previously detected in this jet. The positional precision of the WISE decomposition was also critical for modelling the helical trajectory of the components in the jet of 3C 120, revealing a helical surface mode of the K-H instability with an apparent wavelength λ_app = 15.8 mas evolving at an apparent speed β^w_app = 2.0c. We finally present in this thesis the first detailed transverse velocity fields of the jet in M87 on scales of 10^3-10^4 r_g. Its proximity, combined with the large mass of its central black hole, makes it one of the primary sources for probing the jet formation and acceleration region.
    We analyzed 11 epochs of the M87 jet from the VLBA movie project, observed at three-week intervals, revealing a structured and stratified flow compatible with a magnetically launched, accelerated and collimated jet. Based on the structural analysis obtained with WISE, important physical parameters of the flow could be determined. Using the velocity detected in the counter-jet and the intensity ratio between the jet and counter-jet, we estimated the viewing angle θ = 18°. Differential velocity in the northern and southern limbs of the flow was explained by jet rotation, consistent with a field line with angular velocity Ω ~ 10^-6 s^-1 and corresponding to a launching location in the inner part of the accretion disk. The stratification of the flow was unveiled by an SCC analysis that detected a slow, mildly relativistic layer (β ~ 0.5c), associated either with the instability pattern speed or an outer wind, and a fast accelerating streamline (γ ~ 2.5). The acceleration of the jet, together with the collimation profile of the flow, was successfully modeled by solving the magnetohydrodynamic wind equation, indicating a total specific energy μ ~ 6 and a transition from Poynting to kinetic energy at a distance z_eq ~ 3000 R_s, in good agreement with previous analytic and simulation work.
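
    As a rough illustration of the cross-correlation step at the heart of SSP matching, the sketch below estimates the displacement of a single Gaussian feature between two synthetic epochs via FFT cross-correlation. The image size, feature shape, and shift are invented for the example; the actual WISE matching additionally operates per wavelet scale on segmented SSPs rather than on whole images.

        import numpy as np

        def gaussian_blob(shape, center, sigma=3.0):
            # Synthetic stand-in for a compact jet feature.
            y, x = np.indices(shape)
            return np.exp(-((x - center[1]) ** 2 + (y - center[0]) ** 2)
                          / (2.0 * sigma ** 2))

        shape = (128, 128)
        epoch1 = gaussian_blob(shape, (60, 50))
        epoch2 = gaussian_blob(shape, (60, 57))  # feature moved 7 px in x

        # Cross-correlate the two epochs in the Fourier domain; the peak of
        # the correlation surface gives the displacement between epochs.
        xcorr = np.fft.ifft2(np.fft.fft2(epoch2) * np.conj(np.fft.fft2(epoch1))).real
        dy, dx = np.unravel_index(np.argmax(xcorr), shape)
        # Shifts beyond half the image wrap around to negative values.
        dy = dy - shape[0] if dy > shape[0] // 2 else dy
        dx = dx - shape[1] if dx > shape[1] // 2 else dx
        print(f"estimated displacement: dx={dx} px, dy={dy} px")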

    Statistical models for natural scene data

    This thesis considers statistical modelling of natural image data. Advances in this field can have significant impact both for engineering applications and for the understanding of the human visual system. Several recent advances in natural image modelling have been obtained with the use of unsupervised feature learning. We consider a class of such models, restricted Boltzmann machines (RBMs), used in many recent state-of-the-art image models. We develop extensions of these stochastic artificial neural networks and use them as a basis for building more effective image models and tools for computational vision.

    We first develop a novel framework for obtaining Boltzmann machines in which the hidden unit activations co-transform with transformed input stimuli in a stable and predictable way throughout the network. We define such models to be transformation equivariant. Such properties have been shown to be useful for computer vision systems, and were motivational, for example, in the development of steerable filters, a widely used classical feature extraction technique. Translation-equivariant feature sharing has been the standard method for scaling image models beyond patch-sized data to large images. In our framework we extend shallow and deep models to account for other kinds of transformations as well, focusing on in-plane rotations.

    Motivated by the unsatisfactory results of current generative natural image models, we take a step back and evaluate whether they are able to model a subclass of the data, natural image textures. This is a necessary subcomponent of any credible model for visual scenes. We assess the performance of a state-of-the-art model of natural images for texture generation, using a dataset and evaluation techniques from prior work. We also perform a dissection of the model architecture, uncovering the properties important for good performance. Building on this, we develop structured extensions for more complicated data comprised of textures from multiple classes, using the single-texture model architecture as a basis. These models are shown to produce state-of-the-art texture synthesis results quantitatively, and are also effective qualitatively. It is demonstrated empirically that the developed multiple-texture framework provides a means to generate images of differently textured regions and more generic globally varying textures, and can also be used for texture interpolation, where the approach is radically different from others in the area.

    Finally, we consider visual boundary prediction from natural images. This work aims to improve understanding of Boltzmann machines in the generation of image segment boundaries, and to investigate deep neural network architectures for learning the boundary detection problem. The developed networks, which avoid several hand-crafted model and feature designs commonly used for the problem, produce the fastest reported inference times in the literature, combined with state-of-the-art performance.
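
    For concreteness, the following is a minimal sketch of the basic building block named above: a binary RBM trained with one step of contrastive divergence (CD-1). The layer sizes, learning rate, and toy data are arbitrary assumptions; the thesis's equivariant and texture models extend this block rather than use it as-is.

        import numpy as np

        rng = np.random.default_rng(0)
        n_vis, n_hid, lr = 64, 32, 0.05
        W = 0.01 * rng.standard_normal((n_vis, n_hid))
        b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def cd1_update(v0):
            # One CD-1 parameter update for a batch of binary visible vectors.
            global W, b_v, b_h
            ph0 = sigmoid(v0 @ W + b_h)                  # P(h=1 | v0)
            h0 = (rng.random(ph0.shape) < ph0) * 1.0     # sample hidden states
            pv1 = sigmoid(h0 @ W.T + b_v)                # reconstruction
            ph1 = sigmoid(pv1 @ W + b_h)
            n = v0.shape[0]
            W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n     # positive - negative phase
            b_v += lr * (v0 - pv1).mean(axis=0)
            b_h += lr * (ph0 - ph1).mean(axis=0)

        batch = (rng.random((16, n_vis)) < 0.3) * 1.0    # toy binary "patches"
        for _ in range(100):
            cd1_update(batch)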

    Learning generative models of mid-level structure in natural images

    Natural images arise from complicated processes involving many factors of variation. They reflect the wealth of shapes and appearances of objects in our three-dimensional world, but they are also affected by factors such as distortions due to perspective, occlusions, and illumination, giving rise to structure with regularities at many different levels. Prior knowledge about these regularities, and suitable representations that allow efficient reasoning about the properties of a visual scene, are important for many image processing and computer vision tasks.

    This thesis focuses on models of image structure at intermediate levels of complexity, as required, for instance, for image inpainting or segmentation. It aims at developing generative, probabilistic models of this kind of structure and, in particular, at devising strategies for learning such models in a largely unsupervised manner from data. One hallmark of natural images is that they can often be decomposed into regions with very different visual characteristics. The main approach of this thesis is therefore to represent images in terms of regions that are characterized by their shapes and appearances, with an image composed from many such regions. We explore approaches to learning about the appearance of regions, learning about region shapes, and ways of combining several regions to form a full image. To achieve this goal, we make use of ideas for unsupervised learning developed in the literature on models of low-level image structure and in the "deep learning" literature. These models are used as building blocks of more structured model formulations that incorporate additional prior knowledge of how images are formed.

    The thesis makes the following contributions. Firstly, we investigate a popular MRF-based prior of natural image structure, the Field of Experts, with respect to its ability to model image textures, and propose an extended formulation that is considerably more successful at this task. This formulation gives rise to a fully parametric, translation-invariant probabilistic generative model of image textures. We illustrate how this model can be used as a component of a more comprehensive model of images comprising multiple textured regions. Secondly, we develop a model of region shape. This work is an extension of the "Masked Restricted Boltzmann Machine" proposed by Le Roux et al. (2011), and it allows explicit reasoning about the independent shapes and relative depths of occluding objects. We develop an inference and unsupervised learning scheme and demonstrate how this shape model, in combination with the masked RBM, gives rise to a good model of natural image patches. Finally, we demonstrate how this model of region shape can be extended to model shapes in large images. The result is a generative model of large images that are formed by composition from many small, partially overlapping and occluding objects.
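
    The composition step described above, in which regions with independent shapes and relative depths are layered so that nearer regions occlude farther ones, can be sketched directly. The toy square masks and constant appearances below are purely illustrative, not the learned shape and appearance models of the thesis.

        import numpy as np

        def compose(shape, regions):
            # regions: list of (depth, mask, appearance); smallest depth = nearest.
            image = np.zeros(shape)
            filled = np.zeros(shape, dtype=bool)
            for depth, mask, appearance in sorted(regions, key=lambda r: r[0]):
                visible = mask & ~filled      # only where no nearer region painted
                image[visible] = appearance[visible]
                filled |= visible
            return image

        H, W = 32, 32
        def square(y0, x0, size):
            m = np.zeros((H, W), dtype=bool)
            m[y0:y0 + size, x0:x0 + size] = True
            return m

        regions = [
            (0, square(4, 4, 12), np.full((H, W), 0.9)),    # nearest region
            (1, square(10, 10, 14), np.full((H, W), 0.4)),  # partially occluded
        ]
        img = compose((H, W), regions)

    Swapping in learned masks and textured appearances for the toy squares gives the kind of layered generative model the thesis develops.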

    Graphical models for visual object recognition and tracking

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. This electronic version was submitted by the student author; the certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 277-301).

    We develop statistical methods which allow effective visual detection, categorization, and tracking of objects in complex scenes. Such computer vision systems must be robust to wide variations in object appearance, the often small size of training databases, and ambiguities induced by articulated or partially occluded objects. Graphical models provide a powerful framework for encoding the statistical structure of visual scenes, and for developing corresponding learning and inference algorithms. In this thesis, we describe several models which integrate graphical representations with nonparametric statistical methods. This approach leads to inference algorithms which tractably recover high-dimensional, continuous object pose variations, and learning procedures which transfer knowledge among related recognition tasks.

    Motivated by visual tracking problems, we first develop a nonparametric extension of the belief propagation (BP) algorithm. Using Monte Carlo methods, we provide general procedures for recursively updating particle-based approximations of continuous sufficient statistics. Efficient multiscale sampling methods then allow this nonparametric BP algorithm to be flexibly adapted to many different applications. As a particular example, we consider a graphical model describing the hand's three-dimensional (3D) structure, kinematics, and dynamics. This graph encodes global hand pose via the 3D position and orientation of several rigid components, and thus exposes local structure in a high-dimensional articulated model. Applying nonparametric BP, we recover a hand tracking algorithm which is robust to outliers and local visual ambiguities. Via a set of latent occupancy masks, we also extend our approach to consistently infer occlusion events in a distributed fashion.

    In the second half of this thesis, we develop methods for learning hierarchical models of objects, the parts composing them, and the scenes surrounding them. Our approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints. By building integrated scene models, we may discover contextual relationships and better exploit partially labeled training images. We first consider images of isolated objects, and show that sharing parts among object categories improves accuracy when learning from few examples. Turning to multiple-object scenes, we propose nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category and the objects composing each scene. Adapting these transformed Dirichlet processes to images taken with a binocular stereo camera, we learn integrated 3D models of object geometry and appearance. This leads to a Monte Carlo algorithm which automatically infers 3D scene structure from the predictable geometry of known object categories.

    by Erik B. Sudderth. Ph.D.
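
    A toy sketch of the particle-based message update underlying nonparametric BP may help fix ideas: a belief represented by weighted samples is resampled and pushed through a pairwise potential. The one-dimensional pose, Gaussian potential, and particle count are assumptions for illustration, far simpler than the thesis's articulated hand model.

        import numpy as np

        rng = np.random.default_rng(1)

        def gaussian(x, mu, sigma):
            return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

        # Particles approximating the belief at node s (1-D "pose" for simplicity).
        particles = rng.normal(0.0, 2.0, size=200)
        weights = gaussian(particles, mu=1.0, sigma=1.0)   # local evidence at s
        weights /= weights.sum()

        # Message s -> t: resample particles of s by weight, then propagate each
        # through an assumed pairwise potential psi(x_s, x_t) = N(x_t; x_s + 1, 0.5).
        idx = rng.choice(len(particles), size=200, p=weights)
        msg_particles = particles[idx] + rng.normal(1.0, 0.5, size=200)

        print("message mean:", msg_particles.mean())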

    Dynamic Trees for Image Modelling

    This paper introduces a new class of image model which we call dynamic trees, or DTs. A dynamic tree model specifies a prior over structures of trees, each of which is a forest of one or more tree-structured belief networks (TSBNs). In the literature, standard tree-structured belief network models have been found to produce "blocky" segmentations when naturally occurring boundaries within an image did not coincide with those of the subtrees in the rigid, fixed structure of the network. Dynamic trees have a flexible architecture which allows the structure to vary to create configurations where the subtree and image boundaries align, and experimentation with the model has shown significant improvements. For large
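
    To fix ideas about the fixed-structure building block that dynamic trees generalise, the sketch below runs the upward (likelihood) pass of a small tree-structured belief network with discrete labels. The label set, conditional probability table, and leaf evidence are illustrative assumptions, not taken from the paper.

        import numpy as np

        K = 2                                 # labels, e.g. {background, object}
        # P(child label | parent label): children tend to copy their parent.
        cpt = np.array([[0.9, 0.1],
                        [0.1, 0.9]])

        def upward(node_children, leaf_lam):
            # Return lambda(x) = P(evidence below node | node label x).
            if not node_children:             # leaf: likelihood of its pixel
                return leaf_lam
            lam = np.ones(K)
            for child in node_children:
                child_lam = upward(child, leaf_lam)
                lam *= cpt @ child_lam        # marginalise over child labels
            return lam

        # A root with four leaves whose pixels weakly favour label 1.
        leaf_lam = np.array([0.3, 0.7])
        root_lam = upward([[], [], [], []], leaf_lam)
        root_prior = np.array([0.5, 0.5])
        posterior = root_prior * root_lam / (root_prior * root_lam).sum()
        print("root posterior:", posterior)

    A dynamic tree additionally places a prior over which parent each node attaches to, so the forest structure itself is inferred along with the labels.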