258 research outputs found

    Randomized Optimum Models for Structured Prediction

    One approach to modeling structured discrete data is to describe the probability of states via an energy function and Gibbs distribution. A recurring difficulty in these models is the computation of the partition function, which may require an intractable sum. However, in many such models, the mode can be found efficiently even when the partition function is unavailable. Recent work on Perturb-and-MAP (PM) models (Papandreou and Yuille, 2011) has exploited this discrepancy to approximate the Gibbs distribution for Markov random fields (MRFs). Here, we explore a broader class of models, called Randomized Optimum models (RandOMs), which include PM as a special case. This new class of models encompasses not only MRFs, but also other models that have intractable partition functions yet permit efficient mode-finding, such as those based on bipartite matchings, shortest paths, or connected components in a graph. We develop likelihood-based learning algorithms for RandOMs, which, empirical results indicate, can produce better models than PM.
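
    As an illustration of the Perturb-and-MAP idea that RandOMs generalize, the sketch below draws approximate samples from a small Gibbs distribution by adding Gumbel noise to the unary potentials and returning the mode of the perturbed energy. The chain model, the noise on unaries only, and the brute-force mode search are illustrative assumptions, not the paper's implementation.

    # Minimal Perturb-and-MAP sketch: perturb the unary potentials with Gumbel
    # noise, then take the MAP of the perturbed model as an approximate sample.
    import itertools
    import numpy as np

    rng = np.random.default_rng(0)
    n_vars, n_states = 4, 2
    unary = rng.normal(size=(n_vars, n_states))                    # unary potentials
    pairwise = rng.normal(size=(n_vars - 1, n_states, n_states))   # chain pairwise potentials

    def score(x, unaries):
        # Negative energy of a joint configuration x.
        s = sum(unaries[i, x[i]] for i in range(n_vars))
        s += sum(pairwise[i, x[i], x[i + 1]] for i in range(n_vars - 1))
        return s

    def map_state(unaries):
        # Brute-force mode search; a real model would use an efficient solver here.
        return max(itertools.product(range(n_states), repeat=n_vars),
                   key=lambda x: score(x, unaries))

    def perturb_and_map_sample():
        noise = rng.gumbel(size=unary.shape)
        return map_state(unary + noise)

    print([perturb_and_map_sample() for _ in range(5)])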

    Spectrally approximating large graphs with smaller graphs

    How does coarsening affect the spectrum of a general graph? We provide conditions under which the principal eigenvalues and eigenspaces of the coarsened and original graph Laplacian matrices are close. The achieved approximation is shown to depend on standard graph-theoretic properties, such as the degree and eigenvalue distributions, as well as on the ratio between the coarsened and actual graph sizes. Our results carry implications for learning methods that utilize coarsening. For the particular case of spectral clustering, they imply that coarse eigenvectors can be used to derive good-quality assignments even without refinement; this phenomenon was previously observed, but lacked formal justification.
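
    As a rough illustration of the quantities being compared (a sketch under an assumed coarsening convention, not the paper's construction), the snippet below merges nodes of a small graph according to a partition, builds the original and coarsened Laplacians, and compares their smallest eigenvalues.

    # Compare the spectra of a graph Laplacian and a coarsened Laplacian obtained
    # by merging partition blocks and summing inter-block edge weights
    # (an illustrative convention, not the paper's coarsening scheme).
    import numpy as np

    def laplacian(A):
        return np.diag(A.sum(axis=1)) - A

    # Small example: a ring of 6 unit-weight edges.
    n = 6
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

    # Partition the nodes into 3 blocks of 2 consecutive nodes each.
    blocks = [[0, 1], [2, 3], [4, 5]]
    P = np.zeros((len(blocks), n))
    for b, nodes in enumerate(blocks):
        P[b, nodes] = 1.0

    A_coarse = P @ A @ P.T            # summed weights between blocks
    np.fill_diagonal(A_coarse, 0.0)   # discard self-loops from intra-block edges

    eig_full = np.sort(np.linalg.eigvalsh(laplacian(A)))
    eig_coarse = np.sort(np.linalg.eigvalsh(laplacian(A_coarse)))
    print("original  :", np.round(eig_full[:3], 3))
    print("coarsened :", np.round(eig_coarse[:3], 3))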

    Structured learning for information retrieval

    Information retrieval is the area of study concerned with the process of searching, recovering and interpreting information from large amounts of data. In this thesis we show that many of the problems in information retrieval consist of structured learning, where the goal is to learn predictors of complex output structures consisting of many interdependent variables. We then attack these problems using principled machine learning methods that are specifically suited for such scenarios. In the process of doing so, we develop new models, new model extensions and new algorithms that, when integrated with existing methodology, comprise a new set of tools for solving a variety of information retrieval problems. Firstly, we cover the multi-label classification problem, where we seek to predict the set of labels associated with a given object; the output in this case is structured, as the output variables are interdependent. Secondly, we focus on document ranking, where, given a query and a set of documents associated with it, we want to rank them according to their relevance to the query; here, again, we have a structured output: a ranking of documents. Thirdly, we address topic models, where we are given a set of documents and attempt to find a compact representation of them by learning latent topics and associating a topic distribution with each document; the output is again structured, consisting of word and topic distributions. For all the above problems, we obtain state-of-the-art solutions, as attested by empirical performance on publicly available real-world datasets.
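
    As a generic illustration of the document-ranking setting described above (a structured output over interdependent variables), the sketch below learns a linear scoring function with a pairwise hinge loss so that relevant documents are scored above irrelevant ones; the toy data and perceptron-style updates are assumptions for illustration, not the thesis's models.

    # Toy pairwise ranking: learn w so that relevant documents outrank
    # irrelevant ones by a margin, then read off a ranking (a structured output).
    import numpy as np

    rng = np.random.default_rng(0)
    n_docs, n_features = 20, 5
    X = rng.normal(size=(n_docs, n_features))       # query-document feature vectors
    relevance = rng.integers(0, 2, size=n_docs)     # toy binary relevance labels

    w = np.zeros(n_features)
    lr = 0.1
    for epoch in range(50):
        for i in range(n_docs):
            for j in range(n_docs):
                if relevance[i] > relevance[j] and (X[i] - X[j]) @ w < 1.0:
                    w += lr * (X[i] - X[j])         # fix the margin violation

    ranking = np.argsort(-(X @ w))                  # predicted document ranking
    print(ranking)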

    Seventh Biennial Report : June 2003 - March 2005


    Tree-walk kernels for computer vision

    We propose a family of positive-definite kernels between images, allowing us to compute image similarity measures in terms of color and of shape. The kernels consist in matching subtree patterns called "tree-walks" of graphs extracted from the images, e.g. segmentation graphs for color similarity, and graphs of the discretized shapes, or point clouds in general, for shape similarity. In both cases, we are able to design computationally efficient kernels which can be computed in polynomial time in the size of the graphs, by leveraging specific properties of the graphs at hand, such as planarity for adjacency graphs (segmentation graphs) or factorizability of the associated graphical model for point clouds. Our kernels can be used by any kernel-based learning method, and hence we present experimental results for supervised and semi-supervised classification as well as clustering of natural images, and supervised classification of handwritten digits and Chinese characters from a few training examples.
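
    The tree-walk kernels themselves exploit more structure than can be shown briefly; as a simplified illustration of the dynamic-programming flavor that keeps such graph kernels polynomial in the graph sizes, the sketch below computes a plain labeled-walk kernel between two small graphs (an assumed simplification, not the kernel proposed in the paper).

    # Count pairs of identically labeled walks of length <= K in two labeled
    # graphs via dynamic programming over node pairs (polynomial in graph size).
    import numpy as np

    def walk_kernel(adj1, labels1, adj2, labels2, K=3):
        n1, n2 = len(labels1), len(labels2)
        # M[i, j] = 1 if node i of graph 1 and node j of graph 2 share a label.
        M = np.array([[float(labels1[i] == labels2[j]) for j in range(n2)]
                      for i in range(n1)])
        value, current = M.sum(), M.copy()
        for _ in range(K):
            # Extend every pair of matched walks by one label-matching edge step.
            current = (adj1 @ current @ adj2.T) * M
            value += current.sum()
        return value

    # Two tiny labeled graphs: a triangle and a path with the same node labels.
    adj_a = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
    adj_b = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
    print(walk_kernel(adj_a, ["r", "g", "b"], adj_b, ["r", "g", "b"]))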

    Single View Modeling and View Synthesis

    This thesis develops new algorithms to produce 3D content from a single camera. Today, amateurs can use hand-held camcorders to capture and display the 3D world in 2D, using mature technologies. However, there is always a strong desire to record and re-explore the 3D world in 3D. To achieve this goal, current approaches usually make use of a camera array, which suffers from tedious setup and calibration processes as well as a lack of portability, limiting its application to lab experiments. In this thesis, I try to produce 3D content using a single camera, making the process as simple as shooting pictures. This requires a new front-end capture device rather than a regular camcorder, as well as more sophisticated algorithms. First, in order to capture highly detailed object surfaces, I designed and developed a depth camera based on a novel technique called light fall-off stereo (LFS). The LFS depth camera outputs color+depth image sequences at 30 fps, which is necessary for capturing dynamic scenes. Based on the output color+depth images, I developed a new approach that builds 3D models of dynamic and deformable objects. While the camera can only capture part of a whole object at any instant, partial surfaces are assembled into a complete 3D model by a novel warping algorithm. Inspired by the success of single-view 3D modeling, I extended my exploration into 2D-to-3D video conversion that does not rely on a depth camera. I developed a semi-automatic system that converts monocular videos into stereoscopic videos via view synthesis. It combines motion analysis with user interaction, aiming to transfer as much depth-inference work as possible from the user to the computer. I developed two new methods that analyze the optical flow in order to provide additional qualitative depth constraints. The automatically extracted depth information is presented in the user interface to assist with the user's labeling work. In summary, this thesis develops new algorithms to produce 3D content from a single camera: when depth maps are provided, they build high-fidelity 3D models of dynamic and deformable objects; otherwise, they turn video clips into stereoscopic video.
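
    Light fall-off stereo rests on the inverse-square decay of illumination from a point light source. The sketch below recovers depth per pixel from the intensity ratio of two images, assuming (for illustration only) a Lambertian scene and a light source translated away from it by a known distance d along the viewing direction; the thesis's actual capture hardware and calibration are more involved.

    # Inverse-square fall-off principle: with the light at distances r and r + d,
    # the per-pixel intensity ratio is ((r + d) / r)^2, which can be solved for r.
    import numpy as np

    def depth_from_falloff(img_near, img_far, d, eps=1e-6):
        ratio = img_near / np.maximum(img_far, eps)   # = ((r + d) / r)^2
        root = np.sqrt(np.maximum(ratio, 1.0 + eps))
        return d / (root - 1.0)                       # r = d / (sqrt(ratio) - 1)

    # Synthetic check: render a depth ramp with 1 / r^2 shading and invert it.
    true_depth = np.linspace(1.0, 3.0, 5)[None, :] * np.ones((4, 1))
    albedo, d = 0.8, 0.5
    img_near = albedo / true_depth ** 2
    img_far = albedo / (true_depth + d) ** 2
    print(np.allclose(depth_from_falloff(img_near, img_far, d), true_depth))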

    Challenges in 3D scanning: Focusing on Ears and Multiple View Stereopsis
