
    ShapeCodes: Self-Supervised Feature Learning by Lifting Views to Viewgrids

    We introduce an unsupervised feature learning approach that embeds 3D shape information into a single-view image representation. The main idea is a self-supervised training objective that, given only a single 2D image, requires all unseen views of the object to be predictable from learned features. We implement this idea as an encoder-decoder convolutional neural network. The network maps an input image of an unknown category and unknown viewpoint to a latent space, from which a deconvolutional decoder can best "lift" the image to its complete viewgrid showing the object from all viewing angles. Our class-agnostic training procedure encourages the representation to capture fundamental shape primitives and semantic regularities in a data-driven manner, without manual semantic labels. Our results on two widely used shape datasets show that 1) our approach successfully learns to perform "mental rotation" even for objects unseen during training, and 2) the learned latent space is a powerful representation for object recognition, outperforming several existing unsupervised feature learning methods. Comment: To appear at ECCV 201

    Towards Incremental Parsing of Natural Language using Recursive Neural Networks

    In this paper we develop novel algorithmic ideas for building a natural language parser grounded upon the hypothesis of incrementality. Although widely accepted and experimentally supported from a cognitive perspective as a model of the human parser, the incrementality assumption has never been exploited for building automatic parsers of unconstrained real texts. The essentials of the hypothesis are that words are processed in a left-to-right fashion and that the syntactic structure is kept totally connected at each step. Our proposal relies on a machine learning technique for predicting the correctness of partial syntactic structures that are built during the parsing process. A recursive neural network architecture is employed for computing predictions after a training phase on examples drawn from a corpus of parsed sentences, the Penn Treebank. Our results indicate the viability of the approach and lay out the premises for a novel generation of algorithms for natural language processing which more closely model human parsing. These algorithms may prove very useful in the development of efficient parsers.

    Predicting Linguistic Structure with Incomplete and Cross-Lingual Supervision

    Contemporary approaches to natural language processing are predominantly based on statistical machine learning from large amounts of text, which has been manually annotated with the linguistic structure of interest. However, such complete supervision is currently only available for the world's major languages, in a limited number of domains and for a limited range of tasks. As an alternative, this dissertation considers methods for linguistic structure prediction that can make use of incomplete and cross-lingual supervision, with the prospect of making linguistic processing tools more widely available at a lower cost. An overarching theme of this work is the use of structured discriminative latent variable models for learning with indirect and ambiguous supervision; as instantiated, these models admit rich model features while retaining efficient learning and inference properties. The first contribution to this end is a latent-variable model for fine-grained sentiment analysis with coarse-grained indirect supervision. The second is a model for cross-lingual word-cluster induction and the application thereof to cross-lingual model transfer. The third is a method for adapting multi-source discriminative cross-lingual transfer models to target languages, by means of typologically informed selective parameter sharing. The fourth is an ambiguity-aware self- and ensemble-training algorithm, which is applied to target language adaptation and relexicalization of delexicalized cross-lingual transfer parsers. The fifth is a set of sequence-labeling models that combine constraints at the level of tokens and types, and an instantiation of these models for part-of-speech tagging with incomplete cross-lingual and crowdsourced supervision. In addition to these contributions, comprehensive overviews are provided of structured prediction with no or incomplete supervision, as well as of learning in the multilingual and cross-lingual settings. 
    Through careful empirical evaluation, it is established that the proposed methods can be used to create substantially more accurate tools for linguistic processing, compared both to unsupervised methods and to recently proposed cross-lingual methods. The empirical support for this claim is particularly strong in the latter case; our models for syntactic dependency parsing and part-of-speech tagging achieve the hitherto best published results for a large number of target languages, in the setting where no annotated training data is available in the target language.

    Long-Term Localization for Self-Driving Cars

    Long-term localization is hard due to changing conditions, while relative localization within time sequences is much easier. To achieve long-term localization in a sequential setting, such as for self-driving cars, relative localization should be used to the fullest extent whenever possible.

    This thesis presents solutions and insights, both for long-term sequential visual localization and for localization using global navigational satellite systems (GNSS), that push us closer to the goal of accurate and reliable localization for self-driving cars. It addresses the question: How to achieve accurate and robust, yet cost-effective long-term localization for self-driving cars?

    Starting from this question, the thesis explores how existing sensor suites for advanced driver-assistance systems (ADAS) can be used most efficiently, and how landmarks in maps can be recognized and used for localization even after severe changes in appearance. The findings show that:

    * State-of-the-art ADAS sensors are insufficient to meet the requirements for localization of a self-driving car in less than ideal conditions. GNSS and visual localization are identified as areas to improve.
    * Highly accurate relative localization with no convergence delay is possible by using time-relative GNSS observations with a single-band receiver and no base stations.
    * Sequential semantic localization is identified as a promising focus point for further research, based on a benchmark study comparing state-of-the-art visual localization methods in challenging autonomous driving scenarios including day-to-night and seasonal changes.
    * A novel sequential semantic localization algorithm improves accuracy while significantly reducing map size compared to traditional methods based on matching of local image features.
    * Improvements for semantic segmentation in challenging conditions can be made efficiently by automatically generating pixel correspondences between images from a multitude of conditions and enforcing a consistency constraint during training.
    * A segmentation algorithm with automatically defined and more fine-grained classes improves localization performance.
    * The performance advantage seen in single-image localization for modern local image features, when compared to traditional ones, is all but erased when considering sequential data with odometry, which encourages focusing future research on sequential localization rather than pure single-image localization.

    Iterative annotation to ease neural network training: Specialized machine learning in medical image analysis

    Neural networks promise to bring robust, quantitative analysis to medical fields, but adoption is limited by the technicalities of training these networks. To address this translation gap between medical researchers and neural networks in the field of pathology, we have created an intuitive interface which utilizes the commonly used whole slide image (WSI) viewer, Aperio ImageScope (Leica Biosystems Imaging, Inc.), for the annotation and display of neural network predictions on WSIs. Leveraging this, we propose the use of a human-in-the-loop strategy to reduce the burden of WSI annotation. We track network performance improvements as a function of iteration and quantify the use of this pipeline for the segmentation of renal histologic findings on WSIs. More specifically, we present network performance when applied to segmentation of renal micro compartments, and demonstrate multi-class segmentation in human and mouse renal tissue slides. Finally, to show the adaptability of this technique to other medical imaging fields, we demonstrate its ability to iteratively segment human prostate glands from radiology imaging data. Comment: 15 pages, 7 figures, 2 supplemental figures (on the last page

    Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement and Retrieval

    Where previous reviews on content-based image retrieval emphasize what can be seen in an image to bridge the semantic gap, this survey considers what people tag about an image. A comprehensive treatise of three closely linked problems, i.e., image tag assignment, refinement, and tag-based image retrieval, is presented. While existing works vary in terms of their targeted tasks and methodology, they rely on the key functionality of tag relevance, i.e., estimating the relevance of a specific tag with respect to the visual content of a given image and its social context. By analyzing what information a specific method exploits to construct its tag relevance function and how such information is exploited, this paper introduces a taxonomy to structure the growing literature, understand the ingredients of the main works, clarify their connections and differences, and recognize their merits and limitations. For a head-to-head comparison of state-of-the-art methods, a new experimental protocol is presented, with training sets containing 10k, 100k and 1m images and an evaluation on three test sets contributed by various research groups. Eleven representative works are implemented and evaluated. Putting all this together, the survey aims to provide an overview of the past and foster progress for the near future. Comment: to appear in ACM Computing Survey

    Learning Efficient Disambiguation

    This dissertation analyses the computational properties of current performance-models of natural language parsing, in particular Data Oriented Parsing (DOP), points out some of their major shortcomings and suggests suitable solutions. It provides proofs that various problems of probabilistic disambiguation are NP-Complete under instances of these performance-models, and it argues that none of these models accounts for attractive efficiency properties of human language processing in limited domains, e.g. that frequent inputs are usually processed faster than infrequent ones. The central hypothesis of this dissertation is that these shortcomings can be eliminated by specializing the performance-models to the limited domains. The dissertation addresses "grammar and model specialization" and presents a new framework, the Ambiguity-Reduction Specialization (ARS) framework, that formulates the necessary and sufficient conditions for successful specialization. The framework is instantiated into specialization algorithms and applied to specializing DOP. Novelties of these learning algorithms are that 1) they limit the hypotheses-space to include only "safe" models, 2) they are expressed as constrained optimization formulae that minimize the entropy of the training tree-bank given the specialized grammar, under the constraint that the size of the specialized model does not exceed a predefined maximum, and 3) they enable integrating the specialized model with the original one in a complementary manner. The dissertation provides experiments with initial implementations and compares the resulting Specialized DOP (SDOP) models to the original DOP models with encouraging results. Comment: 222 page