11 research outputs found

    Collaborative Feature Learning from Social Media

    Image feature representation plays an essential role in image recognition and related tasks. The current state-of-the-art feature learning paradigm is supervised learning from labeled data. However, this paradigm requires large-scale category labels, which limits its applicability to domains where labels are hard to obtain. In this paper, we propose a new data-driven feature learning paradigm that does not rely on category labels. Instead, we learn from user behavior data collected on social media. Concretely, we use the image relationships discovered in the latent space of the user behavior data to guide image feature learning. We collect a large-scale image and user behavior dataset from Behance.net. The dataset consists of 1.9 million images and over 300 million view records from 1.9 million users. We validate our feature learning paradigm on this dataset and find that the learned feature significantly outperforms state-of-the-art image features in learning better image similarities. We also show that the learned feature performs competitively on various recognition benchmarks.

    Exemplar codes for facial attributes and tattoo recognition

    When implementing real-world computer vision systems, researchers can use mid-level representations as a tool to adjust the trade-off between accuracy and efficiency. Unfortunately, existing mid-level representations that improve accuracy tend to decrease efficiency, or are specifically tailored to work well within one pipeline or vision problem at the exclusion of others. We introduce a novel, efficient mid-level representation that improves classification efficiency without sacrificing accuracy. Our Exemplar Codes are based on linear classifiers and probability normalization from extreme value theory. We apply Exemplar Codes to two problems: facial attribute extraction and tattoo classification. In these settings, our Exemplar Codes are competitive with the state of the art and offer efficiency benefits, making it possible to achieve high accuracy even on commodity hardware with a low computational budget.

    DaMN – Discriminative and Mutually Nearest: Exploiting Pairwise Category Proximity for Video Action Recognition

    We propose a method for learning discriminative category-level features and demonstrate state-of-the-art results on large-scale action recognition in video. The key observation is that one-vs-rest classifiers, which are ubiquitously employed for this task, face challenges in separating very similar categories (such as running vs. jogging). Our proposed method automatically identifies such pairs of categories using a criterion of mutual pairwise proximity in the (kernelized) feature space, using a category-level similarity matrix where each entry corresponds to the one-vs-one SVM margin for pairs of categories. We then exploit the observation that while splitting such Siamese Twin categories may be difficult, separating them from the remaining categories in a two-vs-rest framework is not. This enables us to augment one-vs-rest classifiers with a judicious selection of two-vs-rest classifier outputs, formed from such discriminative and mutually nearest (DaMN) pairs. By combining one-vs-rest and two-vs-rest features in a principled probabilistic manner, we achieve state-of-the-art results on the UCF101 and HMDB51 datasets. More importantly, the same DaMN features, when treated as a mid-level representation, also outperform existing methods in knowledge transfer experiments, both cross-dataset from UCF101 to HMDB51 and to new categories with limited training data (one-shot and few-shot learning). Finally, we study the generality of the proposed approach by applying DaMN to other classification tasks; our experiments show that DaMN outperforms related approaches in direct comparisons, not only on video action recognition but also on their original image dataset tasks. © 2014 Springer International Publishing
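
    The pair-selection step described in this abstract can be illustrated with a minimal sketch. It assumes a symmetric category-level proximity matrix in which larger values mean "more similar", and it keeps only pairs in which each category is the other's single nearest neighbor. The function name, the toy matrix, and the restriction to one neighbor are illustrative assumptions, not the paper's exact criterion, which derives its similarity entries from one-vs-one SVM margins.

```python
def mutually_nearest_pairs(proximity):
    """Return (i, j) pairs, i < j, where i and j are each other's nearest category.

    `proximity` is a square, symmetric list of lists; larger means closer.
    This is an assumed convention chosen for the sketch.
    """
    n = len(proximity)
    nearest = []
    for i in range(n):
        # Nearest neighbor of category i, ignoring the diagonal entry.
        candidates = [j for j in range(n) if j != i]
        nearest.append(max(candidates, key=lambda j: proximity[i][j]))
    # Keep a pair only when the nearest-neighbor relation holds in both directions.
    return [(i, j) for i, j in enumerate(nearest) if i < j and nearest[j] == i]


# Hypothetical toy data: 0 = running, 1 = jogging, 2 = swimming, 3 = diving.
proximity = [
    [0.0, 0.9, 0.2, 0.1],
    [0.9, 0.0, 0.3, 0.2],
    [0.2, 0.3, 0.0, 0.7],
    [0.1, 0.2, 0.7, 0.0],
]
pairs = mutually_nearest_pairs(proximity)
print(pairs)
```

    Each selected pair would then define one two-vs-rest classifier, with both categories of the pair labeled positive and all remaining categories labeled negative.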

    Active Object Recognition with a Space-Variant Retina


    Convolutional neural networks for style classification

    In collaboration with these universities: Universitat de Barcelona, Universitat Rovira i Virgili. In recent years convolutional neural networks have enjoyed great success. Especially in the field of object recognition, great leaps forward have been made. Researchers have been able to exploit the object detection features from such networks for many useful and interesting applications, such as sentiment analysis and information retrieval. Unfortunately, the importance of style is often not adequately considered in these systems. This is partly because style is a concept that is difficult to define and labeled data is scarce. Recent developments in texture synthesis and style transfer, however, have sparked new interest in the field. In particular, feature correlations from convolutional neural networks trained on object recognition have been shown to work well on these tasks. I propose that such techniques can help in classifying style. In the course of this thesis I set up an experiment to show that this is indeed the case. Furthermore, I show that the performance of the CNN and the depth of the layer from which the feature correlations are taken influence the classification performance.
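
    One common form of the "feature correlations" mentioned in this abstract is the Gram matrix of a layer's feature maps, as used in the texture synthesis and style transfer literature. The sketch below assumes that form; whether the thesis uses exactly this definition or this normalization is not stated in the abstract. The function name and the toy activations are hypothetical.

```python
def gram_matrix(feature_maps):
    """Channel-by-channel correlations of one layer's activations.

    `feature_maps` is a list of C channels, each a flat list of N activations
    (one per spatial position). Dividing by N is an assumed normalization.
    """
    n_positions = len(feature_maps[0])
    return [
        [sum(a * b for a, b in zip(ci, cj)) / n_positions for cj in feature_maps]
        for ci in feature_maps
    ]


# Hypothetical activations: 2 channels over 4 spatial positions.
features = [
    [1.0, 0.0, 2.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
]
gram = gram_matrix(features)
print(gram)
```

    The resulting C x C matrix discards spatial layout, so it can be flattened and passed to any standard classifier as a style descriptor.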

    Methods for efficient object categorization, detection, scene recognition, and image search

    In the past few years there has been tremendous growth in the usage of digital images. Users can now access millions of photos, which creates a need for methods that can efficiently and effectively search for the visual information of interest. In this thesis, we propose methods to learn image representations that compactly represent a large collection of images, enabling accurate image recognition with linear classification models, which offer the advantage of being efficient to both train and test. The entries of our descriptors are the outputs of a set of basis classifiers evaluated on the image, which capture the presence or absence of a set of high-level visual concepts. We propose two different techniques to automatically discover the visual concepts and learn the basis classifiers from a given labeled dataset of pictures, producing descriptors that are highly discriminative of the original categories of the dataset. We empirically show that these descriptors are able to encode new unseen pictures and produce state-of-the-art results in conjunction with cheap linear classifiers. We describe several strategies to aggregate the outputs of basis classifiers evaluated on multiple subwindows of the image in order to handle cases when the photo contains multiple objects and large amounts of clutter. We extend this framework to the task of object detection, where the goal is to spatially localize an object within an image. We use the output of a collection of detectors trained in an offline stage as features for new detection problems, showing results competitive with the current state of the art. Since generating rich manual annotations for an image dataset is a key limitation of modern methods in object localization and detection, in this thesis we also propose a method to automatically generate training data for an object detector in a weakly-supervised fashion, yielding considerable savings in human annotation effort. We show that our automatically-generated regions can be used to train object detectors with recognition results remarkably close to those obtained by training on manually annotated bounding boxes.
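
    The descriptor construction described in this abstract can be sketched as follows. The sketch assumes linear basis classifiers given as (weights, bias) pairs and uses element-wise max pooling over subwindows as one possible aggregation strategy; the abstract does not say which classifier form or which strategies the thesis actually uses. All names and numbers are hypothetical.

```python
def classifier_descriptor(features, basis_classifiers):
    """One descriptor entry per basis classifier: its score w . x + b."""
    return [
        sum(w_i * x_i for w_i, x_i in zip(weights, features)) + bias
        for weights, bias in basis_classifiers
    ]


def max_pool(descriptors):
    """Aggregate subwindow descriptors by taking the per-entry maximum."""
    return [max(column) for column in zip(*descriptors)]


# Two hypothetical linear basis classifiers over 2-D low-level features.
basis = [([1.0, 0.0], 0.0), ([0.0, 1.0], -1.0)]
# Hypothetical low-level features of two subwindows of the same image.
subwindows = [[2.0, 0.0], [0.0, 3.0]]

per_window = [classifier_descriptor(x, basis) for x in subwindows]
image_descriptor = max_pool(per_window)
print(image_descriptor)
```

    Max pooling keeps the strongest response to each visual concept anywhere in the image, which is one way to cope with photos that contain several objects.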

    Large-Margin Learning of Compact Binary Image Encodings
