20 research outputs found

    Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers

    Scene parsing, or semantic segmentation, consists in labeling each pixel in an image with the category of the object it belongs to. It is a challenging task that involves the simultaneous detection, segmentation and recognition of all the objects in the image. The scene parsing method proposed here starts by computing a tree of segments from a graph of pixel dissimilarities. Simultaneously, a set of dense feature vectors is computed which encodes regions of multiple sizes centered on each pixel. The feature extractor is a multiscale convolutional network trained from raw pixels. The feature vectors associated with the segments covered by each node in the tree are aggregated and fed to a classifier which produces an estimate of the distribution of object categories contained in the segment. A subset of tree nodes that cover the image is then selected so as to maximize the average "purity" of the class distributions, hence maximizing the overall likelihood that each segment will contain a single object. The convolutional network feature extractor is trained end-to-end from raw pixels, alleviating the need for engineered features. After training, the system is parameter free. The system yields record accuracies on the Stanford Background Dataset (8 classes), the SIFT Flow Dataset (33 classes) and the Barcelona Dataset (170 classes) while being an order of magnitude faster than competing approaches, producing a 320×240 image labeling in less than 1 second.
    Comment: 9 pages, 4 figures. Published in the 29th International Conference on Machine Learning (ICML 2012), June 2012, Edinburgh, United Kingdom.
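
    The "optimal cover" step described above can be sketched as a simple dynamic program over the segment tree: for each node, either keep the node or recurse into its children, whichever gives the higher score. In the sketch below the purity of a segment is taken to be the probability mass of its dominant class and covers are scored by area-weighted purity; the Node layout, purity definition and weighting are assumptions for illustration, not the paper's exact formulation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple
import numpy as np

@dataclass
class Node:
    dist: np.ndarray                     # predicted class distribution for this segment
    area: int                            # number of pixels covered by the segment
    children: List["Node"] = field(default_factory=list)

def purity(dist: np.ndarray) -> float:
    """Purity proxy: probability mass of the dominant class (an assumption)."""
    return float(dist.max())

def best_cover(node: Node) -> Tuple[float, List[Node]]:
    """Best cover of this subtree: keep the node itself, or combine the best
    covers of its children, whichever has higher area-weighted purity."""
    keep_score = purity(node.dist) * node.area
    if not node.children:
        return keep_score, [node]
    split_score, split_nodes = 0.0, []
    for child in node.children:
        s, ns = best_cover(child)
        split_score += s
        split_nodes.extend(ns)
    if keep_score >= split_score:
        return keep_score, [node]
    return split_score, split_nodes
```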

    Teaching Compositionality to CNNs

    Convolutional neural networks (CNNs) have shown great success in computer vision, approaching human-level performance when trained for specific tasks via application-specific loss functions. In this paper, we propose a method for augmenting and training CNNs so that their learned features are compositional. It encourages networks to form representations that disentangle objects from their surroundings and from each other, thereby promoting better generalization. Our method is agnostic to the specific details of the underlying CNN to which it is applied and can in principle be used with any CNN. As we show in our experiments, the learned representations lead to feature activations that are more localized and improve performance over non-compositional baselines in object recognition tasks.
    Comment: Preprint appearing in CVPR 2017.
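
    The abstract does not spell out the training objective, so the PyTorch fragment below is only one plausible reading of a compositionality regularizer: features computed on an object in isolation are encouraged to agree with the spatially masked features computed on the full scene. The loss form, the masking scheme and the weight `lam` are assumptions for illustration, not the paper's method.

```python
import torch
import torch.nn.functional as F

def compositional_loss(backbone, scene, object_only, mask, lam=1.0):
    """backbone: any CNN returning a feature map of shape (B, C, H, W).
    scene: full image; object_only: the same image with background removed;
    mask: (B, 1, h, w) binary object mask in image coordinates."""
    feats_scene = backbone(scene)           # features of the full scene
    feats_object = backbone(object_only)    # features of the isolated object
    # resample the mask to the feature-map resolution
    m = F.interpolate(mask, size=feats_scene.shape[-2:], mode="nearest")
    # penalize disagreement inside the object region only; this term would be
    # added to the usual task loss during training
    return lam * ((feats_scene - feats_object) * m).pow(2).mean()
```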

    Data-driven Crowd Analysis in Videos

    In this work we present a new crowd analysis algorithm powered by behavior priors that are learned on a large database of crowd videos gathered from the Internet. The algorithm works by first learning a set of crowd behavior priors off-line. During testing, crowd patches are matched to the database and behavior priors are transferred. We adhere to the insight that, although the entire space of possible crowd behaviors is infinite, the space of distinguishable crowd motion patterns may not be all that large. For many individuals in a crowd, we are able to find analogous crowd patches in our database which contain similar patterns of behavior that can effectively act as priors to constrain the difficult task of tracking an individual in a crowd. Our algorithm is data-driven and, unlike some crowd characterization methods, does not require us to have seen the test video beforehand. It performs on par with state-of-the-art methods for tracking people exhibiting common crowd behaviors and outperforms them when the tracked individual behaves in an unusual way.
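
    A minimal sketch of the data-driven transfer described above: a database of crowd patches, each paired with a learned motion prior, is queried with a test patch and the nearest neighbor's prior is transferred. The appearance descriptor (a normalized, flattened patch) and the prior representation (e.g., a per-pixel mean flow field) are simplifying assumptions, not the paper's exact features.

```python
import numpy as np

class CrowdPriorDatabase:
    def __init__(self):
        self.descriptors = []   # appearance descriptors of database patches
        self.priors = []        # motion priors (e.g., mean flow fields)

    @staticmethod
    def describe(patch: np.ndarray) -> np.ndarray:
        """Toy appearance descriptor: the L2-normalized, flattened patch."""
        v = patch.astype(np.float64).ravel()
        return v / (np.linalg.norm(v) + 1e-8)

    def add(self, patch: np.ndarray, prior: np.ndarray) -> None:
        self.descriptors.append(self.describe(patch))
        self.priors.append(prior)

    def transfer_prior(self, query_patch: np.ndarray) -> np.ndarray:
        """Return the motion prior of the most similar database patch."""
        q = self.describe(query_patch)
        D = np.stack(self.descriptors)
        idx = int(np.argmax(D @ q))     # cosine similarity on unit vectors
        return self.priors[idx]
```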

    Building a database of 3D scenes from user annotations


    A Tree-Based Context Model for Object Recognition

    There has been a growing interest in exploiting contextual information in addition to local features to detect and localize multiple object categories in an image. A context model can rule out some unlikely combinations or locations of objects and guide detectors to produce a semantically coherent interpretation of a scene. However, the performance benefit of context models has been limited because most of the previous methods were tested on datasets with only a few object categories, in which most images contain one or two object categories. In this paper, we introduce a new dataset with images that contain many instances of different object categories, and propose an efficient model that captures the contextual information among more than a hundred object categories using a tree structure. Our model incorporates global image features, dependencies between object categories, and outputs of local detectors into one probabilistic framework. We demonstrate that our context model improves object recognition performance and provides a coherent interpretation of a scene, which enables a reliable image querying system by multiple object categories. In addition, our model can be applied to scene understanding tasks that local detectors alone cannot solve, such as detecting objects out of context or querying for the most typical and the least typical scenes in a dataset.
    This research was partially funded by Shell International Exploration and Production Inc., by the Army Research Office under award W911NF-06-1-0076, by an NSF Career Award (ISI 0747120), and by the Air Force Office of Scientific Research under Award No. FA9550-06-1-0324. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the Air Force.
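
    One standard way to obtain a tree structure over object categories, in the spirit of the abstract, is the Chow-Liu construction: estimate pairwise mutual information between binary category-presence variables and keep a maximum-weight spanning tree. The sketch below shows only this tree-learning idea; whether the paper uses exactly this estimator, and how the tree is combined with global features and detector outputs, is not reproduced here.

```python
import numpy as np

def mutual_info(presence: np.ndarray, i: int, j: int, eps: float = 1e-9) -> float:
    """Mutual information between the binary presence of categories i and j.
    presence: (n_images, n_categories) 0/1 matrix."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((presence[:, i] == a) & (presence[:, j] == b)) + eps
            p_a = np.mean(presence[:, i] == a) + eps
            p_b = np.mean(presence[:, j] == b) + eps
            mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def dependency_tree(presence: np.ndarray):
    """Edges of a maximum-weight spanning tree over category co-occurrences
    (the Chow-Liu construction), built with a simple Prim's algorithm."""
    n = presence.shape[1]
    w = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            w[i, j] = w[j, i] = mutual_info(presence, i, j)
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        i, j = max(((a, b) for a in in_tree for b in range(n) if b not in in_tree),
                   key=lambda e: w[e])
        edges.append((i, j))
        in_tree.add(j)
    return edges
```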

    SIFT Flow: Dense Correspondence across Scenes and its Applications

    While image alignment has been studied in different areas of computer vision for decades, aligning images depicting different scenes remains a challenging problem. Analogous to optical flow, where an image is aligned to its temporally adjacent frame, we propose SIFT flow, a method to align an image to its nearest neighbors in a large image corpus containing a variety of scenes. The SIFT flow algorithm consists of matching densely sampled, pixel-wise SIFT features between two images, while preserving spatial discontinuities. The SIFT features allow robust matching across different scene/object appearances, whereas the discontinuity-preserving spatial model allows matching of objects located at different parts of the scene. Experiments show that the proposed approach robustly aligns complex scene pairs containing significant spatial differences. Based on SIFT flow, we propose an alignment-based large database framework for image analysis and synthesis, where image information is transferred from the nearest neighbors to a query image according to the dense scene correspondence. This framework is demonstrated through concrete applications, such as motion field prediction from a single image, motion synthesis via object transfer, satellite image registration and face recognition.
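
    A minimal sketch of the kind of energy behind the matching described above: an L1 data term comparing dense SIFT descriptors under a displacement field, a small-displacement term, and a truncated smoothness term that preserves spatial discontinuities. Only the energy evaluation is shown; the optimization itself and the exact constants (`eta`, `alpha`, `d`) are not reproduced from the paper and should be treated as placeholders.

```python
import numpy as np

def sift_flow_energy(s1, s2, flow, eta=0.005, alpha=2.0, d=40.0):
    """s1, s2: (H, W, 128) dense SIFT descriptor maps for the two images;
    flow: (H, W, 2) integer displacement field (dy, dx) warping s1 onto s2."""
    H, W, _ = s1.shape
    ys, xs = np.mgrid[0:H, 0:W]
    yt = np.clip(ys + flow[..., 0], 0, H - 1)
    xt = np.clip(xs + flow[..., 1], 0, W - 1)
    data = np.abs(s1 - s2[yt, xt]).sum()          # descriptor matching cost
    small = eta * np.abs(flow).sum()              # prefer small displacements
    # truncated L1 smoothness on neighboring flow vectors (keeps discontinuities)
    smooth = (np.minimum(alpha * np.abs(np.diff(flow, axis=0)), d).sum()
              + np.minimum(alpha * np.abs(np.diff(flow, axis=1)), d).sum())
    return data + small + smooth
```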

    Image Classification Using SVM Classifiers Trained via the AdaBoost Method

    Master's thesis, Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2015. Advisor: 유석인.
    This thesis presents an algorithm that categorizes images by the objects they contain. The images are encoded with the bag-of-features (BoF) model, which represents an image as a collection of unordered features extracted from local patches. To handle multiple object categories, the one-versus-all method is applied to implement the multi-class classifier: one object classifier is built for each object category, and each classifier decides whether an image belongs to that category or not. Each object classifier is trained with the AdaBoost method and is given by a weighted sum of 200 support vector machine (SVM) component classifiers. Among the object classifiers, the one with the highest output function value finally determines the category of an image. The classification performance of the presented algorithm is demonstrated on images from the Caltech-101 dataset.
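
    The pipeline in this abstract maps naturally onto scikit-learn, sketched below: local descriptors are quantized against a k-means codebook into bag-of-features histograms, and a one-versus-all ensemble of AdaBoost-weighted SVM component classifiers is trained on top. The codebook size is an arbitrary choice, descriptor extraction (e.g., dense SIFT) is left out, and keyword names may differ across scikit-learn versions (`estimator` vs. `base_estimator`); this is a rough illustration, not the thesis implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier
from sklearn.multiclass import OneVsRestClassifier

def build_codebook(descriptors: np.ndarray, k: int = 256) -> KMeans:
    """descriptors: (n_desc, 128) local features pooled over training images."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(descriptors)

def bof_histogram(codebook: KMeans, image_descriptors: np.ndarray) -> np.ndarray:
    """Quantize one image's descriptors and return a normalized histogram."""
    words = codebook.predict(image_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / (hist.sum() + 1e-8)

def build_classifier(n_components: int = 200) -> OneVsRestClassifier:
    """One-versus-all classifier: per category, an AdaBoost-weighted sum of
    SVM component classifiers, as described in the abstract."""
    base = SVC(kernel="linear")                 # SVM component classifier
    boosted = AdaBoostClassifier(estimator=base,
                                 n_estimators=n_components,
                                 algorithm="SAMME")
    return OneVsRestClassifier(boosted)         # one boosted classifier per category
```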