
    Lucid Data Dreaming for Video Object Segmentation

    Convolutional networks reach top quality in pixel-level video object segmentation but require a large amount of training data (1k~100k) to deliver such results. We propose a new training strategy which achieves state-of-the-art results across three evaluation datasets while using 20x~1000x less annotated data than competing methods. Our approach is suitable for both single and multiple object segmentation. Instead of using large training sets and hoping to generalize across domains, we generate in-domain training data using the provided annotation on the first frame of each video to synthesize ("lucid dream") plausible future video frames. In-domain per-video training data allows us to train high-quality appearance- and motion-based models, as well as tune the post-processing stage. This approach reaches competitive results even when training from only a single annotated frame, without ImageNet pre-training. Our results indicate that using a larger training set is not automatically better, and that for the video object segmentation task a smaller training set that is closer to the target domain is more effective. This changes the mindset regarding how many training samples and how much general "objectness" knowledge are required for the video object segmentation task. Comment: Accepted in International Journal of Computer Vision (IJCV).

    Going Deeper into Action Recognition: A Survey

    Understanding human actions in visual data is tied to advances in complementary research areas including object recognition, human dynamics, domain adaptation and semantic segmentation. Over the last decade, human action analysis evolved from earlier schemes that are often limited to controlled environments to today's advanced solutions that can learn from millions of videos and apply to almost all daily activities. Given the broad range of applications from video surveillance to human-computer interaction, scientific milestones in action recognition are achieved more rapidly, eventually leading to the demise of what used to be good in a short time. This motivated us to provide a comprehensive review of the notable steps taken towards recognizing human actions. To this end, we start our discussion with the pioneering methods that use handcrafted representations, and then navigate into the realm of deep learning based approaches. We aim to remain objective throughout this survey, touching upon encouraging improvements as well as inevitable fallbacks, in the hope of raising fresh questions and motivating new research directions for the reader.

    High Dimensional Low Rank plus Sparse Matrix Decomposition

    This paper is concerned with the problem of low rank plus sparse matrix decomposition for big data. Conventional algorithms for matrix decomposition use the entire data to extract the low-rank and sparse components, and are based on optimization problems with complexity that scales with the dimension of the data, which limits their scalability. Furthermore, existing randomized approaches mostly rely on uniform random sampling, which is quite inefficient for many real-world data matrices that exhibit additional structures (e.g. clustering). In this paper, a scalable subspace-pursuit approach that transforms the decomposition problem to a subspace learning problem is proposed. The decomposition is carried out using a small data sketch formed from sampled columns/rows. Even when the data is sampled uniformly at random, it is shown that the sufficient number of sampled columns/rows is roughly O(r\mu), where \mu is the coherency parameter and r the rank of the low rank component. In addition, adaptive sampling algorithms are proposed to address the problem of column/row sampling from structured data. We provide an analysis of the proposed method with adaptive sampling and show that adaptive sampling makes the required number of sampled columns/rows invariant to the distribution of the data. The proposed approach is amenable to online implementation and an online scheme is proposed. Comment: IEEE Transactions on Signal Processing
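The idea of learning a subspace from a small sketch of sampled columns can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a purely low-rank matrix with no sparse component, uses uniform sampling only, and takes the number of sampled columns as a small multiple of the rank rather than the O(r\mu) bound. The function name `column_subspace_from_sample` and all sizes are hypothetical choices made for illustration.

```python
import numpy as np


def column_subspace_from_sample(L, num_cols, rank, rng):
    """Estimate an orthonormal basis for the column space of L
    from a uniformly sampled subset of its columns (hypothetical helper)."""
    n_cols = L.shape[1]
    # Uniform sampling without replacement forms the small data sketch.
    idx = rng.choice(n_cols, size=num_cols, replace=False)
    sketch = L[:, idx]
    # The leading left singular vectors of the sketch span its column space.
    U, _, _ = np.linalg.svd(sketch, full_matrices=False)
    return U[:, :rank]


rng = np.random.default_rng(0)
n1, n2, r = 200, 300, 5

# Synthetic rank-r matrix (assumption: no sparse corruption is added here).
L = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))

# Sample far fewer columns than n2; 4 * r is an arbitrary illustrative choice.
U = column_subspace_from_sample(L, num_cols=4 * r, rank=r, rng=rng)

# Project the full matrix onto the learned subspace and measure the residual.
rel_err = np.linalg.norm(L - U @ (U.T @ L)) / np.linalg.norm(L)
print(f"sampled {4 * r} of {n2} columns, relative residual = {rel_err:.2e}")
```

In this idealized setting the sampled columns span the whole column space, so projecting the full matrix onto the learned basis leaves a negligible residual. With a sparse component present, the sketch itself would first have to be decomposed, which this example does not attempt.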

    Temporal Model Adaptation for Person Re-Identification

    Person re-identification is an open and challenging problem in computer vision. The majority of efforts have been spent either on designing the best feature representation or on learning the optimal matching metric. Most approaches have neglected the problem of adapting the selected features or the learned model over time. To address this problem, we propose a temporal model adaptation scheme with a human in the loop. We first introduce a similarity-dissimilarity learning method which can be trained in an incremental fashion by means of a stochastic alternating direction method of multipliers optimization procedure. Then, to achieve temporal adaptation with limited human effort, we exploit a graph-based approach to present the user only the most informative probe-gallery matches that should be used to update the model. Results on three datasets show that our approach performs on par with or even better than state-of-the-art approaches while reducing the manual pairwise labeling effort by about 80%.

    Statistical/Geometric Techniques for Object Representation and Recognition

    Object modeling and recognition are key areas of research in computer vision and graphics with a wide range of applications. Though research in these areas is not new, traditionally most of it has focused on analyzing problems under controlled environments. The challenges posed by real-life applications demand more general and robust solutions. The wide variety of objects with large intra-class variability makes the task very challenging. The difficulty in modeling and matching objects also varies depending on the input modality. In addition, the easy availability of sensors and storage has resulted in a tremendous increase in the amount of data that needs to be processed, which requires efficient algorithms suitable for large databases. In this dissertation, we address some of the challenges involved in modeling and matching of objects in realistic scenarios. Object matching in images requires accounting for large variability in appearance due to changes in illumination and viewpoint. Any real-world object is characterized by its underlying shape and albedo, which, unlike the image intensity, are insensitive to changes in illumination conditions. We propose a stochastic filtering framework for estimating object albedo from a single intensity image by formulating the albedo estimation as an image estimation problem. We also show how this albedo estimate can be used for illumination-insensitive object matching and for more accurate shape recovery from a single image using the standard shape-from-shading formulation. We start with the simpler problem where the pose of the object is known and only the illumination varies. We then extend the proposed approach to handle unknown pose in addition to illumination variations. We also use the estimated albedo maps for another important application, which is recognizing faces across age progression.
Many approaches which address the problem of modeling and recognizing objects from images assume that the underlying objects are of diffuse texture. But most real-world objects exhibit a combination of diffuse and specular properties. We propose an approach for separating the diffuse and specular reflectance from a given color image so that the algorithms proposed for objects of diffuse texture become applicable to a much wider range of real-world objects. Representing and matching the 2D and 3D geometry of objects is also an integral part of object matching, with applications in gesture recognition, activity classification, trademark and logo recognition, etc. The challenge in matching 2D/3D shapes lies in accounting for the different rigid and non-rigid deformations, large intra-class variability, noise and outliers. In addition, since shapes are usually represented as a collection of landmark points, the shape matching algorithm also has to deal with the challenges of missing or unknown correspondence across these data points. We propose an efficient shape indexing approach where the different feature vectors representing the shape are mapped to a hash table. For a query shape, we show how similar shapes in the database can be efficiently retrieved without the need for establishing correspondence, making the algorithm extremely fast and scalable. We also propose an approach for matching and registration of 3D point cloud data across unknown or missing correspondence using an implicit surface representation. Finally, we discuss possible future directions of this research.