
    Sparsity Analysis for Computer Vision Applications

    Ph.D. (Doctor of Philosophy)

    Unsupervised Learning from Shallow to Deep

    Machine learning plays a pivotal role in most state-of-the-art systems across many application domains. With the rise of deep learning, massive amounts of labeled data have become the basis of feature learning, enabling models to learn representations automatically. Unfortunately, a trained deep learning model is hard to adapt to other datasets without fine-tuning, and the applicability of machine learning methods is limited by the amount of available labeled data. Therefore, the aim of this thesis is to alleviate the limitations of supervised learning by exploring algorithms that learn good internal representations and invariant feature hierarchies from unlabelled data. Firstly, we extend traditional dictionary learning and sparse coding algorithms to hierarchical image representations in a principled way. So that dictionary atoms capture additional information from extended receptive fields and attain improved descriptive capacity, we present a two-pass multi-resolution cascade framework for dictionary learning and sparse coding. This cascade method allows collaborative reconstructions at different resolutions using dictionary atoms of the same dimension. The jointly learned dictionary comprises atoms that adapt to the information available at the coarsest layer, where the support of the atoms reaches its maximum range, and to the residual images, where supplementary details progressively refine the reconstruction objective. Our method generates flexible and accurate representations using only a small number of coefficients, and is computationally efficient. In the following work, we propose to incorporate the traditional self-expressiveness property into deep learning to learn better representations for subspace clustering. This architecture is built upon deep auto-encoders, which non-linearly map the input data into a latent space. Our key idea is to introduce a novel self-expressive layer between the encoder and the decoder to mimic the "self-expressiveness" property that has proven effective in traditional subspace clustering. Being differentiable, our new self-expressive layer provides a simple but effective way to learn pairwise affinities between all data points through a standard back-propagation procedure. Being nonlinear, our neural-network-based method is able to cluster data points with complex (often nonlinear) structures. However, subspace clustering algorithms are notorious for their scalability issues, because building and processing large affinity matrices is demanding. We propose two methods to tackle this problem. One method is based on k-Subspace Clustering, where we introduce a method that simultaneously learns an embedding space along with subspaces within it to minimize a notion of reconstruction error, thus addressing the problem of subspace clustering in an end-to-end learning paradigm. This in turn frees us from the need to build an affinity matrix to perform clustering. The other approach uses a feed-forward network to replace spectral clustering and learns the affinities of each data point from the "self-expressive" layer. We introduce Neural Collaborative Subspace Clustering, which benefits from a classifier that determines whether a pair of points lies on the same subspace under supervision of the "self-expressive" layer. Essential to our model is the construction of two affinity matrices, one from the classifier and the other from a notion of subspace self-expressiveness, to supervise training in a collaborative scheme.
In summary, this thesis contributes to unsupervised learning across several tasks. It starts from a traditional sparse coding and dictionary learning perspective in low-level vision. Then, we explore how to incorporate unsupervised learning into convolutional neural networks without label information and scale subspace clustering to large datasets. Furthermore, we also extend clustering to a dense prediction task (saliency detection).
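
As a rough, hedged illustration of the self-expressive layer described above, the sketch below shows a deep auto-encoder whose latent codes are re-expressed as linear combinations of each other through a learnable coefficient matrix. The layer sizes, loss weights and all names are illustrative assumptions, not the thesis implementation.

```python
# Minimal sketch of an auto-encoder with a self-expressive layer,
# assuming one batch that contains all N data points.
# Illustrative only; not the thesis code.
import torch
import torch.nn as nn

class SelfExpressiveAE(nn.Module):
    def __init__(self, input_dim, latent_dim, num_points):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, latent_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(latent_dim, input_dim))
        # Self-expressive coefficients: each latent code is expressed as a
        # linear combination of the others.
        self.C = nn.Parameter(1e-4 * torch.randn(num_points, num_points))

    def forward(self, x):
        z = self.encoder(x)         # N x latent_dim
        z_se = self.C @ z           # self-expressed latent codes
        x_rec = self.decoder(z_se)  # reconstruction from self-expressed codes
        return x_rec, z, z_se

def loss_fn(x, x_rec, z, z_se, C, lam1=1.0, lam2=1.0):
    rec = ((x - x_rec) ** 2).sum()                  # reconstruction error
    se = ((z - z_se) ** 2).sum()                    # self-expression error
    reg = C.abs().sum() - C.diagonal().abs().sum()  # sparsity, ignoring the diagonal
    return rec + lam1 * se + lam2 * reg
```

The pairwise affinities used for clustering would then be read off from the learned coefficients, for example as |C| + |C|ᵀ.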

    Extracting Information from Multimodal Remote Sensing Data for Sea Ice Characterization

    Remote sensing is the discipline that studies the acquisition, preparation and analysis of spectral, spatial and temporal properties of objects without direct touch or contact. It is a field of great importance to understanding the climate system and its changes, as well as for conducting operations in the Arctic. A current challenge, however, is that most sensory equipment can capture only one or a few of the characteristics needed to accurately describe ground objects through their temporal, spatial, spectral and radiometric resolution characteristics. This in turn motivates the fusing of complementary modalities for potentially improved accuracy and stability in analysis, but it also leads to problems when trying to merge heterogeneous data with different statistical, geometric and physical qualities. Another concern in the remote sensing of Arctic regions is the scarcity of high-quality labeled data but simultaneous abundance of unlabeled data, as the gathering of labeled data can be both costly and time consuming. It could therefore be of great value to explore routes that can automate this process in ways that target both the situation regarding available data and the difficulties arising from the fusion of heterogeneous multimodal data. To this end, semi-supervised methods were considered for their ability to leverage smaller amounts of carefully labeled data in combination with more widely available unlabeled data to achieve greater classification performance. Strengths and limitations of three algorithms for real-life applications are assessed through experiments on datasets from Arctic and urban areas. The first two algorithms, Deep Semi-Supervised Label Propagation (LP) and MixMatch Holistic SSL (MixMatch), consider simultaneous processing of multimodal remote sensing data with additional extracted Gray Level Co-occurrence Matrix (GLCM) texture features for image classification. LP trains in alternating steps of supervised learning on potentially pseudo-labeled data and steps of deciding new labels through node propagation, while MixMatch mixes loss terms from several leading algorithms to gain their respective benefits. Another method, Graph Fusion Merriman Bence Osher (GMBO), explores processing of modalities in parallel by constructing a fused graph from complementary input modalities and applying Ginzburg-Landau minimization on an approximated graph Laplacian. Results imply that inclusion of extracted GLCM features could be beneficial for classification of multimodal remote sensing data, and that GMBO has merits for operational use in the Arctic, given that certain data prerequisites are met.
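
    As a small, hedged illustration of the GLCM texture features mentioned above, the following sketch extracts a few standard co-occurrence statistics from a single-band patch with scikit-image; the patch size, gray-level quantization and chosen properties are illustrative assumptions, not the pipeline used in this work.

```python
# Sketch: Gray Level Co-occurrence Matrix (GLCM) texture features for a
# single-band image patch, using scikit-image (>= 0.19 naming).
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(patch, levels=32):
    """Return a few common GLCM statistics for an 8-bit image patch."""
    # Quantize to fewer gray levels to keep the co-occurrence matrix small.
    q = (patch.astype(np.float64) / 256.0 * levels).astype(np.uint8)
    glcm = graycomatrix(q,
                        distances=[1, 2],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=levels, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

# Example: features for a random 64x64 patch.
patch = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
print(glcm_features(patch).shape)  # (32,) = 4 properties x 2 distances x 4 angles
```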

    Model-driven and Data-driven Approaches for some Object Recognition Problems

    Recognizing objects from images and videos has been a long-standing problem in computer vision. The recent surge in the prevalence of visual cameras has given rise to two main challenges: (i) it is important to understand different sources of object variation in more unconstrained scenarios, and (ii) rather than describing an object in isolation, efficient learning methods for modeling object-scene 'contextual' relations are required to resolve visual ambiguities. This dissertation addresses some aspects of these challenges, and consists of two parts. The first part of the work focuses on obtaining object descriptors that are largely preserved across certain sources of variation, by utilizing models for image formation and local image features. Given a single instance of an object, we investigate the following three problems. (i) Representing a 2D projection of a 3D non-planar shape invariant to articulations, when there are no self-occlusions. We propose an articulation-invariant distance that is preserved across piece-wise affine transformations of a non-rigid object's 'parts', under a weak perspective imaging model, and then obtain a shape-context-like descriptor to perform recognition; (ii) Understanding the space of 'arbitrary' blurred images of an object, by representing an unknown blur kernel of a known maximum size using a complete set of orthonormal basis functions spanning that space, and showing that the subspaces resulting from convolving a clean object and its blurred versions with these basis functions are equal under some assumptions. We then view the invariant subspaces as points on a Grassmann manifold, and use statistical tools that account for the underlying non-Euclidean nature of the space of these invariants to perform recognition across blur; (iii) Analyzing the robustness of local feature descriptors to different illumination conditions. We perform an empirical study of these descriptors for the problem of face recognition under lighting change, and show that the direction of the image gradient largely preserves object properties across varying lighting conditions. The second part of the dissertation utilizes the information conveyed by large quantities of data to learn contextual information shared by an object (or an entity) with its surroundings. (i) We first consider a supervised two-class problem of detecting lane markings from road video sequences, where we learn relevant feature-level contextual information through a machine learning algorithm based on boosting. We then focus on unsupervised object classification scenarios where, (ii) we perform clustering using maximum margin principles, by deriving some basic properties on the affinity of 'a pair of points' belonging to the same cluster using the information conveyed by 'all' points in the system, and (iii) then consider correspondence-free adaptation of statistical classifiers across domain-shifting transformations, by generating meaningful 'intermediate domains' that incrementally convey potential information about the domain change.
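
    The observation that the direction of the image gradient is largely preserved under lighting changes can be made concrete with a small sketch (a generic illustration, not the dissertation's code): a global gain-and-offset change of intensity rescales gradient magnitudes but leaves gradient orientations untouched.

```python
# Sketch: the direction (orientation) of the image gradient as an
# illumination-insensitive descriptor. Scaling intensity by a positive
# factor and adding an offset changes gradient magnitude, not direction.
import numpy as np

def gradient_direction(img):
    gy, gx = np.gradient(img.astype(np.float64))
    return np.arctan2(gy, gx)  # per-pixel gradient orientation in radians

rng = np.random.default_rng(0)
img = rng.random((32, 32))
brighter = 3.0 * img + 0.1    # global illumination change (gain + offset)

d1 = gradient_direction(img)
d2 = gradient_direction(brighter)
print(np.allclose(d1, d2))    # True: orientations are unchanged
```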

    Analysis of human motion with vision systems: kinematic and dynamic parameters estimation

    This work presents a multicamera motion capture system able to digitize, measure and analyse human motion. A key feature of this system is an easily wearable garment printed with a color-coded pattern. The pattern of coloured markers allows simultaneous reconstruction of the shape and motion of the subject. With the information gathered we can also estimate both kinematic and dynamic motion parameters. In the framework of this research we developed algorithms to design the color-coded pattern, perform 3D shape reconstruction, estimate kinematic and dynamic motion parameters, and calibrate the multi-camera system. We paid particular attention to estimating the uncertainty of the kinematic parameters, also comparing the results with those obtained from commercial systems. The work also presents an overview of some real-world applications in which the developed system has been used as a measurement tool.
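
    As a hedged illustration of kinematic parameter estimation from reconstructed marker positions, the sketch below computes a joint angle from three 3D points; the marker placement and coordinates are made up for illustration and this is not the system's actual algorithm.

```python
# Sketch: estimating a kinematic parameter (a joint angle) from three
# reconstructed 3D marker positions, e.g. hip-knee-ankle for knee flexion.
# Generic illustration only; not the system described in the abstract.
import numpy as np

def joint_angle(p_prox, p_joint, p_dist):
    """Angle (degrees) at p_joint between the two adjoining segments."""
    u = p_prox - p_joint
    v = p_dist - p_joint
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

hip = np.array([0.0, 1.0, 0.9])
knee = np.array([0.0, 1.0, 0.5])
ankle = np.array([0.1, 1.0, 0.1])
print(joint_angle(hip, knee, ankle))  # knee angle in degrees
```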

    Differential Tracking through Sampling and Linearizing the Local Appearance Manifold

    Recovering motion information from input camera image sequences is a classic problem of computer vision. Conventional approaches estimate motion from either dense optical flow or sparse feature correspondences identified across successive image frames. Among other things, performance depends on the accuracy of the feature detection, which can be problematic in scenes that exhibit view-dependent geometric or photometric behaviors such as occlusion, semitransparency, specularity and curved reflections. Beyond feature measurements, researchers have also developed approaches that directly utilize appearance (intensity) measurements. Such appearance-based approaches eliminate the need for feature extraction and avoid the difficulty of identifying correspondences. However, the simplicity of on-line processing of image features is usually traded for complexity in off-line modeling of the appearance function. Because the appearance function is typically very nonlinear, learning it usually requires an impractically large number of training samples. I will present a novel appearance-based framework that can be used to estimate rigid motion in a manner that is computationally simple and does not require global modeling of the appearance function. The basic idea is as follows. An n-pixel image can be considered as a point in an n-dimensional appearance space. When an object in the scene or the camera moves, the image point moves along a low-dimensional appearance manifold. While globally nonlinear, the appearance manifold can be locally linearized using a small number of nearby image samples. This linear approximation of the local appearance manifold defines a mapping between the images and the underlying motion parameters, allowing the motion estimation to be formulated as solving a linear system. I will address three key issues related to motion estimation: how to acquire local appearance samples, how to derive a local linear approximation given appearance samples, and whether the linear approximation is sufficiently close to the real local appearance manifold. In addition, I will present a novel approach to motion segmentation that utilizes the same appearance-based framework to classify individual image pixels into groups associated with different underlying rigid motions.
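
    The local linearization idea can be sketched as follows (a synthetic toy, not the dissertation's implementation): nearby appearance samples with known motion offsets are used to fit a local linear map, which is then inverted by least squares to estimate the motion of a new image.

```python
# Sketch: locally linearizing the appearance manifold from a few nearby
# samples and estimating motion by solving a linear system.
# Synthetic toy data only; not the dissertation's implementation.
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_motion = 1024, 6           # image dimension, motion parameters

# Unknown local Jacobian of appearance w.r.t. motion (ground truth for the toy).
J_true = rng.standard_normal((n_pixels, n_motion))
i_ref = rng.standard_normal(n_pixels)  # reference image (a point in appearance space)

# Acquire nearby appearance samples with known small motion offsets.
offsets = 0.01 * rng.standard_normal((20, n_motion))
samples = i_ref + offsets @ J_true.T   # images near the reference on the manifold

# Linearize: fit the local map from (motion offset, image difference) pairs.
D = samples - i_ref                                 # 20 x n_pixels
J_hat, *_ = np.linalg.lstsq(offsets, D, rcond=None) # solves offsets @ J^T = D
J_hat = J_hat.T

# Estimate the motion of a new image by solving a linear system with J_hat.
m_true = 0.01 * rng.standard_normal(n_motion)
i_new = i_ref + J_true @ m_true
m_est, *_ = np.linalg.lstsq(J_hat, i_new - i_ref, rcond=None)
print(np.allclose(m_est, m_true, atol=1e-6))        # True in this noiseless toy
```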

    Understanding the role of phase function in translucent appearance

    Multiple scattering contributes critically to the characteristic translucent appearance of food, liquids, skin, and crystals; but little is known about how it is perceived by human observers. This article explores the perception of translucency by studying the image effects of variations in one factor of multiple scattering: the phase function. We consider an expanded space of phase functions created by linear combinations of Henyey-Greenstein and von Mises-Fisher lobes, and we study this physical parameter space using computational data analysis and psychophysics. Our study identifies a two-dimensional embedding of the physical scattering parameters in a perceptually meaningful appearance space. Through our analysis of this space, we find uniform parameterizations of its two axes by analytical expressions of moments of the phase function, and provide an intuitive characterization of the visual effects that can be achieved at different parts of it. We show that our expansion of the space of phase functions enlarges the range of achievable translucent appearance compared to traditional single-parameter phase function models. Our findings highlight the important role phase function can have in controlling translucent appearance, and provide tools for manipulating its effect in material design applications. National Institutes of Health (U.S.) (Award R01-EY019262-02); National Institutes of Health (U.S.) (Award R21-EY019741-02).
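
    As a small numerical sketch of the expanded phase-function space described above, the code below evaluates a convex combination of a Henyey-Greenstein lobe and a von Mises-Fisher lobe using their standard normalized forms; the mixture weight and lobe parameters are illustrative, not values from the article.

```python
# Sketch: a phase function built as a linear combination of a
# Henyey-Greenstein (HG) lobe and a von Mises-Fisher (vMF) lobe,
# both normalized over the sphere. Parameter values are illustrative only.
import numpy as np

def hg(cos_theta, g):
    """Henyey-Greenstein phase function p(cos θ; g)."""
    return (1.0 - g**2) / (4.0 * np.pi * (1.0 + g**2 - 2.0 * g * cos_theta) ** 1.5)

def vmf(cos_theta, kappa):
    """von Mises-Fisher lobe on the sphere, mean direction along the forward axis."""
    return kappa / (4.0 * np.pi * np.sinh(kappa)) * np.exp(kappa * cos_theta)

def mixed_phase(cos_theta, w, g, kappa):
    """Convex combination of an HG lobe and a vMF lobe."""
    return w * hg(cos_theta, g) + (1.0 - w) * vmf(cos_theta, kappa)

# Check normalization: 2π ∫ p(μ) dμ over μ = cos θ in [-1, 1] should be 1.
mu = np.linspace(-1.0, 1.0, 20001)
p = mixed_phase(mu, w=0.6, g=0.8, kappa=5.0)
print(2.0 * np.pi * np.trapz(p, mu))  # ≈ 1.0
```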