52 research outputs found

    Robust Algorithms for Linear and Nonlinear Regression via Sparse Modeling Methods: Theory, Algorithms and Applications to Image Denoising

    The task of robust regression is of particular importance in signal processing, statistics and machine learning. Ordinary estimators, such as the Least Squares (LS) one, fail to achieve sufficiently good performance in the presence of outliers. Although the problem was addressed many decades ago and several methods have been established, it has recently attracted renewed attention in the context of sparse modeling and sparse optimization techniques. The latter is the line that has been followed in the current dissertation. The reported research led to the development of a novel approach in the context of greedy algorithms. The model adopts the decomposition of the noise into two parts: a) the inlier noise and b) the outliers, which are explicitly modeled by employing sparse modeling arguments, under the assumption that the outliers are few (sparse) relative to the number of data. Based on this rationale and inspired by the popular Orthogonal Matching Pursuit (OMP), two novel efficient greedy algorithms are established, one for the linear and one for the nonlinear robust regression task. The proposed algorithm for the linear task, i.e., the Greedy Algorithm for Robust Denoising (GARD), alternates between a Least Squares (LS) optimization step and an OMP-based selection step that identifies the outliers.
The method is compared against state-of-the-art methods through extensive simulations and the results demonstrate that: a) it exhibits tolerance in the presence of outliers, i.e., robustness, b) it attains a very low approximation error and c) it has relatively low computational requirements. Moreover, due to the simplicity of the method, a number of related theoretical properties are derived. Initially, the convergence of the method in a finite number of iteration steps is established. Next, the theoretical analysis turns to the identification of the outliers. The case where only outliers are present has been studied separately, mainly for the following reasons: a) the simplification of technically demanding algebraic manipulations and b) the articulation of the method’s interesting geometrical properties. In particular, a bound on the Restricted Isometry Property (RIP) constant is derived which guarantees that the recovery of the signal via GARD is exact (zero error). Finally, for the case where outliers and inlier noise coexist, and under the assumption that the inlier noise vector is bounded, a similar condition is derived that guarantees the recovery of the support of the sparse outlier vector. If such a condition is satisfied, the approximation error is shown to be bounded, and thus the denoising estimator is stable. For the robust nonlinear regression task, it is additionally assumed that the unknown nonlinear function belongs to a Reproducing Kernel Hilbert Space (RKHS). Due to the existence of outliers, common techniques such as Kernel Ridge Regression (KRR) or Support Vector Regression (SVR) turn out to be inadequate. By employing the aforementioned noise decomposition, sparse modeling arguments are used to estimate the outliers within a greedy iterative procedure. The proposed robust scheme, i.e., the Kernel Greedy Algorithm for Robust Denoising (KGARD), alternates between a KRR task and an OMP-like selection step.
Theoretical results regarding the identification of the outliers are provided. Moreover, KGARD is compared against other cutting-edge methods via extensive simulations, where its enhanced performance is demonstrated. Finally, the proposed robust estimation framework is applied to the task of image denoising, where the advantages of the proposed method are unveiled. The experiments verify that KGARD improves the denoising process significantly when outliers are present.
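As a rough illustration of the alternating structure described above, the following sketch runs a GARD-style iteration on a deliberately simplified scalar model y = a*x + noise. The real algorithm solves a multivariate LS problem and uses a principled stopping criterion; the function name, the fixed outlier budget and the scalar setting are all illustrative assumptions:

```python
def gard_scalar(x, y, n_outliers):
    """GARD-style sketch for the scalar model y = a*x + noise.
    Alternates a least-squares fit on the presumed inliers with an
    OMP-like step that flags the sample of largest residual as an outlier."""
    inliers = set(range(len(x)))
    for _ in range(n_outliers):
        # LS step on the current inlier set: a = sum(x*y) / sum(x^2)
        sx2 = sum(x[i] * x[i] for i in inliers)
        sxy = sum(x[i] * y[i] for i in inliers)
        a = sxy / sx2
        # OMP-like selection: flag the sample with the largest residual
        worst = max(inliers, key=lambda i: abs(y[i] - a * x[i]))
        inliers.remove(worst)
    # final LS fit on the cleaned set
    sx2 = sum(x[i] * x[i] for i in inliers)
    sxy = sum(x[i] * y[i] for i in inliers)
    return sxy / sx2, sorted(set(range(len(x))) - inliers)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 30.0]   # true slope 2; the last sample is an outlier
a, outliers = gard_scalar(x, y, n_outliers=1)
```

Note how the plain LS fit is pulled toward the outlier, while removing the single largest-residual sample restores the true slope exactly, mirroring the robustness claim of the abstract.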

    Optimization with Sparsity-Inducing Penalties

    Sparse estimation methods aim at using or obtaining parsimonious representations of data or models. They were first dedicated to linear variable selection, but numerous extensions have since emerged, such as structured sparsity or kernel selection. It turns out that many of the related estimation problems can be cast as convex optimization problems by regularizing the empirical risk with appropriate non-smooth norms. The goal of this paper is to present, from a general perspective, optimization tools and techniques dedicated to such sparsity-inducing penalties. We cover proximal methods, block-coordinate descent, reweighted ℓ2-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provide an extensive set of experiments to compare various algorithms from a computational point of view.
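A minimal sketch of the proximal machinery surveyed in the paper: with an identity design matrix, the lasso solution is the elementwise soft-threshold of the data, which ISTA recovers by alternating a gradient step with the ℓ1 proximal operator (the toy problem and all names are illustrative assumptions):

```python
import math

def soft_threshold(v, t):
    # proximal operator of t*||.||_1: shrink each entry toward zero by t
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]

def ista(b, lam, step=0.5, n_iter=200):
    """Minimise 0.5*||x - b||^2 + lam*||x||_1 by proximal gradient (ISTA).
    The design matrix is taken as the identity to keep the sketch short."""
    x = [0.0] * len(b)
    for _ in range(n_iter):
        grad = [xi - bi for xi, bi in zip(x, b)]   # gradient of the smooth part
        x = soft_threshold([xi - step * g for xi, g in zip(x, grad)],
                           step * lam)             # proximal (shrinkage) step
    return x

x = ista([3.0, -0.2, 1.5], lam=1.0)   # converges to soft_threshold(b, 1.0)
```

Entries smaller than the penalty are driven exactly to zero, which is the sparsity-inducing behaviour the non-smooth ℓ1 norm is chosen for.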

    Kernel methods in machine learning

    We review machine learning methods employing positive definite kernels. These methods formulate learning and estimation problems in a reproducing kernel Hilbert space (RKHS) of functions defined on the data domain, expanded in terms of a kernel. Working in linear spaces of functions has the benefit of facilitating the construction and analysis of learning algorithms while at the same time allowing large classes of functions. The latter include nonlinear functions as well as functions defined on nonvectorial data. We cover a wide range of methods, ranging from binary classifiers to sophisticated methods for estimation with structured data. Published in the Annals of Statistics (http://dx.doi.org/10.1214/009053607000000677) by the Institute of Mathematical Statistics (http://www.imstat.org/aos/).
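The RKHS formulation can be made concrete with a tiny kernel ridge regression sketch: the learned function is a kernel expansion whose coefficients solve a regularized linear system. The Gaussian kernel, the regularization value and all names below are illustrative assumptions:

```python
import math

def rbf(x, z, gamma=1.0):
    # Gaussian (RBF) kernel between two scalar inputs
    return math.exp(-gamma * (x - z) ** 2)

def solve(A, b):
    # Gauss-Jordan elimination with partial pivoting (small dense systems)
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def krr_fit(xs, ys, lam=0.1):
    """Kernel ridge regression: alpha = (K + lam*I)^{-1} y, and the learned
    function f(x) = sum_i alpha_i k(x, x_i) is an element of the RKHS."""
    K = [[rbf(a, b) + (lam if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = solve(K, ys)
    return lambda x: sum(a * rbf(x, xi) for a, xi in zip(alpha, xs))

f = krr_fit([0.0, 1.0, 2.0], [0.0, 1.0, 4.0])   # fit y = x^2 at three points
```

The prediction at any point is a weighted sum of kernels centred on the training data, which is exactly the "expansion in terms of a kernel" the abstract describes.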

    Subspace Representations and Learning for Visual Recognition

    Pervasive and affordable sensor and storage technology enables the acquisition of an ever-rising amount of visual data. The ability to extract semantic information by interpreting, indexing and searching visual data is impacting domains such as surveillance, robotics, intelligence, human-computer interaction, navigation, healthcare, and several others. This further stimulates the investigation of automated extraction techniques that are more efficient, and robust against the many sources of noise affecting the already complex visual data, which carries the semantic information of interest. We address the problem by designing novel visual data representations, based on learning data subspace decompositions that are invariant against noise, while being informative for the task at hand. We use this guiding principle to tackle several visual recognition problems, including detection and recognition of human interactions from surveillance video, face recognition in unconstrained environments, and domain generalization for object recognition. By interpreting visual data with a simple additive noise model, we consider the subspaces spanned by the model portion (model subspace) and the noise portion (variation subspace). We observe that decomposing the variation subspace against the model subspace gives rise to the so-called parity subspace. Decomposing the model subspace against the variation subspace instead gives rise to what we name the invariant subspace. We extend the use of kernel techniques for the parity subspace. This enables modeling the highly non-linear temporal trajectories describing human behavior, and performing detection and recognition of human interactions. In addition, we introduce supervised low-rank matrix decomposition techniques for learning the invariant subspace for two other tasks.
We learn invariant representations for face recognition from grossly corrupted images, and we learn object recognition classifiers that are invariant to the so-called domain bias. Extensive experiments using the benchmark datasets publicly available for each of the three tasks show that learning representations based on subspace decompositions invariant to the sources of noise leads to results comparable to or better than the state of the art.
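The additive-noise interpretation above amounts to splitting each data vector into a component inside a model subspace and a residual in the complementary variation subspace. The toy orthonormal-basis sketch below illustrates that split only; the basis and all names are illustrative assumptions, not the learned decompositions of the thesis:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(v, basis):
    """Project v onto the span of an orthonormal basis (the 'model subspace');
    the residual then lives in the complementary 'variation' subspace."""
    p = [0.0] * len(v)
    for b in basis:
        c = dot(v, b)                      # coordinate along this basis vector
        p = [pi + c * bi for pi, bi in zip(p, b)]
    return p

model_basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # toy model subspace in R^3
v = [2.0, -1.0, 3.0]
model_part = project(v, model_basis)
variation_part = [vi - mi for vi, mi in zip(v, model_part)]
```

The two parts sum back to the original vector and are orthogonal, which is the geometric fact the parity/invariant subspace constructions build on.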

    Advanced Multilinear Data Analysis and Sparse Representation Approaches and Their Applications

    Multifactor analysis plays an important role in data analysis, since most real-world datasets arise as a combination of numerous factors. These factors are usually not independent but interdependent. Thus, it is a mistake if a method only considers one aspect of the input data while ignoring the others. Although widely used, Multilinear PCA (MPCA), one of the leading multilinear analysis methods, still suffers from three major drawbacks. Firstly, it is very sensitive to outliers and noise and unable to cope with missing values. Secondly, since MPCA deals with huge multidimensional datasets, it is usually computationally expensive. Finally, it loses the original local geometry structure due to the averaging process. This thesis sheds new light on the tensor decomposition problem via the ideas of fast low-rank approximation in random projection and tensor completion in compressed sensing. We propose a novel approach called Compressed Submanifold Multifactor Analysis (CSMA) to solve the three problems mentioned above. Our approach is able to deal with missing values and outliers via our proposed sparse Higher-order Singular Value Decomposition approach, named HOSVD-L1 decomposition. The random projection method is used to obtain a fast low-rank approximation of a given multifactor dataset. In addition, our method can preserve the geometry of the original data. In the second part of this thesis, we present a novel pattern classification approach named Sparse Class-dependent Feature Analysis (SCFA), to connect the advantages of sparse representation in an overcomplete dictionary with a powerful nonlinear classifier. The classifier is based on the estimation of class-specific optimal filters, by solving an L1-norm optimization problem using the Alternating Direction Method of Multipliers. Our method, as well as its Reproducing Kernel Hilbert Space (RKHS) version, is tolerant to the presence of noise and other variations in an image.
Our proposed methods achieve very high classification accuracies in face recognition on two challenging face databases, i.e., the CMU Pose, Illumination and Expression (PIE) database and the Extended YALE-B, which exhibit pose and illumination variations, and on the AR database, which has occluded images. In addition, they exhibit robustness under other evaluation modalities, such as object classification on the Caltech101 database. Our methods outperform state-of-the-art methods on all these databases, which shows their applicability to general computer vision and pattern recognition problems.
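The robustness motivation behind an L1-based decomposition such as HOSVD-L1 can be seen already in one dimension: the ℓ2-optimal fit (the mean) is dragged by a single gross outlier, while the ℓ1-optimal fit (the median) is not. A minimal illustration of that principle, not the thesis method itself:

```python
import statistics

data = [1.0, 1.1, 0.9, 1.0, 50.0]   # clean samples near 1.0 plus one outlier
l2_fit = statistics.mean(data)       # minimiser of sum (x - c)^2  -> pulled to 10.8
l1_fit = statistics.median(data)     # minimiser of sum |x - c|    -> stays at 1.0
```

Replacing squared-error criteria with absolute-error ones inside a tensor decomposition trades some smoothness for exactly this insensitivity to gross corruption.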

    Improving the Practicality of Model-Based Reinforcement Learning: An Investigation into Scaling up Model-Based Methods in Online Settings

    This thesis is a response to the current scarcity of practical model-based control algorithms in the reinforcement learning (RL) framework. As of yet, there is no consensus on how best to integrate imperfect transition models into RL whilst mitigating policy improvement instabilities in online settings. Current state-of-the-art policy learning algorithms that surpass human performance often rely on model-free approaches that enjoy unmitigated sampling of transition data. Model-based RL (MBRL) instead attempts to distil experience into transition models that allow agents to plan new policies without needing to return to the environment and sample more data. The initial focus of this investigation is on kernel conditional mean embeddings (CMEs) (Song et al., 2009) deployed in an approximate policy iteration (API) algorithm (Grünewälder et al., 2012a). This existing MBRL algorithm boasts theoretically stable policy updates in continuous state and discrete action spaces. The Bellman operator’s value function and (transition) conditional expectation are modelled and embedded respectively as functions in a reproducing kernel Hilbert space (RKHS). The resulting finite-induced approximate pseudo-MDP (Yao et al., 2014a) can be solved exactly in a dynamic programming algorithm with policy improvement suboptimality guarantees. However, model construction and policy planning scale cubically and quadratically respectively with the training set size, rendering the CME impractical for sample-abundant tasks in online settings. Three variants of CME API are investigated to strike a balance between stable policy updates and reduced computational complexity. The first variant models the value function and state-action representation explicitly in a parametric CME (PCME) algorithm with favourable computational complexity. However, a soft conservative policy update technique is developed to mitigate policy learning oscillations in the planning process.
The second variant returns to the non-parametric embedding and contributes (along with external work) to the compressed CME (CCME): a sparse and computationally more favourable CME. The final variant is a fully end-to-end differentiable embedding trained with stochastic gradient updates. The value function remains modelled in an RKHS such that backprop is driven by a non-parametric RKHS loss function. The actively compressed CME (ACCME) satisfies the pseudo-MDP contraction constraint using a sparse softmax activation function. The size of the pseudo-MDP (i.e., the size of the embedding’s last layer) is controlled by sparsifying the last-layer weight matrix, extending the truncated gradient method (Langford et al., 2009) with group lasso updates in a novel ‘use it or lose it’ neuron pruning mechanism. Surprisingly, this technique does not require extensive fine-tuning between control tasks.
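The "use it or lose it" pruning idea can be sketched as row-wise group-lasso shrinkage: each neuron's outgoing weight row is scaled toward zero and dropped entirely once its norm falls below a threshold. This is a simplified stand-in for the truncated-gradient update of the thesis, with illustrative names and values:

```python
import math

def group_soft_threshold(W, t):
    """Row-wise group-lasso shrinkage: scale each row (one neuron's outgoing
    weights) toward zero, and prune the whole row when its l2 norm is below t."""
    out = []
    for row in W:
        norm = math.sqrt(sum(w * w for w in row))
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        out.append([scale * w for w in row])
    return out

W = [[0.05, -0.05],   # weak neuron: its whole row will be zeroed (pruned)
     [2.00,  1.00]]   # strong neuron: merely shrunk toward zero
W_pruned = group_soft_threshold(W, 0.2)
```

Because the penalty acts on whole rows rather than individual weights, entire neurons vanish at once, which is what shrinks the embedding's last layer and hence the pseudo-MDP size.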

    New methods for deep dictionary learning and for image completion

    Digital imaging plays an essential role in many aspects of our daily life. However, due to the hardware limitations of the imaging devices, the image measurements are usually impaired and require further processing to enhance the quality of the raw images in order to enable applications on the user side. Image enhancement aims to improve the information content within image measurements by exploiting the properties of the target image and the forward model of the imaging device. In this thesis, we aim to tackle two specific image enhancement problems, that is, single image super-resolution and image completion. First, we present a new Deep Analysis Dictionary Model (DeepAM), which consists of multiple layers of analysis dictionaries with associated soft-thresholding operators and a single layer of synthesis dictionary for single image super-resolution. To achieve an effective deep model, each analysis dictionary has been designed to be composed of an Information Preserving Analysis Dictionary (IPAD), which passes essential information from the input signal to the output, and a Clustering Analysis Dictionary (CAD), which generates a discriminative feature representation. The parameters of the deep analysis dictionary model are optimized using a layer-wise learning strategy. We demonstrate that both the proposed deep dictionary design and the learning algorithm are effective. Simulation results show that the proposed method achieves performance comparable with Deep Neural Networks and other existing methods. We then generalize DeepAM to a Deep Convolutional Analysis Dictionary Model (DeepCAM) by learning convolutional dictionaries instead of unstructured dictionaries. The convolutional dictionary is more suitable for processing high-dimensional signals like images and has only a small number of free parameters. By exploiting the properties of a convolutional dictionary, we present an efficient convolutional analysis dictionary learning algorithm.
The IPAD and the CAD parts are learned using variations of the proposed convolutional analysis dictionary learning algorithm. We demonstrate that DeepCAM is an effective multi-layer convolutional model and achieves better performance than DeepAM while using a smaller number of parameters. Finally, we present an image completion algorithm based on dense correspondence between the input image and an exemplar image retrieved from the Internet that was taken at a similar position. The dense correspondence, which is estimated using a hierarchical PatchMatch algorithm, is usually noisy and contains a large occlusion area corresponding to the region to be completed. By modelling the dense correspondence as a smooth field, an Expectation-Maximization (EM) based method is presented to interpolate a smooth field over the occlusion area, which is then used to transfer image content from the exemplar image to the input image. Color correction is further applied to diminish the possible color differences between the input image and the exemplar image. Numerical results demonstrate that the proposed image completion algorithm is able to achieve photo-realistic image completion results.
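A single DeepAM-style layer pairs an analysis dictionary with a soft-thresholding operator. The sketch below shows that forward computation only; the dictionary, the thresholds and all names are illustrative assumptions, and the real model learns its IPAD and CAD parts with the layer-wise strategy described above:

```python
import math

def soft(v, thresholds):
    # elementwise soft-thresholding with a per-atom threshold
    return [math.copysign(max(abs(x) - t, 0.0), x)
            for x, t in zip(v, thresholds)]

def analysis_layer(x, omega, thresholds):
    """One layer of an analysis-dictionary model: multiply the input by the
    analysis dictionary omega, then apply soft-thresholding to the result."""
    z = [sum(w * xi for w, xi in zip(row, x)) for row in omega]
    return soft(z, thresholds)

omega = [[1.0, 0.0],    # toy analysis dictionary: three atoms acting on R^2
         [0.0, 1.0],
         [1.0, -1.0]]
y = analysis_layer([2.0, 0.3], omega, [0.5, 0.5, 0.5])
```

Responses below the threshold are zeroed, so the layer produces a sparse feature vector; stacking such layers and ending with a synthesis dictionary gives the overall DeepAM pipeline.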