15 research outputs found

    An Efficient Alternating Riemannian/Projected Gradient Descent Ascent Algorithm for Fair Principal Component Analysis

    Fair principal component analysis (FPCA), a ubiquitous dimensionality reduction technique in signal processing and machine learning, aims to find a low-dimensional representation for a high-dimensional dataset in view of fairness. The FPCA problem involves optimizing a non-convex and non-smooth function over the Stiefel manifold. The state-of-the-art methods for solving the problem are subgradient methods and semidefinite relaxation-based methods. However, both types of methods have clear limitations and are thus efficient only in special scenarios. This paper aims to develop efficient algorithms for solving the FPCA problem in general, especially large-scale, settings. We first transform FPCA into a smooth non-convex linear minimax optimization problem over the Stiefel manifold. To solve this general problem, we propose an efficient alternating Riemannian/projected gradient descent ascent (ARPGDA) algorithm, which performs a Riemannian gradient descent step and an ordinary projected gradient ascent step at each iteration. We prove that ARPGDA can find an $\varepsilon$-stationary point of the above problem within $\mathcal{O}(\varepsilon^{-3})$ iterations. Simulation results show that, compared with the state-of-the-art methods, our proposed ARPGDA algorithm achieves better solution quality and speed on FPCA problems.
    Comment: 5 pages, 8 figures, submitted for possible publication
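
The alternating scheme described in this abstract (one Riemannian descent step on the Stiefel manifold, then one projected ascent step) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the objective (negative explained variance per group, weighted by a vector on the probability simplex), the QR retraction, the step sizes, and all function names are assumptions made for the example.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    tau = css[rho] / (rho + 1)
    return np.maximum(v - tau, 0.0)

def qr_retraction(Y):
    """Map a full-rank matrix back onto the Stiefel manifold via QR."""
    Q, R = np.linalg.qr(Y)
    signs = np.where(np.diag(R) < 0, -1.0, 1.0)
    return Q * signs

def arpgda_sketch(covs, r, steps=200, alpha=1e-2, beta=1e-2, seed=0):
    """Assumed objective: min over X (X^T X = I), max over y in the simplex,
    of sum_i y_i * (-trace(X^T A_i X)), with A_i the group covariances."""
    rng = np.random.default_rng(seed)
    n = covs[0].shape[0]
    X = qr_retraction(rng.standard_normal((n, r)))
    y = np.full(len(covs), 1.0 / len(covs))
    for _ in range(steps):
        # Euclidean gradient with respect to X for the current weights y.
        G = -2.0 * sum(w * A for w, A in zip(y, covs)) @ X
        # Project onto the tangent space at X to get the Riemannian gradient.
        XtG = X.T @ G
        rgrad = G - X @ (XtG + XtG.T) / 2.0
        # Riemannian gradient descent step followed by retraction.
        X = qr_retraction(X - alpha * rgrad)
        # Projected gradient ascent step in y, using the updated X.
        losses = np.array([-np.trace(X.T @ A @ X) for A in covs])
        y = project_simplex(y + beta * losses)
    return X, y
```

The returned `X` has orthonormal columns and `y` stays on the simplex; the iteration count and step sizes here are placeholders and carry none of the convergence guarantees stated in the abstract.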

    Geometry Aware Deep Metric Learning

    A diverse range of applications in computer vision benefit from data representations that are dense and compact, yet discriminative enough to capture subtle changes in the data. Such representation learning is especially necessary in Zero Shot Learning applications, where the training and test classes are mutually exclusive. In other words, the learned representations should be discriminative enough to identify the minute cues in the data samples so that unseen data can be properly categorized. With the advent of Deep Neural Networks over the last few years, several metric learning algorithms have been developed to address this challenging objective. These algorithms learn the embedding space whilst considering the relative similarity/dissimilarity relationships between the data points across the various classes. Although successful, they suffer from a number of serious drawbacks, some of which are addressed in this thesis. As our first objective, we extended two popular optimizers, namely Stochastic Gradient Descent with Momentum (SGD-M) and RMSProp, to their respective Riemannian counterparts. Such an extension is necessary when optimizing a model under constrained problem settings. Our proposal reaps the benefits of standard manifold operations while optimizing the parameters of the network that are constrained to reside on a Riemannian manifold. The experimental evaluations showed that the constrained optimizers clearly outperform their non-constrained equivalents over a wide range of datasets and application settings with regard to the improved learning of the embedding space. We then turn our attention to the general training protocol of Siamese Neural Networks (SiNNs), and address a major yet obvious drawback in its training practice.
SiNNs are characterized by a Positive Semi Definite (PSD) matrix M which is invariant to the action of the orthogonal group O(p), thereby resulting in an equivalence class of solutions for M. Taking such invariances into account, we proposed a novel matrix manifold qConv and used it along with the popular Stiefel manifold to exploit the invariances in the siamese networks. We made use of our constrained optimizers to optimize over these two manifolds. Our empirical evaluations clearly showed that the training of SiNNs benefits from imposing such geometrical constraints on the search space whilst making use of the invariances inherent in SiNNs. As our final contribution, we designed and developed a novel, yet effective loss function that incorporates class-wise dissimilarity relationships in learning a discriminative embedding space. Such class-wise dissimilarity relationships have not been considered in the loss functions developed till now, thereby resulting in the learning of a sub-optimal embedding space. Hence, we have integrated and maximized such dissimilarity constraints using two standard variants of Sinkhorn Divergences. Further, our experimental evaluations demonstrated the importance of enforcing such constraints in learning a superior embedding space in the presence and absence of noise.

    Deep Grassmann Manifold Optimization for Computer Vision

    In this work, we propose methods that advance four areas in the field of computer vision: dimensionality reduction, deep feature embeddings, visual domain adaptation, and deep neural network compression. We combine concepts from the fields of manifold geometry and deep learning to develop cutting-edge methods in each of these areas. Each of the methods proposed in this work achieves state-of-the-art results in our experiments. We propose the Proxy Matrix Optimization (PMO) method for optimization over orthogonal matrix manifolds, such as the Grassmann manifold. This optimization technique is designed to be highly flexible, enabling it to be leveraged in many situations where traditional manifold optimization methods cannot be used. We first use PMO in the field of dimensionality reduction, where we propose an iterative optimization approach to Principal Component Analysis (PCA) in a framework called Proxy Matrix optimization based PCA (PM-PCA). We also demonstrate how PM-PCA can be used to solve the general $L_p$-PCA problem, a variant of PCA that uses arbitrary fractional norms, which can be more robust to outliers. We then present Cascaded Projection (CaP), a method which uses tensor compression based on PMO, to reduce the number of filters in deep neural networks. This, in turn, reduces the number of computational operations required to process each image with the network. Cascaded Projection is the first end-to-end trainable method for network compression that uses standard backpropagation to learn the optimal tensor compression. In the area of deep feature embeddings, we introduce Deep Euclidean Feature Representations through Adaptation on the Grassmann manifold (DEFRAG), that leverages PMO. The DEFRAG method improves the feature embeddings learned by deep neural networks through the use of auxiliary loss functions and Grassmann manifold optimization.
Lastly, in the area of visual domain adaptation, we propose the Manifold-Aligned Label Transfer for Domain Adaptation (MALT-DA) to transfer knowledge from samples in a known domain to an unknown domain based on cross-domain cluster correspondences.

    Distribution-Matching Embedding for Visual Domain Adaptation

    Domain-invariant representations are key to addressing the domain shift problem where the training and test examples follow different distributions. Existing techniques that have attempted to match the distributions of the source and target domains typically compare these distributions in the original feature space. This space, however, may not be directly suitable for such a comparison, since some of the features may have been distorted by the domain shift, or may be domain specific. In this paper, we introduce a Distribution-Matching Embedding approach: an unsupervised domain adaptation method that overcomes this issue by mapping the data to a latent space where the distance between the empirical distributions of the source and target examples is minimized. In other words, we seek to extract the information that is invariant across the source and target data. In particular, we study two different distances to compare the source and target distributions: the Maximum Mean Discrepancy and the Hellinger distance. Furthermore, we show that our approach allows us to learn either a linear embedding, or a nonlinear one. We demonstrate the benefits of our approach on the tasks of visual object recognition, text categorization, and WiFi localization.
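
One of the two distances named in this abstract, the Maximum Mean Discrepancy (MMD), can be illustrated with a short sketch. The code below computes the standard biased empirical estimate of the squared MMD between two samples; the Gaussian kernel, the bandwidth value, and the function names are assumptions for the example, and the abstract does not say which kernel or estimator the paper uses.

```python
import math

def rbf_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel between two points given as sequences."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y)."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd_squared(source, target, bandwidth=1.0):
    """Biased empirical estimate of the squared MMD between two samples."""
    return (mean_kernel(source, source, bandwidth)
            + mean_kernel(target, target, bandwidth)
            - 2.0 * mean_kernel(source, target, bandwidth))
```

In a distribution-matching setup, a quantity like `mmd_squared` would be evaluated on the embedded source and target samples and minimized with respect to the embedding; that optimization step is not shown here.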

    Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint

    Fair Principal Component Analysis (PCA) is a problem setting where we aim to perform PCA while making the resulting representation fair in that the projected distributions, conditional on the sensitive attributes, match one another. However, existing approaches to fair PCA have two main problems: theoretically, there has been no statistical foundation of fair PCA in terms of learnability; practically, limited memory prevents us from using existing approaches, as they explicitly rely on full access to the entire data. On the theoretical side, we rigorously formulate fair PCA using a new notion called probably approximately fair and optimal (PAFO) learnability. On the practical side, motivated by recent advances in streaming algorithms for addressing memory limitation, we propose a new setting called fair streaming PCA along with a memory-efficient algorithm, fair noisy power method (FNPM). We then provide its statistical guarantee in terms of PAFO-learnability, which is the first of its kind in the fair PCA literature. Lastly, we verify the efficacy and memory efficiency of our algorithm on real-world datasets.
    Comment: 42 pages, 5 figures, 4 tables. Accepted to the 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

    Learning Fair Representations with High-Confidence Guarantees

    Representation learning is increasingly employed to generate representations that are predictive across multiple downstream tasks. The development of representation learning algorithms that provide strong fairness guarantees is thus important because it can prevent unfairness towards disadvantaged groups for all downstream prediction tasks. To prevent unfairness towards disadvantaged groups in all downstream tasks, it is crucial to provide representation learning algorithms that provide fairness guarantees. In this paper, we formally define the problem of learning representations that are fair with high confidence. We then introduce the Fair Representation learning with high-confidence Guarantees (FRG) framework, which provides high-confidence guarantees for limiting unfairness across all downstream models and tasks, with user-defined upper bounds. After proving that FRG ensures fairness for all downstream models and tasks with high probability, we present empirical evaluations that demonstrate FRG's effectiveness at upper bounding unfairness for multiple downstream models and tasks.

    Domain Adaptation and Domain Generalization with Representation Learning

    Machine learning has achieved great success in computer vision, especially in object recognition or classification. One of the core factors behind this success is the availability of massive labeled image or video data for training, collected manually by humans. Labeling source training data, however, can be expensive and time consuming. Furthermore, a large amount of labeled source data may not always guarantee that traditional machine learning techniques generalize well; there may be a bias or mismatch in the data, i.e., the training data do not represent the target environment. To mitigate this dataset bias/mismatch, one can consider domain adaptation: utilizing labeled training data and unlabeled target data to develop a well-performing classifier on the target environment. In some cases, however, the unlabeled target data are nonexistent, but multiple labeled sources of data exist. Such situations can be addressed by domain generalization: using multiple source training sets to produce a classifier that generalizes on the unseen target domain. Although several domain adaptation and generalization approaches have been proposed, the domain mismatch in object recognition remains a challenging, open problem: model performance has yet to reach a satisfactory level in real-world applications. The overall goal of this thesis is to progress towards solving dataset bias in visual object recognition through representation learning in the context of domain adaptation and domain generalization. Representation learning is concerned with finding proper data representations or features via learning rather than via engineering by human experts. This thesis proposes several representation learning solutions based on deep learning and kernel methods. This thesis introduces a robust-to-noise deep neural network for handwritten digit classification trained on “clean” images only, which we name Deep Hybrid Network (DHN).
DHNs are based on a particular combination of sparse autoencoders and restricted Boltzmann machines. The results show that DHN performs better than the standard deep neural network in recognizing digits with Gaussian and impulse noise, block and border occlusions. This thesis proposes the Domain Adaptive Neural Network (DaNN), a neural network based domain adaptation algorithm that minimizes the classification error and the domain discrepancy between the source and target data representations. The experiments show the competitiveness of DaNN against several state-of-the-art methods on a benchmark object dataset. This thesis develops the Multi-task Autoencoder (MTAE), a domain generalization algorithm based on autoencoders trained via multi-task learning. MTAE learns to transform the original image into its analogs in multiple related domains simultaneously. The results show that the MTAE’s representations provide better classification performance than some alternative autoencoder-based models as well as the current state-of-the-art domain generalization algorithms. This thesis proposes a fast kernel-based representation learning algorithm for both domain adaptation and domain generalization, Scatter Component Analysis (SCA). SCA finds a data representation that trades between maximizing the separability of classes, minimizing the mismatch between domains, and maximizing the separability of the whole data points. The results show that SCA performs much faster than some competitive algorithms, while providing state-of-the-art accuracy in both domain adaptation and domain generalization. Finally, this thesis presents the Deep Reconstruction-Classification Network (DRCN), a deep convolutional network for domain adaptation. DRCN learns to classify labeled source data and also to reconstruct unlabeled target data via a shared encoding representation. 
The results show that DRCN provides competitive or better performance than the prior state-of-the-art model on several cross-domain object datasets.

    Compressive learning: new models and applications

    Today’s world is fuelled by data. From self-driving cars through to agriculture, massive amounts of data are used to fit learning models to provide valuable insights and predictions. Such insights come at a significant price as many traditional learning procedures have both memory and computational costs that scale with the size of the data. This quickly becomes prohibitive, even when substantial resources are available. A new way of learning is therefore needed to allow for efficient model fitting in the 21st century. The birth of compressive learning in recent years has provided a novel solution to the bottleneck of learning from big data. Situated at the core of the compressive learning framework is the construction of a so-called sketch. The sketch is a compact representation of the data that provides sufficient information for specific learning tasks. In this thesis we extend the compressive learning framework to a host of new models and applications. In the first part of the thesis, we consider the group of semi-parametric models and demonstrate the unique advantages and challenges associated with creating a compressive learning paradigm for these particular models. Concentrating on the independent component analysis model, we develop a framework of algorithms and theory enabling orders of magnitude of compression with respect to memory complexity compared to existing methods. In the second part of the thesis, we develop a compressive learning framework for the emerging technology of single-photon counting lidar. We demonstrate that forming a sketch of the time-of-flight data circumvents the inherent data-transfer bottleneck of existing lidar techniques.
Finally, we extend the compressive lidar technology by developing both an efficient sketch-based detection algorithm that can detect the presence of a surface solely from the sketch and a sketched plug-and-play framework that can integrate existing powerful denoisers that are robust to noisy lidar scenes with low photon counts.
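
The idea of a sketch as a fixed-size summary of a dataset can be illustrated as follows. The abstract does not specify the sketching operator, so this example assumes a common choice from the compressive learning literature: the empirical average of random Fourier features, accumulated in a single pass over the data. The function names, the Gaussian frequency distribution, and the parameters are hypothetical.

```python
import cmath
import random

def draw_frequencies(m, dim, scale=1.0, seed=0):
    """Draw m random frequency vectors (assumed Gaussian) of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(m)]

def update_sketch(sketch_sum, x, freqs):
    """Add one sample's contribution exp(i * <w_j, x>) to each running sum."""
    for j, w in enumerate(freqs):
        phase = sum(wk * xk for wk, xk in zip(w, x))
        sketch_sum[j] += cmath.exp(1j * phase)

def compute_sketch(stream, freqs):
    """One pass over the data; memory depends on m, not on the sample count."""
    sums = [0j] * len(freqs)
    count = 0
    for x in stream:
        update_sketch(sums, x, freqs)
        count += 1
    return [s / count for s in sums]
```

Because the sketch is an average, sketches of disjoint batches can be merged by a count-weighted average, which is what makes this kind of summary attractive when data transfer is the bottleneck.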

    Semi-supervised and unsupervised kernel-based novelty detection with application to remote sensing images

    The main challenge of new information technologies is to retrieve intelligible information from the large volume of digital data gathered every day. Among the variety of existing data sources, the satellites continuously observing the surface of the Earth are key to the monitoring of our environment. The new generation of satellite sensors are tremendously increasing the possibilities of applications but also increasing the need for efficient processing methodologies in order to extract information relevant to the users' needs in an automatic or semi-automatic way. This is where machine learning comes into play to transform complex data into simplified products such as maps of land-cover changes or classes by learning from data examples annotated by experts. These annotations, also called labels, may actually be difficult or costly to obtain since they are established on the basis of ground surveys. As an example, it is extremely difficult to access a region recently flooded or affected by wildfires. In these situations, the detection of changes has to be done with only annotations from unaffected regions. In a similar way, it is difficult to have information on all the land-cover classes present in an image while being interested in the detection of a single one of interest. These challenging situations are called novelty detection or one-class classification in machine learning. In these situations, the learning phase has to rely only on a very limited set of annotations, but can exploit the large set of unlabeled pixels available in the images. This setting, called semi-supervised learning, allows significantly improving the detection. In this Thesis we address the development of methods for novelty detection and one-class classification with few or no labeled information. 
The proposed methodologies build upon kernel methods, which offer a principled but flexible framework for learning with data showing potentially non-linear feature relations. The thesis is divided into two parts, each one having a different assumption on the data structure and both addressing unsupervised (automatic) and semi-supervised (semi-automatic) learning settings. The first part assumes the data to be formed by arbitrary-shaped and overlapping clusters and studies the use of kernel machines, such as Support Vector Machines or Gaussian Processes. An emphasis is put on the robustness to noise and outliers and on the automatic retrieval of parameters. Experiments on multi-temporal multispectral images for change detection are carried out using only information from unchanged regions or none at all. The second part assumes high-dimensional data to lie on multiple low-dimensional structures, called manifolds. We propose a method seeking a sparse and low-rank representation of the data mapped in a non-linear feature space. This representation allows us to build a graph, which is cut into several groups using spectral clustering. For the semi-supervised case where few labels of one class of interest are available, we study several approaches incorporating the graph information. The class labels can either be propagated on the graph, constrain spectral clustering, or be used to train a one-class classifier regularized by the given graph. Experiments on the unsupervised and one-class classification of hyperspectral images demonstrate the effectiveness of the proposed approaches.

    Augmented Deep Representations for Unconstrained Still/Video-based Face Recognition

    Face recognition is one of the active areas of research in computer vision and biometrics. Many approaches have been proposed in the literature that demonstrate impressive performance, especially those based on deep learning. However, unconstrained face recognition with large pose, illumination, occlusion and other variations is still an unsolved problem. Unconstrained video-based face recognition is even more challenging due to the large volume of data to be processed, lack of labeled training data and significant intra/inter-video variations on scene, blur, video quality, etc. Although Deep Convolutional Neural Networks (DCNNs) have provided discriminant representations for faces and achieved performance surpassing humans in controlled scenarios, modifications are necessary for face recognition in unconstrained conditions. In this dissertation, we propose several methods that improve unconstrained face recognition performance by augmenting the representation provided by the deep networks using correlation or contextual information in the data. For unconstrained still face recognition, we present an encoding approach to combine the Fisher vector (FV) encoding and DCNN representations, which is called FV-DCNN. The feature maps from the last convolutional layer in the deep network are encoded by FV into a robust representation, which utilizes the correlation between facial parts within each face. A VLAD-based encoding method called VLAD-DCNN is also proposed as an extension. Extensive evaluations on three challenging face recognition datasets show that the proposed FV-DCNN and VLAD-DCNN perform comparably to or better than many state-of-the-art face verification methods. For the more challenging video-based face recognition task, we first propose an automatic system and model the video-to-video similarity as subspace-to-subspace similarity, where the subspaces characterize the correlation between deep representations of faces in videos.
In the system, a quality-aware subspace-to-subspace similarity is introduced, where subspaces are learned using quality-aware principal component analysis. Subspaces along with quality-aware exemplars of templates are used to produce the similarity scores between video pairs by a quality-aware principal angle-based subspace-to-subspace similarity metric. The method is evaluated on four video datasets. The experimental results demonstrate the superior performance of the proposed method. To utilize the temporal information in videos, a hybrid dictionary learning method is also proposed for video-based face recognition. The proposed unsupervised approach effectively models the temporal correlation between deep representations of video faces using dynamical dictionaries. A practical iterative optimization algorithm is introduced to learn the dynamical dictionary. Experiments on three video-based face recognition datasets demonstrate that the proposed method can effectively learn robust and discriminative representation for videos and improve the face recognition performance. Finally, to leverage contextual information in videos, we present the Uncertainty-Gated Graph (UGG) for unconstrained video-based face recognition. It utilizes contextual information between faces by conducting graph-based identity propagation between sample tracklets, where identity information is initialized by the deep representations of video faces. UGG explicitly models the uncertainty of the contextual connections between tracklets by adaptively updating the weights of the edge gates according to the identity distributions of the nodes during inference. UGG is a generic graphical model that can be applied at inference time only or with end-to-end training. We demonstrate the effectiveness of UGG with state-of-the-art results on the recently released challenging Cast Search in Movies and IARPA Janus Surveillance Video Benchmark datasets.
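
The principal angle-based subspace-to-subspace similarity mentioned in this abstract can be illustrated with a plain (not quality-aware) version. The sketch below computes principal angles between two subspaces from the singular values of the product of their orthonormal bases, then averages the squared cosines; the averaging rule and the function names are assumptions for the example, and the quality weighting described in the dissertation is not reproduced.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians) between the column spaces of A and B."""
    Qa, _ = np.linalg.qr(A)  # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)  # orthonormal basis for span(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    sigma = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(sigma, -1.0, 1.0))

def subspace_similarity(A, B):
    """Mean squared cosine of the principal angles (assumed scoring rule)."""
    return float(np.mean(np.cos(principal_angles(A, B)) ** 2))
```

Under this scoring rule, identical subspaces score 1 and mutually orthogonal subspaces score 0; in a video setting, the columns of `A` and `B` would span the deep face features of two videos.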