463 research outputs found

    Doctor of Philosophy

    Get PDF
    The goal of machine learning is to develop efficient algorithms that use training data to build models that generalize well to unseen data. Learning algorithms can use labeled data, unlabeled data, or both. Supervised learning algorithms learn a model using labeled data only. Unsupervised learning methods learn the internal structure of a dataset using only unlabeled data. Lastly, semisupervised learning is the task of finding a model using both labeled and unlabeled data. In this research work, we contribute to both supervised and semisupervised learning. We contribute to supervised learning by proposing an efficient high-dimensional space coverage scheme based on the disjunctive normal form. We use conjunctions of a set of half-spaces to create a set of convex polytopes; the disjunction of these polytopes can provide the desired coverage of the space. Unlike traditional methods based on neural networks, we do not initialize the model parameters randomly. As a result, our model reduces the risk of poor local minima, and higher learning rates can be used, which leads to faster convergence. We contribute to semisupervised learning by proposing two unsupervised loss functions that form the basis of a novel semisupervised learning method. The first loss function is called Mutual-Exclusivity. It is motivated by the observation that an optimal decision boundary lies between the manifolds of different classes, where there are no or very few samples. Decision boundaries can be pushed away from training samples by maximizing their margin, and knowing the class labels of the samples is not necessary to do so. The second loss is named Transformation/Stability and is based on the fact that the prediction of a classifier for a data sample should not change under transformations and perturbations applied to that sample; likewise, internal variations of a learning system should have little to no effect on the output. The proposed loss minimizes the variation in the network's predictions for a given data sample. We also show that the same technique can be used to improve the robustness of a learning model against adversarial examples.
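
    As a minimal illustration of the two unsupervised losses described above, the sketch below follows one common instantiation of these ideas in PyTorch; `model` and `augment` are hypothetical stand-ins for the classifier and a stochastic transformation, and the dissertation's exact functional forms may differ in detail.

```python
import torch
import torch.nn.functional as F

def transformation_stability_loss(model, x, augment, n_passes=4):
    # Run the network on several stochastically transformed copies of the
    # same unlabeled batch and penalize variation around the mean prediction.
    probs = torch.stack(
        [F.softmax(model(augment(x)), dim=1) for _ in range(n_passes)]
    )                                        # (n_passes, batch, n_classes)
    mean_p = probs.mean(dim=0, keepdim=True)
    return ((probs - mean_p) ** 2).sum(dim=2).mean()

def mutual_exclusivity_loss(logits):
    # Reward predictions that put mass on exactly one class, pushing the
    # decision boundary into low-density regions between class manifolds.
    p = F.softmax(logits, dim=1)
    score = 0.0
    for k in range(p.shape[1]):
        others = torch.cat([1.0 - p[:, :k], 1.0 - p[:, k + 1:]], dim=1)
        score = score + p[:, k] * others.prod(dim=1)
    return -score.mean()   # maximized (loss minimized) at one-hot outputs
```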

    Doctor of Philosophy

    Get PDF
    Machine learning is the science of building predictive models from data that automatically improve based on past experience. To learn these models, traditional learning algorithms require labeled data. They also require that the entire dataset fit in the memory of a single machine. Labeled data are available or can be acquired for small and moderately sized datasets, but curating large datasets can be prohibitively expensive. Similarly, massive datasets are usually too large to fit into the memory of a single machine. An alternative is to distribute the dataset over multiple machines. Distributed learning, however, poses new challenges, as most existing machine learning techniques are inherently sequential. Additionally, these distributed approaches have to be designed with the various resource limitations of real-world settings in mind, prime among them intermachine communication. With the advent of big datasets, machine learning algorithms face new challenges: their design is no longer limited to minimizing some loss function but must additionally account for other resources that become critical when learning at scale. In this thesis, we explore different models and measures for learning under budgeted resources. What budgetary constraints are posed by modern datasets? Can we reuse or combine existing machine learning paradigms to address these challenges at scale? How do the cost metrics change when we shift to distributed models for learning? These are some of the questions investigated in this thesis; their answers hold the key to addressing some of the challenges faced when learning on massive datasets. In the first part of this thesis, we present three different budgeted scenarios that deal with scarcity of labeled data and limited computational resources. The goal is to leverage information transferred from related domains to learn under budgetary constraints. Our proposed techniques comprise semisupervised transfer, online transfer, and active transfer. In the second part of this thesis, we study distributed learning with limited communication. We present initial sampling-based results and propose communication protocols for learning distributed linear classifiers.
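
    The communication-limited setting in the second part can be illustrated with the simplest possible baseline: each machine fits a linear classifier on its local shard, and only the weight vectors are communicated once and averaged. This is a hedged sketch of the general idea (binary case, using scikit-learn), not the thesis's actual protocols.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_local(X_shard, y_shard):
    # Each machine fits a linear classifier on its own shard; only the
    # (d + 1)-dimensional weight vector ever leaves the machine.
    clf = LogisticRegression().fit(X_shard, y_shard)
    return np.append(clf.coef_.ravel(), clf.intercept_)

def averaged_linear_classifier(shards):
    # One-shot protocol: a single communication round of O(d) floats per
    # machine, followed by parameter averaging at the coordinator.
    w = np.mean([train_local(X, y) for X, y in shards], axis=0)
    return lambda X: (X @ w[:-1] + w[-1] > 0).astype(int)
```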

    Lightweight Adaptation of Classifiers to Users and Contexts: Trends of the Emerging Domain

    Get PDF
    Intelligent computer applications need to adapt their behaviour to contexts and users, but conventional classifier adaptation methods require long data collection and/or training times. Therefore, classifier adaptation is often performed as follows: at design time, application developers define typical usage contexts and provide reasoning models for each of these contexts; at runtime, an appropriate model is selected from the available ones. Typically, the definition of usage contexts and reasoning models relies heavily on domain knowledge. In practice, however, many applications are used in situations so diverse that no developer can predict them all and collect adequate training and test databases for each. Such applications have to adapt to a new user or unknown context at runtime purely from interaction with the user, preferably in fairly lightweight ways, that is, requiring limited user effort to collect training data and limited time to perform the adaptation. This paper analyses adaptation trends in several emerging domains and outlines promising ideas proposed for making multimodal classifiers user-specific and context-specific without significant user effort, detailed domain knowledge, or complete retraining of the classifiers. Based on this analysis, the paper identifies important application characteristics and presents guidelines for considering these characteristics in adaptation design.

    Adaptive Graph via Multiple Kernel Learning for Nonnegative Matrix Factorization

    Full text link
    Nonnegative Matrix Factorization (NMF) has been continuously evolving in several areas such as pattern recognition and information retrieval. It factorizes a matrix into a product of two low-rank nonnegative matrices that define a parts-based, linear representation of nonnegative data. Recently, Graph-regularized NMF (GrNMF) was proposed to find a compact representation that uncovers the hidden semantics while respecting the intrinsic geometric structure. In GrNMF, an affinity graph is constructed from the original data space to encode the geometrical information. In this paper, we propose a novel idea that engages a Multiple Kernel Learning approach to refine the graph structure so that it reflects the factorization of the matrix and the new data space. GrNMF is improved by utilizing the graph refined by the kernel learning, and a novel kernel learning method is then introduced under the GrNMF framework. Our approach shows encouraging results in comparison to state-of-the-art clustering algorithms such as NMF, GrNMF, and SVD.
    Comment: This paper has been withdrawn by the author due to the terrible writing.
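
    For reference, the GrNMF objective the paper builds on, min ||X - UV^T||_F^2 + lambda tr(V^T L V), can be minimized with the standard multiplicative update rules sketched below; the affinity matrix W is fixed in this sketch, whereas the paper's contribution is precisely to refine W via multiple kernel learning.

```python
import numpy as np

def grnmf(X, W, k, lam=1.0, n_iter=200, eps=1e-9):
    # Graph-regularized NMF with L = D - W the Laplacian of affinity W.
    m, n = X.shape
    rng = np.random.default_rng(0)
    U, V = rng.random((m, k)), rng.random((n, k))
    D = np.diag(W.sum(axis=1))
    for _ in range(n_iter):
        # Multiplicative updates keep U and V nonnegative throughout.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V   # X ~ U @ V.T; rows of V are the new representations
```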

    Schroedinger Eigenmaps for Manifold Alignment of Multimodal Hyperspectral Images

    Get PDF
    Multimodal remote sensing is an emerging field, as it allows for many views of the same region of interest. Domain adaptation attempts to fuse these multimodal remotely sensed images by utilizing the concept of transfer learning to understand data from different sources and learn a fused outcome. Semisupervised Manifold Alignment (SSMA) maps multiple hyperspectral images (HSIs) from high-dimensional source spaces to a low-dimensional latent space where similar elements reside close together. SSMA preserves the original geometric structure of the respective HSIs while pulling similar data points together and pushing dissimilar data points apart. The SSMA algorithm comprises a geometric component, a similarity component, and a dissimilarity component. The geometric component has roots in the original Laplacian Eigenmaps (LE) dimension reduction algorithm, and the projection functions have roots in the original Locality Preserving Projections (LPP) dimensionality reduction framework. The similarity and dissimilarity components form a semisupervised component that allows expert-labeled information to improve the image fusion process. Spatial-Spectral Schroedinger Eigenmaps (SSSE) was designed as a semisupervised enhancement of the LE algorithm that augments the Laplacian matrix with a user-defined potential function. However, this enhancement has yet to be explored in the LPP framework. The first part of this thesis proposes to use the spatial-spectral potential within the LPP algorithm, creating a new algorithm we call Schroedinger Eigenmap Projections (SEP). Through experiments on publicly available data with expert-labeled ground truth, we compare the performance of the SEP algorithm with that of the LPP algorithm. The second part of this thesis proposes incorporating the spatial-spectral potential from SSSE into the SSMA framework. Using two multi-angle HSIs, we explore the impact of incorporating this potential into SSMA.
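
    The core computational step shared by LE, SSSE, and the proposed SEP is a graph Laplacian augmented by a potential. Below is a minimal sketch of that step, with the spatial-spectral potential abstracted into a user-supplied matrix V; constructing V is the SSSE-specific part and is not reproduced here.

```python
import numpy as np
from scipy.linalg import eigh

def schroedinger_eigenmaps(W, V, alpha=1.0, n_dims=2):
    # Embed via the smallest generalized eigenvectors of
    # (L + alpha * V) f = lam * D f, with L = D - W the graph Laplacian.
    # Assumes every node has positive degree so that D is positive definite.
    D = np.diag(W.sum(axis=1))
    L = D - W
    vals, vecs = eigh(L + alpha * V, D)
    # As in Laplacian Eigenmaps, the leading eigenvector is discarded.
    return vecs[:, 1:n_dims + 1]
```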

    Nonparametric Feature Extraction from Dendrograms

    Full text link
    We propose a nonparametric way to extract features from dendrograms. Minimax distance measures correspond to building a dendrogram with the single-linkage criterion and defining specific forms of a level function and a distance function over it. We extend this construction to arbitrary dendrograms, developing a generalized framework in which different distance measures can be inferred from different types of dendrograms, level functions, and distance functions. Via an appropriate embedding, we compute a vector-based representation of the inferred distances, enabling many numerical machine learning algorithms to employ such distances. Then, to address the model selection problem, we study the aggregation of different dendrogram-based distances, respectively in solution space and in representation space, in the spirit of deep representations. In the first approach, for example for the clustering problem, we build a graph with positive and negative edge weights according to the consistency of the clustering labels of different objects among different solutions, in the context of ensemble methods; we then use an efficient variant of correlation clustering to produce the final clusters. In the second approach, we investigate the sequential combination of different distances and features, in the spirit of multi-layered architectures, to obtain the final features. Finally, we demonstrate the effectiveness of our approach via several numerical studies.
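
    The starting observation, that minimax (path-bottleneck) distances are exactly the merge levels of a single-linkage dendrogram, can be checked in a few lines: for single linkage, SciPy's cophenetic distances coincide with the minimax distances. A vector-based representation could then be obtained by, e.g., classical MDS on this matrix, in the spirit of the embedding step described above.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist, squareform

def minimax_distances(X):
    # Cophenetic distance under single linkage = minimax distance:
    # the minimum over paths of the maximum edge weight along the path.
    Z = linkage(pdist(X), method='single')
    return squareform(cophenet(Z))

X = np.random.default_rng(0).random((10, 2))
D_mm = minimax_distances(X)   # symmetric (10, 10) minimax-distance matrix
```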