3,240 research outputs found

    Learning Models For Corrupted Multi-Dimensional Data: Fundamental Limits And Algorithms

    Get PDF
    Developing machine learning models for unstructured multi-dimensional datasets such as datasets with unreliable labels and noisy multi-dimensional signals with or without missing information have becoming a central necessity. We are not always fortunate enough to get noise-free datasets for developing classification and representation models. Though there is a number of techniques available to deal with noisy datasets, these methods do not exploit the multi-dimensional structures of the signals, which could be used to improve the overall classification and representation performance of the model. In this thesis, we develop a Kronecker-structure (K-S) subspace model that exploits the multi-dimensional structure of the signal. First, we study the classification performance of K-S subspace models in two asymptotic regimes when the signal dimensions go to infinity and when the noise power tends to zero. We characterize the misclassification probability in terms of diversity order and we drive an exact expression for the diversity order. We further derive a tighter bound on misclassification probability in terms of pairwise geometry of the subspaces. The proposed scheme is optimal in most of the signal dimension regimes except in one regime where the signal dimension is less than twice the subspace dimension, however, hitting such a signal dimension regime is very rare in practice. We empirically show that the classification performance of K-S subspace models agrees with the diversity order analysis. We also develop an algorithm, Kronecker- Structured Learning of Discriminative Dictionaries (K-SLD2), for fast and compact K-S subspace learning for better classification and representation of multidimensional signals. We show that the K-SLD2 algorithm balances compact signal representation and good classification performance on synthetic and real-world datasets. Next, we develop a scheme to detect whether a given multi-dimensional signal with missing information lies on a given K-S subspace. We find that under some mild incoherence conditions we must observe ��(��1 log ��1) number of rows and ��(��2 log ��2) number of columns in order to detect the K-S subspace. In order to account for unreliable labels in datasets we present Nonlinear, Noise- aware, Quasiclustering (NNAQC), a method for learning deep convolutional networks from datasets corrupted by unknown label noise. We append a nonlinear noise model to a standard convolutional network, which is learned in tandem with the parameters of the network. Further, we train the network using a loss function that encourages the clustering of training images. We argue that the non-linear noise model, while not rigorous as a probabilistic model, results in a more effective denoising operator during backpropagation. We evaluate the performance of NNAQC on artificially injected label noise to MNIST, CIFAR-10, CIFAR-100, and ImageNet datasets and on a large-scale Clothing1M dataset with inherent label noise. We show that on all these datasets, NNAQC provides significantly improved classification performance over the state of the art and is robust to the amount of label noise and the training samples

    Graph Kernels

    Get PDF
    We present a unified framework to study graph kernels, special cases of which include the random walk (Gärtner et al., 2003; Borgwardt et al., 2005) and marginalized (Kashima et al., 2003, 2004; Mahé et al., 2004) graph kernels. Through reduction to a Sylvester equation we improve the time complexity of kernel computation between unlabeled graphs with n vertices from O(n^6) to O(n^3). We find a spectral decomposition approach even more efficient when computing entire kernel matrices. For labeled graphs we develop conjugate gradient and fixed-point methods that take O(dn^3) time per iteration, where d is the size of the label set. By extending the necessary linear algebra to Reproducing Kernel Hilbert Spaces (RKHS) we obtain the same result for d-dimensional edge kernels, and O(n^4) in the infinite-dimensional case; on sparse graphs these algorithms only take O(n^2) time per iteration in all cases. Experiments on graphs from bioinformatics and other application domains show that these techniques can speed up computation of the kernel by an order of magnitude or more. We also show that certain rational kernels (Cortes et al., 2002, 2003, 2004) when specialized to graphs reduce to our random walk graph kernel. Finally, we relate our framework to R-convolution kernels (Haussler, 1999) and provide a kernel that is close to the optimal assignment kernel of Fröhlich et al. (2006) yet provably positive semi-definite
    corecore