Challenges in Modern Machine Learning: Multiresolution Structure, Model Understanding and Transfer Learning

Abstract

Recent advances in Artificial Intelligence (AI) are characterized by ever-increasing dataset sizes and the reemergence of neural-network methods. The modern AI pipeline begins with building datasets, continues with designing and training machine-learning models, and concludes with deploying trained models in the real world. We tackle three important challenges of this era, one from each part of the pipeline: 1) efficiently manipulating large matrices arising in real-world datasets (e.g., graph Laplacians from social-network datasets), 2) interpreting deep-neural-network models, and 3) efficiently deploying hundreds of deep-neural-network models on embedded devices.

Matrices arising in large, real-world datasets often have high rank, rendering common matrix-manipulation approaches based on the low-rank assumption (e.g., SVD) ineffective. In the first part of this thesis, we build upon Multiresolution Matrix Factorization (MMF), a method originally proposed to perform multiresolution analysis on discrete spaces, which can consequently model hierarchical structure in symmetric matrices as a matrix factorization. We describe a parallel algorithm for computing the factorization that scales to matrices with a million rows and columns. We then showcase an application of MMF: a preconditioner that accelerates iterative algorithms for solving systems of linear equations. Among wavelet-based preconditioners, the MMF preconditioner consistently yields faster convergence and is highly scalable. Finally, we propose approaches to extend MMF to asymmetric matrices and evaluate them in the context of matrix compression.

In the second part of the thesis, we address the black-box nature of deep-neural-network models. The goodness of a deep-neural-network model is typically measured by its test accuracy. We argue that this is an incomplete measure, and show that state-of-the-art question-answering models often ignore important question terms. We perform a case study of a question-answering model and expose various ways in which the network gets the right answer for the wrong reasons. We propose a human-in-the-loop workflow based on the notion of "attribution" (word importance) to understand the input-output behavior of neural-network models, extract rules, identify weaknesses, and construct adversarial attacks that exploit those weaknesses. Our strongest attacks drop the accuracy of a visual question-answering model from 61.1% to 19%, and that of a tabular question-answering model from 33.5% to 3.3%. We propose a measure of overstability: the tendency of a model to rely on trigger logic and ignore semantics. We use a path-sensitive attribution method to extract contextual synonyms (rules) learned by a model. We discuss how attributions can augment standard measures of accuracy and empower investigation of model performance. We finish by identifying opportunities for research: abstraction tools that aid the debugging process, concepts and semantics of path-sensitive dataflow analysis, and formalizing the process of verifying natural-language-based specifications.
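To make the first part's central object concrete, the sketch below writes out the general shape of a multiresolution factorization and the approximate inverse it induces. The notation and ordering of factors follow the MMF literature rather than being quoted from the thesis.

```latex
% Shape of the factorization (notation follows the MMF literature, not quoted
% verbatim from the thesis): a symmetric matrix A is rotated by a sequence of
% sparse orthogonal matrices until what remains is close to diagonal.
\[
  A \;\approx\; Q_1^{\top} Q_2^{\top} \cdots Q_L^{\top} \, H \, Q_L \cdots Q_2 Q_1,
  \qquad Q \,=\, Q_L \cdots Q_2 Q_1,
\]
% Each Q_l acts on only a few coordinates, and H is ``core-diagonal'': diagonal
% except for a small dense block. Orthogonality and sparsity of the factors make
% the approximate inverse cheap to apply, which is what the preconditioner exploits:
\[
  A^{-1} \;\approx\; Q^{\top} H^{-1} Q.
\]
```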
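The preconditioning application can be illustrated with a standard Krylov solver. The sketch below shows only the software pattern: `apply_approx_inverse` is a hypothetical stand-in for the fast multiresolution approximate inverse, and a random SPD matrix with Jacobi scaling replaces the graph Laplacians and MMF factorization studied in the thesis.

```python
# Minimal sketch of preconditioned conjugate gradients with SciPy. The
# `apply_approx_inverse` callable is a hypothetical stand-in for the fast
# multiresolution approximate inverse Q^T H^{-1} Q; a Jacobi (diagonal)
# scaling and a random SPD matrix are used only to keep the sketch runnable.
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(0)
n = 500
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)          # symmetric positive-definite test matrix
b = rng.standard_normal(n)

def apply_approx_inverse(x):
    return x / np.diag(A)            # placeholder for the MMF-style inverse

M = LinearOperator((n, n), matvec=apply_approx_inverse)
iters = {"plain": 0, "preconditioned": 0}

x0, _ = cg(A, b, callback=lambda xk: iters.update(plain=iters["plain"] + 1))
x1, _ = cg(A, b, M=M, callback=lambda xk: iters.update(preconditioned=iters["preconditioned"] + 1))
print(iters)                         # fewer iterations when M approximates A^{-1} well
```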
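The attribution workflow of the second part rests on assigning an importance score to every question word. Below is a minimal sketch of one gradient-based way to compute such scores (integrated gradients), assuming a differentiable `model` that maps embedded question tokens to a scalar score for the predicted answer; the names and tensor shapes are illustrative, not quoted from the thesis.

```python
# Minimal sketch of word-importance attribution via integrated gradients.
# Assumption (illustrative): `model` maps a [seq_len, emb_dim] tensor of
# embedded question tokens to a scalar score for the predicted answer.
import torch

def integrated_gradients(model, embeds, baseline=None, steps=50):
    """Return one importance score per token of `embeds`."""
    if baseline is None:
        baseline = torch.zeros_like(embeds)   # all-zero "empty question" baseline
    total_grads = torch.zeros_like(embeds)
    for k in range(1, steps + 1):
        # Point on the straight line from the baseline to the actual input.
        point = (baseline + (k / steps) * (embeds - baseline)).detach().requires_grad_(True)
        score = model(point)                  # scalar score of the predicted answer
        grad, = torch.autograd.grad(score, point)
        total_grads += grad
    # Average path gradient, scaled by (input - baseline), summed over embedding dims.
    return ((embeds - baseline) * total_grads / steps).sum(dim=-1)
```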
The third challenge pertains to real-world deployment of deep-neural-network models. With the proliferation of personal devices such as phones and smart assistants, much of human-AI interaction has shifted away from the cloud. While this shift brings critical advantages such as user privacy and faster response times, the limited memory available on these devices makes deploying hundreds of models impractical as the space of deep-learning-based applications expands. We tackle the problem of re-purposing trained deep-neural-network models for new tasks while keeping most of the learned weights intact. Our method introduces the concept of a "model patch", a set of small, trainable layers that can be applied to an existing trained model to adapt it to a new task. While keeping more than 98% of the weights intact, we show significantly higher transfer-learning performance from an object-detection task to an image-classification task compared to traditional last-layer fine-tuning, among other results. We also show how the model-patch idea can be used in multitask learning, where, despite using significantly fewer parameters, we incur zero accuracy loss compared to single-task performance for all the involved tasks.
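The model-patch idea can be sketched in a few lines: freeze the pre-trained weights, mark a small set of layers as trainable, and attach a new task head. The backbone, the choice of normalization layers as the patch, and the optimizer below are assumptions for illustration, not the exact recipe evaluated in the thesis.

```python
# Minimal sketch of a "model patch": keep the trained weights frozen and
# fine-tune only a small set of layers plus a new head for the new task.
# The backbone and patch choice here are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

model = models.mobilenet_v2(weights="IMAGENET1K_V1")

# 1) Freeze every weight in the pre-trained network.
for param in model.parameters():
    param.requires_grad = False

# 2) The "patch": re-enable only the scale/bias parameters of the
#    normalization layers, a tiny fraction of the total parameter count.
for module in model.modules():
    if isinstance(module, nn.BatchNorm2d):
        module.weight.requires_grad = True
        module.bias.requires_grad = True

# 3) Swap in a new head for the (hypothetical) new task.
num_new_classes = 10
model.classifier[1] = nn.Linear(model.last_channel, num_new_classes)

# 4) Train only the parameters that still require gradients.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-2, momentum=0.9)
```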
