
    Towards Better Representation Learning in the Absence of Sufficient Supervision

    We focus on the problem of learning representations from data when sufficient supervision, such as labels or feature values, is not available, a situation that arises in many real-world machine learning tasks. We approach this problem from several perspectives, summarized as follows. First, we assume that some knowledge is already available from a different but related task or model, and aim to use that knowledge in our task of interest. We perform this form of knowledge transfer in two related ways: (i) using the knowledge contained in kernel embeddings to improve the training properties of a neural network, and (ii) transferring the knowledge contained in a large model to a smaller one. In the former case, we combine recent theoretical results on the training of neural networks with a multiple kernel learning algorithm to achieve strong performance in terms of both optimization and generalization. Next, we tackle the problem of learning appropriate data representations from an online learning point of view, in which one must learn incrementally from an incoming stream of data. We assume that the full feature set of a data point is not always available, and seek a way to learn efficiently from a smaller set of feature values. We propose a novel online learning framework that builds a decision tree from a data stream and yields highly accurate predictions, competitive with classical online decision tree learners but at a significantly lower cost.

    Spectral Analysis of Kernel and Neural Embeddings: Optimization and Generalization

    We extend the recent results of Arora et al. (2019) by a spectral analysis of the representations corresponding to kernel and neural embeddings. They showed that in a simple single-layer network, the alignment of the labels to the eigenvectors of the corresponding Gram matrix determines both the convergence of the optimization during training and the generalization properties. We generalize their result to kernel and neural representations and show that these extensions improve both the optimization and the generalization of the basic setup studied in Arora et al. (2019). In particular, we first extend the setup with the Gaussian kernel and its approximation by random Fourier features, as well as with the embeddings produced by two-layer networks trained on different tasks. We then study the use of more sophisticated kernels and embeddings: those designed optimally for deep neural networks, and those developed for the classification task of interest given the data and the training labels, independent of any specific classification model.
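
    As a rough illustration of the central quantity in this line of work, the sketch below computes how labels align with the eigenvectors of a Gaussian-kernel Gram matrix; the data, labels, and bandwidth are synthetic placeholders, and this is only a minimal reading of the alignment measure, not the paper's full analysis.

```python
import numpy as np

# Synthetic stand-ins for a real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.sign(rng.normal(size=200))

# Gaussian (RBF) kernel Gram matrix; the bandwidth is an arbitrary choice.
sq_dists = np.sum(X**2, axis=1)[:, None] + np.sum(X**2, axis=1)[None, :] - 2 * X @ X.T
K = np.exp(-sq_dists / (2.0 * X.shape[1]))

# Eigen-decomposition of the symmetric Gram matrix.
eigvals, eigvecs = np.linalg.eigh(K)

# Alignment of the labels with each eigendirection: (v_i^T y)^2.
# The Arora et al. (2019) analysis ties optimization and generalization
# to how much of y's energy sits on large-eigenvalue directions.
alignment = (eigvecs.T @ y) ** 2

# Fraction of label energy captured by the top-k eigendirections.
k = 10
top = np.argsort(eigvals)[::-1][:k]
print(f"label energy in top-{k} eigendirections: {alignment[top].sum() / alignment.sum():.2f}")
```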

    Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach

    Online decision making plays a crucial role in numerous real-world applications. In many scenarios, the decision is made by performing a sequence of tests on the incoming data points. However, performing all tests can be expensive and is not always possible. In this paper, we provide a novel formulation of the online decision making problem based on combinatorial multi-armed bandits that takes the cost of performing tests into account. Based on this formulation, we provide a new framework for cost-efficient online decision making that can utilize posterior sampling or BayesUCB for exploration. We provide a rigorous theoretical analysis of our framework and present various experimental results that demonstrate its applicability to real-world problems.
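
    A toy sketch of the posterior-sampling (Thompson sampling) exploration the framework can use, assuming a Beta-Bernoulli model and a fixed cost per test; for brevity it picks a single test per round, whereas the paper's combinatorial formulation selects among sequences of tests, and all names and numbers here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_tests = 5
costs = np.array([0.1, 0.2, 0.05, 0.3, 0.15])       # assumed per-test costs
true_utility = np.array([0.3, 0.7, 0.4, 0.8, 0.5])  # hidden Bernoulli means

# Beta(1, 1) posterior over each test's utility.
alpha = np.ones(n_tests)
beta = np.ones(n_tests)

for t in range(1000):
    # Posterior sampling: draw a plausible utility for each test,
    # then pick the test with the best sampled utility minus cost.
    sampled = rng.beta(alpha, beta)
    arm = int(np.argmax(sampled - costs))

    # Observe a Bernoulli outcome and update the posterior.
    reward = rng.random() < true_utility[arm]
    alpha[arm] += reward
    beta[arm] += 1 - reward

print("posterior means:", alpha / (alpha + beta))
```

    A BayesUCB-style variant would instead rank tests by an upper quantile of each posterior rather than by a random draw.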

    Do Kernel and Neural Embeddings Help in Training and Generalization?

    Recent results on the optimization and generalization properties of neural networks showed that in a simple two-layer network, the alignment of the labels to the eigenvectors of the corresponding Gram matrix determines the convergence of the optimization during training. Such analyses also provide upper bounds on the generalization error. We experimentally investigate the implications of these results for deeper networks via embeddings. We regard the layers preceding the final hidden layer as producing different representations of the input data, which are then fed to the two-layer model. We show that these representations improve both optimization and generalization. In particular, we investigate three kernel representations fed to the final hidden layer: the Gaussian kernel and its approximation by random Fourier features, kernels designed to imitate representations produced by neural networks, and finally an optimal kernel designed to align the data with the target labels. The approximated representations induced by these kernels are fed to the neural network, and the optimization and generalization properties of the final model are evaluated and compared.
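
    Of the representations listed above, the random Fourier feature approximation of the Gaussian kernel is the easiest to sketch concretely; the following is a minimal Rahimi-Recht-style construction with placeholder data, producing an embedding that could then be fed to the final two-layer model.

```python
import numpy as np

def random_fourier_features(X, n_features=512, gamma=1.0, seed=0):
    """Approximate the Gaussian kernel k(x, z) = exp(-gamma * ||x - z||^2)
    via random Fourier features: z(x) = sqrt(2/D) * cos(W^T x + b)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies sampled from the kernel's spectral density.
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# The embedded data Z can then be used as input to the two-layer model.
X = np.random.default_rng(1).normal(size=(100, 20))
Z = random_fourier_features(X)
print(Z.shape)  # (100, 512)
```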

    Stochastic Proximal Algorithms with SON Regularization: Towards Efficient Optimal Transport for Domain Adaptation

    We propose a new regularizer for optimal transport (OT) that is tailored to better preserve the class structure of the data being transported. Accordingly, we provide the first theoretical guarantees for an OT scheme that respects class structure. We derive an accelerated proximal algorithm with closed-form projection and proximal operators, thereby affording a highly scalable algorithm for computing optimal transport plans. We provide a novel argument for the uniqueness of the optimum even in the absence of strong convexity. Our experiments show that the new regularizer not only results in better preservation of the class structure but also in additional robustness relative to previous regularizers.
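
    As a hedged illustration of the proximal machinery involved, the sketch below implements the proximal operator of a single Euclidean-norm term (block soft-thresholding), the basic building block of sum-of-norms (SON)-style penalties; it is not the paper's full accelerated scheme or its closed-form projection.

```python
import numpy as np

def prox_l2(v, lam):
    """Proximal operator of lam * ||.||_2 (block soft-thresholding):
    argmin_x 0.5 * ||x - v||^2 + lam * ||x||_2."""
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return v
    return max(0.0, 1.0 - lam / norm) * v

# Shrinks the vector toward zero; zeroes it out entirely when ||v|| <= lam.
v = np.array([3.0, 4.0])      # ||v|| = 5
print(prox_l2(v, lam=2.0))    # scaled by (1 - 2/5) = 0.6 -> [1.8, 2.4]
print(prox_l2(v, lam=6.0))    # [0., 0.]
```

    This shrink-or-kill behavior is what lets SON-type penalties fuse rows of the transport plan and thereby encourage class structure.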

    Efficient Online Decision Tree Learning with Active Feature Acquisition

    Constructing decision trees online is a classical machine learning problem. Existing works often assume that features are readily available for each incoming data point. However, in many real-world applications, both feature values and labels are unknown a priori and can only be obtained at a cost. For example, in medical diagnosis, doctors have to choose which tests to perform (i.e., make costly feature queries) on a patient in order to make a diagnosis decision (i.e., predict labels). We provide a fresh perspective on this practical challenge. Our framework consists of an active planning oracle embedded in an online learning scheme, for which we investigate several information acquisition functions. Specifically, we employ a surrogate information acquisition function based on adaptive submodularity to actively query feature values at minimal cost, while using a posterior sampling scheme to maintain low regret for online prediction. We demonstrate the efficiency and effectiveness of our framework via extensive experiments on various real-world datasets. Our framework also naturally adapts to the challenging setting of online learning with concept drift and is shown to be competitive with baseline models while being more flexible.
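
    As a rough sketch of the cost-sensitive acquisition idea, the snippet below applies the canonical greedy rule for adaptive-submodular objectives: repeatedly pick the feature with the best gain-per-cost ratio under a query budget. The fixed gains are illustrative stand-ins for the paper's surrogate information acquisition function, which in the adaptive setting would be re-evaluated after each observed feature value.

```python
import numpy as np

def greedy_acquire(gains, costs, budget):
    """Greedy gain-per-cost feature selection under a query budget.
    `gains` are stand-in marginal information gains; adaptively they
    would be recomputed after every observation."""
    chosen, spent = [], 0.0
    remaining = set(range(len(gains)))
    while remaining:
        best = max(remaining, key=lambda i: gains[i] / costs[i])
        if spent + costs[best] > budget:
            break
        chosen.append(best)
        spent += costs[best]
        remaining.remove(best)
    return chosen, spent

gains = np.array([0.9, 0.4, 0.7, 0.2])   # hypothetical per-feature gains
costs = np.array([1.0, 0.5, 2.0, 0.1])   # hypothetical per-feature costs
print(greedy_acquire(gains, costs, budget=2.0))  # -> ([3, 0, 1], ~1.6)
```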

    Analysis of Knowledge Transfer in Kernel Regime

    Knowledge transfer has been shown to be a very successful technique for training neural classifiers: together with the ground-truth data, it uses the "privileged information" (PI) obtained by a "teacher" network to train a "student" network. It has been observed that classifiers learn much faster and more reliably via knowledge transfer. However, there has been little to no theoretical analysis of this phenomenon. To bridge this gap, we propose to approach the problem of knowledge transfer by regularizing the fit between the teacher and the student with the PI provided by the teacher. Using tools from dynamical systems theory, we show that when the student is an extremely wide two-layer network, we can analyze it in the kernel regime and show that it is able to interpolate between the PI and the given data. This characterization sheds new light on the relation between the training error and the capacity of the student relative to the teacher. Another contribution of the paper is a quantitative statement on the convergence of the student network: we prove that the teacher reduces the number of iterations required for the student to learn, and consequently improves the generalization power of the student. We provide corresponding experimental analysis that validates the theoretical results and yields additional insights.
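
    A minimal sketch of the regularized objective described above, assuming a simple mean-squared-error form: the student's fit to the ground-truth labels is blended with its fit to the teacher's privileged outputs, with a mixing weight interpolating between the two. The exact functional form used in the paper may differ.

```python
import numpy as np

def transfer_loss(student_out, y_true, teacher_out, alpha=0.5):
    """Regularized fit between data and teacher PI:
    (1 - alpha) * MSE(student, labels) + alpha * MSE(student, teacher)."""
    data_term = np.mean((student_out - y_true) ** 2)
    pi_term = np.mean((student_out - teacher_out) ** 2)
    return (1 - alpha) * data_term + alpha * pi_term

# alpha interpolates between pure label fitting (0) and pure imitation (1),
# mirroring the student's interpolation between the data and the PI.
s = np.array([0.2, 0.8, 0.4])
y = np.array([0.0, 1.0, 0.0])
t = np.array([0.1, 0.9, 0.3])
print(transfer_loss(s, y, t, alpha=0.3))
```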

    On the Unreasonable Effectiveness of Knowledge Distillation: Analysis in the Kernel Regime

    Knowledge distillation (KD), i.e., training one classifier on the outputs of another classifier, is an empirically very successful technique for knowledge transfer between classifiers. It has even been observed that classifiers learn much faster and more reliably when trained with the outputs of another classifier as soft labels instead of from ground-truth data. However, there has been little to no theoretical analysis of this phenomenon. We provide the first theoretical analysis of KD in the setting of extremely wide two-layer non-linear networks, in the model and regime of (Arora et al., 2019; Du & Hu, 2019; Cao & Gu, 2019). We prove results on what the student network learns and on its rate of convergence. Intriguingly, we also confirm the lottery ticket hypothesis (Frankle & Carbin, 2019) in this model. To prove our results, we extend the repertoire of techniques from linear systems dynamics. We provide corresponding experimental analysis that validates the theoretical results and yields additional insights.
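
    For concreteness, the sketch below gives the classic temperature-softened soft-label objective in the style of Hinton et al.'s knowledge distillation, which the setting described above builds on; the temperature value and the pure-KD weighting (no ground-truth term) are illustrative choices.

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable softmax with temperature T."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Cross-entropy between temperature-softened teacher and student
    distributions: the teacher's outputs serve as soft labels."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -np.sum(p_teacher * np.log(p_student + 1e-12), axis=-1).mean()

s = np.array([[2.0, 0.5, -1.0]])  # student logits (placeholder)
t = np.array([[3.0, 0.2, -2.0]])  # teacher logits (placeholder)
print(distillation_loss(s, t))
```

    In practice the soft-label term is often mixed with a standard cross-entropy term on the ground-truth labels; the pure form above matches the "instead of ground truth" setting mentioned in the abstract.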

    Prophylactic Role of Lactobacillus paracasei Exopolysaccharides on Colon Cancer Cells through Apoptosis Not Ferroptosis

    Background: Despite the existence of conventional methods for colon cancer treatment, targeting vital molecular pathways and inducing various forms of cell death with safe probiotic components such as exopolysaccharides (EPSs) is of great importance, and such components are considered potential therapeutic agents. This study aimed to investigate the inhibitory effect of the EPS of L. paracasei on different colon cancer cell lines (SW-480, HT-29, and HCT-116). Methods: To this end, several cellular and molecular experiments were performed, including the MTS assay, DAPI staining, the Annexin V/PI assay, quantitative real-time PCR (qPCR), and several important ferroptosis-related assays. Results: Based on the findings, L. paracasei EPS induces apoptosis, as confirmed by all apoptosis-related assays, and does not act through ferroptosis pathways. L. paracasei EPS hinders the Akt1, mTOR, and Jak-1 mRNAs and induces apoptosis through down-regulation of the anti-apoptotic gene Bcl-2 and up-regulation of the pro-apoptotic genes BAX, caspase-3, and caspase-8. Conclusion: The EPS of this indigenous probiotic strain, with its anticancer potential and low/insignificant cytotoxicity to normal cells, is proposed for future application in molecularly targeted therapy for colon cancer. Furthermore, in vivo studies and clinical trials should be performed to evaluate the applicability of this component alongside conventional methods to increase the survival rate of colon cancer patients.
