Towards Better Representation Learning in the Absence of Sufficient Supervision
We focus on the problem of learning representations from data when sufficient supervision, such as labels or feature values, is not available. This situation arises in many real-world machine learning tasks. We approach the problem from different perspectives, summarized as follows.

First, we assume some knowledge is already available from a different but related task or model, and aim to use that knowledge in our task of interest. We perform this form of knowledge transfer in two different but related ways: (i) using the knowledge available in kernel embeddings to improve the training properties of a neural network, and (ii) transferring the knowledge available in a large model to a smaller one. In the former case, we use recent theoretical results on the training of neural networks together with a multiple kernel learning algorithm to achieve high performance in terms of both optimization and generalization.

Next, we tackle the problem of learning appropriate data representations from an online learning point of view, in which one must learn incrementally from an incoming source of data. We assume that the whole feature set of a data input is not always available, and seek a way to learn efficiently from a smaller set of feature values. We propose a novel online learning framework that builds a decision tree from a data stream and yields highly accurate predictions, competitive with classical online decision tree learners but at a significantly lower cost.
Spectral Analysis of Kernel and Neural Embeddings: Optimization and Generalization
We extend the recent results of (Arora et al. 2019) by spectral analysis of
the representations corresponding to the kernel and neural embeddings. They
showed that in a simple single-layer network, the alignment of the labels to
the eigenvectors of the corresponding Gram matrix determines both the
convergence of the optimization during training as well as the generalization
properties. We generalize their result to the kernel and neural representations
and show these extensions improve both optimization and generalization of the
basic setup studied in (Arora et al. 2019). In particular, we first extend the
setup with the Gaussian kernel and the approximations by random Fourier
features as well as with the embeddings produced by two-layer networks trained
on different tasks. We then study the use of more sophisticated kernels and
embeddings, those designed optimally for deep neural networks and those
developed for the classification task of interest given the data and the
training labels, independent of any specific classification model.
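The eigenvector-alignment quantity at the heart of this line of analysis can be illustrated numerically. The sketch below uses a toy Gaussian Gram matrix on made-up data (not the paper's actual experiments) and measures how much of the label vector's energy falls in the top eigendirections:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n points, d features, near-linearly-separable binary labels.
n, d = 200, 10
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0] + 0.1 * rng.standard_normal(n))

# Gaussian (RBF) Gram matrix, standing in for the kernel of the analysis.
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
H = np.exp(-sq_dists / (2.0 * d))

# Decompose H and project the labels onto its eigenbasis.
eigvals, eigvecs = np.linalg.eigh(H)   # ascending eigenvalue order
proj = eigvecs.T @ y                   # coordinates of y in the eigenbasis
top = np.argsort(eigvals)[::-1][:10]
alignment = np.sum(proj[top] ** 2) / np.sum(proj ** 2)
print(f"fraction of label energy in top-10 eigendirections: {alignment:.2f}")
```

A larger alignment value indicates that the labels live in the high-eigenvalue subspace, which in the cited analysis corresponds to faster optimization and tighter generalization bounds.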
Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach
Online decision making plays a crucial role in numerous real-world
applications. In many scenarios, the decision is made based on performing a
sequence of tests on the incoming data points. However, performing all tests
can be expensive and is not always possible. In this paper, we provide a novel
formulation of the online decision making problem based on combinatorial
multi-armed bandits and take the cost of performing tests into account. Based
on this formulation, we provide a new framework for cost-efficient online
decision making which can utilize posterior sampling or BayesUCB for
exploration. We provide a rigorous theoretical analysis for our framework and
present various experimental results that demonstrate its applicability to
real-world problems.
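As a rough illustration of posterior-sampling-based, cost-aware exploration, the following toy sketch uses independent Bernoulli arms with known costs (a far simpler setting than the paper's combinatorial formulation, and all numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(1)

# Each "test" (arm) has an unknown Bernoulli success probability and a
# known execution cost; pulling arm i yields net payoff reward - cost[i].
true_p = np.array([0.8, 0.6, 0.3])
cost = np.array([0.5, 0.1, 0.05])

alpha = np.ones(3)   # Beta posterior: successes + 1
beta = np.ones(3)    # Beta posterior: failures + 1

for t in range(2000):
    theta = rng.beta(alpha, beta)       # posterior sample per arm
    i = int(np.argmax(theta - cost))    # cost-adjusted Thompson choice
    r = float(rng.random() < true_p[i])
    alpha[i] += r
    beta[i] += 1.0 - r

pulls = alpha + beta - 2.0
print("pulls per arm:", pulls.astype(int))
```

Here the middle arm has the best cost-adjusted payoff (0.6 - 0.1 = 0.5), so the posterior-sampling rule concentrates its pulls there over time.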
Do Kernel and Neural Embeddings Help in Training and Generalization?
Recent results on the optimization and generalization properties of neural networks showed that, in a simple two-layer network, the alignment of the labels to the eigenvectors of the corresponding Gram matrix determines the convergence of the optimization during training. Such analyses also provide upper bounds on the generalization error. We experimentally investigate the implications of these results for deeper networks via embeddings. We regard the layers preceding the final hidden layer as producing different representations of the input data, which are then fed to the two-layer model. We show that these representations improve both optimization and generalization. In particular, we investigate three kernel representations when fed to the final hidden layer: the Gaussian kernel and its approximation by random Fourier features, kernels designed to imitate representations produced by neural networks, and finally an optimal kernel designed to align the data with the target labels. The approximated representations induced by these kernels are fed to the neural network, and the optimization and generalization properties of the final model are evaluated and compared.
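One of the representations mentioned, the random Fourier feature approximation to the Gaussian kernel, is a standard construction and can be sketched in a few lines (shown here on made-up data, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def rff(X, num_features, gamma, rng):
    """Random Fourier features approximating k(x, y) = exp(-gamma ||x - y||^2)."""
    d = X.shape[1]
    # Frequencies sampled from the kernel's spectral density N(0, 2*gamma*I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

X = rng.standard_normal((100, 5))
gamma = 0.5
Z = rff(X, 2000, gamma, rng)

# Compare the exact Gaussian kernel with its low-rank RFF approximation.
sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
K_exact = np.exp(-gamma * sq)
K_approx = Z @ Z.T
err = np.abs(K_exact - K_approx).max()
print(f"max absolute kernel error with 2000 features: {err:.3f}")
```

The explicit feature map `Z` can then be fed to a network's final layers in place of the implicit kernel, which is the kind of substitution the experiments above evaluate.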
Stochastic Proximal Algorithms with SON Regularization: Towards Efficient Optimal Transport for Domain Adaptation
We propose a new regularizer for optimal transport (OT) which is tailored to
better preserve the class structure of the subjected process. Accordingly, we
provide the first theoretical guarantees for an OT scheme that respects class
structure. We derive an accelerated proximal algorithm with a closed form
projection and proximal operator scheme thereby affording a highly scalable
algorithm for computing optimal transport plans. We provide a novel argument
for the uniqueness of the optimum even in the absence of strong convexity. Our
experiments show that the new regularizer not only yields better preservation
of the class structure but also provides additional robustness relative to
previous regularizers.
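The paper's SON-regularized proximal scheme is more involved, but the object it computes, a regularized optimal transport plan matching given marginals, can be illustrated with the standard entropy-regularized Sinkhorn iteration (a different, classical regularizer, used here purely for illustration on made-up marginals and costs):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.2, iters=2000):
    """Entropy-regularized OT plan via Sinkhorn fixed-point iterations."""
    K = np.exp(-C / eps)          # Gibbs kernel from the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(iters):
        v = b / (K.T @ u)         # scale columns to match target marginal
        u = a / (K @ v)           # scale rows to match source marginal
    return u[:, None] * K * v[None, :]

# Toy marginals and pairwise transport costs (made up for illustration).
a = np.array([0.5, 0.5])
b = np.array([0.25, 0.25, 0.5])
C = np.array([[0.0, 1.0, 2.0],
              [2.0, 1.0, 0.0]])

P = sinkhorn(a, b, C)
cost = (P * C).sum()
print("transport plan:\n", np.round(P, 3))
print("regularized transport cost:", round(cost, 3))
```

The resulting plan is a coupling of `a` and `b` that concentrates mass on cheap pairings; swapping the entropy term for a SON-type regularizer, as the paper does, instead encourages the plan to respect class structure.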
Efficient Online Decision Tree Learning with Active Feature Acquisition
Constructing decision trees online is a classical machine learning problem.
Existing works often assume that features are readily available for each
incoming data point. However, in many real world applications, both feature
values and the labels are unknown a priori and can only be obtained at a cost.
For example, in medical diagnosis, doctors have to choose which tests to
perform (i.e., making costly feature queries) on a patient in order to make a
diagnosis decision (i.e., predicting labels). We provide a fresh perspective to
tackle this practical challenge. Our framework consists of an active planning
oracle embedded in an online learning scheme for which we investigate several
information acquisition functions. Specifically, we employ a surrogate
information acquisition function based on adaptive submodularity to actively
query feature values with a minimal cost, while using a posterior sampling
scheme to maintain a low regret for online prediction. We demonstrate the
efficiency and effectiveness of our framework via extensive experiments on
various real-world datasets. Our framework also naturally adapts to the
challenging setting of online learning with concept drift and is shown to be
competitive with baseline models while being more flexible.
Analysis of Knowledge Transfer in Kernel Regime
Knowledge transfer has been shown to be a very successful technique for training neural classifiers: together with the ground-truth data, it uses the "privileged information" (PI) obtained by a "teacher" network to train a "student" network. It has been observed that classifiers learn much faster and more reliably via knowledge transfer. However, there has been little or no theoretical analysis of this phenomenon. To bridge this gap, we propose to approach the problem of knowledge transfer by regularizing the fit between the teacher and the student with the PI provided by the teacher. Using tools from dynamical systems theory, we show that when the student is an extremely wide two-layer network, we can analyze it in the kernel regime and show that it is able to interpolate between the PI and the given data. This characterization sheds new light on the relation between the training error and the capacity of the student relative to the teacher. Another contribution of the paper is a quantitative statement on the convergence of the student network. We prove that the teacher reduces the number of iterations required for a student to learn, and consequently improves the generalization power of the student. We give corresponding experimental analysis that validates the theoretical results and yields additional insights.
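The basic mechanism analyzed here, a student fitting the teacher's soft outputs rather than hard labels, can be sketched as follows (a linear student and a made-up linear teacher, deliberately far simpler than the wide two-layer setting studied in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical teacher: a fixed linear map producing softened class
# probabilities, playing the role of the privileged information (PI).
n, d, k = 300, 8, 3
X = rng.standard_normal((n, d))
W_teacher = rng.standard_normal((d, k))
teacher_probs = softmax(X @ W_teacher / 2.0)   # temperature-softened outputs

# Student: linear model trained by gradient descent on cross-entropy
# against the teacher's soft labels instead of hard ground truth.
W = np.zeros((d, k))
lr = 0.5
for _ in range(400):
    P = softmax(X @ W)
    grad = X.T @ (P - teacher_probs) / n      # gradient of soft-label CE
    W -= lr * grad

agreement = np.mean(np.argmax(X @ W, axis=1)
                    == np.argmax(teacher_probs, axis=1))
print(f"student-teacher top-1 agreement: {agreement:.2f}")
```

Because the soft targets carry the teacher's full output distribution rather than a single class index, the student's gradients are informative on every example, which is one intuition behind the faster convergence the paper quantifies.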
On the Unreasonable Effectiveness of Knowledge Distillation: Analysis in the Kernel Regime
Knowledge distillation (KD), i.e. one classifier being trained on the outputs
of another classifier, is an empirically very successful technique for
knowledge transfer between classifiers. It has even been observed that
classifiers learn much faster and more reliably if trained with the outputs of
another classifier as soft labels, instead of from ground truth data. However,
there has been little or no theoretical analysis of this phenomenon. We provide
the first theoretical analysis of KD in the setting of extremely wide two-layer
non-linear networks, in the model and regime of (Arora et al., 2019; Du & Hu,
2019; Cao & Gu, 2019). We prove results on what the student network learns and on the
rate of convergence for the student network. Intriguingly, we also confirm the
lottery ticket hypothesis (Frankle & Carbin, 2019) in this model. To prove our
results, we extend the repertoire of techniques from linear systems dynamics.
We give corresponding experimental analysis that validates the theoretical
results and yields additional insights.
Prophylactic Role of Lactobacillus paracasei Exopolysaccharides on Colon Cancer Cells through Apoptosis Not Ferroptosis
Background: Despite conventional methods of colon cancer treatment, targeting vital molecular pathways and inducing various forms of cell death with safe probiotic components such as exopolysaccharides (EPSs) are of great importance, and such components are considered potential therapeutic agents. This study aimed to investigate the inhibitory effect of the EPS of L. paracasei on different colon cancer cell lines (SW-480, HT-29, and HCT-116). Methods: Several cellular and molecular experiments were performed, including the MTS assay, DAPI staining, the Annexin V/PI assay, quantitative real-time PCR (qPCR), and several important ferroptosis-related assays. Results: Based on the findings, L. paracasei EPS induced apoptosis, as confirmed by all apoptosis-related assays, and did not act through ferroptosis pathways. L. paracasei EPS hindered the Akt1, mTOR, and Jak-1 mRNAs, and induced apoptosis through down-regulation of the anti-apoptotic gene Bcl-2 and up-regulation of pro-apoptotic genes (BAX, caspase-3 and -8). Conclusion: The EPS of an indigenous probiotic strain, with anticancer potential and low/insignificant cytotoxicity to normal cells, is proposed for future applications in molecularly targeted therapy of colon cancer. Furthermore, in vivo studies and clinical trials should be performed to evaluate the applicability of this component alongside conventional methods to increase the survival rate of colon cancer patients.