A Novel Sequential Coreset Method for Gradient Descent Algorithms
A wide range of optimization problems arising in machine learning can be
solved by gradient descent algorithms, and a central question in this area is
how to efficiently compress a large-scale dataset so as to reduce the
computational complexity. {\em Coreset} is a popular data compression technique
that has been extensively studied. However, most existing coreset methods are
problem-dependent and cannot be used as a general tool for a broader range of
applications. A key obstacle is that they often rely on the pseudo-dimension
and total sensitivity bound, which can be very high or hard to obtain. In this
paper, based on the ``locality'' property of gradient descent algorithms, we
propose a new framework, termed ``sequential coreset'', which effectively
avoids these obstacles. Moreover, our method is particularly suitable for
sparse optimization, where the coreset size can be further reduced to be only
poly-logarithmically dependent on the dimension. In practice, the experimental
results suggest that our method can save a large amount of running time
compared with the baseline algorithms.
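The abstract does not spell out the construction, so the following is only a minimal, generic sketch of the underlying coreset idea: sample points with probability proportional to a crude importance score and reweight them so that gradient descent on the small weighted sample approximates training on the full data. It is not the paper's sequential coreset framework; the least-squares objective, the `lsq_scores` heuristic, and all constants are illustrative assumptions.

```python
import numpy as np

def sensitivity_sample(X, y, score_fn, m, seed=0):
    """Importance-sample a weighted coreset of size m.

    Point i is drawn with probability p_i proportional to its importance
    score and receives weight 1 / (m * p_i), so the weighted coreset loss
    is an unbiased estimator of the full-data loss.
    """
    rng = np.random.default_rng(seed)
    scores = score_fn(X, y)
    p = scores / scores.sum()
    idx = rng.choice(len(X), size=m, p=p)
    w = 1.0 / (m * p[idx])
    return X[idx], y[idx], w

def lsq_scores(X, y, lam=1.0):
    """Crude importance scores: squared residuals at a ridge warm start."""
    beta0 = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return (X @ beta0 - y) ** 2 + 1e-8   # epsilon keeps every point sampleable

rng = np.random.default_rng(1)
X = rng.normal(size=(100_000, 20))
y = X @ rng.normal(size=20) + 0.1 * rng.normal(size=100_000)

Xc, yc, w = sensitivity_sample(X, y, lsq_scores, m=2_000)

# Plain gradient descent on the weighted coreset instead of the full data set.
beta = np.zeros(20)
for _ in range(200):
    grad = Xc.T @ (w * (Xc @ beta - yc)) / w.sum()
    beta -= 0.5 * grad
```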
Coresets for Relational Data and The Applications
A coreset is a small set that can approximately preserve the structure of the
original input data set. Therefore we can run our algorithm on a coreset so as
to reduce the total computational complexity. Conventional coreset techniques
assume that the input data set is available to process explicitly. However,
this assumption may not hold in real-world scenarios. In this paper, we
consider the problem of coreset construction over relational data. Namely, the
data is decoupled into several relational tables, and it could be very
expensive to directly materialize the data matrix by joining the tables. We
propose a novel approach called ``aggregation tree with pseudo-cube'' that can
build a coreset from the bottom up. Moreover, our approach can neatly circumvent
several troublesome issues of relational learning problems [Khamis et al., PODS
2019]. Under some mild assumptions, we show that our coreset approach can be
applied to machine learning tasks such as clustering, logistic regression,
and SVM.
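To make the motivation concrete, here is a toy sketch on hypothetical tables showing why materializing the join can be expensive and why some quantities factorize over the base tables. It only illustrates the cost argument; it is not the paper's aggregation tree with pseudo-cube construction.

```python
import pandas as pd

# Hypothetical schema: two base tables share a join key; the learning task's
# design matrix is their join.  With k matching rows per key on each side,
# the join has k * k rows per key, so materializing it can be far larger
# than both inputs combined.
users  = pd.DataFrame({"region": ["A"] * 1000, "age":   list(range(1000))})
orders = pd.DataFrame({"region": ["A"] * 1000, "price": list(range(1000))})

joined = users.merge(orders, on="region")       # 1000 * 1000 = 1,000,000 rows
print(len(users), len(orders), len(joined))     # 1000 1000 1000000

# Simple aggregates over the join factorize over the base tables, so they can
# be computed without ever materializing the join:
n_users   = users.groupby("region").size()
n_orders  = orders.groupby("region").size()
sum_age   = users.groupby("region")["age"].sum()
sum_price = orders.groupby("region")["price"].sum()
sum_over_join = (sum_age * n_orders + sum_price * n_users).sum()
assert sum_over_join == (joined["age"] + joined["price"]).sum()
```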
A Comparison between Deep Neural Nets and Kernel Acoustic Models for Speech Recognition
We study large-scale kernel methods for acoustic modeling and compare to DNNs
on performance metrics related to both acoustic modeling and recognition.
Measuring perplexity and frame-level classification accuracy, kernel-based
acoustic models are as effective as their DNN counterparts. However, on token
error rates, DNN models can be significantly better. We have discovered that
this might be attributed to the DNN's unique strength in reducing both the
perplexity and the entropy of the predicted posterior probabilities. Motivated
by our findings, we propose a new technique, entropy regularized perplexity,
for model selection. This technique can noticeably improve the recognition
performance of both types of models and reduce the gap between them. While
effective on Broadcast News, this technique could also be applicable to other
tasks.
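The abstract names the two quantities involved (perplexity and the entropy of the predicted posteriors) but not the exact form of the criterion, so the combined score below is only an assumed, illustrative way of trading them off; `lam` and the multiplicative form are not from the paper.

```python
import numpy as np

def frame_perplexity(post, labels):
    """exp of the average negative log-probability assigned to the true labels."""
    p_true = post[np.arange(len(labels)), labels]
    return float(np.exp(-np.mean(np.log(p_true + 1e-12))))

def mean_entropy(post):
    """Average entropy (in nats) of the predicted posterior distributions."""
    return float(-np.mean(np.sum(post * np.log(post + 1e-12), axis=1)))

def entropy_regularized_perplexity(post, labels, lam=0.5):
    # Assumed combination: inflate perplexity by the posteriors' entropy so that
    # models which are both accurate and confident score best (lower is better).
    return frame_perplexity(post, labels) * np.exp(lam * mean_entropy(post))

# Toy usage on random posteriors over 42 acoustic classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 42))
post = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = rng.integers(0, 42, size=1000)
print(entropy_regularized_perplexity(post, labels))
```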
Towards Sustainable Learning: Coresets for Data-efficient Deep Learning
To improve the efficiency and sustainability of learning deep models, we
propose CREST, the first scalable framework with rigorous theoretical
guarantees to identify the most valuable examples for training non-convex
models, particularly deep networks. To guarantee convergence to a stationary
point of a non-convex function, CREST models the non-convex loss as a series of
quadratic functions and extracts a coreset for each quadratic sub-region. In
addition, to ensure faster convergence of stochastic gradient methods such as
(mini-batch) SGD, CREST iteratively extracts multiple mini-batch coresets from
larger random subsets of training data, to ensure nearly-unbiased gradients
with small variances. Finally, to further improve scalability and efficiency,
CREST identifies and excludes examples that have already been learned from the
coreset selection pipeline. Our extensive experiments on several deep networks
trained on vision and NLP datasets, including CIFAR-10, CIFAR-100, TinyImageNet,
and SNLI, confirm that CREST speeds up training deep networks on very large
datasets by 1.7x to 2.5x with minimal loss in performance. By analyzing the
learning difficulty of the subsets selected by CREST, we show that deep models
benefit the most from learning on subsets of increasing difficulty levels.
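For intuition about the mini-batch coreset step, here is a short, generic gradient-matching sketch: from a larger random subset, greedily keep the examples whose average gradient best tracks the subset's full average gradient. This is a common heuristic and not CREST's actual selection rule or its theoretical machinery; the per-example gradients and sizes below are placeholders.

```python
import numpy as np

def minibatch_coreset(grads, k):
    """Greedily pick k examples whose average gradient tracks the average
    gradient of the larger random subset (a generic gradient-matching
    heuristic, not CREST's selection rule)."""
    target = grads.mean(axis=0)
    chosen = []
    residual = target.copy()
    for _ in range(k):
        scores = grads @ residual          # alignment with the remaining residual
        scores[chosen] = -np.inf           # never pick the same example twice
        chosen.append(int(np.argmax(scores)))
        residual = target - grads[chosen].mean(axis=0)
    return chosen

# Usage: per-example gradients (e.g. of the last layer) for a random subset.
rng = np.random.default_rng(0)
subset_grads = rng.normal(size=(512, 10))       # 512 examples, 10-dim gradients
batch = minibatch_coreset(subset_grads, k=32)   # 32-example mini-batch coreset
err = np.linalg.norm(subset_grads[batch].mean(axis=0) - subset_grads.mean(axis=0))
print(err)
```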