8,243 research outputs found
Compact Personalized Models for Neural Machine Translation
We propose and compare methods for gradient-based domain adaptation of
self-attentive neural machine translation models. We demonstrate that a large
proportion of model parameters can be frozen during adaptation with minimal or
no reduction in translation quality by encouraging structured sparsity in the
set of offset tensors during learning via group lasso regularization. We
evaluate this technique for both batch and incremental adaptation across
multiple data sets and language pairs. Our system architecture - combining a
state-of-the-art self-attentive model with compact domain adaptation - provides
high quality personalized machine translation that is both space and time
efficient.Comment: Published at the 2018 Conference on Empirical Methods in Natural
Language Processin
Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes
I argue that data becomes temporarily interesting by itself to some
self-improving, but computationally limited, subjective observer once he learns
to predict or compress the data in a better way, thus making it subjectively
simpler and more beautiful. Curiosity is the desire to create or discover more
non-random, non-arbitrary, regular data that is novel and surprising not in the
traditional sense of Boltzmann and Shannon but in the sense that it allows for
compression progress because its regularity was not yet known. This drive
maximizes interestingness, the first derivative of subjective beauty or
compressibility, that is, the steepness of the learning curve. It motivates
exploring infants, pure mathematicians, composers, artists, dancers, comedians,
yourself, and (since 1990) artificial systems.Comment: 35 pages, 3 figures, based on KES 2008 keynote and ALT 2007 / DS 2007
joint invited lectur
Incremental Learning Using a Grow-and-Prune Paradigm with Efficient Neural Networks
Deep neural networks (DNNs) have become a widely deployed model for numerous
machine learning applications. However, their fixed architecture, substantial
training cost, and significant model redundancy make it difficult to
efficiently update them to accommodate previously unseen data. To solve these
problems, we propose an incremental learning framework based on a
grow-and-prune neural network synthesis paradigm. When new data arrive, the
neural network first grows new connections based on the gradients to increase
the network capacity to accommodate new data. Then, the framework iteratively
prunes away connections based on the magnitude of weights to enhance network
compactness, and hence recover efficiency. Finally, the model rests at a
lightweight DNN that is both ready for inference and suitable for future
grow-and-prune updates. The proposed framework improves accuracy, shrinks
network size, and significantly reduces the additional training cost for
incoming data compared to conventional approaches, such as training from
scratch and network fine-tuning. For the LeNet-300-100 and LeNet-5 neural
network architectures derived for the MNIST dataset, the framework reduces
training cost by up to 64% (63%) and 67% (63%) compared to training from
scratch (network fine-tuning), respectively. For the ResNet-18 architecture
derived for the ImageNet dataset and DeepSpeech2 for the AN4 dataset, the
corresponding training cost reductions against training from scratch (network
fine-tunning) are 64% (60%) and 67% (62%), respectively. Our derived models
contain fewer network parameters but achieve higher accuracy relative to
conventional baselines
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
SIESTA: Efficient Online Continual Learning with Sleep
In supervised continual learning, a deep neural network (DNN) is updated with
an ever-growing data stream. Unlike the offline setting where data is shuffled,
we cannot make any distributional assumptions about the data stream. Ideally,
only one pass through the dataset is needed for computational efficiency.
However, existing methods are inadequate and make many assumptions that cannot
be made for real-world applications, while simultaneously failing to improve
computational efficiency. In this paper, we propose a novel continual learning
method, SIESTA based on wake/sleep framework for training, which is well
aligned to the needs of on-device learning. The major goal of SIESTA is to
advance compute efficient continual learning so that DNNs can be updated
efficiently using far less time and energy. The principal innovations of SIESTA
are: 1) rapid online updates using a rehearsal-free, backpropagation-free, and
data-driven network update rule during its wake phase, and 2) expedited memory
consolidation using a compute-restricted rehearsal policy during its sleep
phase. For memory efficiency, SIESTA adapts latent rehearsal using memory
indexing from REMIND. Compared to REMIND and prior arts, SIESTA is far more
computationally efficient, enabling continual learning on ImageNet-1K in under
2 hours on a single GPU; moreover, in the augmentation-free setting it matches
the performance of the offline learner, a milestone critical to driving
adoption of continual learning in real-world applications.Comment: Accepted to TMLR 202
- …