14 research outputs found
A Hardware-Friendly Algorithm for Scalable Training and Deployment of Dimensionality Reduction Models on FPGA
With the ever-increasing application of machine learning models in domains such
as image classification, speech recognition and synthesis, and health care,
designing efficient hardware for these models has gained considerable
popularity. While the majority of research in this area focuses on efficient
deployment of machine learning models (a.k.a. inference), this work concentrates
on the challenges of training these models in hardware. In particular, this
paper presents a high-performance, scalable, reconfigurable solution for both
training and deployment of different dimensionality reduction models in
hardware by introducing a hardware-friendly algorithm. Compared to
state-of-the-art implementations, our proposed algorithm and its hardware
realization decrease resource consumption by 50% without any degradation in
accuracy.
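The abstract does not specify which algorithm is used, so as a purely illustrative example of what "hardware-friendly" training of a dimensionality reduction model can look like, here is a minimal sketch of Oja's rule for learning a principal component. It is a streaming update built entirely from multiply-accumulate operations, which map naturally onto FPGA DSP blocks; this is an assumption for illustration, not the paper's method.

```python
import numpy as np

def oja_update(w, x, lr=0.01):
    """One streaming Oja's-rule update: w <- w + lr * y * (x - y * w),
    where y = w . x. Only multiply-accumulates are needed, so each
    update is cheap to realize in hardware."""
    y = w @ x
    return w + lr * y * (x - y * w)

rng = np.random.default_rng(0)
# Synthetic 2-D data whose dominant direction is [1, 1] / sqrt(2)
data = rng.normal(size=(2000, 2)) * np.array([3.0, 0.3])
rot = np.array([[1.0, -1.0], [1.0, 1.0]]) / np.sqrt(2)
data = data @ rot.T

w = rng.normal(size=2)
for x in data:            # one pass over the stream, sample by sample
    w = oja_update(w, x)
w /= np.linalg.norm(w)    # final normalization for comparison
```

After a single pass, `w` aligns (up to sign) with the principal direction of the data, without ever materializing a covariance matrix.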
Runtime Deep Model Multiplexing for Reduced Latency and Energy Consumption Inference
We propose a learning algorithm to design a lightweight neural multiplexer
that, given the input and the computational resource requirements, calls the
model that will consume the minimum compute resources for a successful
inference. Mobile devices can use the proposed algorithm to offload hard inputs
to the cloud while inferring easy ones locally. Moreover, in large-scale
cloud-based intelligent applications, instead of replicating the most accurate
model, a range of small and large models can be multiplexed depending on the
input's complexity, which saves the cloud's computational resources. An input's
complexity, or hardness, is determined by the number of models that can predict
the correct label: for example, if no model can predict the label correctly,
the input is considered the hardest. The proposed algorithm allows the mobile
device to detect the inputs that can be processed locally and those that
require a larger model and should be sent to a cloud server. The mobile user
therefore benefits not only from local processing but also from an accurate
model hosted on a cloud server. Our experimental results show that the proposed
algorithm improves the mobile model's accuracy by 8.52%, owing to the inputs
that are properly selected and offloaded to the cloud server. In addition, it
saves the cloud providers' compute resources by a factor of 2.85x, as small
models are chosen for easier inputs.
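The hardness definition above can be sketched directly in code: an input's hardness is the number of candidate models that fail on it, and the dispatcher sends each input to the cheapest model that suffices. The sketch below uses an oracle dispatcher with access to the true label; the paper's actual multiplexer is a learned network that makes this decision without labels, and the toy models here are assumptions for illustration.

```python
def input_hardness(models, x, label):
    """Hardness = number of candidate models that misclassify x.
    0 means every model succeeds; len(models) means none does."""
    return sum(1 for m in models if m(x) != label)

def cheapest_sufficient(models, x, label):
    """Oracle dispatcher: index of the first (cheapest) model that
    predicts correctly, or None if even the largest model fails."""
    for i, m in enumerate(models):
        if m(x) == label:
            return i
    return None

# Toy 'small' and 'large' classifiers for the task "is x >= 10?",
# ordered from cheapest to most expensive.
models = [
    lambda x: x >= 8,    # small model: cheap, but wrong on 8 and 9
    lambda x: x >= 10,   # large model: always correct on this task
]

print(input_hardness(models, 5, False))       # -> 0 (easy: both succeed)
print(cheapest_sufficient(models, 9, False))  # -> 1 (needs the large model)
```

Easy inputs (hardness 0) stay on the small model, which is where the reported cloud-side compute savings come from.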
A Meta-Learning Approach for Custom Model Training
Transfer-learning and meta-learning are two effective methods to apply
knowledge learned from large data sources to new tasks. In few-class, few-shot
target task settings (i.e. when there are only a few classes and training
examples available in the target task), meta-learning approaches that optimize
for future task learning have outperformed the typical transfer approach of
initializing model weights from a pre-trained starting point. But as we
experimentally show, meta-learning algorithms that work well in the few-class
setting do not generalize well in many-shot and many-class cases. In this
paper, we propose a joint training approach that combines both
transfer-learning and meta-learning. Benefiting from the advantages of each,
our method obtains improved generalization performance on unseen target tasks
in both few- and many-class and few- and many-shot scenarios.
Comment: AAAI 201
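The abstract does not give the joint objective, so as one illustrative way to combine a transfer-learning term (fit the shared initialization directly) with a first-order meta-learning term (optimize the loss after one inner adaptation step, MAML-style), here is a minimal NumPy sketch on linear regression tasks. The function names, the weighting `alpha`, and the learning rates are all assumptions, not the paper's formulation.

```python
import numpy as np

def loss(w, X, y):
    return np.mean((X @ w - y) ** 2)

def grad(w, X, y):
    return 2 * X.T @ (X @ w - y) / len(y)

def joint_step(w, tasks, lr=0.1, inner_lr=0.1, alpha=0.5):
    """One joint update combining:
      - a transfer term: loss of w itself on each task (good direct init);
      - a first-order meta term: loss of w after one inner gradient
        step per task (optimizes for fast adaptation).
    alpha trades off the two objectives."""
    g = np.zeros_like(w)
    for X, y in tasks:
        g += alpha * grad(w, X, y)                  # transfer-learning term
        w_adapted = w - inner_lr * grad(w, X, y)    # inner adaptation step
        g += (1 - alpha) * grad(w_adapted, X, y)    # first-order meta term
    return w - lr * g / len(tasks)

# A family of related regression tasks sharing a base weight vector
rng = np.random.default_rng(1)
base = np.array([1.0, -2.0])
tasks = []
for _ in range(5):
    X = rng.normal(size=(20, 2))
    w_true = base + 0.1 * rng.normal(size=2)
    tasks.append((X, X @ w_true))

w = np.zeros(2)
for _ in range(100):
    w = joint_step(w, tasks)
```

Setting `alpha=1` recovers plain multi-task pre-training, while `alpha=0` recovers a first-order meta-learner; the joint objective interpolates between the two.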