Scalable and Sustainable Deep Learning via Randomized Hashing
Current deep learning architectures are growing larger in order to learn from
complex datasets. These architectures require giant matrix multiplication
operations to train millions of parameters. At the same time, there is a
growing trend to bring deep learning to low-power, embedded devices. The matrix
operations associated with both training and testing of deep networks are
very expensive from a computational and energy standpoint. We present a novel
hashing-based technique to drastically reduce the amount of computation needed
to train and test deep networks. Our approach combines recent ideas from
adaptive dropouts and randomized hashing for maximum inner product search to
select the nodes with the highest activation efficiently. Our new algorithm for
deep learning reduces the overall computational cost of forward and
back-propagation by operating on significantly fewer (sparse) nodes. As a
consequence, our algorithm uses only 5% of the total multiplications, while
keeping on average within 1% of the accuracy of the original model. A unique
property of the proposed hashing-based back-propagation is that the updates are
always sparse. Due to the sparse gradient updates, our algorithm is ideally
suited for asynchronous and parallel training, leading to near-linear speedup
as the number of cores increases. We demonstrate the scalability and
sustainability (energy efficiency) of our proposed algorithm via rigorous
experimental evaluations on several real datasets.
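
The node-selection idea in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: it uses plain signed random projections (a cosine-similarity hash) as a stand-in for the maximum inner product search hashing the abstract mentions, and the function names, table counts and bit widths are all hypothetical choices made for illustration.

```python
import numpy as np

def build_hash_tables(weights, num_bits, num_tables, rng):
    """Index every node's weight vector under signed-random-projection codes.

    Assumption: one hash table per set of random hyperplanes; the abstract
    does not specify the table layout.
    """
    dim = weights.shape[1]
    planes = rng.standard_normal((num_tables, num_bits, dim))
    tables = []
    for t in range(num_tables):
        buckets = {}
        codes = (weights @ planes[t].T) > 0  # shape: (nodes, num_bits)
        for node, code in enumerate(codes):
            buckets.setdefault(code.tobytes(), []).append(node)
        tables.append(buckets)
    return planes, tables

def select_active_nodes(x, planes, tables):
    """Return nodes that share a bucket with the input in at least one table."""
    active = set()
    for t, buckets in enumerate(tables):
        code = (planes[t] @ x) > 0
        active.update(buckets.get(code.tobytes(), []))
    return np.array(sorted(active), dtype=int)

def sparse_forward(x, weights, bias, active):
    """Compute ReLU activations for the selected nodes only; others stay zero."""
    out = np.zeros(weights.shape[0])
    out[active] = np.maximum(weights[active] @ x + bias[active], 0.0)
    return out
```

Only the rows of `weights` listed in `active` are multiplied, which is where the savings would come from; because the inactive nodes output zero, a gradient step would touch only those same rows, which is the sparsity property the abstract refers to.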
Reservoir Computing for Learning in Structured Domains
The study of learning models for the direct processing of complex data structures has gained
increasing interest within the Machine Learning (ML) community over the last decades.
In this regard, the efficiency, effectiveness and adaptivity of ML models on large classes
of data structures represent challenging and open research issues.
The paradigm under consideration is Reservoir Computing (RC), a novel and extremely
efficient methodology for modeling Recurrent Neural Networks (RNN) for adaptive
sequence processing. RC comprises a number of different neural models, among which the
Echo State Network (ESN) is probably the most popular and widely studied.
Another research area of interest is represented by Recursive Neural Networks (RecNNs),
constituting a class of neural network models recently proposed for dealing with
hierarchical data structures directly.
In this thesis the RC paradigm is investigated and suitably generalized in order to
approach the problems arising from learning in structured domains. The research studies
described in this thesis cover classes of data structures characterized by increasing
complexity, from sequences to tree and graph structures. Accordingly, the research focus
goes progressively from the analysis of standard ESNs for sequence processing to the
development of new models for tree- and graph-structured domains. The analysis of ESNs
for sequence processing addresses the interesting problem of identifying and
characterizing the relevant factors which influence the reservoir dynamics and the ESN performance.
Promising applications of ESNs in the emerging field of Ambient Assisted Living are also
presented and discussed. Moving towards highly structured data representations, the
ESN model is extended to deal with complex structures directly, resulting in the proposed
TreeESN, which is suitable for domains comprising hierarchical structures, and GraphESN,
which generalizes the approach to a large class of cyclic/acyclic directed/undirected
labeled graphs. TreeESNs and GraphESNs represent both novel RC models for structured
data and extremely efficient approaches for modeling RecNNs, ultimately contributing
to the definition of an RC framework for learning in structured domains. The problem
of adaptively exploiting the state space in GraphESNs is also investigated, with specific
regard to tasks in which input graphs are required to be mapped into flat vectorial outputs,
resulting in the GraphESN-wnn and GraphESN-NG models. As a further point, the
generalization performance of the proposed models is evaluated considering both artificial
and complex real-world tasks from different application domains, including Chemistry,
Toxicology and Document Processing.
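
For readers unfamiliar with the standard ESN for sequences that the thesis starts from, the following is a minimal sketch, not code from the thesis. It assumes a tanh reservoir with fixed random weights rescaled to a chosen spectral radius and a linear readout fitted by ridge regression; the function names and parameter choices are hypothetical, and the tree and graph extensions (TreeESN, GraphESN) are not covered.

```python
import numpy as np

def make_reservoir(n_inputs, n_units, spectral_radius, rng):
    """Draw fixed random input and recurrent weights (never trained)."""
    w_in = rng.uniform(-0.5, 0.5, (n_units, n_inputs))
    w = rng.uniform(-0.5, 0.5, (n_units, n_units))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w

def run_reservoir(inputs, w_in, w):
    """Drive the reservoir with an input sequence and collect its states."""
    state = np.zeros(w.shape[0])
    states = []
    for u in inputs:
        state = np.tanh(w_in @ u + w @ state)
        states.append(state)
    return np.array(states)

def train_readout(states, targets, ridge):
    """Fit the linear readout by ridge regression; this is the only trained part."""
    gram = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(gram, states.T @ targets)

# Usage: one-step-ahead prediction of a sine wave (illustrative values only).
rng = np.random.default_rng(0)
signal = np.sin(0.1 * np.arange(500))
inputs, targets = signal[:-1, None], signal[1:, None]
w_in, w = make_reservoir(n_inputs=1, n_units=50, spectral_radius=0.9, rng=rng)
states = run_reservoir(inputs, w_in, w)
w_out = train_readout(states[50:], targets[50:], ridge=1e-6)  # skip a washout period
predictions = states @ w_out
```

Because only `w_out` is fitted, training reduces to solving one linear system, which is the source of the efficiency the abstract attributes to Reservoir Computing.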