27 research outputs found

    Accelerating recurrent neural network training using sequence bucketing and multi-GPU data parallelization

    An efficient algorithm for recurrent neural network training is presented. The approach increases training speed for tasks where the length of the input sequence may vary significantly. The proposed approach is based on optimal batch bucketing by input sequence length and data parallelization on multiple graphical processing units. The baseline training performance without sequence bucketing is compared with the proposed solution for different numbers of buckets. An example is given for the online handwriting recognition task using an LSTM recurrent neural network. The evaluation is performed in terms of wall clock time, number of epochs, and validation loss value. Comment: 4 pages, 5 figures; 2016 IEEE First International Conference on Data Stream Mining & Processing (DSMP), Lviv, 2016
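The bucketing idea described above can be sketched in a few lines; this is a minimal illustration under assumed names (`bucket_batches` and its parameters are not from the paper): sort sequences by length, split the sorted list into buckets, and batch within each bucket so that per-batch padding stays small.

```python
import random

def bucket_batches(sequences, num_buckets, batch_size):
    """Sort variable-length sequences into length-ordered buckets, then cut
    each bucket into batches so sequences in a batch have similar lengths
    and padding overhead stays small."""
    ordered = sorted(sequences, key=len)
    per_bucket = (len(ordered) + num_buckets - 1) // num_buckets
    buckets = [ordered[i:i + per_bucket]
               for i in range(0, len(ordered), per_bucket)]
    batches = [bucket[i:i + batch_size]
               for bucket in buckets
               for i in range(0, len(bucket), batch_size)]
    random.shuffle(batches)  # shuffle whole batches, not individual sequences
    return batches
```

Shuffling whole batches rather than individual sequences preserves the length homogeneity within each batch while still randomizing the training order.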

    Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory

    The effectiveness of recurrent neural networks is largely influenced by their ability to store in their dynamic memory information extracted from input sequences at different frequencies and timescales. Such a feature can be introduced into a neural architecture through an appropriate modularization of the dynamic memory. In this paper we propose a novel incrementally trained recurrent architecture that explicitly targets multi-scale learning. First, we show how to extend the architecture of a simple RNN by separating its hidden state into different modules, each subsampling the network's hidden activations at a different frequency. Then, we describe a training algorithm in which new modules are iteratively added to the model to learn progressively longer dependencies. Each new module works at a slower frequency than the previous ones and is initialized to encode the subsampled sequence of hidden activations. Experimental results on synthetic and real-world speech recognition and handwritten character datasets show that the modular architecture and the incremental training algorithm improve the ability of recurrent neural networks to capture long-term dependencies. Comment: accepted @ ECML 2020. arXiv admin note: substantial text overlap with arXiv:2001.1177
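The subsampling idea can be illustrated with a toy sketch (the scalar tanh update and all names here are my own simplifications, not the paper's architecture): each module updates its state only every k-th step and otherwise holds it, so slower modules see the sequence at a coarser timescale.

```python
import math

def multiscale_states(inputs, frequencies):
    """Toy sketch: one scalar recurrent state per frequency; module m updates
    only every frequencies[m]-th step and otherwise holds its value, so
    slower modules subsample the sequence at coarser timescales."""
    states = [0.0] * len(frequencies)
    updates = [0] * len(frequencies)
    for t, x in enumerate(inputs):
        for m, k in enumerate(frequencies):
            if t % k == 0:  # module m integrates only at its own frequency
                states[m] = math.tanh(0.5 * states[m] + 0.5 * x)
                updates[m] += 1
    return states, updates
```

On a sequence of length 8 with frequencies [1, 4], the fast module performs eight updates while the slow one performs only two, mirroring the frequency separation described in the abstract.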

    A hypothesize-and-verify framework for Text Recognition using Deep Recurrent Neural Networks

    Deep LSTMs are ideal candidates for text recognition. However, text recognition involves initial image-processing steps, such as segmentation of lines and words, which can introduce errors into the recognition system. Without segmentation, learning very long-range context is difficult and becomes computationally intractable; therefore, alternative soft decisions are needed at the pre-processing level. This paper proposes a hybrid text recognizer using a deep recurrent neural network with multiple layers of abstraction and long-range context, along with a language model to verify the output of the deep network. We construct a multi-hypothesis tree whose branches hold candidate segmentations of line sequences produced by different segmentation algorithms. The deep neural network is trained on perfectly segmented data and tests each candidate segment, generating Unicode sequences. In the verification step, these Unicode sequences are validated by sub-string matching against the language model, and best-first search finds the best combination of alternative hypotheses in the tree. The verification framework using language models thus eliminates wrong segmentation outputs and filters recognition errors.
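The verification step can be illustrated with a small best-first (uniform-cost) search over per-position alternatives; the cost function standing in for the language model below is a placeholder assumption, not the paper's actual scoring.

```python
import heapq

def best_first_combine(candidates, cost):
    """Best-first search over a hypothesis tree: candidates[i] lists the
    alternative transcriptions for position i (one per segmentation
    algorithm); the path with the lowest total language-model cost wins."""
    heap = [(0.0, 0, "")]  # (accumulated cost, position, text so far)
    while heap:
        c, pos, text = heapq.heappop(heap)
        if pos == len(candidates):
            return text, c  # first completed state popped is the cheapest
        for alt in candidates[pos]:
            heapq.heappush(heap, (c + cost(alt), pos + 1, text + alt))
    return "", float("inf")
```

Because all costs are non-negative, the first completed hypothesis popped from the heap is guaranteed to be the minimum-cost combination.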

    Implementation of Long Short-Term Memory (LSTM) for Rainfall Intensity Prediction (Case Study: Malang Regency)

    Rainfall is a natural phenomenon regarded as one of the most important factors for people seeking to raise their productivity across many business sectors. It strongly affects optimal decision-making in many aspects of life; one example is human activity in agriculture. Rainfall is difficult to predict because the weather is erratic: areas that look clear can shortly afterwards see rain or even storms. Malang Regency has a tropical climate and abundant natural resources in its agriculture and plantation sectors. Several factors influence productivity in these sectors, one of which is rainfall, so predicting rainfall serves to improve the productivity and mobility of human activities. This study addresses rainfall prediction in Malang Regency using the Long Short-Term Memory (LSTM) method. The results show that the LSTM model achieves its best performance with the selected parameters, where the error metrics used in this study, RMSE and MAE, reach their smallest values of 0.98162 and 0.68847, respectively. The smaller the error value, the more accurate the model's predictions.
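The two error metrics reported above are standard; for reference, a minimal computation of both:

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large deviations quadratically."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

RMSE is always at least as large as MAE on the same predictions, which is consistent with the reported 0.98162 (RMSE) versus 0.68847 (MAE).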

    DeepCare: A Deep Dynamic Memory Model for Predictive Medicine

    Personalized predictive medicine necessitates modeling of patient illness and care processes, which inherently have long-term temporal dependencies. Healthcare observations, recorded in electronic medical records, are episodic and irregular in time. We introduce DeepCare, an end-to-end deep dynamic neural network that reads medical records, stores previous illness history, infers current illness states and predicts future medical outcomes. At the data level, DeepCare represents care episodes as vectors and models patient health state trajectories through explicit memory of historical records. Built on Long Short-Term Memory (LSTM), DeepCare introduces time parameterizations to handle irregularly timed events by moderating the forgetting and consolidation of memory cells. DeepCare also incorporates medical interventions that change the course of illness and shape future medical risk. Moving up to the health state level, historical and present health states are aggregated through multiscale temporal pooling, before passing through a neural network that estimates future outcomes. We demonstrate the efficacy of DeepCare for disease progression modeling, intervention recommendation, and future risk prediction. On two important cohorts with heavy social and economic burden -- diabetes and mental health -- the results show improved modeling and risk prediction accuracy. Comment: Accepted at JBI under the new name: "Predicting healthcare trajectories from medical records: A deep learning approach"
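The time-parameterized forgetting can be illustrated by scaling an LSTM forget gate with a decay in the elapsed time between events; the half-life form below is an assumed stand-in for illustration, not DeepCare's exact parameterization.

```python
def decayed_forget(forget_gate, elapsed_days, half_life=30.0):
    """Toy sketch of time-aware forgetting: scale the forget gate by a
    monotone decay in the time elapsed since the previous event, so the
    cell's memory fades faster after long gaps between admissions."""
    decay = 0.5 ** (elapsed_days / half_life)
    return forget_gate * decay
```

With this form, memory retention after a 30-day gap is half of what it would be for back-to-back events, capturing the intuition that observations separated by long irregular gaps should influence the current state less.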