Deep Gated Recurrent Unit for Smartphone-Based Image Captioning
Expressing the visual content of an image in natural language has gained relevance thanks to technological and algorithmic advances and improved computational processing capacity. Many smartphone applications for image captioning have been developed recently because built-in cameras are easy to operate and portable, so an image can be captured whenever and wherever needed. Here, a new image captioning approach based on an encoder-decoder framework with a multi-layer gated recurrent unit (GRU) is proposed. The Inception-v3 convolutional neural network is employed in the encoder because it can extract more features from small image regions. The proposed recurrent neural network-based decoder feeds these features into the multi-layer GRU to produce a natural language description word by word. Experimental evaluations on the MSCOCO dataset demonstrate that the proposed approach consistently outperforms existing approaches across different evaluation metrics. Integrated into our custom-designed Android application, "VirtualEye+", the proposed approach has great potential for bringing image captioning into daily routines.
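The abstract describes the decoder only as a multi-layer GRU driven by Inception-v3 features. The sketch below is a minimal NumPy illustration of that general idea, not the authors' implementation. The gate equations follow the common GRU formulation, while the class names (`GRULayer`, `StackedGRUDecoder`), the dimensions, the random untrained weights, and the choice to feed the projected image feature once before the start token are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRULayer:
    """One GRU layer; each gate acts on the concatenation [x; h]."""

    def __init__(self, input_dim, hidden_dim, rng, scale=0.1):
        in_dim = input_dim + hidden_dim
        self.W_z = rng.normal(0.0, scale, (hidden_dim, in_dim))  # update gate
        self.W_r = rng.normal(0.0, scale, (hidden_dim, in_dim))  # reset gate
        self.W_h = rng.normal(0.0, scale, (hidden_dim, in_dim))  # candidate state
        self.b_z = np.zeros(hidden_dim)
        self.b_r = np.zeros(hidden_dim)
        self.b_h = np.zeros(hidden_dim)

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.W_z @ xh + self.b_z)
        r = sigmoid(self.W_r @ xh + self.b_r)
        # The reset gate scales the previous state before it enters the candidate.
        h_tilde = np.tanh(self.W_h @ np.concatenate([x, r * h]) + self.b_h)
        # Interpolate between the old state and the candidate.
        return (1.0 - z) * h + z * h_tilde


class StackedGRUDecoder:
    """Hypothetical multi-layer GRU caption decoder with random, untrained weights."""

    def __init__(self, feat_dim, embed_dim, hidden_dim, vocab_size, num_layers, seed=0):
        rng = np.random.default_rng(seed)
        self.embed = rng.normal(0.0, 0.1, (vocab_size, embed_dim))   # word embeddings
        self.img_proj = rng.normal(0.0, 0.1, (embed_dim, feat_dim))  # CNN feature -> embedding space
        in_dims = [embed_dim] + [hidden_dim] * (num_layers - 1)
        self.layers = [GRULayer(d, hidden_dim, rng) for d in in_dims]
        self.W_out = rng.normal(0.0, 0.1, (vocab_size, hidden_dim))  # hidden -> vocabulary logits
        self.hidden_dim = hidden_dim

    def _step(self, x, states):
        # Layer l's new hidden state is the input to layer l + 1.
        new_states = []
        for layer, h in zip(self.layers, states):
            x = layer.step(x, h)
            new_states.append(x)
        return x, new_states

    def greedy_decode(self, image_feat, start_id, end_id, max_len=10):
        states = [np.zeros(self.hidden_dim) for _ in self.layers]
        # Assumption: the projected image feature is fed once before the start token.
        _, states = self._step(self.img_proj @ image_feat, states)
        token, caption = start_id, []
        for _ in range(max_len):
            top, states = self._step(self.embed[token], states)
            token = int(np.argmax(self.W_out @ top))  # pick the most likely next word
            if token == end_id:
                break
            caption.append(token)
        return caption


# 2048 matches the size of Inception-v3's pooled feature vector; here it is random noise.
decoder = StackedGRUDecoder(feat_dim=2048, embed_dim=16, hidden_dim=32, vocab_size=50, num_layers=3)
feature = np.random.default_rng(1).normal(size=2048)
ids = decoder.greedy_decode(feature, start_id=1, end_id=2, max_len=8)
print(ids)
```

With untrained weights the token IDs are arbitrary; the point is the data flow, in which each layer's new hidden state becomes the next layer's input and the top layer's state is mapped to vocabulary logits at every decoding step.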
An Improved Bees Algorithm for Training Deep Recurrent Networks for Sentiment Classification
Recurrent neural networks (RNNs) are powerful tools for learning information from temporal sequences. Designing an optimal deep RNN is difficult because of configuration and training issues such as vanishing and exploding gradients. In this paper, a novel metaheuristic optimisation approach is proposed for training deep RNNs for the sentiment classification task. The approach employs an enhanced Ternary Bees Algorithm (BA-3+), which tackles large-dataset classification problems by considering only three individual solutions in each iteration. BA-3+ combines the collaborative search of three bees to find the optimal set of trainable parameters of the proposed deep recurrent learning architecture. Local learning with exploitative search uses a greedy selection strategy. Stochastic gradient descent (SGD) learning with singular value decomposition (SVD) uses the SVD stabilisation strategy to handle vanishing and exploding gradients of the decision parameters. Global learning with explorative search achieves faster convergence without becoming trapped in local optima. BA-3+ has been tested on the sentiment classification task using datasets with symmetric and asymmetric data distributions from different domains, including Twitter, product reviews, and movie reviews. Its results were compared with those of advanced deep language models and of the Differential Evolution (DE) and Particle Swarm Optimization (PSO) algorithms. BA-3+ converged to the global minimum faster than the DE and PSO algorithms, and it outperformed the SGD, DE, and PSO algorithms on the Turkish and English datasets. Accuracy and F1 measure improved by at least 30–40% over the standard SGD algorithm for all classification datasets. Accuracy rates of the RNN model trained with BA-3+ ranged from 80% to 90%, while the RNN trained with SGD achieved between 50% and 60% on most datasets. The RNN model trained with BA-3+ performed as well as the Tree-LSTM and Recursive Neural Tensor Network (RNTN) language models, which achieved accuracies of up to 90% on some datasets. The improved accuracy and convergence results show that BA-3+ is an efficient, stable algorithm for this complex classification task and that it can handle the vanishing and exploding gradients problem of deep RNNs.
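The abstract names the three search roles in BA-3+ (local exploitative search with greedy selection, SGD with SVD stabilisation, and global explorative search) but not their exact update rules. The following is a hypothetical toy sketch of one way such a three-bee loop could be organised. It assumes a least-squares objective in place of an RNN loss, uniform-patch local search, singular-value clipping as the SVD stabilisation step, and Gaussian sampling for global exploration; none of these choices, the function names, or the parameter values come from the paper.

```python
import numpy as np


def loss(W, X, Y):
    """Mean squared error of a linear model; stands in for the RNN's training loss."""
    return float(np.mean((X @ W - Y) ** 2))


def grad(W, X, Y):
    """Gradient of loss() with respect to W."""
    return 2.0 * X.T @ (X @ W - Y) / Y.size


def svd_stabilise(W, max_sv):
    """Clip the singular values of W so that its spectral norm is at most max_sv."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.minimum(s, max_sv)) @ Vt


def ternary_bees(X, Y, shape, iters=200, patch=0.1, lr=0.1, batch=20, max_sv=2.0, seed=0):
    rng = np.random.default_rng(seed)
    best = rng.normal(0.0, 0.5, shape)
    best_loss = loss(best, X, Y)
    history = [best_loss]
    for _ in range(iters):
        # Bee 1 (local, exploitative): random point in a small patch around the best.
        local = best + rng.uniform(-patch, patch, shape)
        # Bee 2 (SGD + SVD): one mini-batch gradient step, then singular-value clipping.
        idx = rng.choice(X.shape[0], size=batch, replace=False)
        sgd = svd_stabilise(best - lr * grad(best, X[idx], Y[idx]), max_sv)
        # Bee 3 (global, explorative): fresh random sample from the whole search space.
        explore = rng.normal(0.0, 1.0, shape)
        # Greedy selection: keep a candidate only if it lowers the full-data loss.
        for cand in (local, sgd, explore):
            cand_loss = loss(cand, X, Y)
            if cand_loss < best_loss:
                best, best_loss = cand, cand_loss
        history.append(best_loss)
    return best, history


rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
W_true = rng.normal(0.0, 0.3, (5, 3))
Y = X @ W_true
W_best, history = ternary_bees(X, Y, shape=(5, 3))
print(f"loss: {history[0]:.4f} -> {history[-1]:.4f}")
```

Because every candidate must beat the current best on the full-data loss, the recorded loss never increases; this greedy acceptance lets a noisy mini-batch step or a random global sample be tried without risk of degrading the current solution.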
New ideas and trends in deep multimodal content understanding: a review
The focus of this survey is on the analysis of two modalities of multimodal deep learning: image and text. Unlike classic reviews of deep learning, in which monomodal image classifiers such as VGG, ResNet and the Inception module are central topics, this paper examines recent multimodal deep models and structures, including auto-encoders, generative adversarial nets and their variants. These models go beyond simple image classifiers in that they can perform uni-directional (e.g. image captioning, image generation) and bi-directional (e.g. cross-modal retrieval, visual question answering) multimodal tasks. We also analyze two aspects of the challenge of achieving better content understanding in deep multimodal applications. We then introduce current ideas and trends in deep multimodal feature learning, such as feature embedding approaches and objective function design, which are crucial in overcoming these challenges. Finally, we outline several promising directions for future research.
Computer Systems, Imagery and Medi