58 research outputs found
Generating Diverse and Meaningful Captions: Unsupervised Specificity Optimization for Image Captioning
Image Captioning is a task that requires models to acquire a multi-modal understanding of the world and to express this understanding in natural language text. While the state-of-the-art for this task has rapidly improved in terms of n-gram metrics, these models tend to output the same generic captions for similar images. In this work, we address this limitation and train a model that generates more diverse and specific captions through an unsupervised training approach that incorporates a learning signal from an Image Retrieval model. We summarize previous results and improve the state-of-the-art on caption diversity and novelty.
We make our source code publicly available online: https://github.com/AnnikaLindh/Diverse_and_Specific_Image_Captionin
Type-2 diabetes mellitus diagnosis from time series clinical data using deep learning models.
Clinical data is usually observed and recorded at irregular intervals and includes evaluations, treatments, vital signs and lab test results. These provide an invaluable source of information for diagnosing and understanding medical conditions. In this work, we introduce the largest patient-records dataset in diabetes research: King Abdullah International Research Centre Diabetes (KAIMRCD), which includes data from over 14k patients. KAIMRCD contains detailed information about each patient's visits and has been labelled for T2DM by clinicians. The data is processed as time series and then investigated using temporal predictive deep learning models with the goal of diagnosing Type 2 Diabetes Mellitus (T2DM). Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models are trained on KAIMRCD and are demonstrated here to outperform classical machine learning approaches from the literature, with over 97% accuracy.
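The abstract notes that the irregularly sampled records are processed as time series before training. One common preprocessing step for such data (a sketch under our own assumptions; the paper's exact pipeline is not specified here) is to resample each patient's observations onto a regular time grid, carrying the last observed value forward:

```python
# Resample irregularly timed observations onto a regular grid by
# carrying the last observed value forward (forward-fill).
# Illustrative assumption only -- not the paper's documented pipeline.

def forward_fill_grid(observations, start, stop, step):
    """observations: list of (timestamp, value) pairs sorted by timestamp.
    Returns values sampled at start, start+step, ..., up to (but not
    including) stop; None where nothing has been observed yet."""
    grid = []
    last_value = None
    idx = 0
    t = start
    while t < stop:
        # Consume every observation that happened at or before time t.
        while idx < len(observations) and observations[idx][0] <= t:
            last_value = observations[idx][1]
            idx += 1
        grid.append(last_value)
        t += step
    return grid

# Example: lab readings taken at irregular hours, resampled every 6 hours.
readings = [(1, 5.4), (7, 6.1), (20, 7.3)]
print(forward_fill_grid(readings, start=0, stop=24, step=6))
# → [None, 5.4, 6.1, 6.1]
```

The resulting fixed-step sequences can then be fed to a recurrent model such as an LSTM or GRU.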
From Pointwise to Powerhouse: Initialising Neural Networks with Generative Models
Traditional initialisation methods, e.g. He and Xavier, have been effective
in avoiding the problem of vanishing or exploding gradients in neural networks.
However, they only use simple pointwise distributions, which model
one-dimensional variables. Moreover, they ignore most information about the
architecture and disregard past training experiences. These limitations can be
overcome by employing generative models for initialisation. In this paper, we
introduce two groups of new initialisation methods. First, we locally
initialise weight groups by employing variational autoencoders. Secondly, we
globally initialise full weight sets by employing graph hypernetworks. We
thoroughly evaluate the impact of the employed generative models on
state-of-the-art neural networks in terms of accuracy, convergence speed and
ensembling. Our results show that global initialisations result in higher
accuracy and faster initial convergence speed. However, the implementation
through graph hypernetworks leads to diminished ensemble performance on
out-of-distribution data. To counteract this, we propose a modification called
the noise graph hypernetwork, which encourages diversity among the produced
ensemble members. Furthermore, our approach might be able to transfer learned
knowledge to different image distributions. Our work provides insights into the
potential, the trade-offs and possible modifications of these new
initialisation methods.
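For context on what "pointwise" means here: a standard scheme such as He initialisation draws every weight independently from a single one-dimensional Gaussian whose scale depends only on the layer's fan-in, ignoring the rest of the architecture. A minimal sketch:

```python
import math
import random

def he_init(fan_in, fan_out, rng=random.Random(0)):
    """He (Kaiming) initialisation: every weight drawn independently from
    N(0, 2 / fan_in) -- a one-dimensional 'pointwise' distribution that
    ignores the wider architecture and any past training experience."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]

W = he_init(fan_in=256, fan_out=128)

# Empirical std of the samples should be close to sqrt(2/256) ≈ 0.088.
flat = [w for row in W for w in row]
mean = sum(flat) / len(flat)
std = math.sqrt(sum((w - mean) ** 2 for w in flat) / len(flat))
print(round(std, 3))
```

The generative-model initialisers described in the abstract replace this single per-weight distribution with a learned model over whole weight groups (VAEs) or full weight sets (graph hypernetworks).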
Representation Learning with Fine-grained Patterns
With the development of computational power and techniques for data
collection, deep learning demonstrates superior performance over most existing
algorithms on benchmark data sets. Many efforts have been devoted to studying
the mechanism of deep learning. One important observation is that deep learning
can learn discriminative patterns from raw data directly in a task-dependent
manner. Therefore, the representations obtained by deep learning significantly
outperform hand-crafted features. However, those patterns are often learned
from super-class labels due to the limited availability of fine-grained labels,
while fine-grained patterns are desired in many real-world applications such as
visual search in online shopping. To mitigate this challenge, we propose an
algorithm that learns fine-grained patterns effectively when only super-class
labels are available. The effectiveness of our method is supported by
theoretical analysis. Extensive experiments on real-world data sets demonstrate
that the proposed method can significantly improve performance on target tasks
corresponding to fine-grained classes when only super-class information is
available for training.
A Comprehensive Overview and Comparative Analysis on Deep Learning Models: CNN, RNN, LSTM, GRU
Deep learning (DL) has emerged as a powerful subset of machine learning (ML)
and artificial intelligence (AI), outperforming traditional ML methods,
especially in handling unstructured and large datasets. Its impact spans across
various domains, including speech recognition, healthcare, autonomous vehicles,
cybersecurity, predictive analytics, and more. However, the complexity and
dynamic nature of real-world problems present challenges in designing effective
deep learning models. Consequently, several deep learning models have been
developed to address different problems and applications. In this article, we
conduct a comprehensive survey of various deep learning models, including
Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs),
Generative Models, Deep Reinforcement Learning (DRL), and Deep Transfer
Learning. We examine the structure, applications, benefits, and limitations of
each model. Furthermore, we perform an analysis using three publicly available
datasets: IMDB, ARAS, and Fruit-360. We compare the performance of six renowned
deep learning models: CNN, Simple RNN, Long Short-Term Memory (LSTM),
Bidirectional LSTM, Gated Recurrent Unit (GRU), and Bidirectional GRU.
Comment: 16 pages, 29 figures
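As a concrete reference point for the recurrent models compared above, the GRU update can be written in a few lines. Below is a deliberately minimal scalar sketch (hidden size 1, toy weights) of one GRU step; real implementations use learned weight matrices inside a deep-learning framework:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gru_cell(x, h, p):
    """One step of a scalar GRU cell (hidden size 1, for illustration).
    p holds the weights; in a real model these are learned matrices."""
    z = sigmoid(p['wz'] * x + p['uz'] * h + p['bz'])                 # update gate
    r = sigmoid(p['wr'] * x + p['ur'] * h + p['br'])                 # reset gate
    h_tilde = math.tanh(p['wh'] * x + p['uh'] * (r * h) + p['bh'])   # candidate state
    return (1.0 - z) * h + z * h_tilde   # interpolate old state and candidate

# Toy weights; a trained model would learn these from data.
params = {k: 0.5 for k in ('wz', 'uz', 'bz', 'wr', 'ur', 'br', 'wh', 'uh', 'bh')}
h = 0.0
for x in [1.0, -1.0, 0.5]:   # a toy input sequence
    h = gru_cell(x, h, params)
print(h)
```

The LSTM adds a separate cell state and an output gate on top of this gating pattern, which is why the two are often compared head-to-head as in the survey above.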
Using Regular Languages to Explore the Representational Capacity of Recurrent Neural Architectures
The presence of Long Distance Dependencies (LDDs) in sequential data poses significant challenges for computational models. Various recurrent neural architectures have been designed to mitigate this issue. In order to test these state-of-the-art architectures, there is a growing need for rich benchmarking datasets. However, one of the drawbacks of existing datasets is the lack of experimental control with regard to the presence and/or degree of LDDs. This lack of control limits the analysis of model performance in relation to the specific challenge posed by LDDs. One way to address this is to use synthetic data having the properties of subregular languages. The degree of LDDs within the generated data can be controlled through the k parameter, the length of the generated strings, and the choice of appropriate forbidden strings. In this paper, we explore the capacity of different RNN extensions to model LDDs by evaluating these models on a sequence of SPk synthesized datasets, where each subsequent dataset exhibits a higher degree of LDD. Even though SPk are simple languages, the presence of LDDs has a significant impact on the performance of recurrent neural architectures, making them prime candidates for benchmarking tasks. © Springer Nature Switzerland AG 2018
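The SPk datasets mentioned above are built from strictly piecewise (SP) languages, where a string is in the language exactly when it contains none of a set of forbidden subsequences of length k. A small sketch of that membership test (the full dataset generator involves more machinery than this):

```python
def contains_subsequence(s, pattern):
    """True if pattern occurs in s as a (not necessarily contiguous)
    subsequence, i.e. its symbols appear in order but may be separated."""
    it = iter(s)
    return all(ch in it for ch in pattern)  # 'in' advances the iterator

def in_sp_language(s, forbidden):
    """A string belongs to a strictly piecewise (SP) language iff it
    contains none of the forbidden subsequences."""
    return not any(contains_subsequence(s, f) for f in forbidden)

# SP2 example over {a, b}: forbid the subsequence 'ab'.
print(in_sp_language("bbaa", ["ab"]))  # → True: no 'a' ever precedes a 'b'
print(in_sp_language("baab", ["ab"]))  # → False: 'a...b' occurs
```

Because a forbidden subsequence can span arbitrarily many intervening symbols, longer strings over such languages naturally exhibit longer-distance dependencies, which is what makes them useful for controlled benchmarking.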
Graph Classification with Kernels, Embeddings and Convolutional Neural Networks
In the graph classification problem, we are given a family of graphs and a set of categories, and we aim to classify each graph in the family into one of the given categories. Earlier approaches, such as graph kernels and graph embedding techniques, have focused on extracting certain features by processing the entire graph. However, real-world graphs are complex and noisy, and these traditional approaches are computationally intensive. With the introduction of the deep learning framework, there have been numerous attempts to create more efficient classification approaches. We modify a kernel graph convolutional neural network approach that extracts subgraphs (patches) from the graph using various community detection algorithms. These patches are provided as input to a graph kernel, and max pooling is applied. We use different community detection algorithms and a shortest-path graph kernel and compare their efficiency and performance. In this paper, we compare three methods: a graph kernel, an embedding technique and one that uses convolutional neural networks, using eight real-world datasets ranging from biological to social networks.
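The shortest-path graph kernel referenced above compares graphs through their distributions of shortest-path lengths. A minimal sketch of the idea (the paper's variant additionally operates on community-detected subgraph patches rather than whole graphs):

```python
from collections import deque, Counter

def shortest_path_histogram(adj):
    """adj: dict mapping node -> list of neighbours (undirected graph).
    Returns a Counter of shortest-path lengths over all ordered node pairs,
    computed by breadth-first search from every node."""
    hist = Counter()
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for node, d in dist.items():
            if node != source:
                hist[d] += 1
    return hist

def sp_kernel(adj1, adj2):
    """Shortest-path kernel: dot product of the two path-length histograms.
    Graphs with similar path-length profiles score higher."""
    h1, h2 = shortest_path_histogram(adj1), shortest_path_histogram(adj2)
    return sum(h1[d] * h2[d] for d in h1)

path3 = {0: [1], 1: [0, 2], 2: [1]}            # path graph on 3 nodes
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}   # triangle graph
print(sp_kernel(path3, triangle))  # → 24
```

The kernel value feeds into a standard kernel classifier (e.g. an SVM), which is what makes this family of methods drop-in comparable with embedding- and CNN-based approaches.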
Graph Neural Networks and Reinforcement Learning for Behavior Generation in Semantic Environments
Most reinforcement learning approaches used in behavior generation utilize
vectorial information as input. However, this requires the network to have a
pre-defined input-size -- in semantic environments this means assuming the
maximum number of vehicles. Additionally, this vectorial representation is not
invariant to the order and number of vehicles. To mitigate the above-stated
disadvantages, we propose combining graph neural networks with actor-critic
reinforcement learning. As graph neural networks apply the same network to
every vehicle and aggregate incoming edge information, they are invariant to
the number and order of vehicles. This makes them ideal candidates to be used
as networks in semantic environments -- environments consisting of object
lists. Graph neural networks exhibit some other advantages that make them
favorable to be used in semantic environments. The relational information is
explicitly given and does not have to be inferred. Moreover, graph neural
networks propagate information through the network and can gather higher-degree
information. We demonstrate our approach using a highway lane-change scenario
and compare the performance of graph neural networks to conventional ones. We
show that graph neural networks are capable of handling scenarios with a
varying number and order of vehicles during training and application.
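The invariance argument above can be made concrete: if every neighbouring vehicle is passed through the same shared function and the results are combined with a symmetric operator such as a sum, then reordering the vehicles cannot change the output, and the same network handles any neighbour count. An illustrative toy sketch (not the paper's actual network):

```python
def aggregate_neighbours(ego_features, neighbour_features):
    """Permutation-invariant aggregation: the same transform is applied to
    every neighbour and the results are summed, so the output is independent
    of the order (and handles any number) of neighbouring vehicles."""
    def message(f):                       # shared per-neighbour transform;
        return [2.0 * x for x in f]       # a toy linear map with weight 2

    agg = [0.0] * len(ego_features)
    for f in neighbour_features:          # sum is a symmetric operator
        agg = [a + m for a, m in zip(agg, message(f))]
    return [e + a for e, a in zip(ego_features, agg)]

ego = [1.0, 0.0]
a = aggregate_neighbours(ego, [[1.0, 2.0], [3.0, 4.0]])
b = aggregate_neighbours(ego, [[3.0, 4.0], [1.0, 2.0]])  # reordered neighbours
print(a == b)  # → True: the aggregation is order-invariant
```

A vectorial input, by contrast, would concatenate the neighbour features in a fixed order and at a fixed maximum count, losing exactly this invariance.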
Handwriting-Based Gender Classification Using End-to-End Deep Neural Networks
Handwriting-based gender classification is a well-researched problem that has
been approached mainly by traditional machine learning techniques. In this
paper, we propose a novel deep learning-based approach for this task.
Specifically, we present a convolutional neural network (CNN), which performs
automatic feature extraction from a given handwritten image, followed by
classification of the writer's gender. Also, we introduce a new dataset of
labeled handwritten samples, in Hebrew and English, of 405 participants.
Comparing the gender classification accuracy on this dataset against human
examiners, our results show that the proposed deep learning-based approach is
substantially more accurate than human examiners.