Semantic Object Parsing with Graph LSTM
By taking the semantic object parsing task as an exemplar application
scenario, we propose the Graph Long Short-Term Memory (Graph LSTM) network,
which is the generalization of LSTM from sequential data or multi-dimensional
data to general graph-structured data. In particular, instead of evenly and
fixedly dividing an image into pixels or patches as in existing multi-dimensional
LSTM structures (e.g., Row, Grid and Diagonal LSTMs), we take each
arbitrary-shaped superpixel as a semantically consistent node, and adaptively
construct an undirected graph for each image, where the spatial relations of
the superpixels are naturally used as edges. Constructed on such an adaptive
graph topology, the Graph LSTM is more naturally aligned with the visual
patterns in the image (e.g., object boundaries or appearance similarities) and
provides a more economical information propagation route. Furthermore, for each
optimization step over Graph LSTM, we propose to use a confidence-driven scheme
to update the hidden and memory states of nodes progressively until all nodes
are updated. In addition, for each node, the forget gates are adaptively
learned to capture different degrees of semantic correlation with neighboring
nodes. Comprehensive evaluations on four diverse semantic object parsing
datasets demonstrate the significant superiority of our Graph LSTM over
other state-of-the-art solutions.
Comment: 18 pages
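The abstract describes the mechanism (superpixel graph, confidence-driven update order, adaptive per-neighbor forget gates) but gives no equations. The following is a minimal NumPy sketch under stated assumptions: superpixels arrive as an integer label map, two superpixels are neighbors when they share a 4-connected pixel border, and the gate forms (neighbor hidden states averaged, one forget gate per neighbor driven by that neighbor's state, standard LSTM output `h = o * tanh(m)`) are illustrative choices rather than the paper's exact formulation. The names `superpixel_adjacency`, `graph_lstm_pass`, and the parameter layout are hypothetical.

```python
import numpy as np


def superpixel_adjacency(labels):
    """Neighbor lists for superpixels sharing a 4-connected pixel border."""
    n = int(labels.max()) + 1
    nbrs = [set() for _ in range(n)]
    # Compare every pixel with its right-hand and lower neighbor.
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        boundary = a != b
        for u, v in zip(a[boundary], b[boundary]):
            nbrs[int(u)].add(int(v))
            nbrs[int(v)].add(int(u))
    return [sorted(s) for s in nbrs]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def pre_activation(params, gate, x, h_self, h_bar):
    """Gate input from the node feature, its own state, and the neighbor average."""
    return (params["W"][gate] @ x + params["U"][gate] @ h_self
            + params["Un"][gate] @ h_bar + params["b"][gate])


def graph_lstm_pass(feats, nbrs, conf, h, m, params):
    """One confidence-driven pass: update every node once, most confident first."""
    h_new, m_new = h.copy(), m.copy()
    visited = np.zeros(len(conf), dtype=bool)
    order = np.argsort(-conf, kind="stable")
    for i in order:
        assert nbrs[i], "this sketch assumes every node has a neighbor"
        # Neighbors already updated in this pass contribute their new states.
        h_nb = np.array([h_new[j] if visited[j] else h[j] for j in nbrs[i]])
        m_nb = np.array([m_new[j] if visited[j] else m[j] for j in nbrs[i]])
        h_bar = h_nb.mean(axis=0)
        x = feats[i]
        g_u = sigmoid(pre_activation(params, "u", x, h[i], h_bar))
        g_o = sigmoid(pre_activation(params, "o", x, h[i], h_bar))
        g_c = np.tanh(pre_activation(params, "c", x, h[i], h_bar))
        # Self forget gate, plus one adaptive forget gate per neighbor.
        base = params["W"]["f"] @ x + params["b"]["f"]
        g_f = sigmoid(base + params["U"]["f"] @ h[i])
        g_fn = sigmoid(base + h_nb @ params["Un"]["f"].T)
        m_new[i] = (g_fn * m_nb).mean(axis=0) + g_f * m[i] + g_u * g_c
        h_new[i] = g_o * np.tanh(m_new[i])
        visited[i] = True
    return h_new, m_new, order


rng = np.random.default_rng(0)
labels = np.array([[0, 0, 1], [0, 2, 1], [2, 2, 1]])
nbrs = superpixel_adjacency(labels)  # [[1, 2], [0, 2], [0, 1]]
N, F, D = len(nbrs), 3, 4
shapes = {"W": (D, F), "U": (D, D), "Un": (D, D), "b": (D,)}
params = {name: {g: rng.normal(scale=0.1, size=s) for g in "uofc"}
          for name, s in shapes.items()}
conf = np.array([0.2, 0.9, 0.5])  # hypothetical per-superpixel confidences
h1, m1, order = graph_lstm_pass(rng.normal(size=(N, F)), nbrs, conf,
                                np.zeros((N, D)), np.zeros((N, D)), params)
```

Visiting confident nodes first means that, within one pass, less certain nodes already see the refreshed states of their more reliable neighbors.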
Comparison of System Call Representations for Intrusion Detection
Over the years, artificial neural networks have been applied successfully in
many areas, including IT security. Yet neural networks can only process
continuous input data, which is particularly challenging for security-related
non-continuous data such as system calls. This work focuses on four different
options to preprocess sequences of system calls so that they can be processed
by neural networks. These input options are based on one-hot encoding and
learning word2vec or GloVe representations of system calls. As an additional
option, we analyze if the mapping of system calls to their respective kernel
modules is an adequate generalization step for (a) replacing system calls or
(b) enhancing system call data with additional information regarding their
context. However, when performing such preprocessing steps, it is important to
ensure that no relevant information is lost during the process. The overall
objective of system call based intrusion detection is to categorize sequences
of system calls as benign or malicious behavior. Therefore, this scenario is
used to evaluate the different input options as a classification task. The
results show that each of the four different methods is a valid option when
preprocessing input data, but using kernel modules alone is not recommended
because too much information is lost during the mapping process.
Comment: 12 pages, 1 figure, submitted to CISIS 201
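The one-hot option and the two kernel-module variants can be illustrated with a short sketch. The syscall-to-module table below is a small hypothetical stand-in (the real mapping is not given in the abstract), and the learned word2vec/GloVe options are not shown. The sketch also makes the information-loss finding concrete: distinct calls that share a module become indistinguishable once replaced by that module.

```python
import numpy as np

# Hypothetical syscall-to-kernel-module table, for illustration only.
SYSCALL_TO_MODULE = {
    "open": "fs", "read": "fs", "write": "fs", "close": "fs",
    "socket": "net", "connect": "net", "sendto": "net",
    "mmap": "mm", "brk": "mm",
}


def one_hot(seq, vocab):
    """Encode a token sequence as a (len(seq), len(vocab)) 0/1 matrix."""
    index = {tok: i for i, tok in enumerate(vocab)}
    out = np.zeros((len(seq), len(vocab)), dtype=np.float32)
    out[np.arange(len(seq)), [index[t] for t in seq]] = 1.0
    return out


syscalls = sorted(SYSCALL_TO_MODULE)
modules = sorted(set(SYSCALL_TO_MODULE.values()))

trace = ["open", "read", "read", "socket", "sendto", "close"]
x_calls = one_hot(trace, syscalls)                                # system calls only
x_mods = one_hot([SYSCALL_TO_MODULE[c] for c in trace], modules)  # (a) modules replace calls
x_both = np.concatenate([x_calls, x_mods], axis=1)                # (b) calls enriched with modules
```

Here `open` and `read` produce identical rows in `x_mods` but different rows in `x_calls`, which is the kind of loss the evaluation attributes to the modules-only input.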
Regularized Neural User Model for Goal-Oriented Spoken Dialogue Systems
User simulation is widely used to generate artificial dialogues in order to train statistical spoken dialogue systems and perform evaluations. This paper presents a neural network approach for user modeling that exploits an encoder-decoder bidirectional architecture with a regularization layer for each dialogue act. In order to minimize the impact of data sparsity, the dialogue act space is compressed according to the user goal. Experiments on the Dialogue State Tracking Challenge 2 (DSTC2) dataset provide significant results at dialogue act and slot level predictions, outperforming previous neural user modeling approaches in terms of F1 score.
Spanish Minister of Science under grants TIN2014-54288-C4-4-R and TIN2017-85854-C4-3-R and by the EU H2020 EMPATHIC project grant number 769872
Modeling the Temporal Nature of Human Behavior for Demographics Prediction
Mobile phone metadata is increasingly used for humanitarian purposes in
developing countries as traditional data is scarce. Basic demographic
information is however often absent from mobile phone datasets, limiting the
operational impact of the datasets. For these reasons, there has been a growing
interest in predicting demographic information from mobile phone metadata.
Previous work focused on creating increasingly advanced features to be modeled
with standard machine learning algorithms. Here, we instead model the raw mobile
phone metadata directly using deep learning, exploiting the temporal nature of
the patterns in the data. From high-level assumptions we design a data
representation and convolutional network architecture for modeling patterns
within a week. We then examine three strategies for aggregating patterns across
weeks and show that our method reaches state-of-the-art accuracy on both age
and gender prediction using only the temporal modality in mobile metadata. We
finally validate our method on low activity users and evaluate the modeling
assumptions.
Comment: Accepted at ECML 2017. A previous version of this paper was titled
'Using Deep Learning to Predict Demographics from Mobile Phone Metadata' and
was accepted at the ICLR 2016 workshop
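The abstract does not specify the weekly data representation. One plausible reading, sketched here purely as an assumption, is an hour-of-week grid: each user's event timestamps (calls, texts) are binned into a tensor of shape (weeks, 7 days, 24 hours), so a convolutional network can model patterns within one week and a separate step aggregates across weeks. The function `weekly_tensor` is hypothetical, and none of the paper's three aggregation strategies is shown.

```python
from datetime import datetime, timedelta

import numpy as np

HOURS_PER_WEEK = 7 * 24


def weekly_tensor(timestamps, start):
    """Bin event timestamps into a (weeks, 7, 24) tensor of hourly counts.

    `start` should be midnight at the beginning of week 0 (e.g. a Monday), so
    axis 1 indexes the day of the week and axis 2 the hour of the day.
    """
    hours = np.array([int((t - start).total_seconds() // 3600) for t in timestamps])
    if hours.size == 0 or hours.min() < 0:
        raise ValueError("need at least one event, none before `start`")
    out = np.zeros((hours.max() // HOURS_PER_WEEK + 1, 7, 24), dtype=np.int32)
    # np.add.at accumulates correctly when several events share one cell.
    np.add.at(out, (hours // HOURS_PER_WEEK, (hours % HOURS_PER_WEEK) // 24, hours % 24), 1)
    return out


start = datetime(2024, 1, 1)  # a Monday, 00:00
events = [start + timedelta(hours=9, minutes=5),
          start + timedelta(hours=9, minutes=40),
          start + timedelta(days=8, hours=18)]
x = weekly_tensor(events, start)  # one 7x24 grid per week
```

The two Monday-morning events land in the same cell of week 0, and the event eight days later falls on Tuesday of week 1.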
Deep Recurrent Modelling of Stationary Bitcoin Price Formation Using the Order Flow
In this paper we propose a deep recurrent model based on the order flow for
the stationary modelling of high-frequency directional price movements.
The order flow is the microsecond stream of orders arriving at the exchange,
driving the formation of prices seen on the price chart of a stock or currency.
To test the stationarity of the proposed model, we train it on data from
before the 2017 Bitcoin bubble period and test it during and after the
bubble. We show that without any retraining, the proposed model is temporally
stable even as Bitcoin trading shifts into an extremely volatile "bubble
trouble" period. The significance of the result is shown by benchmarking
against existing state-of-the-art models in the literature for modelling price
formation using deep learning.
Comment: 10 pages, The 19th International Conference on Artificial
Intelligence and Soft Computing
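The stationarity test rests on a chronological split: the model is fit once on data before a cutoff and then evaluated, without retraining, on everything after it. The sketch below shows only that protocol together with one hypothetical choice of directional target (the sign of the next mid-price change). The model itself, its order-flow inputs, and the exact labeling used in the paper are not shown, and the dates are illustrative placeholders.

```python
import numpy as np


def direction_labels(mid_prices):
    """Label each step by the sign of the next mid-price change: 0 down, 1 flat, 2 up."""
    diff = np.diff(np.asarray(mid_prices, dtype=float))
    return (np.sign(diff) + 1).astype(int)


def chronological_split(times, cutoff):
    """Train on everything strictly before `cutoff`, test on everything at or after it."""
    train = np.asarray(times) < cutoff
    return train, ~train


prices = [100.0, 100.5, 100.5, 100.2, 101.0]
y = direction_labels(prices)  # [2, 1, 0, 2]
times = np.array(["2017-06-01", "2017-09-01", "2017-12-01", "2018-03-01"],
                 dtype="datetime64[D]")
train, test = chronological_split(times, np.datetime64("2017-10-01"))
# Fit once on y[train]; evaluate on y[test] with no retraining.
```

A random shuffle would leak post-cutoff market regimes into training, which is why the split must be strictly by time.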
Visual Text Correction
Videos, images, and sentences are mediums that can express the same
semantics. One can imagine a picture by reading a sentence or can describe a
scene with some words. However, even small changes in a sentence can cause a
significant semantic inconsistency with the corresponding video/image. For
example, by changing the verb of a sentence, the meaning may drastically
change. There have been many efforts to encode a video/sentence and decode it
as a sentence/video. In this research, we study a new scenario in which both
the sentence and the video are given, but the sentence is inaccurate. A
semantic inconsistency between the sentence and the video or between the words
of a sentence can result in an inaccurate description. This paper introduces a
new problem, called Visual Text Correction (VTC), i.e., finding and replacing
an inaccurate word in the textual description of a video. We propose a deep
network that can simultaneously detect an inaccuracy in a sentence, and fix it
by replacing the inaccurate word(s). Our method leverages the semantic
interdependence of videos and words, as well as the short-term and long-term
relations of the words in a sentence. In our formulation, part of a visual
feature vector for every single word is dynamically selected through a gating
process. Furthermore, to train and evaluate our model, we propose an approach
to automatically construct a large dataset for the VTC problem. Our experiments and
performance analysis demonstrate that the proposed method provides very good
results and also highlights the general challenges in solving the VTC problem.
To the best of our knowledge, this work is the first of its kind for the Visual
Text Correction task.
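The abstract states only that, for every word, part of a visual feature vector is dynamically selected through a gating process. A minimal sketch of such a gate, assuming a sigmoid gate conditioned on both the word embedding and the video feature, is shown below; the function name, parameter shapes, and the concatenation of the word with its gated visual feature are hypothetical choices, not the paper's architecture.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_word_features(word_emb, visual, W, b):
    """Concatenate each word embedding with a word-specific gated visual feature.

    word_emb: (T, E) embeddings; visual: (V,) video feature;
    W: (V, E + V) and b: (V,) are hypothetical gate parameters.
    """
    v = np.broadcast_to(visual, (word_emb.shape[0], visual.shape[0]))
    # Gate in (0, 1) per word and per visual dimension, conditioned on both inputs.
    gate = sigmoid(np.concatenate([word_emb, v], axis=1) @ W.T + b)
    return np.concatenate([word_emb, gate * v], axis=1)


rng = np.random.default_rng(0)
T, E, V = 5, 8, 6
words = rng.normal(size=(T, E))
video = rng.normal(size=V)
out = gated_word_features(words, video, rng.normal(size=(V, E + V)), np.zeros(V))
```

Because each gate value lies in (0, 1), every word receives a scaled-down subset of the visual dimensions, which is one way to let, say, a verb attend to motion features while a noun attends to appearance features.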
Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition
Good old on-line back-propagation for plain multi-layer perceptrons yields a
very low 0.35% error rate on the famous MNIST handwritten digits benchmark. All
we need to achieve this best result so far are many hidden layers, many neurons
per layer, numerous deformed training images, and graphics cards to greatly
speed up learning.
Comment: 14 pages, 2 figures, 4 listings
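The recipe is plain on-line back-propagation: one weight update per training sample through a multi-layer perceptron. The toy-scale NumPy sketch below assumes tanh hidden units, a softmax cross-entropy output, and 2-D synthetic points standing in for MNIST. It omits the paper's image deformations, network sizes, and GPU training, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_mlp(sizes):
    """Weights scaled by 1/sqrt(fan_in), zero biases."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (n, m)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """Return the activations of every layer; tanh hidden units, linear output."""
    acts = [x]
    for k, (W, b) in enumerate(params):
        z = W @ acts[-1] + b
        acts.append(z if k == len(params) - 1 else np.tanh(z))
    return acts


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def online_step(params, x, y, lr):
    """Back-propagate one sample's softmax cross-entropy loss and update in place."""
    acts = forward(params, x)
    delta = softmax(acts[-1])
    delta[y] -= 1.0  # gradient of the loss w.r.t. the output logits
    for k in reversed(range(len(params))):
        W, b = params[k]
        # Propagate through tanh using the pre-update weights.
        prev = (W.T @ delta) * (1.0 - acts[k] ** 2) if k > 0 else None
        params[k] = (W - lr * np.outer(delta, acts[k]), b - lr * delta)
        delta = prev


def mean_loss(params, X, Y):
    return np.mean([-np.log(softmax(forward(params, x)[-1])[y]) for x, y in zip(X, Y)])


# Toy stand-in for MNIST: class 1 when the first coordinate is positive.
X = rng.uniform(-1.0, 1.0, size=(200, 2))
Y = (X[:, 0] > 0).astype(int)
params = init_mlp([2, 16, 16, 2])
loss_before = mean_loss(params, X, Y)
for _ in range(10):
    for i in rng.permutation(len(X)):
        online_step(params, X[i], Y[i], lr=0.05)
loss_after = mean_loss(params, X, Y)
```

Scaling this up to the paper's setting mainly means larger layers and inputs, which is where graphics cards pay off.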
Peak Forecasting for Battery-based Energy Optimizations in Campus Microgrids
Battery-based energy storage has emerged as an enabling technology for a
variety of grid energy optimizations, such as peak shaving and cost arbitrage.
A key component of battery-driven peak shaving optimizations is peak
forecasting, which predicts the hours of the day that see the greatest demand.
While there has been significant prior work on load forecasting, we argue that
the problem of predicting periods where the demand peaks for individual
consumers or micro-grids is more challenging than forecasting load at a grid
scale. We propose a new model for peak forecasting, based on deep learning,
that predicts the k hours of each day with the highest and lowest demand. We
evaluate our approach using a two-year trace from a real micro-grid of 156
buildings and show that it outperforms state-of-the-art load forecasting
techniques adapted for peak predictions by 11-32%. When used for battery-based
peak shaving, our model yields annual savings of $496,320 for a 4 MWh battery
for this micro-grid.
Comment: 5 pages, 4 figures. This paper will appear in the Proceedings of ACM
International Conference on Future Energy Systems (e-Energy'20), June 202
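The forecasting target, the k hours of each day with the highest and lowest demand, and its use for peak shaving can be sketched as follows. The deep learning model itself is not shown. `peak_labels` derives the target from one day's hourly load, and `shave_peaks` is a hypothetical, deliberately naive greedy discharge policy assumed for illustration. The load values are made up.

```python
import numpy as np


def peak_labels(hourly_load, k):
    """Hours (0-23) with the k highest and the k lowest loads in one day."""
    order = np.argsort(np.asarray(hourly_load, dtype=float), kind="stable")
    return np.sort(order[-k:]), np.sort(order[:k])


def shave_peaks(hourly_load, peak_hours, battery_kwh, max_kw):
    """Naive greedy policy: discharge into the largest predicted peaks first.

    Loads are hourly averages in kW, so discharging d kW for one hour uses d kWh.
    """
    load = np.asarray(hourly_load, dtype=float).copy()
    energy = float(battery_kwh)
    for h in sorted(peak_hours, key=lambda hour: -load[hour]):
        d = min(max_kw, energy, load[h])
        load[h] -= d
        energy -= d
    return load


day = [50.0] * 24
day[17], day[18], day[3], day[4] = 120.0, 110.0, 20.0, 25.0
high, low = peak_labels(day, k=2)  # high -> [17, 18], low -> [3, 4]
shaved = shave_peaks(day, high, battery_kwh=100, max_kw=60)
```

With these numbers the daily peak drops from 120 kW to 70 kW. The low-demand hours are where such a policy would recharge. Getting the peak hours wrong wastes the battery's limited energy, which is why peak forecasting rather than total load forecasting is the relevant task.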
Investigation on N-gram Approximated RNNLMs for Recognition of Morphologically Rich Speech
Recognition of Hungarian conversational telephone speech is challenging due
to the informal style and morphological richness of the language. Recurrent
Neural Network Language Models (RNNLMs) can remedy the high
perplexity of the task; however, two-pass decoding introduces a considerable
processing delay. To eliminate this delay, we investigate approaches
that reduce the complexity of the RNNLM while preserving its accuracy. We
compare the performance of conventional back-off n-gram language models (BNLM),
BNLM approximation of RNNLMs (RNN-BNLM) and RNN n-grams in terms of perplexity
and word error rate (WER). Morphological richness is often addressed by using
statistically derived subwords (morphs) in the language models, hence our
investigations are extended to morph-based models as well. We found that
RNN-BNLMs recover 40% of the RNNLM perplexity reduction, which is
roughly equal to the performance of an RNN 4-gram model. Combining morph-based
modeling and approximation of RNNLM, we were able to achieve 8% relative WER
reduction and preserve real-time operation of our conversational telephone
speech recognition system.
Comment: 12 pages, 2 figures, accepted for publication at SLSP 201
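The two headline figures, the share of the RNNLM perplexity reduction that is "recovered" and the "relative WER reduction", follow standard definitions. The sketch below shows one plausible reading of each; the abstract does not spell out the exact formulas. The numbers in the usage lines are placeholders, not results from the paper.

```python
import math


def perplexity(log_probs):
    """Perplexity from the natural-log probability of each token."""
    return math.exp(-sum(log_probs) / len(log_probs))


def recovered_fraction(ppl_bnlm, ppl_approx, ppl_rnnlm):
    """Share of the RNNLM's perplexity reduction (vs. the baseline) an approximation keeps."""
    return (ppl_bnlm - ppl_approx) / (ppl_bnlm - ppl_rnnlm)


def relative_reduction(old, new):
    """Relative reduction, e.g. of word error rate: (old - new) / old."""
    return (old - new) / old


# Placeholder numbers for illustration only.
ppl = perplexity([math.log(0.25)] * 4)         # about 4.0
frac = recovered_fraction(200.0, 180.0, 150.0)  # about 0.4
rel = relative_reduction(50.0, 46.0)            # about 0.08
```

Under this reading, "40% recovered" means the approximated n-gram model closes 40% of the perplexity gap between the back-off baseline and the full RNNLM, while still running in a single decoding pass.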