Broadcasting Convolutional Network for Visual Relational Reasoning
In this paper, we propose the Broadcasting Convolutional Network (BCN) that
extracts key object features from the global field of an entire input image and
recognizes their relationship with local features. BCN is a simple network
module that collects effective spatial features, embeds location information,
and broadcasts them across the entire feature map. We further introduce the
Multi-Relational Network (multiRN) that improves the existing Relation Network
(RN) by utilizing the BCN module. In pixel-based relation reasoning problems,
with the help of BCN, multiRN extends the concept of 'pairwise relations' in
conventional RNs to 'multiwise relations' by relating each object with multiple
objects at once. This yields O(n) complexity for n objects, a vast
computational gain over RNs, which take O(n^2). In experiments, multiRN
achieves state-of-the-art performance on the CLEVR dataset, demonstrating the
usefulness of BCN for relational reasoning problems.

Comment: Accepted paper at ECCV 2018. 24 pages.
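To make the broadcasting idea concrete, here is a minimal sketch of the
mechanism as described above; the module name, the max-pool used to collect
the global summary, and all layer sizes are illustrative assumptions, not the
authors' code. It pools a global feature summary, appends normalized (x, y)
coordinate planes as the location embedding, broadcasts both to every spatial
position, and relates each position to the summary in a single per-position
pass over n positions rather than n^2 pairs.

```python
import torch
import torch.nn as nn

class BroadcastingSketch(nn.Module):
    """Hedged sketch of the BCN idea: broadcast a global feature summary
    plus coordinates to all positions, then relate per position (O(n))."""

    def __init__(self, channels, hidden=128):
        super().__init__()
        self.relate = nn.Sequential(
            nn.Conv2d(2 * channels + 2, hidden, kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(hidden, channels, kernel_size=1),
        )

    def forward(self, feats):                     # feats: (B, C, H, W)
        b, c, h, w = feats.shape
        # Location embedding: normalized (x, y) coordinate planes.
        ys = torch.linspace(-1, 1, h).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w).view(1, 1, 1, w).expand(b, 1, h, w)
        # Global summary of key object features (max-pool as a stand-in
        # for the paper's feature collection step).
        summary = feats.amax(dim=(2, 3), keepdim=True).expand(b, c, h, w)
        # Broadcast summary + coordinates to every position and relate.
        return self.relate(torch.cat([feats, summary, xs, ys], dim=1))

x = torch.randn(2, 64, 8, 8)
print(BroadcastingSketch(64)(x).shape)   # torch.Size([2, 64, 8, 8])
```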
Decoding Sequence Classification Models for Acquiring New Biological Insights
Classifying biological sequences is one of the most important tasks in computational biology. In the last decade, support vector machines (SVMs) in combination with sequence kernels have emerged as a de facto standard. These methods are theoretically well-founded, reliable, and provide high-accuracy solutions at low computational cost. However, obtaining a highly accurate classifier is rarely the end of the story in many practical situations. Instead, one often aims to acquire biological knowledge about the principles underlying a given classification task. SVMs with traditional sequence kernels do not offer a straightforward way of accessing this knowledge.

In this contribution, we propose a new approach to analyzing biological sequences on the basis of support vector machines with sequence kernels. We first extract explicit pattern weights from a given SVM. When classifying a sequence, we then compute a prediction profile by distributing the weight of each pattern to the sequence positions that match the pattern. The final profile not only allows assessing the importance of each position, but also indicates for which class that position is indicative. Since it is infeasible to analyze the profiles of all sequences in a given data set, we advocate using affinity propagation (AP) clustering to narrow the analysis down to a small set of typical sequences.

The proposed approach is applicable to a wide range of biological sequences and a wide selection of sequence kernels. To illustrate our framework, we present the prediction of oligomerization tendencies of coiled coil proteins as a case study.
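As a sketch of the profile computation described above (the k-mer weights
below are hypothetical stand-ins for the explicit pattern weights extracted
from a trained SVM; a spectrum-type kernel with k = 3 is assumed):

```python
import numpy as np

# Hypothetical k-mer weights, standing in for weights extracted from an SVM.
pattern_weights = {"GCG": 1.4, "CGC": 0.9, "ATT": -1.1}
k = 3

def prediction_profile(seq, weights, k):
    """Distribute each matching pattern's weight evenly over the k positions
    it covers; positive entries pull the prediction toward the positive
    class, negative entries toward the negative class."""
    profile = np.zeros(len(seq))
    for i in range(len(seq) - k + 1):
        w = weights.get(seq[i:i + k], 0.0)
        profile[i:i + k] += w / k
    return profile

print(prediction_profile("GCGCATT", pattern_weights, k))
```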

The Long-Short Story of Movie Description
Generating descriptions for videos has many applications including assisting
blind people and human-robot interaction. The recent advances in image
captioning as well as the release of large-scale movie description datasets
such as MPII Movie Description make it possible to study this task in more
depth. Many of the proposed methods for image captioning rely on pre-trained
object-classifier CNNs and Long Short-Term Memory recurrent networks (LSTMs)
for generating
descriptions. While image description focuses on objects, we argue that it is
important to distinguish verbs, objects, and places in the challenging setting
of movie description. In this work we show how to learn robust visual
classifiers from the weak annotations of the sentence descriptions. Based on
these visual classifiers we learn how to generate a description using an LSTM.
We explore different design choices to build and train the LSTM and achieve the
best performance to date on the challenging MPII-MD dataset. We compare and
analyze our approach and prior work along various dimensions to better
understand the key challenges of the movie description task.
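A minimal sketch of the second stage under stated assumptions: the visual
classifier scores for verbs, objects, and places are concatenated into one
vector and fed to an LSTM decoder at every step. The paper explores several
design choices; this shows only one simple conditioning scheme, with
illustrative sizes.

```python
import torch
import torch.nn as nn

class DescriptionLSTM(nn.Module):
    """Hedged sketch (not the paper's exact model): an LSTM decoder
    conditioned on a vector of visual classifier scores."""

    def __init__(self, score_dim, vocab, embed=256, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, embed)
        self.lstm = nn.LSTM(embed + score_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, words, scores):        # words: (B, T); scores: (B, S)
        e = self.embed(words)                              # (B, T, E)
        s = scores.unsqueeze(1).expand(-1, e.size(1), -1)  # (B, T, S)
        h, _ = self.lstm(torch.cat([e, s], dim=-1))
        return self.out(h)                                 # (B, T, V) logits

model = DescriptionLSTM(score_dim=100, vocab=5000)
logits = model(torch.randint(0, 5000, (2, 12)), torch.rand(2, 100))
print(logits.shape)  # torch.Size([2, 12, 5000])
```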
ContextVP: Fully Context-Aware Video Prediction
Video prediction models based on convolutional networks, recurrent networks,
and their combinations often result in blurry predictions. We identify an
important contributing factor for imprecise predictions that has not been
studied adequately in the literature: blind spots, i.e., lack of access to all
relevant past information for accurately predicting the future. To address this
issue, we introduce a fully context-aware architecture that captures the entire
available past context for each pixel using Parallel Multi-Dimensional LSTM
units and aggregates it using blending units. Our model outperforms a strong
baseline network of 20 recurrent convolutional layers and yields
state-of-the-art performance for next step prediction on three challenging
real-world video datasets: Human 3.6M, Caltech Pedestrian, and UCF-101.
Moreover, it does so with fewer parameters than several recently proposed
models, and does not rely on deep convolutional networks, multi-scale
architectures, separation of background and foreground modeling, motion flow
learning, or adversarial training. These results highlight that full awareness
of past context is of crucial importance for video prediction.

Comment: 19 pages. ECCV 2018 oral presentation. Project webpage is at
https://wonmin-byeon.github.io/publication/2018-ecc
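A minimal sketch of how such an aggregation step might look; the form of the
blending unit below is an assumption for illustration, not the authors'
implementation. Per-pixel softmax weights choose among the directional
context maps, so each pixel can favor the direction carrying its relevant
past.

```python
import torch
import torch.nn as nn

class BlendingUnit(nn.Module):
    """Hedged sketch: blend context maps from several recurrence
    directions with learned per-pixel softmax weights."""

    def __init__(self, channels, directions=4):
        super().__init__()
        self.gate = nn.Conv2d(directions * channels, directions, 1)

    def forward(self, ctx):                  # ctx: (B, D, C, H, W)
        b, d, c, h, w = ctx.shape
        w_pix = self.gate(ctx.reshape(b, d * c, h, w))      # (B, D, H, W)
        w_pix = torch.softmax(w_pix, dim=1).unsqueeze(2)    # (B, D, 1, H, W)
        return (w_pix * ctx).sum(dim=1)                     # (B, C, H, W)

ctx = torch.randn(2, 4, 32, 16, 16)
print(BlendingUnit(32)(ctx).shape)  # torch.Size([2, 32, 16, 16])
```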
Recurrent Fully Convolutional Neural Networks for Multi-slice MRI Cardiac Segmentation
In cardiac magnetic resonance imaging, fully-automatic segmentation of the
heart enables precise structural and functional measurements to be taken, e.g.
from short-axis MR images of the left-ventricle. In this work we propose a
recurrent fully-convolutional network (RFCN) that learns image representations
from the full stack of 2D slices and has the ability to leverage inter-slice
spatial dependencies through internal memory units. RFCN combines anatomical
detection and segmentation into a single architecture that is trained
end-to-end, thus significantly reducing computational time, simplifying the
segmentation pipeline, and potentially enabling real-time applications. We
report on an investigation of RFCN using two datasets, including the publicly
available MICCAI 2009 Challenge dataset. Comparisons have been carried out
between fully convolutional networks and deep restricted Boltzmann machines,
including a recurrent version that leverages inter-slice spatial correlation.
Our studies suggest that RFCN produces state-of-the-art results and can
substantially improve the delineation of contours near the apex of the heart.

Comment: MICCAI Workshop RAMBO 201
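As a rough sketch of the idea (a simplified stand-in, not the paper's
architecture): a shared 2D encoder processes each slice, and a convolutional
hidden state carried across the stack lets the segmentation of one slice
depend on its neighbours.

```python
import torch
import torch.nn as nn

class RecurrentFCNSketch(nn.Module):
    """Hedged sketch of a recurrent fully-convolutional net over an MR
    slice stack, carrying a conv hidden state from slice to slice."""

    def __init__(self, hidden=16):
        super().__init__()
        self.hidden = hidden
        self.encode = nn.Sequential(nn.Conv2d(1, hidden, 3, padding=1), nn.ReLU())
        self.update = nn.Conv2d(2 * hidden, hidden, 3, padding=1)
        self.segment = nn.Conv2d(hidden, 1, 1)     # per-pixel mask logits

    def forward(self, stack):                      # stack: (B, S, 1, H, W)
        b, s, _, hgt, wid = stack.shape
        h = torch.zeros(b, self.hidden, hgt, wid)
        masks = []
        for i in range(s):                         # slice-by-slice recurrence
            f = self.encode(stack[:, i])
            h = torch.tanh(self.update(torch.cat([f, h], dim=1)))
            masks.append(self.segment(h))
        return torch.stack(masks, dim=1)           # (B, S, 1, H, W)

x = torch.randn(2, 10, 1, 64, 64)
print(RecurrentFCNSketch()(x).shape)  # torch.Size([2, 10, 1, 64, 64])
```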
Evolutionary multi-stage financial scenario tree generation
Multi-stage financial decision optimization under uncertainty depends on a
careful numerical approximation of the underlying stochastic process, which
describes the future returns of the selected assets or asset categories.
Various approaches towards an optimal generation of discrete-time,
discrete-state approximations (represented as scenario trees) have been
suggested in the literature. In this paper, a new evolutionary algorithm to
create scenario trees for multi-stage financial optimization models will be
presented. Numerical results and implementation details conclude the paper.
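To illustrate the flavour of such an approach (everything below, including
the fitness function, the mutation operator, and the use of a single-stage
fan of scenarios rather than a full multi-stage tree, is an illustrative
assumption, not the paper's algorithm): candidate scenario sets are mutated
and selected by how closely their moments match target return moments.

```python
import random

# Target moments of the asset-return distribution (assumed values).
TARGET_MEAN, TARGET_VAR, N_SCEN = 0.05, 0.04, 8

def fitness(scen):
    """Distance between a scenario set's moments and the targets."""
    m = sum(scen) / len(scen)
    v = sum((x - m) ** 2 for x in scen) / len(scen)
    return (m - TARGET_MEAN) ** 2 + (v - TARGET_VAR) ** 2

def mutate(scen):
    """Perturb each scenario value with small Gaussian noise."""
    return [x + random.gauss(0, 0.02) for x in scen]

pop = [[random.gauss(0.05, 0.2) for _ in range(N_SCEN)] for _ in range(30)]
for _ in range(200):                       # evolutionary loop
    pop.sort(key=fitness)                  # select the fittest candidates
    pop = pop[:10] + [mutate(random.choice(pop[:10])) for _ in range(20)]
print(min(fitness(s) for s in pop))
```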
Deep Tree Transductions - A Short Survey
The paper surveys recent extensions of Long Short-Term Memory networks that
handle tree structures, from the perspective of learning non-trivial forms of
isomorphic structured transductions. It provides a discussion of modern TreeLSTM
models, showing the effect of the bias induced by the direction of tree
processing. An empirical analysis is performed on real-world benchmarks,
highlighting that no single model is adequate to effectively approach all
transduction problems.

Comment: To appear in the Proceedings of the 2019 INNS Big Data and Deep
Learning (INNSBDDL 2019). arXiv admin note: text overlap with
arXiv:1809.0909
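For reference, a sketch of one family of such models, the Child-Sum TreeLSTM
cell of Tai et al. (2015), which processes a tree bottom-up by summing child
hidden states and gating each child's memory with its own forget gate:

```python
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    """Child-Sum TreeLSTM cell (after Tai et al., 2015); one node update
    given the hidden/memory states of its children."""

    def __init__(self, in_dim, mem_dim):
        super().__init__()
        self.iou_x = nn.Linear(in_dim, 3 * mem_dim)
        self.iou_h = nn.Linear(mem_dim, 3 * mem_dim, bias=False)
        self.f_x = nn.Linear(in_dim, mem_dim)
        self.f_h = nn.Linear(mem_dim, mem_dim, bias=False)

    def forward(self, x, child_h, child_c):   # child_*: (K, mem_dim)
        h_sum = child_h.sum(dim=0)             # sum of child hidden states
        i, o, u = torch.chunk(self.iou_x(x) + self.iou_h(h_sum), 3, dim=-1)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.f_x(x) + self.f_h(child_h))  # one gate per child
        c = i * u + (f * child_c).sum(dim=0)
        h = o * torch.tanh(c)
        return h, c

cell = ChildSumTreeLSTMCell(in_dim=10, mem_dim=20)
h, c = cell(torch.randn(10), torch.randn(3, 20), torch.randn(3, 20))
print(h.shape, c.shape)  # torch.Size([20]) torch.Size([20])
```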
Deep Autoencoder for Combined Human Pose Estimation and Body Model Upscaling
We present a method for simultaneously estimating 3D human pose and body
shape from a sparse set of wide-baseline camera views. We train a symmetric
convolutional autoencoder with a dual loss that enforces learning of a latent
representation that encodes skeletal joint positions, and at the same time
learns a deep representation of volumetric body shape. We harness the latter to
up-scale input volumetric data, whilst recovering a
3D estimate of joint positions with equal or greater accuracy than the state of
the art. Inference runs in real-time (25 fps) and has the potential for passive
human behaviour monitoring where there is a requirement for high-fidelity
estimation of human body shape and pose.
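A minimal sketch of the dual-loss idea (layer sizes, voxel resolution, and
joint count are assumptions; the real model up-scales the output volume,
while this sketch reconstructs at input resolution for brevity):

```python
import torch
import torch.nn as nn

class DualLossAE(nn.Module):
    """Hedged sketch: a convolutional autoencoder whose latent code is also
    regressed to 3D joint positions, so one network jointly learns pose and
    a volumetric body-shape representation."""

    def __init__(self, n_joints=15, latent=128):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv3d(1, 8, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv3d(8, 16, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(), nn.Linear(16 * 8 ** 3, latent),
        )
        self.pose = nn.Linear(latent, 3 * n_joints)               # joints head
        self.dec = nn.Sequential(
            nn.Linear(latent, 16 * 8 ** 3), nn.Unflatten(1, (16, 8, 8, 8)),
            nn.ConvTranspose3d(16, 8, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(8, 1, 4, stride=2, padding=1),     # 8 -> 32
        )

    def forward(self, vol):                     # vol: (B, 1, 32, 32, 32)
        z = self.enc(vol)
        return self.pose(z), self.dec(z)

model = DualLossAE()
vol = torch.rand(2, 1, 32, 32, 32)
joints, recon = model(vol)
# Dual loss: pose supervision plus volumetric reconstruction.
loss = nn.functional.mse_loss(joints, torch.rand_like(joints)) \
     + nn.functional.mse_loss(recon, vol)
print(joints.shape, recon.shape, loss.item())
```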
RePAD: Real-time Proactive Anomaly Detection for Time Series
During the past decade, many anomaly detection approaches have been
introduced in different fields such as network monitoring, fraud detection, and
intrusion detection. However, they require an understanding of data patterns and
often need a long off-line period to build a model or network for the target
data. Providing real-time and proactive anomaly detection for streaming time
series without human intervention and domain knowledge is highly valuable since
it greatly reduces human effort and enables appropriate countermeasures to be
undertaken before a disaster, failure, or other harmful event occurs.
However, this issue has not been well studied yet. To address it, this paper
proposes RePAD, which is a Real-time Proactive Anomaly Detection algorithm for
streaming time series based on Long Short-Term Memory (LSTM). RePAD utilizes
short-term historic data points to predict and determine whether or not the
upcoming data point is a sign that an anomaly is likely to happen in the near
future. By dynamically adjusting the detection threshold over time, RePAD is
able to tolerate minor pattern changes in a time series and detect anomalies
either proactively or on time. Experiments based on two time series datasets
collected from the Numenta Anomaly Benchmark demonstrate that RePAD is able to
proactively detect anomalies and provide early warnings in real time without
human intervention and domain knowledge.

Comment: 12 pages, 8 figures, the 34th International Conference on Advanced
Information Networking and Applications (AINA 2020)
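A sketch of the detection loop under stated assumptions: a generic predictor
stands in for the paper's LSTM, the AARE-style relative error and the
mean-plus-three-standard-deviations threshold are our reading of the dynamic
threshold, and the window and history lengths are chosen arbitrarily.

```python
import numpy as np

def aare(actual, predicted):
    """Average Absolute Relative Error over paired points (epsilon added
    here to guard against division by zero)."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return np.mean(np.abs(a - p) / (np.abs(a) + 1e-8))

def detect(series, predict, window=3, history=10):
    """RePAD-style sketch: predict each next point from the last `window`
    points, track recent AARE values, and flag an anomaly when the newest
    AARE exceeds mean + 3*std of the recent history."""
    aares, flags = [], []
    for t in range(window, len(series)):
        pred = predict(series[t - window:t])
        aares.append(aare([series[t]], [pred]))
        hist = aares[-history - 1:-1]          # recent errors, excluding newest
        if len(hist) >= 2:
            thd = np.mean(hist) + 3 * np.std(hist)   # dynamic threshold
            flags.append((t, aares[-1] > thd))
    return flags

data = [1.0, 1.1, 0.9, 1.0, 1.2, 1.0, 0.95, 9.0, 1.05, 1.0]
print(detect(data, predict=lambda w: float(np.mean(w))))
```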
Modeling the Temporal Nature of Human Behavior for Demographics Prediction
Mobile phone metadata is increasingly used for humanitarian purposes in
developing countries, as traditional data is scarce. However, basic demographic
information is often absent from mobile phone datasets, limiting their
operational impact. For these reasons, there has been a growing
interest in predicting demographic information from mobile phone metadata.
Previous work focused on creating increasingly advanced features to be modeled
with standard machine learning algorithms. We here instead model the raw mobile
phone metadata directly using deep learning, exploiting the temporal nature of
the patterns in the data. From high-level assumptions we design a data
representation and convolutional network architecture for modeling patterns
within a week. We then examine three strategies for aggregating patterns across
weeks and show that our method reaches state-of-the-art accuracy on both age
and gender prediction using only the temporal modality in mobile metadata. We
finally validate our method on low activity users and evaluate the modeling
assumptions.

Comment: Accepted at ECML 2017. A previous version of this paper was titled
'Using Deep Learning to Predict Demographics from Mobile Phone Metadata' and
was accepted at the ICLR 2016 workshop.
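As a sketch of what such a temporal representation and cross-week aggregation
could look like (the grid layout, channel count, and mean-pooling across
weeks are assumptions, not the paper's exact design):

```python
import torch
import torch.nn as nn

class WeekConvNet(nn.Module):
    """Hedged sketch: each week of metadata becomes a (channels x 24 hours
    x 7 days) grid, a small conv net embeds each week, and per-week
    embeddings are averaged across weeks before classification."""

    def __init__(self, channels=2, classes=2):
        super().__init__()
        self.week = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),      # -> (B*W, 16)
        )
        self.head = nn.Linear(16, classes)

    def forward(self, x):                  # x: (B, W, C, 24, 7)
        b, w = x.shape[:2]
        e = self.week(x.reshape(b * w, *x.shape[2:]))       # embed each week
        return self.head(e.reshape(b, w, -1).mean(dim=1))   # average weeks

x = torch.rand(4, 10, 2, 24, 7)            # 4 users, 10 weeks each
print(WeekConvNet()(x).shape)              # torch.Size([4, 2])
```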