Single Shot Temporal Action Detection
Temporal action detection is an important yet challenging problem, since
videos in real applications are usually long, untrimmed, and contain multiple
action instances. The problem requires not only recognizing action categories
but also detecting the start and end time of each action instance. Many
state-of-the-art methods adopt a "detection by classification" framework:
first generate proposals, then classify them. The main drawback of this
framework is that the boundaries of the action instance proposals are fixed
before the classification step. To address this issue, we propose a novel
Single Shot Action Detector (SSAD) network based on 1D temporal convolutional
layers that skips the proposal generation step by directly detecting action
instances in untrimmed video. In pursuit of an SSAD network that works
effectively for temporal action detection, we empirically search for the best
network architecture, since no existing models can be adopted directly.
Moreover, we investigate input feature types and fusion strategies to further
improve detection accuracy. We conduct extensive experiments on two
challenging datasets, THUMOS 2014 and MEXaction2. With the
Intersection-over-Union threshold set to 0.5 during evaluation, SSAD
significantly outperforms other state-of-the-art systems, increasing mAP from
19.0% to 24.6% on THUMOS 2014 and from 7.4% to 11.0% on MEXaction2.
Comment: ACM Multimedia 201
Yeah, Right, Uh-Huh: A Deep Learning Backchannel Predictor
Using supporting backchannel (BC) cues can make human-computer interaction
more social. BCs provide feedback from the listener to the speaker, indicating
that the speaker is still being listened to. BCs can be expressed in different
ways, depending on the modality of the interaction, for example as gestures or
acoustic cues. In this work, we consider only acoustic cues. We propose an
approach to detecting BC opportunities based on acoustic input features such
as power and pitch. While other works in the field rely on hand-written rule
sets or specialized features, we use artificial neural networks, which are
capable of deriving higher-order features from the input features themselves.
In our setup, we first use a fully connected feed-forward network to establish
an updated baseline in comparison to our previously proposed setup. We then
extend this setup with Long Short-Term Memory (LSTM) networks, which have been
shown to outperform feed-forward setups on various tasks. Our best system
achieves an F1-score of 0.37 using power and pitch features. Adding linguistic
information via word2vec increases the score to 0.39.
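A minimal sketch of the idea: feed frame-wise acoustic features (power, pitch) through an LSTM and read out a backchannel-opportunity probability. The cell below is a textbook LSTM with random placeholder weights, not the paper's trained network.

```python
import numpy as np

rng = np.random.default_rng(1)
H, F = 8, 2                                  # hidden units, features (power, pitch)
Wx = rng.standard_normal((4 * H, F)) * 0.1
Wh = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)
w_out = rng.standard_normal(H) * 0.1

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c):
    """One LSTM step: input, forget, output gates and candidate cell."""
    z = Wx @ x + Wh @ h + b
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2*H]), sigmoid(z[2*H:3*H])
    g = np.tanh(z[3*H:])
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

h, c = np.zeros(H), np.zeros(H)
for frame in rng.standard_normal((50, F)):   # 50 frames of (power, pitch)
    h, c = lstm_step(frame, h, c)

p_bc = sigmoid(w_out @ h)                    # probability of a BC opportunity
print(round(float(p_bc), 3))
```

The recurrent state lets the predictor condition on the recent prosodic history rather than a single frame, which is what distinguishes this setup from the feed-forward baseline.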
Exploiting Cognitive Structure for Adaptive Learning
Adaptive learning, also known as adaptive teaching, relies on learning path
recommendation, which sequentially recommends personalized learning items
(e.g., lectures, exercises) to satisfy the unique needs of each learner.
Although it is well known that modeling the cognitive structure, including the
knowledge levels of learners and the knowledge structure (e.g., the
prerequisite relations) of learning items, is important for learning path
recommendation, existing methods for adaptive learning often focus separately
on either the knowledge levels of learners or the knowledge structure of
learning items. To fully exploit the multifaceted cognitive structure for
learning path recommendation, we propose a Cognitive Structure Enhanced
framework for Adaptive Learning, named CSEAL. By viewing path recommendation
as a Markov Decision Process and applying an actor-critic algorithm, CSEAL can
sequentially identify the right learning items for different learners.
Specifically, we first utilize a recurrent neural network to trace the
evolving knowledge levels of learners at each learning step. Then, we design a
navigation algorithm on the knowledge structure to ensure the logicality of
learning paths, which reduces the search space in the decision process.
Finally, the actor-critic algorithm, whose parameters are dynamically updated
along the learning path, is used to determine what to learn next. Extensive
experiments on real-world data demonstrate the effectiveness and robustness of
CSEAL.
Comment: Accepted by KDD 2019 Research Track. In Proceedings of the 25th ACM
SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD'19)
Deep Reinforcement Learning for Join Order Enumeration
Join order selection plays a significant role in query performance. However,
modern query optimizers typically employ static join enumeration algorithms
that do not receive any feedback about the quality of the resulting plan.
Hence, optimizers often repeatedly choose the same bad plan, as they do not
have a mechanism for "learning from their mistakes". In this paper, we argue
that existing deep reinforcement learning techniques can be applied to address
this challenge. These techniques, powered by artificial neural networks, can
automatically improve decision making by incorporating feedback from their
successes and failures. Towards this goal, we present ReJOIN, a
proof-of-concept join enumerator, and present preliminary results indicating
that ReJOIN can match or outperform the PostgreSQL optimizer in terms of plan
quality and join enumeration efficiency.
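The feedback loop the paragraph describes can be sketched with a toy bandit-style learner (costs, orders, and update rule are all hypothetical, not ReJOIN's neural formulation): each chosen join order returns a cost, the learner updates its estimate, and cheap orders are chosen increasingly often.

```python
import random

# Hypothetical per-order execution costs for three relations a, b, c.
COST = {("a", "b", "c"): 10, ("a", "c", "b"): 50, ("b", "a", "c"): 10,
        ("b", "c", "a"): 80, ("c", "a", "b"): 50, ("c", "b", "a"): 80}

value = {order: 0.0 for order in COST}   # running cost estimates
random.seed(0)
for _ in range(200):
    order = (random.choice(list(COST)) if random.random() < 0.3
             else min(value, key=value.get))            # explore vs. exploit
    value[order] += 0.1 * (COST[order] - value[order])  # feedback update

best = min(value, key=value.get)
print(best, round(value[best], 1))
```

A static enumerator would keep emitting whichever order its heuristics rank first; the learned estimates are the "mechanism for learning from mistakes" that the static approach lacks.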
ContextVP: Fully Context-Aware Video Prediction
Video prediction models based on convolutional networks, recurrent networks,
and their combinations often result in blurry predictions. We identify an
important contributing factor for imprecise predictions that has not been
studied adequately in the literature: blind spots, i.e., lack of access to all
relevant past information for accurately predicting the future. To address this
issue, we introduce a fully context-aware architecture that captures the entire
available past context for each pixel using Parallel Multi-Dimensional LSTM
units and aggregates it using blending units. Our model outperforms a strong
baseline network of 20 recurrent convolutional layers and yields
state-of-the-art performance for next step prediction on three challenging
real-world video datasets: Human 3.6M, Caltech Pedestrian, and UCF-101.
Moreover, it does so with fewer parameters than several recently proposed
models, and does not rely on deep convolutional networks, multi-scale
architectures, separation of background and foreground modeling, motion flow
learning, or adversarial training. These results highlight that full awareness
of past context is of crucial importance for video prediction.
Comment: 19 pages. ECCV 2018 oral presentation. Project webpage is at
https://wonmin-byeon.github.io/publication/2018-ecc
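The aggregation step can be made concrete with a small sketch (shapes and the softmax weighting are illustrative assumptions, not the paper's exact blending units): each directional recurrent pass summarizes the past context arriving from one direction, and a blending unit combines them per pixel with learned weights.

```python
import numpy as np

rng = np.random.default_rng(3)
H, W, C = 4, 4, 8

# Per-direction context maps, as if produced by four directional recurrent passes.
directions = {d: rng.standard_normal((H, W, C)) for d in "NSEW"}

w = rng.standard_normal(len(directions))
w = np.exp(w) / np.exp(w).sum()          # softmax blend weights, one per direction

# Blend: weighted sum of the directional context maps at every pixel.
blended = sum(wi * directions[d] for wi, d in zip(w, directions))
print(blended.shape)                     # (4, 4, 8)
```

Because every pixel's blended state draws on all four directional sweeps, no region of the past frame is a blind spot for the prediction at that pixel.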
The Evolution of Neural Network-Based Chart Patterns: A Preliminary Study
A neural network-based chart pattern represents adaptive parametric features,
including non-linear transformations, and a template that can be applied in
the feature space. The search for neural network-based chart patterns has
remained unexplored despite its potential expressiveness. In this paper, we
formulate a general chart pattern search problem to enable
cross-representational quantitative comparison of various search schemes. We
suggest a HyperNEAT framework applying state-of-the-art deep neural network
techniques to find attractive neural network-based chart patterns; these
techniques enable fast evaluation and search of robust patterns, as well as
bringing a performance gain. The proposed framework successfully found
attractive patterns on the Korean stock market. We compare the newly found
patterns with those found by different search schemes, showing that the
proposed approach has potential.
Comment: 8 pages. In proceedings of the Genetic and Evolutionary Computation
Conference (GECCO 2017), Berlin, Germany
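What a "neural network-based chart pattern" means can be illustrated in miniature (everything here is a hypothetical stand-in: random weights for the learned non-linear transformation, a random template, and an arbitrary threshold): a small network maps a raw price window into a feature vector, and the pattern "fires" when that vector lies close to the template.

```python
import numpy as np

rng = np.random.default_rng(4)
W = rng.standard_normal((4, 10)) * 0.3   # placeholder learned transformation
template = rng.standard_normal(4)        # placeholder pattern template

def pattern_match(window, threshold=2.5):
    """Non-linear transform, then template match in feature space."""
    feat = np.tanh(W @ window)
    return bool(np.linalg.norm(feat - template) < threshold)

window = rng.standard_normal(10)         # e.g. a normalized 10-day price window
print(pattern_match(window))
```

In the paper's setting, both the transformation and the template are evolved rather than hand-set, which is what makes the representation more expressive than fixed rule-based chart patterns.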
Evaluating surgical skills from kinematic data using convolutional neural networks
The need for automatic surgical skills assessment is increasing, especially
because manual feedback from senior surgeons observing junior surgeons is
subjective and time-consuming. Thus, automating surgical skills evaluation is
an important step towards improving surgical practice. In this paper, we
design a Convolutional Neural Network (CNN) to evaluate surgeon skill by
extracting patterns from the surgeon's motions in robotic surgery. The
proposed method is validated on the JIGSAWS dataset and achieves very
competitive results, with 100% accuracy on the suturing and needle passing
tasks. While we leverage the CNN's efficiency, we also mitigate its black-box
effect using class activation maps. This feature allows our method to
automatically highlight which parts of the surgical task influenced the skill
prediction, and can be used to explain the classification and to provide
personalized feedback to the trainee.
Comment: Accepted at MICCAI 201
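The class activation map idea is easy to sketch for a 1D motion sequence (shapes and weights are illustrative placeholders, not the trained network): with global average pooling followed by a linear classifier, the CAM is simply the predicted class's weights applied to the pre-pooling feature maps, giving a per-timestep importance score.

```python
import numpy as np

rng = np.random.default_rng(5)
T, C, K = 100, 16, 3                  # timesteps, channels, skill classes
feat = rng.standard_normal((T, C))    # last conv layer output over the sequence
w = rng.standard_normal((C, K))       # weights of the final linear layer

logits = feat.mean(axis=0) @ w        # global average pooling, then linear
pred = int(np.argmax(logits))         # predicted skill class

cam = feat @ w[:, pred]               # per-timestep contribution to the prediction
print(cam.shape)                      # (100,)
```

Peaks in `cam` mark the segments of the kinematic recording that drove the skill classification, which is the handle for the personalized feedback described above.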
Neural Collaborative Filtering
In recent years, deep neural networks have yielded immense success in speech
recognition, computer vision, and natural language processing. However, the
exploration of deep neural networks for recommender systems has received
relatively little scrutiny. In this work, we strive to develop techniques
based on neural networks to tackle the key problem in recommendation --
collaborative filtering -- on the basis of implicit feedback. Although some
recent work has employed deep learning for recommendation, it has primarily
been used to model auxiliary information, such as textual descriptions of
items and acoustic features of music. When it comes to modeling the key factor
in collaborative filtering -- the interaction between user and item features
-- existing approaches still resort to matrix factorization and apply an inner
product to the latent features of users and items. By replacing the inner
product with a neural architecture that can learn an arbitrary function from
data, we present a general framework named NCF, short for Neural
network-based Collaborative Filtering. NCF is generic and can express and
generalize matrix factorization under its framework. To supercharge NCF
modelling with non-linearities, we propose to leverage a multi-layer
perceptron to learn the user-item interaction function. Extensive experiments
on two real-world datasets show significant improvements of our proposed NCF
framework over the state-of-the-art methods. Empirical evidence shows that
using deeper layers of neural networks offers better recommendation
performance.
Comment: 10 pages, 7 figures
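The key substitution described above, in miniature: matrix factorization scores a user-item pair with an inner product of embeddings, while an NCF-style model feeds the concatenated embeddings through an MLP. All weights below are random placeholders rather than a trained model, and the single hidden layer is just a sketch of the paper's multi-layer design.

```python
import numpy as np

rng = np.random.default_rng(6)
d = 8
p_u = rng.standard_normal(d)                 # user latent vector
q_i = rng.standard_normal(d)                 # item latent vector

mf_score = p_u @ q_i                         # matrix factorization: inner product

W1 = rng.standard_normal((16, 2 * d)) * 0.1  # MLP replacing the inner product
W2 = rng.standard_normal(16) * 0.1
h = np.maximum(0, W1 @ np.concatenate([p_u, q_i]))   # ReLU hidden layer
ncf_score = 1 / (1 + np.exp(-(W2 @ h)))              # implicit-feedback probability
print(round(float(mf_score), 3), round(float(ncf_score), 3))
```

The inner product is one fixed bilinear interaction; the MLP can, in principle, learn an arbitrary interaction function, which is what lets NCF generalize matrix factorization.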
Recurrent Latent Variable Networks for Session-Based Recommendation
In this work, we attempt to ameliorate the impact of data sparsity in the
context of session-based recommendation. Specifically, we seek to devise a
machine learning mechanism capable of extracting subtle and complex underlying
temporal dynamics in the observed session data, so as to inform the
recommendation algorithm. To this end, we improve upon systems that utilize
deep learning techniques with recurrently connected units; we do so by adopting
concepts from the field of Bayesian statistics, namely variational inference.
Our proposed approach consists in treating the network recurrent units as
stochastic latent variables with a prior distribution imposed over them. On
this basis, we proceed to infer corresponding posteriors; these can be used for
prediction and recommendation generation, in a way that accounts for the
uncertainty in the available sparse training data. To allow for our approach to
easily scale to large real-world datasets, we perform inference under an
approximate amortized variational inference (AVI) setup, whereby the learned
posteriors are parameterized via (conventional) neural networks. We perform an
extensive experimental evaluation of our approach using challenging benchmark
datasets, and illustrate its superiority over existing state-of-the-art
techniques.
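The amortized inference step described above can be sketched as follows (an illustrative Gaussian posterior with random placeholder weights, not the authors' model): a network maps the observed input to the mean and log-variance of the posterior over the recurrent latent state, which is then sampled with the reparameterization trick.

```python
import numpy as np

rng = np.random.default_rng(7)
D, Z = 10, 4                                 # input dim, latent dim
W_mu = rng.standard_normal((Z, D)) * 0.1     # amortized posterior: mean head
W_lv = rng.standard_normal((Z, D)) * 0.1     # amortized posterior: log-variance head

def sample_latent(x):
    """Draw one sample from the inferred posterior via reparameterization."""
    mu, logvar = W_mu @ x, W_lv @ x
    eps = rng.standard_normal(Z)
    return mu + np.exp(0.5 * logvar) * eps

z = sample_latent(rng.standard_normal(D))    # stochastic recurrent state
print(z.shape)                               # (4,)
```

Because the posterior parameters come from a shared network rather than per-datapoint optimization, inference cost stays constant as the dataset grows, which is what makes the approach scale to large session logs.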
A Deep Learning Parameterization for Ozone Dry Deposition Velocities
The loss of ozone to terrestrial and aquatic systems, known as dry deposition,
is a highly uncertain process governed by turbulent transport, interfacial
chemistry, and plant physiology. We demonstrate the value of using Deep Neural
Networks (DNN) in predicting ozone dry deposition velocities. We find that a
feedforward DNN trained on observations from a coniferous forest site
(Hyytiala, Finland) can predict hourly ozone dry deposition velocities at a
mixed forest site (Harvard Forest, Massachusetts) more accurately than modern
theoretical models, with a reduction in the normalized mean bias (0.05 versus
~0.1). The same DNN model, when driven by assimilated meteorology at 2 degrees
x 2.5 degrees spatial resolution, outperforms the Wesely scheme as implemented
in the GEOS-Chem model. With more available training data from other climate
and ecological zones, this methodology could yield a generalizable DNN
suitable for global models.
Plain Language Summary: Ozone in the lower atmosphere is a toxic pollutant and
greenhouse gas. In this work, we use a machine learning technique known as
deep learning to simulate the loss of ozone to Earth's surface. We show that
our deep learning simulation of this loss process outperforms existing
traditional models, and demonstrate the opportunity for using machine learning
to improve our understanding of the chemical composition of the atmosphere.
Peer reviewed
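The evaluation metric quoted above, normalized mean bias, is easy to make concrete. A common definition is NMB = sum(pred - obs) / sum(obs); the paper may normalize slightly differently, and the velocity values below are hypothetical.

```python
import numpy as np

def normalized_mean_bias(pred, obs):
    """NMB: total signed error relative to the total observed value."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return (pred - obs).sum() / obs.sum()

obs = np.array([0.4, 0.5, 0.6, 0.5])     # hypothetical deposition velocities (cm/s)
pred = np.array([0.42, 0.55, 0.58, 0.50])
print(round(normalized_mean_bias(pred, obs), 3))   # 0.025
```

Unlike mean absolute error, NMB lets positive and negative errors cancel, so it measures systematic over- or under-prediction, which is the failure mode the comparison against the theoretical schemes targets.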