336 research outputs found
Yeah, Right, Uh-Huh: A Deep Learning Backchannel Predictor
Using supporting backchannel (BC) cues can make human-computer interaction
more social. BCs provide a feedback from the listener to the speaker indicating
to the speaker that he is still listened to. BCs can be expressed in different
ways, depending on the modality of the interaction, for example as gestures or
acoustic cues. In this work, we only considered acoustic cues. We are proposing
an approach towards detecting BC opportunities based on acoustic input features
like power and pitch. While other works in the field rely on the use of a
hand-written rule set or specialized features, we made use of artificial neural
networks. They are capable of deriving higher order features from input
features themselves. In our setup, we first used a fully connected feed-forward
network to establish an updated baseline in comparison to our previously
proposed setup. We also extended this setup by the use of Long Short-Term
Memory (LSTM) networks which have shown to outperform feed-forward based setups
on various tasks. Our best system achieved an F1-Score of 0.37 using power and
pitch features. Adding linguistic information using word2vec, the score
increased to 0.39
Exploiting Cognitive Structure for Adaptive Learning
Adaptive learning, also known as adaptive teaching, relies on learning path
recommendation, which sequentially recommends personalized learning items
(e.g., lectures, exercises) to satisfy the unique needs of each learner.
Although it is well known that modeling the cognitive structure including
knowledge level of learners and knowledge structure (e.g., the prerequisite
relations) of learning items is important for learning path recommendation,
existing methods for adaptive learning often separately focus on either
knowledge levels of learners or knowledge structure of learning items. To fully
exploit the multifaceted cognitive structure for learning path recommendation,
we propose a Cognitive Structure Enhanced framework for Adaptive Learning,
named CSEAL. By viewing path recommendation as a Markov Decision Process and
applying an actor-critic algorithm, CSEAL can sequentially identify the right
learning items to different learners. Specifically, we first utilize a
recurrent neural network to trace the evolving knowledge levels of learners at
each learning step. Then, we design a navigation algorithm on the knowledge
structure to ensure the logicality of learning paths, which reduces the search
space in the decision process. Finally, the actor-critic algorithm is used to
determine what to learn next and whose parameters are dynamically updated along
the learning path. Extensive experiments on real-world data demonstrate the
effectiveness and robustness of CSEAL.Comment: Accepted by KDD 2019 Research Track. In Proceedings of the 25th ACM
SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD'19
Single Shot Temporal Action Detection
Temporal action detection is a very important yet challenging problem, since
videos in real applications are usually long, untrimmed and contain multiple
action instances. This problem requires not only recognizing action categories
but also detecting start time and end time of each action instance. Many
state-of-the-art methods adopt the "detection by classification" framework:
first do proposal, and then classify proposals. The main drawback of this
framework is that the boundaries of action instance proposals have been fixed
during the classification step. To address this issue, we propose a novel
Single Shot Action Detector (SSAD) network based on 1D temporal convolutional
layers to skip the proposal generation step via directly detecting action
instances in untrimmed video. On pursuit of designing a particular SSAD network
that can work effectively for temporal action detection, we empirically search
for the best network architecture of SSAD due to lacking existing models that
can be directly adopted. Moreover, we investigate into input feature types and
fusion strategies to further improve detection accuracy. We conduct extensive
experiments on two challenging datasets: THUMOS 2014 and MEXaction2. When
setting Intersection-over-Union threshold to 0.5 during evaluation, SSAD
significantly outperforms other state-of-the-art systems by increasing mAP from
19.0% to 24.6% on THUMOS 2014 and from 7.4% to 11.0% on MEXaction2.Comment: ACM Multimedia 201
ContextVP: Fully Context-Aware Video Prediction
Video prediction models based on convolutional networks, recurrent networks,
and their combinations often result in blurry predictions. We identify an
important contributing factor for imprecise predictions that has not been
studied adequately in the literature: blind spots, i.e., lack of access to all
relevant past information for accurately predicting the future. To address this
issue, we introduce a fully context-aware architecture that captures the entire
available past context for each pixel using Parallel Multi-Dimensional LSTM
units and aggregates it using blending units. Our model outperforms a strong
baseline network of 20 recurrent convolutional layers and yields
state-of-the-art performance for next step prediction on three challenging
real-world video datasets: Human 3.6M, Caltech Pedestrian, and UCF-101.
Moreover, it does so with fewer parameters than several recently proposed
models, and does not rely on deep convolutional networks, multi-scale
architectures, separation of background and foreground modeling, motion flow
learning, or adversarial training. These results highlight that full awareness
of past context is of crucial importance for video prediction.Comment: 19 pages. ECCV 2018 oral presentation. Project webpage is at
https://wonmin-byeon.github.io/publication/2018-ecc
Deep Reinforcement Learning for Join Order Enumeration
Join order selection plays a significant role in query performance. However,
modern query optimizers typically employ static join enumeration algorithms
that do not receive any feedback about the quality of the resulting plan.
Hence, optimizers often repeatedly choose the same bad plan, as they do not
have a mechanism for "learning from their mistakes". In this paper, we argue
that existing deep reinforcement learning techniques can be applied to address
this challenge. These techniques, powered by artificial neural networks, can
automatically improve decision making by incorporating feedback from their
successes and failures. Towards this goal, we present ReJOIN, a
proof-of-concept join enumerator, and present preliminary results indicating
that ReJOIN can match or outperform the PostgreSQL optimizer in terms of plan
quality and join enumeration efficiency
The Evolution of Neural Network-Based Chart Patterns: A Preliminary Study
A neural network-based chart pattern represents adaptive parametric features,
including non-linear transformations, and a template that can be applied in the
feature space. The search of neural network-based chart patterns has been
unexplored despite its potential expressiveness. In this paper, we formulate a
general chart pattern search problem to enable cross-representational
quantitative comparison of various search schemes. We suggest a HyperNEAT
framework applying state-of-the-art deep neural network techniques to find
attractive neural network-based chart patterns; These techniques enable a fast
evaluation and search of robust patterns, as well as bringing a performance
gain. The proposed framework successfully found attractive patterns on the
Korean stock market. We compared newly found patterns with those found by
different search schemes, showing the proposed approach has potential.Comment: 8 pages, In proceedings of Genetic and Evolutionary Computation
Conference (GECCO 2017), Berlin, German
Recurrent Latent Variable Networks for Session-Based Recommendation
In this work, we attempt to ameliorate the impact of data sparsity in the
context of session-based recommendation. Specifically, we seek to devise a
machine learning mechanism capable of extracting subtle and complex underlying
temporal dynamics in the observed session data, so as to inform the
recommendation algorithm. To this end, we improve upon systems that utilize
deep learning techniques with recurrently connected units; we do so by adopting
concepts from the field of Bayesian statistics, namely variational inference.
Our proposed approach consists in treating the network recurrent units as
stochastic latent variables with a prior distribution imposed over them. On
this basis, we proceed to infer corresponding posteriors; these can be used for
prediction and recommendation generation, in a way that accounts for the
uncertainty in the available sparse training data. To allow for our approach to
easily scale to large real-world datasets, we perform inference under an
approximate amortized variational inference (AVI) setup, whereby the learned
posteriors are parameterized via (conventional) neural networks. We perform an
extensive experimental evaluation of our approach using challenging benchmark
datasets, and illustrate its superiority over existing state-of-the-art
techniques
Evaluating surgical skills from kinematic data using convolutional neural networks
The need for automatic surgical skills assessment is increasing, especially
because manual feedback from senior surgeons observing junior surgeons is prone
to subjectivity and time consuming. Thus, automating surgical skills evaluation
is a very important step towards improving surgical practice. In this paper, we
designed a Convolutional Neural Network (CNN) to evaluate surgeon skills by
extracting patterns in the surgeon motions performed in robotic surgery. The
proposed method is validated on the JIGSAWS dataset and achieved very
competitive results with 100% accuracy on the suturing and needle passing
tasks. While we leveraged from the CNNs efficiency, we also managed to mitigate
its black-box effect using class activation map. This feature allows our method
to automatically highlight which parts of the surgical task influenced the
skill prediction and can be used to explain the classification and to provide
personalized feedback to the trainee.Comment: Accepted at MICCAI 201
Neural Collaborative Filtering
In recent years, deep neural networks have yielded immense success on speech
recognition, computer vision and natural language processing. However, the
exploration of deep neural networks on recommender systems has received
relatively less scrutiny. In this work, we strive to develop techniques based
on neural networks to tackle the key problem in recommendation -- collaborative
filtering -- on the basis of implicit feedback. Although some recent work has
employed deep learning for recommendation, they primarily used it to model
auxiliary information, such as textual descriptions of items and acoustic
features of musics. When it comes to model the key factor in collaborative
filtering -- the interaction between user and item features, they still
resorted to matrix factorization and applied an inner product on the latent
features of users and items. By replacing the inner product with a neural
architecture that can learn an arbitrary function from data, we present a
general framework named NCF, short for Neural network-based Collaborative
Filtering. NCF is generic and can express and generalize matrix factorization
under its framework. To supercharge NCF modelling with non-linearities, we
propose to leverage a multi-layer perceptron to learn the user-item interaction
function. Extensive experiments on two real-world datasets show significant
improvements of our proposed NCF framework over the state-of-the-art methods.
Empirical evidence shows that using deeper layers of neural networks offers
better recommendation performance.Comment: 10 pages, 7 figure
Variational Deep Semantic Hashing for Text Documents
As the amount of textual data has been rapidly increasing over the past
decade, efficient similarity search methods have become a crucial component of
large-scale information retrieval systems. A popular strategy is to represent
original data samples by compact binary codes through hashing. A spectrum of
machine learning methods have been utilized, but they often lack expressiveness
and flexibility in modeling to learn effective representations. The recent
advances of deep learning in a wide range of applications has demonstrated its
capability to learn robust and powerful feature representations for complex
data. Especially, deep generative models naturally combine the expressiveness
of probabilistic generative models with the high capacity of deep neural
networks, which is very suitable for text modeling. However, little work has
leveraged the recent progress in deep learning for text hashing.
In this paper, we propose a series of novel deep document generative models
for text hashing. The first proposed model is unsupervised while the second one
is supervised by utilizing document labels/tags for hashing. The third model
further considers document-specific factors that affect the generation of
words. The probabilistic generative formulation of the proposed models provides
a principled framework for model extension, uncertainty estimation, simulation,
and interpretability. Based on variational inference and reparameterization,
the proposed models can be interpreted as encoder-decoder deep neural networks
and thus they are capable of learning complex nonlinear distributed
representations of the original documents. We conduct a comprehensive set of
experiments on four public testbeds. The experimental results have demonstrated
the effectiveness of the proposed supervised learning models for text hashing.Comment: 11 pages, 4 figure
- …
