Tweet2Vec: Learning Tweet Embeddings Using Character-level CNN-LSTM Encoder-Decoder
We present Tweet2Vec, a novel method for generating general-purpose vector
representations of tweets. The model learns tweet embeddings using a
character-level CNN-LSTM encoder-decoder. We trained our model on 3 million
randomly selected English-language tweets. The model was evaluated using two
methods: tweet semantic similarity and tweet sentiment categorization,
outperforming the previous state-of-the-art in both tasks. The evaluations
demonstrate the power of the tweet embeddings generated by our model for
various tweet categorization tasks. The vector representations generated by our
model are generic, and hence can be applied to a variety of tasks. Though the
model presented in this paper is trained on English-language tweets, the method
presented can be used to learn tweet embeddings for different languages.
Comment: SIGIR 2016, July 17-21, 2016, Pisa. Proceedings of SIGIR 2016. Pisa,
Italy (2016
Distributed Deep Learning for Question Answering
This paper is an empirical study of distributed deep learning for two
question answering subtasks: answer selection and question classification.
We compare the SGD, MSGD, ADADELTA, ADAGRAD, ADAM/ADAMAX, RMSPROP,
DOWNPOUR and EASGD/EAMSGD algorithms. Experimental results
show that the distributed framework based on the message passing interface can
accelerate the convergence speed at a sublinear scale. This paper demonstrates
the importance of distributed training. For example, with 48 workers, a 24x
speedup is achievable for the answer selection task and running time is
decreased from 138.2 hours to 5.81 hours, which significantly increases
productivity.
Comment: This paper will appear in the Proceedings of the 25th ACM
International Conference on Information and Knowledge Management (CIKM 2016),
Indianapolis, US
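The speedup figure quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes that the 138.2-hour figure is the single-worker baseline, which the abstract implies but does not state; the function names are illustrative and do not come from the paper.

```python
def speedup(baseline_hours: float, distributed_hours: float) -> float:
    """Ratio of baseline running time to distributed running time."""
    return baseline_hours / distributed_hours


def parallel_efficiency(speedup_factor: float, workers: int) -> float:
    """Fraction of ideal linear speedup achieved (1.0 would be perfectly linear)."""
    return speedup_factor / workers


# Figures taken from the abstract; treating 138.2 h as the 1-worker run is an assumption.
s = speedup(138.2, 5.81)
e = parallel_efficiency(s, 48)
print(f"speedup = {s:.1f}x, efficiency = {e:.2f}")  # speedup = 23.8x, efficiency = 0.50
```

An efficiency of about 0.5 with 48 workers is consistent with the abstract's description of the acceleration as sublinear.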
Temporal Localization of Fine-Grained Actions in Videos by Domain Transfer from Web Images
We address the problem of fine-grained action localization from temporally
untrimmed web videos. We assume that only weak video-level annotations are
available for training. The goal is to use these weak labels to identify
temporal segments corresponding to the actions, and learn models that
generalize to unconstrained web videos. We find that web images queried by
action names serve as well-localized highlights for many actions, but are
noisily labeled. To solve this problem, we propose a simple yet effective
method that takes weak video labels and noisy image labels as input, and
generates localized action frames as output. This is achieved by cross-domain
transfer between video frames and web images, using pre-trained deep
convolutional neural networks. We then use the localized action frames to train
action recognition models with long short-term memory networks. We collect a
fine-grained sports action data set FGA-240 of more than 130,000 YouTube
videos. It has 240 fine-grained actions under 85 sports activities. Convincing
results are shown on the FGA-240 data set, as well as the THUMOS 2014
localization data set with untrimmed training videos.
Comment: Camera ready version for ACM Multimedia 201
A Hierarchical Recurrent Encoder-Decoder For Generative Context-Aware Query Suggestion
Users may strive to formulate an adequate textual query for their information
need. Search engines assist the users by presenting query suggestions. To
preserve the original search intent, suggestions should be context-aware and
account for the previous queries issued by the user. Achieving context
awareness is challenging due to data sparsity. We present a probabilistic
suggestion model that is able to account for sequences of previous queries of
arbitrary lengths. Our novel hierarchical recurrent encoder-decoder
architecture allows the model to be sensitive to the order of queries in the
context while avoiding data sparsity. Additionally, our model can suggest for
rare, or long-tail, queries. The produced suggestions are synthetic and are
sampled one word at a time, using computationally cheap decoding techniques.
This is in contrast to current synthetic suggestion models relying upon machine
learning pipelines and hand-engineered feature sets. Results show that it
outperforms existing context-aware approaches in a next query prediction
setting. In addition to query suggestion, our model is general enough to be
used in a variety of other applications.
Comment: To appear in Conference of Information Knowledge and Management
(CIKM) 201
Energy-based temporal neural networks for imputing missing values
Imputing missing values in high dimensional time series is a difficult problem. There have been some approaches to the problem [11,8] where neural architectures were trained as probabilistic models of the data. However, we argue that this approach is not optimal. We propose to view temporal neural networks with latent variables as energy-based models and train them for missing value recovery directly. In this paper we introduce two energy-based models. The first model is based on a one-dimensional convolution and the second model utilizes a recurrent neural network. We demonstrate how ideas from the energy-based learning framework can be used to train these models to recover missing values. The models are evaluated on a motion capture dataset.
The Case for Learned Index Structures
Indexes are models: a B-Tree-Index can be seen as a model to map a key to the
position of a record within a sorted array, a Hash-Index as a model to map a
key to a position of a record within an unsorted array, and a BitMap-Index as a
model to indicate if a data record exists or not. In this exploratory research
paper, we start from this premise and posit that all existing index structures
can be replaced with other types of models, including deep-learning models,
which we term learned indexes. The key idea is that a model can learn the sort
order or structure of lookup keys and use this signal to effectively predict
the position or existence of records. We theoretically analyze under which
conditions learned indexes outperform traditional index structures and describe
the main challenges in designing learned index structures. Our initial results
show that by using neural nets we are able to outperform cache-optimized
B-Trees by up to 70% in speed while saving an order of magnitude in memory over
several real-world data sets. More importantly though, we believe that the idea
of replacing core components of a data management system through learned models
has far reaching implications for future systems designs and that this work
just provides a glimpse of what might be possible.
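The key idea in this abstract, a model that predicts a record's position in a sorted array, can be illustrated with a toy sketch. The code below is not the paper's method: it swaps the neural nets mentioned in the abstract for a single least-squares line, and the class name `LinearLearnedIndex` and the error-bounded search window are assumptions made for illustration.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index: a fitted line maps a key to a position in a sorted list."""

    def __init__(self, sorted_keys):
        self.keys = list(sorted_keys)
        n = len(self.keys)
        if n == 0:
            raise ValueError("need at least one key")
        # Ordinary least-squares fit of position (0..n-1) against key value.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case prediction error over the stored keys bounds the search window.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key):
        """Predicted position, clamped to a valid index."""
        pos = round(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key, or None if it is not stored."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        # Binary search only inside the error window around the prediction.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


index = LinearLearnedIndex([2, 3, 5, 8, 13, 21, 34, 55, 89, 144])
print(index.lookup(13))  # 4
print(index.lookup(4))   # None
```

The better the model fits the key distribution, the smaller `max_err` becomes and the fewer elements the final search has to touch; a poor fit degrades toward an ordinary binary search over the whole array.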
Neural NILM: Deep Neural Networks Applied to Energy Disaggregation
Energy disaggregation estimates appliance-by-appliance electricity
consumption from a single meter that measures the whole home's electricity
demand. Recently, deep neural networks have driven remarkable improvements in
classification performance in neighbouring machine learning fields such as
image classification and automatic speech recognition. In this paper, we adapt
three deep neural network architectures to energy disaggregation: 1) a form of
recurrent neural network called 'long short-term memory' (LSTM); 2) denoising
autoencoders; and 3) a network which regresses the start time, end time and
average power demand of each appliance activation. We use seven metrics to test
the performance of these algorithms on real aggregate power data from five
appliances. Tests are performed against a house not seen during training and
against houses seen during training. We find that all three neural nets achieve
better F1 scores (averaged over all five appliances) than either combinatorial
optimisation or factorial hidden Markov models and that our neural net
algorithms generalise well to an unseen house.
Comment: To appear in ACM BuildSys'15, November 4-5, 2015, Seou
Neural Networks for Information Retrieval
Machine learning plays a role in many aspects of modern IR systems, and deep
learning is applied in all of them. The fast pace of modern-day research has
given rise to many different approaches for many different IR problems. The
amount of information available can be overwhelming both for junior students
and for experienced researchers looking for new research topics and directions.
Additionally, it is interesting to see what key insights into IR problems the
new technologies are able to give us. The aim of this full-day tutorial is to
give a clear overview of current tried-and-trusted neural methods in IR and how
they benefit IR research. It covers key architectures, as well as the most
promising future directions.
Comment: Overview of full-day tutorial at SIGIR 201