Accelerating recurrent neural network training using sequence bucketing and multi-GPU data parallelization
An efficient algorithm for recurrent neural network training is presented.
The approach increases the training speed for tasks where the length of the input
sequence may vary significantly. The proposed approach is based on the optimal
batch bucketing by input sequence length and data parallelization on multiple
graphical processing units. The baseline training performance without sequence
bucketing is compared with the proposed solution for different numbers of
buckets. An example is given for the online handwriting recognition task using
an LSTM recurrent neural network. The evaluation is performed in terms of the
wall clock time, number of epochs, and validation loss value.
Comment: 4 pages, 5 figures, Comments, 2016 IEEE First International
Conference on Data Stream Mining & Processing (DSMP), Lviv, 201
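The bucketing idea in this abstract can be illustrated with a minimal sketch. The abstract does not specify how bucket boundaries are chosen, so the equal-count split below is an assumption, and the names `bucket_by_length` and `padding_cost` are hypothetical helpers, not the authors' code. The sketch only shows why grouping sequences of similar length reduces the padding that each batch carries.

```python
def bucket_by_length(sequences, num_buckets):
    """Split sequences into at most num_buckets groups of similar length.

    Assumption: an equal-count split over the length-sorted list. The paper's
    own rule for choosing bucket boundaries may differ.
    """
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    if not sequences:
        return []
    ordered = sorted(sequences, key=len)
    size = -(-len(ordered) // num_buckets)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def padding_cost(buckets):
    """Count padded positions when each bucket is padded to its longest sequence."""
    return sum(
        len(bucket) * max(len(seq) for seq in bucket) - sum(len(seq) for seq in bucket)
        for bucket in buckets
    )


# Toy data: six sequences whose lengths vary widely.
seqs = [[0] * n for n in (2, 9, 3, 10, 1, 8)]
baseline = padding_cost(bucket_by_length(seqs, 1))  # one bucket, i.e. no bucketing
bucketed = padding_cost(bucket_by_length(seqs, 2))  # two length-based buckets
print(baseline, bucketed)
```

Fewer padded positions mean less wasted computation per batch, which is the effect the abstract attributes to bucketing. Multi-GPU data parallelization is outside the scope of this sketch.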
Deep Character-Level Click-Through Rate Prediction for Sponsored Search
Predicting the click-through rate of an advertisement is a critical component
of online advertising platforms. In sponsored search, the click-through rate
estimates the probability that a displayed advertisement is clicked by a user
after she submits a query to the search engine. Commercial search engines
typically rely on machine learning models trained with a large number of
features to make such predictions. This inevitably requires a lot of
engineering effort to define, compute, and select the appropriate features. In
this paper, we propose two novel approaches (one working at character level and
the other working at word level) that use deep convolutional neural networks to
predict the click-through rate of a query-advertisement pair. Specifically, the
proposed architectures only consider the textual content appearing in a
query-advertisement pair as input, and produce as output a click-through rate
prediction. By comparing the character-level model with the word-level model,
we show that language representation can be learnt from scratch at character
level when trained on enough data. Through extensive experiments using billions
of query-advertisement pairs of a popular commercial search engine, we
demonstrate that both approaches significantly outperform a baseline model
built on well-selected text features and a state-of-the-art word2vec-based
approach. Finally, by combining the predictions of the deep models introduced
in this study with the prediction of the model in production of the same
commercial search engine, we significantly improve the accuracy and the
calibration of the click-through rate prediction of the production system.
Comment: SIGIR2017, 10 page
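To make the character-level input concrete, here is a minimal sketch of how a query or advertisement text could be turned into a fixed-length sequence of character indices before being fed to a convolutional network. The alphabet, the use of index 0 for both padding and unknown characters, and the helper name `encode_chars` are all assumptions for illustration; the abstract does not describe the authors' actual encoding.

```python
import string

# Assumed alphabet: lowercase letters, digits, and space.
ALPHABET = string.ascii_lowercase + string.digits + " "
# Index 0 is reserved for padding and for characters outside the alphabet.
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode_chars(text, max_len):
    """Map text to exactly max_len character ids, truncating or zero-padding."""
    ids = [CHAR_TO_ID.get(ch, 0) for ch in text.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))


# A query and an advertisement title are encoded independently.
query_ids = encode_chars("cheap flights", 16)
ad_ids = encode_chars("Book Flights Online!", 16)
```

A word-level variant would map whole tokens to ids instead, which requires a vocabulary built in advance; the character-level form needs only the small fixed alphabet.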
Deep Interest Evolution Network for Click-Through Rate Prediction
Click-through rate (CTR) prediction, whose goal is to estimate the
probability that a user clicks, has become one of the core tasks in advertising
systems. For a CTR prediction model, it is necessary to capture the latent user
interest behind the user behavior data. Moreover, as the external environment
and the user's internal cognition change, user interest evolves dynamically over
time. Several CTR prediction methods model interest, but most of them regard
the representation of behavior as the interest directly and lack dedicated
modeling of the latent interest behind the concrete behavior. Moreover, few
works consider the changing trend of interest.
In this paper, we propose a novel model, named Deep Interest Evolution
Network (DIEN), for CTR prediction. Specifically, we design an interest extractor
layer to capture temporal interests from the history behavior sequence. At this
layer, we introduce an auxiliary loss to supervise interest extraction at each
step. As user interests are diverse, especially in e-commerce systems, we
propose an interest evolving layer to capture the interest evolving process that is
relative to the target item. At the interest evolving layer, an attention mechanism is
embedded into the sequential structure in a novel way, and the effects of relative
interests are strengthened during interest evolution. In the experiments on
both public and industrial datasets, DIEN significantly outperforms the
state-of-the-art solutions. Notably, DIEN has been deployed in the display
advertisement system of Taobao, and obtained 20.7% improvement on CTR.
Comment: 9 pages. Accepted by AAAI 201
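One way to picture attention embedded into a recurrent update is to let an attention score scale the update gate of a GRU-style step, so that behaviors less relevant to the target item change the hidden state less. The abstract does not state the update rule, so the sketch below is an assumption about the general form, and `attention_gated_update` is a hypothetical helper. The gate and candidate-state values are taken as given rather than computed from learned weights.

```python
def attention_gated_update(h_prev, h_cand, update_gate, attention):
    """Blend the previous and candidate hidden states element-wise.

    The scalar attention score (assumed to lie in [0, 1]) scales each
    update-gate value before the usual GRU-style interpolation.
    """
    h_next = []
    for hp, hc, u in zip(h_prev, h_cand, update_gate):
        scaled_u = attention * u  # attention-scaled update gate
        h_next.append((1.0 - scaled_u) * hp + scaled_u * hc)
    return h_next


h_prev = [1.0, 2.0]
h_cand = [3.0, 4.0]
gate = [0.5, 1.0]

# With zero attention the step is skipped and the state is unchanged.
unchanged = attention_gated_update(h_prev, h_cand, gate, 0.0)
# With full attention the step reduces to a plain gated interpolation.
full = attention_gated_update(h_prev, h_cand, gate, 1.0)
```

In a full model the attention score would come from comparing each behavior's state with the target item, and the gate and candidate state from learned parameters; both are omitted here.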
Network On Network for Tabular Data Classification in Real-world Applications
Tabular data is the most common data format adopted by our customers, ranging
from retail and finance to e-commerce, and tabular data classification plays an
essential role in their businesses. In this paper, we present Network On
Network (NON), a practical tabular data classification model based on deep
neural network to provide accurate predictions. Various deep methods have been
proposed and promising progress has been made. However, most of them use
operations like neural network and factorization machines to fuse the
embeddings of different features directly, and linearly combine the outputs of
those operations to get the final prediction. As a result, the intra-field
information and the non-linear interactions between those operations (e.g.
neural network and factorization machines) are ignored. Intra-field information
is the information that features inside each field belong to the same field.
NON is proposed to take full advantage of intra-field information and
non-linear interactions. It consists of three components: a field-wise network at
the bottom to capture the intra-field information, an across-field network in the
middle to choose suitable operations in a data-driven way, and an operation fusion
network on the top to deeply fuse the outputs of the chosen operations. Extensive
experiments on six real-world datasets demonstrate that NON can outperform the
state-of-the-art models significantly. Furthermore, both qualitative and
quantitative studies of the features in the embedding space show that NON can
capture intra-field information effectively.