Distributed Online Big Data Classification Using Context Information
Distributed, online data mining systems have emerged as a result of
applications requiring analysis of large amounts of correlated and
high-dimensional data produced by multiple distributed data sources. We propose
a distributed online data classification framework where data is gathered by
distributed data sources and processed by a heterogeneous set of distributed
learners which learn online, at run-time, how to classify the different data
streams either by using their locally available classification functions or by
helping each other by classifying each other's data. Importantly, since the
data is gathered at different locations, sending the data to another learner to
process incurs additional costs such as delays, and hence is beneficial only
if the gain from a better classification exceeds these costs. We model the
problem of joint classification by the distributed and heterogeneous learners
from multiple data sources as a distributed contextual bandit problem where
each data instance is characterized by a specific context. We develop a
distributed online learning algorithm for which we can prove sublinear regret.
Compared to prior work in distributed online data mining, our work is the
first to provide analytic regret results characterizing the performance of the
proposed algorithm.
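The trade-off described in this abstract (classify locally, or pay a cost to have another learner classify the data) can be illustrated with a generic contextual bandit. The sketch below is not the authors' algorithm: it is a plain UCB rule over a uniform partition of a one-dimensional context space, and the class name `ContextualBandit`, the arm labels `"local"` and `"forward"`, and the `forward_cost` value are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class ContextualBandit:
    """UCB-style bandit over a uniform partition of a 1-D context in [0, 1]."""

    def __init__(self, arms, num_bins=4):
        self.arms = list(arms)
        self.num_bins = num_bins
        self.counts = defaultdict(int)    # (bin, arm) -> number of pulls
        self.means = defaultdict(float)   # (bin, arm) -> mean net reward
        self.rounds = 0

    def _bin(self, context):
        # Map a context in [0, 1] to one of num_bins equal-width cells.
        return min(int(context * self.num_bins), self.num_bins - 1)

    def select(self, context):
        b = self._bin(context)
        self.rounds += 1
        # Try every arm once in this cell before trusting the estimates.
        for arm in self.arms:
            if self.counts[(b, arm)] == 0:
                return arm

        def ucb(arm):
            n = self.counts[(b, arm)]
            bonus = math.sqrt(2.0 * math.log(self.rounds) / n)
            return self.means[(b, arm)] + bonus

        return max(self.arms, key=ucb)

    def update(self, context, arm, reward):
        key = (self._bin(context), arm)
        self.counts[key] += 1
        # Incremental update of the running mean.
        self.means[key] += (reward - self.means[key]) / self.counts[key]


def net_reward(correct, arm, forward_cost=0.2):
    # Classification reward minus the (assumed) cost of forwarding the data.
    return float(correct) - (forward_cost if arm == "forward" else 0.0)
```

In each round the learner would call `select(context)`, observe whether the resulting classification was correct, and pass `net_reward(...)` to `update`. Forwarding then wins only in context cells where its accuracy gain exceeds `forward_cost`.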
Representation Learning with Fine-grained Patterns
With the development of computational power and techniques for data
collection, deep learning demonstrates superior performance over most
existing algorithms on benchmark data sets. Many efforts have been devoted to
studying the mechanism of deep learning. One important observation is that deep
learning can learn discriminative patterns directly from raw data in a
task-dependent manner. Therefore, the representations obtained by deep learning
significantly outperform hand-crafted features. However, those patterns are
often learned from super-class labels due to a limited availability of
fine-grained labels, while fine-grained patterns are desired in many real-world
applications such as visual search in online shopping. To address this
challenge, we propose an algorithm that learns fine-grained patterns
sufficiently well when only super-class labels are available. The effectiveness
of our method is guaranteed by theoretical analysis. Extensive
experiments on real-world data sets demonstrate that the proposed method can
significantly improve the performance on target tasks corresponding to
fine-grained classes, when only super-class information is available for
training.
Cost-sensitive Learning for Utility Optimization in Online Advertising Auctions
One of the most challenging problems in computational advertising is the
prediction of click-through and conversion rates for bidding in online
advertising auctions. An unaddressed problem in previous approaches is the
existence of highly non-uniform misprediction costs. While for model evaluation
these costs have been taken into account through recently proposed
business-aware offline metrics -- such as the Utility metric which measures the
impact on advertiser profit -- this is not the case when training the models
themselves. In this paper, to bridge the gap, we formally analyze the
relationship between optimizing the Utility metric and the log loss, which is
considered one of the state-of-the-art approaches in conversion modeling.
Our analysis motivates the idea of weighting the log loss with the business
value of the predicted outcome. We present and analyze a new cost weighting
scheme and show that significant gains in offline and online performance can be
achieved.
Comment: The first version of the paper was presented at the NIPS 2015 Workshop
on E-Commerce (https://sites.google.com/site/nips15ecommerce/papers). The third
version of the paper will be presented at the AdKDD 2017 Workshop:
adkdd17.wixsite.com/adkddtargetad201
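The idea of weighting the log loss by business value can be sketched as follows. The abstract does not specify the paper's weighting scheme, so this is only a generic weighted log loss under stated assumptions: `weights` is a hypothetical per-example business value supplied by the caller, and normalizing by the sum of the weights is a choice made here.

```python
import math


def weighted_log_loss(labels, probs, weights, eps=1e-12):
    """Log loss in which each example counts in proportion to its weight."""
    total = 0.0
    for y, p, w in zip(labels, probs, weights):
        # Clip the prediction so that log() never receives 0.
        p = min(max(p, eps), 1.0 - eps)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    # Normalize by total weight (an assumption, not taken from the paper).
    return total / sum(weights)
```

With equal weights this reduces to the ordinary mean log loss. Raising the weight of an example makes a misprediction on it cost proportionally more during training.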
An Ensemble-based Approach to Click-Through Rate Prediction for Promoted Listings at Etsy
Etsy is a global marketplace where people across the world connect to make,
buy and sell unique goods. Sellers at Etsy can promote their product listings
via advertising campaigns similar to traditional sponsored search ads.
Click-Through Rate (CTR) prediction is an integral part of online search
advertising systems, where it serves as an input to the auctions that
determine the final ranking of promoted listings shown to a particular user for
each query. In this paper, we provide a holistic view of Etsy's promoted listings'
CTR prediction system and propose an ensemble learning approach which is based
on historical or behavioral signals for older listings as well as content-based
features for new listings. We obtain representations from texts and images by
utilizing state-of-the-art deep learning techniques and employ multimodal
learning to combine these different signals. We compare the system to
non-trivial baselines on a large-scale real-world dataset from Etsy,
demonstrating the effectiveness of the model and strong correlations between
offline experiments and online performance. The paper is also the first
technical overview of this kind of product in an e-commerce context.
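One simple way to picture combining historical signals for older listings with content-based predictions for new ones is a count-based blend. This is not Etsy's system, whose ensemble the abstract does not detail. The function `blended_ctr` and its `prior_strength` parameter are hypothetical, and `content_ctr` stands in for whatever a content-based model would predict.

```python
def blended_ctr(clicks, impressions, content_ctr, prior_strength=100.0):
    """Blend observed CTR with a content-based estimate.

    The weight on history grows with the number of impressions, so a new
    listing falls back to the content-based prediction.
    """
    if clicks < 0 or impressions < 0 or clicks > impressions:
        raise ValueError("need 0 <= clicks <= impressions")
    weight = impressions / (impressions + prior_strength)
    historical = clicks / impressions if impressions > 0 else 0.0
    return weight * historical + (1.0 - weight) * content_ctr
```

A listing with no impressions gets exactly `content_ctr`, while a listing with many impressions is dominated by its observed click rate.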