Deep Character-Level Click-Through Rate Prediction for Sponsored Search
Predicting the click-through rate of an advertisement is a critical component
of online advertising platforms. In sponsored search, the click-through rate
estimates the probability that a displayed advertisement is clicked by a user
after she submits a query to the search engine. Commercial search engines
typically rely on machine learning models trained with a large number of
features to make such predictions. This inevitably requires significant
engineering effort to define, compute, and select the appropriate features. In
this paper, we propose two novel approaches (one working at character level and
the other working at word level) that use deep convolutional neural networks to
predict the click-through rate of a query-advertisement pair. Specifically, the
proposed architectures only consider the textual content appearing in a
query-advertisement pair as input, and produce as output a click-through rate
prediction. By comparing the character-level model with the word-level model,
we show that language representation can be learnt from scratch at character
level when trained on enough data. Through extensive experiments using billions
of query-advertisement pairs of a popular commercial search engine, we
demonstrate that both approaches significantly outperform a baseline model
built on well-selected text features and a state-of-the-art word2vec-based
approach. Finally, by combining the predictions of the deep models introduced
in this study with the prediction of the model in production of the same
commercial search engine, we significantly improve the accuracy and the
calibration of the click-through rate prediction of the production system.
Comment: SIGIR 2017, 10 pages
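As a rough illustration of the kind of architecture the abstract describes, the sketch below embeds the query text and the ad text at character level with a small shared convolutional encoder and combines the two representations into a single click-probability estimate. The alphabet, maximum length, and layer sizes are assumptions made for illustration, not the configuration reported in the paper.

```python
# Minimal sketch of a character-level CNN for query-ad CTR prediction.
# The alphabet, max length, and layer sizes below are illustrative assumptions.
import torch
import torch.nn as nn

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 -"
MAX_LEN = 128  # characters kept per text field (query or ad)

class CharLevelCTR(nn.Module):
    def __init__(self, vocab_size=len(ALPHABET) + 1, emb_dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # One convolutional text encoder shared by the query and the ad.
        self.encoder = nn.Sequential(
            nn.Conv1d(emb_dim, 64, kernel_size=7),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.head = nn.Sequential(
            nn.Linear(2 * 64, 64),
            nn.ReLU(),
            nn.Linear(64, 1),  # logit of the click probability
        )

    def encode(self, char_ids):                    # (batch, MAX_LEN) int64
        x = self.embed(char_ids).transpose(1, 2)   # (batch, emb_dim, MAX_LEN)
        return self.encoder(x).squeeze(-1)         # (batch, 64)

    def forward(self, query_ids, ad_ids):
        z = torch.cat([self.encode(query_ids), self.encode(ad_ids)], dim=1)
        return torch.sigmoid(self.head(z)).squeeze(-1)  # predicted CTR
```

Training such a model would typically minimize binary cross-entropy between the predicted probability and the logged click label for each query-advertisement pair.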
Following the Money: How the 50 States Rate in Providing Online Access to Government Spending Data
Grades states' efforts to provide public spending data through Web portals; lists the benefits of "transparency 2.0," including cost-efficient and targeted spending; and outlines best practices for comprehensive, one-stop, one-click searchable sites.
You Must Have Clicked on this Ad by Mistake! Data-Driven Identification of Accidental Clicks on Mobile Ads with Applications to Advertiser Cost Discounting and Click-Through Rate Prediction
In the cost per click (CPC) pricing model, an advertiser pays an ad network
only when a user clicks on an ad; in turn, the ad network gives a share of that
revenue to the publisher where the ad was impressed. Still, advertisers may be
unsatisfied with ad networks charging them for "valueless" clicks, or so-called
accidental clicks. [...] Charging advertisers for such clicks is detrimental in
the long term as the advertiser may decide to run their campaigns on other ad
networks. In addition, machine-learned click models trained to predict which ad
will bring the highest revenue may overestimate an ad's click-through rate and,
as a consequence, negatively impact revenue for both the ad network and the
publisher. In this work, we propose a data-driven method to detect accidental
clicks from the perspective of the ad network. We collect observations of time
spent by users on a large set of ad landing pages - i.e., dwell time. We notice
that the majority of per-ad dwell time distributions are well fit by a mixture
of distributions, where each component may correspond to a particular type of
click, the first one being accidental. We then estimate dwell time thresholds
of accidental clicks from that component. Using our method to identify
accidental clicks, we then propose a technique that smoothly discounts the
advertiser's cost of accidental clicks at billing time. Experiments conducted
on a large dataset of ads served on Yahoo mobile apps confirm that our
thresholds are stable over time, and revenue loss in the short term is
marginal. We also compare the performance of an existing machine-learned click
model trained on all ad clicks with that of the same model trained only on
non-accidental clicks. There, we observe an increase in both ad click-through
rate (+3.9%) and revenue (+0.2%) on ads served by the Yahoo Gemini network when
using the latter. [...
Tuning an Online Shop: Consumer Reactions to E-tailers' Service Quality
This paper investigates the impact of service quality in e-tailing on site visits and consumer demand (approximated by the last-click-through concept). We use a large representative data set obtained from a price-comparison site which covers most of the national (Austrian) market on e-tailing. Customers' valuations for a broad range of 15 different service characteristics are condensed by factor analysis. Negative binomial regression analysis is used to measure the impact of principal factors for service quality on referral requests to online shops and last-click-throughs for different product categories.
Keywords: e-commerce, price comparison, horizontal service differentiation
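For illustration only, the regression step could look roughly like the following sketch, which fits a negative binomial model of last-click-through counts on service-quality factor scores with statsmodels. The data file, the factor column names, and the fixed dispersion parameter are hypothetical, not the paper's specification.

```python
# Sketch: negative binomial regression of shop-level last-click-through counts
# on service-quality factor scores. Data file, column names, and the fixed
# dispersion parameter alpha are hypothetical.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

shops = pd.read_csv("shops.csv")  # hypothetical: one row per shop and category

model = smf.glm(
    "last_click_throughs ~ factor_delivery + factor_usability + factor_support",
    data=shops,
    family=sm.families.NegativeBinomial(alpha=1.0),
).fit()
print(model.summary())
```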
Data Sketches for Disaggregated Subset Sum and Frequent Item Estimation
We introduce and study a new data sketch for processing massive datasets. It
addresses two common problems: 1) computing a sum given arbitrary filter
conditions and 2) identifying the frequent items or heavy hitters in a data
set. For the former, the sketch provides unbiased estimates with
state-of-the-art accuracy. It handles the challenging scenario when the data is
disaggregated so that computing the per unit metric of interest requires an
expensive aggregation. For example, the metric of interest may be total clicks
per user while the raw data is a click stream with multiple rows per user. Thus
the sketch is suitable for use in a wide range of applications including
computing historical click-through rates for ad prediction, reporting user
metrics from event streams, and measuring network traffic for IP flows.
We prove and empirically show the sketch has good properties for both the
disaggregated subset sum estimation and frequent item problems. On i.i.d. data,
it not only picks out the frequent items but gives strongly consistent
estimates for the proportion of each frequent item. The resulting sketch
asymptotically draws a probability proportional to size sample that is optimal
for estimating sums over the data. For non i.i.d. data, we show that it
typically does much better than random sampling for the frequent item problem
and never does worse. For subset sum estimation, we show that even for
pathological sequences, the variance is close to that of an optimal sampling
design. Empirically, despite the disadvantage of operating on disaggregated
data, our method matches or bests priority sampling, a state-of-the-art method
for pre-aggregated data, and performs orders of magnitude better than uniform
sampling on skewed data. We propose extensions to the sketch that allow it
to be used in combining multiple data sets, in distributed systems, and for
time-decayed aggregation.
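The abstract does not spell out the proposed sketch, but the priority-sampling baseline it is compared against can be written in a few lines: assign each aggregated (key, weight) pair the priority weight / U(0,1), keep the k largest priorities, and estimate any filtered subset sum from the retained items. The function names below are hypothetical, and the code illustrates only this pre-aggregated baseline, not the disaggregated method proposed in the paper.

```python
# Sketch of priority sampling over pre-aggregated (key, weight) pairs:
# the baseline named in the abstract, not the proposed sketch.
import heapq
import random

def priority_sample(weighted_items, k):
    """Keep the k highest-priority items; return (key, weight, tau) triples."""
    heap = []  # min-heap of (priority, key, weight); priority = weight / U(0,1)
    for key, weight in weighted_items:
        heapq.heappush(heap, (weight / random.random(), key, weight))
        if len(heap) > k + 1:
            heapq.heappop(heap)   # drop the lowest priority seen so far
    if len(heap) <= k:            # small input: keep everything, sums are exact
        return [(key, w, 0.0) for _, key, w in heap]
    tau = heap[0][0]              # the (k+1)-th largest priority
    return [(key, w, tau) for _, key, w in heap[1:]]

def estimate_subset_sum(sample, predicate):
    # Each retained item contributes max(weight, tau) when it passes the filter.
    return sum(max(w, tau) for key, w, tau in sample if predicate(key))
```

The comparison in the abstract hinges on the fact that this baseline needs per-key weights (for example, total clicks per user) to be aggregated up front, whereas the proposed sketch operates directly on the disaggregated click stream.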
