54 research outputs found
Interpretable Predictions of Tree-based Ensembles via Actionable Feature Tweaking
Machine-learned models are often described as "black boxes". In many
real-world applications however, models may have to sacrifice predictive power
in favour of human-interpretability. When this is the case, feature engineering
becomes a crucial task, which requires significant and time-consuming human
effort. Whilst some features are inherently static, representing properties
that cannot be influenced (e.g., the age of an individual), others capture
characteristics that could be adjusted (e.g., the daily amount of carbohydrates
consumed). Nonetheless, once a model is learned from the data, each prediction
it makes on a new instance is treated as irreversible, as if every instance
were a static point in the chosen feature space. There are many circumstances, however,
where it is important to understand (i) why a model outputs a certain
prediction on a given instance, (ii) which adjustable features of that instance
should be modified, and finally (iii) how to alter such a prediction when the
mutated instance is input back to the model. In this paper, we present a
technique that exploits the internals of a tree-based ensemble classifier to
offer recommendations for transforming true negative instances into positively
predicted ones. We demonstrate the validity of our approach using an online
advertising application. First, we design a Random Forest classifier that
effectively separates between two types of ads: low (negative) and high
(positive) quality ads (instances). Then, we introduce an algorithm that
provides recommendations that aim to transform a low quality ad (negative
instance) into a high quality one (positive instance). Finally, we evaluate our
approach on a subset of the active inventory of a large ad network, Yahoo
Gemini.
Comment: 10 pages, KDD 201
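The core idea, enumerating the positive decision paths of a tree and computing the cheapest feature tweak that satisfies one of them, can be sketched on a single hand-built tree. Everything below (the dict-based tree encoding, the `EPS` margin, the L2 cost) is an illustrative assumption, not the paper's actual algorithm or API; the full method aggregates such tweaks across all trees of the ensemble.

```python
# Minimal sketch of actionable feature tweaking on one decision tree.
# Internal nodes: {"feat": i, "thr": t, "left": ..., "right": ...}
# where "left" means feature i <= t; leaves: {"label": 0 or 1}.

EPS = 0.1  # how far past a threshold a tweaked feature is moved

def positive_paths(node, conds=()):
    """Yield the (feat, op, thr) conditions along each positive-leaf path."""
    if "label" in node:
        if node["label"] == 1:
            yield list(conds)
        return
    f, t = node["feat"], node["thr"]
    yield from positive_paths(node["left"],  conds + ((f, "<=", t),))
    yield from positive_paths(node["right"], conds + ((f, ">",  t),))

def tweak(x, path):
    """Return the minimal modification of x satisfying every condition."""
    x2 = list(x)
    for f, op, t in path:
        if op == "<=" and x2[f] > t:
            x2[f] = t - EPS
        elif op == ">" and x2[f] <= t:
            x2[f] = t + EPS
    return x2

def best_tweak(x, tree):
    """Among all positive paths, pick the tweak closest to x (L2 cost)."""
    cands = [tweak(x, p) for p in positive_paths(tree)]
    cost = lambda z: sum((a - b) ** 2 for a, b in zip(x, z))
    return min(cands, key=cost)

# Toy tree: positive iff feature 0 > 5 and feature 1 <= 3
tree = {"feat": 0, "thr": 5,
        "left": {"label": 0},
        "right": {"feat": 1, "thr": 3,
                  "left": {"label": 1},
                  "right": {"label": 0}}}

x = [4.0, 2.0]              # negative instance
print(best_tweak(x, tree))  # nudges feature 0 just past its threshold
```

Only adjustable features would be allowed to move in practice; static ones (e.g., age) would be pinned, making a path infeasible if it constrains them.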
You Must Have Clicked on this Ad by Mistake! Data-Driven Identification of Accidental Clicks on Mobile Ads with Applications to Advertiser Cost Discounting and Click-Through Rate Prediction
In the cost per click (CPC) pricing model, an advertiser pays an ad network
only when a user clicks on an ad; in turn, the ad network gives a share of that
revenue to the publisher where the ad was impressed. Still, advertisers may be
unsatisfied with ad networks charging them for "valueless" clicks, or so-called
accidental clicks. [...] Charging advertisers for such clicks is detrimental in
the long term as the advertiser may decide to run their campaigns on other ad
networks. In addition, machine-learned click models trained to predict which ad
will bring the highest revenue may overestimate an ad's click-through rate
and, as a consequence, negatively impact revenue for both the ad network and the
publisher. In this work, we propose a data-driven method to detect accidental
clicks from the perspective of the ad network. We collect observations of time
spent by users on a large set of ad landing pages - i.e., dwell time. We notice
that the majority of per-ad distributions of dwell time fit a mixture of
distributions, where each component may correspond to a particular type of
click, the first one capturing accidental clicks. We then estimate dwell time thresholds
of accidental clicks from that component. Using our method to identify
accidental clicks, we then propose a technique that smoothly discounts the
advertiser's cost of accidental clicks at billing time. Experiments conducted
on a large dataset of ads served on Yahoo mobile apps confirm that our
thresholds are stable over time, and revenue loss in the short term is
marginal. We also compare the performance of an existing machine-learned click
model trained on all ad clicks with that of the same model trained only on
non-accidental clicks. There, we observe an increase in both ad click-through
rate (+3.9%) and revenue (+0.2%) on ads served by the Yahoo Gemini network when
using the latter. [...]
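The thresholding step described above can be sketched as follows: fit a two-component Gaussian mixture to log dwell times with a hand-rolled EM and read a cutoff off the short-dwell ("accidental") component. The synthetic data, the mean-plus-two-standard-deviations cutoff rule, and all names are illustrative assumptions, not the paper's method.

```python
import math, random

def pdf(x, mu, var):
    """Gaussian density."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm2(xs, iters=200):
    """EM for a 1-D two-component Gaussian mixture."""
    mu = [min(xs), max(xs)]
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of component 0 for each point
        r0 = []
        for x in xs:
            p0 = w[0] * pdf(x, mu[0], var[0])
            p1 = w[1] * pdf(x, mu[1], var[1])
            r0.append(p0 / (p0 + p1))
        # M-step: re-estimate means, variances, and weights
        n0 = sum(r0)
        n1 = len(xs) - n0
        mu[0] = sum(r * x for r, x in zip(r0, xs)) / n0
        mu[1] = sum((1 - r) * x for r, x in zip(r0, xs)) / n1
        var[0] = max(sum(r * (x - mu[0]) ** 2 for r, x in zip(r0, xs)) / n0, 1e-6)
        var[1] = max(sum((1 - r) * (x - mu[1]) ** 2 for r, x in zip(r0, xs)) / n1, 1e-6)
        w = [n0 / len(xs), n1 / len(xs)]
    return mu, var, w

random.seed(0)
# Synthetic log dwell times (seconds): short accidental visits vs. real reads
log_dwell = ([random.gauss(0.5, 0.3) for _ in range(300)] +   # ~1.6 s
             [random.gauss(3.0, 0.5) for _ in range(700)])    # ~20 s
mu, var, w = fit_gmm2(log_dwell)
short = min(range(2), key=lambda k: mu[k])
threshold = math.exp(mu[short] + 2 * math.sqrt(var[short]))
print("accidental-click dwell threshold ~ %.1f s" % threshold)
```

Clicks whose dwell time falls below such a threshold would then be candidates for cost discounting or for exclusion from click-model training data.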
Community Membership Hiding as Counterfactual Graph Search via Deep Reinforcement Learning
Community detection techniques are useful tools for social media platforms to
discover tightly connected groups of users who share common interests. However,
this functionality often comes at the expense of potentially exposing
individuals to privacy breaches by inadvertently revealing their tastes or
preferences. Therefore, some users may wish to safeguard their anonymity and
opt out of community detection for various reasons, such as affiliation with
political or religious organizations.
In this study, we address the challenge of community membership hiding, which
involves strategically altering the structural properties of a network graph to
prevent one or more nodes from being identified by a given community detection
algorithm. We tackle this problem by formulating it as a constrained
counterfactual graph objective, and we solve it via deep reinforcement
learning. We validate the effectiveness of our method through two distinct
tasks: node and community deception. Extensive experiments show that our
approach overall outperforms existing baselines in both tasks.
MUSTACHE: Multi-Step-Ahead Predictions for Cache Eviction
In this work, we propose MUSTACHE, a new page cache replacement algorithm
whose logic is learned from observed memory access requests rather than fixed
like existing policies. We formulate the page request prediction problem as a
categorical time series forecasting task. Then, our method queries the learned
page request forecaster to obtain the next predicted page memory references
to better approximate the optimal Bélády replacement algorithm. We
implement several forecasting techniques using advanced deep learning
architectures and integrate the best-performing one into an existing
open-source cache simulator. Experiments run on benchmark datasets show that
MUSTACHE outperforms the best page replacement heuristic (i.e., exact LRU),
improving the cache hit ratio by 1.9% and reducing the number of reads/writes
required to handle cache misses by 18.4% and 10.3%, respectively.
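A minimal sketch of the eviction rule: Bélády's policy evicts the page whose next use lies farthest in the future. MUSTACHE approximates that future with a learned multi-step forecaster; the toy code below replaces the forecaster with an oracle that simply peeks at the next few requests of the true trace, just to show the eviction logic. Names and the trace are illustrative.

```python
def belady_evict(cache, future):
    """Evict the cached page whose next predicted use is farthest away."""
    def next_use(page):
        try:
            return future.index(page)
        except ValueError:
            return float("inf")  # never referenced again: ideal victim
    return max(cache, key=next_use)

def simulate(trace, capacity, lookahead=8):
    """Run the cache over a request trace; return the hit ratio."""
    cache, hits = set(), 0
    for i, page in enumerate(trace):
        if page in cache:
            hits += 1
            continue
        if len(cache) >= capacity:
            # "forecast" = the next `lookahead` requests after this one
            cache.discard(belady_evict(cache, trace[i + 1 : i + 1 + lookahead]))
        cache.add(page)
    return hits / len(trace)

trace = [1, 2, 3, 1, 2, 4, 1, 2, 3, 4, 1, 2]
print("hit ratio: %.2f" % simulate(trace, capacity=3))
```

Swapping the oracle for a trained categorical time-series forecaster, as the paper does, keeps the eviction rule unchanged; only the source of the predicted references differs.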
Discovering Europeana users’ search behavior
Europeana is a strategic project funded by the European Commission with the goal of making Europe's cultural and scientific heritage accessible to the public. ASSETS is a two-year Best Practice Network co-funded by the CIP PSP Programme to improve the performance, accessibility, and usability of the Europeana search engine. Here we present a characterization of the Europeana logs by showing statistics on common behavioural patterns of Europeana users.
A Byzantine-Resilient Aggregation Scheme for Federated Learning via Matrix Autoregression on Client Updates
In this work, we propose FLANDERS, a novel federated learning (FL)
aggregation scheme robust to Byzantine attacks. FLANDERS considers the local
model updates sent by clients at each FL round as a matrix-valued time series.
Then, it identifies malicious clients as outliers of this time series by
comparing actual observations with those estimated by a matrix autoregressive
forecasting model. Experiments conducted on several datasets under different FL
settings demonstrate that FLANDERS matches the robustness of the most powerful
baselines against Byzantine clients. Furthermore, FLANDERS remains highly
effective even under extremely severe attack scenarios, unlike existing
defense strategies.
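The detection idea can be sketched with a deliberately crude forecaster: predict each client's next (flattened) update with a "next ≈ previous" rule, score clients by forecast error, and drop the worst offenders before averaging. The real method fits a matrix autoregressive model over the full series of update matrices; the AR(1)-style shortcut, the toy data, and all names below are illustrative assumptions.

```python
def l2(a, b):
    """Euclidean distance between two flattened update vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def robust_aggregate(prev_updates, updates, n_malicious):
    """Average updates, excluding the n_malicious clients whose update
    deviates most from its forecast (here: last round's update)."""
    scores = {c: l2(prev_updates[c], u) for c, u in updates.items()}
    bad = set(sorted(scores, key=scores.get, reverse=True)[:n_malicious])
    kept = [u for c, u in updates.items() if c not in bad]
    dim = len(next(iter(updates.values())))
    return [sum(u[i] for u in kept) / len(kept) for i in range(dim)], bad

# Round t-1 and round t updates for three clients; "c" turns Byzantine
prev = {"a": [0.1, 0.1], "b": [0.1, 0.2], "c": [0.0, 0.1]}
curr = {"a": [0.1, 0.1], "b": [0.2, 0.2], "c": [9.0, -9.0]}
agg, flagged = robust_aggregate(prev, curr, n_malicious=1)
print(flagged)  # → {'c'}
```

Because the outlier test compares each client against its own forecast rather than against the other clients, it stays meaningful even when a large fraction of clients is malicious, which is the severe-attack regime the abstract highlights.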
Sparse Vicious Attacks on Graph Neural Networks
Graph Neural Networks (GNNs) have proven to be successful in several
predictive modeling tasks for graph-structured data.
Amongst those tasks, link prediction is one of the fundamental problems for
many real-world applications, such as recommender systems.
However, GNNs are not immune to adversarial attacks, i.e., carefully crafted
malicious examples that are designed to fool the predictive model.
In this work, we focus on a specific, white-box attack to GNN-based link
prediction models, where a malicious node aims to appear in the list of
recommended nodes for a given target victim.
To achieve this goal, the attacker node may also count on the cooperation of
other existing peers that it directly controls, namely on the ability to inject
a number of "vicious" nodes in the network.
Specifically, all these malicious nodes can add new edges or remove existing
ones, thereby perturbing the original graph.
Thus, we propose SAVAGE, a novel framework and method to mount this type of
link prediction attack.
SAVAGE formulates the adversary's goal as an optimization task, striking the
balance between the effectiveness of the attack and the sparsity of malicious
resources required.
Extensive experiments conducted on real-world and synthetic datasets
demonstrate that adversarial attacks implemented through SAVAGE indeed achieve
a high attack success rate while using a small number of vicious nodes.
Finally, although these attacks require full knowledge of the target model, we
show that they successfully transfer to other black-box methods for
link prediction.
Twitter anticipates bursts of requests for Wikipedia articles
Most of the tweets that users exchange on Twitter make implicit mentions of named entities, which in turn can be mapped to corresponding Wikipedia articles using proper Entity Linking (EL) techniques. Some of those become trending entities on Twitter due to a long-lasting or a sudden effect on the volume of tweets where they are mentioned. We argue that the set of trending entities discovered from Twitter may help predict the volume of requests for related Wikipedia articles. To validate this claim, we apply an EL technique to extract trending entities from a large dataset of public tweets. Then, we analyze the time series derived from the hourly trending score (i.e., an index of popularity) of each entity as measured by Twitter and Wikipedia, respectively. Our results reveal that Twitter actually leads Wikipedia by one or more hours.
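The lead-lag check behind this claim can be sketched by shifting the Twitter series against the Wikipedia series and picking the lag with the highest Pearson correlation; a positive best lag means Twitter leads. The hourly series below are synthetic, and the function names are illustrative.

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def best_lag(twitter, wikipedia, max_lag=6):
    """Lag (in hours) at which the Twitter series best matches Wikipedia."""
    def corr_at(lag):
        return pearson(twitter[: len(twitter) - lag], wikipedia[lag:])
    return max(range(max_lag + 1), key=corr_at)

# Synthetic hourly trending scores: Wikipedia repeats Twitter 2 hours later
twitter   = [1, 5, 9, 4, 2, 1, 1, 2, 6, 8, 3, 1, 1, 1]
wikipedia = [1, 1, 1, 5, 9, 4, 2, 1, 1, 2, 6, 8, 3, 1]
print(best_lag(twitter, wikipedia))  # → 2
```

On real data one would run this per entity and inspect the distribution of best lags rather than a single pair of series.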