
    Interpretable Predictions of Tree-based Ensembles via Actionable Feature Tweaking

    Machine-learned models are often described as "black boxes". In many real-world applications, however, models may have to sacrifice predictive power in favour of human-interpretability. When this is the case, feature engineering becomes a crucial task, which requires significant and time-consuming human effort. Whilst some features are inherently static, representing properties that cannot be influenced (e.g., the age of an individual), others capture characteristics that could be adjusted (e.g., the daily amount of carbohydrates taken). Nonetheless, once a model is learned from the data, each prediction it makes on new instances is irreversible, since every instance is assumed to be a static point located in the chosen feature space. There are many circumstances, however, where it is important to understand (i) why a model outputs a certain prediction on a given instance, (ii) which adjustable features of that instance should be modified, and finally (iii) how to alter such a prediction when the mutated instance is input back to the model. In this paper, we present a technique that exploits the internals of a tree-based ensemble classifier to offer recommendations for transforming true negative instances into positively predicted ones. We demonstrate the validity of our approach using an online advertising application. First, we design a Random Forest classifier that effectively separates two types of ads: low (negative) and high (positive) quality ads (instances). Then, we introduce an algorithm that provides recommendations that aim to transform a low quality ad (negative instance) into a high quality one (positive instance). Finally, we evaluate our approach on a subset of the active inventory of a large ad network, Yahoo Gemini.
    Comment: 10 pages, KDD 201

    Understanding the Detection of View Fraud in Video Content Portals

    While substantial effort has been devoted to understanding fraudulent activity in traditional online advertising (search and banner), more recent forms such as video ads have received little attention. For advertisers, understanding and identifying fraudulent activity (i.e., fake views) in video ads is complicated because they rely exclusively on the detection mechanisms deployed by video hosting portals. In this context, independent tools able to monitor and audit the fidelity of these systems are missing today and are needed by both industry and regulators. In this paper we present a first set of tools to serve this purpose. Using our tools, we evaluate the performance of the audit systems of five major online video portals. Our results reveal that YouTube's detection system significantly outperforms all the others. Despite this, a systematic evaluation indicates that it may still be susceptible to simple attacks. Furthermore, we find that YouTube penalizes its videos' public and monetized view counters differently, the former being penalized more aggressively. This means that views identified as fake and discounted from the public view counter are still monetized. We speculate that even though YouTube's policy puts considerable effort into compensating users after an attack is discovered, this practice places the burden of the risk on the advertisers, who pay to get their ads displayed.
    Comment: To appear in WWW 2016, Montréal, Québec, Canada. Please cite the conference version of this paper.

    A Critical Analysis Of The State-Of-The-Art On Automated Detection Of Deceptive Behavior In Social Media

    Recently, a large body of research has been devoted to examining user behavioral patterns and the business implications of social media. However, relatively little research has been conducted on users' deceptive activities in social media; these deceptive activities may hinder the effective application of the data collected from social media to perform e-marketing and initiate business transformation in general. One of the main contributions of this paper is a critical analysis of the possible forms of deceptive behavior in social media and of the state-of-the-art technologies for automated deception detection in social media. Based on the proposed taxonomy of major deception types, the assumptions, advantages, and disadvantages of the popular deception detection methods are analyzed. Our critical analysis shows that deceptive behavior may evolve over time, making it difficult for existing methods to detect social media spam effectively. Accordingly, another main contribution of this paper is the design and development of a generic framework to combat dynamic deceptive activities in social media. The managerial implication of our research is that business managers or marketers can develop better insights into possible deceptive behavior in social media before they tap into social media to collect and generate market intelligence. Moreover, they can apply the proposed adaptive deception detection framework to combat the ever-increasing and evolving deceptive activities in social media more effectively.

    Is blockchain ready to revolutionize online advertising?

    The 200-billion-dollar per annum online advertising ecosystem has become infested with thousands of intermediaries exploiting user data and advertising budgets. All key stakeholders in the value chain are affected: advertisers by fraud, publishers by their diminishing share of advertising budgets, and users by the erosion of their right to privacy. Blockchain presents a possible solution to the critical issues in the online advertising supply chain. The question remains whether blockchain scalability, energy-efficiency, and token volatility issues can be solved in the coming years to the extent that online advertising could widely leverage trustlessness and the other benefits of blockchain technology. This paper aims to review the current progress and to open a discussion on how to address these issues. We present new requirements for blockchain-based online advertising solutions. We also analyze the available solutions against the requirements and recommend directions for future research and solution development. Evidence from our research indicates that blockchain is not yet ready to be widely implemented in online advertising. More research is needed, and new proofs of concept need to be developed, before blockchain technology can be considered a trusted alternative to the current online advertising marketplace based on open real-time bidding.

    A survey of the role of viewability within the online advertising ecosystem

    Within the online advertising ecosystem, viewability is defined as the metric that measures whether an ad impression had the chance of being viewed by a potential consumer. Although this metric has been presented as a potential game-changer within the ad industry, it has not been fully adopted by the stakeholders, mainly due to disagreement between the different parties on the standards to implement and measure it, and on its potential benefits and drawbacks. In this study, we present a survey of the role that viewability can have in the main challenges of the online advertising ecosystem, depicting the main applications, benefits and issues. With this objective, we provide an overall picture of how viewability can fit within the ecosystem, which can help the different stakeholders to work on its adoption and integration and to establish a research agenda.
    This work was supported by the Plan de Doctorados Industriales de la Secretaría de Universidades e Investigación del Departamento de Empresa y Conocimiento de la Generalitat de Catalunya under the Grant 2018-DI-059 and by ExoClick. Furthermore, this work received support from the Fellowship through "la Caixa" Foundation under Grant ID100010434, from the European Union's Horizon 2020 Research and Innovation Program through Marie Skłodowska-Curie Grant under Agreement 847648, from the Fellowship under Grant CF/BQ/PR20/11770009, from the Spanish Ministry of Economy and Competitiveness through Juan de la Cierva Formación Program under Grant FJCI-2017-34926 and from the Spanish Government under Grant PID2020-113795RB-C31 "COMPROMISE".
    © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
    Peer Reviewed. Postprint (published version)

    Analyse et perturbation d'un écosystème de fraude au clic (Analysis and Disruption of a Click-Fraud Ecosystem)

    Online advertising is a growing market, with global revenues of 159.8 billion dollars in 2015, and has become an important and indispensable economic resource for many online services. This makes it an attractive target for fraudsters. In 2015, it is estimated that, globally, advertisers were defrauded of more than seven billion dollars. The security community is concerned by this kind of fraud, known as click fraud, and a lot of research aims to limit it. Current methods focus on studying malware binaries and performing botnet take-downs. These operations are useful to limit the propagation of malware and to protect users from known threats. However, they do not affect the economic incentives for perpetrating click fraud. In order to diminish the attractiveness of the fraud, we first tried to better understand the click-fraud ecosystem and then evaluated disruption strategies on this ecosystem.
    Firstly, we collected network traces generated by a well-known click-fraud malware, Boaxxe. These data are HTTP redirection chains showing the links between all the intermediaries involved in the buying and reselling of an ad, that is, the value chain. The redirection chains begin at a doorway search engine operated by fraudsters, pass through several ad networks and land on the web site of the advertiser that bought the traffic. Secondly, we aggregated the collected data into a single graph showing the relationships between the domain names and IP addresses involved in the Boaxxe fraud. We then consolidated this graph, leveraging information obtained from open sources, by merging all the network nodes operated by a single organization. The resulting graph is a representation of the Boaxxe click-fraud ecosystem.
    Thirdly, we evaluated disruption strategies on this ecosystem. The aim is to stop the monetization of the traffic generated by Boaxxe, which is equivalent to stopping the traffic going from the doorway search engine to the web sites of the advertisers. Among the strategies tested, the most suitable for our problem was the Keyplayer strategy. We showed that it is possible to protect a large number of advertisers from this fraud by removing a small number of intermediaries from the ecosystem graph. Finally, we discuss how to perform the disruption operation in practice. We stress the importance of raising awareness of fraud among advertisers, who are in a strong position to limit click fraud. One way in which they could do so is by implementing controls to make sure they are not maintaining business relationships with unscrupulous ad networks.