REAL-TIME AD CLICK FRAUD DETECTION

Author: Srivastava Apoorva
Publication venue: SJSU ScholarWorks
Publication date: 18/05/2020

With the increase in Internet usage, the Internet is now considered a very important platform for advertising and marketing. Digital marketing has become very important to the economy: some of the major Internet services available publicly to users are free, thanks to digital advertising. It has also allowed the publisher ecosystem to flourish, ensuring significant monetary incentives for creating quality public content, helping to usher in the information age. Digital advertising, however, comes with its own set of challenges. One of the biggest challenges is ad fraud. There is a proliferation of malicious parties and software seeking to undermine the ecosystem and causing monetary harm to digital advertisers and ad networks. Pay-per-click advertising, where each click is highly valuable, is especially susceptible to click fraud. This leads advertisers to lose money and ad networks to lose their credibility, hurting the overall ecosystem. Much of the fraud detection is done in offline data pipelines, which compute fraud/non-fraud labels on clicks long after they happened. This is because click fraud detection usually depends on complex machine learning models using a large number of features on huge datasets, which can be very costly to train and query. In this thesis, the existence of low-cost ad click fraud classifiers with reasonable precision and recall is hypothesized. A set of simple heuristics as well as basic machine learning models (with associated simplified feature spaces) are compared with complex machine learning models, on performance and classification accuracy. Through research and experimentation, a performant classifier is discovered which can be deployed for real-time fraud detection.

SJSU ScholarWorks

Click fraud: how to spot it, how to stop it?

Author: Walgampaya Chamila Kumara
Publication venue: ThinkIR: The University of Louisville's Institutional Repository
Publication date: 01/08/2011

Online search advertising is currently the greatest source of revenue for many Internet giants such as Google™, Yahoo!™, and Bing™. The increased number of specialized websites and modern profiling techniques have all contributed to an explosion in the income of ad brokers from online advertising. The single biggest threat to this growth, however, is click fraud. Trained botnets and even individuals are hired by click-fraud specialists in order to maximize the revenue of certain users from the ads they publish on their websites, or to launch an attack between competing businesses. Most academics and consultants who study online advertising estimate that 15% to 35% of ads in pay-per-click (PPC) online advertising systems are not authentic. In the first two quarters of 2010, US marketers alone spent $5.7 billion on PPC ads, where PPC ads are between 45 and 50 percent of all online ad spending. On average about $1.5 billion is wasted due to click fraud. These fraudulent clicks are believed to be initiated by users in poor countries, or botnets, who are trained to click on specific ads. For example, according to a 2010 study from Information Warfare Monitor, the operators of Koobface, a program that installed malicious software to participate in click fraud, made over $2 million in just over a year. The process of making such illegitimate clicks to generate revenue is called click fraud. Search engines claim they filter out most questionable clicks and either do not charge for them or reimburse advertisers that have been wrongly billed. However, this is a hard task, despite the claims that brokers' efforts are satisfactory. In the simplest scenario, a publisher continuously clicks on the ads displayed on his own website in order to make revenue. In a more complicated scenario, a travel agent may hire a large, globally distributed botnet to click on its competitor's ads, hence depleting their daily budget. We analyzed those different types of click fraud methods and proposed new methodologies to detect and prevent them in real time. While traditional commercial approaches detect only some specific types of click fraud, the Collaborative Click Fraud Detection and Prevention (CCFDP) system, an architecture that we have implemented based on the proposed methodologies, can detect and prevent all major types of click fraud. The proposed solution analyzes the detailed user activities on both the server side and the client side collaboratively to better describe the intention of the click. Data fusion techniques are developed to combine evidence from several data mining models and to obtain a better estimation of the quality of the click traffic. Our ideas are tested through the development of the CCFDP system.
Experimental results show that the CCFDP system is better than the existing commercial click fraud solution in three major aspects: 1) detecting more click fraud, especially clicks generated by software; 2) providing prevention ability; 3) proposing the concept of a click quality score for click quality estimation. In the initial version of CCFDP, we analyzed the performance of the click fraud detection and prediction model by using a rule-based algorithm, which is similar to most of the existing systems. We assigned a quality score to each click instead of classifying the click as fraud or genuine, because it is hard to get solid evidence of click fraud just based on the data collected, and it is difficult to determine the real intention of users who make the clicks. Results from the initial version revealed that the diversity of click fraud (CF) attack types makes it hard for a single countermeasure to prevent click fraud. It is therefore important to be able to combine multiple measures capable of effective protection from click fraud. Accordingly, in the improved version of CCFDP, we provide the traffic quality score as a combination of evidence from several data mining algorithms. We tested the system with data from an actual ad campaign in 2007 and 2008 and compared the results with Google AdWords reports for the same campaign. Results show that a higher percentage of click fraud is present even with the most popular search engine. The multiple-model-based CCFDP always estimated less valid traffic compared to Google; sometimes the difference is as high as 53%. Fast and efficient detection of duplicates is one of the most important requirements in any click fraud solution. Usually duplicate detection algorithms run in real time. In order to provide real-time results, solution providers should utilize data structures that can be updated in real time.
In addition, the space required to hold the data should be minimal. In this dissertation, we also addressed the problem of detecting duplicate clicks in pay-per-click streams. We proposed a simple data structure, the Temporal Stateful Bloom Filter (TSBF), an extension of the regular Bloom Filter and the Counting Bloom Filter (CBF). The bit vector in the Bloom Filter was replaced with a status vector. Duplicate detection results of the TSBF method are compared with the Buffering, FPBuffering, and CBF methods. The false positive rate of TSBF is less than 1% and it does not have false negatives. The space requirement of TSBF is the smallest among these solutions. Even though Buffering has neither false positives nor false negatives, its space requirement increases exponentially with the size of the stream data. When the false positive rate of FPBuffering is set to 1%, its false negative rate jumps to around 5%, which will not be tolerated by most streaming data applications. We also compared the TSBF results with CBF. TSBF uses half the space or less of a standard CBF with the same false positive probability. One of the biggest successes of CCFDP is the discovery of a new mercantile click bot, the Smart ClickBot. We presented a Bayesian approach for detecting Smart ClickBot type clicks. The system combines evidence extracted from web server sessions to determine the final class of each click. Some of this evidence can be used alone, while some can be used in combination with other features for click bot detection. During training and testing we also addressed the class imbalance problem. Our best classifier shows recall of 94% and precision of 89%, with F1 measure calculated as 92%. The high accuracy of our system proves the effectiveness of the proposed methodology.
Since the Smart ClickBot is a sophisticated click bot that manipulates every possible parameter to go undetected, the techniques that we discussed here can lead to detection of other types of software bots too. Despite the enormous capabilities of modern machine learning and data mining techniques in modeling complicated problems, most of the available click fraud detection systems are rule-based. Click fraud solution providers keep the rules as a secret weapon and bargain with others to prove their superiority. We proposed a validation framework to acquire another model of the click data that is not rule dependent, a model that learns the inherent statistical regularities of the data. The outputs of both models are then compared. Due to the uniqueness of the CCFDP system architecture, it is better than current commercial solutions and search engine/ISP solutions. The system protects pay-per-click advertisers from click fraud and improves their Return on Investment (ROI). The system can also provide an arbitration system for advertisers and PPC publishers whenever a click fraud dispute arises. Advertisers can gain confidence in PPC advertisement by having a channel to argue the traffic quality with big search engine publishers. The results of this system will boost the Internet economy by eliminating a shortcoming of the PPC business model. General consumers will gain confidence in the Internet business model through the reduction of the fraudulent activities that are numerous in the current virtual Internet world.
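
The abstract above describes duplicate-click detection with a Bloom filter whose bit vector is replaced by a status vector. The following is a minimal Python sketch of that general idea only; it is not the author's TSBF. It assumes each slot stores the last-seen timestamp of a click key, and that a click counts as a duplicate when all of its slots were touched within a sliding time window. The class name `TimestampBloomFilter`, the key format, and the parameters `num_slots`, `num_hashes`, and `window_seconds` are hypothetical.

```python
import hashlib


class TimestampBloomFilter:
    """Illustrative time-windowed Bloom filter (an assumption, not the TSBF itself)."""

    def __init__(self, num_slots=1 << 16, num_hashes=4, window_seconds=60.0):
        self.num_slots = num_slots
        self.num_hashes = num_hashes
        self.window = window_seconds
        # Each slot holds the last time it was touched, or None if never touched.
        self.slots = [None] * num_slots

    def _indexes(self, key):
        # Derive k slot indexes from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_slots for i in range(self.num_hashes)]

    def seen_recently(self, key, now):
        """Return True if `key` looks like a duplicate within the window, then record it.

        Assumes `now` never decreases between calls (stream order).
        """
        idx = self._indexes(key)
        duplicate = all(
            self.slots[i] is not None and now - self.slots[i] <= self.window
            for i in idx
        )
        for i in idx:
            self.slots[i] = now  # refresh the window on every click
        return duplicate


if __name__ == "__main__":
    bloom = TimestampBloomFilter(window_seconds=60.0)
    click = "203.0.113.7|ad-42"  # hypothetical key: source address plus ad id
    print(bloom.seen_recently(click, now=0.0))    # first sighting
    print(bloom.seen_recently(click, now=30.0))   # repeat inside the window
    print(bloom.seen_recently(click, now=200.0))  # repeat after the window expired
```

As with any Bloom filter, unrelated keys can collide and produce false positives. Under the monotonic-time assumption, a key that was recorded inside the window is always reported as a duplicate, because slots are only ever overwritten with later timestamps.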

University of Louisville

Click Fraud Detection in Online and In-app Advertisements: A Learning Based Approach

Author: Gubbi Sadashiva Thejas
Publication venue: FIU Digital Commons
Publication date: 30/09/2019

Click Fraud is the fraudulent act of clicking on pay-per-click advertisements to increase a site’s revenue, to drain revenue from the advertiser, or to inflate the popularity of content on social media platforms. In-app advertisements on mobile platforms are among the most common targets for click fraud, which makes companies hesitant to advertise their products. Fraudulent clicks are supposed to be caught by ad providers as part of their service to advertisers, which is commonly done using machine learning methods. However: (1) there is a lack of research in current literature addressing and evaluating the different techniques of click fraud detection and prevention, (2) threat models composed of active learning systems (smart attackers) can mislead the training process of the fraud detection model by polluting the training data, (3) current deep learning models have significant computational overhead, (4) training data is often in an imbalanced state, and balancing it still results in noisy data that can train the classifier incorrectly, and (5) datasets with high dimensionality cause increased computational overhead and decreased classifier correctness -- while existing feature selection techniques address this issue, they have their own performance limitations. By extending the state-of-the-art techniques in the field of machine learning, this dissertation provides the following solutions: (i) To address (1) and (2), we propose a hybrid deep-learning-based model which consists of an artificial neural network, auto-encoder and semi-supervised generative adversarial network. (ii) As a solution for (3), we present Cascaded Forest and Extreme Gradient Boosting with less hyperparameter tuning. (iii) To overcome (4), we propose a row-wise data reduction method, KSMOTE, which filters out noisy data samples both in the raw data and the synthetically generated samples. 
(iv) For (5), we propose different column-reduction methods such as multi-time-scale Time Series analysis for fraud forecasting, using binary labeled imbalanced datasets and hybrid filter-wrapper feature selection approaches.

DigitalCommons@Florida International University

MAGMA network behavior classifier for malware traffic

Author: Baralis Elena Maria
Bocchi Enrico
Grimaudo Luigi
Lee Sung Ju
Mellia Marco
Miskovic Stanislav
Modelo Howard Gaspar
Saha Sabyasachi
Publication venue: Elsevier BV
Publication date: 01/01/2016

Malware is a major threat to security and privacy of network users. A large variety of malware is typically spread over the Internet, hiding in benign traffic. New types of malware appear every day, challenging both the research community and security companies to improve malware identification techniques. In this paper we present MAGMA, MultilAyer Graphs for MAlware detection, a novel malware behavioral classifier. Our system is based on a Big Data methodology, driven by real-world data obtained from traffic traces collected in an operational network. The methodology we propose automatically extracts patterns related to a specific input event, i.e., a seed, from the enormous amount of events the network carries. By correlating such activities over (i) time, (ii) space, and (iii) network protocols, we build a Network Connectivity Graph that captures the overall “network behavior” of the seed. We next extract features from the Connectivity Graph and design a supervised classifier. We run MAGMA on a large dataset collected from a commercial Internet Provider where 20,000 Internet users generated more than 330 million events. Only 42,000 are flagged as malicious by a commercial IDS, which we consider as an oracle. Using this dataset, we experimentally evaluate MAGMA accuracy and robustness to parameter settings. Results indicate that MAGMA reaches 95% accuracy, with limited false positives. Furthermore, MAGMA proves able to identify suspicious network events that the IDS ignored

INRIA a CCSD electronic archive server

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Assessing enactment of content regulation policies: A post hoc crowd-sourced audit of election misinformation on YouTube

Author: Bhuiyan Md Momen
Juneja Prerna
Mitra Tanushree
Publication venue: Association for Computing Machinery (ACM)
Publication date: 15/02/2023

With the 2022 US midterm elections approaching, conspiratorial claims about the 2020 presidential elections continue to threaten users' trust in the electoral process. To regulate election misinformation, YouTube introduced policies to remove such content from its searches and recommendations. In this paper, we conduct a 9-day crowd-sourced audit on YouTube to assess the extent of enactment of such policies. We recruited 99 users who installed a browser extension that enabled us to collect up-next recommendation trails and search results for 45 videos and 88 search queries about the 2020 elections. We find that YouTube's search results, irrespective of search query bias, contain more videos that oppose rather than support election misinformation. However, watching misinformative election videos still leads users to a small number of misinformative videos in the up-next trails. Our results imply that while YouTube largely seems successful in regulating election misinformation, there is still room for improvement.

arXiv.org e-Print Archive

How Many Phish Can Tweet? Investigating the Effectiveness of Twitter's Phishing and Malware Defence System

Author: Bell Simon
Publication date: 01/01/2021

Royal Holloway - Pure

Anomaly Detection in Pay-Per-Click Advertising Models (Deteção de anomalias em modelos de publicidade Pay-Per-Click)

Author: Lopes Eduardo Luís da Silva
Publication date: 05/12/2012

Master's dissertation in Informatics Engineering. Nowadays, online advertisement is one of the most effective and profitable marketing strategies. An example of strong growth in online advertisement is the Pay-Per-Click advertising model, where all parties benefit. Due to the number of stakeholders and the amount of money involved, it is essential to find efficient methods to analyse the validity of clicks on online advertising, specifically in Pay-Per-Click. The confidence of the advertiser is a crucial point for the success of this model. It is therefore necessary to distinguish valid clicks from invalid ones, which are made with the intention of generating charges and benefiting, directly or indirectly, from that action. Therefore, a survey of the state of the art in fraud detection techniques for Pay-Per-Click is presented, as well as the main techniques used to deceive this advertising model. Other related matters were also studied, such as the relevant data to collect for an accurate analysis of data flow at the servers. A comparative analysis of different anomaly detection approaches was performed in order to identify the most suitable ones for the problem at hand. Using this subarea of Data Mining, very satisfactory results have been achieved, leading to the conclusion that anomaly detection can make a major contribution to resolving Pay-Per-Click fraud.

Universidade do Minho: RepositoriUM

The Design and Implementation of Low-Latency Prediction Serving Systems

Author: Crankshaw Daniel
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Machine learning is being deployed in a growing number of applications which demand real-time, accurate, and cost-efficient predictions under heavy query load. These applications employ a variety of machine learning frameworks and models, often composing several models within the same application. However, most machine learning frameworks and systems are optimized for model training and not deployment. In this thesis, I discuss three prediction serving systems designed to meet the needs of modern interactive machine learning applications. The key idea in this work is to utilize a decoupled, layered design that interposes systems on top of training frameworks to build low-latency, scalable serving systems. Velox introduced this decoupled architecture to enable fast online learning and model personalization in response to feedback. Clipper generalized this system architecture to be framework-agnostic and introduced a set of optimizations to reduce and bound prediction latency and improve prediction throughput, accuracy, and robustness without modifying the underlying machine learning frameworks. And InferLine provisions and manages the individual stages of prediction pipelines to minimize cost while meeting end-to-end tail latency constraints.

eScholarship - University of California