
    Co-location rules discovery process focused on reference spatial features using decision tree learning

    The co-location discovery process finds subsets of spatial features that are frequently located together. Many algorithms and methods have been designed in recent years; however, when these patterns must be found around specific spatial features, existing solutions produce incorrect results. In this paper we propose a knowledge discovery process that finds co-location patterns focused on reference features by applying decision tree learning algorithms to transactional data generated from maximal cliques. A validation test of this process is provided.
    Fil: Merlino, Hernán Daniel. Universidad Nacional de Lanús; Argentina.
    Fil: Rottoli, Giovanni Daián. Universidad Tecnológica Nacional. Facultad Regional Concepción del Uruguay. Departamento Ingeniería en Sistemas de Información. Grupo de Investigación en Bases de Datos; Argentina.
    Fil: Rottoli, Giovanni Daián. Universidad Nacional de La Plata; Argentina.
    Fil: Rottoli, Giovanni Daián. Universidad Nacional de Lanús; Argentina.
    Fil: García Martínez, Ramón. Universidad Nacional de Lanús. Departamento Desarrollo Productivo y Tecnológico. Grupo de Investigación en Sistemas de Información; Argentina.
    Fil: García Martínez, Ramón. Comisión de Investigaciones Científicas; Argentina.
    Peer Reviewed
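    The pipeline the abstract describes — neighborhood graph, maximal cliques as transactions, decision tree learning — can be sketched as follows. This is a minimal illustration, not the authors' implementation; the toy instances, the 1.0 distance threshold, and the choice of "A" as the reference feature are assumptions.

```python
import math
import networkx as nx
from sklearn.tree import DecisionTreeClassifier

# Toy spatial instances: (id, feature_type, x, y). Values are invented.
instances = [
    (0, "A", 0.0, 0.0), (1, "B", 0.5, 0.2), (2, "C", 0.4, 0.6),
    (3, "A", 5.0, 5.0), (4, "B", 5.3, 5.1), (5, "C", 9.0, 9.0),
]

# 1. Neighborhood graph: connect instances within the distance threshold.
G = nx.Graph()
G.add_nodes_from(i for i, _, _, _ in instances)
for idx, (ia, fa, xa, ya) in enumerate(instances):
    for ib, fb, xb, yb in instances[idx + 1:]:
        if math.dist((xa, ya), (xb, yb)) <= 1.0:
            G.add_edge(ia, ib)

# 2. Each maximal clique becomes one transaction over feature types.
feature_of = {i: f for i, f, _, _ in instances}
transactions = [{feature_of[n] for n in c} for c in nx.find_cliques(G)]

# 3. One-hot encode transactions and learn rules for reference feature "A":
#    paths to positive leaves approximate co-location rules around "A".
other = sorted({f for t in transactions for f in t} - {"A"})
X = [[int(f in t) for f in other] for t in transactions]
y = [int("A" in t) for t in transactions]
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
```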

    Applying Association Rules and Co-location Techniques on Geospatial Web Services

    Most contemporary GIS offer only basic spatial analysis and data mining functionality, and many are confined to analyses that compare maps and descriptive statistical displays such as histograms or pie charts. Emerging Web standards promise a network of heterogeneous yet interoperable Web Services, which would greatly simplify the development of many kinds of data integration and knowledge management applications. Geospatial data mining combines two key market intelligence software tools: Geographical Information Systems and Data Mining Systems. This research aims to develop a Spatial Data Mining web service that uses association rule techniques and correlation methods to explore the huge amounts of data generated by integrated crisis management applications. It integrates traffic systems, medical services systems, civil defense, and state-of-the-art Geographic Information Systems and Data Mining Systems functionality in an open, highly extensible, internet-enabled plug-in architecture. Interoperability of geospatial data previously focused only on data formats and standards; the recent popularity and adoption of the Internet and Web Services provide a new means of interoperability for geospatial information, not just for exchanging data but for analyzing it during exchange. An integrated, user-friendly Spatial Data Mining System available on the internet via a web service offers exciting new possibilities for spatial decision making and geographical research to a wide range of potential users.
    Keywords: Spatial Data Mining, Rule Association, Co-location, Web Services, Geospatial Data
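    As a rough illustration of the association-rule component such a service might expose, the following hand-rolled sketch computes support and confidence over toy incident transactions; the record contents and the minimum-support threshold are invented for illustration.

```python
from itertools import combinations

# Invented crisis-management incident records (one set per incident).
transactions = [
    {"traffic_accident", "ambulance_dispatch", "road_closure"},
    {"traffic_accident", "ambulance_dispatch"},
    {"fire", "ambulance_dispatch", "road_closure"},
    {"traffic_accident", "road_closure"},
]

def support(itemset):
    """Fraction of transactions containing every item in the set."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Enumerate rules X -> Y over item pairs, reporting support and confidence.
items = sorted(set().union(*transactions))
for x, y in combinations(items, 2):
    s = support({x, y})
    if s >= 0.5:                       # illustrative minimum support
        conf = s / support({x})
        print(f"{x} -> {y}: support={s:.2f}, confidence={conf:.2f}")
```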

    Proceso de descubrimiento de patrones de co-localización alrededor de tipos de eventos de referencia

    Co-location pattern discovery reveals subsets of spatial event types whose instances frequently occur near one another. Many algorithms and methods have been developed over the years; however, when these patterns must be found around specific reference event types, the existing alternative is incomplete and incorrect. This work therefore develops an information mining process for discovering co-location patterns around reference spatial event types that uses maximal cliques and TDIDT algorithms to solve this problem. A proof of concept of the proposed process is presented.
    XIII Workshop Bases de Datos y Minería de Datos (WBDMD). Red de Universidades con Carreras en Informática (RedUNCI).

    Click fraud: how to spot it, how to stop it?

    Online search advertising is currently the greatest source of revenue for many Internet giants such as Google™, Yahoo!™, and Bing™. The increased number of specialized websites and modern profiling techniques have contributed to an explosion in ad brokers' income from online advertising. The single biggest threat to this growth, however, is click fraud. Trained botnets and even individuals are hired by click-fraud specialists to maximize the revenue certain users earn from the ads they publish on their websites, or to launch attacks between competing businesses. Most academics and consultants who study online advertising estimate that 15% to 35% of ads in pay-per-click (PPC) online advertising systems are not authentic. In the first two quarters of 2010, US marketers alone spent $5.7 billion on PPC ads, and PPC ads account for between 45 and 50 percent of all online ad spending. On average, about $1.5 billion is wasted due to click fraud. These fraudulent clicks are believed to be initiated by users in poor countries, or by botnets, trained to click on specific ads. For example, according to a 2010 study from Information Warfare Monitor, the operators of Koobface, a program that installed malicious software to participate in click fraud, made over $2 million in just over a year. The process of making such illegitimate clicks to generate revenue is called click fraud. Search engines claim they filter out most questionable clicks and either do not charge for them or reimburse advertisers who have been wrongly billed. This is a hard task, however, despite claims that brokers' efforts are satisfactory. In the simplest scenario, a publisher continuously clicks on the ads displayed on his own website in order to make revenue. In a more complicated scenario, a travel agent may hire a large, globally distributed botnet to click on a competitor's ads, depleting the competitor's daily budget. We analyzed these different types of click fraud methods and proposed new methodologies to detect and prevent them in real time. While traditional commercial approaches detect only some specific types of click fraud, the Collaborative Click Fraud Detection and Prevention (CCFDP) system, an architecture we implemented based on the proposed methodologies, can detect and prevent all major types of click fraud. The proposed solution analyzes detailed user activity on both the server side and the client side collaboratively to better describe the intention of each click. Data fusion techniques are developed to combine evidence from several data mining models and obtain a better estimate of click traffic quality. Our ideas are tested through the development of the CCFDP system. Experimental results show that the CCFDP system is better than the existing commercial click fraud solution in three major aspects: 1) it detects more click fraud, especially clicks generated by software; 2) it provides prevention capability; and 3) it introduces the concept of a click quality score for click quality estimation. In the initial version of CCFDP, we analyzed the performance of the click fraud detection and prediction model using a rule-based algorithm, which is similar to most existing systems.
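    The data-fusion idea, combining fraud evidence from several models into a single quality score, might look like the following sketch. The component models, weights, and click features here are illustrative assumptions, not the actual CCFDP models.

```python
# Two toy evidence sources returning a fraud probability in [0, 1].
def rule_based_score(click):
    # e.g. penalize very fast repeat clicks from the same source
    return 0.9 if click["repeat_within_seconds"] < 2 else 0.1

def behavioral_score(click):
    # e.g. clicks with no mouse movement look automated
    return 0.8 if click["mouse_events"] == 0 else 0.2

def quality_score(click, weights=(0.5, 0.5)):
    """Fuse model evidence into one quality score in [0, 1];
    higher means more likely a genuine click."""
    fraud = (weights[0] * rule_based_score(click)
             + weights[1] * behavioral_score(click))
    return 1.0 - fraud

click = {"repeat_within_seconds": 1, "mouse_events": 0}
print(quality_score(click))  # 0.15 -> low quality, likely fraudulent
```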
We assigned a quality score to each click instead of classifying it as fraudulent or genuine, because it is hard to obtain solid evidence of click fraud from the collected data alone, and it is difficult to determine the real intention of the users who make the clicks. Results from the initial version revealed that the diversity of click fraud attack types makes it hard for a single countermeasure to prevent click fraud; it is therefore important to combine multiple measures capable of effective protection. In the improved version of CCFDP, we consequently compute the traffic quality score as a combination of evidence from several data mining algorithms. We tested the system with data from an actual ad campaign in 2007 and 2008 and compared the results with Google AdWords reports for the same campaign. The results show that a higher percentage of click fraud is present even with the most popular search engine; the multiple-model CCFDP always estimated less valid traffic than Google, sometimes by as much as 53%. Fast and efficient detection of duplicates is one of the most important requirements of any click fraud solution. Duplicate detection algorithms usually run in real time, so solution providers should use data structures that can be updated in real time and whose space requirements are minimal. In this dissertation we also addressed the problem of detecting duplicate clicks in pay-per-click streams. We proposed a simple data structure, the Temporal Stateful Bloom Filter (TSBF), an extension of the regular Bloom Filter and Counting Bloom Filter in which the bit vector is replaced with a status vector. Duplicate detection results of the TSBF method are compared with the Buffering, FPBuffering, and CBF methods. The false positive rate of TSBF is less than 1%, and it has no false negatives. Its space requirement is the smallest among these solutions. Although Buffering has neither false positives nor false negatives, its space requirement increases exponentially with the size of the stream data. When the false positive rate of FPBuffering is set to 1%, its false negative rate jumps to around 5%, which most streaming data applications will not tolerate. Compared with a standard CBF at the same false positive probability, TSBF uses half the space or less. One of the biggest successes of CCFDP is the discovery of a new mercantile click bot, the Smart ClickBot. We presented a Bayesian approach for detecting Smart ClickBot clicks. The system combines evidence extracted from web server sessions to determine the final class of each click; some of this evidence can be used alone, while some can be combined with other features for click bot detection. During training and testing we also addressed the class imbalance problem. Our best classifier shows a recall of 94% and a precision of 89%, for an F1 measure of 92%. The high accuracy of our system demonstrates the effectiveness of the proposed methodology. Since the Smart ClickBot is a sophisticated bot that manipulates every possible parameter to go undetected, the techniques discussed here can also lead to detection of other types of software bots.
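    A speculative sketch of the TSBF idea follows: the Bloom filter's bit vector is replaced with a status vector, here storing last-seen timestamps so that membership expires after a time window. The hashing scheme, window, and status encoding are assumptions and may differ from the dissertation's actual design.

```python
import hashlib
import time

class TemporalStatefulBloomFilter:
    def __init__(self, size=1 << 20, num_hashes=4, window=60.0):
        self.size, self.k, self.window = size, num_hashes, window
        self.slots = [0.0] * size          # status vector: last-seen timestamps

    def _indexes(self, key):
        # Derive k slot indexes by salting the key; any k independent hashes work.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def seen_recently(self, key, now=None):
        """Return True if key was (probably) seen within the window, then record it."""
        now = time.time() if now is None else now
        idxs = list(self._indexes(key))
        dup = all(now - self.slots[i] <= self.window for i in idxs)
        for i in idxs:
            self.slots[i] = now
        return dup

f = TemporalStatefulBloomFilter()
print(f.seen_recently("ip=1.2.3.4|ad=42"))  # False: first occurrence
print(f.seen_recently("ip=1.2.3.4|ad=42"))  # True: duplicate within the window
```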
Despite the enormous capabilities of modern machine learning and data mining techniques in modeling complicated problems, most available click fraud detection systems are rule-based. Click fraud solution providers keep their rules as a secret weapon and bargain with others to prove their superiority. We proposed a validation framework that acquires another model of the click data that is not rule dependent, a model that learns the inherent statistical regularities of the data, and then compares the output of both models. Owing to the uniqueness of its architecture, the CCFDP system outperforms current commercial solutions and search engine/ISP solutions. The system protects pay-per-click advertisers from click fraud and improves their return on investment (ROI). It can also serve as an arbitration system for advertisers and PPC publishers whenever a click fraud dispute arises; advertisers gain confidence in PPC advertising by having a channel through which to contest traffic quality with large search engine publishers. The results of this system will bolster the internet economy by eliminating a shortcoming of the PPC business model, and general consumers will gain confidence in internet business models as fraudulent activities, which are numerous in the current virtual internet world, are reduced.
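    The validation idea, learning a rule-independent statistical model of the click data and comparing its output with the rule base, could be sketched as below; the features, rule labels, and the choice of an unsupervised anomaly detector are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Invented click features, e.g. [no_mouse_events, dwell_seconds].
X = np.array([[1, 0], [0, 5], [1, 1], [0, 8], [1, 0], [0, 7]])
rule_labels = np.array([1, 0, 1, 0, 1, 0])   # 1 = fraud per the rule base

# Rule-independent model of the data's regularities; outliers (-1) read as fraud.
iso = IsolationForest(contamination=0.5, random_state=0).fit(X)
stat_labels = (iso.predict(X) == -1).astype(int)

# Low agreement flags rules that drift away from the data's actual structure.
agreement = (rule_labels == stat_labels).mean()
print(f"rule/statistical agreement: {agreement:.0%}")
```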

    Anticipatory Mobile Computing: A Survey of the State of the Art and Research Challenges

    Get PDF
    Today's mobile phones are far from the mere communication devices they were ten years ago. Equipped with sophisticated sensors and advanced computing hardware, phones can be used to infer users' location, activity, social setting, and more. As devices become increasingly intelligent, their capabilities evolve beyond inferring context to predicting it, and then reasoning and acting upon the predicted context. This article provides an overview of the current state of the art in mobile sensing and context prediction, paving the way for full-fledged anticipatory mobile computing. We present a survey of phenomena that mobile phones can infer and predict, and offer a description of the machine learning techniques used for such predictions. We then discuss proactive decision making and decision delivery via the user-device feedback loop. Finally, we discuss the challenges and opportunities of anticipatory mobile computing.
    Comment: 29 pages, 5 figures