AI Solutions for MDS: Artificial Intelligence Techniques for Misuse Detection and Localisation in Telecommunication Environments
This report considers the application of Artificial Intelligence (AI) techniques to the problem of misuse detection and misuse localisation within telecommunications environments. A broad survey of techniques is provided, covering inter alia rule-based systems, model-based systems, case-based reasoning, pattern matching, clustering and feature extraction, artificial neural networks, genetic algorithms, artificial immune systems, agent-based systems, data mining and a variety of hybrid approaches. The report then considers the central issue of event correlation, which is at the heart of many misuse detection and localisation systems. The notion of being able to infer misuse by the correlation of individual temporally distributed events within a multiple-data-stream environment is explored, and a range of techniques is reviewed, covering model-based approaches, `programmed' AI and machine learning paradigms. It is found that, in general, correlation is best achieved via rule-based approaches, but that these suffer from a number of drawbacks, such as the difficulty of developing and maintaining an appropriate knowledge base, and the inability to generalise from known misuses to new, unseen misuses. Two distinct approaches are evident. One attempts to encode knowledge of known misuses, typically within rules, and use this to screen events. This approach cannot generally detect misuses for which it has not been programmed, i.e. it is prone to issuing false negatives. The other attempts to `learn' the features of event patterns that constitute normal behaviour and, by observing patterns that do not match expected behaviour, detect when a misuse has occurred. This approach is prone to issuing false positives, i.e. inferring misuse from innocent patterns of behaviour that the system was not trained to recognise. Contemporary approaches are seen to favour hybridisation, often combining detection or localisation mechanisms for both abnormal and normal behaviour, the former to capture known cases of misuse, the latter to capture unknown cases. In some systems, these mechanisms even work together to update each other to increase detection rates and lower false-positive rates. It is concluded that hybridisation offers the most promising future direction, but that a rule- or state-based component is likely to remain, being the most natural approach to the correlation of complex events. The challenge, then, is to mitigate the weaknesses of canonical programmed systems such that learning, generalisation and adaptation are more readily facilitated.
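As a concrete illustration of the rule-based correlation the report favours, a minimal sketch is given below. The event fields, the rule itself, and the window/threshold values are all invented for illustration and are not taken from any system surveyed here.

```python
# Minimal sketch of rule-based temporal event correlation for misuse
# detection. Event fields, the rule, and thresholds are illustrative only.
from collections import defaultdict

def correlate(events, window=60, threshold=3):
    """Flag a source as suspicious when it triggers `threshold` or more
    'auth_failure' events within `window` seconds (a simple temporal rule)."""
    by_source = defaultdict(list)
    alerts = []
    for t, source, kind in sorted(events):   # events: (time, source, kind)
        if kind != "auth_failure":
            continue
        by_source[source].append(t)
        # keep only events inside the sliding window
        by_source[source] = [x for x in by_source[source] if t - x <= window]
        if len(by_source[source]) >= threshold:
            alerts.append((t, source))
    return alerts

events = [(0, "A", "auth_failure"), (10, "A", "auth_failure"),
          (20, "A", "auth_failure"), (500, "B", "auth_failure")]
print(correlate(events))   # the burst from source "A" is flagged: [(20, 'A')]
```

A learning-based detector would replace the hand-written rule with a model of normal behaviour, which is exactly the trade-off between false negatives and false positives discussed above.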
Flashes in a Star Stream: Automated Classification of Astronomical Transient Events
An automated, rapid classification of transient events detected in the modern
synoptic sky surveys is essential for their scientific utility and effective
follow-up using scarce resources. This presents some unusual challenges: the
data are sparse, heterogeneous and incomplete; evolving in time; and most of
the relevant information comes not from the data stream itself, but from a
variety of archival data and contextual information (spatial, temporal, and
multi-wavelength). We are exploring a variety of novel techniques, mostly
Bayesian, to respond to these challenges, using the ongoing CRTS sky survey as
a testbed. The current surveys are already overwhelming our ability to
effectively follow all of the potentially interesting events, and these
challenges will grow by orders of magnitude over the next decade as the more
ambitious sky surveys get under way. While we focus on an application in a
specific domain (astrophysics), these challenges are more broadly relevant for
event or anomaly detection and knowledge discovery in massive data streams.
Comment: 8 pages, to appear in refereed proceedings of the IEEE eScience 2012 conference, October 2012, IEEE Press
Click fraud: how to spot it, how to stop it?
Online search advertising is currently the greatest source of revenue for many Internet giants such as Google™, Yahoo!™, and Bing™. The increased number of specialized websites and modern profiling techniques have all contributed to an explosion of the income of ad brokers from online advertising. The single biggest threat to this growth, however, is click fraud. Trained botnets and even individuals are hired by click-fraud specialists in order to maximize the revenue of certain users from the ads they publish on their websites, or to launch an attack between competing businesses. Most academics and consultants who study online advertising estimate that 15% to 35% of ads in pay-per-click (PPC) online advertising systems are not authentic. In the first two quarters of 2010 alone, an estimated $1.5 billion of US marketers' spending was wasted due to click fraud. These fraudulent clicks are believed to be initiated by users in poor countries, or by botnets, trained to click on specific ads. For example, according to a 2010 study from Information Warfare Monitor, the operators of Koobface, a program that installed malicious software to participate in click fraud, made over $2 million in just over a year. The process of making such illegitimate clicks to generate revenue is called click fraud. Search engines claim they filter out most questionable clicks and either do not charge for them or reimburse advertisers that have been wrongly billed. However, this is a hard task, despite claims that brokers' efforts are satisfactory. In the simplest scenario, a publisher continuously clicks on the ads displayed on his own website in order to make revenue. In a more complicated scenario, a travel agent may hire a large, globally distributed botnet to click on its competitor's ads, hence depleting their daily budget. We analyzed these different types of click fraud methods and proposed new methodologies to detect and prevent them in real time.
While traditional commercial approaches detect only some specific types of click fraud, the Collaborative Click Fraud Detection and Prevention (CCFDP) system, an architecture that we have implemented based on the proposed methodologies, can detect and prevent all major types of click fraud. The proposed solution analyzes detailed user activities on both the server side and the client side collaboratively to better describe the intention of the click. Data fusion techniques are developed to combine evidence from several data mining models and to obtain a better estimation of the quality of the click traffic. Our ideas are evaluated through the development of the CCFDP system. Experimental results show that the CCFDP system is better than the existing commercial click fraud solution in three major aspects: 1) detecting more click fraud, especially clicks generated by software; 2) providing prevention ability; 3) proposing the concept of a click quality score for click quality estimation. In the initial version of CCFDP, we analyzed the performance of the click fraud detection and prediction model using a rule-based algorithm, which is similar to most existing systems. We assigned a quality score to each click instead of classifying the click as fraudulent or genuine, because it is hard to get solid evidence of click fraud based only on the data collected, and it is difficult to determine the real intention of the users who make the clicks. Results from the initial version revealed that the diversity of click fraud attack types makes it hard for a single countermeasure to prevent click fraud; it is therefore important to be able to combine multiple measures capable of effective protection against click fraud. In the improved version of CCFDP, we therefore provide the traffic quality score as a combination of evidence from several data mining algorithms.
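The fusion of per-model evidence into a single traffic quality score might be sketched as follows. The weighted-average scheme, the model names, and the weights are illustrative assumptions, not the system's actual fusion method.

```python
# Hedged sketch of combining evidence from several data mining models
# into one click quality score. Model names and weights are invented.
def quality_score(evidence, weights):
    """Weighted average of per-model fraud probabilities, returned as a
    0-100 quality score (higher = more likely genuine traffic)."""
    total = sum(weights.values())
    fraud_prob = sum(weights[m] * evidence[m] for m in evidence) / total
    return round(100 * (1 - fraud_prob), 1)

evidence = {"rule_based": 0.9, "session_model": 0.7, "duplicate_check": 0.8}
weights  = {"rule_based": 0.5, "session_model": 0.3, "duplicate_check": 0.2}
print(quality_score(evidence, weights))  # prints 18.0: low score, likely fraud
```

A real fusion layer could use anything from Bayesian combination to stacking; the point is that the output is a graded score rather than a hard fraud/genuine label, as the abstract describes.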
We tested the system with data from an actual ad campaign in 2007 and 2008 and compared the results with Google AdWords reports for the same campaign. The results show that a high percentage of click fraud is present even with the most popular search engine. The multiple-model-based CCFDP always estimated less valid traffic than Google; sometimes the difference is as high as 53%. Fast and efficient detection of duplicates is one of the most important requirements in any click fraud solution. Duplicate detection algorithms usually run in real time; to provide real-time results, solution providers should use data structures that can be updated in real time, and the space required to hold the data should be minimal. In this dissertation, we also addressed the problem of detecting duplicate clicks in pay-per-click streams. We proposed a simple data structure, the Temporal Stateful Bloom Filter (TSBF), an extension of the regular Bloom Filter and Counting Bloom Filter (CBF), in which the bit vector of the Bloom Filter is replaced with a status vector. The duplicate detection results of the TSBF method are compared with the Buffering, FPBuffering, and CBF methods. The false positive rate of TSBF is less than 1%, and it has no false negatives. The space requirement of TSBF is the lowest among these solutions. Even though Buffering has neither false positives nor false negatives, its space requirement grows with the size of the stream. When the false positive rate of FPBuffering is set to 1%, its false negative rate jumps to around 5%, which will not be tolerated by most streaming data applications. We also compared the TSBF results with CBF: TSBF uses only half the space of a standard CBF, or less, at the same false positive probability. One of the biggest successes of CCFDP is the discovery of a new mercantile click bot, the Smart ClickBot. We presented a Bayesian approach for detecting Smart ClickBot-type clicks.
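The core idea behind the TSBF, replacing the Bloom Filter's bit vector with a status vector so that membership can expire, can be sketched minimally. The hash scheme, the sizing, and the timestamp-as-status encoding below are assumptions made for illustration, not the dissertation's actual design.

```python
# Minimal sketch in the spirit of a stateful Bloom filter: each slot
# stores a last-seen timestamp instead of a bit, so entries age out.
# Hash construction and parameters are illustrative only.
import hashlib

class StatefulBloomFilter:
    def __init__(self, size=1024, hashes=3, window=60):
        self.slots = [0.0] * size        # last-seen time per slot (0 = empty)
        self.size, self.hashes, self.window = size, hashes, window

    def _positions(self, key):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def seen_recently(self, key, now):
        """True if `key` was recorded within `window` seconds; then record it."""
        pos = list(self._positions(key))
        dup = all(self.slots[p] > 0 and now - self.slots[p] <= self.window
                  for p in pos)
        for p in pos:
            self.slots[p] = now
        return dup

f = StatefulBloomFilter()
print(f.seen_recently("click-123", now=10.0))   # False: first sighting
print(f.seen_recently("click-123", now=40.0))   # True: duplicate in window
print(f.seen_recently("click-123", now=200.0))  # False: state has expired
```

As with any Bloom-style structure, hash collisions can cause false positives, but expired timestamps cannot cause false negatives within the window, which mirrors the trade-off reported above.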
The system combines evidence extracted from web server sessions to determine the final class of each click. Some of this evidence can be used alone, while some can be used in combination with other features for click bot detection. During training and testing we also addressed the class imbalance problem. Our best classifier shows a recall of 94% and a precision of 89%, with an F1 measure of 92%. The high accuracy of our system demonstrates the effectiveness of the proposed methodology. Since the Smart ClickBot is a sophisticated click bot that manipulates every possible parameter to go undetected, the techniques discussed here can lead to the detection of other types of software bots too. Despite the enormous capabilities of modern machine learning and data mining techniques in modelling complicated problems, most of the available click fraud detection systems are rule-based. Click fraud solution providers keep their rules as a secret weapon and bargain with others to prove their superiority. We proposed a validation framework that acquires another model of the click data that is not rule dependent, a model that learns the inherent statistical regularities of the data; the outputs of the two models are then compared. Due to the uniqueness of the CCFDP system architecture, it is better than current commercial solutions and search engine/ISP solutions. The system protects pay-per-click advertisers from click fraud and improves their return on investment (ROI). The system can also provide an arbitration mechanism for the advertiser and the PPC publisher whenever a click fraud dispute arises. Advertisers can gain confidence in PPC advertising by having a channel through which to contest traffic quality with the big search engine publishers. The results of this system will bolster the internet economy by eliminating the shortcomings of the PPC business model.
General consumers will gain confidence in internet business models through the reduction of fraudulent activities, which are numerous in the current virtual internet world.
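For reference, the F1 measure quoted above is the harmonic mean of precision and recall:

```python
# F1 is the harmonic mean of precision and recall.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.89, 0.94), 3))  # prints 0.914
```

The harmonic mean penalizes imbalance between the two: a classifier cannot earn a high F1 by trading one metric away for the other, which is why it is the standard single-number summary for imbalanced problems such as bot detection.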
Evoked Potentials during Language Processing as Neurophysiological Phenomena
The evoked, event-related potential of the EEG has been extensively employed to study language processing. But what is the ERP? An extensive discussion of contemporary theories about the neurophysiology underlying late ERPs is given. Then, in a series of experiments, domain-general perspectives on ERP components are tested for their applicability to language-related brain activity. A range of analysis methods (some of which have not previously been applied to the study of auditory sentence processing), such as single-trial analyses and independent component decomposition, demonstrates the degree to which domain-general mechanisms explain the language-related EEG.
Data-driven Discovery of Transients in the New Era of Time-Domain Astronomy
Time-domain astronomy has reached an incredible new era where unprecedented amounts of data are becoming available. New large-scale astronomical surveys such as the Legacy Survey of Space and Time (LSST) are going to revolutionise transient astronomy, providing opportunities to discover entirely new classes of transients while also enabling a deeper understanding of known classes. LSST is expected to observe over 10 million transient alerts every night, at least two orders of magnitude more than any preceding survey. It has never been more important that astronomers develop fast and automated methods of identifying transient candidates for follow-up observations.
In this thesis, I tackle two major challenges facing the future of transient astronomy: the early classification of transients and the detection of rare or previously unknown transients. I detail my development of a number of novel methods dealing with these issues. In the first chapter, I provide an introduction to the field of transient astronomy and motivate why new methods of transient identification are necessary. In the second chapter, I detail the development of a new photometric transient classifier, called RAPID, that is able to automatically classify a range of astronomical transients in real time. My deep neural network architecture is the first method designed to provide early classifications of astronomical transients. In Chapter 3, I identify the issue that, with such large data volumes, the astronomical community will struggle to identify rare and interesting anomalous transients that have previously been found serendipitously. I outline my novel method that uses a Bayesian parametric fit of light curves to identify anomalous transients in real time. In Chapter 4, I highlight some issues with current photometric classifiers and improve upon RAPID so that it is capable of dealing with real data instead of just simulations. I present classifiers that perform effectively on real data from the Zwicky Transient Facility and Pan-STARRS surveys. Finally, in the last chapter, I discuss the conclusions of my work and highlight some future opportunities and work needed in preparing for discovery in the new era of time-domain astronomy.
Cambridge Trust; Cambridge Australia Scholarship
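The anomaly-detection idea in Chapter 3, flagging transients whose light curves are poorly explained by a parametric fit, can be sketched with a simple goodness-of-fit score. The data, model values, and threshold below are invented for illustration; the thesis's actual method is a Bayesian parametric fit, not a plain chi-square cut.

```python
# Hedged sketch: score a light curve by how well a parametric model fits it.
# A large reduced chi-square suggests an anomalous (poorly modelled) transient.
# All numbers here are made up for illustration.
def anomaly_score(flux, flux_err, model_flux):
    """Reduced chi-square of the model fit; large values => anomalous."""
    chi2 = sum(((f - m) / e) ** 2
               for f, m, e in zip(flux, model_flux, flux_err))
    return chi2 / len(flux)

flux       = [1.0, 2.1, 4.2, 7.9]     # observed fluxes
flux_err   = [0.1, 0.1, 0.1, 0.1]     # measurement uncertainties
model_flux = [1.0, 2.0, 4.0, 8.0]     # best-fit parametric model
score = anomaly_score(flux, flux_err, model_flux)
print(score > 5.0)   # prints False: this light curve is well modelled
```

A Bayesian treatment would instead compare posterior probabilities under known-class models, but the intuition is the same: objects that no known model explains well are the ones worth following up.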
Towards an Automated Classification of Transient Events in Synoptic Sky Surveys
We describe the development of a system for an automated, iterative,
real-time classification of transient events discovered in synoptic sky
surveys. The system under development incorporates a number of Machine Learning
techniques, mostly using Bayesian approaches, due to the sparse nature,
heterogeneity, and variable incompleteness of the available data. The
classifications are improved iteratively as the new measurements are obtained.
One novel feature is the development of an automated follow-up recommendation
engine that suggests those measurements that would be the most advantageous in
terms of resolving classification ambiguities and/or characterization of the
astrophysically most interesting objects, given a set of available follow-up
assets and their cost functions. This illustrates the symbiotic relationship of
astronomy and applied computer science through the emerging discipline of
AstroInformatics.
Comment: Invited paper, 15 pages, to appear in Statistical Analysis and Data Mining (ASA journal), refereed proceedings of the CIDU 2011 conference, eds. A. Srivastava & N. Chawla, in press (2011)
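The iterative improvement of classifications as new measurements arrive is, at heart, a Bayesian posterior update. A minimal sketch follows; the class names and likelihood values are invented for illustration and are not taken from the paper.

```python
# Hedged sketch of iterative Bayesian classification: each new measurement
# updates the posterior probability over candidate transient classes.
def bayes_update(prior, likelihoods):
    """prior: {class: P(class)}; likelihoods: {class: P(obs | class)}."""
    unnorm = {c: prior[c] * likelihoods[c] for c in prior}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

posterior = {"supernova": 0.5, "variable_star": 0.5}   # initial prior
observations = [{"supernova": 0.8, "variable_star": 0.3},   # 1st measurement
                {"supernova": 0.9, "variable_star": 0.2}]   # 2nd measurement
for obs_likelihood in observations:
    posterior = bayes_update(posterior, obs_likelihood)
print(posterior)   # probability mass shifts toward "supernova"
```

The follow-up recommendation engine described above would, on top of this, rank candidate measurements by how much each is expected to sharpen such a posterior relative to its cost.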
Phytoplankton-bacteria coupling under elevated CO<sub>2</sub> levels: a stable isotope labelling study
The potential impact of rising carbon dioxide (CO2) on carbon transfer from phytoplankton to bacteria was investigated during the 2005 PeECE III mesocosm study in Bergen, Norway. Sets of mesocosms, in which a phytoplankton bloom was induced by nutrient addition, were incubated under 1× (~350 μatm), 2× (~700 μatm), and 3× present-day CO2 (~1050 μatm) initial seawater and sustained atmospheric CO2 levels for 3 weeks. 13C-labelled bicarbonate was added to all mesocosms to follow the transfer of carbon from dissolved inorganic carbon (DIC) into phytoplankton, and subsequently into heterotrophic bacteria and settling particles. Isotope ratios of polar-lipid-derived fatty acids (PLFA) were used to infer the biomass and production of phytoplankton and bacteria. Phytoplankton PLFA were enriched within one day after label addition, whilst it took another 3 days before bacteria showed substantial enrichment. Group-specific primary production measurements revealed that coccolithophores showed higher primary production than green algae and diatoms. Elevated CO2 had a significant positive effect on post-bloom biomass of green algae, diatoms, and bacteria. A simple model based on measured isotope ratios of phytoplankton and bacteria revealed that CO2 had no significant effect on the carbon transfer efficiency from phytoplankton to bacteria during the bloom. There was no indication of CO2 effects on enhanced settling based on isotope mixing models during the phytoplankton bloom, but this could not be determined in the post-bloom phase. Our results suggest that CO2 effects are most pronounced in the post-bloom phase, under nutrient limitation.
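The kind of two-end-member isotope mixing model referred to above can be sketched as follows. All delta values in the example are invented for illustration and do not come from the study.

```python
# Hedged sketch of a two-end-member isotope mixing model of the kind used
# to trace 13C label transfer; the delta values below are illustrative only.
def source_fraction(delta_mix, delta_source, delta_background):
    """Fraction of the mixture's carbon derived from the labelled source,
    by linear mixing of the two end-member isotope signatures."""
    return (delta_mix - delta_background) / (delta_source - delta_background)

# e.g. bacterial PLFA at +500 per mil, labelled phytoplankton at +2000,
# unlabelled background at -25 (made-up numbers for illustration)
f = source_fraction(500.0, 2000.0, -25.0)
print(round(f, 3))   # prints 0.259: about a quarter from the labelled source
```

The linearity of the mixing equation is what lets measured PLFA isotope ratios be converted into transfer efficiencies, as the abstract's "simple model" does.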