Bot recognition in a Web store: An approach based on unsupervised learning
Abstract Web traffic on e-business sites is increasingly dominated by artificial agents (Web bots), which pose a threat to website security, privacy, and performance. To develop efficient bot detection methods and discover reliable e-customer behavioural patterns, accurate separation of traffic generated by legitimate users and Web bots is necessary. This paper proposes a machine learning solution to the problem of bot and human session classification, with a specific application to e-commerce. The approach studied in this work explores the use of unsupervised learning (k-means and Graded Possibilistic c-Means), followed by supervised labelling of clusters, a generative learning strategy that decouples modelling the data from labelling them. Its efficiency is evaluated through experiments on real e-commerce data, in realistic conditions, and compared to that of supervised learning classifiers (a multi-layer perceptron neural network and a support vector machine). Results demonstrate that classification based on unsupervised learning is very efficient, achieving a performance level similar to that of fully supervised classification. This is an experimental indication that the bot recognition problem can be successfully dealt with using methods that are less sensitive to mislabelled data or missing labels. A very small fraction of sessions remains misclassified in both cases, so an in-depth analysis of misclassified samples was also performed. This analysis exposed the superiority of the proposed approach, which correctly recognized more bots and identified more camouflaged agents that had been erroneously labelled as humans.
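The cluster-then-label strategy described above can be illustrated with a minimal sketch: run plain k-means on an unlabelled session feature, then let a handful of labelled sessions vote on each cluster's class. Everything here is hypothetical: the single 1-D feature (requests per second per session), the toy data, and the function names are illustrative stand-ins, not the paper's actual pipeline, which also uses Graded Possibilistic c-Means and richer session features.

```python
import random
from collections import Counter

def kmeans_1d(points, k, iters=20, seed=0):
    # Plain 1-D k-means; the feature here is a hypothetical
    # requests-per-second value for each session.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[min(range(k), key=lambda j: abs(p - centroids[j]))].append(p)
        centroids = [sum(b) / len(b) if b else centroids[j]
                     for j, b in enumerate(buckets)]
    return centroids

def label_clusters(centroids, labelled):
    # Supervised labelling step: each cluster inherits the majority label
    # of the labelled samples nearest to its centroid.
    votes = [Counter() for _ in centroids]
    for x, y in labelled:
        votes[min(range(len(centroids)),
                  key=lambda j: abs(x - centroids[j]))][y] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]

def classify(x, centroids, cluster_labels):
    # New sessions get the label of their nearest cluster.
    return cluster_labels[min(range(len(centroids)),
                              key=lambda j: abs(x - centroids[j]))]

# Toy data: humans browse slowly (~1 req/s), bots hammer (~50 req/s).
sessions = [0.8, 1.1, 0.9, 1.3, 48.0, 52.0, 47.5, 51.0]
cents = kmeans_1d(sessions, k=2)
labels = label_clusters(cents, [(1.0, "human"), (50.0, "bot")])
print(classify(49.0, cents, labels))  # -> bot
```

The point of the decoupling is visible in the code: only two labelled examples are needed to name the clusters, so mislabelled or missing labels affect the voting step rather than the model of the data itself.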
Business Intelligence from Web Usage Mining
The rapid growth of e-commerce has confronted both the business community and customers with a new situation. Due to intense competition on the one hand, and the customer's option to choose from several alternatives on the other, the business community has realized the necessity of intelligent marketing strategies and relationship management. Web usage mining attempts to discover useful knowledge from the secondary data obtained from users' interactions with the Web. Web usage mining has become very critical for effective Web site management, creating adaptive Web sites, business and support services, personalization, network traffic flow analysis, and so on. In this paper, we present the important concepts of Web usage mining and its various practical applications. We further present a novel approach, 'intelligent-miner' (i-Miner), to optimize the concurrent architecture of a fuzzy clustering algorithm (to discover Web data clusters) and a fuzzy inference system to analyze Web site visitor trends. A hybrid evolutionary fuzzy clustering algorithm is proposed to optimally segregate similar user interests. The clustered data is then used to analyze trends using a Takagi-Sugeno fuzzy inference system learned through a combination of evolutionary algorithm and neural network learning. The proposed approach is compared with self-organizing maps (to discover patterns) and several function approximation techniques, such as neural networks, linear genetic programming, and a Takagi-Sugeno fuzzy inference system (to analyze the clusters). The results are graphically illustrated and their practical significance is discussed in detail. Empirical results clearly show that the proposed Web usage-mining framework is efficient.
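The evolutionary learning in i-Miner is beyond a short sketch, but the Takagi-Sugeno inference step the abstract relies on can be illustrated in a few lines: rules fire with a fuzzy membership strength, and the output is the strength-weighted average of linear rule consequents. The membership shapes, the rule consequents, and the traffic interpretation below are hypothetical examples, not the paper's learned system.

```python
def tri(x, a, b, c):
    # Triangular fuzzy membership function rising from a, peaking at b,
    # falling to zero at c.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def takagi_sugeno(x, rules):
    # rules: list of ((a, b, c) membership params, (p, q) linear consequent).
    # Output is the firing-strength-weighted average of p*x + q per rule.
    weights = [tri(x, *mf) for mf, _ in rules]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * (p * x + q)
               for w, (_, (p, q)) in zip(weights, rules)) / total

# Two hypothetical rules mapping a traffic level x to a trend estimate:
# IF x is LOW  THEN y = 0.5*x;  IF x is HIGH THEN y = 2*x - 40
rules = [((0, 25, 50), (0.5, 0.0)),
         ((25, 50, 75), (2.0, -40.0))]
print(takagi_sugeno(30, rules))  # 0.8*15 + 0.2*20 = 16.0
```

In the paper's framework the membership parameters and consequents would be learned by the evolutionary algorithm and neural network rather than fixed by hand as they are here.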
Market Segmentation Analysis and Visualization Using K-Mode Clustering Algorithm for E-Commerce Business
Today all business organizations are adopting data-driven strategies to generate more revenue from their business. Growing startups are investing heavily in the data economy to maximize business profits by developing intelligent tools backed by machine learning and artificial intelligence. The nature of a BI tool depends on factors such as business goals, size, model, and technology. In this paper, the architecture of a business intelligence tool and its decision process are discussed, with a focus on market segmentation based on user behaviour analysis using the k-mode clustering algorithm and user geographical distributions. The proposed toolkit also incorporates interactive visualizations and maps.
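K-modes is the categorical-data analogue of k-means: cluster "centres" are column-wise modes and distance is the count of mismatched attributes. A minimal sketch follows; the customer attributes (device, payment method, region) and the deterministic first-k-distinct initialisation are illustrative assumptions, not details from the paper.

```python
from collections import Counter

def hamming(a, b):
    # Dissimilarity for categorical records: number of mismatched attributes.
    return sum(x != y for x, y in zip(a, b))

def kmodes(records, k, iters=10):
    # Initialise with the first k distinct records (deterministic for the demo;
    # production implementations typically use smarter initialisation).
    modes = []
    for r in records:
        if r not in modes:
            modes.append(r)
        if len(modes) == k:
            break
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for r in records:
            buckets[min(range(k), key=lambda j: hamming(r, modes[j]))].append(r)
        # New mode = column-wise most frequent category within each cluster.
        modes = [tuple(Counter(col).most_common(1)[0][0] for col in zip(*b))
                 if b else modes[j]
                 for j, b in enumerate(buckets)]
    return modes, buckets

# Hypothetical customer records: (device, payment method, region).
customers = [
    ("mobile",  "wallet", "asia"),
    ("mobile",  "wallet", "asia"),
    ("mobile",  "card",   "asia"),
    ("desktop", "card",   "europe"),
    ("desktop", "card",   "europe"),
    ("desktop", "wallet", "europe"),
]
modes, segments = kmodes(customers, k=2)
print(modes)  # one mobile/asia segment mode, one desktop/europe segment mode
```

Each resulting mode is itself an interpretable "typical customer" for its segment, which is what makes k-modes convenient for the kind of segment visualization the paper describes.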
A document management methodology based on similarity contents
The advent of the WWW and distributed information systems has made it possible to share documents between different users and organisations. However, this has created many problems related to the security, accessibility, rights and, most importantly, the consistency of documents. It is important that the people involved in the document management process have access to the most up-to-date versions of documents, retrieve the correct documents, and are able to update the document repository in such a way that their documents are known to others. In this paper we propose a method for organising, storing and retrieving documents based on the similarity of their contents. The method uses techniques based on information retrieval, document indexation, and term extraction and indexing. This methodology was developed for the E-Cognos project, which aims at developing tools for the management and sharing of documents in the construction domain.
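A standard way to realise content-similarity retrieval of the kind described is TF-IDF weighting with cosine similarity: terms that are frequent in a document but rare across the collection dominate the comparison. The sketch below is a generic illustration of that technique under simple assumptions (whitespace tokenisation, raw term frequency), not the E-Cognos indexing method itself.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    # TF-IDF weight per term per document: term frequency scaled by
    # log(N / document frequency), so collection-wide terms get low weight.
    tokenised = [doc.lower().split() for doc in docs]
    df = Counter(t for doc in tokenised for t in set(doc))
    n = len(docs)
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
            for doc in tokenised]

def cosine(u, v):
    # Cosine similarity between two sparse term-weight dictionaries.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy construction-domain documents (invented for the example).
docs = [
    "steel beam load specification",
    "beam load tables for steel structures",
    "interior paint colour guide",
]
vecs = tfidf_vectors(docs)
# The two structural documents should score higher than the unrelated one.
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Retrieval then amounts to ranking the repository by cosine similarity against a query document, and "storing by similarity" to grouping documents whose pairwise similarity exceeds a threshold.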
Improving intrusion detection systems using data mining techniques
Recent surveys and studies have shown that cyber-attacks have caused a lot of damage to organisations, governments, and individuals around the world. Although developments are constantly occurring in the computer security field, cyber-attacks still cause damage as they are developed and evolved by hackers. This research looked at some industrial challenges in the intrusion detection area and identified two main challenges: the first is that signature-based intrusion detection systems such as SNORT lack the capability to detect attacks with new signatures without human intervention; the other relates to multi-stage attack detection, where signature-based detection has been found to be inefficient. The novelty of this research lies in the methodologies developed to tackle these challenges. The first challenge was handled by developing a multi-layer classification methodology. The first layer is based on a decision tree, while the second layer is a hybrid module that uses two data mining techniques: a neural network and fuzzy logic. The second layer attempts to detect new attacks when the first one fails. This system detects attacks with new signatures and then updates the SNORT signature holder automatically, without any human intervention. The obtained results show a high detection rate for attacks with new signatures; however, the false positive rate needs to be lowered. The second challenge was approached by evaluating IP information using fuzzy logic. This approach looks at the identity of participants in the traffic, rather than the sequence and contents of the traffic. The results show that this approach can help predict attacks at very early stages in some scenarios. However, it has been found that combining it with an approach that looks at the sequence and contents of the traffic, such as event correlation, achieves better performance than either approach individually.
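The layered fallback idea in the abstract can be sketched as a two-stage cascade: a signature layer handles known attacks, and a second layer catches what the first misses, feeding newly detected signatures back into the signature store. The sketch below is a deliberately simplified stand-in: the paper's second layer is a decision-tree-fronted neural network/fuzzy hybrid, whereas here a hypothetical rate threshold plays that role, and the dictionary packet format is invented for the example.

```python
def layer_one(pkt, signatures):
    # Signature layer: flag packets whose payload matches any known pattern
    # (stand-in for a SNORT-style rule match).
    return any(sig in pkt["payload"] for sig in signatures)

def layer_two(pkt, baseline_rate, threshold=3.0):
    # Fallback layer (stand-in for the paper's neural network/fuzzy module):
    # flag traffic whose request rate deviates strongly from the baseline.
    return pkt["rate"] > threshold * baseline_rate

def detect(pkt, signatures, baseline_rate):
    if layer_one(pkt, signatures):
        return "known attack"
    if layer_two(pkt, baseline_rate):
        # Auto-update step: the new payload becomes a signature, so the
        # signature layer catches it next time without human intervention.
        signatures.add(pkt["payload"])
        return "new attack (signature learned)"
    return "benign"

sigs = {"DROP TABLE"}
print(detect({"payload": "GET /?q=DROP TABLE users", "rate": 1.0}, sigs, 2.0))
print(detect({"payload": "XYZ-flood", "rate": 50.0}, sigs, 2.0))
print(detect({"payload": "XYZ-flood", "rate": 1.0}, sigs, 2.0))
```

The third call returns "known attack" even at a benign rate, showing the feedback loop: once the second layer has seen a new attack, the cheaper first layer takes over, which is exactly the automatic signature-holder update the abstract describes.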
AI Solutions for MDS: Artificial Intelligence Techniques for Misuse Detection and Localisation in Telecommunication Environments
This report considers the application of Artificial Intelligence (AI) techniques to the problem of misuse detection and misuse localisation within telecommunications environments. A broad survey of techniques is provided, covering inter alia rule-based systems, model-based systems, case-based reasoning, pattern matching, clustering and feature extraction, artificial neural networks, genetic algorithms, artificial immune systems, agent-based systems, data mining, and a variety of hybrid approaches. The report then considers the central issue of event correlation, which is at the heart of many misuse detection and localisation systems. The notion of being able to infer misuse by the correlation of individual temporally distributed events within a multiple data stream environment is explored, along with a range of techniques covering model-based approaches, 'programmed' AI, and machine learning paradigms. It is found that, in general, correlation is best achieved via rule-based approaches, but that these suffer from a number of drawbacks, such as the difficulty of developing and maintaining an appropriate knowledge base, and the lack of ability to generalise from known misuses to new, unseen misuses. Two distinct approaches are evident. One attempts to encode knowledge of known misuses, typically within rules, and uses this to screen events. This approach cannot generally detect misuses for which it has not been programmed, i.e. it is prone to issuing false negatives. The other attempts to 'learn' the features of event patterns that constitute normal behaviour and, by observing patterns that do not match expected behaviour, detect when a misuse has occurred. This approach is prone to issuing false positives, i.e. inferring misuse from innocent patterns of behaviour that the system was not trained to recognise. Contemporary approaches are seen to favour hybridisation, often combining detection or localisation mechanisms for both abnormal and normal behaviour, the former to capture known cases of misuse, the latter to capture unknown cases. In some systems, these mechanisms even work together to update each other, increasing detection rates and lowering false positive rates. It is concluded that hybridisation offers the most promising future direction, but that a rule- or state-based component is likely to remain, being the most natural approach to the correlation of complex events. The challenge, then, is to mitigate the weaknesses of canonical programmed systems such that learning, generalisation and adaptation are more readily facilitated.