Improving the Scalability of XCS-Based Learning Classifier Systems
Using evolutionary intelligence and machine learning techniques, a broad
range of intelligent machines have been designed to perform different
tasks. An intelligent machine learns by perceiving its environmental status
and taking an action that maximizes its chances of success.
Human beings have the ability to apply knowledge learned from a
smaller problem to more complex, large-scale problems of the same or a
related domain, but currently the vast majority of evolutionary machine
learning techniques lack this ability. Without reusing already learned
domain knowledge, solving complex, large-scale problems in a domain
consumes more resources and time than necessary. As a problem grows in
size, it can become difficult, and sometimes impractical (if not impossible),
to solve given the required resources and time. Therefore, in order to scale
in a problem domain, a system is needed that can reuse the learned
knowledge of the domain and/or encapsulate the underlying patterns in the domain.
To extract and reuse building blocks of knowledge or to encapsulate
the underlying patterns in a problem domain, a rich encoding is needed,
but the search space could then expand undesirably and cause bloat, e.g.
as in some forms of genetic programming (GP). Learning classifier systems
(LCSs) are a well-structured, evolutionary-computation-based learning
technique with pressures that implicitly avoid bloat, such as fitness
sharing through niche-based reproduction.
The proposed thesis is that an LCS can scale to complex problems in
a domain by reusing the learnt knowledge from simpler problems of the
domain and/or encapsulating the underlying patterns in the domain. Wilson's
XCS, a well-tested, online-learning, accuracy-based LCS model, is used to
implement and test the proposed systems. To extract reusable building
blocks of knowledge, GP-tree-like code fragments are introduced, which are more
than simply another representation (e.g. ternary or real-valued alphabets). The
thesis is extended to capture the underlying patterns in a problem using a cyclic
representation. Hard problems are used to test the newly developed scalable
systems and to compare them with benchmark techniques.
Specifically, this work develops four systems to improve the scalability
of XCS-based classifier systems. (1) Building blocks of knowledge are extracted
from smaller problems of a Boolean domain and reused in learning
more complex, large-scale problems in the domain, for the first time. By
utilizing the knowledge learnt from small-scale problems, the developed
XCSCFC (i.e. XCS with Code-Fragment Conditions) system readily solves
problems of a scale that existing LCS and GP approaches cannot, e.g. the
135-bit MUX problem. (2) The introduction of code fragments in classifier
actions in XCSCFA (i.e. XCS with Code-Fragment Actions) enables the
rich representation of GP which, when coupled with the divide-and-conquer
approach of LCS, successfully solves various complex, overlapping
and niche-imbalanced Boolean problems that are difficult to solve using
numeric-action-based XCS. (3) The underlying patterns in a problem domain
are encapsulated in classifier rules encoded by a cyclic representation. The
developed XCSSMA system produces general solutions of any scale n for
a number of important Boolean problems, for the first time in the field of
LCS, e.g. parity problems. (4) Optimal solutions for various real-valued
problems are evolved by extending the existing real-valued XCSR system
with code-fragment actions to XCSRCFA. Exploiting the combined power
of GP and LCS techniques, XCSRCFA successfully learns various continuous
action and function approximation problems that are difficult to learn
using the base techniques.
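The Boolean multiplexer named above is a standard LCS benchmark, and a code fragment is essentially a small GP tree evaluated over the input bits. The following is a minimal illustrative sketch; the function names and the tuple-based fragment encoding are our own simplification, not the thesis's implementation:

```python
# Sketch of the k-address-bit Boolean multiplexer benchmark (6-bit MUX: k=2,
# 135-bit MUX: k=7) and a toy GP-tree "code fragment" used as a reusable
# building block of knowledge. Illustrative only.

def multiplexer(bits):
    """k address bits select one of 2**k data bits; len(bits) == k + 2**k."""
    k = 1
    while k + 2 ** k < len(bits):
        k += 1
    assert k + 2 ** k == len(bits), "length must equal k + 2**k"
    address = int("".join(map(str, bits[:k])), 2)
    return bits[k + address]

def eval_fragment(node, x):
    """Evaluate a code fragment (nested tuples) over input bit-vector x."""
    if isinstance(node, int):           # leaf: index into the input
        return x[node]
    op, *children = node                # internal node: Boolean operator
    vals = [eval_fragment(c, x) for c in children]
    if op == "AND":
        return vals[0] & vals[1]
    if op == "OR":
        return vals[0] | vals[1]
    if op == "NOT":
        return 1 - vals[0]
    raise ValueError(op)

# e.g. the fragment (x0 AND NOT x1), reusable across problem scales
frag = ("AND", 0, ("NOT", 1))
```

A fragment learnt on the 6-bit MUX can be reused as a feature when tackling larger multiplexers, which is the scalability mechanism the abstract describes.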
This research work has shown that LCSs can scale to complex, large-scale
problems through reusing learnt knowledge. The messy nature, disassociation of
message order from condition order, masking, feature construction, and reuse of
extracted knowledge add new abilities to the XCS family of LCSs. The ability to
use rich encodings, GP-like code fragments in the antecedent or a cyclic
representation in the consequent, leads to the evolution of accurate, maximally
general and compact solutions for various complex Boolean as well as real-valued
problems. By effectively exploiting the combined power of GP and LCS techniques,
various continuous-action and function approximation problems are solved in a
simple and straightforward manner.
The analysis of the evolved rules reveals, for the first time in XCS, that
no matter how specific or general the initial classifiers are, all optimal
classifiers converge through a 'be specific then generalize' mechanism
near the final stages of evolution. It also shows that standard XCS does not use
all available information or all available genetic operators to evolve optimal
rules, whereas the developed code-fragment-action-based systems effectively use
figure-and-ground information during the training process.
This work has created a platform to explore the reuse of learnt functionality,
not just terminal knowledge as at present, which is needed to replicate human capabilities.
Improved techniques for phishing email detection based on random forest and firefly-based support vector machine learning algorithms.
Master of Science in Computer Science. University of KwaZulu-Natal, Durban, 2014.

Electronic fraud is one of the major challenges faced by the vast majority of online internet users today. Curbing this menace is not an easy task, primarily because of the rapid rate at which fraudsters change their mode of attack. Many techniques have been proposed in the academic literature to handle e-fraud, including blacklists, whitelists, and machine learning (ML) based techniques. Among these, ML-based techniques have proven to be the most efficient, because of their ability to detect new fraudulent attacks as they appear.

There are three commonly perpetrated electronic frauds, namely email spam, phishing and network intrusion. Among these three, phishing attacks have caused the greatest financial loss. This research investigates and reports the use of ML and nature-inspired techniques in the domain of phishing detection, with the foremost objective of developing a dynamic and robust phishing email classifier with improved classification accuracy and reduced processing time.

Two approaches to phishing email detection are proposed, and two email classifiers are developed based on them. In the first approach, a random forest algorithm is used to construct decision trees, which are, in turn, used for email classification. The second approach introduces a novel ML method that hybridizes the firefly algorithm (FFA) and the support vector machine (SVM). The hybridized method consists of three major stages: a feature extraction phase, a hyper-parameter selection phase and an email classification phase. In the feature extraction phase, the feature vectors of all the features described in Section 3.6 are extracted and saved in a file for easy access. In the second stage, a novel hyper-parameter search algorithm, developed in this research, is used to generate an exponentially growing sequence of paired C and gamma (γ) values.
FFA is then used to optimize the generated SVM hyper-parameters and to find the best hyper-parameter pair. Finally, in the third phase, SVM is used to carry out the classification. This new approach addresses the problem of hyper-parameter optimization in SVM and, in turn, improves the classification speed and accuracy of SVM. Using two publicly available email datasets, experiments are performed to evaluate the performance of the two proposed phishing email detection techniques. During the evaluation of each approach, a set of features well suited for phishing detection is extracted from the training dataset and used to construct the classifiers. Thereafter, the trained classifiers are evaluated on the test dataset. The evaluations produced very good results: the RF-based classifier yielded a classification accuracy of 99.70%, a false positive (FP) rate of 0.06% and a false negative (FN) rate of 2.50%, while the hybridized classifier (known as FFA_SVM) produced a classification accuracy of 99.99%, a FP rate of 0.01% and a FN rate of 0.00%.
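The firefly-driven search over paired (C, γ) values can be sketched as follows. This is a minimal illustration in log2 space: the SVM cross-validation error is replaced by a toy surrogate objective so the example stays self-contained, and all names and the exact update schedule are our own assumptions, not the thesis's algorithm:

```python
import math
import random

def surrogate_error(log2C, log2g):
    # Stand-in for SVM cross-validation error; minimum near C=2**3, gamma=2**-5.
    return (log2C - 3) ** 2 + (log2g + 5) ** 2

def firefly_search(obj, n=15, iters=60, alpha=0.3, beta0=1.0, absorb=0.1, seed=0):
    """Firefly algorithm over (log2 C, log2 gamma); brighter = lower error."""
    rng = random.Random(seed)
    # Exponentially growing grid: log2 C in [-5, 15], log2 gamma in [-15, 3].
    flies = [[rng.uniform(-5, 15), rng.uniform(-15, 3)] for _ in range(n)]
    for _ in range(iters):
        light = [-obj(*f) for f in flies]
        for i in range(n):
            for j in range(n):
                if light[j] > light[i]:           # move i toward brighter j
                    r2 = sum((a - b) ** 2 for a, b in zip(flies[i], flies[j]))
                    beta = beta0 * math.exp(-absorb * r2)
                    for d in range(2):
                        flies[i][d] += (beta * (flies[j][d] - flies[i][d])
                                        + alpha * (rng.random() - 0.5))
            light[i] = -obj(*flies[i])
        alpha *= 0.97                             # cool the random walk
    best = min(flies, key=lambda f: obj(*f))
    return 2 ** best[0], 2 ** best[1]             # back to (C, gamma)

C, g = firefly_search(surrogate_error)
```

In the real pipeline the surrogate would be replaced by the cross-validated error of an SVM trained with the candidate (C, γ) pair on the extracted phishing features.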
A SYSTEMATIC MAPPING STUDY OF DATA MINING FOR BIG DATA SCENARIOS
The volume of data produced has grown on a large scale in recent years. These data come from different sources and in diverse formats, characterizing the main dimensions of Big Data: large volume, high growth velocity and great variety of data. The biggest challenge is how to generate quality information to infer meaningful insights from such varied and large data. Data mining is the process of identifying valid, novel and potentially useful patterns in data. However, traditional information technology infrastructure is not able to meet the demands of this new scenario. The term now known as Big Data Mining refers to the extraction of information from large databases. A question to be answered is: how is the scientific community approaching the Big Data Mining process? It would be worthwhile to identify which tasks, methods and algorithms have been applied to extract knowledge in this context. This article aims to identify in the literature the research work already carried out in the context of Big Data Mining. We sought to identify the most frequently addressed areas, the types of problems handled, the tasks applied in knowledge extraction, the methods applied to carry out the tasks, the algorithms implementing the methods, the types of data being mined, and their sources and structure. A systematic mapping study was conducted, examining 78 primary studies. The results provide a panoramic understanding of the investigated area, revealing the main tasks, methods and algorithms applied in Big Data Mining.
Trusted execution: applications and verification
Useful security properties arise from sealing data to specific units of code. Modern processors featuring Intel's TXT and AMD's SVM achieve this through measured and trusted execution: only code with the correct measurement can access the data, and this code runs in an environment protected from observation and interference.
We discuss the history of attempts to provide security for hardware platforms, and review the literature in the field. We propose some applications which would benefit from use of trusted execution, and discuss functionality enabled by trusted execution. We present in more detail a novel variation on Diffie-Hellman key exchange which removes some reliance on random number generation.
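The abstract does not give the details of the proposed Diffie-Hellman variant; as background, the standard finite-field exchange it modifies can be sketched as follows (toy parameters and illustrative names, not a secure or faithful rendering of the thesis's protocol):

```python
# Textbook finite-field Diffie-Hellman key exchange, shown as the baseline
# that the abstract's variant (with reduced reliance on random number
# generation) builds on. Parameters are far too small for real use.
import random

p = 4294967291  # largest prime below 2**32; illustration only
g = 5           # generator (illustrative choice)

def keypair(rng):
    x = rng.randrange(2, p - 1)   # private exponent (the RNG dependence
    return x, pow(g, x, p)        # the variant aims to reduce)

rng = random.Random(42)
a_priv, a_pub = keypair(rng)      # Alice
b_priv, b_pub = keypair(rng)      # Bob

# Each side combines its private key with the other's public value;
# both arrive at g**(a_priv * b_priv) mod p.
shared_a = pow(b_pub, a_priv, p)
shared_b = pow(a_pub, b_priv, p)
```

Sealing one party's long-term secret inside a measured unit of code is one way trusted execution could interact with such an exchange, which is the kind of application the chapter explores.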
We present a modelling language with primitives for trusted execution, along with its semantics. We characterise an attacker who has access to all the capabilities of the hardware. In order to achieve automatic analysis of systems using trusted execution without attempting to search a potentially infinite state space, we define transformations that reduce the number of times the attacker needs to use trusted execution to a pre-determined bound. Given reasonable assumptions we prove the soundness of the transformation: no secrecy attacks are lost by applying it. We then describe using the StatVerif extensions to ProVerif to model the bounded invocations of trusted execution. We show the analysis of realistic systems, for which we provide case studies.
Proceedings of the 1st Symposium on Advances in Educational Technology - Volume 1: AET
Proceedings of the 1st Symposium on Advances in Educational Technology