82 research outputs found
Adaptive rule-based malware detection employing learning classifier systems
Efficient and accurate malware detection is increasingly becoming a necessity for society to operate. Existing malware detection systems perform excellently at identifying known malware for which signatures are available, but poorly at detecting zero-day exploits, for which signatures have not yet been made available, or targeted attacks against a specific entity. The primary goal of this thesis is to provide evidence for the potential of learning classifier systems to improve the accuracy of malware detection.
A customized system based on a state-of-the-art learning classifier system is presented for adaptive rule-based malware detection. It combines a rule-based expert system with evolutionary-algorithm-based reinforcement learning, thus creating a self-training adaptive malware detection system that dynamically evolves detection rules.
This system is analyzed on a benchmark of malicious and non-malicious files. Experimental results show that the system can outperform C4.5, a well-known non-adaptive machine learning algorithm, under certain conditions. The results demonstrate the system's ability to learn effective rules from repeated presentations of a tagged training set and show the degree of generalization achieved on an independent test set.
This thesis is an extension and expansion of the work published in the Security, Trust, and Privacy for Software Applications workshop in COMPSAC 2011 - the 35th Annual IEEE Signature Conference on Computer Software and Applications --Abstract, page iii
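As an illustration of the rule-based, reinforcement-driven detection scheme described above, here is a minimal sketch of an LCS-style classifier. The ternary condition encoding, the fitness-weighted voting and the learning rate are illustrative assumptions, not the thesis's actual system:

```python
# Minimal LCS-style sketch (illustrative, not the thesis's system):
# rules are (condition, label, fitness); conditions use '0'/'1'/'#' (wildcard).

def matches(condition, features):
    """A rule matches when every non-wildcard bit equals the feature bit."""
    return all(c == '#' or c == f for c, f in zip(condition, features))

def classify(rules, features):
    """Vote among matching rules, weighting each vote by rule fitness."""
    votes = {}
    for condition, label, fitness in rules:
        if matches(condition, features):
            votes[label] = votes.get(label, 0.0) + fitness
    return max(votes, key=votes.get) if votes else None

def reinforce(rules, features, true_label, lr=0.2):
    """Reinforcement step: matching rules move toward reward (1) or penalty (0)."""
    return [
        (cond, label,
         fit + lr * ((1.0 if label == true_label else 0.0) - fit)
         if matches(cond, features) else fit)
        for cond, label, fit in rules
    ]

rules = [('1#0', 'malware', 0.5), ('0##', 'benign', 0.5)]
rules = reinforce(rules, '110', 'malware')
print(classify(rules, '110'))  # prints 'malware'
```

A full LCS would also evolve the rule conditions themselves with a genetic algorithm, which is omitted here for brevity.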
MILCS: A mutual information learning classifier system
This paper introduces a new variety of learning classifier system (LCS), called MILCS, which utilizes mutual information as fitness feedback. Unlike most LCSs, MILCS is specifically designed for supervised learning. MILCS's design draws on an analogy to the structural learning approach of cascade correlation networks. We present preliminary results, and contrast them to results from XCS. We discuss the explanatory power of the resulting rule sets, and introduce a new technique for visualizing explanatory power. Final comments include future directions for this research, including investigations in neural networks and other systems. Copyright 2007 ACM
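The core idea of using mutual information as fitness feedback can be sketched as follows; the empirical estimator below is a generic illustration, and the paper's exact fitness definition may differ:

```python
from math import log2
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) = sum over (x,y) of p(x,y) * log2(p(x,y) / (p(x)p(y))),
    estimated from empirical counts."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

labels      = [0, 0, 1, 1]
informative = [0, 0, 1, 1]   # predictions identical to the labels
random_ish  = [0, 1, 0, 1]   # predictions independent of the labels

print(mutual_information(informative, labels))  # 1.0 bit: maximally informative
print(mutual_information(random_ish, labels))   # 0.0 bits: no information
```

A rule set whose predictions carry more bits about the class labels would receive higher fitness under such a scheme.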
The XMM Cluster Survey: a new cluster candidate sample and detailed selection function
In this thesis we present the XCS DR3 cluster candidate list. This represents the first major update of the XMM Cluster Survey since 2005. The candidate list comprises 1365 entries with more than 300 detected counts distributed over 229 deg². We note that a larger area (523 deg²) is available for the study of X-ray point sources and that the new XCS point source sample has more than 130,000 entries. After redshift follow-up and X-ray spectral analysis, these 1365 clusters will comprise the largest homogeneous sample of medium to high redshift X-ray clusters ever compiled. The future science applications of the XCS DR3 clusters include the study of the evolution of X-ray scaling relations and a measurement of cosmological parameters. In support of these science applications, we also present in this thesis detailed selection functions for the XCS. These selection functions allow us to quantify the number of clusters we did not detect in our survey regions. We have taken two approaches to the determination of the selection function: the use of simple (circular and isothermal) β models and the use of ‘observations’ of synthetic clusters from the CLEF N-body simulation. The β model work has allowed us to explore how the selection function depends on key cluster parameters such as luminosity, temperature, redshift, core size and profile shape. We have further explored how the selection function depends on the underlying cosmological model and applied our results to XCS cosmology forecasting (Sahlen et al. 2009). The CLEF work has allowed us to explore more complex cluster properties, such as core temperature, core shape, substructure and ellipticity. In summary, the combination of the cluster catalogues and selection functions presented herein will facilitate field-leading science applications for many years to come.
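The circular isothermal β model mentioned above has a standard surface-brightness form, sketched here; the normalisation, core radius and β value below are illustrative choices, not XCS parameters:

```python
# Circular isothermal beta-model surface-brightness profile (standard form).
# S0 = 1, r_core = 150 (same units as r) and beta = 2/3 are illustrative.

def beta_model(r, s0=1.0, r_core=150.0, beta=2.0 / 3.0):
    """S(r) = S0 * [1 + (r/r_core)^2] ** (0.5 - 3*beta)."""
    return s0 * (1.0 + (r / r_core) ** 2) ** (0.5 - 3.0 * beta)

# With beta = 2/3 the exponent is -1.5, so at r = r_core the surface
# brightness falls to 2**-1.5 ~ 0.354 of its central value.
print(beta_model(0.0))    # 1.0
print(beta_model(150.0))  # ~0.354
```

A selection-function calculation of this kind would insert such profiles into survey images and record which simulated clusters the detection pipeline recovers.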
Controlled self-organisation using learning classifier systems
As the complexity of technical systems increases, breakdowns occur more often. The mission of organic computing is to tame these challenges by providing degrees of freedom for self-organised behaviour. To achieve these goals, new methods have to be developed. The proposed observer/controller architecture constitutes one way to achieve controlled self-organisation. To improve its design, multi-agent scenarios are investigated; in particular, learning with learning classifier systems is addressed.
Model-free reconstruction of neuronal network connectivity from calcium imaging signals
A systematic assessment of global neural network connectivity through direct
electrophysiological assays has remained technically unfeasible even in
dissociated neuronal cultures. We introduce an improved algorithmic approach
based on Transfer Entropy to reconstruct approximations to network structural
connectivities from network activity monitored through calcium fluorescence
imaging. Based on information theory, our method requires no prior assumptions
on the statistics of neuronal firing and neuronal connections. The performance
of our algorithm is benchmarked on surrogate time-series of calcium
fluorescence generated by the simulated dynamics of a network with known
ground-truth topology. We find that the effective network topology revealed by
Transfer Entropy depends qualitatively on the time-dependent dynamic state of
the network (e.g., bursting or non-bursting). We thus demonstrate how
conditioning with respect to the global mean activity improves the performance
of our method. [...] Compared to other reconstruction strategies such as
cross-correlation or Granger Causality methods, our method based on improved
Transfer Entropy is remarkably more accurate. In particular, it provides a good reconstruction of the network clustering coefficient, allowing us to discriminate between weakly and strongly clustered topologies, whereas an approach based on cross-correlations would invariably detect artificially high levels of clustering. Finally, we apply our method to real recordings of in vitro cortical cultures. We demonstrate that these networks are characterized by an elevated (although not extreme) level of clustering compared to a random graph and by a markedly non-local connectivity.
Comment: 54 pages, 8 figures (+9 supplementary figures), 1 table; submitted for publication
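The transfer-entropy quantity underlying the method can be sketched for discretised time series as follows; this minimal estimator (one time bin of history, no conditioning on global mean activity, no significance testing) is a simplification of any real calcium-imaging pipeline:

```python
from collections import Counter
from math import log2

def transfer_entropy(source, target):
    """TE(source -> target) = sum over (x1, x0, y0) of
    p(x1, x0, y0) * log2( p(x1 | x0, y0) / p(x1 | x0) ),
    where x1 = target[t+1], x0 = target[t], y0 = source[t]."""
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    c_xyz = Counter(triples)
    c_yz = Counter((x0, y0) for _, x0, y0 in triples)
    c_xz = Counter((x1, x0) for x1, x0, _ in triples)
    c_z = Counter(x0 for _, x0, _ in triples)
    te = 0.0
    for (x1, x0, y0), c in c_xyz.items():
        p_cond_full = c / c_yz[(x0, y0)]        # p(x1 | x0, y0)
        p_cond_hist = c_xz[(x1, x0)] / c_z[x0]  # p(x1 | x0)
        te += (c / n) * log2(p_cond_full / p_cond_hist)
    return te

# target copies source with a one-step lag, so the drive is asymmetric:
source = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
target = [0] + source[:-1]
print(transfer_entropy(source, target) > transfer_entropy(target, source))  # True
```

Asymmetry of this kind is what lets transfer entropy suggest directed connectivity where symmetric measures such as cross-correlation cannot.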
Contributions to comprehensible classification
xxx, 240 p. The doctoral thesis described in this report has contributed to the improvement of two types of comprehensible classification algorithms: consolidated decision-tree algorithms and PART-style rule-induction algorithms.
Regarding the contributions to the consolidation of decision-tree algorithms, a new resampling strategy has been proposed that adjusts the number of subsamples so that the class distribution of the subsamples can be changed without losing information. Using this strategy, the consolidated version of C4.5 (CTC) obtains better results than a broad set of comprehensible algorithms based on genetic and classical algorithms. Three further algorithms have been consolidated: a variant of CHAID (CHAID*) and the Probability Estimation Tree versions of C4.5 and CHAID* (C4.4 and CHAIC). All the consolidated algorithms obtain better results than their base decision-tree algorithms, with three consolidated algorithms ranking among the best four in a comparison. Finally, the effect of pruning on simple and consolidated decision-tree algorithms has been analyzed, and it is concluded that the pruning strategy proposed in this thesis obtains the best results.
Regarding the contributions to PART-style rule-induction algorithms, a first proposal changes several aspects of how PART builds partial trees and extracts rules from them, which yields classifiers with better generalization ability and lower structural complexity than those generated by PART. A second proposal uses fully developed trees, instead of partially developed ones, and generates rule sets that achieve even better classification results and lower structural complexity. These two new proposals and the original PART algorithm have been complemented with CHAID*-based variants in order to see whether these benefits carry over to other decision-tree algorithms, and it has indeed been observed that the CHAID*-based PART-style algorithms also create simpler classifiers with better classification ability than CHAID
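C4.5, the base algorithm of the consolidated CTC variant discussed above, chooses splits by gain ratio. A minimal sketch of that criterion; the toy data are illustrative:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """C4.5's split criterion: information gain of partitioning `labels`
    by `values`, normalised by the split's own entropy ('split info')."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(values):
        subset = [l for x, l in zip(values, labels) if x == v]
        gain -= (len(subset) / n) * entropy(subset)
    split_info = entropy(values)
    return gain / split_info if split_info else 0.0

labels = ['yes', 'yes', 'no', 'no']
print(gain_ratio(['a', 'a', 'b', 'b'], labels))  # 1.0: perfectly separating split
print(gain_ratio(['a', 'b', 'a', 'b'], labels))  # 0.0: uninformative split
```

The split-info normalisation penalises attributes with many values, which plain information gain would otherwise favour.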
Self-similar scaling and evolution in the galaxy cluster X-ray Luminosity-Temperature relation
We investigate the form and evolution of the X-ray luminosity-temperature
(LT) relation of a sample of 114 galaxy clusters observed with Chandra at
0.1<z<1.3. The clusters were divided into subsamples based on their X-ray
morphology or whether they host strong cool cores. We find that when the core
regions are excluded, the most relaxed clusters (or those with the strongest
cool cores) follow an LT relation with a slope that agrees well with simple
self-similar expectations. This is supported by an analysis of the gas density
profiles of the systems, which shows self-similar behaviour of the gas profiles
of the relaxed clusters outside the core regions. By comparing our data with
clusters in the REXCESS sample, which extends to lower masses, we find evidence
that the self-similar behaviour of even the most relaxed clusters breaks at
around 3.5 keV. By contrast, the LT slopes of the subsamples of unrelaxed
systems (or those without strong cool cores) are significantly steeper than the
self-similar model, with lower mass systems appearing less luminous and higher
mass systems appearing more luminous than the self-similar relation. We argue
that these results are consistent with a model of non-gravitational energy
input in clusters that combines central heating with entropy enhancements from
merger shocks. Such enhancements could extend the impact of central energy
input to larger radii in unrelaxed clusters, as suggested by our data. We also
examine the evolution of the LT relation, and find that while the data appear
inconsistent with simple self-similar evolution, the differences can be
plausibly explained by selection bias, and thus we find no reason to rule out
self-similar evolution. We show that the fraction of cool core clusters in our
(non-representative) sample decreases at z>0.5 and discuss the effect of this
on measurements of the evolution in the LT relation.
Comment: 21 pages, 15 figures. Submitted to MNRAS. Comments welcome
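Slopes of the LT relation are measured by power-law fits in log-log space, against which the simple self-similar expectation (a bolometric slope of 2) is compared. A sketch with made-up data, not cluster measurements:

```python
import math

def powerlaw_slope(temps, lums):
    """Least-squares slope of log10(L) against log10(T),
    i.e. the exponent b in L proportional to T**b."""
    xs = [math.log10(t) for t in temps]
    ys = [math.log10(l) for l in lums]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

temps = [2.0, 4.0, 6.0, 8.0]        # keV (illustrative values)
lums = [t ** 2 for t in temps]      # exactly self-similar: L proportional to T^2
print(powerlaw_slope(temps, lums))  # 2.0
```

A fitted slope significantly steeper than 2 for a subsample would then signal the kind of departure from self-similarity discussed above.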
MINES: Mutual Information Neuro-Evolutionary System
The mutual information neuro-evolutionary system (MINES) presents a novel self-governing approach to determining the optimal number of hidden nodes and the connectivity of the hidden layer of a three-layer feed-forward neural network, founded on theoretical and practical bases. The system is a combination of a feed-forward neural network, the back-propagation algorithm, a genetic algorithm, mutual information and clustering. Back-propagation is used for parameter learning to reduce the system's error, while mutual information aids back-propagation in following an effective path in the weight space. A genetic algorithm changes the incoming synaptic connections of the hidden nodes, based on the fitness provided by the mutual information from the error space to the hidden layer, to perform structural learning. Mutual information determines the appropriate synapses connecting the hidden nodes to the input layer; in effect, it also links the back-propagation to the genetic algorithm. Weight clustering is applied to merge hidden nodes with similar functionality, i.e. those possessing the same connectivity patterns and a close Euclidean angle in the weight space. Finally, the performance of the system is assessed on two theoretical problems and one empirical problem. A nonlinear polynomial regression problem and the well-known two-spiral classification task are used to evaluate the theoretical performance of the system, and forecasting daily crude oil prices is used to observe the performance of MINES in a real-world application
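The weight-clustering criterion described above (same connectivity pattern, close Euclidean angle between incoming weight vectors) can be sketched as follows; the 10-degree threshold is an illustrative choice, not the paper's value:

```python
import math

def angle_deg(w1, w2):
    """Euclidean angle between two weight vectors, in degrees."""
    dot = sum(a * b for a, b in zip(w1, w2))
    norm = (math.sqrt(sum(a * a for a in w1))
            * math.sqrt(sum(b * b for b in w2)))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def redundant(w1, w2, threshold_deg=10.0):
    """Two hidden nodes are merge candidates when they share a connectivity
    pattern (same non-zero synapses) and a small angle in weight space."""
    same_pattern = [a != 0 for a in w1] == [b != 0 for b in w2]
    return same_pattern and angle_deg(w1, w2) < threshold_deg

print(redundant([1.0, 2.0, 0.0], [1.1, 2.1, 0.0]))   # near-parallel: True
print(redundant([1.0, 2.0, 0.0], [2.0, -1.0, 0.0]))  # orthogonal: False
```

Merging such near-duplicate nodes shrinks the hidden layer without materially changing the function the network computes.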