Event-driven Hybrid Classifier Systems and Online Learning for Soccer Game Strategies
The field of robot soccer is a useful setting for the study of artificial intelligence and machine learning …
A tandem evolutionary algorithm for identifying causal rules from complex data
We propose a new evolutionary approach for discovering causal rules in complex classification problems from batch data. Key aspects include (a) the use of a hypergeometric probability mass function as a principled fitness statistic that quantifies the probability that the observed association between a given clause and target class is due to chance, taking into account the size of the dataset, the amount of missing data, and the distribution of outcome categories; (b) tandem age-layered evolutionary algorithms for evolving parsimonious archives of conjunctive clauses, and of disjunctions of these conjunctions, each of which has a probabilistically significant association with an outcome class; and (c) separate archive bins for clauses of different orders, with dynamically adjusted order-specific thresholds. The method is validated on majority-on and multiplexer benchmark problems exhibiting various combinations of heterogeneity, epistasis, overlap, noise in class associations, missing data, extraneous features, and imbalanced classes. We also validate it on a more realistic synthetic genome dataset with heterogeneity, epistasis, extraneous features, and noise. In all synthetic epistatic benchmarks, we consistently recover the true causal rule sets used to generate the data. Finally, we discuss an application to a complex real-world survey dataset designed to inform possible ecohealth interventions for Chagas disease.
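To make the fitness statistic in (a) concrete, here is a minimal Python sketch of a hypergeometric probability for a single conjunctive clause. It assumes records are dictionaries of categorical features, a clause is a mapping from feature names to required values, and records missing any feature the clause tests are simply excluded; the function names and this missing-data rule are illustrative assumptions, not the authors' implementation, which also involves age layering, archives, and order-specific thresholds. Among N usable records, K belong to the target class, n match the clause, and k of those matches are in the target class; the sketch returns P(X = k) = C(K, k) C(N - K, n - k) / C(N, n), where smaller values indicate an observed overlap that is less likely under chance alone.

```python
from math import comb
from typing import Dict, Optional, Sequence

Record = Dict[str, Optional[int]]  # None marks a missing value


def hypergeometric_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): drawing n of N records without replacement, K of which are target-class."""
    if k < 0 or k > min(K, n) or n - k > N - K:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def clause_fitness(clause: Dict[str, int], records: Sequence[Record],
                   labels: Sequence[int], target: int) -> float:
    """Hypothetical fitness: hypergeometric probability of the clause/target overlap.

    Records missing any feature the clause tests are excluded (an assumption).
    """
    N = K = n = k = 0
    for record, label in zip(records, labels):
        if any(record.get(f) is None for f in clause):
            continue  # skip records with missing data for this clause
        N += 1
        in_target = label == target
        K += in_target
        if all(record[f] == v for f, v in clause.items()):
            n += 1
            k += in_target
    return hypergeometric_pmf(k, N, K, n)


# Feature "a" equals 1 on exactly the five target-class records out of ten.
records = [{"a": 1}] * 5 + [{"a": 0}] * 5
labels = [1] * 5 + [0] * 5
print(clause_fitness({"a": 1}, records, labels, target=1))  # 1/252, about 0.00397
```

Because the probability is computed from counts rather than a test statistic, it reflects the dataset size, the number of usable (non-missing) records, and the class balance together, which is the property the abstract highlights.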
A New Evolutionary Algorithm For Mining Noisy, Epistatic, Geospatial Survey Data Associated With Chagas Disease
The scientific community is just beginning to understand some of the profound effects that feature interactions and heterogeneity have on natural systems. Despite the belief that these nonlinear and heterogeneous interactions exist across numerous real-world systems (e.g., from the development of personalized drug therapies to market predictions of consumer behaviors), the tools for analysis have not kept pace. This research was motivated by the desire to mine data from large socioeconomic surveys aimed at identifying the drivers of household infestation by a Triatomine insect that transmits the life-threatening Chagas disease. To decrease the risk of transmission, our colleagues at the Laboratory of Applied Entomology and Parasitology have implemented mitigation strategies (known as Ecohealth interventions); however, limited resources necessitate the search for better risk models. Mining these complex Chagas survey data for potential predictive features is challenging due to imbalanced class outcomes, missing data, heterogeneity, and the non-independence of some features.
We develop an evolutionary algorithm (EA) to identify feature interactions in Big Datasets with desired categorical outcomes (e.g., disease or infestation). The method is non-parametric and uses the hypergeometric PMF as a fitness function to tackle challenges associated with using p-values in Big Data (e.g., for a fixed effect size, p-values shrink as the dataset grows, so even weak associations appear significant). To demonstrate the EA's effectiveness, we first test the algorithm on three benchmark datasets: two classic Boolean classifier problems, (1) the majority-on problem and (2) the multiplexer problem, and (3) a simulated single nucleotide polymorphism (SNP) disease dataset. Next, we apply the EA to real-world Chagas disease survey data and successfully archive numerous high-order feature interactions associated with infestation that would not have been discovered using traditional statistics. These feature interactions are also explored using network analysis. The spatial autocorrelation of the genetic data (SNPs of Triatoma dimidiata) was captured using geostatistics. Specifically, a modified semivariogram analysis was performed to characterize the SNP data and help elucidate the movement of the vector within two villages. For both villages, the SNP information showed strong spatial autocorrelation, albeit with different geostatistical characteristics (sills, ranges, and nuggets). These metrics were leveraged to create risk maps suggesting that the more forested village had a sylvatic source of infestation, while the other village had a domestic/peridomestic source. This initial exploration into using Big Data to analyze disease risk shows that novel statistical tools, as well as modified versions of existing ones, can improve the assessment of risk at a fine scale.
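The abstract mentions a modified semivariogram analysis but does not define the modification. For orientation only, the sketch below computes the classical (Matheron) empirical semivariogram for a scalar attribute measured at 2-D locations: pairs of points are grouped into distance bins of width bin_width, and each non-empty bin reports the mean pair distance and the semivariance gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)), where N(h) is the number of pairs in the bin. The sills, ranges, and nuggets mentioned above are read from such a curve: the nugget is the semivariance as the lag approaches zero, the sill is the plateau the curve levels off at, and the range is the lag at which that plateau is reached. How the authors adapted this to multi-locus SNP data is not described here, so the function name and parameters are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def empirical_semivariogram(coords: Sequence[Point], values: Sequence[float],
                            bin_width: float, n_bins: int) -> List[Tuple[float, float, int]]:
    """Classical (Matheron) estimator: (mean lag, semivariance, pair count) per non-empty bin."""
    sq_sums = [0.0] * n_bins    # running sum of (z_i - z_j)^2 per bin
    dist_sums = [0.0] * n_bins  # running sum of pair distances per bin
    counts = [0] * n_bins
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            d = math.dist(coords[i], coords[j])
            b = int(d // bin_width)
            if b < n_bins:  # ignore pairs beyond the largest lag of interest
                sq_sums[b] += (values[i] - values[j]) ** 2
                dist_sums[b] += d
                counts[b] += 1
    return [(dist_sums[b] / counts[b], sq_sums[b] / (2 * counts[b]), counts[b])
            for b in range(n_bins) if counts[b]]


# Four collinear points with a linear trend: semivariance grows with lag.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
print(empirical_semivariogram(pts, [0.0, 1.0, 2.0, 3.0], bin_width=1.0, n_bins=4))
```

In practice, the bin width and maximum lag are chosen to match the sampling density, and a model curve (e.g., spherical or exponential) is then fitted to these points to estimate the nugget, sill, and range.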
Improving the Scalability of XCS-Based Learning Classifier Systems
Using evolutionary intelligence and machine learning techniques, a broad range of intelligent machines have been designed to perform different tasks. An intelligent machine learns by perceiving its environmental status and taking an action that maximizes its chances of success.
Human beings have the ability to apply knowledge learned from a smaller problem to more complex, large-scale problems of the same or a related domain, but currently the vast majority of evolutionary machine learning techniques lack this ability. Because they cannot apply knowledge already learned in a domain, these techniques consume more resources and time than necessary to solve complex, large-scale problems in that domain. As a problem increases in size, it becomes difficult, and sometimes impractical (if not impossible), to solve because of the resources and time needed. Therefore, to scale in a problem domain, a system is needed that can reuse the learned knowledge of the domain and/or encapsulate the underlying patterns in the domain.
To extract and reuse building blocks of knowledge, or to encapsulate the underlying patterns in a problem domain, a rich encoding is needed, but the search space could then expand undesirably and cause bloat, e.g., as in some forms of genetic programming (GP). Learning classifier systems (LCSs) are a well-structured, evolutionary-computation-based learning technique with pressures that implicitly avoid bloat, such as fitness sharing through niche-based reproduction.
The proposed thesis is that an LCS can scale to complex problems in a domain by reusing the learnt knowledge from simpler problems of the domain and/or encapsulating the underlying patterns in the domain. The proposed systems are implemented and tested using Wilson's XCS, a well-tested, online-learning, accuracy-based LCS model. To extract reusable building blocks of knowledge, GP-tree-like code fragments are introduced, which are more than simply another representation (e.g., ternary or real-valued alphabets). This thesis is extended to capture the underlying patterns in a problem using a cyclic representation. The newly developed scalable systems are tested on hard problems and compared with benchmark techniques.
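As an illustration of GP-tree-like code fragments used as classifier conditions, here is a minimal sketch of how such a condition could be evaluated on a binary input. It assumes, as a simplification, that a fragment is a small tree over the Boolean functions AND, OR, NAND, NOR, and NOT with input-bit indices as leaves, and that a condition matches an input when every fragment in it evaluates to true. The tuple encoding, the function set, and the names evaluate and condition_matches are illustrative assumptions, not the XCSCFC implementation.

```python
from typing import Sequence, Tuple, Union

# A fragment is either an input-bit index (leaf) or a tuple (op, child, ...).
Fragment = Union[int, Tuple]

OPS = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "NAND": lambda a, b: not (a and b),
    "NOR": lambda a, b: not (a or b),
    "NOT": lambda a: not a,
}


def evaluate(fragment: Fragment, bits: Sequence[int]) -> bool:
    """Recursively evaluate a code fragment on a binary input vector."""
    if isinstance(fragment, int):
        return bits[fragment] == 1
    op, *children = fragment
    return bool(OPS[op](*(evaluate(child, bits) for child in children)))


def condition_matches(condition: Sequence[Fragment], bits: Sequence[int]) -> bool:
    """Assumed matching rule: every fragment in the condition must evaluate to True."""
    return all(evaluate(fragment, bits) for fragment in condition)


# 6-bit multiplexer: bits 0-1 are the address, bits 2-5 the data.
# "Address is 00 and the selected data bit (index 2) is 1":
rule = [("NOR", 0, 1), 2]
print(condition_matches(rule, [0, 0, 1, 0, 0, 0]))  # True
print(condition_matches(rule, [0, 1, 1, 0, 0, 0]))  # False
```

Because each fragment is itself a tree, a fragment evolved on a smaller problem can be placed as a subtree inside a fragment for a larger one, which is the kind of reuse of building blocks the thesis describes.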
Specifically, this work develops four systems to improve the scalability of XCS-based classifier systems. (1) Building blocks of knowledge are extracted from smaller problems of a Boolean domain and reused, for the first time, in learning more complex, large-scale problems in the domain. By utilizing the learnt knowledge from small-scale problems, the developed XCSCFC (i.e., XCS with Code-Fragment Conditions) system readily solves problems of a scale that existing LCS and GP approaches cannot, e.g., the 135-bit MUX problem. (2) Introducing code fragments into classifier actions in XCSCFA (i.e., XCS with Code-Fragment Actions) enables the rich representation of GP, which, when coupled with the divide-and-conquer approach of LCS, successfully solves various complex, overlapping, and niche-imbalanced Boolean problems that are difficult to solve using numeric-action-based XCS. (3) The underlying patterns in a problem domain are encapsulated in classifier rules encoded by a cyclic representation. The developed XCSSMA system produces general solutions of any scale n for a number of important Boolean problems, e.g., parity problems, for the first time in the field of LCS. (4) Optimal solutions for various real-valued problems are evolved by extending the existing real-valued XCSR system with code-fragment actions to XCSRCFA. Exploiting the combined power of GP and LCS techniques, XCSRCFA successfully learns various continuous-action and function approximation problems that are difficult to learn using the base techniques.
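For readers unfamiliar with the Boolean benchmarks named above, the sketch below defines the multiplexer and parity functions. A multiplexer with k address bits has k + 2^k inputs, so k = 7 gives the 135-bit MUX (7 + 128); the address bits, read as a binary number with the first bit most significant (a convention assumed for this sketch), select which data bit is the output. Parity is written as a two-state loop over the input, which illustrates why a cyclic, state-based description can cover any input length n; it is only an analogy for the kind of general solution described, not XCSSMA's actual encoding.

```python
from typing import Sequence


def multiplexer(bits: Sequence[int], k: int) -> int:
    """k address bits select one of 2**k data bits; the input length must be k + 2**k."""
    if len(bits) != k + 2 ** k:
        raise ValueError("expected k + 2**k bits")
    address = 0
    for b in bits[:k]:
        address = address * 2 + b  # first address bit is most significant (assumed convention)
    return bits[k + address]


def parity(bits: Sequence[int]) -> int:
    """Two-state cycle: the state flips on every 1, so one description covers any length n."""
    state = 0
    for b in bits:
        if b == 1:
            state = 1 - state
    return state


print(7 + 2 ** 7)                             # 135 inputs for the 135-bit MUX
print(multiplexer([1, 0, 0, 0, 1, 0], k=2))   # address 10 selects data bit 2 -> 1
print(parity([1, 0, 1, 1]))                   # three 1s -> odd parity -> 1
```

The multiplexer's rule count grows with k, whereas the parity loop stays the same size for every n, which is why a representation that can express such cycles can yield scale-independent solutions.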
This research work has shown that LCSs can scale to complex, large-scale problems through reusing learnt knowledge. The messy nature, disassociation of message to condition order, masking, feature construction, and reuse of extracted knowledge add new abilities to the XCS family of LCSs. The ability to use rich encoding in antecedent GP-like code fragments or a consequent cyclic representation leads to the evolution of accurate, maximally general, and compact solutions when learning various complex Boolean as well as real-valued problems. By effectively exploiting the combined power of GP and LCS techniques, various continuous-action and function approximation problems are solved in a simple and straightforward manner.
The analysis of the evolved rules reveals, for the first time in XCS, that no matter how specific or general the initial classifiers are, all the optimal classifiers converge through the mechanism ‘be specific then generalize’ near the final stages of evolution. It also reveals that standard XCS does not use all available information or all available genetic operators to evolve optimal rules, whereas the developed code-fragment-action-based systems effectively use figure and ground information during the training process.
This work has created a platform to explore the reuse of learnt functionality, rather than only terminal knowledge as at present, which is needed to replicate human capabilities.