A multiple expression alignment framework for genetic programming
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics
Alignment in the error space is a recent idea to exploit semantic awareness in genetic programming. In a previous contribution, the concepts of optimally aligned and optimally coplanar individuals were introduced, and it was shown that, given optimally aligned or optimally coplanar individuals, it is possible to construct a globally optimal solution analytically. Consequently, genetic programming methods aimed at searching for optimally aligned or optimally coplanar individuals were introduced. This paper critically discusses those methods, analyzes their major limitations and introduces a new genetic programming system aimed at overcoming them. The experimental results, obtained on five real-life symbolic regression problems, show that the proposed algorithms outperform not only the existing methods based on alignment in the error space, but also geometric semantic genetic programming and standard genetic programming.
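To make optimal alignment concrete: let s1 and s2 be the semantics (output vectors on the training cases) of two individuals and t the target vector. If their error vectors e1 = s1 - t and e2 = s2 - t are collinear, e1 = k·e2 with k ≠ 1, then (s1 - k·s2) / (1 - k) = t, so the target is reached analytically. The Python sketch below checks this condition numerically and builds the combination; the function names, the least-squares estimate of k and the tolerance are illustrative assumptions, not the dissertation's code.

```python
import numpy as np


def alignment_factor(s1, s2, target, atol=1e-9):
    """Return k if the error vectors are optimally aligned (e1 = k * e2, k != 1), else None.

    s1, s2 are the semantics (outputs on the training cases) of two individuals;
    target is the vector of expected outputs.
    """
    e1 = np.asarray(s1, dtype=float) - target
    e2 = np.asarray(s2, dtype=float) - target
    denom = float(e2 @ e2)
    if denom == 0.0:
        return None  # s2 already equals the target; nothing to combine
    k = float(e1 @ e2) / denom  # least-squares estimate of the scale factor
    # k == 1 would make (1 - k) zero, so parallel-and-equal errors are rejected.
    if np.allclose(e1, k * e2, atol=atol) and not np.isclose(k, 1.0):
        return k
    return None


def combine_aligned(s1, s2, k):
    """Semantics of P = (P1 - k * P2) / (1 - k); equals the target when e1 = k * e2."""
    return (np.asarray(s1, dtype=float) - k * np.asarray(s2, dtype=float)) / (1.0 - k)


# Usage with made-up data: the error of s1 is exactly three times the error of s2.
target = np.array([1.0, 2.0, 3.0])
e = np.array([0.5, -1.0, 2.0])
s1, s2 = target + 3.0 * e, target + e
k = alignment_factor(s1, s2, target)
print(k, combine_aligned(s1, s2, k))
```

In a GP system the search would look for pairs of programs whose error vectors satisfy this test; the combination step itself is purely algebraic.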
Mining Explicit and Implicit Relationships in Data Using Symbolic Regression
Identification of implicit and explicit relations within observed data is a generic problem commonly encountered in several domains including science, engineering, finance, and more. It forms the core component of data analytics, a process of discovering useful information from data sets that are potentially huge and otherwise incomprehensible. In industry, such information is often instrumental for profitable decision making, whereas in science and engineering it is used to build empirical models, propose new theories or verify existing ones, and explain natural phenomena. In recent times, digital and internet-based technologies have proliferated, making it viable to generate and collect large amounts of data at low cost. This in turn has resulted in an ever-growing need for methods to analyse and draw interpretations from such data quickly and reliably. With this overarching goal, this thesis attempts to contribute towards developing accurate and efficient methods for discovering such relations through evolutionary search, a method commonly referred to as Symbolic Regression (SR).
A data set of input variables x and a corresponding observed response y is given. The aim is to find an explicit function y = f(x) or an implicit function f(x, y) = 0 that represents the data set. While seemingly simple, the problem is challenging for several reasons. Some conventional regression methods try to “guess” a functional form, such as linear, quadratic or polynomial, and fit that equation to the data, which may prevent the discovery of more complex relations if they exist. On the other hand, meta-modelling techniques such as the response surface method and Kriging model the given data accurately but provide a “black-box” predictor instead of an expression. Such approximations convey little or no insight into how the variables and responses depend on each other, or into their relative contributions to the output. SR attempts to bridge these two extremes by evolving mathematical expressions instead of assuming them. It is thus flexible enough to represent the data while providing useful insights instead of a black-box predictor. SR can be categorized as part of Explainable Artificial Intelligence and can contribute to Trustworthy Artificial Intelligence.
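The difference between the explicit and implicit formulations shows up in how a candidate expression is scored. The minimal Python sketch below assumes a hypothetical tuple encoding for expression trees and scores an explicit model by RMSE and an implicit model by the mean magnitude of f(x, y); neither the encoding nor the fitness functions are taken from the thesis.

```python
import math
import operator

# Hypothetical minimal encoding: a node is a variable name (str), a numeric
# constant, or a tuple (op, left, right) with op in OPS.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(node, env):
    """Recursively evaluate an expression tree for one data point."""
    if isinstance(node, str):
        return env[node]
    if isinstance(node, (int, float)):
        return node
    op, left, right = node
    return OPS[op](evaluate(left, env), evaluate(right, env))


def rmse_explicit(expr, xs, ys):
    """Fitness of a candidate explicit model y = f(x): root-mean-square error."""
    sq = [(evaluate(expr, {"x": x}) - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(sq) / len(sq))


def residual_implicit(expr, xs, ys):
    """Mean |f(x, y)| for a candidate implicit model f(x, y) = 0.

    A real implicit SR system must also reject trivial solutions such as
    f = 0 or f = y - y; this sketch does not.
    """
    res = [abs(evaluate(expr, {"x": x, "y": y})) for x, y in zip(xs, ys)]
    return sum(res) / len(res)


# Made-up data generated from y = x^2 + 1.
xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 5.0, 10.0]
explicit = ("+", ("*", "x", "x"), 1.0)  # y = x*x + 1
implicit = ("-", "y", explicit)         # y - (x*x + 1) = 0
print(rmse_explicit(explicit, xs, ys), residual_implicit(implicit, xs, ys))
```

The caveat in `residual_implicit` is the main reason implicit SR is harder: a zero residual alone does not prove that a meaningful relation was found.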
The work proposed in this thesis aims to integrate the concept of “semantics” more deeply into Genetic Programming (GP) and Evolutionary Feature Synthesis, the two algorithms usually employed for conducting SR. Semantics is integrated into well-known components of these algorithms, such as compactness, diversity, recombination and constant optimization. The main contribution of this thesis is the proposal of two novel operators, based on Linear Programming and Mixed Integer Programming, that generate expressions while controlling their length without compromising accuracy. In the experiments, these operators are shown to discover expressions with better accuracy and interpretability on many explicit and implicit benchmarks. Moreover, applications of SR to real-world data sets demonstrate the practicality of the proposed approaches. Finally, with regard to practical problems, the thesis also presents how GP can be applied to solve Resource Constrained Scheduling Problems effectively.
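One common way to pose "accurate but short" as a mixed-integer program is cardinality-constrained least squares: binary variables decide which candidate subexpressions enter the model, a constraint caps how many may be used, and continuous coefficients are fitted. Whether the thesis's operators use exactly this formulation is an assumption. The Python sketch below solves the same problem by exhaustive enumeration, which is a stand-in for a MIP solver and only practical for a handful of candidates; the candidate set and names are made up.

```python
import itertools

import numpy as np


def best_sparse_model(candidates, y, max_terms):
    """Pick at most `max_terms` candidate subexpressions (columns of `candidates`)
    and their least-squares coefficients, minimising the sum of squared errors.

    Exhaustive enumeration stands in for a MIP solver. Smaller models are tried
    first and only replaced by a strictly better fit, so ties favour fewer terms.
    """
    best_cols, best_coef, best_sse = (), None, np.inf
    for r in range(1, max_terms + 1):
        for cols in itertools.combinations(range(candidates.shape[1]), r):
            X = candidates[:, list(cols)]
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            sse = float(np.sum((X @ coef - y) ** 2))
            if sse < best_sse - 1e-12:
                best_cols, best_coef, best_sse = cols, coef, sse
    return best_cols, best_coef, best_sse


# Hypothetical candidate subexpressions evaluated on made-up data y = 3x^2 + 1.
x = np.linspace(-2.0, 2.0, 21)
names = ["x", "x^2", "sin(x)", "exp(x)", "1"]
candidates = np.column_stack([x, x**2, np.sin(x), np.exp(x), np.ones_like(x)])
y = 3.0 * x**2 + 1.0
cols, coef, sse = best_sparse_model(candidates, y, max_terms=2)
print([names[c] for c in cols], np.round(coef, 6), sse)
```

The `max_terms` cap plays the role of the length control: raising it can only lower the error, so the trade-off between size and accuracy is explicit.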
CAD interface and framework for curve optimisation applications
Computer Aided Design is currently expanding its boundaries to include more design features in its processes. Design is identified as an iterative process converging to solutions that satisfy a set of constraints. Its close relation with optimisation indicates strong potential for integrating optimisation and CAD. The problem addressed in this thesis lies in interfacing the geometric representation of a design with its other, non-geometric aspects. Free-form curve modelling is taken as an example to investigate such relationships. Optimisation is assumed to be powered by Evolutionary Computing algorithms such as Genetic Algorithms (GA). The geometric definition of curves is commonly supported by NURBS, whose construction constraints are defined locally at the data points. Here the NURBS formulation is used with GA in an attempt to provide complementary handles on the curve’s shape beyond the usual data-point coordinates and control-point weights. Differential properties are used for optimising NURBS: Hermite interpolation allows higher-order constraints (tangent, normal, binormal) to be defined at data points. The assignment of parameter values at the data points, known as parameterisation, also provides control of the curve’s shape. Curve optimisation is also performed at the geometric modelling level. Classical theorems established by Frénet, and further developed by other mathematicians, provide means of defining a curve’s shape through its intrinsic equations. Such a representation is possible using the Function Representation (F-rep) algebra available in the ACIS software. F-rep allows more generic and exact means of interfacing with the curve’s geometry, and new functionality for curve inspection and optimisation is proposed in this thesis. The integration of optimisation findings and CAD is documented in the definition of a framework.
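The role of control-point weights as shape handles can be seen in a textbook NURBS evaluation: C(u) = Σ N_i,p(u) w_i P_i / Σ N_i,p(u) w_i, with the B-spline basis N_i,p computed by the Cox-de Boor recursion. The Python sketch below is a generic illustration, not the ACIS API or the “Oli interface”; a GA would treat quantities such as `weights` (or the knot values that encode the parameterisation) as genes.

```python
import math


def basis(i, p, u, knots):
    """Cox-de Boor recursion for the B-spline basis function N_{i,p}(u)."""
    if p == 0:
        if knots[i] <= u < knots[i + 1]:
            return 1.0
        # Close the last non-empty span so that u == knots[-1] is covered.
        if u == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    left = right = 0.0
    if knots[i + p] > knots[i]:  # skip terms with a zero-length knot span (0/0 := 0)
        left = (u - knots[i]) / (knots[i + p] - knots[i]) * basis(i, p - 1, u, knots)
    if knots[i + p + 1] > knots[i + 1]:
        right = (knots[i + p + 1] - u) / (knots[i + p + 1] - knots[i + 1]) * basis(i + 1, p - 1, u, knots)
    return left + right


def nurbs_point(u, ctrl, weights, knots, p):
    """Evaluate a 2-D NURBS curve: weighted basis sum divided by the sum of weighted bases."""
    num_x = num_y = den = 0.0
    for i, ((px, py), w) in enumerate(zip(ctrl, weights)):
        b = basis(i, p, u, knots) * w
        num_x += b * px
        num_y += b * py
        den += b
    return num_x / den, num_y / den


# A quarter of the unit circle as a quadratic NURBS: the middle weight is cos(45 deg).
ctrl = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
weights = [1.0, math.sqrt(0.5), 1.0]
knots = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
for u in (0.0, 0.25, 0.5, 0.75, 1.0):
    x, y = nurbs_point(u, ctrl, weights, knots, p=2)
    print(f"u={u:.2f}  ({x:.4f}, {y:.4f})  radius={math.hypot(x, y):.6f}")
```

Changing only the middle weight, while keeping the control points fixed, pulls the curve towards or away from (1, 1); that is the kind of non-positional handle an optimiser can exploit.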
The proposed framework architecture reconstructs a new CAD environment from separate elements bolted together through a generic Application Programming Interface (API) named “Oli interface”. The functionality created to interface optimisation and CAD yields a list of requirements for the work that both sides should undertake to achieve design optimisation in the CAD environment.
EThOS - Electronic Theses Online Service, United Kingdom
Automatic classification of digital communication signal modulations
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University
Automatic modulation classification detects the modulation type of received communication signals. It has important applications in military scenarios to facilitate jamming, intelligence, surveillance, and threat analysis. Renewed interest from civilian applications has been fuelled by the development of intelligent communication systems such as cognitive radio and software-defined radio. More specifically, it is complementary to adaptive modulation and coding, where a modulation is selected from a set of candidates according to the channel condition and system specification to improve spectrum efficiency and link reliability. In this research, we started by improving some existing methods to achieve higher classification accuracy at lower complexity. Machine learning techniques such as k-nearest neighbour and support vector machine have been adopted for simplified decision making using known features. Logistic regression, genetic algorithm and genetic programming have been incorporated to improve classification performance through feature selection and combination. We have also developed a new distribution-test-based classifier, tailored for modulation classification and inspired by the Kolmogorov-Smirnov test. The proposed classifier is shown to have improved accuracy and robustness over the standard distribution test. For blind classification in imperfect channels, we developed a combination of a minimum distance centroid estimator and a non-parametric likelihood function for blind modulation classification without prior knowledge of the channel noise. The centroid estimator provides joint estimation of channel gain and carrier phase offset, both of which can be compensated in the subsequent non-parametric likelihood function. The non-parametric likelihood function, meanwhile, provides likelihood evaluation without a specifically assumed noise model. The combination has been shown to have higher robustness when different noise types are considered. To bring modulation classification techniques into a more timely setting, we also developed the principle for blind classification in MIMO systems. The classification is achieved through expectation maximization channel estimation and likelihood-based classification. Early results have shown bright prospects for the method, while more work is needed to further optimize it and to provide a more thorough validation.
School of Engineering and Design Brunel University London, the Faculty of Engineering University of Liverpool, and the University of Liverpool Graduate Association (Hong Kong)
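For orientation, a distribution-test classifier compares the empirical distribution of a received-signal feature with reference distributions for each candidate modulation and picks the closest. The Python sketch below is a plain two-sample Kolmogorov-Smirnov baseline, assumed to resemble the "standard distribution test" the abstract compares against; it is not the thesis's proposed classifier. It assumes unit-power BPSK/QPSK/16-QAM symbols, an AWGN-only channel at a known 15 dB SNR, and the in-phase component as the only feature.

```python
import numpy as np


def simulate(mod, n, snr_db, rng):
    """Unit-power baseband symbols for one modulation plus complex AWGN."""
    if mod == "BPSK":
        s = rng.choice([-1.0, 1.0], n).astype(complex)
    elif mod == "QPSK":
        s = (rng.choice([-1.0, 1.0], n) + 1j * rng.choice([-1.0, 1.0], n)) / np.sqrt(2)
    elif mod == "16QAM":
        levels = [-3.0, -1.0, 1.0, 3.0]
        s = (rng.choice(levels, n) + 1j * rng.choice(levels, n)) / np.sqrt(10)
    else:
        raise ValueError(f"unknown modulation {mod!r}")
    sigma = np.sqrt(10 ** (-snr_db / 10) / 2)  # noise std per real dimension
    return s + sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n))


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def classify(received, references):
    """Pick the modulation whose reference in-phase distribution is closest in KS distance."""
    scores = {mod: ks_statistic(received.real, ref.real) for mod, ref in references.items()}
    return min(scores, key=scores.get), scores


rng = np.random.default_rng(0)
mods = ["BPSK", "QPSK", "16QAM"]
references = {m: simulate(m, 20000, 15, rng) for m in mods}
test_rng = np.random.default_rng(1)
for m in mods:
    predicted, _ = classify(simulate(m, 2000, 15, test_rng), references)
    print(m, "->", predicted)
```

The baseline relies on the references matching the channel; a phase offset or an unknown noise model would distort the in-phase distribution, which is the situation the centroid estimator and non-parametric likelihood described above are meant to handle.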
Enabling Machine Science through Distributed Human Computing
Distributed human computing techniques have been shown to be effective ways of accessing the problem-solving capabilities of a large group of anonymous individuals over the World Wide Web. They have been successfully applied to such diverse domains as computer security, biology and astronomy. The success of distributed human computing in various domains suggests that it can be utilized for complex collaborative problem solving. Thus, it could be used for machine science: utilizing machines to facilitate the vetting of disparate human hypotheses for solving scientific and engineering problems.
In this thesis, we show that machine science is possible through distributed human computing methods for some tasks. By enabling anonymous individuals to collaborate in a way that parallels the scientific method (suggesting hypotheses, testing them and then communicating them for vetting by other participants), we demonstrate that a crowd can together define robot control strategies, design robot morphologies capable of fast forward locomotion and contribute features to machine learning models of residential electric energy usage. We also introduce a new methodology for empowering a fully automated robot design system by seeding it with intuitions distilled from the crowd.
Our findings suggest that increasingly large, diverse and complex collaborations that combine people and machines in the right way may enable problem solving in a wide range of fields.
Front Matter - Soft Computing for Data Mining Applications
Efficient tools and algorithms for knowledge discovery in large data sets have been devised in recent years. These methods exploit the ability of computers to search huge amounts of data quickly and effectively. However, the data to be analyzed are imprecise and afflicted with uncertainty. In the case of heterogeneous data sources such as text, audio and video, the data may moreover be ambiguous and partly conflicting. Besides, patterns and relationships of interest are usually vague and approximate. Thus, to make the information mining process more robust, or more human-like, methods for searching and learning require tolerance towards imprecision, uncertainty and exceptions: they need approximate reasoning capabilities and must be able to handle partial truth. Properties of this kind are typical of soft computing. Soft computing techniques like Genetic
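"Handling partial truth" can be made concrete with fuzzy sets, one widely used soft computing technique, where membership is a degree in [0, 1] rather than true or false. The Python sketch below uses a triangular membership function and Zadeh's min/max/complement connectives; the sets "warm" and "humid" and their parameters are made-up illustrations.

```python
def triangular(x, a, b, c):
    """Membership degree in a triangular fuzzy set: 0 at a, rising to 1 at b, back to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Zadeh's standard fuzzy connectives.
def fuzzy_and(m1, m2):
    return min(m1, m2)


def fuzzy_or(m1, m2):
    return max(m1, m2)


def fuzzy_not(m):
    return 1.0 - m


# Illustrative sets with made-up parameters: "warm" temperature and "humid" air.
warm = triangular(18.5, 15.0, 22.0, 30.0)    # 3.5 / 7 = 0.5
humid = triangular(82.0, 40.0, 70.0, 100.0)  # 18 / 30 = 0.6
print(warm, humid, fuzzy_and(warm, humid), fuzzy_or(warm, fuzzy_not(humid)))
```

A statement such as "warm and humid" is then true to degree 0.5 rather than simply true or false.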
Complexity, Emergent Systems and Complex Biological Systems: Complex Systems Theory and Biodynamics. [Edited book by I.C. Baianu, with listed contributors (2011)]
An overview is presented of system dynamics (the study of the behaviour of complex systems), dynamical systems in mathematics, dynamic programming in computer science and control theory, complex systems biology, neurodynamics and psychodynamics.
Genetic programming for manufacturing optimisation.
A considerable number of optimisation techniques have been proposed for the solution of problems associated with the manufacturing process. Evolutionary computation methods, a group of non-deterministic search algorithms that employ the concept of Darwinian strife for survival to guide the search for optimal solutions, have been extensively used for this purpose.
Genetic programming is an evolutionary algorithm that evolves variable-length solution representations in the form of computer programs. While genetic programming has produced successful applications in a variety of optimisation fields, genetic programming methodologies for the solution of manufacturing optimisation problems have rarely been reported. The applicability of genetic programming in the field of manufacturing optimisation is investigated in this thesis. Three well-known problems were used for this purpose: the one-machine total tardiness problem, the cell-formation problem and the multiobjective process planning selection problem. The main contribution of this thesis is the introduction of novel genetic programming frameworks for the solution of these problems.
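Because GP programs are variable-length trees, recombination usually swaps whole subtrees, so offspring can be larger or smaller than their parents. The Python sketch below shows standard subtree crossover, assuming programs are nested tuples of the form (function, child, ...); the encoding and function names are illustrative, not the thesis's representation.

```python
import random

# A program is a terminal (str or number) or a tuple (function_name, child, child, ...).


def subtree_paths(tree, path=()):
    """Yield the path (sequence of child indices) to every node in the tree."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtree_paths(child, path + (i,))


def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of `tree` with the node at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    children = list(tree)
    children[i] = replace_subtree(tree[i], path[1:], new)
    return tuple(children)


def subtree_crossover(parent1, parent2, rng):
    """Swap a randomly chosen subtree of parent1 with one of parent2."""
    p1 = rng.choice(list(subtree_paths(parent1)))
    p2 = rng.choice(list(subtree_paths(parent2)))
    child1 = replace_subtree(parent1, p1, get_subtree(parent2, p2))
    child2 = replace_subtree(parent2, p2, get_subtree(parent1, p1))
    return child1, child2


def size(tree):
    """Number of nodes in the tree."""
    return sum(1 for _ in subtree_paths(tree))


rng = random.Random(7)
p1 = ("+", ("*", "x", "x"), 1.0)
p2 = ("-", ("sin", "x"), ("*", 2.0, "y"))
c1, c2 = subtree_crossover(p1, p2, rng)
print(c1, size(c1), "|", c2, size(c2))
```

Individual offspring sizes vary from one crossover to the next, but the total node count of the pair is conserved, which is why practical GP systems add depth or size limits to control bloat.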
In the case of the one-machine total tardiness problem, genetic programming employed combinations of dispatching rules for the indirect representation of job schedules. The hybridisation of genetic programming with alternative search algorithms was proposed for the solution of more difficult problem instances. In addition, genetic programming was used to evolve new dispatching rules that challenged the efficiency of man-made dispatching rules for the solution of the problem.
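One plausible reading of "combinations of dispatching rules for the indirect representation of job schedules" is that a candidate solution is a sequence of rule choices, and the schedule is decoded by applying them one decision at a time. The Python sketch below assumes that reading, with all jobs available at time 0, and uses the classic SPT, EDD and modified due date (MDD) rules; the job data are made up and the decoding scheme is hypothetical.

```python
# Each job is (processing_time, due_date); all jobs are available at time 0.
# A dispatching rule maps (job, current_time) to a priority; the smallest value runs next.
def spt(job, t):
    return job[0]  # shortest processing time


def edd(job, t):
    return job[1]  # earliest due date


def mdd(job, t):
    return max(t + job[0], job[1])  # modified due date


def schedule_from_rules(jobs, rules):
    """Decode a job order by applying rules[k % len(rules)] at the k-th decision."""
    remaining = list(range(len(jobs)))
    t, order = 0, []
    for k in range(len(jobs)):
        rule = rules[k % len(rules)]
        j = min(remaining, key=lambda idx: rule(jobs[idx], t))
        remaining.remove(j)
        order.append(j)
        t += jobs[j][0]
    return order


def total_tardiness(jobs, order):
    """Sum over jobs of max(0, completion_time - due_date)."""
    t = tardiness = 0
    for j in order:
        t += jobs[j][0]
        tardiness += max(0, t - jobs[j][1])
    return tardiness


jobs = [(4, 5), (2, 6), (6, 8), (3, 9)]
for name, rules in [("EDD", [edd]), ("SPT", [spt]), ("MDD", [mdd]), ("SPT,EDD", [spt, edd])]:
    order = schedule_from_rules(jobs, rules)
    print(name, order, total_tardiness(jobs, order))
```

On this small instance the time-dependent MDD rule beats both static rules, which illustrates why evolving new rules or rule combinations can pay off.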
An integrated genetic programming - hierarchical clustering approach was proposed for the solution of simple and advanced formulations of the cell-formation problem. The proposed framework produced results competitive with those of alternative methodologies proposed for the same problem. The evolution of similarity coefficients that can be used in combination with clustering techniques for the solution of cell-formation problems was also investigated.
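In similarity-coefficient approaches to cell formation, machines are compared through a machine-part incidence matrix and then grouped by hierarchical clustering. The Python sketch below uses the classic hand-made Jaccard coefficient with single-linkage clustering; in the GP approach described above, an evolved coefficient would take the place of `jaccard` via the `similarity` parameter. The incidence matrix and threshold are made up.

```python
def jaccard(row_a, row_b):
    """Jaccard similarity of two machines: parts visiting both / parts visiting either."""
    both = sum(1 for a, b in zip(row_a, row_b) if a and b)
    either = sum(1 for a, b in zip(row_a, row_b) if a or b)
    return both / either if either else 0.0


def form_cells(incidence, threshold, similarity=jaccard):
    """Single-linkage agglomerative clustering of machines (rows of a machine-part
    incidence matrix); merging stops when the best link falls below `threshold`."""
    clusters = [{i} for i in range(len(incidence))]

    def link(c1, c2):
        # Single linkage: similarity of the most similar pair across two clusters.
        return max(similarity(incidence[i], incidence[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        s, ia, ib = max(
            (link(a, b), ia, ib)
            for ia, a in enumerate(clusters)
            for ib, b in enumerate(clusters)
            if ia < ib
        )
        if s < threshold:
            break
        merged = clusters.pop(ib)  # ia < ib, so index ia is unaffected by the pop
        clusters[ia] |= merged
    return clusters


# Rows are machines, columns are parts (1 = the part visits the machine).
incidence = [
    [1, 1, 0, 0, 0],  # machine 0
    [1, 1, 1, 0, 0],  # machine 1
    [0, 0, 0, 1, 1],  # machine 2
    [0, 0, 1, 1, 1],  # machine 3
]
print(form_cells(incidence, threshold=0.5))
```

Here part 2 visits machines in both resulting cells, an "exceptional element" that the cell-formation objective would normally penalise.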
Finally, genetic programming was combined with a number of evolutionary multiobjective techniques for the solution of the multiobjective process planning selection problem. Results on test problems illustrated the ability of the proposed methodology to provide a wealth of potential solutions to the decision-maker.
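The "wealth of potential solutions" offered to the decision-maker is typically the set of non-dominated (Pareto-optimal) solutions. The Python sketch below shows the dominance test and a simple O(n²) non-dominated filter, assuming all objectives are minimised; the process plans and their (cost, time) scores are hypothetical.

```python
def dominates(a, b):
    """a dominates b (all objectives minimised): no worse everywhere, strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(solutions):
    """Keep the non-dominated solutions, preserving input order (O(n^2) comparisons)."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Hypothetical process plans scored as (cost, time), both to be minimised.
plans = [(10, 5), (8, 7), (12, 4), (9, 9), (11, 6)]
print(pareto_front(plans))
```

Evolutionary multiobjective algorithms apply the same dominance relation repeatedly, for example to rank a population into successive fronts.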