160 research outputs found

    A multiple expression alignment framework for genetic programming

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics. Alignment in the error space is a recent idea to exploit semantic awareness in genetic programming. In a previous contribution, the concepts of optimally aligned and optimally coplanar individuals were introduced, and it was shown that given optimally aligned, or optimally coplanar, individuals, it is possible to construct a globally optimal solution analytically. Consequently, genetic programming methods aimed at searching for optimally aligned, or optimally coplanar, individuals were introduced. This paper critically discusses those methods, analyzes their major limitations, and introduces a new genetic programming system aimed at overcoming those limitations. The presented experimental results, conducted on five real-life symbolic regression problems, show that the proposed algorithms outperform not only the existing methods based on the concept of alignment in the error space, but also geometric semantic genetic programming and standard genetic programming.
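    The analytic construction mentioned in this abstract can be illustrated with a minimal sketch. It assumes a common formulation in which an individual's error vector is its semantics minus the target, and two individuals count as optimally aligned when one error vector is a scalar multiple k (with k different from 1) of the other. The function name and the toy data are hypothetical and are not taken from the dissertation.

```python
def reconstruct_target(sem_a, sem_b, k):
    """Combine two optimally aligned individuals into the target semantics.

    Assumption: error = semantics - target, and e_a = k * e_b with k != 1.
    Then sem_a - t = k * (sem_b - t), so t = (sem_a - k * sem_b) / (1 - k).
    """
    if k == 1:
        raise ValueError("k must differ from 1 for the reconstruction to exist")
    return [(a - k * b) / (1 - k) for a, b in zip(sem_a, sem_b)]


# Toy data (hypothetical): build two individuals whose errors are aligned.
target = [1.0, 2.0, 3.0]
err_b = [0.5, -1.0, 2.0]
k = 3.0
sem_b = [t + e for t, e in zip(target, err_b)]      # semantics of individual B
sem_a = [t + k * e for t, e in zip(target, err_b)]  # semantics of individual A

recovered = reconstruct_target(sem_a, sem_b, k)
```

    Neither individual matches the target on its own, yet their linear combination does, which is why a search for aligned pairs can stand in for a direct search for the optimum.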

    Mining Explicit and Implicit Relationships in Data Using Symbolic Regression

    Identification of implicit and explicit relations within observed data is a generic problem commonly encountered in several domains including science, engineering, finance, and more. It forms the core component of data analytics, a process of discovering useful information from data sets that are potentially huge and otherwise incomprehensible. In industry, such information is often instrumental for profitable decision making, whereas in science and engineering it is used to build empirical models, propose new or verify existing theories and explain natural phenomena. In recent times, digital and internet-based technologies have proliferated, making it viable to generate and collect large amounts of data at low cost. This in turn has resulted in an ever-growing need for methods to analyse and draw interpretations from such data quickly and reliably. With this overarching goal, this thesis attempts to make contributions towards developing accurate and efficient methods for discovering such relations through evolutionary search, a method commonly referred to as Symbolic Regression (SR). A data set of input variables x and a corresponding observed response y is given. The aim is to find an explicit function y = f(x) or an implicit function f(x, y) = 0, which represents the data set. While seemingly simple, the problem is challenging for several reasons. Some of the conventional regression methods try to “guess” a functional form such as linear/quadratic/polynomial, and attempt to do a curve-fitting of the data to the equation, which may limit the possibility of discovering more complex relations, if they exist. On the other hand, there are meta-modelling techniques such as response surface method, Kriging, etc., that model the given data accurately, but provide a “black-box” predictor instead of an expression. Such approximations convey little or no insight into how the variables and responses depend on each other, or their relative contribution to the output. SR attempts to avoid the above two extremes by providing a structure which evolves mathematical expressions instead of assuming them. Thus, it is flexible enough to represent the data, but at the same time provides useful insights instead of a black-box predictor. SR can be categorized as part of Explainable Artificial Intelligence and can contribute to Trustworthy Artificial Intelligence. The work proposed in this thesis aims to integrate the concept of “semantics” deeper into Genetic Programming (GP) and Evolutionary Feature Synthesis, which are the two algorithms usually employed for conducting SR. The semantics will be integrated into well-known components of the algorithms such as compactness, diversity, recombination, constant optimization, etc. The main contribution of this thesis is the proposal of two novel operators to generate expressions based on Linear Programming and Mixed Integer Programming with the aim of controlling the length of the discovered expressions without compromising on the accuracy. In the experiments, these operators are shown to discover expressions with better accuracy and interpretability on many explicit and implicit benchmarks. Moreover, some applications of SR on real-world data sets are shown to demonstrate the practicality of the proposed approaches. In addition, in relation to practical problems, the thesis presents how GP can be applied to effectively solve Resource Constrained Scheduling Problems.
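    The explicit case y = f(x) described in this abstract can be made concrete with a small sketch of how candidate expressions might be scored against data. The candidate set, the data and the use of mean squared error are assumptions chosen for illustration. A real SR system would evolve the candidates instead of enumerating them, and the thesis's own operators are not reproduced here.

```python
def mse(f, xs, ys):
    """Mean squared error of a candidate explicit model y = f(x)."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data generated from y = x^2 + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [x * x + 1.0 for x in xs]

# Hypothetical candidate expressions; an evolutionary search would
# generate and recombine such expressions automatically.
candidates = {
    "2*x + 1": lambda x: 2 * x + 1,
    "x*x": lambda x: x * x,
    "x*x + 1": lambda x: x * x + 1,
}

scores = {name: mse(f, xs, ys) for name, f in candidates.items()}
best = min(scores, key=scores.get)  # expression with the lowest error
```

    Unlike a black-box predictor, the winning candidate is itself a readable expression, which is the interpretability argument made above.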

    CAD interface and framework for curve optimisation applications

    Computer Aided Design is currently expanding its boundaries to include more design features in its processes. Design is identified as an iterative process converging to solutions satisfying a set of constraints. Its close relation with optimisation indicates that there is strong potential for the integration of optimisation and CAD. The problem addressed in this thesis lies in interfacing the geometric representation of design with other non-geometric aspects. The example of free-form curve modelling is taken to investigate such relationships. It is assumed that optimisation is powered by Evolutionary Computing algorithms such as Genetic Algorithms (GA). The geometric definition of curves is commonly supported by NURBS, whose construction constraints are defined locally at the data points. Here the NURBS formulation is used with GA in an attempt to provide complementary handles on the curve’s shape other than the usual data point coordinates and control point weights. Differential properties are used for optimising NURBS: Hermite interpolation allows for the definition of higher order constraints (tangent, normal, bi-normal) at data points. The assignment of parameter values at the data points, known as parameterisation, also provides control of the curve’s shape. Curve optimisation is also performed at the geometric modelling level. Old mathematical theorems established by Frénet and further developed by other mathematicians provide means of defining a curve’s shape with its intrinsic equations. Such representation is possible by using Function Representation (F-rep) algebra available in the ACIS software. F-rep allows more generic and exact means of interfacing with the curve’s geometry, and new functionality for curve inspection and optimisation is proposed in this thesis. The integration of optimisation findings and CAD is documented in the definition of a framework. The framework architecture proposed reconstructs a new CAD environment from separate elements bolted together in a generic Application Programming Interface (API) named “Oli interface”. Functionality created to interface optimisation and CAD makes a requirement list of the work that both sides should undertake to achieve design optimisation in the CAD environment. EThOS - Electronic Theses Online Service, GB, United Kingdo

    Enabling Machine Science through Distributed Human Computing

    Distributed human computing techniques have been shown to be effective ways of accessing the problem-solving capabilities of a large group of anonymous individuals over the World Wide Web. They have been successfully applied to such diverse domains as computer security, biology and astronomy. The success of distributed human computing in various domains suggests that it can be utilized for complex collaborative problem solving. Thus it could be used for machine science: utilizing machines to facilitate the vetting of disparate human hypotheses for solving scientific and engineering problems. In this thesis, we show that machine science is possible through distributed human computing methods for some tasks. By enabling anonymous individuals to collaborate in a way that parallels the scientific method (suggesting hypotheses, testing them and then communicating them for vetting by other participants), we demonstrate that a crowd can together define robot control strategies, design robot morphologies capable of fast-forward locomotion and contribute features to machine learning models for residential electric energy usage. We also introduce a new methodology for empowering a fully automated robot design system by seeding it with intuitions distilled from the crowd. Our findings suggest that increasingly large, diverse and complex collaborations that combine people and machines in the right way may enable problem solving in a wide range of fields.

    Front Matter - Soft Computing for Data Mining Applications

    Efficient tools and algorithms for knowledge discovery in large data sets have been devised in recent years. These methods exploit the capability of computers to search huge amounts of data in a fast and effective manner. However, the data to be analyzed is imprecise and afflicted with uncertainty. In the case of heterogeneous data sources such as text, audio and video, the data might moreover be ambiguous and partly conflicting. Besides, patterns and relationships of interest are usually vague and approximate. Thus, to make the information mining process more robust, or, say, human-like, methods for searching and learning require tolerance towards imprecision, uncertainty and exceptions. Such methods have approximate reasoning capabilities and are capable of handling partial truth. Properties of the aforementioned kind are typical of soft computing. Soft computing techniques like Genetic

    Complexity, Emergent Systems and Complex Biological Systems: Complex Systems Theory and Biodynamics. [Edited book by I.C. Baianu, with listed contributors (2011)]

    An overview is presented of System dynamics, the study of the behaviour of complex systems, Dynamical systems in mathematics, Dynamic programming in computer science and control theory, Complex systems biology, Neurodynamics and Psychodynamics.

    Genetic programming for manufacturing optimisation.

    A considerable number of optimisation techniques have been proposed for the solution of problems associated with the manufacturing process. Evolutionary computation methods, a group of non-deterministic search algorithms that employ the concept of Darwinian strife for survival to guide the search for optimal solutions, have been extensively used for this purpose. Genetic programming is an evolutionary algorithm that evolves variable-length solution representations in the form of computer programs. While genetic programming has produced successful applications in a variety of optimisation fields, genetic programming methodologies for the solution of manufacturing optimisation problems have rarely been reported. The applicability of genetic programming in the field of manufacturing optimisation is investigated in this thesis. Three well-known problems were used for this purpose: the one-machine total tardiness problem, the cell-formation problem and the multiobjective process planning selection problem. The main contribution of this thesis is the introduction of novel genetic programming frameworks for the solution of these problems. In the case of the one-machine total tardiness problem, genetic programming employed combinations of dispatching rules for the indirect representation of job schedules. The hybridisation of genetic programming with alternative search algorithms was proposed for the solution of more difficult problem instances. In addition, genetic programming was used for the evolution of new dispatching rules that challenged the efficiency of man-made dispatching rules for the solution of the problem. An integrated genetic programming - hierarchical clustering approach was proposed for the solution of simple and advanced formulations of the cell-formation problem. The proposed framework produced results competitive with alternative methodologies that have been proposed for the solution of the same problem. The evolution of similarity coefficients that can be used in combination with clustering techniques for the solution of cell-formation problems was also investigated. Finally, genetic programming was combined with a number of evolutionary multiobjective techniques for the solution of the multiobjective process planning selection problem. Results on test problems illustrated the ability of the proposed methodology to provide a wealth of potential solutions to the decision-maker.
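    The idea of representing a schedule indirectly through a dispatching rule can be sketched as follows for the one-machine total tardiness problem. The job data and helper names are hypothetical, and the two rules shown (earliest due date and shortest processing time) are standard textbook rules used here as stand-ins. The thesis's evolved rules and hybrid search are not reproduced.

```python
def total_tardiness(jobs, order):
    """Sum of max(0, completion - due_date) when jobs run back to back.

    jobs maps a job name to (processing_time, due_date).
    """
    clock = 0
    total = 0
    for name in order:
        processing_time, due_date = jobs[name]
        clock += processing_time            # completion time of this job
        total += max(0, clock - due_date)   # tardiness is never negative
    return total


def dispatch(jobs, priority):
    """Build a schedule by sorting jobs on a priority rule (lower runs first)."""
    return sorted(jobs, key=lambda name: priority(*jobs[name]))


# Hypothetical instance: name -> (processing_time, due_date).
jobs = {"A": (4, 5), "B": (2, 6), "C": (6, 7), "D": (3, 15)}

edd_order = dispatch(jobs, lambda p, d: d)  # earliest due date first
spt_order = dispatch(jobs, lambda p, d: p)  # shortest processing time first

edd_cost = total_tardiness(jobs, edd_order)
spt_cost = total_tardiness(jobs, spt_order)
```

    In a genetic programming setting, the `priority` function would be the evolved program and `total_tardiness` would serve as its fitness, so different rules can be compared on the same instance.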