146 research outputs found
Sequential Symbolic Regression with Genetic Programming
This chapter describes the Sequential Symbolic Regression (SSR) method, a new strategy for function approximation in symbolic regression. The SSR method is inspired by the sequential covering strategy from machine learning, but instead of sequentially reducing the size of the problem being solved, it sequentially transforms the original problem into potentially simpler problems. This transformation is performed according to the semantic distances between the desired and obtained outputs and a geometric semantic operator. The rationale behind SSR is that, after generating a suboptimal function f via symbolic regression, the output errors can be approximated by another function in a subsequent iteration. The method was tested on eight polynomial functions and compared with canonical genetic programming (GP) and geometric semantic genetic programming (SGP). Results showed that SSR significantly outperforms SGP and presents no statistical difference to GP. More importantly, they show the potential of the proposed strategy: an effective way of applying geometric semantic operators to combine different (partial) solutions, avoiding the exponential growth problem arising from the use of these operators.
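The rationale stated above (approximate the remaining output errors with another function in a later iteration) can be illustrated with a simplified residual-fitting loop. This is only a sketch and not the SSR algorithm itself: a greedy single-term least-squares fit stands in for the GP run, and the transformation based on semantic distances and a geometric semantic operator is omitted. The names `fit_one_term`, `sequential_fit`, and `BASIS`, and the example target function, are hypothetical.

```python
# Simplified illustration of sequential error fitting.
# Assumption: a single-term least-squares fit replaces the GP search.

def fit_one_term(xs, targets, basis):
    """Return (squared error, basis name, coefficient) of the best single term."""
    best = None
    for name, fn in basis.items():
        phi = [fn(x) for x in xs]
        denom = sum(p * p for p in phi)
        if denom == 0:
            continue
        coef = sum(p * t for p, t in zip(phi, targets)) / denom
        err = sum((t - coef * p) ** 2 for p, t in zip(phi, targets))
        if best is None or err < best[0]:
            best = (err, name, coef)
    return best


def sequential_fit(xs, ys, basis, iterations):
    """Fit one term per iteration to the errors left by the previous terms."""
    residual = list(ys)
    terms = []
    for _ in range(iterations):
        _, name, coef = fit_one_term(xs, residual, basis)
        terms.append((name, coef))
        fn = basis[name]
        # The next iteration sees only what is still unexplained.
        residual = [r - coef * fn(x) for x, r in zip(xs, residual)]
    return terms, residual


# Hypothetical basis and target polynomial, chosen only for the demonstration.
BASIS = {"1": lambda x: 1.0, "x": lambda x: x, "x^2": lambda x: x * x}
xs = [i / 10 for i in range(-10, 11)]
ys = [2 * x * x + 3 * x + 1 for x in xs]

initial_sse = sum(y * y for y in ys)
terms, residual = sequential_fit(xs, ys, BASIS, iterations=30)
final_sse = sum(r * r for r in residual)
```

Each iteration receives a new target (the current residual), so the sum of the fitted terms improves even though every individual fit is suboptimal.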
Supporting medical decisions for treating rare diseases through genetic programming
Bakurov, I., Castelli, M., Vanneschi, L., & Freitas, M. J. (2019). Supporting medical decisions for treating rare diseases through genetic programming. In P. Kaufmann, & P. A. Castillo (Eds.), Applications of Evolutionary Computation: 22nd International Conference, EvoApplications 2019, Held as Part of EvoStar 2019, Proceedings (pp. 187-203). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11454 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-030-16692-2_13. ISBN: 978-3-030-16691-5; Online ISBN: 978-3-030-16692-2
Casa dos Marcos is the largest specialized medical and residential center for rare diseases in the Iberian Peninsula. The large number of patients and the uniqueness of their diseases demand a considerable number of diverse and highly personalized therapies, which are currently managed largely by hand. This paper aims to address the emerging need for efficient and effective artificial intelligence systems that support the everyday activities of centers like Casa dos Marcos. We present six predictive data models developed with a genetic-programming-based system which, integrated into a web application, enabled data-driven support for the therapists in Casa dos Marcos. The presented results clearly indicate the usefulness of the system in assisting complex therapeutic procedures for children suffering from rare diseases.
Investigating the Use of Geometric Semantic Operators in Vectorial Genetic Programming
Azzali, I., Vanneschi, L., & Giacobini, M. (2020). Investigating the Use of Geometric Semantic Operators in Vectorial Genetic Programming. In T. Hu, N. Lourenço, E. Medvet, & F. Divina (Eds.), Genetic Programming - 23rd European Conference, EuroGP 2020, Held as Part of EvoStar 2020, Proceedings (pp. 52-67). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12101 LNCS). Springer. https://doi.org/10.1007/978-3-030-44094-7_4
This work was partially supported by FCT, Portugal through funding of LASIGE Research Unit (UID/CEC/00408/2019), and projects PREDICT (PTDC/CCI-IF/29877/2017), BINDER (PTDC/CCI-INF/29168/2017), GADgET (DSAIPA/DS/0022/2018) and AICE (DSAIPA/DS/0113/2019).
Vectorial Genetic Programming (VE_GP) is a new GP approach for panel data forecasting. Besides permitting the use of vectors as terminal symbols to represent time series and including aggregation functions to extract time series features, it introduces the possibility of evolving the window of aggregation. The local aggregation of data allows the identification of meaningful patterns, overcoming the drawback of always considering the entire previous history of a series of data. In this work, we investigate the use of geometric semantic operators (GSOs) in VE_GP, comparing its performance with traditional GP with GSOs. Experiments are conducted on two real panel data forecasting problems, one allowing aggregation on moving windows and one not. Results show that classical VE_GP is the best approach in both cases in terms of predictive accuracy, suggesting that GSOs are not able to evolve individuals efficiently when time series are involved. We discuss the possible reasons for this behaviour, to understand how valuable GSOs for time series could be designed in the future.
Investigation of landslide failure mechanisms adjacent to lignite mining operations in North Bohemia (Czech Republic) through a limit equilibrium/finite element modelling approach
This is the author accepted manuscript. The final version is available from Elsevier via the DOI in this record.
Understanding the impact of data uncertainty is a fundamental part of ensuring safe design of manmade excavations. Although good levels of knowledge are achievable from field investigations and experience, a natural geological environment is subject to intrinsic variability that may compromise the correct prediction of the system response to the perturbations caused by mining, with direct consequences for the stability and safety of the operations. Different types of geoscientific evidence, including geological, geomorphic, geotechnical, geomatics, and geophysical data have been used to develop and perform two-dimensional Limit Equilibrium and Finite Element Method stability analyses of a lignite open-pit mine in North Bohemia (Czech Republic) affected by recent landslides. A deterministic-probabilistic approach was adopted to investigate the effect of uncertainty of the input parameters on model response. The key factors affecting the system response were identified by specific Limit Equilibrium sensitivity analyses and studied in further detail by Finite Element probabilistic analyses and the results were compared. The work highlights that complementary use of both approaches can be recommended for routine checks of model response and interpretation of the associated results. Such an approach allows a reduction of system uncertainty and provides an improved understanding of the landslides under study. Importantly, two separate failure mechanisms have been identified from the analyses performed and verified through comparisons with inclinometer data and field observations. The results confirm that the water table level and material input parameters have the greatest influence on the stability of the slope.
This work was supported by the Research Fund for Coal and Steel of the European Union [grant number 752504].
Fitness landscape of the cellular automata majority problem: View from the Olympus
In this paper we study cellular automata (CAs) that perform the computational
Majority task. This task is a good example of what the phenomenon of emergence
in complex systems is. We take an interest in the reasons that make this
particular fitness landscape a difficult one. The first goal is to study the
landscape as such, and thus it is ideally independent from the actual
heuristics used to search the space. However, a second goal is to understand
the features a good search technique for this particular problem space should
possess. We statistically quantify in various ways the degree of difficulty of
searching this landscape. Due to neutrality, investigations based on sampling
techniques on the whole landscape are difficult to conduct. So, we go exploring
the landscape from the top. Although it has been proved that no CA can perform
the task perfectly, several efficient CAs for this task have been found.
Exploiting similarities between these CAs and symmetries in the landscape, we
define the Olympus landscape which is regarded as the "heavenly home" of the
best local optima known (blok). Then we measure several properties of this
subspace. Although it is easier to find relevant CAs in this subspace than in
the overall landscape, there are structural reasons that prevent a searcher
from finding overfitted CAs in the Olympus. Finally, we study dynamics and
performance of genetic algorithms on the Olympus in order to confirm our
analysis and to find efficient CAs for the Majority problem with low
computational cost.
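As a rough illustration of what evaluating a CA on the Majority task can look like, the sketch below scores a rule by the fraction of random initial configurations that it drives to the uniform state matching the initial majority. The abstract does not give the task parameters, so the radius (3), lattice size (49 cells), step budget (100), and sample count are assumptions, and the local-majority rule is only a simple baseline, not one of the CAs studied in the paper. All function names are hypothetical.

```python
import random


def step(cells, rule, radius):
    """Apply one synchronous update of a 1D binary CA with periodic boundaries."""
    n = len(cells)
    out = []
    for i in range(n):
        idx = 0
        for d in range(-radius, radius + 1):
            idx = (idx << 1) | cells[(i + d) % n]
        out.append(rule[idx])
    return out


def classifies_correctly(cells, rule, radius, max_steps):
    """True if the CA ends in all 1s (or all 0s) matching the initial majority."""
    target = 1 if sum(cells) * 2 > len(cells) else 0
    for _ in range(max_steps):
        cells = step(cells, rule, radius)
    return all(c == target for c in cells)


def fitness(rule, radius, n_cells, n_samples, max_steps, rng):
    """Fraction of random initial configurations classified correctly."""
    correct = 0
    for _ in range(n_samples):
        cells = [rng.randint(0, 1) for _ in range(n_cells)]
        correct += classifies_correctly(cells, rule, radius, max_steps)
    return correct / n_samples


RADIUS = 3  # assumed neighbourhood radius (7 cells per neighbourhood)
# Baseline rule: each cell takes the majority value of its own neighbourhood.
local_majority = [
    1 if bin(i).count("1") > RADIUS else 0 for i in range(2 ** (2 * RADIUS + 1))
]

score = fitness(
    local_majority, RADIUS, n_cells=49, n_samples=50, max_steps=100,
    rng=random.Random(0),
)
```

An odd lattice size avoids ties in the initial density. A search heuristic such as a genetic algorithm would use a score of this kind as the fitness of each candidate rule table.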
A Dispersion Operator for Geometric Semantic Genetic Programming
Recent advances in geometric semantic genetic programming (GSGP) have shown that the results obtained by these methods can outperform those obtained by classical genetic programming algorithms, in particular in the context of symbolic regression. However, there are still many open issues on how to improve their search mechanism. One of these issues is how to get around the fact that the GSGP crossover operator cannot generate solutions that are placed outside the convex hull formed by the individuals of the current population. Although the mutation operator alleviates this problem, we cannot guarantee it will find promising regions of the search space within feasible computational time. In this direction, this paper proposes a new geometric dispersion operator that uses multiplicative factors to move individuals to less dense areas of the search space around the target solution before applying semantic genetic operators. Experiments on sixteen datasets show that the results obtained by the proposed operator are statistically significantly better than those produced by GSGP and that the operator does indeed spread the solutions around the target solution.
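The convex hull limitation can be seen in a small numeric sketch. It assumes the common form of geometric semantic crossover in which the offspring semantics is a convex combination r·a + (1 − r)·b of the parents' output vectors, with a constant r in [0, 1]. The scaling step at the end only illustrates how a multiplicative factor can widen the reachable region; it is not the dispersion operator proposed in the paper. The vectors and helper names are hypothetical, and the bounding-box test is a necessary (not sufficient) condition for lying in the hull.

```python
import random


def gs_crossover(sem_a, sem_b, r):
    """Offspring semantics as a convex combination of two parents' outputs."""
    return [r * a + (1 - r) * b for a, b in zip(sem_a, sem_b)]


def within_parent_range(point, sem_a, sem_b, tol=1e-12):
    """Check, per output coordinate, that the point lies between the parents."""
    return all(
        min(a, b) - tol <= p <= max(a, b) + tol
        for p, a, b in zip(point, sem_a, sem_b)
    )


rng = random.Random(1)
parent_a = [1.0, 2.0, 3.0]  # hypothetical output vectors on three fitness cases
parent_b = [2.0, 2.5, 5.0]
target = [4.0, 4.0, 8.0]

# Every offspring stays between its parents on every fitness case.
children = [gs_crossover(parent_a, parent_b, rng.random()) for _ in range(100)]
all_inside = all(within_parent_range(c, parent_a, parent_b) for c in children)

# The target lies outside that region, so crossover alone cannot reach it.
target_reachable_before = within_parent_range(target, parent_a, parent_b)

# Illustrative multiplicative factor applied to one parent before crossover.
scaled_b = [2.0 * v for v in parent_b]
target_reachable_after = within_parent_range(target, parent_a, scaled_b)
```

Here the target falls inside the per-coordinate range only after one parent has been scaled, which is the kind of situation a dispersion step is meant to create.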
Computational Intelligence for Life Sciences
Computational Intelligence (CI) is a computer science discipline encompassing the theory, design, development and application of biologically and linguistically derived computational paradigms. Traditionally, the main elements of CI are Evolutionary Computation, Swarm Intelligence, Fuzzy Logic, and Neural Networks. CI aims at proposing new algorithms able to solve complex computational problems by taking inspiration from natural phenomena. In an intriguing turn of events, these nature-inspired methods have been widely adopted to investigate a plethora of problems related to nature itself. In this paper we present a variety of CI methods applied to three problems in life sciences, highlighting their effectiveness: we describe how protein folding can be addressed by exploiting Genetic Programming, how the inference of haplotypes can be tackled using Genetic Algorithms, and how the estimation of biochemical kinetic parameters can be performed by means of Swarm Intelligence. We show that CI methods can generate very high quality solutions, providing a sound methodology to solve complex optimization problems in life sciences.
A Model for Analysing the Collective Dynamic Behaviour and Characterising the Exploitation of Population-Based Algorithms
Several previous studies have focused on modelling and analysing the collective dynamic behaviour of population-based algorithms. However, an empirical approach for identifying and characterising such a behaviour is surprisingly lacking. In this paper, we present a new model to capture this collective behaviour, and to extract and quantify features associated with it. The proposed model studies the topological distribution of an algorithm's activity from both a genotypic and a phenotypic perspective, and represents population dynamics using multiple levels of abstraction. The model can have different instantiations. Here it has been implemented using a modified version of self-organising maps. These are used to represent and track the population motion in the fitness landscape as the algorithm operates on solving a problem. Based on this model, we developed a set of features that characterise the population's collective dynamic behaviour. By analysing them and revealing their dependency on fitness distributions, we were then able to define an indicator of the exploitation behaviour of an algorithm. This is an entropy-based measure that assesses the dependency on fitness distributions of different features of population dynamics. To test the proposed measures, evolutionary algorithms with different crossover operators, selection pressure levels and population handling techniques have been examined, which lead populations to exhibit a wide range of exploitation-exploration behaviours.
A comparison of machine learning techniques for survival prediction in breast cancer
Background: The ability to accurately classify cancer patients into risk classes, i.e. to predict the outcome of the pathology on an individual basis, is a key ingredient in making therapeutic decisions. In recent years gene expression data have been successfully used to complement the clinical and histological criteria traditionally used in such prediction. Many "gene expression signatures" have been developed, i.e. sets of genes whose expression values in a tumor can be used to predict the outcome of the pathology. Here we investigate the use of several machine learning techniques to classify breast cancer patients using one such signature, the well-established 70-gene signature.
Results: We show that Genetic Programming performs significantly better than Support Vector Machines, Multilayered Perceptrons and Random Forests in classifying patients from the NKI breast cancer dataset, and comparably to the scoring-based method originally proposed by the authors of the 70-gene signature. Furthermore, Genetic Programming is able to perform an automatic feature selection.
Conclusions: Since the performance of Genetic Programming is likely to be improvable compared to the out-of-the-box approach used here, and given the biological insight potentially provided by the Genetic Programming solutions, we conclude that Genetic Programming methods are worth further investigation as a tool for cancer patient classification based on gene expression data.