40 research outputs found

    Explorations in Parallel Linear Genetic Programming

    No full text
    Linear Genetic Programming (LGP) is a powerful problem-solving technique, but one with several significant weaknesses. LGP programs consist of a linear sequence of instructions, where each instruction may reuse previously computed results. This structure makes LGP programs compact and powerful; however, it also introduces the problem of instruction dependencies: certain instructions rely on the results of earlier instructions. These dependencies are often disrupted during crossover or mutation, when one or more instructions undergo modification. This disruption can cause disproportionately large changes in program output, resulting in non-viable offspring and poor algorithm performance. Motivated by biological inspiration and the issue of code disruption, we develop a new form of LGP called Parallel LGP (PLGP). PLGP programs consist of n lists of instructions. These lists are executed in parallel, and the resulting vectors are summed to produce the overall program output. PLGP limits the disruptive effects of crossover and mutation, which allows PLGP to significantly outperform regular LGP. We examine the PLGP architecture and determine that large PLGP programs can be slow to converge. To improve the convergence time of large PLGP programs we develop a new form of PLGP called Cooperative Coevolution PLGP (CC PLGP). CC PLGP adapts the concept of cooperative coevolution to the PLGP architecture. CC PLGP optimizes all program components in parallel, allowing CC PLGP to converge significantly faster than conventional PLGP. We examine the CC PLGP architecture and determine that performance…
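    The summed-vector execution scheme described in this abstract can be pictured with a short sketch. The snippet below is purely illustrative: the register machine, operator set and instruction encoding are assumptions, not taken from the paper.

```python
# Illustrative sketch of PLGP evaluation: n independent instruction lists are run
# in parallel and their register vectors are summed element-wise. The register
# machine, operator set and instruction encoding are assumptions for illustration.
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def run_list(instructions, inputs, n_registers=8):
    """Execute one linear instruction list; return its final register vector."""
    regs = [0.0] * n_registers
    regs[:len(inputs)] = inputs                 # seed registers with the inputs
    for op, dst, a, b in instructions:          # later instructions may reuse earlier results
        regs[dst] = OPS[op](regs[a], regs[b])
    return regs

def run_plgp(program, inputs):
    """A PLGP program is a set of lists; the overall output is the element-wise sum."""
    vectors = [run_list(instr_list, inputs) for instr_list in program]
    return [sum(col) for col in zip(*vectors)]

# Two parallel lists: mutating one list only perturbs its own vector,
# which is how PLGP limits the disruption caused by crossover and mutation.
program = [
    [("add", 2, 0, 1), ("mul", 3, 2, 0), ("sub", 4, 3, 1)],
    [("mul", 2, 0, 0), ("add", 3, 2, 1), ("add", 4, 3, 3)],
]
print(run_plgp(program, [1.5, 2.0]))
```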

    The 11th Conference of PhD Students in Computer Science

    Get PDF

    New strategies for efficient and practical genetic programming.

    Get PDF
    2006/2007. In recent decades, engineers and decision makers have expressed a growing interest in the development of effective modeling and simulation methods to understand or predict the behavior of many phenomena in science and engineering. Many of these phenomena are translated into mathematical models for convenience and ease of interpretation. Methods commonly employed for this purpose include, for example, Neural Networks, Simulated Annealing, Genetic Algorithms, Tabu search, and so on. These methods all seek the optimal or near-optimal values of a predefined set of parameters of a model built a priori; in this case, however, a suitable model must be known beforehand. When the form of this model cannot be found, the problem can be approached at another level, where the goal is to find a program or a mathematical representation which can solve the problem. Under this view, the modeling step is performed automatically, guided by a quality criterion that drives the building process. In this thesis, we focus on the Genetic Programming (GP) approach as an automatic method for creating computer programs by means of artificial evolution, based upon the original contributions of Darwin and Mendel. While GP has proven to be a powerful means for coping with problems in which finding a solution and its representation is difficult, its practical applicability is still severely limited by several factors. First, the GP approach is inherently a stochastic process, so there is no guarantee of obtaining a satisfactory solution at the end of the evolutionary loop. Second, performance on a given problem may depend strongly on a broad range of parameters, including the number of variables involved, the quantity of data for each variable, the size and composition of the initial population, the number of generations, and so on. Yet a user of Genetic Programming has two expectations: to maximize the probability of obtaining an acceptable solution, and to minimize the amount of computational resources needed to reach it. We first present innovative and challenging applications in several scientific fields (computer science and mechanical engineering) which contributed greatly to the experience gained in the GP field. We then propose new strategies for improving the performance of the GP approach in terms of efficiency and accuracy, and test them on a large set of benchmark problems in three different domains. Furthermore, we introduce a new GP-based approach dedicated to symbolic regression of multivariate data sets in which the underlying phenomenon is best characterized by a discontinuous function. These contributions aim to provide a better understanding of the key features and underlying relationships which make enhancements successful in improving the original algorithm.
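    As a concrete reference for the evolutionary loop this abstract builds on, the sketch below shows a stripped-down, mutation-only tree GP for symbolic regression. Every detail (operator set, selection scheme, parameters) is an illustrative assumption, not the thesis's configuration.

```python
# Stripped-down, mutation-only tree GP for symbolic regression (illustrative only).
import random
import operator

OPS = [operator.add, operator.sub, operator.mul]

def random_tree(depth=3):
    """Grow a random expression tree over one variable 'x' and ephemeral constants."""
    if depth == 0 or random.random() < 0.3:
        return "x" if random.random() < 0.5 else random.uniform(-1.0, 1.0)
    return (random.choice(OPS), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return op(evaluate(left, x), evaluate(right, x))

def fitness(tree, samples):
    """Mean squared error on the data set; lower is better."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in samples) / len(samples)

def mutate(tree, depth=2):
    """Subtree mutation: replace a randomly chosen node with a fresh random subtree."""
    if not isinstance(tree, tuple) or random.random() < 0.3:
        return random_tree(depth)
    op, left, right = tree
    if random.random() < 0.5:
        return (op, mutate(left, depth), right)
    return (op, left, mutate(right, depth))

def evolve(samples, pop_size=60, generations=40):
    population = [random_tree() for _ in range(pop_size)]
    for _ in range(generations):                    # the stochastic evolutionary loop
        population.sort(key=lambda t: fitness(t, samples))
        parents = population[:pop_size // 4]        # truncation selection with elitism
        children = [mutate(random.choice(parents)) for _ in range(pop_size - len(parents))]
        population = parents + children
    return min(population, key=lambda t: fitness(t, samples))

samples = [(x / 10.0, (x / 10.0) ** 2 + 1.0) for x in range(-10, 11)]
print(fitness(evolve(samples), samples))
```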

    High-Performance Modelling and Simulation for Big Data Applications

    Get PDF
    This open access book was prepared as a Final Publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)” project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. As their level of abstraction rises to give a better discernment of the domain at hand, their representation becomes increasingly demanding of computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction of High Performance Computing with Modelling and Simulation is therefore required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.

    Seventh Biennial Report: June 2003 - March 2005

    No full text

    Automatically evolving rule induction algorithms with grammar-based genetic programming

    Get PDF
    In the last 30 years, research in the field of rule induction algorithms has produced a large number of algorithms. However, these algorithms are usually obtained by combining a basic rule induction algorithm (typically following the sequential covering approach) with new evaluation functions, pruning methods and stopping criteria for refining or producing rules, generating many "new" and more sophisticated sequential covering algorithms. We cannot deny that these attempts to improve the basic sequential covering approach have succeeded. Hence, if manually changing these major components of rule induction algorithms can result in new, significantly better ones, why not automate this process to make it more cost-effective? This is the core idea of this work: to automate the process of designing rule induction algorithms by means of grammar-based genetic programming. Grammar-based Genetic Programming (GGP) is a special type of evolutionary algorithm used to automatically evolve computer programs. The most interesting feature of this type of algorithm is that it incorporates a grammar into its search mechanism, which expresses prior knowledge about the problem being solved. Since we have a lot of prior knowledge about how humans design rule induction algorithms, this type of algorithm is intuitively a suitable tool to automatically evolve rule induction algorithms. The grammar given to the proposed GGP system includes knowledge about how humans design rule induction algorithms, and also presents some new elements which could work in rule induction algorithms but, to the best of our knowledge, were not previously tested. The GGP system aims to evolve rule induction algorithms under two different frameworks, as follows. In the first framework, the GGP is used to evolve robust rule induction algorithms, i.e., algorithms designed to be applied to virtually any classification data set, like a manually-designed rule induction algorithm. In the second framework, the GGP is applied to evolve rule induction algorithms tailored to a specific application domain, i.e., rule induction algorithms tailored to a single data set. Note that the latter framework is hardly feasible on a large scale in the case of conventional, manually-designed algorithms, since the number of classification data sets greatly outnumbers the number of rule induction algorithm designers. However, it is clearly feasible on a large scale when using the proposed system, which automates the process of rule induction algorithm design and implementation. Overall, extensive computational experiments with 20 UCI data sets and 5 bioinformatics data sets showed that effective rule induction algorithms can be automatically generated using the GGP in both frameworks. Moreover, the automatically evolved rule induction algorithms were shown to be competitive with (and overall slightly better than) four well-known manually designed rule induction algorithms when comparing their predictive accuracies. The proposed GGP system was also compared to a grammar-based hill-climbing system, and experimental results showed that the GGP system is a more effective method to evolve rule induction algorithms than the grammar-based hill-climbing method. Finally, a multi-objective version of the GGP (based on the concept of Pareto dominance) was also proposed, and experiments were performed to evolve robust rule induction algorithms which generate both accurate and simple models. The results showed that in most cases the GGP system can produce rule induction algorithms which are competitive in predictive accuracy with well-known human-designed rule induction algorithms, but generate simpler classification models, i.e., smaller rule sets that are intuitively easier for the user to interpret.
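    To make the search space concrete, the sketch below shows a basic sequential covering skeleton of the kind the grammar encodes; the evaluation function, refinement step and stopping criterion shown are illustrative stand-ins for the components the GGP evolves, and every name below is an assumption rather than code from the thesis.

```python
# Illustrative sequential covering skeleton with pluggable evaluation and stopping pieces.

def covers(rule, example):
    """A rule is a dict of attribute -> required value; it covers matching examples."""
    return all(example[attr] == val for attr, val in rule.items())

def rule_quality(rule, examples, target_class):
    """One possible evaluation function: precision of the rule on the examples it covers."""
    covered = [e for e in examples if covers(rule, e)]
    if not covered:
        return 0.0
    return sum(e["class"] == target_class for e in covered) / len(covered)

def learn_one_rule(examples, target_class, attributes):
    """Greedy refinement: keep adding the single condition that most improves quality."""
    rule = {}
    while True:
        best, best_q = None, rule_quality(rule, examples, target_class)
        for attr in attributes:
            if attr in rule:
                continue
            for val in {e[attr] for e in examples}:
                cand = {**rule, attr: val}
                q = rule_quality(cand, examples, target_class)
                if q > best_q:
                    best, best_q = cand, q
        if best is None:                    # stopping criterion: no refinement helps
            return rule
        rule = best

def sequential_covering(examples, target_class, attributes):
    """Learn rules one at a time, removing the examples each new rule covers."""
    rules, remaining = [], list(examples)
    while any(e["class"] == target_class for e in remaining):
        rule = learn_one_rule(remaining, target_class, attributes)
        if not rule:
            break                           # cannot separate the remaining examples
        rules.append(rule)
        remaining = [e for e in remaining if not covers(rule, e)]
    return rules
```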

    Mining a Small Medical Data Set by Integrating the Decision Tree and t-test

    Get PDF
    Although several researchers have used statistical methods to prove that aspiration followed by the injection of 95% ethanol left in situ (retention) is an effective treatment for ovarian endometriomas, very few discuss the different conditions that could generate different recovery rates for the patients. Therefore, this study adopts the statistical method and decision tree techniques together to analyze the postoperative status of ovarian endometriosis patients under different conditions. Since our collected data set is small, containing only 212 records, we use all of these data as the training data. Therefore, instead of using the resultant tree to generate rules directly, we first use the value of each node as a cut point to generate all possible rules from the tree. Then, after all possible rules have been generated from the tree, we verify them using the t-test to discover useful description rules. Experimental results show that our approach can find some new interesting knowledge about recurrent ovarian endometriomas under different conditions.
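    The cut-point verification step described above can be sketched as follows. This is a hypothetical illustration using a generic two-sample t-test (scipy.stats.ttest_ind); the field names, data and significance threshold are invented for illustration and are not the study's actual variables or results.

```python
# Each decision-tree node value becomes a candidate cut point; a two-sample t-test
# checks whether the outcome differs between records on either side of the cut.
from scipy import stats

def verify_cut_points(records, feature, cut_points, outcome, alpha=0.05):
    """Return the cut points where the outcome differs significantly across the split."""
    significant = []
    for cut in cut_points:
        low  = [r[outcome] for r in records if r[feature] <= cut]
        high = [r[outcome] for r in records if r[feature] > cut]
        if len(low) < 2 or len(high) < 2:
            continue                                  # not enough data for a t-test
        t_stat, p_value = stats.ttest_ind(low, high, equal_var=False)
        if p_value < alpha:
            significant.append((cut, p_value))
    return significant

# Usage: cut points would come from the internal nodes of a fitted decision tree.
# The records below are invented illustrative data, not the study's data set.
records = [
    {"cyst_size": 3.1, "recovery": 0.9}, {"cyst_size": 6.4, "recovery": 0.4},
    {"cyst_size": 2.5, "recovery": 0.8}, {"cyst_size": 7.0, "recovery": 0.3},
    {"cyst_size": 4.2, "recovery": 0.7}, {"cyst_size": 5.8, "recovery": 0.5},
]
print(verify_cut_points(records, "cyst_size", [4.0, 5.0], "recovery"))
```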

    Automated Compilation Framework for Scratchpad-based Real-Time Systems

    Get PDF
    ScratchPad Memory (SPM) is widely adopted in real-time systems because it exhibits predictable behaviour. SPM is software-managed: instructions must be explicitly inserted to transfer code and data between the SPM and the main memory. However, deciding how to manage the SPM and manually modifying the code to insert memory transfers is a tedious job. Hence, an automated compilation tool is essential to efficiently utilize the SPM. Another key problem with SPM is the latency the system suffers due to memory transfers. Hiding this latency is important for high-performance systems. In this thesis, we address the problems of managing SPM and reducing the impact of memory latency. To realize the automation of our work, we develop a compilation framework based on the LLVM compiler to analyze and transform the program code. We exploit our framework to improve the performance of single-task and multi-task execution in real-time systems. For single-task execution, the Worst-Case Execution Time (WCET) is of great importance to assure correct and safe behaviour of the system. We therefore propose a WCET-driven allocation technique for data SPM that employs software prefetching to efficiently manage the SPM and to overlap the memory transfers and the task execution in a predictable way. On the other hand, multi-tasking requires the system to be schedulable such that all the tasks can meet their timing requirements. However, executing multiple tasks on a multi-processor platform suffers from contention for accesses to the shared main memory. To avoid this contention, several scheduling techniques adopt the 3-phase execution model, which executes each task as a sequence of memory and computation phases. This provides the means to avoid the contention as well as to hide the memory latency by using a Direct Memory Access (DMA) engine: executing memory transfers with the DMA allows them to be overlapped with the computations on the processor. Using the 3-phase model in systems with limited local SPM sizes may necessitate segmenting the task. Automating the segmentation process is necessary, especially for systems with large task sets. Hence, we propose a set of efficient segmentation algorithms that follow the 3-phase execution model. The application of these algorithms shows a significant improvement in system schedulability. To make our segmentation algorithms more widely applicable, we extend the 3-phase model to allow programs with multiple paths represented as conditional Directed Acyclic Graphs (DAGs), unlike previous works that targeted sequential programs. We also introduce a multi-streaming model to exploit the benefits of prefetching by overlapping the memory and computation phases of the same task, which was not allowed in previous approaches. By combining the automated compilation with the proposed algorithms, we are able to achieve our goal of efficiently managing data SPM in real-time systems.
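    The latency-hiding argument behind the 3-phase model with DMA prefetching can be illustrated with a small timing sketch. This is a toy model under stated assumptions (abstract time units, double-buffering in the SPM, write-back phases omitted), not the thesis's actual algorithm.

```python
# While the CPU computes on segment i (already in SPM), the DMA loads segment i+1,
# so only the first load remains on the critical path.

def makespans(segments):
    """segments: list of (load_time, compute_time) per task segment.
    Returns (sequential_makespan, overlapped_makespan)."""
    # Sequential execution: each segment loads, then computes, one after another.
    sequential = sum(load + compute for load, compute in segments)

    # Overlapped execution: the DMA prefetches the next segment during computation.
    dma_free = cpu_free = 0.0
    for load, compute in segments:
        load_done = dma_free + load              # DMA starts as soon as it is free
        dma_free = load_done
        cpu_free = max(cpu_free, load_done) + compute
    return sequential, cpu_free

# Three segments of one task, in abstract time units.
print(makespans([(4, 10), (4, 10), (4, 10)]))    # -> (42, 34.0)
```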

    Simple identification tools in FishBase

    Get PDF
    Simple identification tools for fish species have been included in the FishBase information system from its inception. Early tools made use of the relational model and characters such as fin ray meristics. Soon pictures and drawings were added as a further help, similar to a field guide. Later came the computerization of existing dichotomous keys, again in combination with pictures and other information, and the ability to restrict possible species by country, area, or taxonomic group. Today, www.FishBase.org offers four different ways to identify species. This paper describes these tools with their advantages and disadvantages, and suggests various options for further development. It explores the possibility of a holistic and integrated computer-aided strategy.
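    The character-based restriction described above can be pictured as a simple relational filter: candidate species are narrowed by occurrence (country) and by meristic counts. The records and field names below are invented for illustration and are not FishBase data.

```python
# Toy relational filter: narrow candidate species by country and dorsal fin ray count.
species = [
    {"name": "Species A", "countries": {"PH", "ID"}, "dorsal_rays": (10, 12)},
    {"name": "Species B", "countries": {"PH"},       "dorsal_rays": (13, 15)},
    {"name": "Species C", "countries": {"BR"},       "dorsal_rays": (10, 11)},
]

def identify(country, dorsal_ray_count):
    """Keep species recorded in the country whose ray-count range includes the observation."""
    return [
        s["name"] for s in species
        if country in s["countries"]
        and s["dorsal_rays"][0] <= dorsal_ray_count <= s["dorsal_rays"][1]
    ]

print(identify("PH", 11))   # -> ['Species A']
```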