16,277 research outputs found

    Genetic Programming Techniques in Engineering Applications

    2012/2013. Machine learning is a suite of techniques for developing algorithms that perform tasks by generalizing from examples. Machine learning systems may thus automatically synthesize programs from data. This approach is often feasible and cost-effective where manual programming or manual algorithm design is not. In the last decade, techniques based on machine learning have spread across a broad range of application domains. In this thesis, we will present several novel applications of a specific machine learning technique, called Genetic Programming, to a wide set of engineering applications grounded in real-world problems. The problems treated in this work range from the automatic synthesis of regular expressions, to the generation of electricity price forecasts, to the synthesis of a model for the tracheal pressure in mechanical ventilation. The results demonstrate that Genetic Programming is indeed a suitable tool for solving complex problems of practical interest. Furthermore, several results constitute a significant improvement over the existing state of the art. The main contribution of this thesis is the design and implementation of a framework, based on Genetic Programming, for the automatic inference of regular expressions from examples. First, we will show the ability of this framework to generate regular expressions that solve text-extraction tasks from examples. We will experimentally assess our proposal by comparing our results with previous proposals on a collection of real-world datasets. The results demonstrate a clear superiority of our approach. We have implemented the approach in a web application that has gained considerable interest and has reached peaks of more than 10,000 daily accesses. Then, we will apply the framework to a popular "regex golf" challenge, a competition in which human players are required to generate the shortest regular expression solving a given set of problems.
    Our results rank in the top 10 list of human players worldwide and outperform those generated by the only existing algorithm specialized for this purpose. We will then perform an extensive experimental evaluation to compare our proposal with the state-of-the-art proposal in a closely related and long-established research field: the generation of a Deterministic Finite Automaton (DFA) from a labelled set of examples. Our results demonstrate that the existing state of the art in DFA learning is not suitable for text-extraction tasks. We will also show a variant of our framework designed for solving text processing tasks of the search-and-replace form. A common way to automate search-and-replace is to describe the region to be modified and the desired changes through a regular expression and a replacement expression. We will propose a solution that automatically produces both expressions based only on examples provided by the user. We will experimentally assess our proposal on real-world search-and-replace tasks. The results indicate that our proposal is indeed feasible. Finally, we will study the applicability of our framework to the generation of a schema from a sample of eXtensible Markup Language (XML) documents. XML documents are widely used in machine-to-machine interactions, and such interactions often require that some constraints be applied to the contents of the documents. These constraints are usually specified in a separate document, which is often unavailable or missing. To generate a missing schema, we will apply our framework to this problem and evaluate it experimentally. In the final part of this thesis we will describe two significant applications from different domains. We will describe a forecasting system for producing estimates of the next-day electricity price. The system is based on a combination of a predictor based on Genetic Programming and a classifier based on Neural Networks.
    A key feature of this system is its ability to handle outliers, i.e., values rarely seen during the learning phase. We will compare our results with a challenging baseline representative of the state of the art. We will show that our proposal exhibits a smaller prediction error than the baseline. Finally, we will move to a biomedical problem: estimating tracheal pressure in a patient treated with high-frequency percussive ventilation. High-frequency percussive ventilation is a new and promising non-conventional mechanical ventilatory strategy. To avoid barotrauma and volutrauma in patients, the pressure of the insufflated air must be monitored carefully. Since measuring the tracheal pressure is difficult, a model for accurately estimating it is required. We will propose the synthesis of such a model by means of Genetic Programming and we will compare our results with the state of the art. XXVI Ciclo 198
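
The "regex golf" task mentioned in the abstract above can be illustrated with a small sketch: a candidate regular expression solves a problem if it matches every string in one list and none in another, and shorter solutions are preferred. The word lists, the candidate pool, and the function names `solves` and `shortest_solution` are hypothetical and chosen only for illustration. The thesis evolves candidates with Genetic Programming, which this sketch does not attempt to reproduce, and the actual scoring used in the challenge may differ.

```python
import re

def solves(pattern, must_match, must_not_match):
    """True if the pattern matches every wanted string and no unwanted one."""
    rx = re.compile(pattern)
    return (all(rx.search(s) for s in must_match)
            and not any(rx.search(s) for s in must_not_match))

def shortest_solution(candidates, must_match, must_not_match):
    """Return the shortest candidate that solves the problem, or None."""
    valid = [p for p in candidates if solves(p, must_match, must_not_match)]
    return min(valid, key=len) if valid else None

# Hypothetical problem instance (not taken from the thesis).
MUST = ["afoot", "footage", "foothill"]
MUST_NOT = ["fort", "boot", "flood"]

# Hypothetical candidate pool; a GP system would generate these by evolution.
CANDIDATES = ["afoot|footage|foothill", "foo", "oo", "f.o"]

if __name__ == "__main__":
    print(shortest_solution(CANDIDATES, MUST, MUST_NOT))
```

Here the long alternation and `foo` both solve the instance, while `oo` and `f.o` each match an unwanted string, so the shorter valid pattern is selected.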

    An automatic generation of textual pattern rules for digital content filters proposal, using grammatical evolution genetic programming

    This work presents a conceptual proposal to address the intensive use of specialized human resources currently required to maintain and optimize digital content filtering in general and anti-spam filtering in particular. The huge amount of spam, malware, viruses, and other illegitimate digital content distributed through network services represents a considerable waste of physical and technical resources, and of experts' and end users' time, spent respectively on the continuous maintenance of anti-spam filters and on the deletion of spam messages. The cumbersome, continuous maintenance required to keep anti-spam filtering systems up to date and running efficiently is addressed in this work by means of genetic programming techniques based on grammatical evolution for automatic rule generation, with the SpamAssassin anti-spam system and the SpamAssassin public corpus as the references for the automatic filtering customization.

    A teachable semi-automatic web information extraction system based on evolved regular expression patterns

    This thesis explores Web Information Extraction (WIE) and how it has been used in decision making and to support businesses in their daily operations. The research focuses on a WIE system based on Genetic Programming (GP) with an extensible model to enhance the automatic extractor. The system uses a human as a teacher to identify and extract relevant information from semi-structured HTML web pages. Regular expressions, which have been chosen as the pattern matching tool, are automatically generated from the training data to provide an improved grammar and lexicon. This particularly benefits the GP system, which may need to extend its lexicon when new tokens appear in the web pages. These tokens allow the GP method to produce new extraction patterns for new requirements.

    Automatic Search-and-Replace From Examples With Coevolutionary Genetic Programming

    We describe the design and implementation of a system for executing search-and-replace text processing tasks automatically, based only on examples of the desired behavior. The examples consist of pairs describing the original string and the desired modified string. Their construction, thus, does not require any specific technical skill. The system constructs a solution to the specified task that can be used unchanged in popular existing software for text processing. The solution consists of a search pattern coupled with a replacement expression: the former is a regular expression which describes both the strings to be replaced and their portions to be reused in the latter, which describes how to build the modified strings. Our proposed system is internally based on genetic programming and implements a form of cooperative coevolution in which two separate populations are evolved independently, one for search patterns and the other for replacement expressions. We assess our proposal on six tasks of realistic complexity, obtaining very good results, both in terms of the absolute quality of the solutions and with respect to the challenging baselines considered.
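
As a minimal sketch of the kind of solution described above, the snippet below pairs a search pattern (a regular expression with capture groups) with a replacement expression that reuses those groups, and scores the pair against example (original, desired) strings. The example pairs, the candidate expressions, and the `score` function are hypothetical. The replacement syntax is that of Python's `re.sub`, assumed here for illustration, and the exact-match score is a stand-in, not the fitness used by the authors.

```python
import re

# Hypothetical example pairs: (original string, desired modified string).
EXAMPLES = [
    ("03/15/2014", "2014-03-15"),
    ("12/01/1999", "1999-12-01"),
]

def score(search_pattern, replacement, examples):
    """Fraction of examples that the candidate pair transforms exactly."""
    hits = 0
    for original, desired in examples:
        try:
            if re.sub(search_pattern, replacement, original) == desired:
                hits += 1
        except re.error:
            # An invalid pattern or group reference counts as a failure.
            pass
    return hits / len(examples)

# Two hypothetical candidates, each a (search pattern, replacement) pair.
# In a cooperative coevolution setting, the two halves would come from
# separate populations and be evaluated together like this.
GOOD = (r"(\d{2})/(\d{2})/(\d{4})", r"\3-\1-\2")
BAD = (r"(\d+)/(\d+)", r"\2-\1")

if __name__ == "__main__":
    print(score(*GOOD, EXAMPLES), score(*BAD, EXAMPLES))
```

The first pair captures month, day, and year and reorders them, so it reproduces every desired string; the second captures only two fields and leaves the year untouched, so it fails on both examples.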