7 research outputs found

    A case study on grammatical-based representation for regular expression evolution

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-12433-4_45
    Proceedings of the 8th International Conference on Practical Applications of Agents and Multiagent Systems
    Regular expressions, or simply regexes, have been widely used for decades as a powerful pattern-matching and text-extraction tool. Although they provide a powerful and flexible notation for defining and retrieving patterns from text, the syntax and grammatical rules of regex notations are not easy to use, or even to understand. Any regex can be represented as a deterministic or non-deterministic finite automaton, so it is possible to design a representation for building regexes automatically, together with an optimization algorithm able to find the best regex in terms of complexity. This paper introduces both a graph-based representation for regexes and a heuristic-based evolutionary computing algorithm, built on grammatical features of this language, and applies them to a particular data extraction problem.
    This work has been partially supported by the Spanish Ministry of Science and Innovation under the projects Castilla-La Mancha project PEII09-0266-6640, COMPUBIODIVE (TIN2007-65989), and by HADA (TIN2007-64718).
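The abstract does not spell out the encoding, so the following is only a minimal Python sketch of the general idea under our own assumptions. A candidate regex is a derivation tree from a tiny hypothetical grammar (terminals `\d`, `[a-z]`, `[A-Z]`, `@`, `\.`; concatenation, alternation, and `+` on a single terminal). Fitness is classification accuracy on positive and negative example strings minus a small length penalty standing in for "complexity", and the search is elitist subtree mutation. It uses trees rather than the paper's graph-based representation, and the names `random_tree`, `render`, `mutate`, `fitness` and `evolve` are illustrative.

```python
import random
import re

# Hypothetical grammar (illustrative, not the paper's representation):
# terminals are single regex atoms; internal nodes are concatenation,
# alternation, and "+" applied to exactly one terminal.
TERMINALS = [r"\d", "[a-z]", "[A-Z]", "@", r"\."]
MAX_DEPTH = 5


def random_tree(rng, depth=3):
    """Grow a random derivation tree of bounded depth."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.5:
            return ("plus", ("lit", rng.choice(TERMINALS)))
        return ("lit", rng.choice(TERMINALS))
    op = rng.choice(["cat", "alt"])
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def render(tree):
    """Translate a derivation tree into Python regex syntax."""
    kind = tree[0]
    if kind == "lit":
        return tree[1]
    if kind == "plus":
        return render(tree[1]) + "+"
    if kind == "cat":
        return render(tree[1]) + render(tree[2])
    return "(?:" + render(tree[1]) + "|" + render(tree[2]) + ")"


def depth(tree):
    """Height of the tree; 'lit' and 'plus' nodes count as leaves."""
    if tree[0] in ("lit", "plus"):
        return 1
    return 1 + max(depth(tree[1]), depth(tree[2]))


def mutate(tree, rng):
    """Replace one randomly chosen subtree with a freshly grown one."""
    if tree[0] in ("lit", "plus") or rng.random() < 0.3:
        return random_tree(rng, depth=2)
    if rng.random() < 0.5:
        return (tree[0], mutate(tree[1], rng), tree[2])
    return (tree[0], tree[1], mutate(tree[2], rng))


def fitness(tree, positives, negatives):
    """Accuracy on labelled strings minus a small length (complexity) penalty."""
    pattern = re.compile(render(tree))
    hits = sum(1 for s in positives if pattern.fullmatch(s))
    rejects = sum(1 for s in negatives if not pattern.fullmatch(s))
    accuracy = (hits + rejects) / (len(positives) + len(negatives))
    return accuracy - 0.001 * len(pattern.pattern)


def evolve(positives, negatives, generations=100, pop_size=30, seed=0):
    """Elitist evolution: keep the best third, refill with mutated copies."""
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(pop_size)]

    def score(t):
        return fitness(t, positives, negatives)

    for _ in range(generations):
        population.sort(key=score, reverse=True)
        elite = population[: pop_size // 3]
        children = []
        while len(elite) + len(children) < pop_size:
            parent = rng.choice(elite)
            child = mutate(parent, rng)
            # Reject children that grow past the depth cap.
            children.append(child if depth(child) <= MAX_DEPTH else parent)
        population = elite + children
    return max(population, key=score)
```

Restricting `+` to single terminals avoids nested quantifiers, which keeps Python's backtracking matcher fast on short examples; carrying the top third of the population over unchanged guarantees that the best fitness never decreases from one generation to the next.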

    Variable length-based genetic representation to automatically evolve wrappers

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-12433-4_44
    Proceedings of the 8th International Conference on Practical Applications of Agents and Multiagent Systems
    The Web has been the star service of the Internet; however, the sheer volume of information available and its decentralized nature make it intrinsically difficult to locate, extract and compose information. An automatic approach is required to handle this huge amount of data. In this paper we present a machine learning algorithm based on Genetic Algorithms that generates a set of complex wrappers able to extract information from the Web. The paper presents an experimental evaluation of these wrappers over a set of basic data sets.
    This work has been partially supported by the Spanish Ministry of Science and Innovation under the projects Castilla-La Mancha project PEII09-0266-6640, COMPUBIODIVE (TIN2007-65989), and by V-LeaF (TIN2008-02729-E/TIN).
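The abstract only states that the wrappers are evolved by a genetic algorithm over a variable-length representation. As a rough Python sketch of what such a genome can look like, assume (hypothetically) that a wrapper is a list of any length of `(left, right)` delimiter pairs, each extracting the text found between its delimiters. A cut-and-splice crossover, which picks an independent cut point in each parent, lets offspring length vary, and an F1 score against the expected fields serves as fitness. `apply_wrapper`, `cut_and_splice` and `f1_fitness` are illustrative names, and this delimiter-pair encoding is our assumption rather than the paper's.

```python
import random
import re

# Hypothetical encoding: a wrapper is a variable-length list of
# (left, right) delimiter pairs. Illustrative only.
Wrapper = list[tuple[str, str]]


def apply_wrapper(wrapper: Wrapper, page: str) -> set[str]:
    """Return every substring found between some pair of delimiters."""
    found: set[str] = set()
    for left, right in wrapper:
        pattern = re.escape(left) + r"(.*?)" + re.escape(right)
        found.update(re.findall(pattern, page))
    return found


def cut_and_splice(a: Wrapper, b: Wrapper, rng: random.Random) -> Wrapper:
    """Variable-length crossover: a prefix of `a` joined to a suffix of `b`,
    with the cut point in each parent chosen independently."""
    i = rng.randint(0, len(a))
    j = rng.randint(0, len(b))
    return a[:i] + b[j:]


def f1_fitness(wrapper: Wrapper, page: str, targets: set[str]) -> float:
    """Harmonic mean of precision and recall of the extracted strings."""
    extracted = apply_wrapper(wrapper, page)
    hits = len(extracted & targets)
    if hits == 0:
        return 0.0
    precision = hits / len(extracted)
    recall = hits / len(targets)
    return 2 * precision * recall / (precision + recall)
```

Because `cut_and_splice` chooses its two cut points independently, a child may be shorter or longer than either parent, which is what lets the wrapper's length itself evolve.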

    A Polynomial Time Match Test for Large Classes of Extended Regular Expressions

    This item was submitted to Loughborough University's Institutional Repository by the author.

    Inside the Class of REGEX Languages

    We study different possibilities of combining the concept of homomorphic replacement with regular expressions in order to investigate the class of languages given by extended regular expressions with backreferences (REGEX). We show in which respects existing, natural ways of doing this fail to reach the expressive power of REGEX. Furthermore, we consider the complexity of the membership problem for REGEX with a bounded number of backreferences.
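To make the gap between REGEX and ordinary regular expressions concrete, here is a small Python illustration. Python's `re` module supports backreferences, and the pattern `(a+)b\1` denotes { a^n b a^n : n >= 1 }, which no finite automaton accepts. The helper `member_one_var` (a hypothetical name, not from the paper) decides membership for a pattern with a single backreferenced group by trying every substring of the input as the group's value and checking the resulting backreference-free regex. It only illustrates why fixing the number of backreferenced groups keeps the number of candidate bindings polynomial (O(n^2) per group); it is not the algorithm analysed in the paper, and Python's backtracking engine is used for the inner check purely for convenience.

```python
import re
from typing import Callable

# Group 1 must be matched again verbatim by \1, so this pattern denotes
# { a^n b a^n : n >= 1 }, a language that is not regular.
SQUARE = re.compile(r"(a+)b\1")


def member_one_var(word: str, var_re: re.Pattern[str],
                   instantiate: Callable[[str], str]) -> bool:
    """Membership test for a pattern with one backreferenced group.

    Assumes the group occurs once, outside any repetition, so its value x is
    a substring of `word` that fully matches `var_re`. `instantiate(x)` must
    return the backreference-free regex obtained by writing x literally at
    the group and at each backreference. There are O(n^2) candidates for x.
    """
    n = len(word)
    candidates = {word[i:j] for i in range(n + 1) for j in range(i, n + 1)}
    return any(var_re.fullmatch(x) and re.fullmatch(instantiate(x), word)
               for x in candidates)


def instantiate_square(x: str) -> str:
    """(a+)b\\1 with both occurrences of the group replaced by literal x."""
    return re.escape(x) + "b" + re.escape(x)


print(member_one_var("aabaa", re.compile("a+"), instantiate_square))  # True
print(bool(SQUARE.fullmatch("aabaa")))                                # True
```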