382 research outputs found
Discovering Restricted Regular Expressions with Interleaving
Discovering a concise schema from given XML documents is an important problem
in XML applications. In this paper, we focus on the problem of learning an
unordered schema from a given set of XML examples, which is actually a problem
of learning a restricted regular expression with interleaving using positive
example strings. Schemas with interleaving could present meaningful knowledge
that cannot be disclosed by previous inference techniques. Moreover, inference
of the minimal schema with interleaving is challenging. The problem of finding
a minimal schema with interleaving is shown to be NP-hard. Therefore, we
develop an approximation algorithm and a heuristic solution to tackle the
problem using techniques different from known inference algorithms. We do
experiments on real-world data sets to demonstrate the effectiveness of our
approaches. Our heuristic algorithm is shown to produce results that are very
close to optimal. Comment: 12 pages
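To make the notion of interleaving in the abstract above concrete, here is a minimal sketch that checks whether a string is an interleaving (shuffle) of two other strings, each keeping its own symbol order. The function name `is_interleaving` is hypothetical. This is only an illustration of the operator, not the inference algorithm from the paper.

```python
from functools import lru_cache


def is_interleaving(word: str, left: str, right: str) -> bool:
    """Return True if `word` interleaves `left` and `right`,
    preserving the relative order of symbols within each of them."""
    if len(word) != len(left) + len(right):
        return False

    @lru_cache(maxsize=None)
    def match(i: int, j: int) -> bool:
        # i symbols of `left` and j symbols of `right` are already consumed.
        k = i + j
        if k == len(word):
            return True
        if i < len(left) and left[i] == word[k] and match(i + 1, j):
            return True
        if j < len(right) and right[j] == word[k] and match(i, j + 1):
            return True
        return False

    return match(0, 0)
```

For instance, with element sequences `ab` and `cd`, the string `acbd` is accepted while `bacd` is rejected, because `b` would precede `a`.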
Active learning of group-structured environments
The question investigated in this paper is to what extent an input representation influences the success of learning, in particular from the point of view of analyzing agents that can interact with their environment. We investigate learning environments that have a group structure. We introduce a learning model in different variants and study under which circumstances group structures can be learned efficiently from experimenting with group generators (actions). Negative results are presented, even without efficiency constraints, for rather general classes of groups, showing that even with group structure, learning an environment from partial information is far from trivial. However, positive results for special subclasses of Abelian groups turn out to be a good starting point for the design of efficient learning algorithms based on structured representations.
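As a rough illustration of "experimenting with group generators (actions)", the sketch below explores a small Abelian group by repeatedly applying generators and recording the observed transitions. The names `explore` and `add_mod`, and the choice of the group Z_2 x Z_3, are assumptions made for this example. It does not reproduce the learning model of the paper.

```python
from collections import deque


def explore(identity, generators, apply):
    """Breadth-first exploration of the states reachable from `identity`
    by applying the named generators; returns the states and transitions."""
    seen = {identity}
    queue = deque([identity])
    transitions = {}
    while queue:
        state = queue.popleft()
        for name, generator in generators.items():
            nxt = apply(state, generator)
            transitions[(state, name)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen, transitions


def add_mod(state, generator, moduli=(2, 3)):
    """Component-wise addition in Z_2 x Z_3 (an assumed toy environment)."""
    return tuple((s + g) % m for s, g, m in zip(state, generator, moduli))


states, transitions = explore((0, 0), {"a": (1, 0), "b": (0, 1)}, add_mod)
```

Here the agent sees all six elements of the group and twelve transitions, which together form the Cayley graph for the two chosen generators.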
Searching for Leptoquarks in electron-photon Collisions
We study the production of composite scalar leptoquarks in
colliders, and we show that an machine operating in its mode
is the best way to look for these particles in collisions, due to the
hadronic content of the photon. Comment: 12 pages in REVTeX3. 6 figures appended as PostScript files. Report:
IFT-P.014/93 and IFUSP-P 104
Hitting all Maximal Independent Sets of a Bipartite Graph
We prove that given a bipartite graph G with vertex set V and an integer k,
deciding whether there exists a subset of V of size k hitting all maximal
independent sets of G is complete for the class Sigma_2^P. Comment: v3: minor chang
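The decision problem stated above can be checked by brute force on very small graphs. The sketch below enumerates all maximal independent sets and then tests every vertex subset of size k. The function names are hypothetical, and the exponential enumeration is only meant to illustrate the problem statement, not to reflect the complexity result.

```python
from itertools import combinations


def maximal_independent_sets(vertices, edges):
    """Enumerate all maximal independent sets by exhaustive search."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    result = []
    for size in range(len(vertices) + 1):
        for candidate in combinations(vertices, size):
            chosen = set(candidate)
            # Independent: no chosen vertex has a chosen neighbour.
            independent = all(adj[v].isdisjoint(chosen) for v in chosen)
            # Maximal: every vertex left out has a chosen neighbour.
            maximal = all(adj[v] & chosen for v in vertices if v not in chosen)
            if independent and maximal:
                result.append(frozenset(chosen))
    return result


def has_hitting_set(vertices, edges, k):
    """True if some k-subset of vertices meets every maximal independent set."""
    mis = maximal_independent_sets(vertices, edges)
    return any(
        all(set(candidate) & m for m in mis)
        for candidate in combinations(vertices, k)
    )
```

On the path a - b - c, the maximal independent sets are {b} and {a, c}, so no single vertex hits both, while a pair such as {a, b} does.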
Variable length-based genetic representation to automatically evolve wrappers
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-12433-4_44
Proceedings of the 8th International Conference on Practical Applications of Agents and Multiagent Systems
The Web has been the star service on the Internet; however, the outsized volume of information available and its decentralized nature have created an intrinsic difficulty in locating, extracting and composing information. An automatic approach is required to handle this huge amount of data. In this paper we present a machine learning algorithm based on Genetic Algorithms which generates a set of complex wrappers able to extract information from the Web. The paper presents the experimental evaluation of these wrappers over a set of basic data sets.
This work has been partially supported by the Spanish Ministry of Science and Innovation under the projects Castilla-La Mancha project PEII09-0266-6640, COMPUBIODIVE (TIN2007-65989), and by V-LeaF (TIN2008-02729-E/TIN)
Mining State-Based Models from Proof Corpora
Interactive theorem provers have been used extensively to reason about
various software/hardware systems and mathematical theorems. The key challenge
when using an interactive prover is that finding a suitable sequence of proof
steps that will lead to a successful proof requires a significant amount of human
intervention. This paper presents an automated technique that takes as input
examples of successful proofs and infers an Extended Finite State Machine as
output. This can in turn be used to generate proofs of new conjectures. Our
preliminary experiments show that the inferred models are generally accurate
(contain few false-positive sequences) and that representing existing proofs in
such a way can be very useful when guiding new ones. Comment: To Appear at Conferences on Intelligent Computer Mathematics 201
A case study on grammatical-based representation for regular expression evolution
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-12433-4_45
Proceedings of 8th International Conference on Practical Applications of Agents and Multiagent Systems
Regular expressions, or simply regex, have been widely used as a powerful pattern matching and text extraction tool for decades. Although they provide a powerful and flexible notation to define and retrieve patterns from text, the syntax and the grammatical rules of these regex notations are not easy to use, or even to understand. Any regex can be represented as a Deterministic or Non-Deterministic Finite Automaton, so it is possible to design a representation to automatically build a regex, and an optimization algorithm able to find the best regex in terms of complexity. This paper introduces both a graph-based representation for regex and a particular heuristic-based evolutionary computing algorithm based on grammatical features from this language, applied to a particular data extraction problem.
This work has been partially supported by the Spanish Ministry of Science and Innovation under the projects Castilla-La Mancha project PEII09-0266-6640, COMPUBIODIVE (TIN2007-65989), and by HADA (TIN2007-64718)
Learning Rational Functions
Rational functions are transformations from words to words that can be defined by string transducers. Rational functions are also captured by deterministic string transducers with lookahead. We show for the first time that the class of rational functions can be learned in the limit with polynomial time and data, when represented by string transducers with lookahead in the diagonal-minimal normal form that we introduce.
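A toy example may help show why lookahead matters for such transducers. The sketch below implements the word-to-word function that rewrites every symbol to the last symbol of the input, which a left-to-right deterministic machine cannot compute without knowing the end of the word. The two-pass structure and the names `lookahead_labels` and `transduce` are assumptions for illustration. The normal form and learning algorithm from the abstract are not modelled.

```python
from typing import List


def lookahead_labels(word: str) -> List[str]:
    """Right-to-left pass: label each position with the final symbol of the
    word. This plays the role of a (hypothetical) lookahead automaton."""
    labels = []
    state = None
    for symbol in reversed(word):
        if state is None:
            state = symbol  # the first symbol seen from the right is the last one
        labels.append(state)
    labels.reverse()
    return labels


def transduce(word: str) -> str:
    """Left-to-right pass: emit one output symbol per input symbol,
    chosen deterministically from the lookahead label at that position."""
    labels = lookahead_labels(word)
    return "".join(label for _symbol, label in zip(word, labels))
```

With this sketch, `transduce("aab")` yields `"bbb"` and `transduce("ba")` yields `"aa"`.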