Discovery of Maximally Frequent Tag Tree Patterns in Semistructured Data (New Developments of Theory of Computation and Algorithms)
Refutable Inference of Formal Graph Systems and NLC Graph Grammars (Models of Computation and Algorithms)
Polynomial Time Matching Algorithms for Tree Structured Patterns (Foundations of Computer Science)
A Hierarchy of Tree Edit Distance Measures (Theoretical Computer Science and its Applications)
Learning of Elementary Formal Systems with Two Clauses using Queries and Their Languages (New Trends in Theory of Computation and Algorithm)
Polynomial Time Inductive Inference of Ordered Term Trees with Contractible Variables from Positive Data (New Aspects of Theoretical Computer Science)
Polynomial Time Learnabilities of Tree Patterns with Internal Structured Variables from Queries (New Aspects of Theoretical Computer Science)
A Polynomial Time Matching Algorithm of Structured Ordered Tree Patterns for Data Mining from Semistructured Data
Polynomial time algorithms for finding unordered tree patterns with internal variables
Abstract. Many documents, such as Web documents and XML files, have tree structures. A term tree is an unordered tree pattern consisting of internal variables and tree structures. To extract meaningful, hidden knowledge from such tree-structured documents, we consider the minimal language (MINL) problem for term trees: given tree-structured data, find a term tree t such that the language generated by t is minimal among all term tree languages that contain the given data. First, we show that the MINL problem for regular term trees is computable in polynomial time if the number of edge labels is infinite. Next, we show that the MINL problems that additionally optimize the size of the output term tree are NP-complete. Finally, to show that our polynomial time algorithm for the MINL problem can be applied to data mining from real-world Web documents, we show that regular term tree languages are polynomial time inductively inferable from positive data if the number of edge labels is infinite.
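The MINL problem stated in the abstract can be written compactly as follows. This is only a sketch of the definition as the abstract words it. The symbols are notational assumptions that do not appear in the source: S for the given finite set of trees, L(t) for the language generated by a term tree t, and \mathcal{TT} for the class of term trees under consideration.

```latex
% Notation assumed for illustration only (not from the source):
%   S            : the given finite set of tree-structured data
%   L(t)         : the language generated by the term tree t
%   \mathcal{TT} : the class of term trees under consideration
\[
  \text{Given } S,\ \text{find } t \in \mathcal{TT} \text{ such that }
  S \subseteq L(t)
  \ \text{ and there is no } t' \in \mathcal{TT} \text{ with }
  S \subseteq L(t') \subsetneq L(t).
\]
```

Under this reading, "minimal" means minimal with respect to set inclusion among the languages that contain S, so more than one term tree may qualify as a solution.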