Polynomial time algorithms for finding unordered tree patterns with internal variables
Abstract. Many documents, such as Web documents and XML files, have tree structures. A term tree is an unordered tree pattern consisting of internal variables and tree structures. To extract meaningful and hidden knowledge from such tree-structured documents, we consider the minimal language (MINL) problem for term trees. The MINL problem for term trees is to find a term tree t such that the language generated by t is minimal among the languages, generated by term trees, that contain all the given tree-structured data. First, we show that the MINL problem for regular term trees is solvable in polynomial time if the number of edge labels is infinite. Next, we show that variants of the MINL problem that also optimize the size of the output term tree are NP-complete. Finally, to show that our polynomial-time algorithm for the MINL problem can be applied to data mining from real-world Web documents, we show that regular term tree languages are polynomial-time inductively inferable from positive data if the number of edge labels is infinite.