6,014 research outputs found
Conditional Random Fields for XML Applications
XML tree labeling is the problem of classifying elements in XML documents. It is a fundamental task for applications like XML transformation, schema matching, and information extraction. In this paper we propose XCRFs, conditional random fields for XML tree labeling. Dealing with trees often raises complexity problems. We describe optimization methods by means of constraints and combination techniques that allow XCRFs to be used in real tasks and in interactive machine learning programs. We show that domain knowledge in XML applications easily transfers in XCRFs thanks to constraints and combination of XCRFs. We describe an approach based on XCRF to learn tree transformations. The approach allows to solve xml data integration tasks and restructuration tasks. We have developed an open source toolbox for XCRFs. We use it to propose a Web service for the generation of personalized RSS feeds from HTML pages
The use of data-mining for the automatic formation of tactics
This paper discusses the usse of data-mining for the automatic formation of tactics. It was presented at the Workshop on Computer-Supported Mathematical Theory Development held at IJCAR in 2004. The aim of this project is to evaluate the applicability of data-mining techniques to the automatic formation of tactics from large corpuses of proofs. We data-mine information from large proof corpuses to find commonly occurring patterns. These patterns are then evolved into tactics using genetic programming techniques
Temporal expression normalisation in natural language texts
Automatic annotation of temporal expressions is a research challenge of great
interest in the field of information extraction. In this report, I describe a
novel rule-based architecture, built on top of a pre-existing system, which is
able to normalise temporal expressions detected in English texts. Gold standard
temporally-annotated resources are limited in size and this makes research
difficult. The proposed system outperforms the state-of-the-art systems with
respect to TempEval-2 Shared Task (value attribute) and achieves substantially
better results with respect to the pre-existing system on top of which it has
been developed. I will also introduce a new free corpus consisting of 2822
unique annotated temporal expressions. Both the corpus and the system are
freely available on-line.Comment: 7 pages, 1 figure, 5 table
- âŠ