Recognizing textual entailment (TE) is the task of deciding whether one sentence or text implies another; e.g., the sentence ‘Ostriches put their heads into the sand to avoid the wind’ entails ‘Ostriches bury their heads in the sand’. Although the task is trivial for humans in many ordinary situations, recognizing TE has proven extremely difficult for machine learning algorithms: participants in the third PASCAL RTE workshop reported an accuracy of at most 80% (Giampiccolo et al., 2007). Current approaches to recognizing TE often use word alignment functions, exploiting syntactic and structural properties of the text. To find out whether semantic techniques are also valuable, we analyzed existing training sets from the PASCAL Challenges, examining in detail and subsequently annotating semantic properties that are essential for determining the validity of an entailment. The work proceeded in three stages. In the first stage, we analyzed the semantic properties that are essential for TE. In the second stage, we annotated the most common relevant properties. In the third stage, we revised the scheme and reapplied it to the RTE1, RTE2 and RTE3 datasets. To simplify annotation, we developed an XML annotation scheme and built a tool for carrying out the annotation task. We found that the annotation scheme is widely applicable: 64.4% of all valid entailment pairs that we reviewed could be annotated. Further work includes testing machine learning algorithms on the annotated dataset.