A Pattern-mining Driven Study on Differences of Newspapers in Expressing Temporal Information
This paper studies how different types of newspapers differ in expressing
temporal information, a topic that has received little attention. Techniques
from temporal processing and pattern mining are employed to investigate it.
First, the author builds a corpus annotated with temporal information. Then,
sequences of temporal information tags mixed with part-of-speech tags are
extracted from the corpus, and the TKS algorithm is used to mine skip-gram
patterns from these sequences. From these patterns, a signature is obtained for
each of the four newspapers. To make each signature uniquely characterize its
newspaper, the signatures are revised by removing reference patterns. The study
examines the number of patterns in the signatures and revised signatures, the
proportion of patterns containing temporal information tags, and the specific
patterns containing temporal information tags. It finds that the newspapers
differ in how they express temporal information.
Comment: 19 pages
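The signature idea can be illustrated with a small sketch. It is not TKS (a top-k sequential pattern miner). It is a simplified brute-force stand-in that enumerates gapped "skip-gram" patterns of fixed length and counts support as the number of sequences that contain each pattern. It then keeps the top-k patterns per newspaper as that newspaper's signature. The function names, the `max_skip` window, and the rule that "reference patterns" are the patterns shared by every newspaper's signature are all assumptions made for illustration.

```python
from collections import Counter


def skip_grams(seq, n=2, max_skip=2):
    """Return the set of n-item patterns whose items occur in order in seq,
    with at most max_skip items skipped between consecutive pattern items."""
    patterns = set()

    def extend(start, prefix):
        if len(prefix) == n:
            patterns.add(tuple(prefix))
            return
        # the next item may follow after 0..max_skip skipped positions
        for j in range(start, min(start + max_skip + 1, len(seq))):
            extend(j + 1, prefix + [seq[j]])

    for i in range(len(seq)):
        extend(i + 1, [seq[i]])
    return patterns


def top_k_signature(sequences, k=5, n=2, max_skip=2):
    """Hypothetical signature: the k patterns with the highest support,
    where support = number of sequences containing the pattern."""
    support = Counter()
    for seq in sequences:
        support.update(skip_grams(seq, n, max_skip))
    # descending support, ties broken lexicographically for determinism
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
    return {pattern for pattern, _ in ranked[:k]}


def revise_signatures(signatures):
    """Assumed revision rule: drop patterns shared by all signatures."""
    shared = set.intersection(*signatures.values())
    return {name: sig - shared for name, sig in signatures.items()}


if __name__ == "__main__":
    # toy tag sequences (TIMEX = temporal expression tag, others are POS tags)
    paper_a = [["TIMEX", "NN", "VBD"], ["DT", "TIMEX", "VBD"]]
    paper_b = [["TIMEX", "NN", "VBZ"], ["NN", "VBZ", "TIMEX"]]
    sigs = {"A": top_k_signature(paper_a, k=3),
            "B": top_k_signature(paper_b, k=3)}
    print(revise_signatures(sigs))
```

Treating "reference patterns" as the intersection of all signatures is only one plausible reading. A different reference set (for example, patterns mined from a pooled corpus) would slot into `revise_signatures` unchanged.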
Cross-linguistic annotation of narrativity for English/French verb tense disambiguation
This paper presents manual and automatic annotation experiments for a pragmatic verb tense feature (narrativity) in English/French parallel corpora. The feature is considered to play an important role in translating the English Simple Past into French, where three different tenses are available. Whether the French Passé Composé, Passé Simple or Imparfait should be used depends heavily on a longer-range context, which describes either narrative events ordered in time or mere non-narrative states of affairs in the past. This longer-range context is usually not available to current machine translation (MT) systems, which are trained on parallel corpora. Annotating narrativity prior to translation is therefore likely to help current MT systems. Our experiments show that narrativity can be reliably identified, with kappa values of up to 0.91 in manual annotation and F1 scores of up to 0.72 in automatic annotation.
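For readers unfamiliar with the two reported metrics, here is a minimal sketch of how they are commonly computed. It assumes the kappa is Cohen's kappa between two annotators, although the abstract does not name the variant. It also assumes F1 is computed for one positive class (here "narrative"). The labels and function names are illustrative only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # chance agreement from each annotator's marginal label distribution
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # convention: both annotators used a single identical label
    return (observed - expected) / (1 - expected)


def f1_score(gold, predicted, positive):
    """F1 for one positive class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    ann1 = ["narrative", "narrative", "non-narrative", "non-narrative"]
    ann2 = ["narrative", "non-narrative", "non-narrative", "non-narrative"]
    print(cohen_kappa(ann1, ann2))  # 0.75 observed vs 0.5 chance agreement -> 0.5
```

With multi-class or multi-annotator data, other variants (for example Fleiss' kappa or macro-averaged F1) may be more appropriate.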