Quotation extraction is a widely useful task both from a sociological and
from a Natural Language Processing perspective. However, very little data is
available to study this task in languages other than English. In this paper, we
present a manually annotated corpus of 1676 newswire texts in French for
quotation extraction and source attribution. We first describe the composition
of our corpus and the choices that were made in selecting the data. We then
detail the annotation guidelines and annotation process, as well as a few
statistics about the final corpus and the obtained balance between quote types
(direct, indirect and mixed, which are particularly challenging). We end by
detailing our inter-annotator agreement between the 8 annotators who worked on
manual labelling, which is substantially high for such a difficult linguistic
phenomenon