We propose to use image captions from the Web as a previously underutilized
resource for paraphrases (i.e., texts with the same "message") and to create
and analyze a corresponding dataset. When an image is reused on the Web, it
is often given a new, original caption. We hypothesize that different captions for
the same image naturally form a set of mutual paraphrases. To demonstrate the
suitability of this idea, we analyze captions in the English Wikipedia, where
editors frequently relabel the same image for different articles. The paper
introduces the underlying mining technology and the resulting Wikipedia-IPC
dataset, and compares our new resource with known paraphrase corpora with
respect to syntactic and semantic paraphrase similarity. In this context, we
introduce characteristic maps along the two similarity dimensions to identify
the style of paraphrases coming from different sources. An annotation study
demonstrates the high reliability of the algorithmically determined
characteristic maps.
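
The core hypothesis, that different captions of the same image form a set of mutual paraphrases, can be illustrated with a minimal sketch. It is not the mining pipeline described in the paper: the function name `mine_caption_paraphrases`, the `(image_id, caption)` record format, and the sample captions are all assumptions made for illustration, and a real pipeline would presumably add filtering steps that are not shown here.

```python
from collections import defaultdict
from itertools import combinations


def mine_caption_paraphrases(records):
    """Group captions by image and return candidate paraphrase pairs.

    `records` is an iterable of (image_id, caption) tuples. This input
    format is a hypothetical one chosen for the sketch.
    """
    captions_by_image = defaultdict(set)
    for image_id, caption in records:
        # Collapse runs of whitespace so trivially different copies merge.
        normalized = " ".join(caption.split())
        if normalized:
            captions_by_image[image_id].add(normalized)

    pairs = []
    for image_id, captions in captions_by_image.items():
        # Every unordered pair of distinct captions is a candidate paraphrase.
        for first, second in combinations(sorted(captions), 2):
            pairs.append((image_id, first, second))
    return pairs


# Made-up example records, not taken from the dataset.
records = [
    ("File:Eiffel.jpg", "The Eiffel Tower at night"),
    ("File:Eiffel.jpg", "Eiffel Tower illuminated after dark"),
    ("File:Eiffel.jpg", "The  Eiffel Tower at night"),  # duplicate after normalization
    ("File:Cat.jpg", "A sleeping cat"),  # single caption, yields no pair
]
pairs = mine_caption_paraphrases(records)
```

Images with only one distinct caption contribute no pairs, and an image with n distinct captions contributes n(n-1)/2 candidate pairs, so frequently reused images dominate the output unless it is sampled or capped.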