Shaping the history of words

Abstract

In textual analysis, many corpora include texts in chronological order, and in many cases this temporal dimension is crucial to understanding their inner structure. In a typical bag-of-words approach, data are organized in contingency tables whose rows report the frequency of each word at each time point (shown in columns). These discrete data (temporal patterns of frequencies) may be viewed as continuous objects represented by functional relationships. This study aimed to identify a specific sequential pattern for each word as a functional object and to group these word patterns into clusters. A model-based clustering procedure is proposed, with specific reference to a corpus of end-of-year messages delivered by the ten Presidents of the Italian Republic, covering the period from 1949 to 2011.
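
As an illustration of the data structure described above, the following minimal sketch builds a word-by-time contingency table and summarizes each word's temporal pattern with a few basis coefficients. It is not the procedure used in the study: the toy corpus, the function names `word_by_time_table` and `polynomial_coefficients`, and the choice of a low-degree polynomial basis are all assumptions made for illustration only.

```python
import numpy as np

# Hypothetical toy corpus of (year, text) pairs, invented for illustration.
docs = [
    (1949, "peace work peace"),
    (1950, "work peace europe"),
    (1951, "europe work europe"),
    (1952, "europe europe work"),
]


def word_by_time_table(docs):
    """Build a contingency table: one row per word, one column per time point."""
    times = sorted({t for t, _ in docs})
    words = sorted({w for _, text in docs for w in text.split()})
    row = {w: i for i, w in enumerate(words)}
    col = {t: j for j, t in enumerate(times)}
    counts = np.zeros((len(words), len(times)))
    for t, text in docs:
        for w in text.split():
            counts[row[w], col[t]] += 1
    return words, times, counts


def polynomial_coefficients(counts, degree=2):
    """Represent each word's relative-frequency curve by polynomial coefficients.

    Assumption: a polynomial basis stands in for whatever functional
    representation an actual analysis would use.
    """
    # Relative frequency of each word within each time point (column).
    rel = counts / counts.sum(axis=0, keepdims=True)
    # Time rescaled to [0, 1] to keep the basis well conditioned.
    x = np.linspace(0.0, 1.0, counts.shape[1])
    basis = np.vander(x, degree + 1, increasing=True)
    # Least-squares fit of every word's curve at once; result has one row per word.
    coefs, *_ = np.linalg.lstsq(basis, rel.T, rcond=None)
    return coefs.T


words, times, counts = word_by_time_table(docs)
coefs = polynomial_coefficients(counts)
```

Each row of `coefs` is a compact description of one word's trajectory. In a model-based clustering setting, vectors of this kind could be passed to a mixture model so that words with similar temporal shapes fall into the same cluster.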
