4 research outputs found
Statistically Significant Detection of Linguistic Change
We propose a new computational approach for tracking and detecting
statistically significant linguistic shifts in the meaning and usage of words.
Such linguistic shifts are especially prevalent on the Internet, where the
rapid exchange of ideas can quickly change a word's meaning. Our meta-analysis
approach constructs property time series of word usage, and then uses
statistically sound change point detection algorithms to identify significant
linguistic shifts.
We consider and analyze three approaches of increasing complexity to generate
such linguistic property time series, the culmination of which uses
distributional characteristics inferred from word co-occurrences. Using
recently proposed deep neural language models, we first train vector
representations of words for each time period. Second, we warp the vector
spaces into one unified coordinate system. Finally, we construct a
distance-based distributional time series for each word to track it's
linguistic displacement over time.
We demonstrate that our approach is scalable by tracking linguistic change
across years of micro-blogging using Twitter, a decade of product reviews using
a corpus of movie reviews from Amazon, and a century of written books using the
Google Book-ngrams. Our analysis reveals interesting patterns of language usage
change commensurate with each medium.Comment: 11 pages, 7 figures, 4 table