
Generating all Possible Palindromes from Ngram Corpora

Abstract

We address the problem of generating all possible palindromes from a corpus of Ngrams. Palindromes are texts that read the same both ways. Short palindromes ("race car") usually carry precise, significant meanings. Long palindromes are often less meaningful, but even harder to generate. The palindrome generation problem has never been addressed, to our knowledge, from a strictly combinatorial point of view. The main difficulty is that generating palindromes requires the simultaneous consideration of two interrelated levels in a sequence: the "character" level and the "word" level. Although the problem seems very combinatorial, we propose an elegant yet non-trivial graph structure that can be used to generate all possible palindromes from a given corpus of Ngrams, with linear complexity. We illustrate our approach with short and long palindromes obtained from the Google Ngram corpus. We show how we can control the semantics, to some extent, by using arbitrary text corpora to bias the probabilities of certain sets of words. More generally, this work addresses the issue of modelling human virtuosity from a combinatorial viewpoint, as a means to understand human creativity.
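
To make the two interrelated levels concrete, the following is a minimal sketch of a naive baseline, not the graph structure proposed in the paper. It assumes a toy corpus of bigrams (the names `TOY_BIGRAMS`, `is_palindrome`, and `brute_force_palindromes` are hypothetical, introduced only for illustration). Word-level validity means every consecutive word pair appears in the corpus; character-level validity means the concatenated letters read the same both ways. The search enumerates every word sequence, so its cost grows exponentially with the length bound, which is the kind of blow-up a dedicated structure is meant to avoid.

```python
# Toy bigram corpus (an assumption for illustration, not data from the paper).
TOY_BIGRAMS = {
    ("race", "car"),
    ("car", "race"),
    ("a", "toyota"),
    ("toyota", "race"),
}


def is_palindrome(words):
    """Character-level test: join the words, ignore spaces and case."""
    chars = "".join(words).lower()
    return chars == chars[::-1]


def brute_force_palindromes(bigrams, max_words):
    """Enumerate sequences of 2..max_words words such that
    (word level) every consecutive pair is a corpus bigram, and
    (character level) the joined characters form a palindrome."""
    # Map each word to the words allowed to follow it.
    successors = {}
    for first, second in bigrams:
        successors.setdefault(first, set()).add(second)

    vocabulary = sorted({word for pair in bigrams for word in pair})
    results = []

    def extend(sequence):
        # Record the sequence if it already satisfies both constraints.
        if len(sequence) >= 2 and is_palindrome(sequence):
            results.append(" ".join(sequence))
        # Grow the sequence by one corpus-compatible word, up to the bound.
        if len(sequence) < max_words:
            for nxt in sorted(successors.get(sequence[-1], ())):
                extend(sequence + [nxt])

    for word in vocabulary:
        extend([word])
    return results


if __name__ == "__main__":
    for text in brute_force_palindromes(TOY_BIGRAMS, max_words=3):
        print(text)
```

In this sketch the character-level check is only applied after a word sequence has been built, so most of the work is spent on sequences that can never become palindromes. Handling both levels at once during construction is the difficulty the abstract describes.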
