We consider the communication of natural language text from a source to a
destination over noiseless and character-erasure channels. We exploit
language's inherent correlations and predictability to constrain transmission
costs by allowing the destination to predict or complete words that may
differ from the source text. Concretely, our objective is to obtain
achievable $(\bar{c}, \bar{s})$ pairs, where $\bar{c}$ is the average
transmission cost at the source and $\bar{s}$ is the average semantic
similarity, measured via the cosine similarity between the vector embeddings
of words at the source and those predicted/completed at the destination.
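As a concrete illustration of the similarity metric, here is a minimal Python
sketch that computes per-word cosine similarity and the resulting average
$\bar{s}$, assuming words have already been mapped to dense embedding vectors
(the choice of embedding model is left open here):

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two word-embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def average_similarity(source_vecs, destination_vecs) -> float:
    """Average semantic similarity over aligned source/destination word pairs."""
    sims = [cosine_similarity(u, v) for u, v in zip(source_vecs, destination_vecs)]
    return float(np.mean(sims))
```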
We obtain $(\bar{c}, \bar{s})$ pairs when prediction uses either a neural or a
first-order Markov chain-based small language model (SLM), under two
scheduling policies: a threshold policy, which transmits a word only if its
cosine similarity with the word predicted/completed at the destination is
below a threshold, and a periodic policy, which transmits words at fixed
intervals and predicts/completes the words in between at the destination. We
adopt an SLM for word completion.
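The two policies can be sketched as follows, with `predict` (a
destination-side SLM that guesses the next word from the words fixed so far)
and `embed` (the word-embedding lookup) as hypothetical stand-ins, and cost
counted as one unit per transmitted word; this is an illustrative sketch, not
the paper's exact implementation:

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def threshold_policy(source_words, predict, embed, threshold):
    """Transmit a word only when the destination's prediction is not
    similar enough, i.e., cosine similarity falls below `threshold`."""
    received, cost = [], 0
    for word in source_words:
        guess = predict(received)          # prediction from words fixed so far
        if cosine(embed(word), embed(guess)) < threshold:
            received.append(word)          # transmit the true word
            cost += 1
        else:
            received.append(guess)         # keep the destination's prediction
    return received, cost

def periodic_policy(source_words, predict, period):
    """Transmit every `period`-th word; predict the words in between."""
    received, cost = [], 0
    for i, word in enumerate(source_words):
        if i % period == 0:
            received.append(word)
            cost += 1
        else:
            received.append(predict(received))
    return received, cost
```

Note that the threshold rule requires the source to know what the destination
would predict, so in this sketch both ends are assumed to run the same
predictor on the same history.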
We demonstrate that, when communication occurs over a noiseless channel, the
threshold policy achieves a higher $\bar{s}$ for a given $\bar{c}$ than the
periodic policy, and that the $\bar{s}$ achieved with the neural SLM is
greater than or equal to that of the Markov chain-based SLM at the same
$\bar{c}$. This improved performance comes at the cost of higher time and
computational complexity. However, when communication occurs over a
character-erasure channel, all prediction algorithms and scheduling policies
perform poorly. Furthermore, if character-level Huffman coding is used, the
$\bar{c}$ required to achieve a given $\bar{s}$ is reduced, but the above
observations still apply.
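For the coding step, character-level Huffman coding assigns shorter codewords
to more frequent characters, which is what lowers $\bar{c}$. A minimal
self-contained sketch, assuming the code is built from the empirical character
frequencies of the text itself (the frequency model is an assumption here):

```python
import heapq
from collections import Counter

def huffman_code(text: str) -> dict:
    """Character-level Huffman code from empirical frequencies."""
    freq = Counter(text)
    # Heap entries: (weight, unique tiebreaker, {char: codeword-so-far}).
    heap = [(n, i, {ch: ""}) for i, (ch, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)    # two least-frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {ch: "0" + cw for ch, cw in c1.items()}
        merged.update({ch: "1" + cw for ch, cw in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

text = "the cat sat on the mat"
code = huffman_code(text)
bits = sum(len(code[ch]) for ch in text)   # total cost of encoding `text`
```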