A virtual, intelligent, or computational assistant (also referred to simply as an “assistant”) is described that is configured to perform text normalization when converting text to speech (e.g., when synthesizing audio data for output to a user). The assistant may perform text normalization by determining how to pronounce a particular set of characters (e.g., a word, homonym, number, date, acronym, or abbreviation) based on the context in which the particular set of characters is used. For instance, when performing text to speech on the text “1233 St. Andrew St.” (e.g., when reading an address aloud), the assistant may determine that the first use of the set of characters “St.” should be pronounced as “saint” because it is a prefix within a street address, and that the second use of the set of characters “St.” should be pronounced as “street” because it is a suffix of a street address.
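The address example can be illustrated with a minimal sketch. It is not the assistant's actual method: the function name `normalize_st` and the capitalization-based rule are assumptions made purely for illustration, and a real text normalizer would likely rely on much richer context than neighboring tokens.

```python
def normalize_st(text: str) -> str:
    """Expand each "St." to "saint" or "street" based on neighboring tokens.

    Hypothetical heuristic (an assumption, not taken from the source):
    "St." is read as "saint" when it precedes a capitalized name and does
    not itself follow one; otherwise it is read as "street".
    """
    tokens = text.split()
    spoken = []
    for i, token in enumerate(tokens):
        if token != "St.":
            spoken.append(token)
            continue
        prev_tok = tokens[i - 1] if i > 0 else ""
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
        follows_name = prev_tok[:1].isupper()   # e.g. "Andrew St."
        precedes_name = next_tok[:1].isupper()  # e.g. "St. Andrew"
        if precedes_name and not follows_name:
            spoken.append("saint")   # prefix position
        else:
            spoken.append("street")  # suffix position
    return " ".join(spoken)


print(normalize_st("1233 St. Andrew St."))
```

Under this rule the first “St.” follows a number and precedes a capitalized name, so it is treated as a prefix, while the second follows a name and ends the string, so it is treated as a suffix. The heuristic will misread many real inputs (for example, “St.” at the start of a sentence after a capitalized word), which is why context-aware normalization is needed in practice.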