
    The laws of "LOL": Computational approaches to sociolinguistic variation in online discussions

    When speaking or writing, a person often chooses one form of language over another based on social constraints, including expectations in a conversation, participation in a global change, or expression of underlying attitudes. Sociolinguistic variation (e.g., choosing "going" versus "goin'") can reveal consistent social differences such as dialects and consistent social motivations such as audience design. While traditional sociolinguistics studies variation in spoken communication, computational sociolinguistics investigates written communication on social media. The structured nature of online discussions and the diversity of language patterns allow computational sociolinguists to test highly specific hypotheses about communication, such as different configurations of the listener "audience." Studying communication choices in online discussions sheds light on long-standing sociolinguistic questions that are hard to tackle, and helps social media platforms anticipate their members' complicated patterns of participation in conversations. To that end, this thesis explores open questions in sociolinguistic research by quantifying language variation patterns in online discussions. I leverage the "bird's-eye" view of social media to focus on three major questions in sociolinguistic research relating to authors' participation in online discussions. First, I test the role of conversation expectations in the context of content bans and crisis events, and I show that authors vary their language to adjust to audience expectations in line with community standards and shared knowledge. Next, I investigate language change in online discussions and show that language structure, more than social context, explains word adoption. Lastly, I investigate the expression of social attitudes among multilingual speakers, and I find that such attitudes can explain language choice when the attitudes have a clear social meaning based on the discussion context.
    This thesis demonstrates the rich opportunities that social media provides for addressing sociolinguistic questions and offers insight into how people adapt to the communication affordances of online platforms.
    Ph.D.

    Geolocation with subsampled microblog social media

    The article of record as published may be found at http://dx.doi.org/10.1145/2733373.2806357.
    We propose a data-driven geolocation method for microblog text. The key idea underlying our approach is sparse coding, an unsupervised learning algorithm. Unlike conventional positioning algorithms, we geolocate a user by identifying features extracted from her social media text. We also present an enhancement that is robust to the erasure of words in the text, and we report experimental results on uniformly or randomly subsampled microblog text. Our solution features a novel two-step procedure consisting of upconversion followed by iterative refinement through joint sparse coding. As a result, we can reduce the amount of input data that geolocation requires while preserving good prediction accuracy. In light of information preservation and privacy, we remark on potential applications of these results.
    Funded by Naval Postgraduate School (Naval Supply Systems Command award); National Science Foundation Graduate Research Fellowship, Grant No. DGE1144152 (NSF); Agreement No. DGE1144152 (NPS
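    The geolocation abstract names sparse coding as its core idea but gives no algorithmic details. The block below is a minimal, generic sketch under stated assumptions, not the authors' method: it computes a sparse code of a feature vector over a fixed dictionary with ISTA (iterative shrinkage-thresholding), then assigns a region with a residual rule in the style of sparse-representation classification. The dictionary `D`, the per-atom region labels `atom_regions`, the penalty `lam`, and the synthetic "user" vector are all hypothetical; the paper's upconversion and joint-sparse-coding refinement steps are not reproduced here.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ista(D, x, lam=0.05, n_iter=500):
    """Sparse code of x over dictionary D (columns are atoms) via ISTA.

    Approximately minimizes 0.5 * ||x - D a||_2^2 + lam * ||a||_1.
    """
    # Step size 1/L, where L = ||D||_2^2 is the Lipschitz constant of the gradient.
    L = np.linalg.norm(D, ord=2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)  # gradient of the least-squares term
        a = soft_threshold(a - grad / L, lam / L)
    return a


def predict_region(D, atom_regions, x, lam=0.05):
    """Hypothetical rule: pick the region whose atoms best reconstruct x."""
    a = ista(D, x, lam)
    best_region, best_residual = None, np.inf
    for region in np.unique(atom_regions):
        mask = atom_regions == region
        # Keep only this region's coefficients and measure reconstruction error.
        residual = np.linalg.norm(x - D[:, mask] @ a[mask])
        if residual < best_residual:
            best_region, best_residual = region, residual
    return best_region, a


# Hypothetical setup: 50-dim text-feature vectors, 3 regions with 4 atoms each.
rng = np.random.default_rng(0)
D = rng.normal(size=(50, 12))
D /= np.linalg.norm(D, axis=0)  # normalize atoms to unit length
atom_regions = np.repeat(np.arange(3), 4)

# A synthetic "user" whose features mix two region-2 atoms plus small noise.
x = 0.8 * D[:, 9] + 0.5 * D[:, 10] + 0.01 * rng.normal(size=50)
region, code = predict_region(D, atom_regions, x)
print(region)
```

    The residual rule is one simple way to turn a sparse code into a location label; with a learned dictionary, one could instead regress coordinates from the code. Subsampling the words of `x` would correspond to zeroing some of its entries, which is the setting the abstract's erasure-robust enhancement targets.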