
    #Bieber + #Blast = #BieberBlast: Early Prediction of Popular Hashtag Compounds

    Compounding of natural language units is a very common phenomenon. In this paper, we show, for the first time, that Twitter hashtags, which could be considered correlates of such linguistic units, undergo compounding. We identify reasons for this compounding and propose a prediction model that can identify, with 77.07% accuracy, whether a compound formed from a pair of hashtags will become popular in the near future (i.e., 2 months after compounding). At longer horizons of T = 6 and 10 months, the accuracies are 77.52% and 79.13%, respectively. This technique has strong implications for trending hashtag recommendation, since newly formed hashtag compounds can be recommended early, even before the compounding has taken place. By comparison, humans predict compounds with an overall accuracy of only 48.7% (treated as the baseline). Notably, while humans can discriminate the relatively easier cases, the automatic framework is successful in classifying the relatively harder cases. Comment: 14 pages, 4 figures, 9 tables. Published in the Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2016).
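    The compounding itself (#Bieber + #Blast = #BieberBlast) can be illustrated with a short Python sketch that flags observed hashtags formed by concatenating two other observed hashtags. This is an assumption-laden illustration, not the paper's detection or prediction method: the function name `find_compounds`, the case-insensitive matching, and the requirement that both parts already appear as hashtags are all hypothetical choices.

```python
# Hypothetical sketch: find hashtags that are the concatenation of two other
# observed hashtags, e.g. #Bieber + #Blast -> #BieberBlast.
# Case-insensitive matching is an illustrative assumption, not the paper's method.


def find_compounds(hashtags):
    """Return a sorted list of (compound, left, right) triples.

    hashtags: iterable of hashtag strings, with or without a leading '#'.
    """
    # Normalise: strip '#' and lower-case so '#BieberBlast' matches 'bieber'.
    tags = {h.lstrip("#").lower() for h in hashtags if h.strip("#")}
    compounds = []
    for tag in tags:
        # Try every split point that leaves two non-empty parts.
        for i in range(1, len(tag)):
            left, right = tag[:i], tag[i:]
            if left in tags and right in tags:
                compounds.append((tag, left, right))
    return sorted(compounds)


if __name__ == "__main__":
    observed = ["#Bieber", "#Blast", "#BieberBlast", "#music"]
    print(find_compounds(observed))  # [('bieberblast', 'bieber', 'blast')]
```

    Trying every split point costs one pair of set lookups per split, which stays cheap for large hashtag vocabularies; a real pipeline would presumably also need timestamps to tell when a compound first appears after its parts.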

    Investigating five-key predictive text entry with combined distance and keystroke modelling

    This paper investigates text entry on mobile devices using only five keys. Although intended primarily to support text entry on devices smaller than mobile phones, the method can also be used to maximise screen space on mobile phones. The combined Fitts' law and keystroke modelling reported here predicts that a five-key keypad with bigram prediction achieves performance similar to that currently achieved on standard mobile phones with unigram prediction. User studies reported here show that performance on five-key pads is similar to that found elsewhere for novice nine-key pad users.
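    The Fitts' law component of such a model can be sketched in Python. The sketch is a hypothetical illustration, not the paper's model: it uses the Shannon formulation MT = a + b * log2(D/W + 1), invents the constants `A` and `B` and a cross-shaped five-key layout, and covers only movement time between keys, leaving out how bigram or unigram prediction changes the number of keystrokes per word.

```python
import math

# Hypothetical Fitts' law constants in seconds; real values would have to be
# fitted to user data and are NOT taken from the paper.
A = 0.1   # intercept a
B = 0.15  # slope b (seconds per bit)


def fitts_time(distance, width, a=A, b=B):
    """Movement time using the Shannon formulation a + b * log2(D/W + 1)."""
    if width <= 0:
        raise ValueError("key width must be positive")
    return a + b * math.log2(distance / width + 1)


def sequence_time(keys, positions, width):
    """Sum Fitts' law movement times over consecutive keystrokes.

    keys:      list of key labels in the order they are pressed.
    positions: dict mapping key label -> (x, y) key centre in mm.
    width:     key width in mm, assumed equal for all keys.
    The first press is not timed because it has no preceding key.
    """
    total = 0.0
    for prev, cur in zip(keys, keys[1:]):
        (x0, y0), (x1, y1) = positions[prev], positions[cur]
        total += fitts_time(math.hypot(x1 - x0, y1 - y0), width)
    return total


def words_per_minute(n_chars, seconds):
    """Text-entry rate using the usual convention of 5 characters per word."""
    return (n_chars / 5) / (seconds / 60)


if __name__ == "__main__":
    # Hypothetical cross-shaped five-key pad with keys 10 mm apart.
    layout = {"up": (0, 10), "down": (0, -10), "left": (-10, 0),
              "right": (10, 0), "select": (0, 0)}
    presses = ["left", "select", "right", "select"]
    t = sequence_time(presses, layout, width=10)
    print(f"movement time: {t:.2f} s")
```

    Repeated presses of the same key give D = 0, so their cost reduces to the intercept a; a fuller keystroke model would multiply such per-press costs by the keystrokes per character implied by the prediction scheme.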

    Computational Sociolinguistics: A Survey

    Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201

    Inference of the Russian drug community from one of the largest social networks in the Russian Federation

    The criminal nature of narcotics complicates direct assessment of a drug community, yet a good understanding of the type of people drawn to or currently using drugs is vital for finding effective intervention strategies. This is of particular concern for the Russian Federation, given the dramatic increase in drug abuse it has seen since the fall of the Soviet Union in the early 1990s. Using unique data from the Russian social network 'LiveJournal', which has over 39 million registered users worldwide, we were able, for the first time, to identify the online drug community through context-sensitive text mining of users' blogs, using a dictionary of known official and slang drug-related terminology. By comparing the interests of the users who most actively spread information on narcotics over the network with the interests of individuals outside the online drug community, we found that the 'average' drug user in the Russian Federation is mostly interested in topics such as Russian rock, non-traditional medicine, UFOs, Buddhism, yoga and the occult. We identify three distinct scale-free sub-networks of users, which can be uniquely classified as 'infectious', 'susceptible' or 'immune'. Comment: 12 pages, 11 figures.
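    Dictionary-based term spotting of the kind described above can be sketched in Python. This is a simplified, hypothetical illustration: it counts exact single-word matches against a placeholder term list and flags users above an arbitrary threshold (`min_hits`), whereas the paper describes a context-sensitive approach whose details are not given here. Python 3's `re` treats word-character classes as Unicode-aware for `str` patterns, so Cyrillic text tokenizes without extra configuration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; Unicode word characters include Cyrillic."""
    return re.findall(r"\w+", text.lower())


def term_hits(text, terms):
    """Count occurrences of dictionary terms (single words) in one post."""
    vocab = {t.lower() for t in terms}
    return Counter(tok for tok in tokenize(text) if tok in vocab)


def flag_users(posts_by_user, terms, min_hits=2):
    """Return users whose posts contain at least `min_hits` term matches in total.

    posts_by_user: dict mapping user id -> list of post texts.
    min_hits:      hypothetical threshold, not taken from the paper.
    """
    flagged = set()
    for user, posts in posts_by_user.items():
        total = sum(sum(term_hits(p, terms).values()) for p in posts)
        if total >= min_hits:
            flagged.add(user)
    return flagged


if __name__ == "__main__":
    # Placeholder vocabulary standing in for official and slang terms.
    terms = {"alpha", "beta", "gamma"}
    posts = {
        "u1": ["Alpha and beta today", "nothing here"],
        "u2": ["just gamma"],
    }
    print(flag_users(posts, terms))  # {'u1'}
```

    Because matching is per token, multi-word slang phrases and inflected forms would be missed; handling those, and disambiguating terms by context, is where a context-sensitive method would differ from this sketch.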

    Purported use and self-awareness of cognitive and metacognitive foreign language reading strategies in tertiary education in Mozambique

    This paper explores the results of a questionnaire based on the Survey of Reading Strategies (SORS), administered to 28 university students in Mozambique, a post-colonial multilingual context. The main aims of the paper are to assess the degree of participants' purported use, and awareness of their own use, of reading comprehension skills and strategies in a foreign language (English). Participants' reading comprehension was tested using an IELTS comprehension test (Cabinda, 2013), and the results revealed low comprehension levels. These results contrast with those of the SORS-based questionnaire (Cabinda, 2013), in which participants claimed to use a wide range of cognitive, metacognitive and supply strategies, which are aspects of high-level reading ability and text comprehension. The conclusions show that participants used, or claimed to use, chiefly metacognitive and cognitive reading strategies in equal measure, matching the behaviour of good readers, but they also reported heavy use of supply strategies to construe meaning from text, mainly code-switching, translation and cognates. This latter finding confirms results from studies by Jimenez et al. (1995, 1996) and Zhang & Wu (2009), yet the data do not conclusively show a correlation between the participants' degree of text comprehension and their effective use of reading skills and strategies to construe meaning. Further conclusions show that the reported heavy use of these L1-related (Portuguese or other) supply strategies, which English L1 readers do not use, does not aid participants' reading comprehension.