#Bieber + #Blast = #BieberBlast: Early Prediction of Popular Hashtag Compounds
Compounding of natural language units is a very common phenomenon. In this
paper, we show, for the first time, that Twitter hashtags, which could be
considered as correlates of such linguistic units, undergo compounding. We
identify reasons for this compounding and propose a model that can predict
with 77.07% accuracy whether a pair of hashtags compounding in the near
future (i.e., 2 months after compounding) will become popular. At longer times
T = 6, 10 months the accuracies are 77.52% and 79.13%, respectively. This
technique has strong implications for trending hashtag recommendation since
newly formed hashtag compounds can be recommended early, even before the
compounding has taken place. Further, humans can predict compounds with an
overall accuracy of only 48.7% (treated as baseline). Notably, while humans can
discriminate the relatively easier cases, the automatic framework is successful
in classifying the relatively harder cases.
Comment: 14 pages, 4 figures, 9 tables, published in CSCW (Computer-Supported
Cooperative Work and Social Computing) 2016. In Proceedings of the 19th ACM
Conference on Computer-Supported Cooperative Work and Social Computing (CSCW
2016)
Investigating five key predictive text entry with combined distance and keystroke modelling
This paper investigates text entry on mobile devices using only five keys. Intended primarily to support text entry on devices smaller than mobile phones, this method can also be used to maximise screen space on mobile phones. The reported combined Fitts' law and keystroke modelling predicts that bigram prediction on a five-key keypad achieves performance similar to that currently achieved on standard mobile phones using unigram prediction. The user studies reported here show user performance on five-key pads similar to that found elsewhere for novice nine-key pad users.
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges.
Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
Inference of the Russian drug community from one of the largest social networks in the Russian Federation
The criminal nature of narcotics complicates the direct assessment of a drug
community, while having a good understanding of the type of people drawn to or
currently using drugs is vital for finding effective intervention strategies.
Especially for the Russian Federation this is of immediate concern given the
dramatic increase it has seen in drug abuse since the fall of the Soviet Union
in the early nineties. Using unique data from the Russian social network
'LiveJournal' with over 39 million registered users worldwide, we were able for
the first time to identify the on-line drug community by context sensitive text
mining of the users' blogs using a dictionary of known drug-related official
and 'slang' terminology. By comparing the interests of the users that most
actively spread information on narcotics over the network with the interests of
the individuals outside the on-line drug community, we found that the 'average'
drug user in the Russian Federation is mostly interested in topics
such as Russian rock, non-traditional medicine, UFOs, Buddhism, yoga and the
occult. We identify three distinct scale-free sub-networks of users which can
be uniquely classified as being either 'infectious', 'susceptible' or 'immune'.
Comment: 12 pages, 11 figures
Purported use and self-awareness of cognitive and metacognitive foreign language reading strategies in tertiary education in Mozambique
This paper explores the results of a Survey of Reading Strategies (SORS)-based questionnaire administered to 28 university student participants. The study was carried out in a post-colonial multilingual context, Mozambique. The main aim of the paper is to assess the degree of purported use, and participants' awareness of their own use, of reading comprehension skills and strategies in a foreign language (English). The participants were tested for their reading comprehension using an IELTS comprehension test (Cabinda, 2013). The results revealed low reading comprehension levels. These results contrast with those from the SORS-based questionnaire (Cabinda, 2013), which revealed claims of use of a wide range of cognitive, metacognitive and supply strategies – aspects of high-level reading ability and text comprehension. Conclusions show that the participants used, or claimed to use, chiefly metacognitive and cognitive reading strategies in equal measure, matching the behaviour of good readers, but they also reported a high degree of use of supply strategies to construe meaning from text, mainly code-switching, translation and cognates. The latter confirms results from studies by Jimenez et al. (1995, 1996) and Zhang & Wu (2009), yet does not conclusively show a correlation between the participants’ degree of text comprehension and their effective use of reading skills and strategies to construe meaning. Further conclusions show that the reported high use of these L1 (Portuguese or other) related supply strategies (not used by English L1 readers) does not aid their reading comprehension.