#Bieber + #Blast = #BieberBlast: Early Prediction of Popular Hashtag Compounds
Compounding of natural language units is a very common phenomenon. In this
paper, we show, for the first time, that Twitter hashtags, which can be
considered correlates of such linguistic units, undergo compounding. We
identify reasons for this compounding and propose a prediction model that can
identify with 77.07% accuracy whether a pair of hashtags compounding in the
near future (i.e., 2 months after compounding) will become popular. At longer
times, T = 6 and 10 months, the accuracies are 77.52% and 79.13% respectively.
This technique has strong implications for trending-hashtag recommendation,
since newly formed hashtag compounds can be recommended early, even before the
compounding has taken place. Further, humans can predict compounds with an
overall accuracy of only 48.7% (treated as the baseline). Notably, while
humans can discriminate the relatively easier cases, the automatic framework
is successful in classifying the relatively harder cases.
Comment: 14 pages, 4 figures, 9 tables; published in Proceedings of the 19th
ACM Conference on Computer-Supported Cooperative Work and Social Computing
(CSCW 2016)
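The task described above, deciding from early signals whether a hashtag compound will become popular, can be sketched as a binary classifier. This is a hypothetical illustration only: the paper's actual features and learner are not given here, and the counts, weights and bias below are invented for the example.

```python
import math

# Hypothetical sketch (not the paper's actual model): given usage counts for
# two hashtags and early usage of their compound, predict whether the
# compound will become popular.

def features(freq_a, freq_b, freq_ab):
    # Illustrative features: component popularity and early compound uptake.
    return [math.log1p(freq_a), math.log1p(freq_b), math.log1p(freq_ab)]

def predict_popular(freq_a, freq_b, freq_ab,
                    weights=(0.2, 0.2, 0.6), bias=-4.0):
    # Toy linear classifier with hand-set weights; a real system would learn
    # these from labelled compounding events observed on Twitter.
    score = bias + sum(w * f
                       for w, f in zip(weights, features(freq_a, freq_b, freq_ab)))
    return score > 0.0

# Heavily used components with early compound uptake are predicted popular;
# a rarely used compound is not.
print(predict_popular(50000, 20000, 800))  # True
print(predict_popular(300, 200, 2))        # False
```

A learned model would replace the hand-set weights, but the pipeline shape (pair features in, popularity decision out) is the same.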
Bidirectional syntactic priming across cognitive domains: from arithmetic to language and back
Scheepers et al. (2011) showed that the structure of a correctly solved mathematical equation affects how people subsequently complete sentences containing high vs. low relative-clause attachment ambiguities. Here we investigated whether such effects generalise to different structures and tasks, and importantly, whether they also hold in the reverse direction (i.e., from linguistic to mathematical processing). In a questionnaire-based experiment, participants had to solve structurally left- or right-branching equations (e.g., 5 × 2 + 7 versus 5 + 2 × 7) and to provide sensicality ratings for structurally left- or right-branching adjective-noun-noun compounds (e.g., alien monster movie versus lengthy monster movie). In the first version of the experiment, the equations were used as primes and the linguistic expressions as targets (investigating structural priming from maths to language). In the second version, the order was reversed (language-to-maths priming). Both versions of the experiment showed clear structural priming effects, conceptually replicating and extending the findings from Scheepers et al. (2011). Most crucially, the observed bi-directionality of cross-domain structural priming strongly supports the notion of shared syntactic representations (or recursive procedures to generate and parse them) between arithmetic and language.
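The branching contrast in the stimuli can be made concrete: under standard operator precedence, 5 × 2 + 7 has a left-branching parse, (5 × 2) + 7, while 5 + 2 × 7 has a right-branching parse, 5 + (2 × 7). Representing the two parses as small trees makes the recursive structure that the abstract appeals to explicit (the tree encoding below is our own illustration, not the authors' formalism).

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def evaluate(node):
    """Recursively evaluate a parse tree given as (op, left, right) or a number."""
    if isinstance(node, (int, float)):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))

# Left-branching: the complex subtree is the left operand of the top node.
left_branching = ("+", ("*", 5, 2), 7)    # (5 * 2) + 7
# Right-branching: the complex subtree is the right operand.
right_branching = ("+", 5, ("*", 2, 7))   # 5 + (2 * 7)

print(evaluate(left_branching))   # 17
print(evaluate(right_branching))  # 19
```

The two expressions use the same operators and operands; only the shape of the tree differs, which is exactly the structural variable manipulated in the priming experiments.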
Extending the adverbial coverage of a NLP oriented resource for French
This paper presents work on extending the adverbial entries of LGLex, an
NLP-oriented syntactic resource for French. Adverbs were extracted from the
Lexicon-Grammar tables of both simple adverbs ending in -ment '-ly' (Molinier
and Levrier, 2000) and compound adverbs (Gross, 1986; 1990). This work relies
on the exploitation of the fine-grained linguistic information provided in
existing resources: various features encoded in the LG tables have not yet
been exploited. They describe the relations of deletion, permutation,
intensification and paraphrase that associate, on the one hand, simple and
compound adverbs and, on the other hand, different types of compound adverbs.
The resulting syntactic resource has been manually evaluated and is freely
available under the LGPL-LR license.
Comment: Proceedings of the 5th International Joint Conference on Natural
Language Processing (IJCNLP'11), Chiang Mai, Thailand (2011)
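The relational features the abstract mentions (deletion, permutation, intensification, paraphrase) can be pictured as attributes on a lexicon entry. The record below is a hypothetical sketch of such an entry for a -ment adverb; the field names and values are our own illustration, not the actual LGLex format.

```python
# Hypothetical lexicon entry (not the real LGLex schema): relational
# features carried over from a Lexicon-Grammar table for simple adverbs
# ending in -ment.

entry = {
    "lemma": "rapidement",                      # 'quickly'
    "pos": "ADV",
    "source_table": "simple adverbs in -ment",  # Molinier and Levrier (2000)
    "relations": {
        "deletable": True,                       # can be deleted without ungrammaticality
        "permutable": True,                      # can move within the sentence
        "intensifiable": True,                   # e.g. 'tres rapidement'
        "paraphrase": "d'une maniere rapide",    # link to a compound-adverb paraphrase
    },
}

print(entry["lemma"], sorted(entry["relations"]))
```

Encoding these relations explicitly is what allows a parser or generator to connect a simple adverb with its compound counterparts, which is the kind of exploitation the paper describes.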
Meta-Learning for Phonemic Annotation of Corpora
We apply rule induction, classifier combination and meta-learning (stacked
classifiers) to the problem of bootstrapping high-accuracy automatic
annotation of corpora with pronunciation information. The task we address in
this paper consists of generating phonemic representations reflecting the
Flemish and Dutch pronunciations of a word on the basis of its orthographic
representation (which in turn is based on the actual speech recordings). We
compare several possible approaches to the text-to-pronunciation mapping
task: memory-based learning, transformation-based learning, rule induction,
maximum-entropy modeling, combination of classifiers in stacked learning, and
stacking of meta-learners. We are interested both in optimal accuracy and in
obtaining insight into the linguistic regularities involved. As far as
accuracy is concerned, the already high accuracy of single classifiers (93%
for Celex and 86% for Fonilex at word level) is boosted significantly, with
additional error reductions of 31% and 38% respectively using combination of
classifiers, and a further 5% using combination of meta-learners, bringing
overall word-level accuracy to 96% for the Dutch variant and 92% for the
Flemish variant. We also show that the application of machine learning
methods indeed leads to increased insight into the linguistic regularities
determining the variation between the two pronunciation variants studied.
Comment: 8 pages
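The stacking idea above, feeding the predictions of several base classifiers to a meta-level classifier, can be sketched in a few lines. This is a toy illustration: the paper's base learners are memory-based, transformation-based, rule-induction and maximum-entropy classifiers, whereas the stand-ins below are simple frequency tables, and all of the training data shown is invented.

```python
from collections import Counter

def make_majority_classifier(training_pairs):
    """Base learner: predict the most frequent label seen for an input."""
    table = {}
    for x, y in training_pairs:
        table.setdefault(x, Counter())[y] += 1
    return lambda x: table[x].most_common(1)[0][0] if x in table else "?"

def stack(base_classifiers, meta_classifier):
    """Stacked learner: the tuple of base predictions becomes the feature
    vector for a meta-level classifier."""
    return lambda x: meta_classifier(tuple(c(x) for c in base_classifiers))

# Two toy base classifiers trained on (invented) grapheme-to-label pairs.
base1 = make_majority_classifier([("a", "A"), ("a", "A"), ("b", "B")])
base2 = make_majority_classifier([("a", "X"), ("b", "B")])

# Meta level: a lookup trained on (base-prediction tuple) -> gold label,
# falling back to the first base classifier when the tuple is unseen.
meta_table = {("A", "X"): "A", ("B", "B"): "B"}
meta = lambda preds: meta_table.get(preds, preds[0])

stacked = stack([base1, base2], meta)
print(stacked("a"))  # "A": the meta level overrules base2's error
print(stacked("b"))  # "B": the base classifiers already agree
```

The meta level adds value exactly where the base classifiers disagree, which is why classifier combination yields the error reductions reported in the abstract.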
Atlas.txt: Exploring Linguistic Grounding Techniques for Communicating Spatial Information to Blind Users
Peer reviewed postprint