    #Bieber + #Blast = #BieberBlast: Early Prediction of Popular Hashtag Compounds

    Compounding of natural language units is a very common phenomenon. In this paper, we show, for the first time, that Twitter hashtags, which could be considered correlates of such linguistic units, undergo compounding. We identify reasons for this compounding and propose a prediction model that can identify, with 77.07% accuracy, whether the compound formed by a pair of hashtags will become popular in the near future (i.e., 2 months after compounding). At longer horizons (T = 6 and 10 months), the accuracies are 77.52% and 79.13%, respectively. This technique has strong implications for trending hashtag recommendation, since newly formed hashtag compounds can be recommended early, even before the compounding has taken place. Further, humans can predict compounds with an overall accuracy of only 48.7% (treated as the baseline). Notably, while humans can discriminate the relatively easier cases, the automatic framework is successful in classifying the relatively harder cases.
    Comment: 14 pages, 4 figures, 9 tables. Published in the Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2016).
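To make the notion of a hashtag compound concrete, the sketch below checks whether a hashtag such as #BieberBlast can be split into two hashtags that already exist. This is an illustrative heuristic only, not the detection or prediction procedure used in the paper; the function name `split_compound` and the case-insensitive matching are assumptions.

```python
def split_compound(compound, known_tags):
    """Return (head, tail) if `compound` is the concatenation of two known
    hashtags, else None.

    Hypothetical helper: matching is case-insensitive and ignores the
    leading '#'. It is not the method used in the paper.
    """
    word = compound.lstrip("#").lower()
    tags = {t.lstrip("#").lower() for t in known_tags}
    # Try every split point; both halves must be non-empty known hashtags.
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head in tags and tail in tags:
            return head, tail
    return None


print(split_compound("#BieberBlast", ["#Bieber", "#Blast", "#music"]))
```

Here the call returns `('bieber', 'blast')`. A real system would also need to handle compounds whose parts are not themselves hashtags, or that split in more than one way.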

    Bidirectional syntactic priming across cognitive domains: from arithmetic to language and back

    Scheepers et al. (2011) showed that the structure of a correctly solved mathematical equation affects how people subsequently complete sentences containing high vs. low relative-clause attachment ambiguities. Here we investigated whether such effects generalise to different structures and tasks and, importantly, whether they also hold in the reverse direction (i.e., from linguistic to mathematical processing). In a questionnaire-based experiment, participants had to solve structurally left- or right-branching equations (e.g., 5 × 2 + 7 versus 5 + 2 × 7) and to provide sensicality ratings for structurally left- or right-branching adjective-noun-noun compounds (e.g., alien monster movie versus lengthy monster movie). In the first version of the experiment, the equations were used as primes and the linguistic expressions as targets (investigating structural priming from maths to language). In the second version, the order was reversed (language-to-maths priming). Both versions of the experiment showed clear structural priming effects, conceptually replicating and extending the findings of Scheepers et al. (2011). Most crucially, the observed bidirectionality of cross-domain structural priming strongly supports the notion of shared syntactic representations (or recursive procedures to generate and parse them) between arithmetic and language.
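The left/right-branching distinction can be read directly off a parse tree. The sketch below uses Python's standard `ast` module (writing × as `*`) to check which child of the root operation is itself an operation: 5 × 2 + 7 parses as (5 × 2) + 7 (left-branching), and 5 + 2 × 7 as 5 + (2 × 7) (right-branching). By analogy, "alien monster movie" is bracketed [[alien monster] movie] and "lengthy monster movie" [lengthy [monster movie]]. The function `branching` is a hypothetical illustration, not part of the study's materials.

```python
import ast


def branching(expr):
    """Classify a binary arithmetic expression as left- or right-branching.

    Hypothetical helper: an expression is left-branching when the complex
    subtree (another operation) is the left child of the root, as in
    (5 * 2) + 7, and right-branching when it is the right child, as in
    5 + (2 * 7).
    """
    root = ast.parse(expr, mode="eval").body
    if not isinstance(root, ast.BinOp):
        raise ValueError("expected a binary operation at the root")
    left_complex = isinstance(root.left, ast.BinOp)
    right_complex = isinstance(root.right, ast.BinOp)
    if left_complex and not right_complex:
        return "left"
    if right_complex and not left_complex:
        return "right"
    return "neither"


for e in ("5 * 2 + 7", "5 + 2 * 7"):
    print(e, branching(e))
```

Operator precedence does the bracketing here, so the first expression is classified as `left` and the second as `right`.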

    Extending the adverbial coverage of a NLP oriented resource for French

    This paper presents work on extending the adverbial entries of LGLex, an NLP-oriented syntactic resource for French. Adverbs were extracted from the Lexicon-Grammar tables of both simple adverbs ending in -ment '-ly' (Molinier and Levrier, 2000) and compound adverbs (Gross, 1986; 1990). This work relies on exploiting the fine-grained linguistic information provided in existing resources: various features are encoded in both LG tables but had not yet been exploited. They describe the relations of deletion, permutation, intensification and paraphrase that associate, on the one hand, simple and compound adverbs and, on the other hand, different types of compound adverbs. The resulting syntactic resource has been manually evaluated and is freely available under the LGPL-LR license.
    Comment: Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP'11), Chiang Mai, Thailand (2011).

    Meta-Learning for Phonemic Annotation of Corpora

    We apply rule induction, classifier combination and meta-learning (stacked classifiers) to the problem of bootstrapping high-accuracy automatic annotation of corpora with pronunciation information. The task we address in this paper consists of generating phonemic representations reflecting the Flemish and Dutch pronunciations of a word on the basis of its orthographic representation (which in turn is based on the actual speech recordings). We compare several possible approaches to the text-to-pronunciation mapping task: memory-based learning, transformation-based learning, rule induction, maximum entropy modeling, combination of classifiers in stacked learning, and stacking of meta-learners. We are interested both in optimal accuracy and in obtaining insight into the linguistic regularities involved. As far as accuracy is concerned, the already high word-level accuracy of single classifiers (93% for Celex and 86% for Fonilex) is boosted significantly, with additional error reductions of 31% and 38%, respectively, from combining classifiers, and a further 5% from combining meta-learners, bringing overall word-level accuracy to 96% for the Dutch variant and 92% for the Flemish variant. We also show that applying machine learning methods indeed leads to increased insight into the linguistic regularities determining the variation between the two pronunciation variants studied.
    Comment: 8 pages
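The accuracy figures above are stated as relative error reductions. The sketch below shows how such a reduction translates into accuracy, using the rounded percentages from the abstract. The helper `reduce_error` is hypothetical, and reading "a further 5%" as a further 5% relative error reduction is an assumption, not something the abstract states.

```python
def reduce_error(accuracy, relative_reduction):
    """Return the accuracy (in %) after cutting the error rate by a relative fraction.

    Hypothetical helper: error = 100 - accuracy; the new error is
    error * (1 - relative_reduction).
    """
    error = 100.0 - accuracy
    return 100.0 - error * (1.0 - relative_reduction)


# Word-level accuracies and error reductions as reported in the abstract (rounded).
for name, single, combo in [("Celex (Dutch)", 93.0, 0.31), ("Fonilex (Flemish)", 86.0, 0.38)]:
    after_combo = reduce_error(single, combo)
    # Assumption: "a further 5%" is read as a further 5% relative error reduction.
    after_meta = reduce_error(after_combo, 0.05)
    print(f"{name}: {single:.0f}% -> {after_combo:.2f}% -> {after_meta:.2f}%")
```

Because the inputs are rounded and the reading of "a further 5%" is itself an assumption, the printed values only approximately reproduce the reported 96% and 92%.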