12,569 research outputs found
Using Word Embedding to Evaluate the Coherence of Topics from Twitter Data
Scholars often seek to understand topics discussed on Twitter using topic modelling approaches. Several coherence metrics have been proposed for evaluating the coherence of the topics generated by these approaches, including metrics based on the pre-calculated Pointwise Mutual Information (PMI) of word pairs and on Latent Semantic Analysis (LSA) word representation vectors. Because Twitter data contains abbreviations and a number of peculiarities (e.g. hashtags), it can be challenging to obtain reliable PMI estimates or to train effective LSA word representations. Recently, Word Embedding (WE) has emerged as a particularly effective approach for capturing the similarity among words. Hence, in this paper, we propose new Word Embedding-based topic coherence metrics. To determine the usefulness of these new metrics, we compare them with the previous PMI/LSA-based metrics. We also conduct a large-scale crowdsourced user study to determine whether the new Word Embedding-based metrics better align with human preferences. Using two Twitter datasets, our results show that the WE-based metrics can capture the coherence of topics in tweets more robustly and efficiently than the PMI/LSA-based ones.
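As a rough illustration of the idea behind embedding-based coherence, the sketch below scores a topic by the mean pairwise cosine similarity of its top words' vectors. It assumes gensim's KeyedVectors and a placeholder embeddings file; the paper's exact metric definitions may differ from this simple average.

```python
import itertools
import numpy as np
from gensim.models import KeyedVectors

# Load pretrained word vectors (the path is a placeholder).
wv = KeyedVectors.load_word2vec_format("embeddings.bin", binary=True)

def topic_coherence(top_words, wv):
    """Mean pairwise cosine similarity of a topic's top words.

    Words missing from the embedding vocabulary are skipped.
    """
    words = [w for w in top_words if w in wv]
    if len(words) < 2:
        return 0.0
    sims = [wv.similarity(w1, w2)
            for w1, w2 in itertools.combinations(words, 2)]
    return float(np.mean(sims))

# Example: score a topic by its top words.
print(topic_coherence(["goal", "match", "striker", "league"], wv))
```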
Italian Event Detection Goes Deep Learning
This paper reports on a set of experiments using different word embeddings to initialize a state-of-the-art Bi-LSTM-CRF network for event detection and classification in Italian, following the EVENTI evaluation exercise. The network obtains a new state-of-the-art result, improving the F1 score by 1.3 points for detection and by 6.5 points for classification with a single-step approach. The results also provide further evidence that embeddings have a major impact on the performance of such architectures.
Comment: to appear at CLiC-it 201
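A minimal sketch of how pretrained embeddings can initialize such a network, assuming PyTorch and the third-party pytorch-crf package; the hidden size and other hyperparameters are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # third-party pytorch-crf package

class BiLSTMCRF(nn.Module):
    def __init__(self, pretrained, num_tags, hidden=256):
        super().__init__()
        # Initialize the embedding layer from pretrained vectors
        # (a FloatTensor of shape [vocab_size, emb_dim]) and let
        # them be fine-tuned during training.
        self.emb = nn.Embedding.from_pretrained(pretrained, freeze=False)
        self.lstm = nn.LSTM(pretrained.size(1), hidden // 2,
                            bidirectional=True, batch_first=True)
        self.proj = nn.Linear(hidden, num_tags)  # per-token emission scores
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, tokens, tags, mask):
        emissions = self.proj(self.lstm(self.emb(tokens))[0])
        return -self.crf(emissions, tags, mask=mask)  # negative log-likelihood

    def decode(self, tokens, mask):
        emissions = self.proj(self.lstm(self.emb(tokens))[0])
        return self.crf.decode(emissions, mask=mask)  # best tag sequences
```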
Fostering Public Good Contributions with Symbolic Awards: A Large-Scale Natural Field Experiment at Wikipedia
This natural field experiment tests the effects of purely symbolic awards on volunteer retention in a public goods context. The experiment is conducted at Wikipedia, which faces declining editor retention rates, particularly among newcomers. Randomization assures that award receipt is orthogonal to previous performance. The analysis reveals that awards have a sizeable effect on newcomer retention, which persists over the four quarters following the initial intervention. This is noteworthy for indicating that awards for volunteers can be effective even if they have no impact on the volunteers’ future career opportunities. The awards are purely symbolic, and the status increment they produce is limited to the recipients’ pseudonymous online identities in a community they have just recently joined. The results can be explained by enhanced self-identification with the community, but they are also in line with recent findings on the role of status and reputation, recognition, and evaluation potential in online communities. Data, as supplemental material, are available at http://dx.doi.org/10.1287/mnsc.2016.2540. This paper was accepted by John List, behavioral economics.
Uncertainty Detection as Approximate Max-Margin Sequence Labelling
This paper reports experiments for the CoNLL 2010 shared task on learning to detect hedges and their scope in natural language text. We have addressed the experimental tasks as supervised linear maximum margin prediction problems. For sentence level hedge detection in the biological domain we use an L1-regularised binary support vector machine, while for sentence level weasel detection in the Wikipedia domain, we use an L2-regularised approach. We model the in-sentence uncertainty cue and scope detection task as an L2-regularised approximate maximum margin sequence labelling problem, using the BIO-encoding. In addition to surface level features, we use a variety of linguistic features based on a functional dependency analysis. A greedy forward selection strategy is used in exploring the large set of potential features.
Our official results for Task 1 are an F1-score of 85.2 for the biological domain and 55.4 for the Wikipedia set. For Task 2, our official result is an F1-score of 2.1 for the entire task, with a score of 62.5 for cue detection. After resolving errors and final bugs, our final results are, for Task 1, biological: 86.0 and Wikipedia: 58.2; for Task 2, scopes: 39.6 and cues: 78.5.
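For the sentence-level hedge detection step, an L1-regularised linear SVM can be sketched with scikit-learn as below. The sentences are toy examples, and the bag-of-words features stand in for the paper's richer surface and dependency-based feature set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data: sentences labelled 1 (hedged/uncertain) or 0 (factual).
sentences = [
    "These results suggest that the protein may regulate transcription.",
    "The protein binds to the promoter region.",
    "It is possible that the mutation affects splicing.",
    "The mutation abolishes splicing.",
]
labels = [1, 0, 1, 0]

# L1-regularised linear SVM over n-gram features; the L1 penalty drives
# many feature weights to exactly zero, acting as feature selection.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LinearSVC(penalty="l1", dual=False, C=1.0),
)
clf.fit(sentences, labels)
print(clf.predict(["The gene might be involved in apoptosis."]))
```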