A Semi-automatic Method for Efficient Detection of Stories on Social Media
Twitter has become one of the main sources of news for many people. As
real-world events and emergencies unfold, Twitter is abuzz with hundreds of
thousands of stories about the events. Some of these stories are harmless,
while others could potentially be life-saving or sources of malicious rumors.
Thus, it is critically important to be able to efficiently track stories that
spread on Twitter during these events. In this paper, we present a novel
semi-automatic tool that enables users to efficiently identify and track
stories about real-world events on Twitter. We ran a user study with 25
participants, demonstrating that compared to more conventional methods, our
tool can increase the speed and the accuracy with which users can track stories
about real-world events.
Comment: ICWSM'16, May 17-20, Cologne, Germany. In Proceedings of the 10th
International AAAI Conference on Weblogs and Social Media (ICWSM 2016).
Cologne, Germany
Tweet Acts: A Speech Act Classifier for Twitter
Speech acts are a way to conceptualize speech as action. This holds true for
communication on any platform, including social media platforms such as
Twitter. In this paper, we explored speech act recognition on Twitter by
treating it as a multi-class classification problem. We created a taxonomy of
six speech acts for Twitter and proposed a set of semantic and syntactic
features. We trained and tested a logistic regression classifier using a data
set of manually labelled tweets. Our method achieved a state-of-the-art
performance with an average F1 score of more than . We also explored
classifiers with three different granularities (Twitter-wide, type-specific and
topic-specific) in order to find the right balance between generalization and
overfitting for our task.
Comment: ICWSM'16, May 17-20, Cologne, Germany. In Proceedings of the 10th
AAAI Conference on Weblogs and Social Media (ICWSM 2016). Cologne, Germany
Automatic Detection and Categorization of Election-Related Tweets
With the rise in popularity of public social media and micro-blogging
services, most notably Twitter, people have found a venue to hear and be
heard by their peers without an intermediary. As a consequence, and aided by
the public nature of Twitter, political scientists now potentially have the
means to analyse and understand the narratives that organically form, spread
and decline among the public in a political campaign. However, the volume and
diversity of the conversation on Twitter, combined with its noisy and
idiosyncratic nature, make this a hard task. Thus, advanced data mining and
language processing techniques are required to process and analyse the data. In
this paper, we present and evaluate a technical framework, based on recent
advances in deep neural networks, for identifying and analysing
election-related conversation on Twitter on a continuous, longitudinal basis.
Our models can detect election-related tweets with an F-score of 0.92 and can
categorize these tweets into 22 topics with an F-score of 0.90.
Comment: ICWSM'16, May 17-20, 2016, Cologne, Germany. In Proceedings of the
10th AAAI Conference on Weblogs and Social Media (ICWSM 2016). Cologne,
Germany
Digital Stylometry: Linking Profiles Across Social Networks
There is an ever growing number of users with accounts on multiple social
media and networking sites. Consequently, there is increasing interest in
matching user accounts and profiles across different social networks in order
to create aggregate profiles of users. In this paper, we present models for
Digital Stylometry, which is a method for matching users through stylometry
inspired techniques. We experimented with linguistic, temporal, and combined
temporal-linguistic models for matching user accounts, using standard and novel
techniques. Using publicly available data, our best model, a combined
temporal-linguistic one, was able to correctly match the accounts of 31% of
5,612 distinct users across Twitter and Facebook.
Comment: SocInfo'15, Beijing, China. In Proceedings of the 7th International
Conference on Social Informatics (SocInfo 2015). Beijing, China
Big Green at WNUT 2020 Shared Task-1: Relation Extraction as Contextualized Sequence Classification
Relation and event extraction is an important task in natural language
processing. We introduce a system which uses contextualized knowledge graph
completion to classify relations and events between known entities in a noisy
text environment. We report results which show that our system is able to
effectively extract relations and events from a dataset of wet lab protocols.
Comment: Proceedings of the 6th Workshop on Noisy User-generated Text (W-NUT)
at EMNLP 2020
What Are People Asking About COVID-19? A Question Classification Dataset
We present COVID-Q, a set of 1,690 questions about COVID-19 from 13 sources,
which we annotate into 15 question categories and 207 question clusters. The
most common questions in our dataset asked about transmission, prevention, and
societal effects of COVID, and we found that many questions that appeared in
multiple sources were not answered by any FAQ websites of reputable
organizations such as the CDC and FDA. We post our dataset publicly at
https://github.com/JerryWei03/COVID-Q. For classifying questions into 15
categories, a BERT baseline scored 58.1% accuracy when trained on 20 examples
per category, and for a question clustering task, a BERT + triplet loss
baseline achieved 49.5% accuracy. We hope COVID-Q can help either for direct
use in developing applied systems or as a domain-specific resource for model
evaluation.
Comment: Published in Proceedings of the 1st Workshop on NLP for COVID-19 at
ACL 2020
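As a rough illustration of the triplet loss named in the COVID-Q abstract, the sketch below computes the standard margin-based triplet loss on plain Python vectors. It is a minimal sketch under assumptions: the abstract does not specify the distance function, margin, or embedding size, so the Euclidean distance, the margin of 1.0, and the toy two-dimensional vectors here are all hypothetical stand-ins for BERT embeddings.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is. The margin value is an
    assumption, not taken from the paper.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings (hypothetical): anchor and positive stand for questions
# in the same cluster, negative for a question from another cluster.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
far_negative = [3.0, 4.0]
near_negative = [0.0, 1.5]

print(triplet_loss(anchor, positive, far_negative))   # 0.0
print(triplet_loss(anchor, positive, near_negative))  # 0.5
```

A well-separated negative contributes no loss, while a negative that sits close to the anchor is penalized, which is what pushes same-cluster questions together during training.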
Interactions of caregiver speech and early word learning in the Speechome corpus : computational explorations
Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 107-110).
How do characteristics of caregiver speech contribute to a child's early word learning? What is the relationship between a child's language development and caregivers' speech? Motivated by these general questions, this thesis comprises a series of computational studies on the fine-grained interactions of caregiver speech and one child's early linguistic development, using the naturalistic, high-density longitudinal corpus collected for the Human Speechome Project. The child's first productive use of a word was observed at about 11 months, totaling 517 words by his second birthday. Why did he learn those 517 words at the precise ages that he did? To address this specific question, we examined the relationship of the child's vocabulary growth to prosodic and distributional features of the naturally occurring caregiver speech to which the child was exposed. We measured fundamental frequency, intensity, phoneme duration, word usage frequency, word recurrence and mean length of utterances (MLU) for over one million words of caregivers' speech. We found significant correlations between all 6 variables and the child's age of acquisition (AoA) for individual words, with the best linear combination of these variables producing a correlation of r = -.55 (p < .001). We then used these variables to obtain a model of word acquisition as a function of caregiver input speech. This model was able to accurately predict the AoA of individual words within 55 days of their true AoA. We next looked at the temporal relationships between caregivers' speech and the child's lexical development. This was done by generating time-series for each variable, for each caregiver, for each word. These time-series were then time-aligned by AoA.
This analysis allowed us to see whether there is a consistent change in caregiver behavior for each of the six variables before and after the AoA of individual words. The six variables in caregiver speech all showed significant temporal relationships with the child's lexical development, suggesting that caregivers tune the prosodic and distributional characteristics of their speech to the linguistic ability of the child. This tuning behavior involves the caregivers progressively shortening their utterance lengths, becoming more redundant and exaggerating prosody more when uttering particular words as the child gets closer to the AoA of those words, and reversing this trend as the child moves beyond the AoA. This "tuning" behavior was remarkably consistent across caregivers and variables, all following a very similar pattern. We found significant correlations between the patterns of change in caregiver behavior for each of the 6 variables and the AoA for individual words, with their best linear combination producing a correlation of r = -.91 (p < .001). Though the underlying cause of this strong correlation will require further study, it provides evidence of a new kind for fine-grained adaptive behavior by the caregivers in the context of child language development.
by Soroush Vosoughi. S.M.
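The correlations reported in the thesis abstract are values of r between caregiver-speech variables and age of acquisition. As a minimal sketch of how such a coefficient is computed for a single variable, the code below implements Pearson's r in plain Python. The data are invented toy numbers, not values from the Speechome corpus, and the names `usage_frequency` and `aoa_months` are hypothetical; the thesis itself reports the correlation for a linear combination of six variables, which this sketch does not reproduce.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Toy data (hypothetical): per-word usage counts in caregiver speech
# and the age in months at which each word was first produced.
usage_frequency = [120, 95, 60, 40, 15]
aoa_months = [12, 14, 17, 19, 23]

r = pearson_r(usage_frequency, aoa_months)
print(f"r = {r:.2f}")
```

A negative r on data like this would mean that words heard more often tend to be acquired earlier; the sign and size on real data depend entirely on the corpus.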