Search CORE

93 research outputs found

A Semi-automatic Method for Efficient Detection of Stories on Social Media

Author: Roy Deb
Vosoughi Soroush
Publication venue
Publication date: 01/05/2016
Field of study

Twitter has become one of the main sources of news for many people. As real-world events and emergencies unfold, Twitter is abuzz with hundreds of thousands of stories about the events. Some of these stories are harmless, while others could potentially be life-saving or sources of malicious rumors. Thus, it is critically important to be able to efficiently track stories that spread on Twitter during these events. In this paper, we present a novel semi-automatic tool that enables users to efficiently identify and track stories about real-world events on Twitter. We ran a user study with 25 participants, demonstrating that compared to more conventional methods, our tool can increase the speed and the accuracy with which users can track stories about real-world events.Comment: ICWSM'16, May 17-20, Cologne, Germany. In Proceedings of the 10th International AAAI Conference on Weblogs and Social Media (ICWSM 2016). Cologne, German

arXiv.org e-Print Archive

DSpace@MIT

Tweet Acts: A Speech Act Classifier for Twitter

Author: Roy Deb
Vosoughi Soroush
Publication venue
Publication date: 31/03/2016
Field of study

Speech acts are a way to conceptualize speech as action. This holds true for communication on any platform, including social media platforms such as Twitter. In this paper, we explored speech act recognition on Twitter by treating it as a multi-class classification problem. We created a taxonomy of six speech acts for Twitter and proposed a set of semantic and syntactic features. We trained and tested a logistic regression classifier using a data set of manually labelled tweets. Our method achieved a state-of-the-art performance with an average F1 score of more than

0.70

. We also explored classifiers with three different granularities (Twitter-wide, type-specific and topic-specific) in order to find the right balance between generalization and overfitting for our task.Comment: ICWSM'16, May 17-20, Cologne, Germany. In Proceedings of the 10th AAAI Conference on Weblogs and Social Media (ICWSM 2016). Cologne, German

arXiv.org e-Print Archive

DSpace@MIT

Association for the Advancement of Artificial Intelligence: AAAI Publications

Automatic Detection and Categorization of Election-Related Tweets

Author: Roy Deb
Vijayaraghavan Prashanth
Vosoughi Soroush
Publication venue
Publication date: 01/05/2016
Field of study

With the rise in popularity of public social media and micro-blogging services, most notably Twitter, the people have found a venue to hear and be heard by their peers without an intermediary. As a consequence, and aided by the public nature of Twitter, political scientists now potentially have the means to analyse and understand the narratives that organically form, spread and decline among the public in a political campaign. However, the volume and diversity of the conversation on Twitter, combined with its noisy and idiosyncratic nature, make this a hard task. Thus, advanced data mining and language processing techniques are required to process and analyse the data. In this paper, we present and evaluate a technical framework, based on recent advances in deep neural networks, for identifying and analysing election-related conversation on Twitter on a continuous, longitudinal basis. Our models can detect election-related tweets with an F-score of 0.92 and can categorize these tweets into 22 topics with an F-score of 0.90.Comment: ICWSM'16, May 17-20, 2016, Cologne, Germany. In Proceedings of the 10th AAAI Conference on Weblogs and Social Media (ICWSM 2016). Cologne, German

arXiv.org e-Print Archive

DSpace@MIT

Digital Stylometry: Linking Profiles Across Social Networks

Author: Roy Deb
Vosoughi Soroush
Zhou Helen
Publication venue
Publication date: 01/01/2015
Field of study

There is an ever growing number of users with accounts on multiple social media and networking sites. Consequently, there is increasing interest in matching user accounts and profiles across different social networks in order to create aggregate profiles of users. In this paper, we present models for Digital Stylometry, which is a method for matching users through stylometry inspired techniques. We experimented with linguistic, temporal, and combined temporal-linguistic models for matching user accounts, using standard and novel techniques. Using publicly available data, our best model, a combined temporal-linguistic one, was able to correctly match the accounts of 31% of 5,612 distinct users across Twitter and Facebook.Comment: SocInfo'15, Beijing, China. In proceedings of the 7th International Conference on Social Informatics (SocInfo 2015). Beijing, Chin

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

Crossref

Big Green at WNUT 2020 Shared Task-1: Relation Extraction as Contextualized Sequence Classification

Author: Miller Chris
Vosoughi Soroush
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2020
Field of study

Relation and event extraction is an important task in natural language processing. We introduce a system which uses contextualized knowledge graph completion to classify relations and events between known entities in a noisy text environment. We report results which show that our system is able to effectively extract relations and events from a dataset of wet lab protocols.Comment: Proceedings of the 6th Workshop on Noisy User-generated Text (W-NUT) at EMNLP 202

arXiv.org e-Print Archive

Crossref

What Are People Asking About COVID-19? A Question Classification Dataset

Author: Huang Chengyu
Vosoughi Soroush
Wei Jason
Wei Jerry
Publication venue
Publication date: 15/09/2020
Field of study

We present COVID-Q, a set of 1,690 questions about COVID-19 from 13 sources, which we annotate into 15 question categories and 207 question clusters. The most common questions in our dataset asked about transmission, prevention, and societal effects of COVID, and we found that many questions that appeared in multiple sources were not answered by any FAQ websites of reputable organizations such as the CDC and FDA. We post our dataset publicly at https://github.com/JerryWei03/COVID-Q. For classifying questions into 15 categories, a BERT baseline scored 58.1% accuracy when trained on 20 examples per category, and for a question clustering task, a BERT + triplet loss baseline achieved 49.5% accuracy. We hope COVID-Q can help either for direct use in developing applied systems or as a domain-specific resource for model evaluation.Comment: Published in Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 202

arXiv.org e-Print Archive

Interactions of caregiver speech and early word learning in the Speechome corpus : computational explorations

Author: Vosoughi Soroush
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2010
Field of study

Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2010.Cataloged from PDF version of thesis.Includes bibliographical references (p. 107-110).How do characteristics of caregiver speech contribute to a child's early word learning? What is the relationship between a child's language development and caregivers' speech? Motivated by these general questions, this thesis comprises a series of computational studies on the fined-grained interactions of caregiver speech and one child's early linguistic development, using the naturalistic, high-density longitudinal corpus collected for the Human Speechome Project. The child's first productive use of a word was observed at about 11 months, totaling 517 words by his second birthday. Why did he learn those 517 words at the precise ages that he did? To address this specific question, we examined the relationship of the child's vocabulary growth to prosodic and distributional features of the naturally occurring caregiver speech to which the child was exposed. We measured fundamental frequency, intensity, phoneme duration, word usage frequency, word recurrence and mean length of utterances (MLU) for over one million words of caregivers' speech. We found significant correlations between all 6 variables and the child's age of acquisition (AoA) for individual words, with the best linear combination of these variables producing a correlation of r = -. 55(p < .001). We then used these variables to obtain a model of word acquisition as a function of caregiver input speech. This model was able to accurately predict the AoA of individual words within 55 days of their true AoA. We next looked at the temporal relationships between caregivers' speech and the child's lexical development. This was done by generating time-series for each variables for each caregiver, for each word. These time-series were then time-aligned by AoA. This analysis allowed us to see whether there is a consistent change in caregiver behavior for each of the six variables before and after the AoA of individual words. The six variables in caregiver speech all showed significant temporal relationships with the child's lexical development, suggesting that caregivers tune the prosodic and distributional characteristics of their speech to the linguistic ability of the child. This tuning behavior involves the caregivers progressively shortening their utterance lengths, becoming more redundant and exaggerating prosody more when uttering particular words as the child gets closer to the AoA of those words and reversing this trend as the child moves beyond the AoA. This "tuning" behavior was remarkably consistent across caregivers and variables, all following a very similar pattern. We found significant correlations between the patterns of change in caregiver behavior for each of the 6 variables and the AoA for individual words, with their best linear combination producing a correlation of r = -. 91(p < .001). Though the underlying cause of this strong correlation will require further study, it provides evidence of a new kind for fine-grained adaptive behavior by the caregivers in the context of child language development.by Soroush Vosoughi.S.M

DSpace@MIT