An Effective Approach for Modelling Time Features for Classifying Bursty Topics on Twitter
Several previous approaches attempted to predict bursty topics on Twitter. Such approaches have usually reported that the time information (e.g. the topic popularity over time) of hashtag topics contributes the most to the prediction of bursty topics. In this paper, we propose a novel approach that uses time features to predict bursty topics on Twitter. We model the popularity of topics as density curves described by the density function of a beta distribution with different parameters. We then propose various approaches to predict/classify bursty topics by estimating the parameters of topics, using estimators such as Gradient Descent or Likelihood Maximization. In our experiments, we show that the estimated parameters of topics have a positive effect on classifying bursty topics. In particular, our estimators, when combined, improve bursty topic classification by 6.9 in terms of micro F1 compared with a baseline classifier using hashtag content features.
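The parameter-estimation step can be sketched as follows. This is a minimal illustration, not the paper's code: moment matching stands in for the Gradient Descent and Likelihood Maximization estimators, and the hashtag timestamps are simulated.

```python
import numpy as np

def fit_beta_moments(samples):
    """Estimate beta distribution parameters (alpha, beta) by moment
    matching: a lightweight stand-in for gradient-descent or
    likelihood-maximization estimators."""
    m, v = samples.mean(), samples.var()
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common

# Simulated, normalised timestamps of tweets for one hashtag topic: an
# early burst corresponds to a left-skewed density (alpha < beta).
rng = np.random.default_rng(0)
timestamps = rng.beta(2.0, 5.0, size=10_000)

alpha, beta = fit_beta_moments(timestamps)
# The recovered (alpha, beta) pair can then feed a bursty/non-bursty classifier.
```

With 10,000 samples the estimates land close to the generating parameters (2.0, 5.0); how sharply the fitted density peaks early in the topic's lifetime is what signals burstiness.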
Extracting News Events from Microblogs
The Twitter stream has become a large source of information for many people, but
the magnitude of tweets and the noisy nature of its content have made
harvesting the knowledge from Twitter a challenging task for researchers for a
long time. Aiming at overcoming some of the main challenges of extracting the
hidden information from tweet streams, this work proposes a new approach for
real-time detection of news events from the Twitter stream. We divide our
approach into three steps. The first step is to use a neural network or deep
learning to detect news-relevant tweets from the stream. The second step is to
apply a novel streaming data clustering algorithm to the detected news tweets
to form news events. The third and final step is to rank the detected events
based on the size of the event clusters and growth speed of the tweet
frequencies. We evaluate the proposed system on a large, publicly available
corpus of annotated news events from Twitter. As part of the evaluation, we
compare our approach with a related state-of-the-art solution. Overall, our
experiments and user-based evaluation show that our approach to detecting
current (real) news events delivers state-of-the-art performance.
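A toy sketch of the three-step pipeline described above. All names are illustrative: the news-relevance classifier (a neural network in the paper) is stubbed with a keyword filter, and Jaccard overlap stands in for the paper's streaming clustering algorithm.

```python
# Step 1: news-relevance filter (stub for the paper's neural classifier).
NEWS_WORDS = {"earthquake", "election", "crash"}

def is_news(tweet):
    return bool(NEWS_WORDS & set(tweet["text"].lower().split()))

# Step 2: greedy streaming clustering of news tweets into events.
def cluster(tweets, threshold=0.25):
    clusters = []
    for tw in tweets:
        words = set(tw["text"].lower().split())
        best, best_sim = None, threshold
        for c in clusters:
            sim = len(words & c["words"]) / len(words | c["words"])  # Jaccard
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"words": set(words), "tweets": [tw]})
        else:
            best["words"] |= words
            best["tweets"].append(tw)
    return clusters

# Step 3: rank events by cluster size and tweet-frequency growth speed.
def rank(clusters):
    def score(c):
        times = [tw["t"] for tw in c["tweets"]]
        span = max(times) - min(times) or 1
        return len(c["tweets"]) * (len(c["tweets"]) / span)
    return sorted(clusters, key=score, reverse=True)

stream = [
    {"t": 0, "text": "major earthquake hits the coast"},
    {"t": 1, "text": "earthquake coast rescue underway"},
    {"t": 2, "text": "lunch was great today"},
    {"t": 9, "text": "election results announced"},
]
events = rank(cluster(filter(is_news, stream)))
```

The non-news tweet is dropped in step 1, the two earthquake tweets merge into one event in step 2, and that fast-growing event ranks first in step 3.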
What’s Happening Around the World? A Survey and Framework on Event Detection Techniques on Twitter
© 2019, Springer Nature B.V. In the last few years, Twitter has become a popular platform for sharing opinions, experiences, news, and views in real-time. Twitter presents an interesting opportunity for detecting events happening around the world. The content (tweets) published on Twitter is short and poses diverse challenges for detecting and interpreting event-related information. This article provides insights into ongoing research and helps in understanding recent research trends and techniques used for event detection using Twitter data. We classify techniques and methodologies according to event types, orientation of content, event detection tasks, their evaluation, and common practices. We highlight the limitations of existing techniques and accordingly propose solutions to address the shortcomings. We propose a framework called EDoT based on the research trends, common practices, and techniques used for detecting events on Twitter. EDoT can serve as a guideline for developing event detection methods, especially for researchers who are new to this area. We also describe and compare data collection techniques, the effectiveness and shortcomings of various Twitter and non-Twitter-based features, and discuss various evaluation measures and benchmarking methodologies. Finally, we discuss the trends, limitations, and future directions for detecting events on Twitter.
Event Detection and Tracking: Detection of Dangerous Events on Social Media
Online social media platforms have become essential tools for communication and information exchange in our lives. They are used for connecting with people and sharing information. This phenomenon has been studied intensively over the past decade to investigate users' sentiments in different scenarios and for different purposes. As the technology advanced and its popularity increased, different terms came to be used for similar topics, which often results in confusion. We study these trends and propose a uniform solution that deals with the subject clearly. We gather these ambiguous terms under the umbrella of the most recent and popular terms to reach a concise verdict. Many recent works address only specific types and domains of events. To keep things simple and practical, events that are extreme, negative, and dangerous are grouped under the name Dangerous Events (DE). These dangerous events are further divided into three main categories, action-based, scenario-based, and sentiment-based dangerous events, to specify their characteristics. We then propose deep-learning-based models to detect events that are dangerous in nature. The deep-learning models, which include BERT, RoBERTa, and XLNet, provide valuable results that can effectively help solve the issue of detecting dangerous events along various dimensions. Although the models perform well, their main constraint, the scarcity of available event datasets and the lower quality of some event data, affects their performance and can be tackled by addressing these issues accordingly.

Online social media platforms have become essential tools in our lives for communication, for connecting with others, and for exchanging information. This phenomenon has been studied intensively over the past decade to investigate users' sentiments in different scenarios and for various purposes. However, the use of social media has become more complex and a broader phenomenon owing to the involvement of multiple actors such as companies, groups, and other organisations. As technology advanced and popularity increased, the use of different terms for similar topics generated confusion. In other words, models are trained on information tied to specific terms and scopes. Standardisation is therefore imperative. The goal of this work is to unify the different terms in use under broader, standardised ones. Danger can be a threat such as social violence, natural disasters, intellectual or community harm, contagion, social unrest, economic loss, or simply the spread of hateful and violent ideologies. We study these different events and classify them into topics so that a topic-based detection technique can be designed and integrated under the term Dangerous Events (DE). Accordingly, we define the proposed term "Dangerous Events" and divide it into three main categories in order to specify their characteristics, these being named Dangerous Events, higher-level Dangerous Events, and lower-level Dangerous Events. The MAVEN dataset was used to obtain datasets for the experiment. These datasets were filtered manually by event type to separate dangerous events from general events. The transformer models BERT, RoBERTa, and XLNet were used to classify text data into the corresponding Dangerous Events category. The results showed that BERT outperforms the other models and can be used effectively for the Dangerous Events detection task. Notably, the dataset-splitting approach significantly increased the models' performance.
Several methods have been proposed for event detection. Event detection (ED) methods are mostly classified as supervised or unsupervised. Supervised methods include support vector machines (SVM), conditional random fields (CRF), decision trees (DT), and Naive Bayes (NB), among others, while the unsupervised category includes query-based, statistical-based, probabilistic-based, clustering-based, and graph-based methods. Two approaches are in use for event detection, known as document-pivot and feature-pivot. The difference between them lies mostly in the clustering approach, in how documents are used to build feature vectors, and in the similarity metric used to decide whether two documents refer to the same event. Beyond event detection, event forecasting is an important but complicated problem spanning several dimensions. Many of these events are hard to predict before they become visible and occur. For example, natural disasters cannot be anticipated; they are only detectable after they happen. There is a limited number of resources in terms of event datasets; ACE 2005, MAVEN, and EVIN are some examples of datasets available for event detection.
Recent work has shown that Transformer-based pre-trained models (PTMs) can achieve state-of-the-art performance on various NLP tasks. These models are pre-trained on large amounts of text. They learn embeddings, or vector representations, for the words of a language, such that related words cluster together in the vector space. A total of three different transformers, namely BERT, RoBERTa, and XLNet, are used to conduct the experiment and draw conclusions by comparing these models.
The Transformer-based models are fine-tuned using a 70/30 split of the datasets for training and testing/validation. Hyperparameter tuning comprises 10 epochs, a batch size of 16, and the AdamW optimizer with a learning rate of 2e-5 for BERT and RoBERTa and 3e-5 for XLNet. For dangerous events, BERT achieves 60% overall accuracy, RoBERTa 59%, and XLNet only 54%. For the higher-level event experiments, BERT and XLNet achieve 71% and 70%, with RoBERTa ahead of the other models at 74% accuracy. For action-based, scenario-based, and sentiment-based DE, BERT achieves 62%, 85%, and 81% accuracy respectively; RoBERTa 61%, 83%, and 71%; and XLNet 52%, 81%, and 77%.
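A minimal sketch of the fine-tuning setup above. Only the hyperparameter values come from the text; the surrounding structure and all identifiers are illustrative, and the actual transformer training stack is omitted to keep the sketch self-contained.

```python
import math

# Hyperparameters as reported in the text; the dict layout is illustrative.
BASE = {"epochs": 10, "batch_size": 16, "optimizer": "AdamW"}
LEARNING_RATES = {"bert": 2e-5, "roberta": 2e-5, "xlnet": 3e-5}

def make_config(model_name):
    """Assemble the per-model fine-tuning configuration."""
    return {**BASE, "model": model_name, "lr": LEARNING_RATES[model_name]}

def train_test_split_sizes(n_examples, train_frac=0.70):
    """70/30 split of a dataset into training and test/validation parts."""
    n_train = math.floor(n_examples * train_frac)
    return n_train, n_examples - n_train

config = make_config("xlnet")
n_train, n_test = train_test_split_sizes(1000)
```

In practice each config would be passed to a fine-tuning loop (e.g. the Hugging Face transformers stack), which is deliberately left out here.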
There is a need to clarify the ambiguity between different works that address similar problems using different terms. The proposed idea of referring to specific occurrences as dangerous events makes the problem at hand easier to approach. However, the scarcity of event datasets limits model performance and progress on detection tasks. The availability of a larger amount of information related to dangerous events could improve the performance of the existing model. It is evident that the use of deep-learning models such as BERT, RoBERTa, and XLNet can help detect and classify dangerous events efficiently. Overall, BERT outperforms RoBERTa and XLNet at detecting dangerous events. It is equally important to track events after they are detected. Therefore, for future work, we propose implementing techniques that handle space and time in order to monitor how events emerge over time.
Verifying baselines for crisis event information classification on Twitter
Social media are rich information sources during and in the aftermath of crisis events such as earthquakes and terrorist attacks. Despite myriad challenges, with the right tools, significant insight can be gained which can assist emergency responders and related applications. However, most extant approaches are incomparable, using bespoke definitions, models, datasets and even evaluation metrics. Furthermore, it is rare that code, trained models, or exhaustive parametrisation details are made openly available. Thus, even confirmation of self-reported performance is problematic; authoritatively determining the state of the art (SOTA) is essentially impossible. Consequently, to begin addressing such endemic ambiguity, this paper seeks to make three contributions: 1) the replication and results confirmation of a leading (and generalisable) technique; 2) testing straightforward modifications of the technique likely to improve performance; and 3) the extension of the technique to a novel and complementary type of crisis-relevant information to demonstrate its generalisability.
Mining Social Media for Newsgathering: A Review
Social media is becoming an increasingly important data source for learning
about breaking news and for following the latest developments of ongoing news.
This is in part possible thanks to the existence of mobile devices, which
allow anyone with access to the Internet to post updates from anywhere,
leading in turn to a growing presence of citizen journalism. Consequently,
social media has become a go-to resource for journalists during the process of
newsgathering. Use of social media for newsgathering is however challenging,
and suitable tools are needed in order to facilitate access to useful
information for reporting. In this paper, we provide an overview of research in
data mining and natural language processing for mining social media for
newsgathering. We discuss five different areas that researchers have worked on
to mitigate the challenges inherent to social media newsgathering: news
discovery, curation of news, validation and verification of content,
newsgathering dashboards, and other tasks. We outline the progress made so far
in the field, summarise the current challenges as well as discuss future
directions in the use of computational journalism to assist with social media
newsgathering. This review is relevant to computer scientists researching news
in social media as well as for interdisciplinary researchers interested in the
intersection of computer science and journalism. Comment: Accepted for publication in Online Social Networks and Media.
Stretching the life of Twitter classifiers with time-stamped semantic graphs
Social media has become an effective channel for communicating both trends and public opinion on current events. However, the automatic topic classification of social media content poses various challenges. Topic classification is a common technique used for automatically capturing themes that emerge from social media streams. However, such techniques are sensitive to the evolution of topics when new event-dependent vocabularies start to emerge (e.g., Crimea becoming relevant to War Conflict during the Ukraine crisis in 2014). Therefore, traditional supervised classification methods which rely on labelled data can rapidly become outdated. In this paper we propose a novel transfer learning approach to address the classification of new data when the only available labelled data belong to a previous epoch. This approach relies on the incorporation of knowledge from DBpedia graphs. Our findings show promising results in understanding how features age, and how semantic features can support the evolution of topic classifiers.
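The core idea, lifting surface terms to semantic concepts so that a classifier trained in an earlier epoch still fires on new vocabulary, can be sketched as follows. The concept map here is hypothetical; the paper derives such features from DBpedia graphs.

```python
# Hypothetical term-to-concept map; in the paper these mappings come
# from time-stamped DBpedia semantic graphs.
CONCEPTS = {
    "crimea": "dbo:MilitaryConflict",
    "donbas": "dbo:MilitaryConflict",
    "battle": "dbo:MilitaryConflict",
    "goal":   "dbo:SportsEvent",
}

def semantic_features(tokens):
    """Augment bag-of-words features with concept-level features."""
    feats = set(tokens)
    feats |= {CONCEPTS[t] for t in tokens if t in CONCEPTS}
    return feats

# A classifier trained on an earlier epoch may never have seen "crimea",
# but if it learned the dbo:MilitaryConflict concept feature it can still
# classify a later tweet correctly.
old_epoch = semantic_features(["battle", "casualties"])
new_epoch = semantic_features(["crimea", "crisis"])
shared = old_epoch & new_epoch
```

The shared concept feature is what "stretches the life" of the classifier: surface vocabularies diverge across epochs while the semantic features persist.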
Exploiting Language Models to Classify Events from Twitter
Classifying events on Twitter is challenging because tweet texts contain a large amount of temporal data with a lot of noise and a wide variety of topics. In this paper, we propose a method to classify events from Twitter. We first find the distinguishing terms between tweets in events and measure their similarities with learned language models such as ConceptNet and a latent Dirichlet allocation method for selectional preferences (LDA-SP), which have been widely studied on large text corpora in computational linguistics. The relationships between the terms in tweets are discovered by checking them under each model. We then propose a method to compute the similarity between tweets based on the tweets' features, including their common terms and the relationships among their distinguishing terms, making it explicit and convenient to apply k-nearest-neighbour techniques for classification. We carefully ran experiments on the Edinburgh Twitter Corpus to show that our method achieves competitive results for classifying events.
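The kNN classification step can be sketched roughly as follows. This toy version keeps only the common-term component of the similarity; the paper additionally uses relationships learned via ConceptNet and LDA-SP, which are omitted here.

```python
from collections import Counter

def similarity(tweet_a, tweet_b):
    """Similarity as Jaccard overlap of distinguishing terms (common-term
    component only; the paper also scores term relationships)."""
    a, b = set(tweet_a), set(tweet_b)
    return len(a & b) / len(a | b)

def knn_classify(tweet, labelled, k=3):
    """Label a tweet by majority vote over its k most similar neighbours."""
    ranked = sorted(labelled, key=lambda tl: similarity(tweet, tl[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Illustrative labelled tweets, each reduced to its distinguishing terms.
labelled = [
    (["earthquake", "magnitude", "coast"], "disaster"),
    (["earthquake", "rescue", "teams"],    "disaster"),
    (["final", "score", "goal"],           "sports"),
]
label = knn_classify(["earthquake", "coast", "damage"], labelled, k=3)
```

With k=3 the query tweet draws two "disaster" votes against one "sports" vote, so it is labelled as a disaster event.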