4 research outputs found
Event Detection from Social Media Stream: Methods, Datasets and Opportunities
Social media streams contain a large and diverse amount of information, ranging
from daily-life stories to the latest global and local events and news.
Twitter, especially, allows a fast spread of events happening in real time, and
enables individuals and organizations to stay informed of the events happening
now. Event detection from social media data poses different challenges from
event detection in traditional text and is a research area that has attracted
much attention in recent years. In this paper, we survey a wide range of event
detection methods for Twitter data streams, helping readers understand the recent
developments in this area. We present the datasets available to the public. Furthermore, a few
research opportunities
Comment: 8 pages
Summarize Dates First: A Paradigm Shift in Timeline Summarization
Timeline summarization aims at presenting long news stories in a compact manner. State-of-the-art approaches first select the most relevant dates from the original event timeline, then produce per-date news summaries. Date selection is driven by either per-date news content or date-level references. When coping with complex event data, characterized by inherent news flow redundancy, this pipeline may encounter significant issues in both date selection and summarization, due to a limited use of news content in date selection and no use of high-level temporal references (e.g., the past month). This paper proposes a paradigm shift in timeline summarization aimed at overcoming the above issues. It presents a new approach, namely Summarize Dates First, which first generates date-level summaries and then selects the most relevant dates on top of the summarized knowledge. In the latter stage, it performs date aggregations to consider high-level temporal references as well. The proposed pipeline also supports frequent incremental timeline updates more efficiently than previous approaches. We tested our unsupervised approach both on existing benchmark datasets and on a newly proposed benchmark dataset describing the COVID-19 news timeline. The achieved results were superior to state-of-the-art unsupervised methods and competitive against supervised ones.
Grey theory based BP-NN co-training for dense sequence long-term tendency prediction
The file attached to this record is the author's final peer reviewed version.
Purpose - The purpose of this paper is to address the problems of topic popularity
prediction in online social networks and to propose a fine-grained, long-term
prediction model for settings that lack sufficient data.
Design/methodology/approach - Based on GM(1,1) and neural networks, a co-training model for topic tendency prediction is proposed in this paper. The
interpolation based on GM(1,1) is employed to generate fine-grained prediction
values of topic popularity time series and two neural network models are considered
to achieve convergence by transmitting training parameters via their loss functions.
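The grey model GM(1,1) named above can be illustrated with a minimal sketch. This is the textbook GM(1,1) fit, not the paper's co-training model: the function names `fit_gm11` and `gm11_value` are hypothetical, and evaluating the time response at fractional time steps is only one assumed way to obtain fine-grained interpolated values.

```python
import math


def fit_gm11(x0):
    """Fit a textbook GM(1,1) model to a positive series; return (a, b)."""
    if len(x0) < 4:
        raise ValueError("GM(1,1) is usually fitted on at least 4 points")
    # First-order accumulated generating operation (1-AGO).
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Background values: means of consecutive accumulated points.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]
    y = x0[1:]
    # Least squares for the grey equation x0[k] = -a * z[k] + b.
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    intercept = (sy - slope * sz) / m
    return -slope, intercept


def gm11_value(x0_first, a, b, t):
    """Restored series value at time t >= 1 (t may be fractional).

    Assumes a != 0; fractional t is an illustrative interpolation choice.
    """
    def x1_hat(u):
        # Time response of the accumulated series.
        return (x0_first - b / a) * math.exp(-a * u) + b / a

    return x1_hat(t) - x1_hat(t - 1)


series = [2.0, 2.2, 2.42, 2.662, 2.9282]  # made-up popularity values
a, b = fit_gm11(series)
next_step = gm11_value(series[0], a, b, 5)    # one step beyond the data
in_between = gm11_value(series[0], a, b, 2.5)  # between observed points
```

Here `a` is the development coefficient and `b` the grey input; a negative `a` corresponds to a growing series.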
Findings - The experimental results indicate that the integrated model can effectively
predict dense sequences with higher performance than other algorithms, such as NN
and RBF_LSSVM. Furthermore, the Markov chain state transition probability matrix
model is used to improve the prediction results.
Practical implications - Beyond fine-grained and long-term topic popularity prediction,
further improvement could be made by predicting values at any interpolation point
within the time interval between popularity data points.
Originality/value - The paper succeeds in constructing a co-training model with
GM(1,1) and neural networks. A Markov chain state transition probability matrix is
deployed for further improvement of popularity tendency prediction.
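As a rough illustration of the Markov chain state transition probability matrix mentioned in this abstract, the sketch below estimates such a matrix from a sequence of discrete states. The abstract does not say how states are defined; treating them as discretized prediction-error bands (as is common in grey-Markov models) is an assumption, and `transition_matrix` is a hypothetical helper name.

```python
def transition_matrix(states, n_states):
    """Estimate row-stochastic transition probabilities from a state sequence.

    `states` holds integer state labels in range(n_states), for example
    prediction-error bands (an assumed encoding, not taken from the paper).
    """
    counts = [[0] * n_states for _ in range(n_states)]
    # Count each observed transition from one step to the next.
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States that are never left keep an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Made-up state sequence over three error bands (0, 1, 2).
P = transition_matrix([0, 1, 1, 2, 0, 1], 3)
```

Row `i` of `P` then gives the estimated probability of moving from state `i` to each other state, which a correction step could use to adjust the next predicted value.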
Understanding the topics and opinions from social media content
Social media has become an indispensable part of people’s daily life, as it records and reflects people’s opinions and events of interest, as well as influences people’s perceptions. As the most commonly employed and easily accessed data format on social media, a great deal of the social media textual content is not only factual and objective, but also rich in opinionated information. Thus, besides the topics Internet users are talking about in social media textual content, it is also of great importance to understand the opinions they are expressing. In this thesis, I present my broadly applicable text mining approaches for understanding the topics and opinions of user-generated texts on social media, in order to provide insights into the thoughts of Internet users on entities, events, etc. Specifically, I develop approaches to understand the semantic differences between language-specific editions of Wikipedia when discussing certain entities, from the perspectives of related topical aspects and aggregated sentiment bias. Moreover, I employ effective features to detect the reputation-influential sentences for person and company entities in Wikipedia articles, which lead to the detected sentiment bias. Furthermore, I propose neural network models with different levels of attention mechanisms to detect the stances of tweets towards any given target. I also introduce an online timeline generation approach to detect and summarise the relevant sub-topics in the tweet stream, in order to provide Internet users with some insights into the evolution of major events they are interested in.