4 research outputs found

    Event Detection from Social Media Stream: Methods, Datasets and Opportunities

    Social media streams contain a large and diverse amount of information, ranging from daily-life stories to the latest global and local events and news. Twitter, especially, allows events to spread quickly in real time, and enables individuals and organizations to stay informed of events as they happen. Event detection from social media data poses different challenges from traditional text and is a research area that has attracted much attention in recent years. In this paper, we survey a wide range of event detection methods for Twitter data streams, helping readers understand recent developments in this area. We present the datasets available to the public. Furthermore, a few research opportunities
    Comment: 8 pages

    Summarize Dates First: A Paradigm Shift in Timeline Summarization

    Timeline summarization aims at presenting long news stories in a compact manner. State-of-the-art approaches first select the most relevant dates from the original event timeline, then produce per-date news summaries. Date selection is driven by either per-date news content or date-level references. When coping with complex event data, characterized by inherent news flow redundancy, this pipeline may encounter issues in both date selection and summarization, due to a limited use of news content in date selection and no use of high-level temporal references (e.g., the past month). This paper proposes a paradigm shift in timeline summarization aimed at overcoming the above issues. It presents a new approach, namely Summarize Date First, which focuses on first generating date-level summaries, then selecting the most relevant dates on top of the summarized knowledge. In the latter stage, it performs date aggregations to consider high-level temporal references as well. The proposed pipeline also supports frequent incremental timeline updates more efficiently than previous approaches. We tested our unsupervised approach both on existing benchmark datasets and on a newly proposed benchmark dataset describing the COVID-19 news timeline. The achieved results were superior to state-of-the-art unsupervised methods and competitive against supervised ones.

    Grey theory based BP-NN co-training for dense sequence long-term tendency prediction

    The file attached to this record is the author's final peer-reviewed version. Purpose - The purpose of this paper is to address problems in topic popularity prediction in online social networks and to propose a fine-grained, long-term prediction model for settings that lack sufficient data. Design/methodology/approach - Based on GM(1,1) and neural networks, a co-training model for topic tendency prediction is proposed in this paper. Interpolation based on GM(1,1) is employed to generate fine-grained prediction values of topic popularity time series, and two neural network models are considered to achieve convergence by transmitting training parameters via their loss functions. Findings - The experimental results indicate that the integrated model can effectively predict dense sequences with higher performance than other algorithms, such as NN and RBF_LSSVM. Furthermore, the Markov chain state transition probability matrix model is used to improve the prediction results. Practical implications - For fine-grained and long-term topic popularity prediction, further improvement could be made by predicting any interpolation in the time interval of popularity data points. Originality/value - The paper succeeds in constructing a co-training model with GM(1,1) and neural networks. A Markov chain state transition probability matrix is deployed for further improvement of popularity tendency prediction.
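For readers unfamiliar with GM(1,1), the grey model named in the abstract above, the following is a minimal sketch of the textbook formulation only. It is not the paper's co-training model, and the function names `gm11_fit` and `gm11_forecast` are hypothetical. It assumes a short, positive series, accumulates it, fits the two parameters `a` and `b` by ordinary least squares, and extrapolates with the resulting exponential response.

```python
import math


def gm11_fit(x0):
    """Estimate the GM(1,1) parameters (a, b) from a positive series x0."""
    if len(x0) < 4:
        raise ValueError("GM(1,1) is usually fitted on at least 4 points")
    # Accumulated generating operation (1-AGO): running sums of x0.
    x1 = []
    total = 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Background values: means of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]
    y = list(x0[1:])
    # Least squares for y = -a * z + b (simple linear regression of y on z).
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    intercept = (sy - slope * sz) / m
    return -slope, intercept


def gm11_forecast(x0, steps):
    """Forecast the next `steps` values of x0 with a fitted GM(1,1) model."""
    a, b = gm11_fit(x0)
    n = len(x0)
    if abs(a) < 1e-12:
        # Degenerate case: no growth or decay, so the model predicts b.
        return [b] * steps

    def x1_hat(k):
        # Time response of the accumulated series at index k (k = 0, 1, ...).
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    # Differencing the accumulated predictions restores the original scale.
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]


if __name__ == "__main__":
    series = [10.0, 11.0, 12.1, 13.31, 14.641]  # illustrative data only
    print(gm11_forecast(series, 2))
```

A negative `a` corresponds to a growing series and a positive `a` to a decaying one. The sample series is invented for illustration and does not come from the paper.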

    Understanding the topics and opinions from social media content

    Social media has become an indispensable part of people’s daily life, as it records and reflects people’s opinions and events of interest, as well as influences people’s perceptions. As the most commonly employed and easily accessed data format on social media, a great deal of the social media textual content is not only factual and objective, but also rich in opinionated information. Thus, besides the topics Internet users are talking about in social media textual content, it is also of great importance to understand the opinions they are expressing. In this thesis, I present my broadly applicable text mining approaches, in order to understand the topics and opinions of user-generated texts on social media, and to provide insights about the thoughts of Internet users on entities, events, etc. Specifically, I develop approaches to understand the semantic differences between language-specific editions of Wikipedia when discussing certain entities, from the perspective of related topical aspects and the perspective of aggregated sentiment bias. Moreover, I employ effective features to detect the reputation-influential sentences for person and company entities in Wikipedia articles, which lead to the detected sentiment bias. Furthermore, I propose neural network models with different levels of attention mechanism, to detect the stances of tweets towards any given target. I also introduce an online timeline generation approach, to detect and summarise the relevant sub-topics in the tweet stream, in order to provide Internet users with some insights about the evolution of major events they are interested in.