
    Automatic summarization of real world events using Twitter

    Microblogging sites such as Twitter have become increasingly popular in recent years for reporting details of real-world events via the Web. Smartphone apps enable people to communicate with a global audience, to express their opinions, and to comment on ongoing situations, often while geographically close to the event. Because the data are heterogeneous and large in scale, and because some messages are more salient than others for understanding any risk to human safety and managing any disruption caused by events, automatic summarization of event-related microblogs is an important and non-trivial problem. In this paper we tackle the task of automatically summarizing Twitter posts and present three methods that produce summaries by selecting the most representative posts from real-world tweet-event clusters. To evaluate our approaches, we compare them with state-of-the-art summarization systems and with human-generated summaries. Our results show that our proposed methods outperform all the other summarization systems on both English and non-English corpora.
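
    The abstract does not say how the "most representative" post in a cluster is chosen. As a rough illustration only, one common heuristic is to pick the tweet whose term-frequency vector is closest, by cosine similarity, to the cluster centroid. The sketch below assumes that heuristic; `most_representative` and the sample tweets are hypothetical and are not taken from the paper or any of its three methods.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits (hashtags lose their '#').
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_representative(cluster: list[str]) -> str:
    # Term-frequency vector for every tweet in the (hypothetical) event cluster.
    vectors = [Counter(tokenize(tweet)) for tweet in cluster]
    # Centroid as summed counts; scaling does not change cosine similarity.
    centroid = Counter()
    for vec in vectors:
        centroid.update(vec)
    # Return the tweet closest to the centroid (the first one wins on ties).
    best = max(range(len(cluster)), key=lambda i: cosine(vectors[i], centroid))
    return cluster[best]


if __name__ == "__main__":
    tweets = [
        "Flooding closes the main bridge downtown",
        "Main bridge downtown closed due to flooding, avoid the area",
        "lol my cat is asleep",
    ]
    # prints "Main bridge downtown closed due to flooding, avoid the area"
    print(most_representative(tweets))
```

    A centroid heuristic favors tweets that share many terms with the rest of the cluster, which tends to push off-topic noise (like the third tweet) to the bottom of the ranking.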

    Topic Discovery on Farsi, English, French, and Arabic Tweets Related to COVID-19 Using Text Mining Techniques

    Background: Social networks are a good source for monitoring public health during the COVID-19 outbreak, and they play an important role in identifying useful information. Objectives: This study compares the public's reaction on Twitter across the countries of West Asia (a.k.a. the Middle East) and North Africa in order to understand how they responded to the same global threat. Methods: 766,630 tweets in four languages (Arabic, English, French, and Farsi), posted in March 2020, were investigated. Results: The only theme common to all four languages is 'government responsibilities (political)', which indicates the importance of this subject for all nations. Conclusion: Although nations react similarly in some respects, they respond differently in others; policy localization is therefore a vital step in confronting problems such as the COVID-19 pandemic. © 2021 The authors, AIT Austrian Institute of Technology and IOS Press

    Text annotation using textual semantic similarity and term-frequency (Twitter)

    Researchers studying social media understandably assert that its contribution to various sectors is massive. Business development managers today devote a huge amount of effort to strategizing efficient collaboration with both customers and other organizations through social media. Despite the visible impact of social media, much digitally shared information has yet to be revealed. Twitter has gradually become the main hub for many information systems (IS) researchers, because tweets are freely accessible to anyone in real time. Motivated by earlier studies in which IS researchers addressed big-data analysis and management using content analysis techniques, this paper proposes a novel approach to the unsupervised classification of tweets into different labels. It introduces an algorithm that combines semantic similarity between texts, term frequency, and a decision threshold to perform content analysis. The goal of this approach is to extract relevant features from a tweet, thereby reducing dimensionality and preparing training datasets for building classifiers.
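
    The abstract combines text similarity, term frequency, and a threshold but gives no formula. The following sketch assumes a simplified variant: each label is described by a hypothetical keyword profile, similarity is plain lexical cosine over term-frequency vectors (a stand-in for the paper's semantic similarity), and a tweet receives a label only if its best score reaches an assumed threshold of 0.2. `annotate` and the profiles are illustrative names, not the paper's.

```python
import math
import re
from collections import Counter
from typing import Optional


def tf_vector(text: str) -> Counter:
    # Raw term frequencies over lowercase alphabetic tokens.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Lexical cosine similarity (stand-in for a semantic similarity measure).
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def annotate(tweet: str, label_profiles: dict[str, str], threshold: float = 0.2) -> Optional[str]:
    vec = tf_vector(tweet)
    # Similarity of the tweet to each label's (hypothetical) keyword profile.
    scores = {label: cosine(vec, tf_vector(profile)) for label, profile in label_profiles.items()}
    best_label = max(scores, key=scores.get)
    # Leave the tweet unlabeled when even the best match is below the threshold.
    return best_label if scores[best_label] >= threshold else None


if __name__ == "__main__":
    profiles = {
        "sports": "match goal team score win",
        "politics": "election vote government minister policy",
    }
    print(annotate("Great goal, our team wins the match", profiles))  # sports
    print(annotate("I had pasta for lunch", profiles))  # None
```

    Tweets that fall below the threshold stay unlabeled rather than being forced into the nearest class, which keeps the generated training data cleaner at the cost of coverage.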

    Retaining Data from Streams of Social Platforms with Minimal Regret

    On Relevance Filtering for Real-Time Tweet Summarization

    Real-time tweet summarization (RTS) systems require mechanisms for capturing relevant tweets, identifying novel tweets, and capturing timely tweets. In this thesis, we tackle the RTS problem with a main focus on relevance filtering. We experimented with different traditional retrieval models. Additionally, we propose two extensions to alleviate the sparsity and topic-drift challenges that affect relevance filtering. For sparsity, we propose leveraging word embeddings in Vector Space Model (VSM) term weighting, enabling the system to use semantic similarity alongside lexical matching. To mitigate the effect of topic drift, we exploit explicit relevance feedback to enhance the profile representation so that it keeps pace with the profile's development in the stream over time. We conducted extensive experiments over three standard English TREC test collections built specifically for RTS. Although the extensions do not generally outperform the baselines, their performance is comparable. Moreover, we extended EveTAR, an Arabic tweet test collection for event detection, to support tasks that require novelty in the system's output. We collected novelty judgments using in-house annotators and used the collection to test our RTS system. We report preliminary results on EveTAR using different models of the RTS system. This work was made possible by NPRP grants NPRP 7-1313-1-245 and NPRP 7-1330-2-483 from the Qatar National Research Fund (a member of Qatar Foundation).
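
    Explicit relevance feedback on a profile is often formalized with the classic Rocchio update, which moves the profile vector toward tweets judged relevant and away from tweets judged non-relevant. The thesis abstract does not name Rocchio, so the sketch below is an assumed illustration; the weights `alpha`, `beta`, and `gamma` are commonly cited defaults, not values from the thesis, and `rocchio` is a hypothetical helper.

```python
from collections import Counter

Vector = dict[str, float]


def rocchio(
    profile: Vector,
    relevant: list[Vector],
    nonrelevant: list[Vector],
    alpha: float = 1.0,
    beta: float = 0.75,
    gamma: float = 0.15,
) -> Vector:
    """Return an updated profile vector (term -> weight)."""
    updated: Counter = Counter()
    # Keep a scaled copy of the original profile.
    for term, weight in profile.items():
        updated[term] += alpha * weight
    # Move toward the centroid of tweets judged relevant...
    for doc in relevant:
        for term, weight in doc.items():
            updated[term] += beta * weight / len(relevant)
    # ...and away from the centroid of tweets judged non-relevant.
    for doc in nonrelevant:
        for term, weight in doc.items():
            updated[term] -= gamma * weight / len(nonrelevant)
    # Drop terms whose weight is no longer positive.
    return {term: weight for term, weight in updated.items() if weight > 0}


if __name__ == "__main__":
    new_profile = rocchio(
        {"earthquake": 1.0},
        relevant=[{"earthquake": 1.0, "magnitude": 1.0}],
        nonrelevant=[{"earthquake": 1.0, "movie": 1.0}],
    )
    print(new_profile)
```

    Here the profile gains the term "magnitude" from the relevant tweet, while "movie" from the non-relevant tweet ends up with a negative weight and is dropped, which is one simple way a profile can follow topic drift.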

    An enhanced binary bat and Markov clustering algorithms to improve event detection for heterogeneous news text documents

    Event Detection (ED) identifies events in various types of data. Building an ED model for news text documents greatly helps decision-makers in various disciplines improve their strategies. However, identifying and summarizing events in such data is non-trivial because of the large volume of heterogeneous news text documents being published. These documents create a high-dimensional feature space that degrades the overall performance of the baseline methods in the ED model. To address this problem, this research presents an enhanced ED model with improved methods for its crucial phases: Feature Selection (FS), ED, and summarization. The work focuses on the FS problem, detecting events automatically through a novel wrapper FS method based on an Adapted Binary Bat Algorithm (ABBA) and an Adapted Markov Clustering Algorithm (AMCL), termed ABBA-AMCL. These adaptations were developed to overcome premature convergence in BBA and the fast convergence rate of MCL. Furthermore, this study proposes four summarization methods to generate informative summaries. The enhanced ED model was tested on 10 benchmark datasets and 2 Facebook news datasets. ABBA-AMCL was compared with 8 FS methods based on meta-heuristic algorithms and 6 graph-based ED methods. The empirical and statistical results show that ABBA-AMCL surpassed the other methods on most datasets. The key representative features demonstrate that ABBA-AMCL successfully detects real-world events in the Facebook news datasets, with a Precision of 0.96 and a Recall of 1 for dataset 11, and a Precision of 1 and a Recall of 0.76 for dataset 12. In conclusion, the novel ABBA-AMCL bridges the research gap and resolves the curse of dimensionality in the feature space of heterogeneous news text documents. Hence, the enhanced ED model can organize news documents into distinct events and provide policymakers with valuable information for decision making.
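
    ABBA-AMCL adapts the Markov Clustering (MCL) algorithm, but the adaptation itself is not described here. For orientation only, the sketch below implements the standard MCL loop (add self-loops, column-normalize, then alternate expansion and inflation until the matrix stops changing) on a small adjacency matrix. The inflation value and tolerances are assumed defaults, and none of the ABBA-AMCL modifications are included.

```python
Matrix = list[list[float]]


def normalize_columns(m: Matrix) -> Matrix:
    # Make every column sum to 1 (column-stochastic), in place.
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adj: Matrix, inflation: float = 2.0, max_iter: int = 100, tol: float = 1e-6) -> list[list[int]]:
    n = len(adj)
    # Add self-loops, then turn the adjacency matrix into a column-stochastic matrix.
    m = normalize_columns(
        [[float(adj[i][j]) + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iter):
        # Expansion: square the matrix (random walks of length 2).
        expanded = matmul(m, m)
        # Inflation: raise entries to a power and renormalize, strengthening strong flows.
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that keeps non-zero mass (an attractor) defines one cluster.
    clusters: list[list[int]] = []
    for i in range(n):
        members = [j for j in range(n) if m[i][j] > 1e-9]
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

    On two disconnected triangles (nodes 0 to 2 and 3 to 5), `mcl` returns one cluster per triangle. The inflation exponent is the main knob: larger values generally yield more, smaller clusters.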

    Stochastic Sampling and Machine Learning Techniques for Social Media State Production

    The rise in the importance of social media platforms as communication tools has been both a blessing and a curse. For scientists, they offer an unparalleled opportunity to study human social networks. However, these platforms have also been used to propagate misinformation and hate speech with alarming velocity and frequency. The overarching aim of our research is to leverage data from social media platforms to create and evaluate a high-fidelity, at-scale computational simulation of online social behavior that can provide a deep quantitative understanding of adversaries' use of the global information environment. We hope that this type of simulation can be used to predict and understand the spread of misinformation, false narratives, fraudulent financial pump-and-dump schemes, and cybersecurity threats. To this end, our research team has created an agent-based model that can handle a variety of prediction tasks. This dissertation introduces a set of sampling and deep learning techniques that we developed to predict specific aspects of the evolution of online social networks that have proven challenging to predict accurately with the agent-based model. First, we compare different strategies for predicting network evolution from sampled historical data based on community features. We demonstrate that our community-based model outperforms the global one at predicting population, user, and content activity, as well as network topology, across different datasets. Second, we introduce a deep learning model for burst prediction. Bursts may signal topics of growing real-world interest. Because bursts can be caused by exogenous phenomena and indicate burgeoning popularity, cross-platform social media data are valuable for predicting bursts within a single platform. We propose an LSTM model to capture temporal dependencies and associations in activity information. These volume predictions can also serve as a valuable input to our agent-based model. Finally, we explore Graph Convolutional Networks (GCNs) to investigate the value of weak ties in classifying academic literature. Our experiments examine whether treating weak ties as if they were strong ties improves performance. We also examine how node removal affects prediction accuracy, selecting the nodes to remove according to different centrality measures. These experiments provide insight into which nodes are most important for the performance of targeted GCNs. GCNs are important in the social network context because the sociological and anthropological concept of 'homophily' allows the method to use network associations to assist attribute prediction in a social network.
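
    The node-removal experiments rank nodes by a centrality measure before deleting them. A minimal sketch of that step is shown below, assuming degree centrality (only one of several possible measures) and an undirected graph stored as a dict of neighbor sets. `remove_top_k` is a hypothetical helper, and the GCN training and evaluation that would follow are omitted.

```python
Graph = dict[int, set[int]]


def degree_centrality(adj: Graph) -> dict[int, float]:
    # Degree divided by the maximum possible degree (n - 1).
    n = len(adj)
    return {v: (len(nbrs) / (n - 1) if n > 1 else 0.0) for v, nbrs in adj.items()}


def remove_top_k(adj: Graph, k: int) -> tuple[Graph, set[int]]:
    cent = degree_centrality(adj)
    # Highest centrality first; ties broken by node id so the result is deterministic.
    ranked = sorted(adj, key=lambda v: (-cent[v], v))
    removed = set(ranked[:k])
    # Delete the chosen nodes and every edge that touches them.
    pruned = {v: nbrs - removed for v, nbrs in adj.items() if v not in removed}
    return pruned, removed


if __name__ == "__main__":
    graph = {0: {1, 2, 3}, 1: {0}, 2: {0, 3}, 3: {0, 2}}
    pruned, removed = remove_top_k(graph, 1)
    print(removed)  # {0}
    print(pruned)   # {1: set(), 2: {3}, 3: {2}}
```

    Swapping `degree_centrality` for another measure (betweenness, closeness, and so on) changes only the ranking step; the pruned graph would then be fed to the classifier to measure the drop in accuracy.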
