
    A Comparison of Retweet Prediction Approaches: The Superiority of Random Forest Learning Method

    We consider the following retweet prediction task: given a tweet, predict whether it will be retweeted. In the past, a wide range of learning methods and features has been proposed for this task. We provide a systematic comparison of the performance of these learning methods and features in terms of prediction accuracy and feature importance. Specifically, from each previously published approach we take the best-performing features and group them into two sets: user features and tweet features. In addition, we contrast five learning methods, both linear and non-linear. On top of that, we examine the added value of a previously proposed time-sensitive modeling approach. To the authors’ knowledge, this is the first attempt to collect the best-performing features and to contrast linear and non-linear learning methods. We perform our comparisons on a single dataset and find that user features such as the number of times a user is listed, the number of followers, and the average number of tweets published per day contribute most strongly to prediction accuracy across the selected learning methods. We also find that a random forest-based learning method, which has not been employed in previous studies, achieves the highest performance among the learning methods we consider. Finally, we find that, on top of properly tuned learning methods, the benefits of time-sensitive modeling are very limited.
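
    A minimal sketch of this comparison setup in Python with scikit-learn is given below. The abstract does not name the five methods or publish the data, so the model choices, feature values, and hyperparameters here are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch: compare linear and non-linear classifiers on user features,
# then inspect random forest feature importances, mirroring the study design.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical user features: times listed, follower count, avg tweets/day.
X = rng.lognormal(mean=2.0, sigma=1.0, size=(1000, 3))
y = rng.integers(0, 2, size=1000)  # 1 = retweeted, 0 = not retweeted

# Five learning methods, both linear and non-linear (illustrative choices).
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "linear_svm": SVC(kernel="linear"),
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(),
    "random_forest": RandomForestClassifier(n_estimators=200),
}

for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {acc:.3f}")

# Feature importances from the paper's top performer, a random forest.
rf = RandomForestClassifier(n_estimators=200).fit(X, y)
print(dict(zip(["listed", "followers", "tweets_per_day"],
               rf.feature_importances_)))
```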

    UNIMIB@NEEL-IT: Named Entity Recognition and Linking of Italian Tweets

    This paper describes the framework proposed by the UNIMIB Team for the task of Named Entity Recognition and Linking of Italian Tweets (NEEL-IT). The proposed pipeline, which represents an entry-level system, is composed of three main steps: (1) Named Entity Recognition using Conditional Random Fields, (2) Named Entity Linking considering both supervised and neural-network language models, and (3) NIL clustering using a graph-based approach.
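
    As a rough illustration of step (1), the sketch below trains a CRF tagger with the sklearn-crfsuite library. The tokenised tweet, feature template, and IOB tag set are hypothetical stand-ins; the team's actual features and training data are not described in this abstract.

```python
# Hedged sketch: CRF-based named entity recognition over tokenised tweets.
import sklearn_crfsuite

def token_features(sent, i):
    """Simple per-token features: surface form, shape, and local context."""
    word = sent[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isupper": word.isupper(),
        "prev.lower": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

# Tiny hypothetical training set of Italian tweet tokens with IOB tags.
sents = [["Matteo", "Renzi", "parla", "a", "Roma"]]
tags = [["B-PER", "I-PER", "O", "O", "B-LOC"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sents]
y = tags

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                           max_iterations=100)
crf.fit(X, y)
print(crf.predict(X))  # predicted IOB tag sequence per tweet
```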

    FAKE NEWS DETECTION ON THE WEB: A DEEP LEARNING BASED APPROACH

    The acceptance and popularity of social media platforms for the dispersion and proliferation of news articles have led to the spread of questionable and untrusted information, in part due to the ease with which misleading content can be created and shared among communities. While prior research has attempted to automatically classify news articles and tweets as credible or non-credible, this work complements such research by proposing an approach that combines Natural Language Processing (NLP) with deep learning techniques such as Long Short-Term Memory (LSTM). Moreover, within the Information Systems paradigm, the design science research methodology (DSRM) has become the major stream that focuses on building and evaluating an artifact to solve emerging problems; hence, DSRM can accommodate deep learning-based models given the availability of adequate datasets. Two publicly available datasets containing labeled news articles and tweets are used to validate the proposed model’s effectiveness. This work presents two distinct experiments, and the results demonstrate that the proposed model works well for both long-sequence news articles and short-sequence texts such as tweets. Finally, the findings suggest that sentiment, tagging, linguistic, syntactic, and text-embedding features have the potential to foster fake news detection by training the proposed model on various dimensionalities to learn the contextual meaning of the news content.
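
    The sketch below shows the general shape of an LSTM text classifier in Keras, the kind of model the abstract describes. The vocabulary size, sequence length, layer sizes, and toy data are assumptions; the paper's architecture and preprocessing may differ.

```python
# Hedged sketch: binary credible/non-credible classifier over token sequences.
import numpy as np
from tensorflow.keras import layers, models

VOCAB_SIZE, MAX_LEN = 20_000, 200  # assumed limits for articles and tweets

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 128),      # learned text embeddings
    layers.LSTM(64),                        # captures long-range word order
    layers.Dense(1, activation="sigmoid"),  # credible vs. non-credible
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Toy stand-in for tokenised, padded news/tweet sequences and labels.
X = np.random.randint(1, VOCAB_SIZE, size=(64, MAX_LEN))
y = np.random.randint(0, 2, size=(64,))
model.fit(X, y, epochs=1, batch_size=16)
```

    Because the LSTM consumes padded token sequences of a fixed maximum length, the same model shape serves both long news articles (truncated) and short tweets (padded), which is consistent with the two experiments the abstract reports.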

    Credibility Evaluation of User-generated Content using Novel Multinomial Classification Technique

    Awareness of the features of the internet, easy access to data on mobile devices, and affordable data plans have generated a great deal of traffic on the internet. Digitization came with many opportunities as well as challenges: paperless transactions and transparency in payments are among its important advantages, while data privacy, fake news, and cyber-attacks are its evolving challenges. The extensive use of social media networks and e-commerce websites has produced a large amount of user-generated information, misinformation, and disinformation on the internet. The quality of information depends on several stages of its life cycle, such as its generation, its medium of propagation, and its consumption. Because the content is user-generated, the information needs a quality assessment before consumption; and because the volume of content is extremely large, the loss of information must also be examined by applying machine learning approaches. This research work focuses on a novel multinomial classification technique (based on the multinoulli distribution) to determine the quality of the information in given content. A single algorithm with some processing is not sufficient to evaluate information content, and various approaches are necessary to evaluate the quality of content. We propose a novel approach to calculate the bias, for which the machine learning model will be fitted appropriately to classify the content correctly. As an empirical study, the classification techniques are applied to the Rotten Tomatoes movie review dataset. The accuracy of the system is evaluated using the ROC curve, the confusion matrix, and MAP.
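
    The paper's multinomial technique is novel and not specified in this abstract; as the closest off-the-shelf analogue, the sketch below fits a standard multinomial naive Bayes classifier to a few hypothetical review snippets and reports a confusion matrix, one of the evaluation tools the abstract names.

```python
# Hedged sketch: multinomial (bag-of-words) text classification baseline.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical review snippets with quality labels (not the real dataset).
texts = ["great movie, loved it", "terrible plot and acting",
         "wonderful cast", "boring and misleading"]
labels = ["positive", "negative", "positive", "negative"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)
print(confusion_matrix(labels, clf.predict(texts)))
```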

    Credibility assessment of financial stock tweets

    Social media plays an important role in facilitating conversations and news dissemination. Specifically, Twitter has recently seen use by investors to facilitate discussions about stock exchange-listed companies. Investors depend on timely, credible information being made available in order to make well-informed investment decisions, with credibility being defined as the believability of information. Much work has been done on assessing credibility on Twitter in domains such as politics and natural disasters, but work on assessing the credibility of financial statements is scant in the literature. Investments made on apocryphal information could undermine social media's aim of providing a transparent arena for sharing news and encouraging discussion of stock market events. This paper presents a novel methodology to assess the credibility of financial stock market tweets, evaluated through an experiment using tweets pertaining to companies listed on the London Stock Exchange. Three sets of traditional machine learning classifiers (using three different feature sets) are trained on an annotated dataset. We highlight the importance of considering features specific to the domain in which credibility is being assessed: in the case of this paper, financial features. In total, after discarding non-informative features, 34 general features are combined with over 15 novel financial features for training the classifiers. Results show that classifiers trained on both general and financial features yield better performance than classifiers trained on general features alone, with Random Forest being the top performer, although the Random Forest model requires more features (37) than other classifiers (such as K-Nearest Neighbours, which needs only 9) to achieve this performance.
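
    A minimal sketch of the general-versus-combined feature comparison is shown below, using scikit-learn. The feature matrices are random placeholders standing in for the 34 general and 15+ financial features; the real features, labels, and tuning are not given in this abstract.

```python
# Hedged sketch: compare classifiers on general features alone versus
# general plus financial features, mirroring the paper's experiment design.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n = 500
X_general = rng.normal(size=(n, 34))    # placeholder: 34 general features
X_financial = rng.normal(size=(n, 15))  # placeholder: 15 financial features
y = rng.integers(0, 2, size=n)          # 1 = credible, 0 = not credible

X_combined = np.hstack([X_general, X_financial])

for name, clf in [("random_forest", RandomForestClassifier(n_estimators=200)),
                  ("knn", KNeighborsClassifier(n_neighbors=5))]:
    for label, X in [("general", X_general),
                     ("general+financial", X_combined)]:
        acc = cross_val_score(clf, X, y, cv=5).mean()
        print(f"{name} on {label}: {acc:.3f}")
```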