1,247 research outputs found

    Data Science, Machine learning and big data in Digital Journalism: A survey of state-of-the-art, challenges and opportunities

    Get PDF
    Digital journalism has faced a dramatic change and media companies are challenged to use data science algo-rithms to be more competitive in a Big Data era. While this is a relatively new area of study in the media landscape, the use of machine learning and artificial intelligence has increased substantially over the last few years. In particular, the adoption of data science models for personalization and recommendation has attracted the attention of several media publishers. Following this trend, this paper presents a research literature analysis on the role of Data Science (DS) in Digital Journalism (DJ). Specifically, the aim is to present a critical literature review, synthetizing the main application areas of DS in DJ, highlighting research gaps, challenges, and op-portunities for future studies. Through a systematic literature review integrating bibliometric search, text min-ing, and qualitative discussion, the relevant literature was identified and extensively analyzed. The review reveals an increasing use of DS methods in DJ, with almost 47% of the research being published in the last three years. An hierarchical clustering highlighted six main research domains focused on text mining, event extraction, online comment analysis, recommendation systems, automated journalism, and exploratory data analysis along with some machine learning approaches. Future research directions comprise developing models to improve personalization and engagement features, exploring recommendation algorithms, testing new automated jour-nalism solutions, and improving paywall mechanisms.Acknowledgements This work was supported by the FCT-Funda?a ? o para a CiĂŞncia e Tecnologia, under the Projects: UIDB/04466/2020, UIDP/04466/2020, and UIDB/00319/2020

    A study of feature exraction techniques for classifying topics and sentiments from news posts

    Get PDF
    Recently, many news channels have their own Facebook pages in which news posts have been released in a daily basis. Consequently, these news posts contain temporal opinions about social events that may change over time due to external factors as well as may use as a monitor to the significant events happened around the world. As a result, many text mining researches have been conducted in the area of Temporal Sentiment Analysis, which one of its most challenging tasks is to detect and extract the key features from news posts that arrive continuously overtime. However, extracting these features is a challenging task due to post’s complex properties, also posts about a specific topic may grow or vanish overtime leading in producing imbalanced datasets. Thus, this study has developed a comparative analysis on feature extraction Techniques which has examined various feature extraction techniques (TF-IDF, TF, BTO, IG, Chi-square) with three different n-gram features (Unigram, Bigram, Trigram), and using SVM as a classifier. The aim of this study is to discover the optimal Feature Extraction Technique (FET) that could achieve optimum accuracy results for both topic and sentiment classification. Accordingly, this analysis is conducted on three news channels’ datasets. The experimental results for topic classification have shown that Chi-square with unigram have proven to be the best FET compared to other techniques. Furthermore, to overcome the problem of imbalanced data, this study has combined the best FET with OverSampling technology. The evaluation results have shown an improvement in classifier’s performance and has achieved a higher accuracy at 93.37%, 92.89%, and 91.92 for BBC, Al-Arabiya, and Al-Jazeera, respectively, compared to what have been obtained on original datasets. Similarly, same combination (Chi-square+Unigram) has been used for sentiment classification and obtained accuracies at rates of 81.87%, 70.01%, 77.36%. However, testing the recognized optimal FET on unseen randomly selected news posts has shown a relatively very low accuracies for both topic and sentiment classification due to the changes of topics and sentiments over time

    A Survey on Prediction of Movie’s Box Office Collection Using Social Media

    Get PDF
    Predicting the box office profits of a movie prior to its world wide release are a significant but also an exigent problem that needs a advanced of Intelligence. Currently, social media has given away its diagnostic strength in a variety of fields, which encourages us to develop social media substance to predict box office profits. The collection of movies in provisions of profit relies on so many features for instance its making studio, type, screenplay superiority, pre release endorsement etc, each of which are usually utilized to approximation their probable achievement at the box office. Nevertheless, the “Wisdom of Crowd” and social media have been accredited as a powerful indication in appreciative customer activities to media. In this survey, we converse the influence of socially created Meta data derived from the social multimedia websites and review the effect of social media on box office collection and success of movies. This survey paper is written for (social networking) investigators who looking for to evaluate prediction of movies using social media. It gives a complete study of social media analytics for social networking, wikis, actually easy syndication feeds, blogs, newsgroups, chat and news feeds etc. Keywords: Social Networking, Social media. Movie’s Box Office, Prediction, Profitability, Sentiment analysi

    Analyzing Granger causality in climate data with time series classification methods

    Get PDF
    Attribution studies in climate science aim for scientifically ascertaining the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested

    Context-Aware Message-Level Rumour Detection with Weak Supervision

    Get PDF
    Social media has become the main source of all sorts of information beyond a communication medium. Its intrinsic nature can allow a continuous and massive flow of misinformation to make a severe impact worldwide. In particular, rumours emerge unexpectedly and spread quickly. It is challenging to track down their origins and stop their propagation. One of the most ideal solutions to this is to identify rumour-mongering messages as early as possible, which is commonly referred to as "Early Rumour Detection (ERD)". This dissertation focuses on researching ERD on social media by exploiting weak supervision and contextual information. Weak supervision is a branch of ML where noisy and less precise sources (e.g. data patterns) are leveraged to learn limited high-quality labelled data (Ratner et al., 2017). This is intended to reduce the cost and increase the efficiency of the hand-labelling of large-scale data. This thesis aims to study whether identifying rumours before they go viral is possible and develop an architecture for ERD at individual post level. To this end, it first explores major bottlenecks of current ERD. It also uncovers a research gap between system design and its applications in the real world, which have received less attention from the research community of ERD. One bottleneck is limited labelled data. Weakly supervised methods to augment limited labelled training data for ERD are introduced. The other bottleneck is enormous amounts of noisy data. A framework unifying burst detection based on temporal signals and burst summarisation is investigated to identify potential rumours (i.e. input to rumour detection models) by filtering out uninformative messages. Finally, a novel method which jointly learns rumour sources and their contexts (i.e. conversational threads) for ERD is proposed. An extensive evaluation setting for ERD systems is also introduced
    • …
    corecore