8,626 research outputs found

    User Acquisition and Engagement in Digital News Media

    Generating revenue has been a major issue for the news industry and journalism over the past decade. The vast availability of free online news sources has made user acquisition and engagement more pressing than ever for online news media agencies. Although digital news media agencies are seeking sustainable relationships with their users, their current business models do not satisfy this demand. They need to understand and predict how much an article can engage a reader, as a crucial step in attracting readers, and then maximize that engagement. Moreover, news media companies need effective algorithmic tools to identify users who are likely to subscribe. Last but not least, online news agencies need to make smarter decisions about how they deliver articles to users in order to maximize the potential benefits. In this dissertation, we take the first steps towards achieving these goals and investigate these challenges from data mining/machine learning perspectives. First, we investigate the problem of understanding and predicting article engagement in terms of dwell time, one of the most important factors in digital news media. In particular, we design data exploratory models that study the textual elements (e.g., events, emotions) involved in article stories and find their relationships with engagement patterns. In the prediction task, we design a framework to predict article dwell time based on a deep neural network architecture that exploits the interactions among important elements (i.e., augmented features) in the article content, as well as the neural representation of the content, to achieve better performance. In the second part of the dissertation, we address the problem of identifying valuable visitors who are likely to subscribe in the future. We suggest that the decision to subscribe is not a sudden, instantaneous action but an informed decision based on positive experience with the newspaper. As such, we propose engagement measures and show that they are effective in building a predictive model for subscription. We design a model that predicts not only the potential subscribers but also the time at which a user would subscribe. In the last part of this thesis, we consider the paywall problem in online newspapers. The traditional paywall method offers a non-subscribed reader a fixed number of free articles in a period of time (e.g., a month), and then directs the user to the subscription page for further reading. We argue that there is no direct relationship between the number of paywalls presented to readers and the number of subscriptions, and that this artificial barrier, if not used well, may disengage potential subscribers and thus fail to serve its purpose of increasing revenue. We propose an adaptive paywall mechanism to balance the benefit of showing an article against that of displaying the paywall (i.e., terminating the session). We first define the notions of cost and utility, which are used to define an objective function for optimal paywall decision making. Then, we model the problem as a stochastic sequential decision process. Finally, we propose an efficient policy function for paywall decision making. All the proposed models are evaluated on real datasets from The Globe and Mail, a major newspaper in Canada. However, the proposed techniques are not limited to any particular dataset or strict requirement. Rather, they are designed around datasets and settings that are available and common to most newspapers. Therefore, the models are general and can be applied by any online newspaper to improve user engagement and acquisition.
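
The abstract frames the paywall choice as a stochastic sequential decision process but gives no formulas. The following is only a toy sketch of that general idea, not the dissertation's model: it assumes a finite-horizon process solved by backward induction, and every name and parameter (`solve_paywall`, `p_continue`, `article_utility`, `article_cost`, `subscribe_prob`, `subscription_value`) is hypothetical.

```python
def solve_paywall(horizon, p_continue, article_utility, article_cost,
                  subscribe_prob, subscription_value):
    """Toy backward induction for a show-vs-paywall decision.

    State k is the number of articles already shown in the session.
    Assumptions (not from the source): showing an article earns
    article_utility - article_cost, after which the reader requests
    another article with probability p_continue (otherwise the session
    ends with no further value); displaying the paywall ends the session
    and yields subscribe_prob(k) * subscription_value in expectation.
    """
    values = [0.0] * (horizon + 1)
    policy = ["paywall"] * (horizon + 1)
    # At the horizon the paywall is the only action left.
    values[horizon] = subscribe_prob(horizon) * subscription_value
    for k in range(horizon - 1, -1, -1):
        paywall_value = subscribe_prob(k) * subscription_value
        show_value = article_utility - article_cost + p_continue * values[k + 1]
        if show_value > paywall_value:
            values[k], policy[k] = show_value, "show"
        else:
            values[k], policy[k] = paywall_value, "paywall"
    return values, policy


# Hypothetical numbers: subscription likelihood grows with articles read.
values, policy = solve_paywall(
    horizon=3,
    p_continue=0.5,
    article_utility=1.0,
    article_cost=0.5,
    subscribe_prob=lambda k: k / 4,
    subscription_value=8.0,
)
print(policy)
```

With these made-up inputs the sketch delays the paywall until the reader has seen a couple of articles, which mirrors the abstract's argument that a fixed barrier shown too early may disengage potential subscribers.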

    Viewability prediction for display advertising

    As a massive industry, display advertising delivers advertisers' marketing messages to attract customers through graphic banners on webpages. Display advertising is also the most essential revenue source for online publishers. Currently, advertisers are charged by user response or ad serving. However, recent studies show that users rarely click on or convert from display ads. Moreover, about half of the ads are never actually seen by users. In this case, advertisers cannot enhance their brand awareness or increase their return on investment, and publishers also lose substantial revenue. Therefore, ad pricing standards are shifting to a new model: ad impressions are paid for if they are viewable, not merely responded to or served. The Media Rating Council's standard for a viewable display impression is a minimum of 50% of pixels in view for a minimum of one second. To implement viewable impressions as a pricing currency, ad viewability should be accurately predicted. Ad viewability prediction can improve the performance of guaranteed ad delivery, real-time bidding, and recommender systems. This research is the first to address this important problem of ad viewability prediction. Inspired by the standard definition of viewability, this study proposes to solve the problem from two angles: 1) scrolling behavior and 2) dwell time. In the first phase, ad viewability is predicted by estimating the probability that a user will scroll to the page depth where an ad is located in a specific page view. Two novel probabilistic latent class (PLC) models are proposed. The first PLC model computes constant user and page memberships offline, while the second PLC model computes dynamic memberships in real time. In the second phase, ad viewability is predicted by estimating the probability that the page depth will be in view for a certain number of seconds. Machine learning models based on Factorization Machines (FM) and a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) are proposed to predict the viewability of any given page depth in a specific page view. The experiments show that the proposed algorithms significantly outperform the comparison systems.
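
The viewability standard quoted above (at least 50% of pixels in view for at least one second) can be made concrete with a short check. This is a minimal sketch, not the paper's code: it assumes the one second must be continuous and that visibility is logged as time-ordered samples; `Sample` and `is_viewable` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t: float                 # seconds since page load (assumed sorted ascending)
    visible_fraction: float  # fraction of the ad's pixels in the viewport, 0..1


def is_viewable(samples, min_fraction=0.5, min_seconds=1.0):
    """Return True if visibility stayed at or above min_fraction for one
    unbroken run of samples spanning at least min_seconds."""
    run_start = None
    for s in samples:
        if s.visible_fraction >= min_fraction:
            if run_start is None:
                run_start = s.t  # a new qualifying run begins here
            if s.t - run_start >= min_seconds:
                return True
        else:
            run_start = None  # the run is broken; start over
    return False


# Ad stays above the threshold from t=0.0 to t=1.0: viewable.
print(is_viewable([Sample(0.0, 0.6), Sample(0.5, 0.7), Sample(1.0, 0.55)]))
# Visibility dips below 50% at t=0.5, so no run reaches one second.
print(is_viewable([Sample(0.0, 0.6), Sample(0.5, 0.3),
                   Sample(1.0, 0.6), Sample(1.5, 0.6)]))
```

A label computed this way is the kind of target a viewability predictor would be trained against; the prediction models themselves (PLC, FM, LSTM) are not reproduced here.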

    Learning to Determine the Quality of News Headlines

    Today, most newsreaders read the online version of news articles rather than traditional paper-based newspapers. News media publishers also rely heavily on the income generated from subscriptions and website visits made by newsreaders. Thus, online user engagement is a very important issue for online newspapers. Much effort has been spent on writing interesting headlines to catch the attention of online users. On the other hand, headlines should not be misleading (e.g., clickbait); otherwise, readers would be disappointed when reading the content. In this paper, we propose four indicators to determine the quality of published news headlines based on their click count and dwell time, which are obtained by website log analysis. Then, we use a soft target distribution of the calculated quality indicators to train our proposed deep learning model, which can predict the quality of unpublished news headlines. The proposed model not only processes the latent features of both the headline and the body of the article to predict headline quality but also considers the semantic relation between headline and body. To evaluate our model, we use a real dataset from a major Canadian newspaper. Results show that our proposed model outperforms other state-of-the-art NLP models.
    Comment: 10 Pages, Accepted at the 12th International Conference on Agents and Artificial Intelligence (ICAART) 202

    Data Science, Machine learning and big data in Digital Journalism: A survey of state-of-the-art, challenges and opportunities

    Digital journalism has faced a dramatic change and media companies are challenged to use data science algorithms to be more competitive in a Big Data era. While this is a relatively new area of study in the media landscape, the use of machine learning and artificial intelligence has increased substantially over the last few years. In particular, the adoption of data science models for personalization and recommendation has attracted the attention of several media publishers. Following this trend, this paper presents a research literature analysis on the role of Data Science (DS) in Digital Journalism (DJ). Specifically, the aim is to present a critical literature review, synthesizing the main application areas of DS in DJ, highlighting research gaps, challenges, and opportunities for future studies. Through a systematic literature review integrating bibliometric search, text mining, and qualitative discussion, the relevant literature was identified and extensively analyzed. The review reveals an increasing use of DS methods in DJ, with almost 47% of the research being published in the last three years. A hierarchical clustering highlighted six main research domains focused on text mining, event extraction, online comment analysis, recommendation systems, automated journalism, and exploratory data analysis along with some machine learning approaches. Future research directions comprise developing models to improve personalization and engagement features, exploring recommendation algorithms, testing new automated journalism solutions, and improving paywall mechanisms.
    Acknowledgements: This work was supported by the FCT-Fundação para a Ciência e Tecnologia, under the Projects: UIDB/04466/2020, UIDP/04466/2020, and UIDB/00319/2020.