    Wisdom of the Crowd Vs Reviews of the Experts: A Case Study Regarding Predicting Movie Box-Office Results

    Predicting movie sales figures has been a research topic for decades, since every year dozens of movies surprise investors, for better or worse, depending on how well the film performs at the box office compared to initial expectations. Past studies have reported mixed results on using movie critics' reviews as a source of information for predicting box-office outcomes. Similarly, using social media as a predictor of movie success has been a popular research topic. In this thesis, we perform a case study to evaluate which of the two, the (wisdom of the) crowd or movie critics' reviews, predicts the outcome of movies more accurately. We analyze Hollywood and Bollywood movies from the last three years, which come from two different geographic and cultural settings. We used Twitter to collect the wisdom of the crowd and took critics' review scores from the review aggregator sites Metacritic and SahiNahi for Hollywood and Bollywood movies, respectively. For the evaluation, we extracted various features and used them to build prediction models with different machine learning algorithms. After measuring the performance of prediction models using features from both Twitter and movie critic reviews, we did not find conclusive evidence to declare a clear-cut winner.

    Methodologies in Predictive Visual Analytics

    Predictive analytics embraces an extensive area of techniques from statistical modeling to machine learning to data mining and is applied in business intelligence, public health, disaster management and response, and many other fields. To date, visualization has been broadly used to support tasks in the predictive analytics pipeline, under the assumption that a human-in-the-loop can aid the analysis by integrating domain knowledge that might not be broadly captured by the system. Primary uses of visualization in the predictive analytics pipeline have focused on data cleaning, exploratory analysis, and diagnostics. More recently, numerous visual analytics systems for feature selection, incremental learning, and various prediction tasks have been proposed to support the growing use of complex models, agent-specific optimization, and comprehensive model comparison and result exploration. Such work is being driven by advances in interactive machine learning and the desire of end-users to understand and engage with the modeling process. However, despite the numerous and promising applications of visual analytics to predictive analytics tasks, work to assess the effectiveness of predictive visual analytics is lacking. This thesis studies the current methodologies in predictive visual analytics. It first defines the scope of predictive analytics and presents a predictive visual analytics (PVA) pipeline. Following the proposed pipeline, a predictive visual analytics framework is developed to explore under what circumstances a human-in-the-loop prediction process is most effective. This framework combines sentiment analysis, feature selection mechanisms, similarity comparisons, and model cross-validation through a variety of interactive visualizations to support analysts in model building and prediction. To test the proposed framework, an instantiation for movie box-office prediction is developed and evaluated. Results from small-scale user studies are presented and discussed, and a generalized user study is carried out to assess the role of predictive visual analytics under a movie box-office prediction scenario.
    Dissertation/Thesis. Doctoral Dissertation Engineering 201

    SubjQA: A Dataset for Subjectivity and Review Comprehension

    Subjectivity is the expression of internal opinions or beliefs which cannot be objectively observed or verified, and has been shown to be important for sentiment analysis and word-sense disambiguation. Furthermore, subjectivity is an important aspect of user-generated data. In spite of this, subjectivity has not been investigated in contexts where such data is widespread, such as in question answering (QA). We therefore investigate the relationship between subjectivity and QA, while developing a new dataset. We compare and contrast with analyses from previous work, and verify that findings regarding subjectivity still hold when using recently developed NLP architectures. We find that subjectivity is also an important feature in the case of QA, albeit with more intricate interactions between subjectivity and QA performance. For instance, a subjective question may or may not be associated with a subjective answer. We release an English QA dataset (SubjQA) based on customer reviews, containing subjectivity annotations for questions and answer spans across 6 distinct domains.
    Comment: EMNLP 2020 Long Paper - Camera Read

    Social-media monitoring for cold-start recommendations

    Generating personalized movie recommendations for users is a problem that most commonly relies on user-movie ratings. These ratings are generally used either to understand user preferences or to recommend movies that users with similar rating patterns have rated highly. However, movie recommenders are often subject to the Cold-Start problem: new movies have not been rated by anyone, so they will not be recommended to anyone; likewise, the preferences of new users who have not rated any movie cannot be learned. In parallel, Social-Media platforms, such as Twitter, collect large amounts of user feedback on movies, as these are very popular nowadays. This thesis proposes to explore feedback shared on Twitter to predict the popularity of new movies and to show how it can be used to tackle the Cold-Start problem. It also proposes, at a finer grain, to explore the reputation of directors and actors on IMDb to tackle the Cold-Start problem. To assess these aspects, a Reputation-enhanced Recommendation Algorithm is implemented and evaluated on a crawled IMDb dataset with previous user ratings of old movies, together with Twitter data crawled from January 2014 to March 2014, to recommend 60 movies affected by the Cold-Start problem. Twitter proved to be a strong reputation predictor, and the Reputation-enhanced Recommendation Algorithm improved over several baseline methods. Additionally, the algorithm proved useful when recommending movies in an extreme Cold-Start scenario, where both new movies and new users are affected by the Cold-Start problem.