99 research outputs found

    Detecting Spoilers in Movie Reviews with External Movie Knowledge and User Networks

    Online movie review platforms provide crowdsourced feedback for the film industry and the general public, but spoiler reviews greatly compromise the user experience. Although preliminary research efforts have been made to identify spoilers automatically, they focus only on the review content itself, whereas robust spoiler detection requires placing the review in the context of facts and knowledge about movies, user behavior on film review platforms, and more. In light of these challenges, we first curate LCS, a large-scale network-based spoiler detection dataset, and UKM, a comprehensive and up-to-date movie knowledge base. We then propose MVSD, a novel Multi-View Spoiler Detection framework that takes into account external knowledge about movies and user activities on movie review platforms. Specifically, MVSD constructs three interconnected heterogeneous information networks to model diverse data sources and their multi-view attributes, and we design and employ a novel heterogeneous graph neural network architecture that treats spoiler detection as node-level classification. Extensive experiments demonstrate that MVSD advances the state of the art on two spoiler detection datasets, and that the introduction of external knowledge and user interactions helps ground robust spoiler detection. Our data and code are available at https://github.com/Arthur-Heng/Spoiler-Detection. Comment: EMNLP 202
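
The abstract does not detail MVSD's architecture, so the following is only a generic sketch of the underlying idea: spoiler detection as node classification on a heterogeneous graph. It uses an R-GCN-style layer (one weight matrix per edge type, mean aggregation over neighbors of that type) followed by a sigmoid score on review nodes. The toy graph, the relation names `writes` and `has_review`, and all function names are hypothetical. The weights are random and untrained, so the printed probabilities only illustrate the data flow, not real predictions.

```python
import math
import random

# Hypothetical toy heterogeneous graph: node id -> feature vector.
# The prefix ("review:", "user:", "movie:") stands in for the node type.
features = {
    "review:1": [1.0, 0.0],
    "review:2": [0.0, 1.0],
    "user:a": [0.5, 0.5],
    "movie:x": [1.0, 1.0],
}

# Typed edges (source, relation, target); messages flow from source to target.
edges = [
    ("user:a", "writes", "review:1"),
    ("user:a", "writes", "review:2"),
    ("movie:x", "has_review", "review:1"),
    ("movie:x", "has_review", "review:2"),
]


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def relational_layer(features, edges, w_self, w_rel):
    """h_v = ReLU(W_self x_v + sum over relations r of mean_{u in N_r(v)} W_r x_u)."""
    hidden = {}
    for v, x in features.items():
        h = matvec(w_self, x)
        for rel, w in w_rel.items():
            # Messages from neighbors that reach v through this edge type only.
            msgs = [matvec(w, features[s]) for s, r, t in edges if t == v and r == rel]
            if msgs:
                mean = [sum(col) / len(msgs) for col in zip(*msgs)]
                h = [a + b for a, b in zip(h, mean)]
        hidden[v] = [max(0.0, a) for a in h]  # ReLU
    return hidden


def spoiler_probabilities(features, edges, seed=0):
    """Untrained forward pass: one relational layer, then a sigmoid score per review node."""
    rng = random.Random(seed)
    dim = len(next(iter(features.values())))
    w_self = random_matrix(rng, dim, dim)
    w_rel = {r: random_matrix(rng, dim, dim) for r in sorted({r for _, r, _ in edges})}
    w_out = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    hidden = relational_layer(features, edges, w_self, w_rel)
    return {
        v: 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(w_out, h))))
        for v, h in hidden.items()
        if v.startswith("review:")
    }


print(spoiler_probabilities(features, edges))
```

Keeping a separate weight matrix per relation is what lets a heterogeneous model treat "a user wrote this review" differently from "this movie has this review". Training, which would fit the matrices with a binary cross-entropy loss on labeled review nodes, is omitted here.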

    #Languagemixing on Twitter

    The influence of the English language on the world stage is such that it now constitutes a kind of global lingua franca. As such, English has supplanted French as the language of diplomacy, of culture, and of social prestige. This role reversal has met with some residual opposition in France, and consequently the use of English expressions and vocabulary by French speakers continues to be a controversial subject there, as it has been for decades. Regulations to control the French language are still being implemented. Social media has become an important tool in today's society. Twitter has become a popular means of communication in a variety of fields, such as politics, journalism, and academia. This widely used online platform affects the way people express themselves and is changing language usage worldwide at an unprecedented pace. The language used online reflects the linguistic battle that has been going on in French society for several decades. In my dissertation, I investigate the factors prompting the mixing of English and French on Twitter in France. Acronyms, hashtags, and a second language may be used as strategies to reach a wider audience. The need for visibility and for audience maximization appears to be an important factor in linguistic choice on Twitter. This study enables a deeper understanding of users' linguistic behavior online. Its implications are important and can raise awareness of intercultural and cross-language exchanges. Includes bibliographical references.

    Hyperlink-extended pseudo relevance feedback for improved microblog retrieval

    Microblog retrieval has received much attention in recent years due to the widespread use of social microblogging platforms such as Twitter. The main goal of microblog retrieval is to provide users searching a large collection of microblogs with a list of relevant documents (microblogs) that match their search needs. What distinguishes microblog retrieval from ordinary web retrieval is the short length of both the user queries and the documents being searched, which leads to a severe vocabulary mismatch problem. Many research studies have investigated different approaches to microblog retrieval. Query expansion is one approach that has shown stable performance in improving microblog retrieval effectiveness; it is used mainly to overcome the vocabulary mismatch between user queries and short relevant documents. In our work, we comprehensively investigate an existing query expansion method, Pseudo Relevance Feedback (PRF), and propose an extension that uses information from hyperlinks attached to the top relevant documents. Our experimental results on TREC microblog data show that PRF alone can outperform many retrieval approaches if configured properly. We show that combining the expansion terms with the original query using a weight, so as not to dilute the effect of the original query, can lead to superior results. This weighted combination differs from the common practice in the literature, which appends the expansion terms to the original query without weighting. We experimented with different weighting schemes and empirically found that assigning a small weight of 0.2 to the expansion terms and 0.8 to the original query performs best on all three evaluation sets (2011, 2012, and 2013). We applied this weighting scheme to the PRF configuration most commonly reported in the literature and measured the retrieval performance.
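
A minimal sketch of the weighted combination described above, assuming a simple bag-of-words query model. Expansion terms are taken to be the most frequent non-query terms in the top-ranked feedback tweets (raw frequency is a simplifying assumption; the abstract does not give the term-selection formula). The final query mixes the normalized original and expansion term distributions with weights 0.8 and 0.2. The function names and toy tweets are hypothetical.

```python
from collections import Counter


def normalize(weights):
    """Scale term weights so they sum to 1 (empty input stays empty)."""
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()} if total else {}


def expansion_terms(feedback_docs, query_terms, n_terms):
    """Most frequent terms in the feedback docs that are not already query terms.

    Raw term frequency is an assumption; real PRF setups usually apply a
    weighting such as tf-idf and remove stopwords first.
    """
    counts = Counter(
        tok
        for doc in feedback_docs
        for tok in doc.lower().split()
        if tok not in query_terms
    )
    return dict(counts.most_common(n_terms))


def weighted_expansion(query, feedback_docs, n_terms=2, expansion_weight=0.2):
    """Combined query = (1 - w) * original distribution + w * expansion distribution."""
    q_terms = query.lower().split()
    original = normalize(Counter(q_terms))
    expansion = normalize(expansion_terms(feedback_docs, set(q_terms), n_terms))
    combined = {}
    for term, weight in original.items():
        combined[term] = combined.get(term, 0.0) + (1.0 - expansion_weight) * weight
    for term, weight in expansion.items():
        combined[term] = combined.get(term, 0.0) + expansion_weight * weight
    return combined


# Hypothetical top-ranked tweets returned by an initial retrieval run.
feedback = [
    "bbc world service staff cuts",
    "world service cuts announced",
    "bbc cuts jobs",
]
print(weighted_expansion("bbc world service", feedback))
```

With `expansion_weight=0.2`, the original terms keep 80% of the total weight, which matches the ratio the abstract reports as best. Appending the expansion terms without weighting would instead let each added term count as much as an original query term.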
The P@30 achieved using our weighting scheme was 0.485, 0.4136, and 0.4811, compared to 0.4585, 0.3548, and 0.3861 without weighting, for the 2011, 2012, and 2013 evaluation sets respectively. The MAP achieved using our weighting scheme was 0.4386, 0.2845, and 0.3262, compared to 0.3592, 0.2074, and 0.2256 without weighting, for the same three sets. Results also show that using the hyperlinked documents attached to the top relevant tweets in query expansion improves on traditional PRF. With the hyperlinked document contents (web page titles and meta-descriptions) in the query expansion, our best runs achieved 0.5000, 0.4339, and 0.5546 P@30, compared to 0.4864, 0.4203, and 0.5322 with traditional PRF, and 0.4587, 0.3044, and 0.3584 MAP, compared to 0.4405, 0.2850, and 0.3492 with traditional PRF, for the 2011, 2012, and 2013 evaluation sets respectively. We explored different types of information extracted from the hyperlinked documents and found that document titles and meta-descriptions improve retrieval performance the most, whereas meta-keywords degraded it. For the 2013 test set, our hyperlink-extended approach achieved the best improvement over the PRF baseline: 0.5546 P@30 compared to 0.5322, and 0.3584 MAP compared to 0.3492. For the 2011 and 2012 test sets, the improvements over PRF were smaller: 0.5000 and 0.4339 P@30 compared to 0.4864 and 0.4203, and 0.4587 and 0.3044 MAP compared to 0.4405 and 0.2850. We showed that this behavior was due to the age of those collections: many hyperlinked documents had been taken down or moved, so we could not retrieve their content.
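
The abstract reports that page titles and meta-descriptions help while meta-keywords hurt. The following is a minimal sketch of extracting just those two fields from an already-downloaded page with Python's standard `html.parser`, then appending them to a feedback tweet before term selection. Fetching the pages, handling dead links, and the class and function names are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class TitleMetaParser(HTMLParser):
    """Collects a page's <title> text and its <meta name="description"> content.

    Meta keywords are deliberately ignored, since the abstract reports that
    they degraded retrieval performance.
    """

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def hyperlink_text(html):
    """Return 'title description' for one already-downloaded HTML page."""
    parser = TitleMetaParser()
    parser.feed(html)
    parser.close()
    parts = (parser.title.strip(), parser.description.strip())
    return " ".join(p for p in parts if p)


def extend_tweet(tweet, linked_pages):
    """Append the title/description text of each linked page to the tweet."""
    extras = [hyperlink_text(html) for html in linked_pages]
    return " ".join([tweet] + [e for e in extras if e])


# Hypothetical linked page; in practice many links from older collections are dead.
page = (
    '<html><head><title>Staff cuts at BBC</title>'
    '<meta name="description" content="The broadcaster announced job losses.">'
    '<meta name="keywords" content="bbc, jobs"></head>'
    "<body><p>Body text</p></body></html>"
)
print(extend_tweet("bbc cuts", [page]))
```

Pages that no longer exist simply contribute nothing, which mirrors the abstract's explanation for the smaller gains on the older 2011 and 2012 collections.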
Our best hyperlink-extended PRF results achieved statistically significant improvements over traditional PRF on the 2011 and 2013 test sets (paired t-test, p < 0.05). Moreover, our proposed approach outperformed the best runs reported at the TREC Microblog track in 2011 and 2013, which applied more sophisticated algorithms: it achieved 0.5000 and 0.5546 P@30, compared to 0.4551 and 0.5528 for the best TREC runs, and 0.4587 and 0.3584 MAP, compared to 0.3350 and 0.3524, on the 2011 and 2013 evaluation sets respectively. The main contributions of our work are:
1. A comprehensive study of traditional PRF for microblog retrieval under various configurations.
2. A hyperlink-based PRF approach for microblog retrieval that uses the hyperlinks embedded in initially retrieved tweets, which showed a significant improvement in retrieval effectiveness
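
The significance claim above rests on a paired t-test over per-topic scores. A minimal sketch of the statistic follows, using only the standard library. The per-topic scores are hypothetical toy values, not data from the study, and the function name is illustrative.

```python
import math
import statistics


def paired_t_statistic(baseline, system):
    """Paired t statistic: mean(d) / (stdev(d) / sqrt(n)), with d = system - baseline per topic."""
    if len(baseline) != len(system) or len(baseline) < 2:
        raise ValueError("need two equal-length score lists covering at least 2 topics")
    diffs = [s - b for b, s in zip(baseline, system)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all per-topic differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Hypothetical per-topic P@30 scores for four topics (not data from the study).
baseline_p30 = [0.4, 0.3, 0.5, 0.2]
system_p30 = [0.5, 0.5, 0.5, 0.3]
t = paired_t_statistic(baseline_p30, system_p30)
print(f"t = {t:.3f} with {len(system_p30) - 1} degrees of freedom")
```

Here t is about 2.449 with 3 degrees of freedom. That is below the two-tailed 0.05 critical value of about 3.182, so this toy difference would not count as significant. If SciPy is available, `scipy.stats.ttest_rel(system_p30, baseline_p30)` computes the same statistic together with a p-value.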

    Did you watch #TheWalkingDead last night? an examination of television hashtags and Twitter activity

    This study examined on-screen hashtags and Twitter activity associated with four television programs (The Walking Dead, Pretty Little Liars, Scandal, and Hannibal). Twitter facilitates real-time discussions, allowing “water cooler conversations” about television to occur while shows air live, and hashtags organize these conversations around topics of interest. Active viewers migrate to new media sources in search of additional content that interests them, and seeking out this complementary content increases their media enjoyment. The desire for such additional content drives viewers' behavior. Network producers also promote media convergence, using websites and social media to build word-of-mouth advertising for their programs. The combination of an abundance of exceptional programs and producer-driven media convergence may be causing viewers to feel a stronger urge to migrate to new media. A content analysis was conducted on three episodes per program, noting the use of any on-screen hashtags. Next, Twitter activity data were collected using the analytics software Radian6. Various comparisons were made, such as the number of mentions of title-based versus plot-related hashtags, and of cable versus network program hashtags. An analysis of hashtag characteristics (such as on-screen location and the length of screen time each hashtag received) showed how networks currently use on-screen hashtags and how audiences use these hashtags in their Twitter conversations. Networks are placing a higher value on audience engagement and are mining online data to better understand how existing viewers react to their shows. The upcoming Nielsen and Twitter partnership will incorporate engagement into a new television rating. By understanding how viewers use sites such as Twitter and Tumblr, networks can fine-tune their dialogue with viewers.

    Subjectivity Analysis In Opinion Mining - A Systematic Literature Review

    Subjectivity analysis determines whether text is subjective, using subjective clues. It is the first task in the opinion mining process. Subjectivity analysis differs from polarity determination in that the latter processes subjective text to determine whether its orientation is positive or negative. Many techniques have been used to separate subjective from objective text. This paper uses a systematic literature review (SLR) to compile the studies undertaken in subjectivity analysis. An SLR is a literature review that collects and critically analyses multiple studies to answer research questions. Eight research questions were formulated for this purpose. Information such as technique, corpus, representation of subjective clues, and performance was extracted from 97 articles, known as primary studies. This information was analysed to identify the strengths and weaknesses of the techniques, the elements affecting performance, and the elements missing from subjectivity analysis. The SLR found that the majority of studies use a machine learning approach to identify and learn subjective text, because subjectivity analysis is naturally viewed as a classification problem. This approach outperforms the other approaches, though its performance is currently only at a satisfactory level. Therefore, more studies are needed to improve the performance of subjectivity analysis.

    Netprov

    Netprov is an emerging interdisciplinary digital art form that offers a literature-based “show” of insightful, healing satire that is as deep as the novels of the past. This accessible history of Netprov emerges out of an ongoing conversation about the changing roles and power dynamics of author and reader in an age of real-time interactivity. Rob Wittig describes a literary genre in which all the world is a platform and all participants are players. Beyond serving as a history of the genre, this book includes tips and examples to help those new to the genre teach and create netprovs.
