
    Quantifying Engagement with Citations on Wikipedia

    Wikipedia, the free online encyclopedia that anyone can edit, is one of the most visited sites on the Web and a common source of information for many users. As an encyclopedia, Wikipedia is not a source of original information, but was conceived as a gateway to secondary sources: according to Wikipedia's guidelines, facts must be backed up by reliable sources that reflect the full spectrum of views on the topic. Although citations lie at the very heart of Wikipedia, little is known about how users interact with them. To close this gap, we built client-side instrumentation for logging all interactions with links leading from English Wikipedia articles to cited references during one month, and conducted the first analysis of readers' interaction with citations on Wikipedia. We find that overall engagement with citations is low: about one in 300 page views results in a reference click (0.29% overall; 0.56% on desktop; 0.13% on mobile). Matched observational studies of the factors associated with reference clicking reveal that clicks occur more frequently on shorter pages and on pages of lower quality, suggesting that references are consulted more commonly when Wikipedia itself does not contain the information sought by the user. Moreover, we observe that recent content, open access sources and references about life events (births, deaths, marriages, etc.) are particularly popular. Taken together, our findings open the door to a deeper understanding of Wikipedia's role in a global information economy where reliability is ever less certain, and source attribution ever more vital.
    Comment: The Web Conference WWW 2020, 10 pages
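
    A rough illustration of the click-through-rate arithmetic behind the figures quoted above: reference clicks divided by page views, per platform and overall. The counts and column names in the sketch below are placeholders, not the paper's data or log schema.

```python
# Illustrative CTR computation over aggregated (placeholder) counts.
import pandas as pd

logs = pd.DataFrame({
    "platform": ["desktop", "mobile"],
    "page_views": [120_000, 300_000],       # illustrative page-view counts
    "reference_clicks": [670, 390],         # illustrative reference-click counts
})

# CTR per platform and overall: clicks / page views, expressed as a percentage
logs["ctr_pct"] = 100 * logs["reference_clicks"] / logs["page_views"]
overall_ctr = 100 * logs["reference_clicks"].sum() / logs["page_views"].sum()

print(logs[["platform", "ctr_pct"]])
print(f"overall CTR: {overall_ctr:.2f}%")
```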

    Differentiating users by language and location estimation in sentiment analysis of informal text during major public events

    In recent years there has been intense work on the analysis of social media to support marketing campaigns. A proper methodology for sentiment analysis is a crucial asset in this regard. However, when monitoring major public events, the behaviour of social media users may be strongly biased by one-off actions of the participating characters and by the sense of group belonging, which is typically linked to specific geographical areas. In this paper, we present a solution combining a location prediction methodology with an unsupervised technique for sentiment analysis to automatically assess the engagement of social network users in different countries during an event with worldwide impact. As far as the authors know, this is the first time such techniques are jointly considered. We demonstrate that the technique is coherent with the intrinsic disposition of individual users towards typical actions of the characters participating in the events, as well as with the sense of group belonging.
    Ministerio de Economía, Industria y Competitividad | Ref. TEC2016-76465-C2-2-R; Xunta de Galicia | Ref. GRC2014/046; Xunta de Galicia | Ref. ED341D R2016/01
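
    The combination described above can be sketched as an unsupervised, lexicon-based sentiment scorer applied to tweets that have already been assigned a country by a location-estimation step. In the sketch below, VADER stands in for the paper's unsupervised sentiment technique, the country labels are assumed to come from the location-prediction stage, and all records are illustrative.

```python
# Per-country average sentiment over (country, tweet) pairs, using VADER as an
# example of an unsupervised, lexicon-based sentiment scorer.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

tweets = [  # illustrative records: (estimated country, tweet text)
    ("ES", "What a goal, incredible match!"),
    ("DE", "Terrible refereeing tonight."),
    ("ES", "So proud of the team"),
]

by_country = {}
for country, text in tweets:
    by_country.setdefault(country, []).append(sia.polarity_scores(text)["compound"])

for country, scores in by_country.items():
    print(country, sum(scores) / len(scores))
```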

    On the Value of Wikipedia as a Gateway to the Web

    By linking to external websites, Wikipedia can act as a gateway to the Web. To date, however, little is known about the amount of traffic generated by Wikipedia's external links. We fill this gap in a detailed analysis of usage logs gathered from Wikipedia users' client devices. Our analysis proceeds in three steps: First, we quantify the level of engagement with external links, finding that, in one month, English Wikipedia generated 43M clicks to external websites, in roughly even parts via links in infoboxes, cited references, and article bodies. Official links listed in infoboxes have by far the highest click-through rate (CTR), 2.47% on average. In particular, official links associated with articles about businesses, educational institutions, and websites have the highest CTR, whereas official links associated with articles about geographical content, television, and music have the lowest CTR. Second, we investigate patterns of engagement with external links, finding that Wikipedia frequently serves as a stepping stone between search engines and third-party websites, effectively fulfilling information needs that search engines do not meet. Third, we quantify the hypothetical economic value of the clicks received by external websites from English Wikipedia, by estimating that the respective website owners would need to pay a total of $7–13 million per month to obtain the same volume of traffic via sponsored search. Overall, these findings shed light on Wikipedia's role not only as an important source of information, but also as a high-traffic gateway to the broader Web ecosystem.
    Comment: The Web Conference WWW 2021, 12 pages
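
    The valuation exercise in the third step boils down to multiplying monthly click volumes by an assumed sponsored-search cost per click (CPC). The sketch below uses the roughly even three-way split of the 43M clicks reported above, but the CPC figures are placeholders, not estimates from the paper.

```python
# Back-of-the-envelope sponsored-search replacement cost: clicks x assumed CPC.
clicks_per_month = {"infobox": 14_000_000, "reference": 14_000_000, "body": 15_000_000}
assumed_cpc_usd = {"infobox": 0.30, "reference": 0.20, "body": 0.25}  # illustrative CPCs

value_usd = {k: clicks_per_month[k] * assumed_cpc_usd[k] for k in clicks_per_month}
total_usd = sum(value_usd.values())

print({k: f"${v / 1e6:.1f}M" for k, v in value_usd.items()})
print(f"total: ${total_usd / 1e6:.1f}M per month")
```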

    Predicting the impact of online news articles – is information necessary? Application to COVID-19 articles

    We exploit the Twitter platform to create a dataset of news articles derived from tweets concerning COVID-19, and use the associated tweets to define a number of popularity measures. The focus on (potentially) biomedical news articles allows the quantity of biomedically valid information (as extracted by biomedical relation extraction) to be included in the list of explored features. Aside from forming part of a systematic correlation exploration, the features – ranging from the semantic relations through readability measures to the article’s digital content – are used within a number of machine learning classification and regression algorithms. Unsurprisingly, the results support the expectation that more complex articles (as determined by a readability measure) exhibit more sophisticated syntactic structure. A weak correlation is found with the information within an article, suggesting that other factors, such as the number of videos, have a notable impact on the popularity of a news article. The best popularity prediction performance is obtained using a random forest machine learning algorithm, and the feature describing the quantity of biomedical information is among the top 3 most important features in almost a third of the experiments performed. Additionally, this feature is found to be more valuable than the widely used named entity recognition.
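
    The prediction setup described above can be approximated as a random forest over article-level features followed by an importance ranking. The feature names and data in the sketch below are hypothetical stand-ins, not the paper's actual feature set or dataset.

```python
# Random forest popularity regressor plus feature-importance ranking over
# randomly generated placeholder data.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
features = ["biomedical_relations", "readability", "num_videos", "num_images", "article_length"]
X = pd.DataFrame(rng.random((200, len(features))), columns=features)
y = rng.random(200)  # placeholder popularity scores

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
ranking = sorted(zip(features, model.feature_importances_), key=lambda t: -t[1])
for name, importance in ranking[:3]:
    print(name, round(importance, 3))
```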


    Predictive Analysis on Twitter: Techniques and Applications

    Predictive analysis of social media data has attracted considerable attention from the research community as well as the business world because of the essential and actionable information it can provide. Over the years, extensive experimentation and analysis for insights have been carried out using Twitter data in various domains such as healthcare, public health, politics, social sciences, and demographics. In this chapter, we discuss techniques, approaches and state-of-the-art applications of predictive analysis of Twitter data. Specifically, we present fine-grained analysis involving aspects such as sentiment, emotion, and the use of domain knowledge in the coarse-grained analysis of Twitter data for making decisions and taking actions, and relate a few success stories.

    Extracting keywords from tweets

    In recent years, an enormous amount of information has been made available on the Internet. Social networks are among the biggest contributors to this increase in the volume of data. Twitter, in particular, has paved the way, as a social platform, for people and organisations to interact with one another, generating large volumes of data from which useful information can be extracted. Such a quantity of data can prove important, for example, if and when several individuals report symptoms of illness at the same time and in the same place. Automatically processing such a volume of information and obtaining useful knowledge from it is, however, an impossible task for any human being. Keyword extractors emerge in this context as a valuable tool that aims to facilitate this work by providing quick access to a set of terms that characterise a document. In this work, we try to contribute to a better understanding of this problem by evaluating the effectiveness of YAKE (an unsupervised keyword extraction algorithm) on a collection of tweets, a type of text characterised not only by its short length but also by its unstructured nature. Although keyword extractors have been widely applied to generic texts, such as reports and articles, their applicability to tweets is scarce, and no dataset has been formally made available so far. In this work, to overcome this problem, we chose to develop and make available a new data collection, an important contribution for the scientific community to promote new solutions in this domain. KWTweet was annotated by 15 annotators, resulting in 7736 annotated tweets. Based on this information, we were then able to evaluate the effectiveness of YAKE! against 9 unsupervised keyword extraction baselines (TextRank, KP-Miner, SingleRank, PositionRank, TopicPageRank, MultipartiteRank, TopicRank, Rake and TF.IDF). The results obtained show that YAKE! performs better than its competitors, proving its effectiveness on this type of text. Finally, we provide a demo that illustrates how YAKE! works: in this web platform, users can search by user or hashtag and thereby obtain the most relevant keywords through a word cloud.
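
    A minimal way to run the kind of extraction evaluated here is the open-source yake Python package (pip install yake); the tweets and parameter values in the sketch below are illustrative only, not the KWTweet data.

```python
# Unsupervised keyword extraction over tweet-like text with YAKE!.
# Lower scores indicate more relevant keywords.
import yake

tweets = [
    "Feeling feverish and coughing all day, half the office is out sick",
    "Flu symptoms everywhere in town this week",
]

extractor = yake.KeywordExtractor(lan="en", n=2, top=5)  # up to 2-word keyphrases
for keyword, score in extractor.extract_keywords(" ".join(tweets)):
    print(f"{score:.4f}  {keyword}")
```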

    Predicting User Engagement on Twitter with Real-World Events

    People invest time, attention, and emotion while engaging in various activities in the real world, for purposes of either awareness or participation. Social media platforms such as Twitter offer tremendous opportunities for people to become engaged in such real-world events through information sharing and communicating about these events. However, little is understood about the factors that affect people's Twitter engagement with such real-world events. In this paper, we address this question by first operationalizing a person's Twitter engagement with real-world events as posting, retweeting, or replying to tweets about such events. Next, we construct statistical models that examine multiple predictive factors associated with four different perspectives of users' Twitter engagement, and quantify their potential influence on predicting (i) the presence and (ii) the degree of the user's engagement with 643 real-world events. We also consider the effect of these factors with respect to finer-grained categories of events. We find that measures of people's prior Twitter activities, topical interests, geolocation, and social network structures are all variously correlated with their engagement with real-world events.
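
    The two prediction targets mentioned above (presence and degree of engagement) can be sketched with a simple classifier and regressor over user-level features. The feature names and generated data below are hypothetical; the paper's actual statistical models are not specified in this abstract.

```python
# Presence of engagement as a binary classification task and degree of
# engagement as a regression task, over randomly generated placeholder data.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(1)
features = ["prior_activity", "topical_interest", "geo_proximity", "network_exposure"]
X = rng.random((500, len(features)))
engaged = (rng.random(500) < 0.3).astype(int)   # placeholder: did the user engage?
degree = rng.poisson(2, 500) * engaged          # placeholder: number of event tweets

presence_model = LogisticRegression().fit(X, engaged)
degree_model = LinearRegression().fit(X[engaged == 1], degree[engaged == 1])

print(dict(zip(features, presence_model.coef_[0].round(3))))
```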

    The Web of Corruption: A Tardean Analysis of the Shifting Constructions of the Elios Scandal in the Hungarian Online News Media

    Although corruption portrayals within the news media have become a regularly analysed topic in Organisation and Management Studies, the construction of scandals within the online realm is still under-researched. Organisational scholars call for studies to analyse corruption in online media due to the highly participatory sense-making processes that distinguish this context from traditional press. Analysing scandalisation online is important because interactions in this realm define and curb corruption. This thesis responds to these points by exploring the co-production of corruption scandals within online news articles as occurring through narrative developments and hyperlink relations. To address the processual and participatory aspects of online corruption scandalisation, it engages with the theories of Gabriel Tarde. Particularly, the Tardean lens allows this thesis to analyse articles with their embedded hyperlinks as sense-making crossroads of information flows that accumulate into the rhythmical meanderings of scandal narratives. Empirically, the thesis focuses on the Hungarian organisational and political Elios scandal. It investigates the articles of the news outlets Origo and Index, and their hyperlinks. Thematic analysis is used for studying the textual data, and argumentation analysis for the hyperlink interactions. This results in the identification of three narrative-construction periods: (1) scandalisation, (2) anti-scandalisation and moderation, and (3) counter-scandalisation. The thesis shows that hyperlinks play an important role in these meaning constructions. On the one hand, hyperlinks represent online sense-making channels, leading to reliable and relevant sources. On the other hand, through the avoidance of hyperlinking opposing arguments, they contribute to one-sided meaning-constructions. Furthermore, the thesis demonstrates how the corruption scandal is gradually diverted and replaced with the sensationalist counter-scandalising Soros-narrative that provokes social currents, such as antisemitism. Overall, this thesis contributes to the literature on corruption within the media by illustrating how hyperlinks and gradual narrative-developments are strategically used to shape the meaning-constructions around scandals.