2,402 research outputs found

    Estimating tourism statistics with Wikipedia page views

    Decision makers depend on socio-economic indicators to shape the world we inhabit. Reports of these indicators are often delayed due to the effort involved in gathering and aggregating the underlying data. Our increasing interactions with large-scale technological systems are generating vast datasets on global human behaviour which are immediately accessible. Here we analyse whether data on how often people view Wikipedia articles might help us to improve estimates of the current number of tourists leaving the UK. Our analyses suggest that in the absence of sufficient history, Wikipedia page views provide an advantage. We conclude that when using adaptive models, Wikipedia usage opens up the possibility to improve estimates of tourism demand.

    On the Value of Wikipedia as a Gateway to the Web

    By linking to external websites, Wikipedia can act as a gateway to the Web. To date, however, little is known about the amount of traffic generated by Wikipedia's external links. We fill this gap in a detailed analysis of usage logs gathered from Wikipedia users' client devices. Our analysis proceeds in three steps: First, we quantify the level of engagement with external links, finding that, in one month, English Wikipedia generated 43M clicks to external websites, in roughly even parts via links in infoboxes, cited references, and article bodies. Official links listed in infoboxes have by far the highest click-through rate (CTR), 2.47% on average. In particular, official links associated with articles about businesses, educational institutions, and websites have the highest CTR, whereas official links associated with articles about geographical content, television, and music have the lowest CTR. Second, we investigate patterns of engagement with external links, finding that Wikipedia frequently serves as a stepping stone between search engines and third-party websites, effectively fulfilling information needs that search engines do not meet. Third, we quantify the hypothetical economic value of the clicks received by external websites from English Wikipedia, by estimating that the respective website owners would need to pay a total of $7–13 million per month to obtain the same volume of traffic via sponsored search. Overall, these findings shed light on Wikipedia's role not only as an important source of information, but also as a high-traffic gateway to the broader Web ecosystem. Comment: The Web Conference WWW 2021, 12 pages
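
As a rough illustration of the click-through rate (CTR) reported in the abstract above, the sketch below computes CTR as clicks divided by link impressions. The function name and the counts are hypothetical and are not taken from the paper, whose exact definition of an impression may differ.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Return clicks per impression as a fraction.

    Returns 0.0 when there are no impressions, so callers never divide by zero.
    """
    if clicks < 0 or impressions < 0:
        raise ValueError("counts must be non-negative")
    if impressions == 0:
        return 0.0
    return clicks / impressions


# Hypothetical counts, chosen for illustration only (not data from the paper).
ctr = click_through_rate(clicks=247, impressions=10_000)
print(f"CTR: {ctr:.2%}")
```

Expressing CTR as a fraction and formatting it as a percentage only at print time keeps the value usable in later arithmetic, such as averaging over link types.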

    The Impact of 9/11 and Other Terrible Global Events on Tourism in the U.S. and Hawaii

    This paper reviews recent trends in travel and tourism in the U.S. and Hawaii to ascertain how the terrorist attacks of 9/11 and subsequent terrible global events affected their tourism flows and the manner and pace of their recovery. We note that tourism in the U.S. has not fully recovered from 9/11 and other international shocks; indeed recovery of international travel to the U.S. may be a long way off. By contrast, Hawaii tourism is enjoying robust growth in the aftermath of 9/11 as growth in tourist arrivals from the U.S. mainland has more than offset declines in Japanese and other international visitors. We suggest that Hawaii's current tourism boom is in part explained by the diversion of U.S. travel from foreign travel. The paper demonstrates the usefulness of vector error correction models to generate dynamic visitor forecasts which we use to ascertain whether tourism in Hawaii has fully recovered from 9/11 and other terrible international events. The paper considers policy options for facilitating the recovery of international tourism to the U.S.

    Demographic Inference and Representative Population Estimates from Multilingual Social Media Data

    Social media provide access to behavioural data at an unprecedented scale and granularity. However, using these data to understand phenomena in a broader population is difficult due to their non-representativeness and the bias of statistical inference tools towards dominant languages and groups. While demographic attribute inference could be used to mitigate such bias, current techniques are almost entirely monolingual and fail to work in a global environment. We address these challenges by combining multilingual demographic inference with post-stratification to create a more representative population sample. To learn demographic attributes, we create a new multimodal deep neural architecture for joint classification of age, gender, and organization-status of social media users that operates in 32 languages. This method substantially outperforms the current state of the art while also reducing algorithmic bias. To correct for sampling biases, we propose fully interpretable multilevel regression methods that estimate inclusion probabilities from inferred joint population counts and ground-truth population counts. In a large experiment over multilingual heterogeneous European regions, we show that our demographic inference and bias correction together allow for more accurate estimates of populations and make a significant step towards representative social sensing in downstream applications with multilingual social media. Comment: 12 pages, 10 figures, Proceedings of the 2019 World Wide Web Conference (WWW '19)
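
The post-stratification idea mentioned in the abstract above can be sketched in its simplest form: each stratum's sample mean is reweighted by that stratum's share of the known population. This is a deliberately simplified stand-in, not the paper's multilevel regression method; the function name, strata labels, and numbers are all hypothetical.

```python
from collections import defaultdict


def poststratified_mean(samples, population_counts):
    """Estimate a population mean by basic post-stratification.

    samples: iterable of (stratum, value) pairs from a non-representative sample.
    population_counts: dict mapping stratum -> known population count.
    Only strata that appear in the sample contribute to the estimate.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for stratum, value in samples:
        sums[stratum] += value
        counts[stratum] += 1

    total_population = sum(population_counts[s] for s in counts)
    return sum(
        (population_counts[s] / total_population) * (sums[s] / counts[s])
        for s in counts
    )


# Hypothetical sample that over-represents the "young" stratum.
samples = [("young", 1.0), ("young", 1.0), ("young", 0.0), ("old", 0.0)]
population = {"young": 50, "old": 50}

raw_mean = sum(v for _, v in samples) / len(samples)
adjusted_mean = poststratified_mean(samples, population)
print(raw_mean, adjusted_mean)
```

In this toy case the raw sample mean is pulled towards the over-sampled stratum, while the reweighted estimate gives each stratum the influence its population share warrants.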

    Estimating socioeconomic indicators using online data

    Policymakers and businesses need a good understanding of the current state of society to make fully informed decisions. In contrast to traditional approaches to measuring human behaviour, which can be expensive, time-consuming and subject to delay, data on collective online behaviour, such as what people are searching for on Google, is available publicly, rapidly and at low cost. Studies into online behaviour may therefore be able to provide useful insights into collective human behaviour in the real world. Here, we investigate whether online data from social media platforms, such as Instagram and Twitter, and search engine data, specifically data from Google, can help estimate key characteristics of society. In particular, we seek to infer the number of people speaking various languages across different urban areas based on publicly exchanged messages on the photo-sharing platform Instagram. We find that such data can help estimate the spatial distribution of language usage in Greater London. In a parallel analysis, we investigate whether Twitter data is similarly useful. However, our results suggest that data from Instagram is more valuable, as a higher number of posts to the service contain location data. We also investigate whether online data can be used to help estimate economic activity. Specifically, we focus on unemployment rates in the United Kingdom and draw on data retrieved from Google Trends. Our findings reveal that Google search data can help generate quicker estimates of the current level of unemployment before official data is released. We also find that, according to some performance metrics, a variable selection technique based on an elastic net can improve model performance. This thesis highlights the potential for inferences generated from online data to complement official statistics, for example by providing quicker estimates before official figures are released. 
    We suggest that rapid, low-cost measurements of collective human behaviour from publicly available data may provide valuable new insights for policymakers and businesses alike.

    Defying easy categorization: Wikipedia as primary, secondary and tertiary resource

    Wikipedia is the world’s largest information source, used daily by millions of individuals around the world – yet such is its uniqueness and dominance that rarely is the question asked: what exactly is Wikipedia? This article sets out to explore the different categories of source that Wikipedia could be defined as (primary, secondary or tertiary) alongside the varied ways in which Wikipedia is used, which defy easy categorization, exemplified by a broad-ranging literature review and focusing on the English-language Wikipedia. It concludes that Wikipedia cannot easily be categorized in any information category but is defined instead by the ways it is used and interpreted by its users.

    Wikipedia in the eyes of its beholders: A systematic review of scholarly research on Wikipedia readers and readership

    Hundreds of scholarly studies have investigated various aspects of the immensely popular Wikipedia. Although a number of literature reviews have provided overviews of this vast body of research, none of them has specifically focused on the readers of Wikipedia and issues concerning its readership. In this systematic literature review, we review 99 studies to synthesize current knowledge regarding the readership of Wikipedia and also provide an analysis of the research methods employed. The scholarly research has found that Wikipedia is popular not only for lighter topics such as entertainment, but also for more serious topics such as health information and legal background. Scholars, librarians and students are common users of Wikipedia, and it provides a unique opportunity for educating students in digital literacy. We conclude with a summary of key findings, implications for researchers, and implications for the Wikipedia community.