59 research outputs found

    Sharing Means Renting?: An Entire-marketplace Analysis of Airbnb

    Airbnb, an online marketplace for accommodations, has experienced staggering growth accompanied by intense debates and scattered regulations around the world. Current discourse, however, is largely focused on opinions rather than empirical evidence. Here, we aim to bridge this gap by presenting the first large-scale measurement study of Airbnb, using a crawled data set containing 2.3 million listings, 1.3 million hosts, and 19.3 million reviews. We measure several key characteristics at the heart of the ongoing debate and the sharing economy. Among other findings, we show that Airbnb has reached a global yet heterogeneous coverage. Across many countries, the majority of its listings are entire homes, suggesting that Airbnb functions more like a rental marketplace than a spare-room sharing platform. Analysis of star ratings reveals a bias toward positive ratings, amplified by a bias toward using positive words in reviews. The extent of this bias is greater than in Yelp reviews, which have already been shown to exhibit a positive bias. We investigate a key issue repeatedly raised in the current debate: commercial hosts who own multiple listings on Airbnb. We find that such hosts are prevalent, that they were early movers in joining Airbnb, and that their listings are disproportionately entire homes and disproportionately located in the US. Our work advances the current understanding of how Airbnb is being used and may serve as an independent, empirical reference to inform the debate. Comment: WebSci '1

    A web observatory for the machine processability of structured data on the web

    General human intelligence is needed in order to process Linked Open Data (LOD). On the Semantic Web (SW), content is intended to be machine-processable as well. However, the extent to which a machine is able to navigate, access, and process the SW has not been extensively researched. We present LOD Observer, a web observatory that studies the Web from a machine processor's point of view. First, we reformulate the five-star model of LOD publishing in quantifiable terms. Second, we build an infrastructure that allows the model's criteria to be quantified over existing datasets. Third, we analyze a significant snapshot of the LOD cloud using this infrastructure and discuss the main problems a machine processor encounters.

    Evolution of Online User Behavior During a Social Upheaval

    Social media represent powerful tools of mass communication and information diffusion. They played a pivotal role during recent social uprisings and political mobilizations across the world. Here we present a study of the Gezi Park movement in Turkey through the lens of Twitter. We analyze over 2.3 million tweets produced during the 25 days of protests that occurred between May and June 2013. We first characterize the spatio-temporal nature of the conversation about the Gezi Park demonstrations, showing that similarity in trends of discussion mirrors geographic cues. We then describe the characteristics of the users involved in this conversation and the roles they played. We study how roles and individual influence evolved during the period of the upheaval. This analysis reveals that the conversation becomes more democratic as events unfold, with a redistribution of influence over time in the user population. We conclude by observing how the online and offline worlds are tightly intertwined, showing that exogenous events, such as political speeches or police actions, affect social media conversations and trigger changes in individual behavior. Comment: Best Paper Award at ACM Web Science 201

    Bigger Isn’t Better: The Ethical and Scientific Vices of Extra-Large Datasets in Language Models

    The use of language models in Web applications and other areas of computing and business has grown significantly over the last five years. One reason for this growth is the improvement in performance of language models on a number of benchmarks, but a side effect of these advances has been the adoption of a “bigger is always better” paradigm when it comes to the size of training, testing, and challenge datasets. Drawing on previous criticisms of this paradigm as applied to large training datasets crawled from pre-existing text on the Web, we extend the critique to challenge datasets custom-created by crowdworkers. We present several sets of criticisms in which ethical and scientific issues in language model research reinforce each other: labour injustices in crowdwork, dataset quality and inscrutability, inequities in the research community, and centralized corporate control of the technology. We also present a new type of tool that researchers can use to examine large datasets when evaluating their quality.

    Some challenges for the web observatory vision: field notes from a Southampton-Tsinghua-KAIST collaboration

    This paper outlines some challenges for the Web Observatory vision with reference to field notes from a student exchange and research collaboration in December 2013 between the University of Southampton, Tsinghua University, and KAIST. These field notes outline a methodological narrative of the practical challenges we faced in using the Web Observatory in collaborative research. We suggest that these challenges arise mainly as technical, organizational, and legal issues. The paper concludes with some proposals for the future of the Web Observatory vision.

    Text mining school inspection reports in England with R

    What people study when they study Tumblr: Classifying Tumblr-related academic research

    Purpose: Since Tumblr's launch in 2007, researchers have studied the popular social networking website. This paper identifies published Tumblr-based research, classifies it to understand its approaches and methods, and provides methodological recommendations for others. / Design/methodology/approach: Research regarding Tumblr was identified. Following a review of the literature, a classification scheme was adapted and applied to understand research focus. Papers were quantitatively classified using open-coded content analysis of method, subject, approach, and topic. / Findings: The majority of published work relating to Tumblr concentrates on conceptual issues, followed by aspects of the messages sent. This focus has evolved over time. Perceived benefits of the platform are its long-form text posts, the ability to track tags, and its multimodal nature. Severe research limitations are caused by the lack of demographic, geospatial, and temporal metadata attached to individual posts, the limited API, restricted access to data, and the large number of ephemeral posts on the site. / Research limitations/implications: This study focuses on Tumblr; the applicability of the approach to other media is not considered. We focus on published research and conference papers, so there will be book content that our method did not find. Tumblr's user numbers are falling, which may be of concern to researchers. / Practical implications: We identify practical barriers to research on the Tumblr platform, including the lack of metadata and of access to big data, which explain why Tumblr is not as popular as Twitter in academic studies. / Social implications: This paper highlights the breadth of topics covered by social media researchers, which helps us understand popular online platforms. / Originality/value: There has not yet been an overarching study of the methods and purposes of those who study Tumblr. We identify Tumblr-related research papers from the first, which appeared in July 2011, until July 2015. The classification derived here provides a framework that can be used to analyse social media research and in which to position Tumblr-related work, with recommendations on the benefits and limitations of the platform for researchers.