
    Understanding Social Media through Large Volume Measurements

    The amount of user-generated web content has grown drastically in the past 15 years, and many social media services are exceedingly popular nowadays. In this thesis we study social media content creation and consumption through large-volume measurements of three prominent social media services: Twitter, YouTube, and Wikipedia. Common to these services is that they have millions of users, they are free to use, and their users can both create and consume content. The motivation behind this thesis is to examine how users create and consume social media content, to investigate why social media services are as popular as they are and what drives people to contribute to them, and to see whether it is possible to model the conduct of the users. We study how various aspects of social media content, such as its creation, consumption, and popularity, can be measured, characterized, and linked to real-world occurrences; the patterns we find could also be exploited in content distribution, replication, and storage. We have gathered more than 20 million tweets, metadata of more than 10 million YouTube videos, and a complete six-year page view history of 19 different Wikipedia language editions. We show, for example, daily and hourly patterns of content creation and consumption, content popularity distributions, characteristics of popular content, and user statistics. We also compare social media with traditional news services and show the interaction between social media, news, and stock prices. In addition, we combine natural language processing with social media analysis and discover interesting correlations between news and social media content. Moreover, we discuss the importance of correct measurement methods and show the effects of different sampling methods, using YouTube measurements as an example.

    Exploiting Cross-Lingual Representations For Natural Language Processing

    Traditional approaches to supervised learning require a generous amount of labeled data for good generalization. While such annotation-heavy approaches have proven useful for some Natural Language Processing (NLP) tasks in high-resource languages (like English), they are unlikely to scale to languages where collecting labeled data is difficult and time-consuming. Translating supervision available in English is also not a viable solution, because developing a good machine translation system requires expensive-to-annotate resources which are not available for most languages. In this thesis, I argue that cross-lingual representations are an effective means of extending NLP tools to languages beyond English without resorting to generous amounts of annotated data or expensive machine translation. These representations can be learned in an inexpensive manner, often from signals completely unrelated to the task of interest. I begin with a review of different ways of inducing such representations using a variety of cross-lingual signals and study algorithmic approaches to using them in a diverse set of downstream tasks. Examples of such tasks covered in this thesis include learning representations to transfer a trained model across languages for document classification, to assist in monolingual lexical semantics tasks like word sense induction, to identify asymmetric lexical relationships like hypernymy between words in different languages, and to combine supervision across languages through a shared feature space for cross-lingual entity linking. In all these applications, the representations make information expressed in other languages available in English, while requiring minimal additional supervision in the language of interest.

    What We Know About Wikipedia: A Review of the Literature Analyzing the Project(s).

    This article proposes a review of the literature analyzing Wikipedia as a collective system for producing knowledge. JEL Classification: L39, L86, H41, D7

    Wikipedia @ 20

    Wikipedia’s first twenty years: how what began as an experiment in collaboration became the world’s most popular reference work. We have been looking things up in Wikipedia for twenty years. What began almost by accident—a wiki attached to a nascent online encyclopedia—has become the world’s most popular reference work. Regarded at first as the scholarly equivalent of a Big Mac, Wikipedia is now known for its reliable sourcing and as a bastion of (mostly) reasoned interaction. How has Wikipedia, built on a model of radical collaboration, remained true to its original mission of “free access to the sum of all human knowledge” when other tech phenomena have devolved into advertising platforms? In this book, scholars, activists, and volunteers reflect on Wikipedia’s first twenty years, revealing connections across disciplines and borders, languages and data, the professional and personal. The contributors consider Wikipedia’s history, the richness of the connections that underpin it, and its founding vision. Their essays look at, among other things, the shift from bewilderment to respect in press coverage of Wikipedia; Wikipedia as “the most important laboratory for social scientific and computing research in history”; and the acknowledgment that “free access” includes not just access to the material but freedom to contribute—that the summation of all human knowledge is biased by who documents it. Contributors Phoebe Ayers, Omer Benjakob, Yochai Benkler, William Beutler, Siko Bouterse, Rebecca Thorndike-Breeze, Amy Carleton, Robert Cummings, LiAnna L. Davis, Siân Evans, Heather Ford, Stephen Harrison, Heather Hart, Benjamin Mako Hill, Dariusz Jemielniak, Brian Keegan, Jackie Koerner, Alexandria Lockett, Jacqueline Mabey, Katherine Maher, Michael Mandiberg, Stephane Coillet-Matillon, Cecelia A. Musselman, Eliza Myrie, Jake Orlowitz, Ian A. 
Ramjohn, Joseph Reagle, Anasuya Sengupta, Aaron Shaw, Melissa Tamani, Jina Valentine, Matthew Vetter, Adele Vrana, Denny Vrandečić.

    Critical point of view: a Wikipedia reader

    For millions of internet users around the globe, the search for new knowledge begins with Wikipedia. The encyclopedia’s rapid rise, novel organization, and freely offered content have been marveled at and denounced by a host of commentators. Critical Point of View moves beyond unflagging praise, well-worn facts, and questions about its reliability and accuracy to unveil the complex, messy, and controversial realities of a distributed knowledge platform. The essays, interviews, and artworks brought together in this reader form part of the overarching Critical Point of View research initiative, which began with a conference in Bangalore (January 2010), followed by events in Amsterdam (March 2010) and Leipzig (September 2010). With an emphasis on theoretical reflection, cultural difference, and indeed critique, contributions to this collection ask: What values are embedded in Wikipedia’s software? On what basis are Wikipedia’s claims to neutrality made? How can Wikipedia give voice to those outside the Western tradition of Enlightenment, or even its own administrative hierarchies? Critical Point of View collects original insights on the next generation of wiki-related research, from radical artistic interventions and the significant role of bots to hidden trajectories of encyclopedic knowledge and the politics of agency and exclusion.