A Graph-structured Dataset for Wikipedia Research

Author: Aspert Nicolas
Miz Volodymyr
Ricaud Benjamin
Vandergheynst Pierre
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/03/2019

Wikipedia is a rich and invaluable source of information. Its central place on the Web makes it a particularly interesting object of study for scientists. Researchers from different domains have used various complex datasets related to Wikipedia to study language, social behavior, knowledge organization, and network theory. Although the dataset is a scientific treasure, its large size hinders pre-processing and may be a challenging obstacle for potential new studies. This issue is particularly acute in scientific domains where researchers may lack technical and data-processing expertise. On the one hand, the size of Wikipedia dumps is large, which makes parsing and extracting relevant information cumbersome. On the other hand, the API is straightforward to use but restricted to a relatively small number of requests. The middle ground is the mesoscopic scale, where researchers need a subset of Wikipedia ranging from thousands to hundreds of thousands of pages, but no efficient solution exists at this scale. In this work, we propose an efficient data structure to make requests and access subnetworks of Wikipedia pages and categories. We provide convenient tools for accessing and filtering viewership statistics or "pagecounts" of Wikipedia web pages. The dataset organization leverages principles of graph databases that allow rapid and intuitive access to subgraphs of Wikipedia articles and categories. The dataset and deployment guidelines are available on the LTS2 website: https://lts2.epfl.ch/Datasets/Wikipedia/
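As a rough illustration of what "access to subgraphs" can mean at the mesoscopic scale, the sketch below extracts every page within a fixed number of link hops of a starting page from an in-memory adjacency map. It is not the authors' data structure or tooling: the function name `subgraph_within`, the `links` dictionary, and the page names are hypothetical, and a real deployment would query a graph database rather than a Python dict.

```python
from collections import deque


def subgraph_within(links, start, max_hops):
    """Return the subgraph induced by pages at most `max_hops` links from `start`.

    `links` is a hypothetical adjacency map: page title -> list of linked titles.
    """
    hops = {start: 0}          # page -> number of hops from the start page
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if hops[page] == max_hops:
            continue           # do not expand beyond the hop limit
        for neighbor in links.get(page, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[page] + 1
                queue.append(neighbor)
    # Keep only the links whose endpoints both lie inside the selected set.
    return {p: [n for n in links.get(p, ()) if n in hops] for p in hops}


# Made-up toy data, for illustration only.
toy_links = {"A": ["B"], "B": ["C"], "C": ["D"]}
print(subgraph_within(toy_links, "A", 2))
```

A breadth-first traversal is used here because it visits pages in order of hop distance, so the hop limit can be enforced without revisiting nodes.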

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Link Prediction with Mutual Attention for Text-Attributed Networks

Author: Brochier Robin
Guille Adrien
Velcin Julien
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/03/2019

In this extended abstract, we present an algorithm that learns a similarity measure between documents from the network topology of a structured corpus. We leverage the Scaled Dot-Product Attention, a recently proposed attention mechanism, to design a mutual attention mechanism between pairs of documents. To train its parameters, we use the network links as supervision. We provide preliminary experiment results with a citation dataset on two prediction tasks, demonstrating the capacity of our model to learn a meaningful textual similarity. Comment: Added missing referenc
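For readers unfamiliar with the mechanism named in this abstract, the following is a minimal pure-Python sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, together with one possible way to turn it into a symmetric score between two documents. Only the attention formula itself is standard; the `mutual_attention_similarity` function, its mean pooling, and the toy word vectors are assumptions made for illustration and are not the authors' model, which also learns parameters from network links.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of equal-length vectors."""
    d_k = len(keys[0])
    output = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        output.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return output


def mutual_attention_similarity(doc_a, doc_b):
    """Hypothetical similarity: each document attends over the other's word
    vectors, the results are mean-pooled, and the two pooled vectors are
    compared with a dot product. No learned parameters are involved."""
    a_given_b = scaled_dot_product_attention(doc_a, doc_b, doc_b)
    b_given_a = scaled_dot_product_attention(doc_b, doc_a, doc_a)

    def mean_pool(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    u, v = mean_pool(a_given_b), mean_pool(b_given_a)
    return sum(x * y for x, y in zip(u, v))


# Made-up 2-dimensional word vectors, for illustration only.
doc_a = [[1.0, 0.0], [0.5, 0.5]]
doc_b = [[0.0, 1.0], [1.0, 1.0], [0.2, 0.1]]
print(mutual_attention_similarity(doc_a, doc_b))
```

Because each document attends over the other, the score is symmetric in its two arguments, which is a convenient property for an undirected link prediction task.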

arXiv.org e-Print Archive

WebProtégé: A Cloud-Based Ontology Editor

Author: Gonçalves Rafael S.
Horridge Matthew
Musen Mark A.
Nyulas Csongor I.
Tudorache Tania
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 05/03/2019

We present WebProtégé, a tool to develop ontologies represented in the Web Ontology Language (OWL). WebProtégé is a cloud-based application that allows users to collaboratively edit OWL ontologies, and it is available for use at https://webprotege.stanford.edu. WebProtégé currently hosts more than 68,000 OWL ontology projects and has over 50,000 user accounts. In this paper, we detail the main new features of the latest version of WebProtégé.

arXiv.org e-Print Archive

Crossref

Are All Successful Communities Alike? Characterizing and Predicting the Success of Online Communities

Author: Craswell Nick
Hamilton W.
Jurgens David
Newell Edward
Platt E
Romero Daniel M.
Tan Chenhao
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 18/03/2019

The proliferation of online communities has created exciting opportunities to study the mechanisms that explain group success. While a growing body of research investigates community success through a single measure -- typically, the number of members -- we argue that there are multiple ways of measuring success. Here, we present a systematic study to understand the relations between these success definitions and test how well they can be predicted based on community properties and behaviors from the earliest period of a community's lifetime. We identify four success measures that are desirable for most communities: (i) growth in the number of members; (ii) retention of members; (iii) long-term survival of the community; and (iv) volume of activities within the community. Surprisingly, we find that our measures do not exhibit very high correlations, suggesting that they capture different types of success. Additionally, we find that different success measures are predicted by different attributes of online communities, suggesting that success can be achieved through different behaviors. Our work sheds light on the basic understanding of what success represents in online communities and what predicts it. Our results suggest that success is multi-faceted and can be neither measured nor predicted by a single measurement. This insight has practical implications for the creation of new online communities and the design of platforms that facilitate such communities. Comment: To appear at The Web Conference 201

arXiv.org e-Print Archive

Crossref

Characterization of Local Attitudes Toward Immigration Using Social Media

Author: Baeza-Yates Ricardo
Carvacho Héctor
Conover Michael
Darwish Kareem
Garcia-Gavilanes Ruth
González-Ibánez Roberto
Hainmueller Jens
Harman GACCT
Herek M
Pennebaker W
Quercia Daniele
Sniderman M
Stephan Cookie White
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 12/03/2019

Migration is a worldwide phenomenon that may generate different reactions in the population. Attitudes vary from those that support multiculturalism and communion between locals and foreigners, to contempt and hatred toward immigrants. Since anti-immigration attitudes often materialize in acts of violence and discrimination, it is important to identify factors that characterize these attitudes. However, doing so is expensive and impractical, as traditional methods require enormous efforts to collect data. In this paper, we propose to leverage Twitter to characterize local attitudes toward immigration, with a case study on Chile, where the immigrant population has increased drastically in recent years. Using semi-supervised topic modeling, we placed 49K users on a spectrum ranging from in favor of to against immigration. We characterized both sides of the spectrum in two aspects: the emotions and lexical categories relevant for each attitude, and the discussion network structure. We found that the discussion is mostly driven by Haitian immigration; that there are temporal trends in tendency and polarity of discussion; and that assortative behavior on the network differs with respect to attitude. These insights may inform policy makers on how people feel with respect to migration, with potential implications for the communication of policy and the design of interventions to improve inter-group relations. Comment: 8 pages, accepted at Latin American Web Congress 2019 (co-located with The Web Conference

arXiv.org e-Print Archive

Crossref

Multimodal Emotion Classification

Author: Kursuncu U
Mohammad M
Novak Petra Kralj
Sermanet Pierre
Suet Yan Liew Jasy
Zhang Y
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 13/03/2019

Most NLP and Computer Vision tasks are limited by the scarcity of labelled data. In social media emotion classification and other related tasks, hashtags have been used as indicators to label data. With the rapid increase in emoji usage on social media, emojis are used as an additional feature for major social NLP tasks. However, this is less explored in the case of multimedia posts on social media, where posts are composed of both image and text. At the same time, we have seen a surge of interest in incorporating domain knowledge to improve machine understanding of text. In this paper, we investigate whether domain knowledge for emoji can improve the accuracy of the emotion classification task. We exploit the importance of different modalities from social media posts for the emotion classification task using state-of-the-art deep learning architectures. Our experiments demonstrate that the three modalities (text, emoji and images) encode different information to express emotion and therefore can complement each other. Our results also demonstrate that emoji sense depends on the textual context, and that emoji combined with text encodes better information than either considered separately. The highest accuracy of 71.98% is achieved with training data of 550k posts. Comment: Accepted at the 2nd Emoji Workshop co-located with The Web Conference 201

arXiv.org e-Print Archive

Crossref

Scholar Commons - Institutional Repository of the University of South Carolina

CORE