A Graph-structured Dataset for Wikipedia Research
Wikipedia is a rich and invaluable source of information. Its central place
on the Web makes it a particularly interesting object of study for scientists.
Researchers from different domains have used various complex datasets related to
Wikipedia to study language, social behavior, knowledge organization, and
network theory. While being a scientific treasure, the large size of the
dataset hinders pre-processing and may be a challenging obstacle for potential
new studies. This issue is particularly acute in scientific domains where
researchers may lack technical and data-processing expertise. On the one hand,
Wikipedia dumps are large, which makes parsing and extracting relevant
information cumbersome. On the other hand, the API is straightforward
to use but restricted to a relatively small number of requests. The middle
ground is at the mesoscopic scale when researchers need a subset of Wikipedia
ranging from thousands to hundreds of thousands of pages but there exists no
efficient solution at this scale.
In this work, we propose an efficient data structure to make requests and
access subnetworks of Wikipedia pages and categories. We provide convenient
tools for accessing and filtering viewership statistics or "pagecounts" of
Wikipedia web pages. The dataset organization leverages principles of graph
databases that allow rapid and intuitive access to subgraphs of Wikipedia
articles and categories. The dataset and deployment guidelines are available on
the LTS2 website: https://lts2.epfl.ch/Datasets/Wikipedia/
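The kind of subgraph access described in this abstract can be illustrated with a minimal sketch. It is not the authors' tooling: the adjacency dictionary `links`, the function name `subgraph_within_hops`, and the hop limit are all hypothetical, and a real deployment would query a graph database rather than an in-memory dictionary.

```python
from collections import deque


def subgraph_within_hops(links, seed, max_hops):
    """Return the subgraph of pages reachable from `seed` in at most `max_hops` links.

    `links` is a hypothetical adjacency dict: page title -> list of linked page titles.
    """
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        if distance[page] == max_hops:
            continue  # do not expand pages already at the hop limit
        for neighbor in links.get(page, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[page] + 1
                queue.append(neighbor)
    # Keep only the links whose target is also inside the extracted subgraph.
    return {
        page: [n for n in links.get(page, ()) if n in distance]
        for page in distance
    }


# Toy example with made-up page names.
toy_links = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
toy_subgraph = subgraph_within_hops(toy_links, "A", 2)
```

A breadth-first traversal is used here because it visits pages in order of link distance, so the hop limit bounds the size of the extracted subset.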
Link Prediction with Mutual Attention for Text-Attributed Networks
In this extended abstract, we present an algorithm that learns a similarity
measure between documents from the network topology of a structured corpus. We
leverage the Scaled Dot-Product Attention, a recently proposed attention
mechanism, to design a mutual attention mechanism between pairs of documents.
To train its parameters, we use the network links as supervision. We provide
preliminary experiment results with a citation dataset on two prediction tasks,
demonstrating the capacity of our model to learn a meaningful textual
similarity.
Comment: Added missing referenc
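For readers unfamiliar with the mechanism, Scaled Dot-Product Attention computes softmax(QK^T / sqrt(d_k))V. The sketch below implements that formula in plain Python and then applies it in both directions between two documents. The abstract does not specify how the two attention outputs are combined, so the mean pooling and dot-product score in `mutual_attention_similarity` are assumptions made for illustration only, as are all function names; the learned parameters of the actual model are omitted.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(keys[0])
    output = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        output.append(
            [
                sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))
            ]
        )
    return output


def mean_pool(rows):
    """Average a list of equal-length vectors into one vector."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def mutual_attention_similarity(doc_a, doc_b):
    """Hypothetical score: each document attends to the other, then pooled
    outputs are compared with a dot product (an assumption, not the paper's method)."""
    a_attends_b = scaled_dot_product_attention(doc_a, doc_b, doc_b)
    b_attends_a = scaled_dot_product_attention(doc_b, doc_a, doc_a)
    pooled_a = mean_pool(a_attends_b)
    pooled_b = mean_pool(b_attends_a)
    return sum(x * y for x, y in zip(pooled_a, pooled_b))


# Toy word embeddings for two short documents (made-up values).
doc_a = [[1.0, 0.0], [0.5, 0.5]]
doc_b = [[0.0, 1.0], [1.0, 1.0], [0.2, 0.1]]
score = mutual_attention_similarity(doc_a, doc_b)
```

In a trained model, the queries, keys, and values would be learned projections of the word embeddings, and the network links would provide the supervision signal for the resulting score.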
WebProtégé: A Cloud-Based Ontology Editor
We present WebProtégé, a tool to develop ontologies represented in the
Web Ontology Language (OWL). WebProtégé is a cloud-based application that
allows users to collaboratively edit OWL ontologies, and it is available for
use at https://webprotege.stanford.edu. WebProtégé currently hosts more
than 68,000 OWL ontology projects and has over 50,000 user accounts. In this
paper, we detail the main new features of the latest version of WebProtégé.
Are All Successful Communities Alike? Characterizing and Predicting the Success of Online Communities
The proliferation of online communities has created exciting opportunities to
study the mechanisms that explain group success. While a growing body of
research investigates community success through a single measure -- typically,
the number of members -- we argue that there are multiple ways of measuring
success. Here, we present a systematic study to understand the relations
between these success definitions and test how well they can be predicted based
on community properties and behaviors from the earliest period of a community's
lifetime. We identify four success measures that are desirable for most
communities: (i) growth in the number of members; (ii) retention of members;
(iii) long-term survival of the community; and (iv) volume of activities within
the community. Surprisingly, we find that our measures do not exhibit very high
correlations, suggesting that they capture different types of success.
Additionally, we find that different success measures are predicted by
different attributes of online communities, suggesting that success can be
achieved through different behaviors. Our work sheds light on the basic
understanding of what success represents in online communities and what
predicts it. Our results suggest that success is multi-faceted and can be
neither measured nor predicted by a single measurement. This insight has practical
implications for the creation of new online communities and the design of
platforms that facilitate such communities.
Comment: To appear at The Web Conference 201
Characterization of Local Attitudes Toward Immigration Using Social Media
Migration is a worldwide phenomenon that may generate different reactions in
the population. Attitudes vary from those that support multiculturalism and
communion between locals and foreigners, to contempt and hatred toward
immigrants. Since anti-immigration attitudes are often materialized in acts of
violence and discrimination, it is important to identify factors that
characterize these attitudes. However, doing so is expensive and impractical,
as traditional methods require enormous efforts to collect data. In this paper,
we propose to leverage Twitter to characterize local attitudes toward
immigration, with a case study on Chile, where the immigrant population has
drastically increased in recent years. Using semi-supervised topic modeling, we
placed 49K users on a spectrum ranging from in-favor to against
immigration. We characterized both sides of the spectrum in two aspects: the
emotions and lexical categories relevant for each attitude, and the discussion
network structure. We found that the discussion is mostly driven by Haitian
immigration; that there are temporal trends in tendency and polarity of
discussion; and that assortative behavior on the network differs with respect
to attitude. These insights may inform policy makers on how people feel with
respect to migration, with potential implications on communication of policy
and the design of interventions to improve inter-group relations.
Comment: 8 pages, accepted at Latin American Web Congress 2019 (co-located
with The Web Conference
Multimodal Emotion Classification
Most NLP and Computer Vision tasks are limited by the scarcity of labelled data.
In social media emotion classification and other related tasks, hashtags have
been used as indicators to label data. With the rapid increase in emoji usage
on social media, emojis are used as an additional feature for major social NLP
tasks. However, this is less explored in the case of multimedia posts on social
media, where posts are composed of both image and text. At the same time, we
have seen a surge of interest in incorporating domain knowledge to improve
machine understanding of text. In this paper, we investigate whether domain
knowledge for emoji can improve the accuracy of the emotion classification task. We
exploit the importance of different modalities in social media posts for the
emotion classification task using state-of-the-art deep learning architectures.
Our experiments demonstrate that the three modalities (text, emoji and images)
encode different information to express emotion and therefore can complement
each other. Our results also demonstrate that emoji sense depends on the
textual context, and that emoji combined with text encode better information than
either considered separately. The highest accuracy of 71.98% is achieved with
training data of 550k posts.
Comment: Accepted at the 2nd Emoji Workshop co-located with The Web Conference 201