A Corpus of Sentence-level Revisions in Academic Writing: A Step towards Understanding Statement Strength in Communication
The strength with which a statement is made can have a significant impact on
the audience. For example, international relations can be strained by how the
media in one country describes an event in another; and papers can be rejected
because they overstate or understate their findings. It is thus important to
understand the effects of statement strength. A first step is to be able to
distinguish between strong and weak statements. However, even this problem is
understudied, partly due to a lack of data. Since strength is inherently
relative, revisions of texts that make claims are a natural source of data on
strength differences. In this paper, we introduce a corpus of sentence-level
revisions from academic writing. We also describe insights gained from our
annotation efforts for this task.
Comment: 6 pages, to appear in Proceedings of ACL 2014 (short paper)
Tracing Community Genealogy: How New Communities Emerge from the Old
The process by which new communities emerge is a central research issue in
the social sciences. While a growing body of research analyzes the formation of
a single community by examining social networks between individuals, we
introduce a novel community-centered perspective. We highlight the fact that
the context in which a new community emerges contains numerous existing
communities. We reveal how communities emerge by tracing their
early members' previous community memberships.
Our testbed is Reddit, a website that consists of tens of thousands of
user-created communities. We analyze a dataset that spans over a decade and
includes the posting history of users on Reddit from its inception to April
2017. We first propose a computational framework for building genealogy graphs
between communities. We present the first large-scale characterization of such
genealogy graphs. Surprisingly, basic graph properties, such as the number of
parents and max parent weight, converge quickly despite the fact that the
number of communities increases rapidly over time. Furthermore, we investigate
the connection between a community's origin and its future growth. Our results
show that strong parent connections are associated with future community
growth, confirming the importance of existing community structures in which a
new community emerges. Finally, we turn to the individual level and examine the
characteristics of early members. We find that a diverse portfolio across
existing communities is the most important predictor for becoming an early
member in a new community.
Comment: 10 pages, 7 figures, to appear in Proceedings of ICWSM 2018, data and
more at https://chenhaot.com/papers/community-genealogy.htm
Friendships, Rivalries, and Trysts: Characterizing Relations between Ideas in Texts
Understanding how ideas relate to each other is a fundamental question in
many domains, ranging from intellectual history to public communication.
Because ideas are naturally embedded in texts, we propose the first framework
to systematically characterize the relations between ideas based on their
occurrence in a corpus of documents, independent of how these ideas are
represented. Combining two statistics --- cooccurrence within documents and
prevalence correlation over time --- our approach reveals a number of different
ways in which ideas can cooperate and compete. For instance, two ideas can
closely track each other's prevalence over time, and yet rarely cooccur, almost
like a "cold war" scenario. We observe that pairwise cooccurrence and
prevalence correlation exhibit different distributions. We further demonstrate
that our approach is able to uncover intriguing relations between ideas through
in-depth case studies on news articles and research papers.
Comment: 11 pages, 9 figures, to appear in Proceedings of ACL 2017, code and
data available at https://chenhaot.com/pages/idea-relations.html (fixed a
typo)
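The two statistics named in the abstract above can be illustrated with a small sketch. It assumes pointwise mutual information (PMI) as the cooccurrence measure and Pearson correlation as the prevalence-correlation measure; the abstract does not specify either, and the toy corpus `DOCS` and all function names are hypothetical. The data are built to mimic the "cold war" case: two ideas whose prevalence moves together but which never share a document.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus: (time step, set of ideas mentioned in the document).
DOCS = (
    [(1, {"A"}), (1, {"B"}), (1, set()), (1, set())]
    + [(2, {"A"}), (2, {"A"}), (2, {"B"}), (2, {"B"})]
    + [(3, {"A"}), (3, {"B"})] + [(3, set())] * 6
)

def pmi(docs, a, b):
    """Cooccurrence within documents, measured here (by assumption) as PMI."""
    n = len(docs)
    p_a = sum(1 for _, ideas in docs if a in ideas) / n
    p_b = sum(1 for _, ideas in docs if b in ideas) / n
    p_ab = sum(1 for _, ideas in docs if a in ideas and b in ideas) / n
    if p_a == 0 or p_b == 0 or p_ab == 0:
        return float("-inf")  # the two ideas never appear together
    return math.log(p_ab / (p_a * p_b))

def prevalence(docs, idea):
    """Fraction of documents mentioning `idea` at each time step, in time order."""
    total, hits = defaultdict(int), defaultdict(int)
    for t, ideas in docs:
        total[t] += 1
        hits[t] += idea in ideas
    return [hits[t] / total[t] for t in sorted(total)]

def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy) if sx and sy else 0.0

cooccurrence = pmi(DOCS, "A", "B")
correlation = pearson(prevalence(DOCS, "A"), prevalence(DOCS, "B"))
print(cooccurrence, correlation)
```

In this toy corpus the two ideas rise and fall together over the three time steps, so the prevalence correlation is high, while the cooccurrence score is as low as it can be because no document mentions both.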
Urban Dreams of Migrants: A Case Study of Migrant Integration in Shanghai
Unprecedented human mobility has driven rapid urbanization around the
world. In China, the fraction of the population dwelling in cities increased from
17.9% to 52.6% between 1978 and 2012. Such large-scale migration poses
challenges for policymakers and important questions for researchers. To
investigate the process of migrant integration, we employ a one-month complete
dataset of telecommunication metadata in Shanghai with 54 million users and 698
million call logs. We find systematic differences between locals and migrants
in their mobile communication networks and geographical locations. For
instance, migrants have more diverse contacts and move around the city with a
larger radius than locals after they settle down. By distinguishing new
migrants (who recently moved to Shanghai) from settled migrants (who have been
in Shanghai for a while), we demonstrate the integration process of new
migrants in their first three weeks. Moreover, we formulate classification
problems to predict whether a person is a migrant. Our classifier is able to
achieve an F1-score of 0.82 when distinguishing settled migrants from locals,
but it remains challenging to identify new migrants because of class imbalance.
This classification setup holds promise for identifying new migrants who will
successfully integrate with locals (new migrants that are misclassified as locals).
Comment: A modified version. The paper was accepted by AAAI 201
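The effect of class imbalance on F1 mentioned in the abstract above can be seen in a short worked example. This is only an illustration with hypothetical numbers, not the paper's classifier or data: it holds a detector's true-positive and false-positive rates fixed and changes only how rare the positive class is.

```python
def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def f1_at_prevalence(n, prevalence, tpr=0.8, fpr=0.1):
    """F1 for a hypothetical classifier with fixed rates on n people.

    prevalence is the share of positives (e.g. new migrants); tpr and fpr
    are assumed values chosen for illustration only.
    """
    positives = n * prevalence
    negatives = n - positives
    tp = tpr * positives
    fn = positives - tp
    fp = fpr * negatives
    return f1_score(tp, fp, fn)

balanced = f1_at_prevalence(1000, 0.5)   # positives are half the population
rare = f1_at_prevalence(1000, 0.01)      # positives are 1% of the population
print(round(balanced, 3), round(rare, 3))
```

With these assumed rates, F1 falls from about 0.84 in the balanced case to about 0.14 when positives make up 1% of the population, because false positives from the large negative class swamp the few true positives.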