People on Drugs: Credibility of User Statements in Health Communities
Online health communities are a valuable source of information for patients and physicians. However, such user-generated resources are often plagued by inaccuracies and misinformation. In this work we propose a method for automatically establishing the credibility of user-generated medical statements and the trustworthiness of their authors by exploiting linguistic cues and distant supervision from expert sources. To this end we introduce a probabilistic graphical model that jointly learns user trustworthiness, statement credibility, and language objectivity. We apply this methodology to the task of extracting rare or unknown side effects of medical drugs, one of the problems where large-scale non-expert data has the potential to complement expert medical knowledge. We show that our method can reliably extract side effects and filter out false statements, while identifying trustworthy users who are likely to contribute valuable medical information.
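A much-simplified sketch of the mutual-reinforcement idea behind this kind of joint estimation: statement credibility is scored from the trustworthiness of the users asserting it, and user trustworthiness from the credibility of their statements, with a few statements pinned by distant supervision. The paper itself uses a probabilistic graphical model with linguistic cues; the users, statements, and expert labels below are hypothetical.

```python
# Simplified fixed-point sketch, not the authors' model. Data is hypothetical.
statements_by_user = {
    "alice": ["s1", "s2"],
    "bob":   ["s2", "s3"],
    "carol": ["s3"],
}
expert_labels = {"s1": 1.0, "s3": 0.0}   # distant supervision from an expert source

users_by_statement = {}
for user, stmts in statements_by_user.items():
    for s in stmts:
        users_by_statement.setdefault(s, []).append(user)

trust = {u: 0.5 for u in statements_by_user}             # initial user trustworthiness

for _ in range(20):                                      # iterate to a fixed point
    # Statement credibility: expert label if available, else mean trust of its authors.
    credibility = {
        s: expert_labels.get(s, sum(trust[u] for u in users) / len(users))
        for s, users in users_by_statement.items()
    }
    # User trustworthiness: mean credibility of the statements the user asserts.
    trust = {
        u: sum(credibility[s] for s in stmts) / len(stmts)
        for u, stmts in statements_by_user.items()
    }

print(credibility)
print(trust)
```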
Representatively Memorable: Sampling the Right Phrase Set to Get the Text Entry Experiment Right
In text entry experiments, memorability is a desired property of the phrases used as stimuli. Unfortunately, to date there is no automated method to achieve this effect. As a result, researchers have to use either manually curated English-only phrase sets or sampling procedures that do not guarantee that phrases are memorable. In response to this need, we present a novel sampling method based on two core ideas: a multiple regression model over language-independent features, and the statistical analysis of the corpus from which phrases will be drawn. Our results show that researchers can finally use a method to successfully curate their own stimuli, targeting potentially any language or domain. The source code as well as our phrase sets are publicly available.

This work is supported by the 7th Framework Programme of the European Commission (FP7/2007-2013) under grant agreements 287576 (CASMACAT) and 600707 (tranScriptorium).

Leiva, L.A.; Sanchis-Trilles, G. (2014). Representatively Memorable: Sampling the Right Phrase Set to Get the Text Entry Experiment Right. ACM, pp. 1709-1712. https://doi.org/10.1145/2556288.2557024
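A minimal sketch of the general idea: score candidate phrases with a regression model over simple language-independent features estimated from the source corpus, then keep the highest-scoring phrases as stimuli. The features, weights, corpus, and candidates below are placeholders, not the paper's actual model or data.

```python
# Placeholder phrase-scoring sketch; not the paper's regression model.
import math
from collections import Counter

corpus = ["the cat sat on the mat", "a quick brown fox", "the mat was red"]
word_freq = Counter(w for line in corpus for w in line.split())
total_words = sum(word_freq.values())

def features(phrase):
    """Language-independent features: word count, mean word length, mean log word frequency."""
    words = phrase.split()
    mean_logfreq = sum(
        math.log((word_freq[w] + 1) / (total_words + len(word_freq))) for w in words
    ) / len(words)
    return [len(words), sum(map(len, words)) / len(words), mean_logfreq]

# Hypothetical coefficients; in practice they would be fit against human
# memorability judgments via multiple regression.
weights = [-0.2, -0.1, 0.8]

def memorability(phrase):
    return sum(w * f for w, f in zip(weights, features(phrase)))

candidates = ["the cat sat", "a quick brown fox", "the mat was red on the mat"]
print(sorted(candidates, key=memorability, reverse=True)[:2])
```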
Probabilistic Grid-based Collision Risk Prediction for Driving Application
In recent years, more and more modern cars have been equipped with perception capabilities. One of the key applications of such perception systems is the estimation of a risk of collision. This is necessary for both Advanced Driver Assistance Systems and Autonomous Navigation. Most approaches to risk estimation propose to detect and track the dynamic objects in the scene. The risk is then estimated as a Time To Collision (TTC) by projecting the object's trajectory into the future. In this paper, we propose a new grid-based approach for collision risk prediction, based on the Hybrid-Sampling Bayesian Occupancy Filter framework. The idea is to compute an estimation of the TTC for each cell of the grid, instead of reasoning on objects. This strategy avoids solving the difficult problem of multi-object detection and tracking, and provides a probabilistic estimation of the risk associated with each TTC value. After promising initial results, we propose in this paper to evaluate the relevance of the method for real on-road applications, using a real-time implementation of our method in an experimental vehicle.
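A simplified sketch of per-cell collision-risk estimation in the spirit of the grid-based approach described above (not the Hybrid-Sampling Bayesian Occupancy Filter implementation): every cell carries an occupancy probability and a velocity estimate, and a TTC is computed per cell with respect to the ego vehicle instead of per tracked object. Grid dimensions, occupancy values, and velocities below are placeholder values.

```python
# Illustrative per-cell TTC sketch; all numeric values are placeholders.
import numpy as np

GRID = 50                      # cells per side
CELL_SIZE = 0.5                # meters per cell
EGO = np.array([25, 25])       # ego vehicle cell (assumed at grid center)

rng = np.random.default_rng(1)
occupancy = rng.uniform(0.0, 1.0, (GRID, GRID))       # P(cell occupied)
velocity = rng.uniform(-5.0, 5.0, (GRID, GRID, 2))    # per-cell velocity estimate (m/s)

ys, xs = np.mgrid[0:GRID, 0:GRID]
rel_pos = np.stack([ys - EGO[0], xs - EGO[1]], axis=-1) * CELL_SIZE  # vector from ego to cell (m)

# Closing speed along the line toward the ego vehicle; positive means approaching.
dist = np.linalg.norm(rel_pos, axis=-1) + 1e-9
closing = -(velocity * rel_pos).sum(axis=-1) / dist

ttc = np.where(closing > 0, dist / np.maximum(closing, 1e-9), np.inf)

# Probabilistic risk per cell: occupancy weight for cells whose TTC is under a horizon.
HORIZON = 3.0                  # seconds
risk = occupancy * (ttc < HORIZON)
print("max per-cell risk:", risk.max(), "cells under horizon:", int((ttc < HORIZON).sum()))
```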
Cascades: A View from Audience
Cascades on online networks have been a popular subject of study in the past decade, and there is a considerable literature on phenomena such as diffusion mechanisms, virality, cascade prediction, and peer network effects. However, a basic question has received comparatively little attention: how desirable are cascades on a social media platform from the point of view of users? While versions of this question have been considered from the perspective of the producers of cascades, any answer must also take into account the effect of cascades on their audience. In this work, we seek to fill this gap by providing a consumer perspective on cascades.

Users on online networks play the dual role of producers and consumers. First, we perform an empirical study of the interaction of Twitter users with retweet cascades. We measure how often users observe retweets in their home timeline, and observe a phenomenon that we term the "Impressions Paradox": the share of impressions for cascades of size k decays much more slowly than the frequency of cascades of size k. Thus, the audience for cascades can be quite large even for rare large cascades. We also measure audience engagement with retweet cascades in comparison to non-retweeted content. Our results show that cascades often rival or exceed organic content in engagement received per impression. This result is perhaps surprising given that consumers did not opt in to see tweets from these authors. Furthermore, although cascading content is widely popular, one would expect it to eventually reach parts of the audience that may not be interested in it. Motivated by our findings, we posit a theoretical model that focuses on the effect of cascades on the audience. Our results on this model highlight the balance between retweeting as a high-quality content-selection mechanism and the role of network users in filtering irrelevant content.
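An illustrative sketch of the arithmetic behind an "Impressions Paradox"-style effect, under two assumptions that are mine rather than the paper's: cascade sizes follow a heavy-tailed (Zipf-like) distribution, and the impressions generated by a cascade grow roughly linearly with its size. Under those assumptions, the impression share at size k decays roughly one power more slowly than the frequency of cascades of size k.

```python
# Synthetic illustration, not the paper's Twitter data.
import numpy as np

rng = np.random.default_rng(0)
sizes = rng.zipf(a=2.5, size=1_000_000)   # synthetic cascade sizes, heavy-tailed
sizes = sizes[sizes <= 10_000]            # truncate the extreme tail for bookkeeping

ks = np.arange(1, sizes.max() + 1)
freq = np.bincount(sizes, minlength=sizes.max() + 1)[1:]   # number of cascades of size k
impressions = freq * ks                                    # assumed impressions contributed at size k

freq_share = freq / freq.sum()
impr_share = impressions / impressions.sum()

for k in (1, 10, 100, 1000):
    if k <= len(freq) and freq[k - 1] > 0:
        print(f"k={k:>5}  cascade share={freq_share[k-1]:.2e}  impression share={impr_share[k-1]:.2e}")
```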
A Benchmark Study on Sentiment Analysis for Software Engineering Research
A recent research trend has emerged to identify developers' emotions by applying sentiment analysis to the content of communication traces left in collaborative development environments. To overcome the limitations posed by off-the-shelf sentiment analysis tools, researchers have recently started to develop their own tools for the software engineering domain. In this paper, we report a benchmark study to assess the performance and reliability of three sentiment analysis tools specifically customized for software engineering. Furthermore, we offer a reflection on the open challenges, as they emerge from a qualitative analysis of misclassified texts.

Comment: Proceedings of the 15th International Conference on Mining Software Repositories (MSR 2018)
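A minimal sketch of the kind of benchmark such a study involves: compare several sentiment classifiers against a gold standard and against each other. The tool names, predictions, and labels below are placeholders, not the study's tools or data; scikit-learn metrics stand in for whatever evaluation the authors used.

```python
# Placeholder benchmark loop; tools and labels are hypothetical.
from itertools import combinations
from sklearn.metrics import f1_score, cohen_kappa_score

gold = ["positive", "negative", "neutral", "negative", "neutral"]
predictions = {
    "tool_a": ["positive", "negative", "neutral", "neutral", "neutral"],
    "tool_b": ["positive", "neutral", "neutral", "negative", "positive"],
    "tool_c": ["neutral", "negative", "neutral", "negative", "neutral"],
}

# Per-tool performance against the gold standard.
for name, pred in predictions.items():
    print(name, "macro-F1 =", round(f1_score(gold, pred, average="macro"), 3))

# Pairwise reliability: how often do the tools agree with each other?
for (a, pa), (b, pb) in combinations(predictions.items(), 2):
    print(a, "vs", b, "kappa =", round(cohen_kappa_score(pa, pb), 3))
```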
All Who Wander: On the Prevalence and Characteristics of Multi-community Engagement
Although analyzing user behavior within individual communities is an active and rich research domain, people usually interact with multiple communities both on- and off-line. How do users act in such multi-community environments? Although there are a host of intriguing aspects to this question, it has received much less attention in the research community in comparison to the intra-community case. In this paper, we examine three aspects of multi-community engagement: the sequence of communities that users post to, the language that users employ in those communities, and the feedback that users receive, using longitudinal posting behavior on Reddit as our main data source, and DBLP for auxiliary experiments. We also demonstrate the effectiveness of features drawn from these aspects in predicting users' future level of activity.
One might expect that a user's trajectory mimics the "settling-down" process in real life: an initial exploration of sub-communities before settling down into a few niches. However, we find that the users in our data continually post in new communities; moreover, as time goes on, they post increasingly evenly among a more diverse set of smaller communities. Interestingly, it seems that users who eventually leave the community are "destined" to do so from the very beginning, in the sense of showing significantly different "wandering" patterns very early on in their trajectories; this finding has potentially important design implications for community maintainers. Our multi-community perspective also allows us to investigate the "situation vs. personality" debate through language use across different communities.

Comment: 11 pages, data available at https://chenhaot.com/pages/multi-community.html, Proceedings of WWW 2015 (updated references)
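One plausible way to quantify how "evenly" a user posts across communities over time, in the spirit of the findings above, is the normalized entropy of the distribution of a user's posts over communities within successive time windows. This is a sketch of that idea only; it is not necessarily the paper's exact feature set, and the example data is hypothetical.

```python
# Sketch of an "evenness of posting" feature; not the paper's exact measure.
from collections import Counter
from math import log

def posting_entropy(communities):
    """Normalized entropy of a list of community names for one time window."""
    counts = Counter(communities)
    total = sum(counts.values())
    if total == 0 or len(counts) == 1:
        return 0.0
    h = -sum((c / total) * log(c / total) for c in counts.values())
    return h / log(len(counts))   # 0 = all posts in one community, 1 = perfectly even

# Hypothetical example: one user's posts grouped into consecutive time windows.
windows = [
    ["askreddit", "askreddit", "askreddit", "pics"],
    ["askreddit", "pics", "gaming", "gaming"],
    ["pics", "gaming", "python", "books", "fitness"],
]
print([round(posting_entropy(w), 2) for w in windows])   # evenness rising over time
```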
Competition and Selection Among Conventions
In many domains, a latent competition among different conventions determines which one will come to dominate. One sees such effects in the success of community jargon, of competing frames in political rhetoric, or of terminology in technical contexts. These effects have become widespread in the online domain, where the data offers the potential to study competition among conventions at a fine-grained level.

In analyzing the dynamics of conventions over time, however, even with detailed on-line data, one encounters two significant challenges. First, as conventions evolve, the underlying substance of their meaning tends to change as well, and such substantive changes confound investigations of social effects. Second, the selection of a convention takes place through the complex interactions of individuals within a community, and contention between the users of competing conventions plays a key role in a convention's evolution. Any analysis must take place in the presence of these two issues.

In this work we study a setting in which we can cleanly track the competition among conventions. Our analysis is based on the spread of low-level authoring conventions in the eprint arXiv over 24 years: by tracking the spread of macros and other author-defined conventions, we are able to study conventions that vary even as the underlying meaning remains constant. We find that the interaction among co-authors over time plays a crucial role in the selection of conventions; the distinction between more and less experienced members of the community, and the distinction between conventions with visible versus invisible effects, are both central to the underlying processes. Through our analysis we make predictions at the population level about the ultimate success of different synonymous conventions over time, and at the individual level about the outcome of "fights" between people over convention choices.

Comment: To appear in Proceedings of WWW 2017, data at https://github.com/CornellNLP/Macro
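A rough sketch of how one might begin to track competing author-defined LaTeX conventions, under my own assumptions rather than the authors' pipeline: extract simple \newcommand definitions from .tex sources and count, per year, how many papers define each synonymous macro body. The input format, the regex, and the competing notations for the real numbers are illustrative placeholders.

```python
# Illustrative macro-tracking sketch; not the authors' pipeline or data format.
import re
from collections import defaultdict

# Matches simple one-line \newcommand definitions with at most one level of nested braces.
NEWCOMMAND = re.compile(r"\\newcommand\{\\(\w+)\}(?:\[\d+\])?\{((?:[^{}]|\{[^{}]*\})*)\}")

def extract_macros(tex_source):
    r"""Return {macro_name: body} for simple \newcommand definitions."""
    return {name: body for name, body in NEWCOMMAND.findall(tex_source)}

def convention_shares(papers, competing_bodies):
    """Per-year count of papers defining each competing macro body.

    papers: iterable of (year, tex_source) pairs (placeholder input format).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for year, source in papers:
        bodies = set(extract_macros(source).values())
        for body in competing_bodies:
            if body in bodies:
                counts[year][body] += 1
    return counts

# Hypothetical competing conventions for the same concept (the real numbers):
papers = [
    (2001, r"\newcommand{\R}{\mathbb{R}} text"),
    (2001, r"\newcommand{\Reals}{{\rm I\!R}} text"),
    (2014, r"\newcommand{\R}{\mathbb{R}} text"),
]
shares = convention_shares(papers, {r"\mathbb{R}", r"{\rm I\!R}"})
print({year: dict(c) for year, c in shares.items()})
```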