    People on Drugs: Credibility of User Statements in Health Communities

    Online health communities are a valuable source of information for patients and physicians. However, such user-generated resources are often plagued by inaccuracies and misinformation. In this work we propose a method for automatically establishing the credibility of user-generated medical statements and the trustworthiness of their authors by exploiting linguistic cues and distant supervision from expert sources. To this end we introduce a probabilistic graphical model that jointly learns user trustworthiness, statement credibility, and language objectivity. We apply this methodology to the task of extracting rare or unknown side-effects of medical drugs, one of the problems where large-scale non-expert data has the potential to complement expert medical knowledge. We show that our method can reliably extract side-effects and filter out false statements, while identifying trustworthy users that are likely to contribute valuable medical information.

    Representatively Memorable: Sampling the Right Phrase Set to Get the Text Entry Experiment Right

    In text entry experiments, memorability is a desired property of the phrases used as stimuli. Unfortunately, to date there is no automated method to achieve this effect. As a result, researchers have to use either manually curated English-only phrase sets or sampling procedures that do not guarantee that the sampled phrases are memorable. In response to this need, we present a novel sampling method based on two core ideas: a multiple regression model over language-independent features, and the statistical analysis of the corpus from which phrases will be drawn. Our results show that researchers can finally use a method to successfully curate their own stimuli targeting potentially any language or domain. The source code as well as our phrase sets are publicly available. This work is supported by the 7th Framework Program of the European Commission (FP7/2007-13) under grant agreements 287576 (CASMACAT) and 600707 (tranScriptorium). Leiva, L. A.; Sanchis-Trilles, G. (2014). Representatively Memorable: Sampling the Right Phrase Set to Get the Text Entry Experiment Right. ACM. 1709-1712. https://doi.org/10.1145/2556288.2557024
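    The abstract does not state which features or which exact model form are used, so the following is only a minimal sketch of the general idea under stated assumptions. It fits an ordinary least-squares multiple regression from per-phrase features to a memorability proxy and then keeps the candidates with the best predicted score. Several choices here are hypothetical and are not taken from the paper: the three features (character count, word count, mean corpus surprisal of the words), the lower-is-better target (for example, a retyping error rate), and the helper names phrase_features, fit_regression, and sample_phrases.

```python
import math

import numpy as np


def phrase_features(phrase, word_counts, total_words):
    """Hypothetical language-independent features for one phrase:
    character count, word count, and mean surprisal (negative log
    probability) of its words under the source corpus."""
    words = phrase.split()
    if not words:
        raise ValueError("phrase must contain at least one word")
    vocab = len(word_counts)
    # Add-one smoothing (plus one slot for unseen words) keeps surprisal finite.
    surprisal = [
        -math.log((word_counts.get(w, 0) + 1) / (total_words + vocab + 1))
        for w in words
    ]
    return [len(phrase), len(words), sum(surprisal) / len(words)]


def fit_regression(X, y):
    """Ordinary least-squares multiple regression with an intercept.
    Returns [intercept, w_1, ..., w_d]."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef


def sample_phrases(candidates, coef, word_counts, total_words, n):
    """Keep the n candidates with the lowest predicted score, assuming the
    regression target is one where lower means more memorable."""
    def predicted(phrase):
        x = phrase_features(phrase, word_counts, total_words)
        return coef[0] + float(np.dot(coef[1:], x))
    return sorted(candidates, key=predicted)[:n]
```

    In practice the coefficients would come from fit_regression on phrases whose memorability proxy was measured. The corpus statistics (word_counts, total_words) are the hook for the "statistical analysis of the corpus" step, which keeps the features independent of any particular language.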

    Probabilistic Grid-based Collision Risk Prediction for Driving Application

    In recent years, more and more modern cars have been equipped with perception capabilities. One of the key applications of such perception systems is the estimation of collision risk, which is necessary for both Advanced Driver Assistance Systems and Autonomous Navigation. Most approaches to risk estimation detect and track the dynamic objects in the scene; the risk is then estimated as a Time To Collision (TTC) by projecting each object's trajectory into the future. In this paper, we propose a new grid-based approach for collision risk prediction, based on the Hybrid-Sampling Bayesian Occupancy Filter framework. The idea is to compute an estimate of the TTC for each cell of the grid, instead of reasoning about objects. This strategy avoids solving the difficult problem of multi-object detection and tracking, and provides a probabilistic estimate of the risk associated with each TTC value. After promising initial results, we evaluate in this paper the relevance of the method for real on-road applications, using a real-time implementation of our method in an experimental vehicle.
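    The per-cell idea can be illustrated with a minimal sketch. It is not the Hybrid-Sampling Bayesian Occupancy Filter itself, and it rests on several assumptions. Each grid cell carries an occupancy probability and a velocity estimate. Everything moves at constant velocity. The ego vehicle is a disc of a given radius centred at the origin. Under these assumptions, a cell's TTC is the first time its centre enters that disc, and the risk for each TTC bin is the probability that at least one cell in that bin is occupied, treating cells as independent. The Cell class and the cell_ttc and risk_by_ttc helpers are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Cell:
    x: float      # cell centre relative to the ego vehicle, metres
    y: float
    vx: float     # estimated cell velocity, m/s
    vy: float
    p_occ: float  # occupancy probability in [0, 1]


def cell_ttc(cell, ego_vx, ego_vy, radius):
    """Smallest t >= 0 at which the cell centre enters the ego disc, assuming
    constant velocities; None if it never does."""
    rx, ry = cell.x, cell.y
    vx, vy = cell.vx - ego_vx, cell.vy - ego_vy  # relative velocity
    c = rx * rx + ry * ry - radius * radius
    if c <= 0:
        return 0.0  # already inside the collision radius
    a = vx * vx + vy * vy
    if a == 0:
        return None  # no relative motion: the distance never changes
    b = 2 * (rx * vx + ry * vy)
    disc = b * b - 4 * a * c
    if disc < 0:
        return None  # the relative trajectory misses the disc
    # With c > 0 both roots share a sign; the smaller root is the entry time.
    t_enter = (-b - math.sqrt(disc)) / (2 * a)
    return t_enter if t_enter >= 0 else None


def risk_by_ttc(cells, ego_vx, ego_vy, radius, bin_width, n_bins):
    """For each TTC bin, the probability that at least one occupied cell
    collides within that bin, treating cells as independent (an assumption)."""
    no_hit = [1.0] * n_bins
    for cell in cells:
        t = cell_ttc(cell, ego_vx, ego_vy, radius)
        if t is None:
            continue
        k = int(t // bin_width)
        if k < n_bins:
            no_hit[k] *= 1.0 - cell.p_occ
    return [1.0 - q for q in no_hit]
```

    For example, with the ego vehicle moving at 10 m/s along x, a radius of 2 m, and a static cell 20 m ahead that has occupancy probability 0.8, the cell enters the disc after 1.8 s. With 0.5 s bins, all of that risk lands in the [1.5, 2.0) s bin. No object has to be detected or tracked to get this estimate.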

    Cascades: A view from Audience

    Cascades on online networks have been a popular subject of study in the past decade, and there is a considerable literature on phenomena such as diffusion mechanisms, virality, cascade prediction, and peer network effects. However, a basic question has received comparatively little attention: how desirable are cascades on a social media platform from the point of view of users? While versions of this question have been considered from the perspective of the producers of cascades, any answer to this question must also take into account the effect of cascades on their audience. In this work, we seek to fill this gap by providing a consumer perspective on cascades. Users on online networks play the dual role of producers and consumers. First, we perform an empirical study of the interaction of Twitter users with retweet cascades. We measure how often users observe retweets in their home timeline, and observe a phenomenon that we term the "Impressions Paradox": the share of impressions for cascades of size k decays much more slowly than the frequency of cascades of size k. Thus, the audience for cascades can be quite large even for rare large cascades. We also measure audience engagement with retweet cascades in comparison to non-retweeted content. Our results show that cascades often rival or exceed organic content in engagement received per impression. This result is perhaps surprising in that consumers did not opt in to see tweets from these authors. Furthermore, although cascading content is widely popular, one would expect it to eventually reach parts of the audience that may not be interested in the content. Motivated by our findings, we posit a theoretical model that focuses on the effect of cascades on the audience. Our results on this model highlight the balance between retweeting as a high-quality content selection mechanism and the role of network users in filtering irrelevant content.
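    The two quantities compared in the "Impressions Paradox" can be made concrete with a small sketch. Given one record per cascade with its size and the number of impressions it received, it compares the fraction of cascades that have size k with the fraction of all impressions received by cascades of size k. The (size, impressions) input format and the helper name size_shares are assumptions, and the numbers below are toy values, not data from the paper.

```python
from collections import Counter


def size_shares(cascades):
    """cascades: iterable of (size, impressions) pairs, one per cascade.

    Returns two dicts keyed by cascade size k:
      freq_share[k]: fraction of cascades whose size is k
      impr_share[k]: fraction of all impressions received by size-k cascades
    """
    counts = Counter()
    impressions = Counter()
    for size, imps in cascades:
        counts[size] += 1
        impressions[size] += imps
    n_cascades = sum(counts.values())
    n_impressions = sum(impressions.values())
    if n_cascades == 0 or n_impressions == 0:
        raise ValueError("need at least one cascade with impressions")
    freq_share = {k: c / n_cascades for k, c in counts.items()}
    impr_share = {k: impressions[k] / n_impressions for k in counts}
    return freq_share, impr_share
```

    Consider a toy input with 90 cascades of size 1 (100 impressions each), 9 of size 10 (1,000 each), and 1 of size 100 (10,000). The frequency share falls from 0.9 to 0.01 across the three sizes. The impression share stays between roughly 0.32 and 0.36 for every size. This is the kind of gap the paradox describes: rare large cascades can still account for a large part of what users see.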

    A Benchmark Study on Sentiment Analysis for Software Engineering Research

    A recent research trend has emerged to identify developers' emotions by applying sentiment analysis to the content of communication traces left in collaborative development environments. To overcome the limitations of off-the-shelf sentiment analysis tools, researchers have recently started to develop their own tools for the software engineering domain. In this paper, we report a benchmark study to assess the performance and reliability of three sentiment analysis tools specifically customized for software engineering. Furthermore, we offer a reflection on the open challenges, as they emerge from a qualitative analysis of misclassified texts. Comment: Proceedings of the 15th International Conference on Mining Software Repositories (MSR 2018)

    All Who Wander: On the Prevalence and Characteristics of Multi-community Engagement

    Although analyzing user behavior within individual communities is an active and rich research domain, people usually interact with multiple communities both on- and off-line. How do users act in such multi-community environments? Although there are a host of intriguing aspects to this question, it has received much less attention in the research community than the intra-community case. In this paper, we examine three aspects of multi-community engagement: the sequence of communities that users post to, the language that users employ in those communities, and the feedback that users receive. We use longitudinal posting behavior on Reddit as our main data source, and DBLP for auxiliary experiments. We also demonstrate the effectiveness of features drawn from these aspects in predicting users' future level of activity. One might expect that a user's trajectory mimics the "settling-down" process in real life: an initial exploration of sub-communities before settling down into a few niches. However, we find that the users in our data continually post in new communities; moreover, as time goes on, they post increasingly evenly among a more diverse set of smaller communities. Interestingly, it seems that users who eventually leave the community are "destined" to do so from the very beginning, in the sense of showing significantly different "wandering" patterns very early on in their trajectories; this finding has potentially important design implications for community maintainers. Our multi-community perspective also allows us to investigate the "situation vs. personality" debate from language usage across different communities. Comment: 11 pages, data available at https://chenhaot.com/pages/multi-community.html, Proceedings of WWW 2015 (updated references)

    Competition and Selection Among Conventions

    In many domains, a latent competition among different conventions determines which one will come to dominate. One sees such effects in the success of community jargon, of competing frames in political rhetoric, or of terminology in technical contexts. These effects have become widespread in the online domain, where the data offers the potential to study competition among conventions at a fine-grained level. In analyzing the dynamics of conventions over time, however, even with detailed online data, one encounters two significant challenges. First, as conventions evolve, the underlying substance of their meaning tends to change as well, and such substantive changes confound investigations of social effects. Second, the selection of a convention takes place through the complex interactions of individuals within a community, and contention between the users of competing conventions plays a key role in the convention's evolution. Any analysis must take place in the presence of these two issues. In this work we study a setting in which we can cleanly track the competition among conventions. Our analysis is based on the spread of low-level authoring conventions in the e-print arXiv over 24 years: by tracking the spread of macros and other author-defined conventions, we are able to study conventions that vary even as the underlying meaning remains constant. We find that the interaction among co-authors over time plays a crucial role in the selection of conventions; the distinction between more and less experienced members of the community, and the distinction between conventions with visible versus invisible effects, are both central to the underlying processes.
    Through our analysis we make predictions at the population level about the ultimate success of different synonymous conventions over time, and at the individual level about the outcome of "fights" between people over convention choices. Comment: To appear in Proceedings of WWW 2017, data at https://github.com/CornellNLP/Macro