64 research outputs found

    Representatively Memorable: Sampling the Right Phrase Set to Get the Text Entry Experiment Right

    In text entry experiments, memorability is a desired property of the phrases used as stimuli. Unfortunately, to date there is no automated method to achieve this effect. As a result, researchers have to use either manually curated English-only phrase sets or sampling procedures that do not guarantee that phrases are memorable. In response to this need, we present a novel sampling method based on two core ideas: a multiple regression model over language-independent features, and the statistical analysis of the corpus from which phrases will be drawn. Our results show that researchers can finally use a method to successfully curate their own stimuli targeting potentially any language or domain. The source code as well as our phrase sets are publicly available. This work is supported by the 7th Framework Program of the European Commission (FP7/2007-13) under grant agreements 287576 (CASMACAT) and 600707 (tranScriptorium). Leiva, LA.; Sanchis-Trilles, G. (2014). Representatively Memorable: Sampling the Right Phrase Set to Get the Text Entry Experiment Right. ACM. 1709-1712. https://doi.org/10.1145/2556288.2557024

    Cascades: A view from Audience

    Cascades on online networks have been a popular subject of study in the past decade, and there is a considerable literature on phenomena such as diffusion mechanisms, virality, cascade prediction, and peer network effects. However, a basic question has received comparatively little attention: how desirable are cascades on a social media platform from the point of view of users? While versions of this question have been considered from the perspective of the producers of cascades, any answer to this question must also take into account the effect of cascades on their audience. In this work, we seek to fill this gap by providing a consumer perspective on cascades. Users on online networks play the dual role of producers and consumers. First, we perform an empirical study of the interaction of Twitter users with retweet cascades. We measure how often users observe retweets in their home timeline, and observe a phenomenon that we term the "Impressions Paradox": the share of impressions for cascades of size k decays much more slowly than the frequency of cascades of size k. Thus, the audience for cascades can be quite large even for rare large cascades. We also measure audience engagement with retweet cascades in comparison to non-retweeted content. Our results show that cascades often rival or exceed organic content in engagement received per impression. This result is perhaps surprising in that consumers didn't opt in to see tweets from these authors. Furthermore, although cascading content is widely popular, one would expect it to eventually reach parts of the audience that may not be interested in the content. Motivated by our findings, we posit a theoretical model that focuses on the effect of cascades on the audience. Our results on this model highlight the balance between retweeting as a high-quality content selection mechanism and the role of network users in filtering irrelevant content.
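The "Impressions Paradox" described in this abstract can be illustrated with a toy calculation. This is only a sketch under assumptions that do not come from the paper: cascade frequency is taken to follow a power law with a hypothetical exponent `ALPHA`, and each cascade of size k is assumed to generate impressions proportional to k. Under those assumptions, the impression share falls off one power of k more slowly than the frequency.

```python
# Toy illustration only. ALPHA, the size grid, and the "impressions
# proportional to k" rule are assumptions, not values from the paper.
ALPHA = 2.5
sizes = [1, 10, 100, 1000]


def normalize(weights):
    """Scale a list of non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


# Fraction of cascades that have size k (assumed power law k^-ALPHA).
frequency = normalize([k ** -ALPHA for k in sizes])

# Fraction of all impressions produced by cascades of size k,
# assuming each cascade contributes impressions proportional to k.
impression_share = normalize([k * k ** -ALPHA for k in sizes])

for k, f, s in zip(sizes, frequency, impression_share):
    print(f"size {k:>4}: frequency {f:.6f}, impression share {s:.6f}")
```

In this sketch, large cascades are rare but still account for a much larger share of impressions than their frequency alone would suggest.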

    Competition and Selection Among Conventions

    In many domains, a latent competition among different conventions determines which one will come to dominate. One sees such effects in the success of community jargon, of competing frames in political rhetoric, or of terminology in technical contexts. These effects have become widespread in the online domain, where the data offers the potential to study competition among conventions at a fine-grained level. In analyzing the dynamics of conventions over time, however, even with detailed online data, one encounters two significant challenges. First, as conventions evolve, the underlying substance of their meaning tends to change as well, and such substantive changes confound investigations of social effects. Second, the selection of a convention takes place through the complex interactions of individuals within a community, and contention between the users of competing conventions plays a key role in the convention's evolution. Any analysis must take place in the presence of these two issues. In this work we study a setting in which we can cleanly track the competition among conventions. Our analysis is based on the spread of low-level authoring conventions in the e-print arXiv over 24 years: by tracking the spread of macros and other author-defined conventions, we are able to study conventions that vary even as the underlying meaning remains constant. We find that the interaction among co-authors over time plays a crucial role in the selection of conventions; the distinction between more and less experienced members of the community, and the distinction between conventions with visible versus invisible effects, are both central to the underlying processes. Through our analysis we make predictions at the population level about the ultimate success of different synonymous conventions over time, and at the individual level about the outcome of "fights" between people over convention choices. Comment: To appear in Proceedings of WWW 2017, data at https://github.com/CornellNLP/Macro

    Climate Informatics

    The impacts of present and potential future climate change will be one of the most important scientific and societal challenges in the 21st century. Given observed changes in temperature, sea ice, and sea level, improving our understanding of the climate system is an international priority. This system is characterized by complex phenomena that are imperfectly observed and even more imperfectly simulated. But with an ever-growing supply of climate data from satellites and environmental sensors, the magnitude of data and climate model output is beginning to overwhelm the relatively simple tools currently used to analyze them. A computational approach will therefore be indispensable for these analysis challenges. This chapter introduces the fledgling research discipline of climate informatics: collaborations between climate scientists and machine learning researchers to bridge this gap between data and understanding. We hope that the study of climate informatics will accelerate discovery in answering pressing questions in climate science.

    Inference algorithms for gene networks: a statistical mechanics analysis

    The inference of gene regulatory networks from high-throughput gene expression data is one of the major challenges in systems biology. This paper aims at analysing and comparing two different algorithmic approaches. The first approach uses pairwise correlations between regulated and regulating genes; the second one uses message-passing techniques for inferring activating and inhibiting regulatory interactions. The performance of these two algorithms can be analysed theoretically on well-defined test sets, using tools from the statistical physics of disordered systems such as the replica method. We find that the second algorithm outperforms the first one since it takes into account collective effects of multiple regulators.

    Towards a consolidation of worldwide journal rankings - A classification using random forests and aggregate rating via data envelopment analysis

    The question of how to assess research outputs published in journals is now a global concern for academics. Numerous journal ratings and rankings exist, some featuring perceptual and peer-review-based journal ranks, some focusing on objective information related to citations, some using a combination of the two. This research consolidates existing journal rankings into an up-to-date and comprehensive list. Existing approaches to determining journal rankings are significantly advanced with the application of a new classification approach, ‘random forests’, and data envelopment analysis. As a result, a fresh look at a publication's place in the global research community is offered. While our approach is applicable to all management and business journals, we specifically exemplify the relative position of ‘operations research, management science, production and operations management’ journals within the broader management field, as well as within their own subject domain.