64 research outputs found
Representatively Memorable: Sampling the Right Phrase Set to Get the Text Entry Experiment Right
In text entry experiments, memorability is a desired property of the phrases used as stimuli. Unfortunately, to date
there is no automated method to achieve this effect. As a result, researchers have to use either manually curated English-only phrase sets or sampling procedures that do not guarantee phrases being memorable. In response to this need, we
present a novel sampling method based on two core ideas:
a multiple regression model over language-independent features, and the statistical analysis of the corpus from which
phrases will be drawn. Our results show that researchers can
finally use a method to successfully curate their own stimuli targeting potentially any language or domain. The source
code as well as our phrase sets are publicly available.
This work is supported by the 7th Framework Program of the
European Commission (FP7/2007-13) under grant agreements
287576 (CASMACAT) and 600707 (tranScriptorium).
Leiva, LA.; Sanchis-Trilles, G. (2014). Representatively Memorable: Sampling the Right Phrase Set to Get the Text Entry Experiment Right. ACM. 1709-1712. https://doi.org/10.1145/2556288.2557024
Cascades: A view from Audience
Cascades on online networks have been a popular subject of study in the past
decade, and there is a considerable literature on phenomena such as diffusion
mechanisms, virality, cascade prediction, and peer network effects. However, a
basic question has received comparatively little attention: how desirable are
cascades on a social media platform from the point of view of users? While
versions of this question have been considered from the perspective of the
producers of cascades, any answer to this question must also take into account
the effect of cascades on their audience. In this work, we seek to fill this
gap by providing a consumer perspective of cascades.
Users on online networks play the dual role of producers and consumers.
First, we perform an empirical study of the interaction of Twitter users with
retweet cascades. We measure how often users observe retweets in their home
timeline, and observe a phenomenon that we term the "Impressions Paradox": the
share of impressions for cascades of size k decays much more slowly than the frequency
of cascades of size k. Thus, the audience for cascades can be quite large even
for rare large cascades. We also measure audience engagement with retweet
cascades in comparison to non-retweeted content. Our results show that cascades
often rival or exceed organic content in engagement received per impression.
This result is perhaps surprising in that consumers didn't opt in to see tweets
from these authors. Furthermore, although cascading content is widely popular,
one would expect it to eventually reach parts of the audience that may not be
interested in the content. Motivated by our findings, we posit a theoretical
model that focuses on the effect of cascades on the audience. Our results on
this model highlight the balance between retweeting as a high-quality content
selection mechanism and the role of network users in filtering irrelevant
content.
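The "Impressions Paradox" described above can be illustrated with a toy calculation. This is a minimal sketch and not the authors' measurement: it assumes that cascade frequency follows a power law in the size k with a hypothetical exponent `alpha`, and that a cascade of size k generates impressions proportional to k. Under those assumptions the impression share falls off one power of k more slowly than the frequency.

```python
def cascade_shares(max_size, alpha):
    """Return (frequency, impression_share) lists for cascade sizes 1..max_size.

    Assumptions (illustrative only, not taken from the paper):
      - the frequency of cascades of size k is proportional to k ** -alpha
      - a cascade of size k generates impressions proportional to k
    """
    sizes = range(1, max_size + 1)
    freq_weights = [k ** -alpha for k in sizes]
    # Each cascade of size k contributes k units of impressions.
    imp_weights = [k * w for k, w in zip(sizes, freq_weights)]
    freq_total = sum(freq_weights)
    imp_total = sum(imp_weights)
    frequency = [w / freq_total for w in freq_weights]
    impressions = [w / imp_total for w in imp_weights]
    return frequency, impressions


frequency, impressions = cascade_shares(max_size=1000, alpha=2.5)

# Compare how quickly each quantity decays between size 1 and size 100.
freq_decay = frequency[99] / frequency[0]
imp_decay = impressions[99] / impressions[0]
print(f"frequency decay (k=100 vs k=1):        {freq_decay:.2e}")
print(f"impression-share decay (k=100 vs k=1): {imp_decay:.2e}")
```

In this toy model, cascades of size 100 are far rarer relative to size-1 cascades than their impression share would suggest, which is the qualitative pattern the abstract reports.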
Competition and Selection Among Conventions
In many domains, a latent competition among different conventions determines
which one will come to dominate. One sees such effects in the success of
community jargon, of competing frames in political rhetoric, or of terminology
in technical contexts. These effects have become widespread in the online
domain, where the data offers the potential to study competition among
conventions at a fine-grained level.
In analyzing the dynamics of conventions over time, however, even with
detailed on-line data, one encounters two significant challenges. First, as
conventions evolve, the underlying substance of their meaning tends to change
as well; and such substantive changes confound investigations of social
effects. Second, the selection of a convention takes place through the complex
interactions of individuals within a community, and contention between the
users of competing conventions plays a key role in the convention's evolution.
Any analysis must take place in the presence of these two issues.
In this work we study a setting in which we can cleanly track the competition
among conventions. Our analysis is based on the spread of low-level authoring
conventions in the eprint arXiv over 24 years: by tracking the spread of macros
and other author-defined conventions, we are able to study conventions that
vary even as the underlying meaning remains constant. We find that the
interaction among co-authors over time plays a crucial role in their selection;
the distinction between more and less experienced members of the
community, and the distinction between conventions with visible versus
invisible effects, are both central to the underlying processes. Through our
analysis we make predictions at the population level about the ultimate success
of different synonymous conventions over time--and at the individual level
about the outcome of "fights" between people over convention choices.
Comment: To appear in Proceedings of WWW 2017, data at
https://github.com/CornellNLP/Macro
Climate Informatics
The impacts of present and potential future climate change will be one of the most important scientific and societal challenges in the 21st century. Given observed changes in temperature, sea ice, and sea level, improving our understanding of the climate system is an international priority. This system is characterized by complex phenomena that are imperfectly observed and even more imperfectly simulated. But with an ever-growing supply of climate data from satellites and environmental sensors, the magnitude of data and climate model output is beginning to overwhelm the relatively simple tools currently used to analyze them. A computational approach will therefore be indispensable for these analysis challenges. This chapter introduces the fledgling research discipline of climate informatics: collaborations between climate scientists and machine learning researchers in order to bridge this gap between data and understanding. We hope that the study of climate informatics will accelerate discovery in answering pressing questions in climate science.
Inference algorithms for gene networks: a statistical mechanics analysis
The inference of gene regulatory networks from high throughput gene
expression data is one of the major challenges in systems biology. This paper
aims at analysing and comparing two different algorithmic approaches. The first
approach uses pairwise correlations between regulated and regulating genes; the
second one uses message-passing techniques for inferring activating and
inhibiting regulatory interactions. The performance of these two algorithms can
be analysed theoretically on well-defined test sets, using tools from the
statistical physics of disordered systems like the replica method. We find that
the second algorithm outperforms the first one since it takes into account
collective effects of multiple regulators.
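The first approach mentioned above, scoring candidate regulators by pairwise correlation with a regulated gene, can be sketched as follows. This is an illustrative sketch only, not the algorithm analysed in the paper: the use of the Pearson coefficient, the function names, and the expression values are all assumptions made for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_regulators(target, candidates):
    """Rank candidate regulators by absolute correlation with the target gene.

    A positive score hints at activation and a negative score at inhibition.
    Each candidate is scored on its own, so collective effects of several
    regulators acting together are not captured.
    """
    scores = {name: pearson(expr, target) for name, expr in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical expression levels across five samples.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "activator": [0.9, 2.1, 2.9, 4.2, 5.1],
    "inhibitor": [5.0, 4.1, 2.9, 2.0, 1.1],
    "unrelated": [2.0, 5.0, 1.0, 4.0, 3.0],
}
ranking = rank_regulators(target, candidates)
for name, score in ranking:
    print(f"{name}: {score:+.3f}")
```

Because every pair is scored independently, a sketch like this cannot account for the combined influence of multiple regulators, which is the limitation the abstract attributes to the correlation-based approach.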
Towards a consolidation of worldwide journal rankings - A classification using random forests and aggregate rating via data envelopment analysis
Abstract: The question of how to assess research outputs published in journals is now a global concern for academics. Numerous journal ratings and rankings exist, some featuring perceptual and peer-review-based journal ranks, some focusing on objective information related to citations, some using a combination of the two. This research consolidates existing journal rankings into an up-to-date and comprehensive list. Existing approaches to determining journal rankings are significantly advanced with the application of a new classification approach, ‘random forests’, and data envelopment analysis. As a result, a fresh look at a publication's place in the global research community is offered. While our approach is applicable to all management and business journals, we specifically exemplify the relative position of ‘operations research, management science, production and operations management’ journals within the broader management field, as well as within their own subject domain.