Online Evaluation for Effective Web Service Development
Development of the majority of the leading web services and software products
today is generally guided by data-driven decisions based on evaluation that
ensures a steady stream of updates, both in terms of quality and quantity.
Large internet companies use online evaluation on a day-to-day basis and at a
large scale. The number of smaller companies using A/B testing in their
development cycle is also growing. Web development across the board depends
strongly on the quality of experimentation platforms. In this tutorial, we give
an overview of state-of-the-art methods underlying everyday evaluation pipelines
at some of the leading Internet companies. Software engineers, designers,
analysts, and service or product managers, whether beginners, advanced
specialists, or researchers, can learn how to make web service development
data-driven and how to do so effectively.
Evaluating Personal Assistants on Mobile devices
The iPhone was introduced only a decade ago in 2007 but has fundamentally
changed the way we interact with online information. Mobile devices differ
radically from classic command-based and point-and-click user interfaces, now
allowing for gesture-based interaction using fine-grained touch and swipe
signals. With the rapid growth in the use of voice-controlled intelligent
personal assistants on mobile devices, such as Microsoft's Cortana, Google Now,
and Apple's Siri, mobile devices have become personal: they allow us to be
online all the time and assist us in any task, both at work and in our daily
lives, which makes context a crucial factor to consider.
Mobile usage is now exceeding desktop usage, and is still growing at a rapid
rate, yet our main ways of training and evaluating personal assistants are
still based on (and framed in) classical desktop interactions, focusing on
explicit queries, clicks, and dwell time. However, modern user
interaction with mobile devices is radically different due to touch screens
with gesture- and voice-based control and the varying context of use (e.g.,
in a car or on a bike), which often invalidates the assumptions underlying
today's user satisfaction evaluation.
There is an urgent need to understand voice- and gesture-based interaction,
taking all interaction signals and context into account in appropriate ways. We
propose a research agenda for developing methods to evaluate and improve
context-aware user satisfaction with mobile interactions using gesture-based
signals at scale.
On Post-Selection Inference in A/B Tests
When interpreting A/B tests, we typically focus only on the statistically
significant results and take them at face value. This practice, termed
post-selection inference in the statistical literature, may negatively affect
both point estimation and uncertainty quantification, and therefore hinder
trustworthy decision making in A/B testing. To address this issue, in this
paper we explore two seemingly unrelated paths, one based on supervised machine
learning and the other on empirical Bayes, and propose post-selection
inferential approaches that combine the strengths of both. Through large-scale
simulated and empirical examples, we demonstrate that our proposed
methodologies stand out among other existing ones in both reducing
post-selection biases and improving confidence interval coverage rates, and
discuss how they can be conveniently adjusted to real-life scenarios.
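
The selection effect described in this abstract can be illustrated with a small
simulation. What follows is a minimal sketch and not the authors' method: it
assumes a simple normal-normal model (true effects drawn from N(0, tau^2), each
observed with a known standard error se) and applies a textbook empirical Bayes
shrinkage estimate. The function name simulate and the parameters tau, se, and z
are hypothetical choices made for illustration only.

```python
import random
import statistics


def simulate(n_tests=20000, tau=1.0, se=1.0, z=1.96, seed=0):
    """Compare naive and shrunken estimates among significant tests only."""
    rng = random.Random(seed)
    # Assumed model: true effect ~ N(0, tau^2); observed = true + N(0, se^2) noise.
    true = [rng.gauss(0.0, tau) for _ in range(n_tests)]
    obs = [t + rng.gauss(0.0, se) for t in true]

    # Method-of-moments estimate of the prior variance, then the shrinkage factor.
    tau2_hat = max(0.0, statistics.pvariance(obs) - se ** 2)
    shrink = tau2_hat / (tau2_hat + se ** 2)
    post_sd = (shrink * se ** 2) ** 0.5  # posterior sd under the assumed model

    # Selection step: keep only tests whose naive z-score is significant.
    selected = [(t, x) for t, x in zip(true, obs) if abs(x) > z * se]

    def sign(x):
        return 1.0 if x > 0 else -1.0

    # Bias in the direction of the observed effect (positive = exaggeration).
    naive_bias = statistics.fmean([(x - t) * sign(x) for t, x in selected])
    eb_bias = statistics.fmean([(shrink * x - t) * sign(x) for t, x in selected])

    # Coverage of nominal 95% intervals, again among the selected tests only.
    naive_cov = statistics.fmean(
        [1.0 if abs(x - t) <= z * se else 0.0 for t, x in selected]
    )
    eb_cov = statistics.fmean(
        [1.0 if abs(shrink * x - t) <= z * post_sd else 0.0 for t, x in selected]
    )

    return {
        "n_selected": len(selected),
        "naive_bias": naive_bias,
        "eb_bias": eb_bias,
        "naive_coverage": naive_cov,
        "eb_coverage": eb_cov,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, the naive estimates of the selected tests overstate the
true effects and their intervals cover less often than the nominal 95%, while
the shrunken estimates are close to unbiased. The result depends on the assumed
prior; with real experiments the distribution of true effects is unknown, which
is the difficulty the abstract addresses.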