Changing the focus: worker-centric optimization in human-in-the-loop computations
A myriad of emerging applications, from simple to complex, involve human cognizance in the computation loop. Using the wisdom of human workers, researchers have solved a variety of problems, termed "micro-tasks", such as captcha recognition, sentiment analysis, image categorization, and query processing, as well as "complex tasks" that are often collaborative, such as classifying craters on planetary surfaces, discovering new galaxies (Galaxy Zoo), and performing text translation. The current view of "humans-in-the-loop" tends to see humans as machines, robots, or low-level agents used or exploited in the service of broader computation goals. This dissertation shifts the focus back to humans: it studies different data analytics problems by recognizing the characteristics of human workers and incorporating them in a principled fashion inside the computation loop.
The first contribution of this dissertation is an optimization framework and a real-world system that personalizes the treatment of each worker: it develops a worker model and uses it to better understand and estimate task completion time. The framework judiciously frames questions and solicits worker feedback on them to update the worker model. Next, improving workers' skills through peer interaction during collaborative task completion is studied. A suite of optimization problems is identified in that context, considering the collaborativeness between members, as it plays a major role in peer learning. Finally, "diversified" sequences of work sessions are designed for human workers to improve worker satisfaction and engagement while completing tasks.
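To make the worker-model idea concrete, here is a minimal sketch of the kind of per-worker completion-time estimate such a framework might maintain and update from observed feedback. The exponentially weighted update, the smoothing factor, and the (worker, task type) granularity are all assumptions for illustration, not the dissertation's actual model.

```python
from collections import defaultdict

class WorkerModel:
    """Illustrative worker model: keeps an exponentially weighted
    estimate of completion time per (worker, task type). The alpha
    and default-estimate values are assumed, not from the paper."""

    def __init__(self, alpha=0.3, initial_estimate=60.0):
        self.alpha = alpha            # weight on the newest observation
        self.initial = initial_estimate  # prior estimate in seconds
        self.estimates = defaultdict(lambda: self.initial)

    def update(self, worker_id, task_type, observed_seconds):
        """Fold one observed completion time into the running estimate."""
        key = (worker_id, task_type)
        old = self.estimates[key]
        self.estimates[key] = self.alpha * observed_seconds + (1 - self.alpha) * old

    def expected_time(self, worker_id, task_type):
        """Current estimate used, e.g., to route or schedule tasks."""
        return self.estimates[(worker_id, task_type)]

model = WorkerModel()
model.update("w1", "image_labeling", 45.0)
model.update("w1", "image_labeling", 50.0)
print(model.expected_time("w1", "image_labeling"))  # 53.85 with the defaults above
```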
SafeNet: The Unreasonable Effectiveness of Ensembles in Private Collaborative Learning
Secure multiparty computation (MPC) has been proposed to allow multiple mutually distrustful data owners to jointly train machine learning (ML) models on their combined data. However, by design, MPC protocols faithfully compute the training functionality, which the adversarial ML community has shown can leak private information and be tampered with via poisoning attacks. In this work, we argue that model ensembles, implemented in our framework called SafeNet, are a highly MPC-amenable way to avoid many adversarial ML attacks. The natural partitioning of data amongst owners in MPC training allows this approach to be highly scalable at training time, to provide provable protection from poisoning attacks, and to provide provable defense against a number of privacy attacks. We demonstrate SafeNet's efficiency, accuracy, and resilience to poisoning on several machine learning datasets and models trained in end-to-end and transfer learning scenarios. For instance, SafeNet reduces backdoor attack success significantly, while achieving faster training and less communication than the four-party MPC framework of Dalskov et al. Our experiments show that ensembling retains these benefits even in many non-iid settings. The simplicity, cheap setup, and robustness properties of ensembling make it a strong first choice for training ML models privately in MPC.
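To illustrate why the natural data partitioning helps, the sketch below trains one model per owner's local partition and predicts by majority vote, so a poisoner who controls a single owner's data corrupts at most one vote. This is a plaintext sketch of the ensemble logic only; SafeNet's contribution is running such training and inference inside an MPC protocol, which this sketch does not attempt.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative only: each "owner" trains a local model on its own
# partition; the ensemble predicts by majority vote. In SafeNet the
# vote would be computed under MPC rather than in the clear.
X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
n_owners = 5
parts = np.array_split(np.random.default_rng(0).permutation(len(X)), n_owners)

models = [LogisticRegression(max_iter=1000).fit(X[idx], y[idx]) for idx in parts]

def ensemble_predict(X_new):
    # Stack each owner's votes, then take the per-sample majority.
    votes = np.stack([m.predict(X_new) for m in models])  # shape: (n_owners, n)
    return (votes.mean(axis=0) > 0.5).astype(int)

# A backdoor planted in one owner's partition flips at most one of the
# five votes, so a majority of clean owners keeps the prediction intact.
print(ensemble_predict(X[:10]))
```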
Prochlo: Strong Privacy for Analytics in the Crowd
The large-scale monitoring of computer users' software activities has become commonplace, e.g., for application telemetry, error reporting, or demographic profiling. This paper describes a principled systems architecture, Encode-Shuffle-Analyze (ESA), for performing such monitoring with high utility while also protecting user privacy. The ESA design, and its Prochlo implementation, are informed by our practical experiences with an existing, large deployment of privacy-preserving software monitoring.
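The three ESA stages can be pictured as a small pipeline: clients encode their reports, an intermediary shuffles them to break the link between user and record and drops values too rare to hide in a crowd, and only the shuffled batch reaches the analyzer. The sketch below is a toy rendering of that data flow; the threshold value and record format are assumptions, and the real Prochlo system adds protections this sketch omits, such as encryption between stages and oblivious shuffling.

```python
import random
from collections import Counter

def encode(user_id, value):
    # Encode stage (client side): strip direct identifiers; the report
    # carries only the value. Real ESA layers per-stage encryption and
    # may add randomized-response noise here.
    return {"value": value}

def shuffle(reports, threshold=3):
    # Shuffle stage: randomly permute reports to destroy ordering and
    # linkability, then suppress values reported by too few clients.
    random.shuffle(reports)
    counts = Counter(r["value"] for r in reports)
    return [r for r in reports if counts[r["value"]] >= threshold]

def analyze(reports):
    # Analyze stage: the analyzer sees only anonymized, thresholded
    # batches and computes aggregate statistics.
    return Counter(r["value"] for r in reports)

reports = [encode(f"user{i}", v) for i, v in
           enumerate(["crash_A"] * 5 + ["crash_B"] * 4 + ["crash_rare"])]
print(analyze(shuffle(reports)))  # crash_rare is suppressed by the threshold
```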
(cont.; see the paper)
- …