    Trustworthy Experimentation Under Telemetry Loss

    Failure to accurately measure the outcomes of an experiment can lead to bias and incorrect conclusions. Online controlled experiments (aka A/B tests) are increasingly being used to make decisions to improve websites as well as mobile and desktop applications. We argue that loss of telemetry data (during upload or post-processing) can skew the results of experiments, leading to loss of statistical power and inaccurate or erroneous conclusions. By systematically investigating the causes of telemetry loss, we argue that it is not practical to entirely eliminate it. Consequently, experimentation systems need to be robust to its effects. Furthermore, we note that it is nontrivial to measure the absolute level of telemetry loss in an experimentation system. In this paper, we take a top-down approach towards solving this problem. We motivate the impact of loss qualitatively using experiments in real applications deployed at scale, and formalize the problem by presenting a theoretical breakdown of the bias introduced by loss. Based on this foundation, we present a general framework for quantitatively evaluating the impact of telemetry loss, and present two solutions to measure the absolute levels of loss. This framework is used by well-known applications at Microsoft, with millions of users and billions of sessions. These general principles can be adopted by any application to improve the overall trustworthiness of experimentation and data-driven decision making. Comment: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, October 201

    Addressing Hidden Imperfections in Online Experimentation

    Technology companies are increasingly using randomized controlled trials (RCTs) as part of their development process. Despite having fine control over engineering systems and data instrumentation, these RCTs can still be imperfectly executed. In fact, online experimentation suffers from many of the same biases seen in biomedical RCTs, including opt-in and user activity bias, selection bias, non-compliance with the treatment, and more generally, challenges in the ability to test the question of interest. These imperfections can lead to a bias in the estimated causal effect, a loss in statistical power, an attenuation of the effect, or even a need to reframe the question that can be answered. This paper aims to make practitioners of experimentation more aware of imperfections in technology-industry RCTs, which can be hidden throughout the engineering stack or in the design process. Comment: Presented at CODE@MIT 202

    Trust Management and Security in Satellite Telecommand Processing

    New standards and initiatives in satellite system architecture are moving the space industry to more open and efficient mission operations. Primarily, these standards allow multiple missions to share standard ground and space based resources to reduce mission development and sustainment costs. With the benefits of these new concepts comes added risk associated with threats to the security of our critical space assets in a contested space and cyberspace domain. As one method to mitigate threats to space missions, this research develops, implements, and tests the Consolidated Trust Management System (CTMS) for satellite flight software. The CTMS architecture was developed using design requirements and features of Trust Management Systems (TMS) presented in the field of distributed information systems. This research advances the state of the art with the CTMS by refining and consolidating existing TMS theory and applying it to satellite systems. The feasibility and performance of this new CTMS architecture are demonstrated with a realistic implementation in satellite flight software and testing in an emulated satellite system environment. The system is tested with known threat modeling techniques and a specific forgery attack abuse case of satellite telecommanding functions. The CTMS test results show the promise of this technique to enhance security in satellite flight software telecommand processing. With this work, a new class of satellite protection mechanisms is established, which addresses the complex security issues facing satellite operations today. This work also fills a critical shortfall in validated security mechanisms for implementation in both public and private sector satellite systems.

    Making Distribution State Estimation Practical: Challenges and Opportunities

    In increasingly digitalized and metered distribution networks, state estimation is generally recognized as a key enabler of advanced network management functionalities. However, despite decades of research, the real-life adoption of state estimation in distribution systems remains sporadic. This systematization of knowledge paper discusses the causes of this while comparing industrial and academic experiences and reviewing well- and less-established research directions. We argue that to make distribution system state estimation more practical and applicable in the field, new perspectives are needed. In particular, research should move away from conventional approaches and embrace generalized problem specifications and more comprehensive workflows. These, in turn, require algorithm advancements and more general mathematical formulations. We discuss lines of work to enable the delivery of tangible research. Comment: 10 page

    Understanding and Improving Continuous Experimentation: From A/B Testing to Continuous Software Optimization

    Controlled experiments (i.e. A/B tests) are used by many companies with user-intensive products to improve their software with user data. Some companies adopt an experiment-driven approach to software development with continuous experimentation (CE). With CE, every user-affecting software change is evaluated in an experiment and specialized roles seek out opportunities to experiment with functionality. The goal of the thesis is to describe current practice and support CE in industry. The main contributions are threefold. First, a review of the CE literature on: infrastructure and processes, the problem-solution pairs applied in industry practice, and the benefits and challenges of the practice. Second, a multi-case study with 12 companies to analyze how experimentation is used and why some companies fail to fully realize the benefits of CE. A theory for Factors Affecting Continuous Experimentation (FACE) is constructed to realize this goal. Finally, a toolkit called Constraint Oriented Multi-variate Bandit Optimization (COMBO) is developed for supporting automated experimentation with many variables simultaneously, live in a production environment. The research in the thesis is conducted under the design science paradigm using empirical research methods, with simulation experiments of tool proposals and a multi-case study on company usage of CE. Other research methods include systematic literature review and theory building. From FACE we derive three factors that explain CE utility: (1) investments in data infrastructure, (2) user problem complexity, and (3) incentive structures for experimentation. Guidelines are provided on how to strive towards state-of-the-art CE based on company factors. All three factors are relevant for companies wanting to use CE, in particular, for those companies wanting to apply algorithms such as those in COMBO to support personalization of software to users' context in a process of continuous optimization.