
    Trustworthy Experimentation Under Telemetry Loss

    Failure to accurately measure the outcomes of an experiment can lead to bias and incorrect conclusions. Online controlled experiments (also known as A/B tests) are increasingly used to make decisions that improve websites as well as mobile and desktop applications. We argue that loss of telemetry data (during upload or post-processing) can skew the results of experiments, leading to loss of statistical power and to inaccurate or erroneous conclusions. By systematically investigating the causes of telemetry loss, we argue that it is not practical to eliminate it entirely. Consequently, experimentation systems need to be robust to its effects. Furthermore, we note that it is nontrivial to measure the absolute level of telemetry loss in an experimentation system. In this paper, we take a top-down approach to solving this problem. We motivate the impact of loss qualitatively using experiments in real applications deployed at scale, and formalize the problem by presenting a theoretical breakdown of the bias introduced by loss. Based on this foundation, we present a general framework for quantitatively evaluating the impact of telemetry loss, together with two solutions for measuring the absolute level of loss. This framework is used by well-known applications at Microsoft, with millions of users and billions of sessions. These general principles can be adopted by any application to improve the overall trustworthiness of experimentation and data-driven decision making.
    Comment: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, October 201
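    The bias mechanism described here can be illustrated with a short simulation: if telemetry loss in one variant is correlated with usage, the measured lift drifts away from the true lift rather than merely losing precision. The sketch below is a hypothetical illustration, not the paper's framework; the loss rates, metric distribution, and sample sizes are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000          # users per variant (hypothetical)
true_lift = 0.02     # true relative improvement caused by the treatment

# Per-user metric (e.g., sessions) for control and treatment.
control = rng.poisson(lam=5.0, size=n)
treatment = rng.poisson(lam=5.0 * (1 + true_lift), size=n)

# Control loses 5% of telemetry uniformly at random (costs power, not bias).
kept_control = rng.random(n) >= 0.05

# In the treatment arm, loss grows with usage (hypothetical), so heavy users
# are dropped more often and the observed mean is pulled downward.
p_loss_treat = np.clip(0.05 + 0.01 * treatment, 0.0, 1.0)
kept_treat = rng.random(n) >= p_loss_treat

est_control = control[kept_control].mean()
est_treat = treatment[kept_treat].mean()

print(f"true lift:      {true_lift:+.3%}")
print(f"estimated lift: {est_treat / est_control - 1:+.3%}")
```

    Under these assumptions the estimated lift comes out well below the true 2%, which is the kind of skew a quantitative evaluation of telemetry loss is meant to surface.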

    Alpha Entanglement Codes: Practical Erasure Codes to Archive Data in Unreliable Environments

    Data centres that use consumer-grade disk drives and distributed peer-to-peer systems are unreliable environments in which to archive data without enough redundancy. Most redundancy schemes are not completely effective at providing high availability, durability, and integrity in the long term. We propose alpha entanglement codes, a mechanism that creates a virtual layer of highly interconnected storage devices to propagate redundant information across a large-scale storage system. Our motivation is to design flexible and practical erasure codes with high fault tolerance to improve data durability and availability even in catastrophic scenarios. By flexible and practical, we mean code settings that can be adapted to future requirements and practical implementations with reasonable trade-offs between security, resource usage, and performance. The codes have three parameters. Alpha increases storage overhead linearly but increases the number of possible paths to recover data exponentially. The two other parameters increase fault tolerance even further without the need for additional storage. As a result, an entangled storage system can provide high availability and durability and offer additional integrity: it is more difficult to modify data undetectably. We evaluate how several redundancy schemes perform in unreliable environments and show that alpha entanglement codes are flexible and practical. Remarkably, they excel at code locality; hence, they reduce repair costs and are less dependent on storage locations with poor availability. Our solution outperforms Reed-Solomon codes in many disaster recovery scenarios.
    Comment: The publication has 12 pages and 13 figures. This work was partially supported by Swiss National Science Foundation SNSF Doc.Mobility 162014. 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
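    The availability/overhead trade-off that such redundancy schemes navigate can be illustrated by comparing plain replication with a Reed-Solomon-style (k, n) MDS erasure code, the baseline the abstract compares against. This is a generic sketch under an independent-failure assumption; it does not implement alpha entanglement codes, and the device availability and code parameters are hypothetical.

```python
from math import comb

def availability_replication(copies: int, p_up: float) -> float:
    """Data is readable if at least one replica sits on an available device."""
    return 1 - (1 - p_up) ** copies

def availability_mds(k: int, n: int, p_up: float) -> float:
    """(k, n) MDS erasure code: any k of the n fragments reconstruct the data."""
    return sum(comb(n, i) * p_up**i * (1 - p_up) ** (n - i) for i in range(k, n + 1))

p_up = 0.90  # per-device availability, assumed independent (hypothetical)
print(f"3x replication: availability {availability_replication(3, p_up):.6f}, overhead 3.0x")
print(f"(10, 14) MDS  : availability {availability_mds(10, 14, p_up):.6f}, overhead 1.4x")
```

    With these assumed numbers the MDS code reaches availability in the same range as triple replication while storing less than half as much redundant data; the paper's evaluation concerns how schemes behave when failures are correlated or catastrophic rather than independent, and how costly repairs become.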

    Mapping crime: Understanding Hotspots


    Nonparametric time trends in optimal design of experiments.

    When performing an experiment, the observed responses are often influenced by a temporal trend due to aging of material, learning effects, equipment wear-out, warm-up effects, etc. The construction of run orders that are optimally balanced for time trend effects relies on the incorporation of a parametric representation of the time dependence in the response model. The parameters of the time trend are then treated as nuisance parameters. However, the price one pays for purely parametric modeling is biased results when the time trend is misspecified. This paper presents a design algorithm for the construction of optimal run orders when kernel smoothing is used to model the temporal trend nonparametrically. The benefits of modeling the time trend nonparametrically are outlined. In addition, the influence of the bandwidth and the kernel function on the performance of the optimal run orders is investigated. The presented design algorithm proves to be very useful when it is hard to model the time dependence parametrically or when the functional form of the time trend is unknown. An industrial example illustrates the practical utility of the proposed design algorithm.
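    The nonparametric ingredient here, kernel smoothing of a time trend, can be sketched with a Nadaraya-Watson estimator. The snippet below only illustrates that smoothing step under an assumed Gaussian kernel and bandwidth; it is not the paper's design-construction algorithm, and the data are synthetic.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nadaraya_watson(t_grid, t_obs, y_obs, bandwidth):
    """Kernel-smoothed (nonparametric) estimate of the time trend on t_grid."""
    est = np.empty(len(t_grid))
    for i, t in enumerate(t_grid):
        w = gaussian_kernel((t - t_obs) / bandwidth)
        est[i] = np.sum(w * y_obs) / np.sum(w)
    return est

# Synthetic run order: responses drift with a smooth, unknown time trend.
rng = np.random.default_rng(1)
t_obs = np.arange(20, dtype=float)                        # run times (hypothetical)
y_obs = 0.5 * np.sin(t_obs / 6.0) + rng.normal(0, 0.1, t_obs.size)

trend = nadaraya_watson(t_obs, t_obs, y_obs, bandwidth=2.0)
detrended = y_obs - trend   # responses with the estimated trend removed
print(np.round(trend, 3))
```

    The bandwidth governs how aggressively the trend is smoothed: a small bandwidth tracks noise, a large one flattens genuine drift, which is why its influence on the resulting run orders is worth investigating.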

    The VITI program: Final Report

    In this report we present our findings and results from the VITI program in 2000. The focus of the research work undertaken by VITI has been to provide electronic meeting environments that are easy to use and afford as natural a collaboration experience as possible. This final report is structured into three parts. Part one concerns the VITI infrastructure and consists of two sections. The first section describes the process of establishing the infrastructure, concentrating on how the work was done. The second section presents the actual infrastructure that is in place today, concentrating on what has been put in place. Part two examines the uses to which the VITI infrastructure has been put, giving examples of activities it has supported and discussing strengths and weaknesses that have emerged through this use. Finally, part three considers the future of distributed electronic meeting environments. It is recommended that the report be read in the order in which it is presented. However, each section has been written as a standalone document and can be read independently of the others.

    Quality assessment technique for ubiquitous software and middleware

    Ubiquitous computing systems are the new paradigm of computing and information systems. The technology-oriented issues of ubiquitous computing systems have led researchers to focus on feasibility studies of the technologies rather than on building quality-assurance indices or guidelines. In this context, measuring quality is the key to developing high-quality ubiquitous computing products. For this reason, various quality models have been defined, adopted, and enhanced over the years; for example, the recognised standard quality model ISO/IEC 9126 is the result of a consensus on a software quality model with three levels: characteristics, sub-characteristics, and metrics. However, it is unlikely that this scheme is directly applicable to ubiquitous computing environments, which differ considerably from conventional software; this raises the concern of reformulating existing methods and, especially, of elaborating new assessment techniques for ubiquitous computing environments. This paper selects appropriate quality characteristics for the ubiquitous computing environment, which can be used as the quality target for both ubiquitous computing product evaluation processes and development processes. Further, each of the quality characteristics has been expanded with evaluation questions and metrics, and in some cases with measures. In addition, this quality model has been applied in an industrial ubiquitous computing setting. This application revealed that, while the approach is sound, some parts need to be developed further in the future.
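    The three-level structure borrowed from ISO/IEC 9126 (characteristics, sub-characteristics, metrics) can be made concrete as a small data structure with a weighted roll-up score. The characteristic names, weights, and scores below are hypothetical placeholders, not the quality model proposed in the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Metric:
    name: str
    score: float  # normalised measurement in [0, 1]

@dataclass
class SubCharacteristic:
    name: str
    metrics: list[Metric] = field(default_factory=list)

    def score(self) -> float:
        return sum(m.score for m in self.metrics) / len(self.metrics)

@dataclass
class Characteristic:
    name: str
    weight: float  # relative importance of this characteristic (hypothetical)
    subs: list[SubCharacteristic] = field(default_factory=list)

    def score(self) -> float:
        return sum(s.score() for s in self.subs) / len(self.subs)

# Hypothetical slice of a quality model for a ubiquitous computing product.
model = [
    Characteristic("Reliability", 0.6, [
        SubCharacteristic("Fault tolerance", [Metric("recovery within target time", 0.8)]),
    ]),
    Characteristic("Usability", 0.4, [
        SubCharacteristic("Learnability", [Metric("first-use task success rate", 0.7)]),
    ]),
]

overall = sum(c.weight * c.score() for c in model) / sum(c.weight for c in model)
print(f"overall quality score: {overall:.2f}")
```

    A real assessment would attach the paper's evaluation questions and measures to each metric; the roll-up arithmetic stays the same.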

    On-board processing for future satellite communications systems: Satellite-Routed FDMA

    A frequency division multiple access (FDMA) 30/20 GHz satellite communications architecture without on-board baseband processing is investigated. Conceptual system designs are suggested for domestic traffic models totaling 4 Gb/s of customer premises service (CPS) traffic and 6 Gb/s of trunking traffic. Emphasis is given to the CPS portion of the system, which includes thousands of earth terminals with digital traffic ranging from a single 64 kb/s voice channel to hundreds of channels of voice, data, and video with an aggregate data rate of 33 Mb/s. A unique regional design concept that effectively smooths the non-uniform traffic distribution and greatly simplifies the satellite design is employed. In one design, the satellite antenna system forms thirty-two 0.33 deg beams on both the uplinks and the downlinks. In another design, matched to a traffic model with more dispersed users, there are twenty-four 0.33 deg beams and twenty-one 0.7 deg beams. Detailed system design techniques show that a single satellite producing approximately 5 kW of dc power is capable of handling at least 75% of the postulated traffic. A detailed cost model of the ground segment and estimated system costs based on current information from manufacturers are presented.
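    The figures quoted in the abstract imply some simple capacity arithmetic, sketched below. The uniform per-beam split is an assumption made only for illustration; the study's regional design deliberately smooths a non-uniform traffic distribution.

```python
# Back-of-the-envelope arithmetic implied by the abstract's figures.
cps_traffic_bps = 4e9      # customer premises service (CPS) traffic
trunking_bps = 6e9         # trunking traffic
voice_channel_bps = 64e3   # single digital voice channel
max_terminal_bps = 33e6    # largest aggregate terminal rate
cps_beams = 32             # 0.33-deg beams in the first design

# How many 64 kb/s channels fit in the largest terminal's 33 Mb/s aggregate?
print("channels per largest terminal:", int(max_terminal_bps // voice_channel_bps))

# Average CPS traffic per beam if the 4 Gb/s were spread uniformly
# (illustrative only; the actual load per beam is non-uniform).
print("average CPS traffic per beam: %.0f Mb/s" % (cps_traffic_bps / cps_beams / 1e6))

# Total domestic traffic in the model.
print("total traffic: %.0f Gb/s" % ((cps_traffic_bps + trunking_bps) / 1e9))
```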