21 research outputs found

    Right-sizing Server Capacity Headroom for Global Online Services

    Strider: a black-box, state-based approach to change and configuration management and support

    Abstract: We describe a new approach, called Strider, to Change and Configuration Management and Support (CCMS). Strider is a black-box approach: without relying on specifications, it uses state differencing to identify potential causes of differing program behaviors, uses state tracing to identify actual, run-time state dependencies, and uses statistical behavior modeling for noise filtering. Strider is a state-based approach: instead of linking vague, high-level descriptions and symptoms to relevant actions, it models management and support problems in terms of individual, named pieces of low-level configuration state and provides precise mappings to user-friendly information through a computer genomics database. We use troubleshooting of configuration failures to demonstrate that the Strider approach reduces problem complexity by several orders of magnitude, making root-cause analysis possible.
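
The combination of state differencing, state tracing, and noise filtering described in this abstract can be illustrated with a minimal sketch. It assumes configuration state is available as flat key/value snapshots and that noisy keys have already been identified; the function names, the sample keys, and the fixed noise set are hypothetical stand-ins, not Strider's actual interfaces or its statistical model.

```python
_MISSING = object()  # sentinel so an absent key differs from a key set to None


def diff_states(good, bad):
    """Return the keys whose values differ between two state snapshots."""
    keys = set(good) | set(bad)
    return {k for k in keys if good.get(k, _MISSING) != bad.get(k, _MISSING)}


def candidate_causes(good, bad, traced_keys, noisy_keys=frozenset()):
    """Keep changed keys that the failing program actually touched,
    then drop keys assumed to change all the time (noise)."""
    changed = diff_states(good, bad)
    return sorted((changed & set(traced_keys)) - set(noisy_keys))


# Hypothetical snapshots from a working and a failing machine state.
good = {"HKCU/App/Proxy": "auto", "HKCU/App/Theme": "light", "HKLM/Sys/Tick": 1}
bad = {"HKCU/App/Proxy": "manual", "HKCU/App/Theme": "dark", "HKLM/Sys/Tick": 2}
traced = ["HKCU/App/Proxy", "HKLM/Sys/Tick"]  # keys read during the failing run
noisy = {"HKLM/Sys/Tick"}                     # assumed to change constantly

suspects = candidate_causes(good, bad, traced, noisy)
print(suspects)
```

Three keys differ between the snapshots, but only one is both read by the failing run and not noise, which is the kind of narrowing the abstract refers to.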

    Optimization strategies for large scale distributed systems

    Every day we rely on large distributed software services for communication, information, commerce, entertainment, and many other personal and business use cases. These services are globally available, sometimes with billions of users. A single service may use hundreds of thousands of interconnected servers running in datacenters around the world to ensure it is both highly available and able to provide users with sub-second responses. Operating such a service costs hundreds of millions of dollars annually and typically requires thousands of engineers working on millions of lines of software to maintain and evolve it. The scale and complexity of these systems make their optimization difficult; however, it is vital that they operate as efficiently as possible given the costs involved. This thesis focuses on two of the most impactful optimization opportunities common to these systems: minimizing their resource allocations, and minimizing the time taken to resolve request attributes into processing parameters. This work was done in conjunction with one of the largest global commercial services to show the effectiveness of the techniques on real-world systems. First, we focus on capacity planning, which has significant business and environmental impact. More than 1% (292 TWh) of global electricity is currently consumed by datacenters and that amount continues to increase. The challenge here is to determine resource needs for unexpected usage increases or failures in addition to steady-state activity. The goal is to reduce over-provisioning without impacting any service guarantees. This thesis presents a significant improvement of the state-of-the-art via a new iterative black-box capacity planning model relying only on the relationships between workload, utilization, and quality. Collaborating with one of the largest global service owners, we enabled capacity reductions between 20% and 40%, saving $50,000,000 USD annually. A global datacenter usage reduction at this scale would save enough electricity to offset the annual consumption of 44 million people in the UK, eliminating 34 billion tonnes of CO₂ emissions. Second, we focus on improving request latency by optimizing a component common to all services: resolving request attributes into processing settings. We describe a technique for translating tabular data into code and compiling it into a binary index for point look-ups on sparse datasets. Query performance of a large commercial service improved 58-fold compared with R-tree-based solutions, and by a factor on the order of 10⁸ compared with commercial databases. Additionally, a new compiler optimization is introduced for addressing large blocks of conditional evaluations, motivated by observing that up to 70% of all processing time is consumed by 10% or less of the requests. Machine learning was shown to be effective at guiding the compiler towards the most efficient optimization to use for blocks of conditional evaluations, resulting in a 92-fold reduction in average latency.
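
The idea of translating tabular data into code for point look-ups can be sketched in a few lines. This is only an illustration under stated assumptions: it generates Python source and compiles it to bytecode, whereas the thesis describes compiling to a binary index; the function name `compile_lookup`, the sample rows, and the chain-of-conditionals layout are hypothetical.

```python
def compile_lookup(rows, default=None):
    """Turn (key, value) rows into a generated function made of conditionals.

    Keys and values must have a repr() that is a valid Python literal
    (strings, numbers, tuples of those), since they are embedded in source.
    """
    lines = ["def lookup(key):"]
    for key, value in rows:
        lines.append(f"    if key == {key!r}:")
        lines.append(f"        return {value!r}")
    lines.append(f"    return {default!r}")
    source = "\n".join(lines)

    namespace = {}
    # Compile the generated source once; later calls run the compiled code.
    exec(compile(source, "<generated-lookup>", "exec"), namespace)
    return namespace["lookup"]


# Hypothetical table mapping request attributes to a processing setting.
table = [
    (("US", "mobile"), "profile_a"),
    (("DE", "desktop"), "profile_b"),
]
lookup = compile_lookup(table, default="fallback")

print(lookup(("US", "mobile")))
print(lookup(("FR", "tv")))
```

A long chain of generated conditionals like this is the kind of block that the compiler optimization mentioned in the abstract targets.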

    A black-box tracing technique to identify causes of least-privilege incompatibilities

    Abstract: Most Windows users run all the time with Administrator privileges, equivalent to root privileges on a UNIX system. The possession of Administrator privileges by every user significantly increases the vulnerability of Windows systems. For example, simply compromising a user network service, such as an instant messaging client, provides an attacker complete control of the system. We address this problem by making it easier to develop applications that do not require Administrator privileges, thereby decreasing the inconvenience of running without Administrator privileges. To this end, we present a novel tracing technique for identifying the reasons applications require Administrator privileges (which we refer to as least-privilege incompatibilities). Our evaluation on a number of real-world applications shows that our tracing technique significantly helps developers fix least-privilege incompatibilities and can also help system administrators mitigate the impact of least-privilege incompatibilities in the near term through local system policy changes.

    Daniels: Strider Typo-Patrol: Discovery and Analysis of Systematic Typo-Squatting. Microsoft Research

    Typo-squatting refers to the practice of registering domain names that are typo variations of popular websites. We propose a new approach, called Strider Typo-Patrol, to discover large-scale, systematic typo-squatters. We show that a large number of typo-squatting domains are active and a large percentage of them are parked with a handful of major domain parking services, which serve syndicated advertisements on these domains. We also describe the Strider URL Tracer, a tool that we have released to allow website owners to systematically monitor typo-squatting domains of their sites.
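
To make "typo variations" concrete, the sketch below enumerates candidate typo domains for a given site. The abstract does not say which typo patterns Strider Typo-Patrol uses, so the two patterns here (dropping one character, swapping two adjacent characters) are assumptions chosen for illustration, and `typo_variants` is a hypothetical name.

```python
def typo_variants(domain):
    """Return sorted typo candidates for a domain such as 'example.com'.

    Only the part before the last dot is altered; the suffix is kept as-is.
    """
    name, dot, suffix = domain.rpartition(".")
    if not dot or not name:
        raise ValueError("expected a domain of the form 'name.suffix'")

    variants = set()
    for i in range(len(name)):
        # Assumed pattern 1: one character omitted.
        variants.add(name[:i] + name[i + 1:])
        # Assumed pattern 2: two adjacent characters swapped.
        if i + 1 < len(name):
            variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])

    variants.discard(name)  # a swap of equal letters reproduces the original
    variants.discard("")    # a one-letter name has no valid omission
    return sorted(v + "." + suffix for v in variants)


candidates = typo_variants("abc.com")
print(candidates)
```

A site owner could feed a list like this to a monitoring tool to check which candidates are registered and what they serve.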