6 research outputs found

    Cliffhanger: Scaling Performance Cliffs in Web Memory Caches

    Get PDF
    Web-scale applications are heavily reliant on memory cache systems such as Memcached to improve throughput and reduce user latency. Small performance improvements in these systems can result in large end-to-end gains. For example, a marginal increase in hit rate of 1% can reduce the application layer latency by over 35%. However, existing web cache resource allocation policies are workload oblivious and first-come-first-serve. By analyzing measurements from a widely used caching service, Memcachier, we demonstrate that existing cache allocation techniques leave significant room for improvement. We develop Cliffhanger, a lightweight iterative algorithm that runs on memory cache servers, which incrementally optimizes the resource allocations across and within applications based on dynamically changing workloads. It has been shown that cache allocation algorithms underperform when there are performance cliffs, in which minor changes in cache allocation cause large changes in the hit rate. We design a novel technique for dealing with performance cliffs incrementally and locally. We demonstrate that for the Memcachier applications, on average, Cliffhanger increases the overall hit rate 1.2%, reduces the total number of cache misses by 36.7% and achieves the same hit rate with 45% less memory capacity

    Hardware-Software Co-Design for Network Performance Measurement

    Get PDF
    Diagnosing performance problems in networks is important, for example to determine where packets experience high latency or loss. However, existing performance diagnoses are constrained by limited switch mechanisms for measurement. Alternatively, operators use endpoint information indirectly to infer root causes for problematic latency or drops. Instead of designing piecemeal solutions to work around such switch restrictions, we believe that the right approach is to co-design language abstractions and switch hardware primitives for network performance measurement. This approach provides confidence that the switch primitives are sufficiently general, i.e., they can support a variety of existing and unanticipated use cases. We present a declarative query language that allows operators to ask a diverse set of network performance questions. We show that these queries can be implemented efficiently in switch hardware using a novel programmable key-value store primitive. Our preliminary evaluations show that our hardware design is feasible at modest chip area overhead relative to existing switching chips

    Costly circuits, submodular schedules and approximate Carathéodory Theorems

    No full text
    Hybrid switching—in which a high bandwidth circuit switch (optical or wireless) is used in conjunction with a low bandwidth packet switch—is a promising alternative to interconnect servers in today’s large-scale data centers. Circuit switches offer a very high link rate, but incur a non-trivial reconfiguration delay which makes their scheduling challenging. In this paper, we demonstrate a lightweight, simple and nearly optimal scheduling algorithm that trades off reconfiguration costs with the benefits of reconfiguration that match the traffic demands. Seen alternatively, the algorithm provides a fast and approximate solution toward a constructive version of Carathéodory’s Theorem for the Birkhoff polytope. The algorithm also has strong connections to submodular optimization, achieves a performance at least half that of the optimal schedule and strictly outperforms the state of the art in a variety of traffic demand settings. These ideas naturally generalize: we see that indirect routing leads to exponential connectivity; this is another phenomenon of the power of multi-hop routing, distinct from the well-known load balancing effects. Keywords: Data center networks, Bridges and switches, Circuit networks, Network flows, Submodular optimization, Approximation algorithmsNational Science Foundation (U.S.) (Grant CCF-1409106)National Science Foundation (U.S.) (Grant NeTS-1718270)United States. Army (Grant W911NF-14-1-0220

    RoadTracer: Automatic Extraction of Road Networks from Aerial Images

    No full text
    Mapping road networks is currently both expensive and labor-intensive. High-resolution aerial imagery provides a promising avenue to automatically infer a road network. Prior work uses convolutional neural networks (CNNs) to detect which pixels belong to a road (segmentation), and then uses complex post-processing heuristics to infer graph connectivity. We show that these segmentation methods have high error rates because noisy CNN outputs are difficult to correct. We propose RoadTracer, a new method to automatically construct accurate road network maps from aerial images. RoadTracer uses an iterative search process guided by a CNN-based decision function to derive the road network graph directly from the output of the CNN. We compare our approach with a segmentation method on fifteen cities, and find that at a 5% error rate, RoadTracer correctly captures 45% more junctions across these cities.Qatar Computing Research Institut

    BeeCluster: drone orchestration via predictive optimization

    No full text
    The rapid development of small aerial drones has enabled numerous drone-based applications, e.g., geographic mapping, air pollution sensing, and search and rescue. To assist the development of these applications, we propose BeeCluster, a drone orchestration system that manages a fleet of drones. BeeCluster provides a virtual drone abstraction that enables developers to express a sequence of geographical sensing tasks, and determines how to map these tasks to the fleet efficiently. BeeCluster's core contribution is predictive optimization, in which an inferred model of the future tasks of the application is used to generate an optimized flight and sensing schedule for the drones that aims to minimize the total expected execution time. We built a prototype of BeeCluster and evaluated it on five real-world case studies with drones in outdoor environments, measuring speedups from 11.6% to 23.9%

    Programmable Packet Scheduling at Line Rate

    No full text
    Switches today provide a small menu of scheduling algorithms. While we can tweak scheduling parameters, we cannot modify algorithmic logic, or add a completely new algorithm, after the switch has been designed. This paper presents a design for a {\em programmable} packet scheduler, which allows scheduling algorithms---potentially algorithms that are unknown today---to be programmed into a switch without requiring hardware redesign. Our design uses the property that scheduling algorithms make two decisions: in what order to schedule packets and when to schedule them. Further, we observe that in many scheduling algorithms, definitive decisions on these two questions can be made when packets are enqueued. We use these observations to build a programmable scheduler using a single abstraction: the push-in first-out queue (PIFO), a priority queue that maintains the scheduling order or time. We show that a PIFO-based scheduler lets us program a wide variety of scheduling algorithms. We present a hardware design for this scheduler for a 64-port 10 Gbit/s shared-memory (output-queued) switch. Our design costs an additional 4% in chip area. In return, it lets us program many sophisticated algorithms, such as a 5-level hierarchical scheduler with programmable decisions at each level.National Science Foundation (U.S.) (Grant CNS-1563826)Cisco Research Cente
    corecore