
    TimeTrader: Exploiting Latency Tail to Save Datacenter Energy for On-line Data-Intensive Applications

    Datacenters running on-line, data-intensive applications (OLDIs) consume significant amounts of energy. However, reducing their energy is challenging due to their tight response-time requirements. A key aspect of OLDIs is that each user query goes to all or many of the nodes in the cluster, so the overall time budget is dictated by the tail of the replies' latency distribution; replies see latency variations in both the network and compute. Previous work proposes to achieve load-proportional energy by slowing down the computation at lower datacenter loads based directly on response times (i.e., at lower loads, it exploits the average slack in the time budget provisioned for the peak load). In contrast, we propose TimeTrader, which reduces energy by exploiting the latency slack in the sub-critical replies that arrive before the deadline (e.g., 80% of replies are 3-4x faster than the tail). This slack is present at all loads and subsumes the previous work's load-related slack. While the previous work shifts the leaves' response-time distribution to consume the slack at lower loads, TimeTrader reshapes the distribution at all loads by slowing down individual sub-critical nodes without increasing missed deadlines. TimeTrader exploits slack in both the network and compute budgets. Further, TimeTrader leverages Earliest Deadline First scheduling to largely decouple critical requests from the queuing delays of sub-critical requests, which can then be slowed down without hurting critical requests. A combination of real-system measurements and at-scale simulations shows that, without adding to missed deadlines, TimeTrader saves 15-19% and 41-49% energy at 90% and 30% loading, respectively, in a datacenter with 512 nodes, whereas previous work saves 0% and 31-37%. Comment: 13 pages
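
    The abstract names Earliest Deadline First (EDF) scheduling as the mechanism that keeps critical requests from waiting behind sub-critical ones. The sketch below is a minimal EDF queue, assuming each request carries an absolute deadline; the names Request, EDFQueue and serve_all are hypothetical, and this is an illustration of EDF, not TimeTrader's implementation.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    deadline: float  # absolute deadline; the earliest one is served first
    seq: int  # arrival counter, breaks deadline ties in FIFO order
    name: str = field(compare=False)


class EDFQueue:
    """Earliest-deadline-first request queue backed by a binary heap."""

    def __init__(self):
        self._heap = []
        self._arrivals = itertools.count()

    def push(self, name, deadline):
        heapq.heappush(self._heap, Request(deadline, next(self._arrivals), name))

    def pop(self):
        # Remove and return the pending request with the closest deadline.
        return heapq.heappop(self._heap).name

    def __len__(self):
        return len(self._heap)


def serve_all(queue):
    """Drain the queue, returning request names in service order."""
    order = []
    while queue:
        order.append(queue.pop())
    return order


q = EDFQueue()
q.push("sub-critical-1", deadline=40.0)
q.push("sub-critical-2", deadline=35.0)
q.push("critical", deadline=5.0)
print(serve_all(q))  # ['critical', 'sub-critical-2', 'sub-critical-1']
```

    Under FIFO the critical request would wait behind both sub-critical ones; under EDF it overtakes them, so the sub-critical work can be slowed down without delaying it. How deadlines are assigned to individual requests is outside this sketch.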

    Sprinklers: A Randomized Variable-Size Striping Approach to Reordering-Free Load-Balanced Switching

    Internet traffic continues to grow exponentially, calling for switches that scale well in both size and speed. While load-balanced switches can achieve such scalability, they suffer from a fundamental packet reordering problem. Existing proposals either suffer from poor worst-case packet delays or require sophisticated matching mechanisms. In this paper, we propose a new family of stable load-balanced switches called "Sprinklers" that has implementation cost and performance comparable to the baseline load-balanced switch, yet can guarantee packet ordering. The main idea is to force all packets within the same virtual output queue (VOQ) to traverse the same "fat path" through the switch, so that packet reordering cannot occur. At the core of Sprinklers are two key innovations: a randomized way to determine the "fat path" for each VOQ, and a way to determine its "fatness" roughly in proportion to the rate of the VOQ. These innovations enable Sprinklers to achieve near-perfect load-balancing under arbitrary admissible traffic. Rigorously proving this property using novel worst-case large-deviation techniques is another key contribution of this work.
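
    To make the "fat path" idea concrete, the sketch below assigns each VOQ a block of intermediate ports whose size grows with the VOQ's rate and whose position is random. The abstract does not give the exact rule, so this assumes stripe sizes are powers of two (smallest one at least rate times port count), a uniformly random size-aligned placement, and a port count that is a power of two; stripe_size, assign_fat_path and spray are hypothetical names. It illustrates path assignment only, not the paper's reordering-free delivery logic.

```python
import math
import random


def stripe_size(rate, n_ports):
    """Smallest power of two >= rate * n_ports, clamped to [1, n_ports].

    Assumptions: `rate` is the VOQ's arrival rate as a fraction of the
    line rate, and `n_ports` (intermediate ports) is a power of two.
    """
    target = max(1.0, rate * n_ports)
    size = 1 << math.ceil(math.log2(target))
    return min(size, n_ports)


def assign_fat_path(rate, n_ports, rng):
    """Choose a random, size-aligned block of intermediate ports for a VOQ."""
    size = stripe_size(rate, n_ports)
    start = rng.randrange(n_ports // size) * size
    return list(range(start, start + size))


def spray(packets, path):
    """Spread a VOQ's packets round-robin over its fixed fat path."""
    return [(pkt, path[i % len(path)]) for i, pkt in enumerate(packets)]


rng = random.Random(42)
path = assign_fat_path(rate=0.2, n_ports=16, rng=rng)
print(path, spray(["p0", "p1", "p2", "p3", "p4"], path))
```

    Because every packet of a VOQ uses the same block of ports, a low-rate VOQ touches few ports while a high-rate one is spread widely, which is the "fatness roughly proportional to rate" property the abstract describes.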

    Optimal scheduling algorithms for input-queued switches

    On the performance of STDMA Link Scheduling and Switched Beamforming Antennas in Wireless Mesh Networks

    Final degree project carried out in collaboration with King's College London. Wireless Mesh Networks (WMNs) aim to revolutionize Internet connectivity thanks to their high throughput, cost-effectiveness, and ease of deployment, providing last-mile connectivity and/or backhaul support to different cellular networks. To avoid jeopardizing their successful deployment, several key issues must be investigated and overcome to fully realize their potential. For WMNs that use Spatial Reuse TDMA as the medium access control, link scheduling still requires further enhancement. The first main contribution of this thesis is a fast randomized parallel link swap based packing (RSP) algorithm for timeslot allocation in a spatial time division multiple access (STDMA) wireless mesh network. The proposed randomized algorithm extends several greedy scheduling algorithms that use the physical interference model by applying a local search that substantially improves spatial timeslot reuse. Numerical simulations reveal that, compared to previous scheduling schemes, the proposed randomized algorithm can achieve a performance gain of up to 11%. A significant benefit of the proposed scheme is that the computations can be parallelized and can therefore efficiently use commodity and emerging multi-core and/or multi-CPU processors. Furthermore, selectable multi-beam directional antennas in WMNs, such as beam-switched phased array antennas, can significantly enhance the overall reuse of timeslots by reducing interference levels across the network, thereby increasing the spectral efficiency of the system. However, switching the antenna beam may take up to 0.25 ms in practically deployed networks, and very frequent beam switching can affect frame acquisition and the overall reliability of the deployed mesh network.

    The second key contribution of this thesis is a set of algorithms that minimize the overall number of required beam switchings in the mesh network without penalizing the spatial reuse of timeslots, i.e., while keeping the same overall frame length in the network. Numerical investigations reveal that the proposed set of algorithms can reduce the number of beam switchings by almost 90% without affecting the frame length of the network.
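
    The RSP algorithm is described as extending greedy STDMA schedulers that use the physical interference model. As background, the sketch below shows a plain first-fit greedy packing of links into timeslots under an SINR threshold; it does not implement the randomized link-swap local search. Transmit power, path-loss exponent alpha, noise and threshold beta are illustrative assumptions, as are all function names, and links sharing a node are kept apart on the assumption of half-duplex radios.

```python
import math


def shares_node(a, b):
    # Half-duplex radios: links with a common endpoint cannot be co-scheduled.
    return bool(set(a) & set(b))


def sinr(link, others, power, alpha, noise):
    """SINR at `link`'s receiver when the links in `others` transmit too."""
    tx, rx = link
    signal = power / math.dist(tx, rx) ** alpha
    interference = sum(power / math.dist(o_tx, rx) ** alpha for o_tx, _ in others)
    return signal / (noise + interference)


def slot_feasible(slot, power, alpha, noise, beta):
    # Every link in the slot must reach the SINR threshold beta.
    for i, link in enumerate(slot):
        others = slot[:i] + slot[i + 1:]
        if any(shares_node(link, o) for o in others):
            return False
        if sinr(link, others, power, alpha, noise) < beta:
            return False
    return True


def greedy_stdma(links, power=1.0, alpha=3.0, noise=1e-9, beta=2.0):
    """First-fit packing of links (tx, rx) into timeslots."""
    slots = []
    for link in links:
        for slot in slots:
            if slot_feasible(slot + [link], power, alpha, noise, beta):
                slot.append(link)
                break
        else:
            slots.append([link])  # no existing slot fits: open a new one
    return slots


links = [((0, 0), (1, 0)), ((100, 0), (101, 0)), ((1.5, 0), (2.5, 0))]
print(len(greedy_stdma(links)))  # 2
```

    The two distant links share a slot, while the third transmitter sits 0.5 units from the first receiver and would push its SINR far below beta, so it gets its own slot. Fewer slots means a shorter frame, which is the spatial reuse that a local search on top of such a greedy pass tries to improve.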

    Methods and Applications of Synthetic Data Generation

    The advent of data mining and machine learning has highlighted the value of large and varied sources of data, while increasing the demand for synthetic data that captures the structural and statistical characteristics of the original data without revealing personal or proprietary information contained in the original dataset. In this dissertation, we use examples from original research to show that, using appropriate models and input parameters, synthetic data that mimics the characteristics of real data can be generated at sufficient rate and quality to address the volume, structural complexity, and statistical variation requirements of research and development of digital information processing systems. First, we present a progression of research studies using a variety of tools to generate synthetic network traffic patterns, enabling us to observe relationships between network latency and communication pattern benchmarks at all levels of the network stack. We then present a scalable extraction and synthesis framework for generating large-scale IoT data with complex structural characteristics, and demonstrate the use of the generated data in benchmarking IoT middleware. Finally, we detail research on synthetic image generation for deep learning models using 3D modeling. We find that synthetic images can be an effective technique for augmenting limited sets of real training data, and that in use cases that benefit from incremental training or model specialization, pretraining on synthetic images provided a usable base model for transfer learning.

    Asymptotically Optimal Approximation Algorithms for Coflow Scheduling

    Many modern datacenter applications involve large-scale computations composed of multiple data flows that need to be completed over a shared set of distributed resources. Such a computation completes when all of its flows complete. A useful abstraction for modeling such scenarios is a coflow, which is a collection of flows (e.g., tasks, packets, data transmissions) that all share the same performance goal. In this paper, we present the first approximation algorithms for scheduling coflows over general network topologies with the objective of minimizing total weighted completion time. We consider two different models for coflows based on the nature of individual flows: circuits and packets. We design constant-factor polynomial-time approximation algorithms for scheduling packet-based coflows with or without given flow paths, and circuit-based coflows with given flow paths. Furthermore, we give an $O(\log n/\log\log n)$-approximation polynomial-time algorithm for scheduling circuit-based coflows where flow paths are not given (here $n$ is the number of network edges). We obtain our results by developing a general framework for coflow schedules, based on interval-indexed linear programs, which may extend to other coflow models and objective functions and may also yield improved approximation bounds for specific network scenarios. We also present an experimental evaluation of our approach for circuit-based coflows that shows a performance improvement of at least 22% on average over competing heuristics. Comment: Fixed minor typo
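
    The objective here, total weighted completion time where a coflow completes only when its last flow does, can be made concrete on a toy case. The sketch assumes a single shared link that serves coflows one after another and orders them by Smith's rule (decreasing weight per unit of data), which is optimal for this single-link setting; it is not the paper's interval-indexed LP algorithm for general topologies, and Coflow, total_weighted_completion and smith_order are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Coflow:
    name: str
    weight: float
    flow_sizes: list  # data volume of each constituent flow


def total_weighted_completion(order, capacity=1.0):
    """Sum of weight * completion time when coflows are served one after
    another over a single shared link; a coflow completes only when its
    last flow has been transmitted."""
    t, total = 0.0, 0.0
    for c in order:
        t += sum(c.flow_sizes) / capacity
        total += c.weight * t
    return total


def smith_order(coflows):
    # Smith's rule: serve coflows in decreasing weight per unit of data.
    return sorted(coflows, key=lambda c: c.weight / sum(c.flow_sizes), reverse=True)


coflows = [Coflow("A", 1, [4, 4]), Coflow("B", 3, [1, 1]), Coflow("C", 2, [3])]
print(total_weighted_completion(coflows))               # 64.0
print(total_weighted_completion(smith_order(coflows)))  # 29.0
```

    Serving the small, heavily weighted coflow B first cuts the objective from 64 to 29. On a network with many shared links, coflows contend on different resources at once, which is why the paper needs LP-based approximation algorithms instead of a simple ordering rule.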

    On packet switch design

    A General Class of Throughput Optimal Routing Policies in Multi-hop Wireless Networks

    This paper considers the problem of throughput-optimal routing/scheduling in a multi-hop constrained queueing network with random connectivity, whose special cases include opportunistic multi-hop wireless networks and input-queued switch fabrics. The main challenge in designing throughput-optimal routing policies is closely related to identifying appropriate and universal Lyapunov functions with negative expected drift. The few well-known throughput-optimal policies in the literature are constructed using simple quadratic or exponential Lyapunov functions of the queue backlogs, and as such they seek to balance the queue backlogs across the network independent of the topology. By considering a class of continuous, differentiable, piecewise-quadratic Lyapunov functions, this paper provides a large class of throughput-optimal routing policies. The proposed class of Lyapunov functions allows the routing policy to keep traffic on short paths over a large portion of the state space while ensuring a negative expected drift. This structure enables the design of a large class of routing policies. In particular, in addition to recovering the throughput optimality of the well-known backpressure routing policy, an opportunistic routing policy with congestion diversity is proved to be throughput optimal. Comment: 31 pages (one column), 8 figures (revision submitted to IEEE Transactions on Information Theory)
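
    For reference, the sketch below shows one decision step of the classic backpressure (max-weight) policy that the abstract says the new class recovers: each directed link is weighted by its largest positive per-commodity backlog difference, and the feasible activation set with the largest rate-weighted sum is scheduled. It does not implement the paper's piecewise-quadratic Lyapunov policies. Feasible activation sets are assumed to be given as an explicit list (enumerating them for real interference models is a separate problem), and link_weights and backpressure_schedule are hypothetical names.

```python
def link_weights(queues, links):
    """Backpressure weight of each directed link.

    queues[node][commodity] is the backlog of `commodity` at `node`.
    Returns {link: (weight, best_commodity)}; weight is the largest
    positive backlog difference across the link (0 if none is positive).
    """
    weights = {}
    for a, b in links:
        best_c, best_w = None, 0
        for c, backlog in queues[a].items():
            diff = backlog - queues[b].get(c, 0)
            if diff > best_w:
                best_c, best_w = c, diff
        weights[(a, b)] = (best_w, best_c)
    return weights


def backpressure_schedule(queues, feasible_sets, rate):
    """Activate the feasible link set maximizing sum(rate * weight) and
    return {link: commodity to send} for its links with positive weight."""
    links = {link for s in feasible_sets for link in s}
    w = link_weights(queues, links)
    best = max(feasible_sets, key=lambda s: sum(rate[link] * w[link][0] for link in s))
    return {link: w[link][1] for link in best if w[link][0] > 0}


queues = {"A": {"to_C": 10}, "B": {"to_C": 7}, "C": {"to_C": 0}}
feasible_sets = [[("A", "B")], [("B", "C")], [("A", "C")]]
rate = {("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 0.5}
print(backpressure_schedule(queues, feasible_sets, rate))  # {('B', 'C'): 'to_C'}
```

    The direct link A to C has the largest backlog difference (10) but only half the rate, so B to C (weight 7) wins. Because the weights depend only on backlog differences and not on path length, plain backpressure can push packets along long routes at light load, which is the behavior the paper's Lyapunov functions are designed to avoid.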

    Efficient Measurement on Programmable Switches Using Probabilistic Recirculation

    Programmable network switches promise flexibility and high throughput, enabling applications such as load balancing and traffic engineering. Network measurement is a fundamental building block for such applications, including tasks such as identifying heavy hitters (the largest flows) or detecting traffic changes. However, high-throughput packet processing architectures place certain limitations on the programming model, such as restricted branching, limited memory access, and a limited number of processing stages. These limitations restrict the types of measurement algorithms that can run on programmable switches. In this paper, we focus on the RMT programmable high-throughput switch architecture and carefully examine its constraints on designing measurement algorithms. We demonstrate our findings while solving the heavy hitter problem. We introduce PRECISION, an algorithm that uses Probabilistic Recirculation to find top flows on a programmable switch. By recirculating a small fraction of packets, PRECISION simplifies access to stateful memory to conform with RMT limitations and achieves higher accuracy than previous heavy hitter detection algorithms that avoid recirculation. We also analyze the effect of each architectural constraint on measurement accuracy and provide insights for measurement algorithm designers. Comment: To appear in IEEE ICNP 201
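
    The sketch below is a software model of probabilistic recirculation for heavy-hitter tracking, written to show why only a small fraction of packets needs a second pass. It assumes this update rule: the table has d stages of (flow, count) slots; a flow already tracked in one of its d slots is incremented in place, and an untracked flow replaces the smallest of its d candidate slots with probability 1/(min+1), a replacement that on an RMT pipeline would require recirculating the packet. The rule, the class name PrecisionSketch and its parameters are assumptions for illustration, not code from the paper.

```python
import random


class PrecisionSketch:
    """Software model of heavy-hitter tracking with probabilistic
    recirculation (assumed update rule, see lead-in)."""

    def __init__(self, stages=2, width=64, seed=0):
        self.stages = stages
        self.width = width
        self.table = [[(None, 0)] * width for _ in range(stages)]
        self.rng = random.Random(seed)
        self.recirculations = 0  # packets that would take a second pass

    def _slot(self, stage, key):
        # One hash function per stage, modeled by salting with the stage index.
        return hash((stage, key)) % self.width

    def update(self, key):
        min_stage, min_count = 0, None
        for s in range(self.stages):
            idx = self._slot(s, key)
            k, c = self.table[s][idx]
            if k == key:  # flow already tracked: count in place, no recirculation
                self.table[s][idx] = (k, c + 1)
                return
            if min_count is None or c < min_count:
                min_stage, min_count = s, c
        # Untracked flow: claim the smallest candidate slot with prob. 1/(min+1).
        if self.rng.random() < 1.0 / (min_count + 1):
            self.recirculations += 1
            idx = self._slot(min_stage, key)
            self.table[min_stage][idx] = (key, min_count + 1)

    def top(self, k):
        entries = [e for row in self.table for e in row if e[0] is not None]
        return sorted(entries, key=lambda e: e[1], reverse=True)[:k]


stream = [0] * 1000 + [m for m in range(1, 201) for _ in range(2)]
random.Random(1).shuffle(stream)
sketch = PrecisionSketch()
for flow in stream:
    sketch.update(flow)
print(sketch.top(1), sketch.recirculations)
```

    Once flow 0 holds a slot, all of its packets are counted in a single pass; only packets of untracked flows ever trigger a replacement, and replacing a large counter is unlikely, so heavy flows stay in the table.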