TimeTrader: Exploiting Latency Tail to Save Datacenter Energy for On-line Data-Intensive Applications
Datacenters running on-line, data-intensive applications (OLDIs) consume
significant amounts of energy. However, reducing their energy is challenging
due to their tight response time requirements. A key aspect of OLDIs is that
each user query goes to all or many of the nodes in the cluster, so that the
overall time budget is dictated by the tail of the replies' latency
distribution; replies see latency variations both in the network and compute.
Previous work proposes to achieve load-proportional energy by slowing down the
computation at lower datacenter loads based directly on response times (i.e.,
at lower loads, the proposal exploits the average slack in the time budget
provisioned for the peak load). In contrast, we propose TimeTrader to reduce
energy by exploiting the latency slack in the sub-critical replies which
arrive before the deadline (e.g., 80% of replies are 3-4x faster than the
tail). This slack is present at all loads and subsumes the previous work's
load-related slack. While the previous work shifts the leaves' response time
distribution to consume the slack at lower loads, TimeTrader reshapes the
distribution at all loads by slowing down individual sub-critical nodes without
increasing missed deadlines. TimeTrader exploits slack in both the network and
compute budgets. Further, TimeTrader leverages Earliest Deadline First
scheduling to largely decouple critical requests from the queuing delays of
sub-critical requests, which can then be slowed down without hurting critical
requests. A combination of real-system measurements and at-scale simulations
shows that without adding to missed deadlines, TimeTrader saves 15-19% and
41-49% energy at 90% and 30% loading, respectively, in a datacenter with 512
nodes, whereas previous work saves 0% and 31-37%.
Comment: 13 pages
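The interplay of Earliest Deadline First ordering and per-request slack can be sketched with a toy scheduler (all names are hypothetical; TimeTrader's actual mechanism operates per leaf node on separate network and compute budgets):

```python
import heapq

def edf_schedule(requests, now=0.0):
    """Order requests by Earliest Deadline First and report the slack
    each one has at its start time.

    `requests` is a list of (deadline, service_time) pairs in one time
    unit. Returns (order, slacks): requests sorted by deadline, and each
    request's slack when it begins service.
    """
    heap = list(requests)
    heapq.heapify(heap)  # min-heap keyed on deadline
    order, slacks = [], []
    t = now
    while heap:
        deadline, service = heapq.heappop(heap)
        # Slack = time until the deadline minus the work still required.
        # A sub-critical request (positive slack) could be slowed by
        # roughly this amount without missing its deadline.
        slacks.append(deadline - t - service)
        order.append((deadline, service))
        t += service
    return order, slacks
```

A request with positive slack is sub-critical in the abstract's sense: it would be safe to serve it at a lower (energy-saving) speed, while EDF keeps critical requests ahead of it in the queue.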
Sprinklers: A Randomized Variable-Size Striping Approach to Reordering-Free Load-Balanced Switching
Internet traffic continues to grow exponentially, calling for switches that
can scale well in both size and speed. While load-balanced switches can achieve
such scalability, they suffer from a fundamental packet reordering problem.
Existing proposals either suffer from poor worst-case packet delays or require
sophisticated matching mechanisms. In this paper, we propose a new family of
stable load-balanced switches called "Sprinklers" that has comparable
implementation cost and performance as the baseline load-balanced switch, but
yet can guarantee packet ordering. The main idea is to force all packets within
the same virtual output queue (VOQ) to traverse the same "fat path" through the
switch, so that packet reordering cannot occur. At the core of Sprinklers are
two key innovations: a randomized way to determine the "fat path" for each VOQ,
and a way to determine its "fatness" roughly in proportion to the rate of the
VOQ. These innovations enable Sprinklers to achieve near-perfect load-balancing
under arbitrary admissible traffic. Proving this property rigorously using
novel worst-case large deviation techniques is another key contribution of this
work.
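The two innovations can be illustrated with a hypothetical sketch (names and the hashing scheme are illustrative; the paper's exact randomization and fatness quantization may differ):

```python
import hashlib

def fat_path(voq_id, rate_fraction, num_middle_ports=32, seed=0):
    """Pick a 'fat path' for a VOQ: a contiguous stripe of middle-stage
    ports whose width is roughly proportional to the VOQ's rate.

    rate_fraction: VOQ rate as a fraction of line rate, in (0, 1].
    All packets of the VOQ use only the returned ports, so they cannot
    be reordered relative to packets on a disjoint stripe.
    """
    # Fatness: at least one port, at most all middle ports.
    width = max(1, min(num_middle_ports,
                       round(rate_fraction * num_middle_ports)))
    # Randomized but deterministic per VOQ: hash the VOQ id to choose
    # the stripe's starting port, spreading stripes across the switch.
    h = hashlib.sha256(f"{seed}:{voq_id}".encode()).digest()
    start = int.from_bytes(h[:4], "big") % num_middle_ports
    return [(start + i) % num_middle_ports for i in range(width)]
```

Determinism per VOQ is what prevents reordering: every packet of a VOQ hashes to the same stripe, while the random starting port balances load across VOQs.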
On the performance of STDMA Link Scheduling and Switched Beamforming Antennas in Wireless Mesh Networks
Final degree project carried out in collaboration with King's College London.
Wireless Mesh Networks (WMNs) aim to revolutionize Internet connectivity due to
their high throughput, cost-effectiveness, and ease of deployment by providing last mile
connectivity and/or backhaul support to different cellular networks. In order not to
jeopardize their successful deployment, several key issues must be investigated and
overcome to fully realize its potential. For WMNs that utilize Spatial Reuse TDMA
as the medium access control, link scheduling still requires further enhancements.
The first main contribution of this thesis is a fast randomized parallel link swap
based packing (RSP) algorithm for timeslot allocation in a spatial time division multiple
access (STDMA) wireless mesh network. The proposed randomized algorithm
extends several greedy scheduling algorithms that utilize the physical interference
model by applying a local search that leads to a substantial improvement in the
spatial timeslot reuse. Numerical simulations reveal that, compared to previously
proposed scheduling schemes, the randomized algorithm can achieve a performance
gain of up to 11%. A significant benefit of the proposed scheme is that the computations
can be parallelized and can therefore efficiently utilize commoditized and
emerging multi-core and/or multi-CPU processors.
Furthermore, the use of selectable multi-beam directional antennas in WMNs,
such as beam-switched phased array antennas, can help to significantly enhance
the overall reuse of timeslots by reducing interference levels across the network and
thereby increasing the spectral efficiency of the system. Performing a switch of the
antenna beam, though, may require up to 0.25 ms in practically deployed networks,
while at the same time very frequent beam switchings can affect frame acquisition
and the overall reliability of the deployed mesh network.
The second key contribution of this thesis is a set of algorithms that minimize the
overall number of required beam switchings in the mesh network without penalizing
the spatial reuse of timeslots, i.e., keeping the same overall frame length in the
network. Numerical investigations reveal that the proposed set of algorithms can
reduce the number of beam switchings by almost 90% without affecting the frame
length of the network.
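A minimal sketch of the greedy baseline that RSP's local swap search improves on (the conflict predicate and names are hypothetical; real schedulers evaluate SINR under the physical interference model):

```python
def greedy_slots(links, conflicts):
    """Assign each link the first timeslot in which it conflicts with no
    link already placed there; open a new timeslot if none fits.

    conflicts(a, b) -> True if links a and b cannot share a timeslot
    (e.g., their combined interference violates the SINR threshold).
    Returns a list of timeslots, each a list of links.
    """
    slots = []  # slots[i] is the set of links scheduled in timeslot i
    for link in links:
        for slot in slots:
            if all(not conflicts(link, other) for other in slot):
                slot.append(link)
                break
        else:
            slots.append([link])  # no existing slot works: open a new one
    return slots
```

An RSP-style local search would then repeatedly try swapping links between timeslots, accepting swaps that let a slot absorb more links and so shorten the frame; each candidate swap is independent, which is what makes the search parallelizable.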
Methods and Applications of Synthetic Data Generation
The advent of data mining and machine learning has highlighted the value of large and varied sources of data, while increasing the demand for synthetic data that captures the structural and statistical characteristics of the original data without revealing personal or proprietary information contained in the original dataset.
In this dissertation, we use examples from original research to show that, using appropriate models and input parameters, synthetic data that mimics the characteristics of real data can be generated with sufficient rate and quality to address the volume, structural complexity, and statistical variation requirements of research and development of digital information processing systems.
First, we present a progression of research studies using a variety of tools to generate synthetic network traffic patterns, enabling us to observe relationships between network latency and communication pattern benchmarks at all levels of the network stack.
We then present a framework for synthesizing large scale IoT data with complex structural characteristics in a scalable extraction and synthesis framework, and demonstrate the use of generated data in the benchmarking of IoT middleware.
Finally, we detail research on synthetic image generation for deep learning models using 3D modeling. We find that synthetic images can be an effective technique for augmenting limited sets of real training data, and in use cases that benefit from incremental training or model specialization, we find that pretraining on synthetic images provided a usable base model for transfer learning
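The core idea of statistically faithful generation can be sketched in miniature (a deliberately simple Gaussian model with hypothetical names; the dissertation's generators fit far richer structural and statistical models):

```python
import random
import statistics

def synthesize(real_samples, n, rng=None):
    """Generate n synthetic values that mimic the first two moments of
    real_samples without reproducing any original record.

    Fits a Gaussian to the data's mean and standard deviation, then
    samples fresh values from it.
    """
    rng = rng or random.Random(0)
    mu = statistics.fmean(real_samples)       # empirical mean
    sigma = statistics.pstdev(real_samples)   # empirical std deviation
    return [rng.gauss(mu, sigma) for _ in range(n)]
```

The synthetic output matches the original's summary statistics in expectation, yet no generated value is tied to an individual input record, which is the privacy property the abstract emphasizes.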
Asymptotically Optimal Approximation Algorithms for Coflow Scheduling
Many modern datacenter applications involve large-scale computations composed
of multiple data flows that need to be completed over a shared set of
distributed resources. Such a computation completes when all of its flows
complete. A useful abstraction for modeling such scenarios is a {\em coflow},
which is a collection of flows (e.g., tasks, packets, data transmissions) that
all share the same performance goal.
In this paper, we present the first approximation algorithms for scheduling
coflows over general network topologies with the objective of minimizing total
weighted completion time. We consider two different models for coflows based on
the nature of individual flows: circuits, and packets. We design
constant-factor polynomial-time approximation algorithms for scheduling
packet-based coflows with or without given flow paths, and circuit-based
coflows with given flow paths. Furthermore, we give an $O(\log m / \log \log
m)$-approximation polynomial-time algorithm for scheduling circuit-based
coflows where flow paths are not given (here $m$ is the number of network
edges).
We obtain our results by developing a general framework for coflow schedules,
based on interval-indexed linear programs, which may extend to other coflow
models and objective functions and may also yield improved approximation bounds
for specific network scenarios. We also present an experimental evaluation of
our approach for circuit-based coflows that shows a performance improvement of
at least 22% on average over competing heuristics.
Comment: Fixed minor typo
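The objective can be made concrete with a small evaluator (a sequential toy model with hypothetical names, not the paper's LP-based algorithm, which overlaps coflows over general topologies):

```python
def weighted_completion_time(coflows, port_capacity=1.0):
    """Total weighted completion time when coflows are served one at a
    time in the given order.

    Each coflow is (weight, loads), where loads maps a port to the data
    the coflow must move through it; served alone, a coflow finishes in
    max(loads.values()) / port_capacity (its bottleneck port).
    """
    t, total = 0.0, 0.0
    for weight, loads in coflows:
        t += max(loads.values()) / port_capacity  # coflow completes here
        total += weight * t                        # weighted contribution
    return total
```

Even in this toy model, ordering coflows by weight divided by bottleneck load (a Smith's-rule analogue) typically lowers the objective, which hints at why completion-time scheduling needs careful ordering.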
A General Class of Throughput Optimal Routing Policies in Multi-hop Wireless Networks
This paper considers the problem of throughput optimal routing/scheduling in
a multi-hop constrained queueing network with random connectivity whose special
case includes opportunistic multi-hop wireless networks and input-queued switch
fabrics. The main challenge in the design of throughput optimal routing
policies is closely related to identifying appropriate and universal Lyapunov
functions with negative expected drift. The few well-known throughput optimal
policies in the literature are constructed using simple quadratic or
exponential Lyapunov functions of the queue backlogs, and as such they seek to
balance the queue backlogs across the network independently of the topology. By
considering a class of continuous, differentiable, and piece-wise quadratic
Lyapunov functions, this paper provides a large class of throughput optimal
routing policies. The proposed class of Lyapunov functions allows for the
routing policy to control the traffic along short paths for a large portion of
state-space while ensuring a negative expected drift. This structure enables
the design of a large class of routing policies. In particular, and in addition
to recovering the throughput optimality of the well known backpressure routing
policy, an opportunistic routing policy with congestion diversity is proved to
be throughput optimal.
Comment: 31 pages (one column), 8 figures (revision submitted to IEEE
Transactions on Information Theory)
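The backpressure policy recovered as a special case can be sketched in a few lines (hypothetical names; the quadratic-Lyapunov policy serves, on each link, the commodity with the largest positive differential backlog):

```python
def backpressure_weights(backlog, links):
    """For each directed link (u, v), pick the commodity with the
    largest positive differential backlog Q[u][c] - Q[v][c].

    backlog: dict node -> dict commodity -> queue length.
    Returns dict link -> (best_commodity, weight); (None, 0) means no
    commodity has positive pressure and the link stays idle.
    """
    decisions = {}
    for u, v in links:
        best, best_w = None, 0
        for c, q in backlog[u].items():
            w = q - backlog[v].get(c, 0)  # differential backlog for c
            if w > best_w:
                best, best_w = c, w
        decisions[(u, v)] = (best, best_w)
    return decisions
```

The paper's piece-wise quadratic Lyapunov functions generalize this rule so the policy can prefer short paths over much of the state space while keeping the same negative-drift guarantee.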
Efficient Measurement on Programmable Switches Using Probabilistic Recirculation
Programmable network switches promise flexibility and high throughput,
enabling applications such as load balancing and traffic engineering. Network
measurement is a fundamental building block for such applications, including
tasks such as the identification of heavy hitters (largest flows) or the
detection of traffic changes.
However, high-throughput packet processing architectures place certain
limitations on the programming model, such as restricted branching, limited
capability for memory access, and a limited number of processing stages. These
limitations restrict the types of measurement algorithms that can run on
programmable switches. In this paper, we focus on the RMT programmable
high-throughput switch architecture, and carefully examine its constraints on
designing measurement algorithms. We demonstrate our findings while solving the
heavy hitter problem.
We introduce PRECISION, an algorithm that uses \emph{Probabilistic
Recirculation} to find top flows on a programmable switch. By recirculating a
small fraction of packets, PRECISION simplifies the access to stateful memory
to conform with RMT limitations and achieves higher accuracy than previous
heavy hitter detection algorithms that avoid recirculation. We also analyze the
effect of each architectural constraint on the measurement accuracy and provide
insights for measurement algorithm designers.
Comment: To appear in IEEE ICNP 201
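The probabilistic-recirculation rule can be sketched as a single-table software model (a simplification with hypothetical names; the actual RMT implementation splits state across pipeline stages):

```python
import random

def precision_update(table, packet_flow, rng=None):
    """One PRECISION-style update: a matched packet increments its
    counter; a missed packet is recirculated with probability
    1 / (c_min + 1), where c_min is the smallest counter, and on
    recirculation it replaces that minimum entry.

    Returns True iff the packet was recirculated.
    """
    rng = rng or random.Random(0)
    if packet_flow in table:
        table[packet_flow] += 1  # hit: plain increment, no recirculation
        return False
    min_flow = min(table, key=table.get)
    c_min = table[min_flow]
    if rng.random() < 1.0 / (c_min + 1):
        # Recirculate: the new flow claims the minimum entry.
        del table[min_flow]
        table[packet_flow] = c_min + 1
        return True
    return False  # miss, but the packet was not recirculated
```

Because the replacement probability shrinks as the minimum counter grows, only a small fraction of packets recirculate, yet a true heavy hitter eventually wins an entry, which is how the scheme trades a little extra pipeline traffic for accuracy.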