Greening Multi-Tenant Data Center Demand Response
Data centers have emerged as promising resources for demand response,
particularly for emergency demand response (EDR), which saves the power grid
from incurring blackouts during emergency situations. However, data centers
today typically participate in EDR by turning on backup (diesel) generators,
which is both expensive and environmentally unfriendly. In this paper, we focus
on "greening" demand response in multi-tenant data centers, i.e., colocation
data centers, by designing a pricing mechanism through which the data center
operator can efficiently extract load reductions from tenants during emergency
periods to fulfill the energy reduction requirement for EDR. In particular, we
propose ColoEDR, a pricing mechanism for both mandatory and voluntary EDR
programs that is based on parameterized supply function bidding and provides
provably near-optimal efficiency guarantees, both when tenants are price-taking
and when they are price-anticipating. In addition to analytic results, we
extend the literature on supply function mechanism design, and evaluate ColoEDR
using trace-based simulation studies. These validate the efficiency analysis
and show that the pricing mechanism benefits both the environment and the data
center operator (by decreasing the need for backup diesel generation), while
also aiding tenants (by providing payments for load reductions).
Comment: 34 pages, 6 figures
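The abstract does not spell out ColoEDR's supply function. The sketch below illustrates the general idea of supply function bidding under a stated assumption: each tenant i submits a single parameter b_i, which defines a linear supply function s_i(p) = b_i * p. The operator then picks the uniform price p at which the offered reductions sum to the EDR target. The tenants' quadratic costs c_i(s) = a_i * s**2 / 2 are also assumptions, and the function names are hypothetical. ColoEDR's actual parameterization may differ.

```python
from typing import List, Tuple


def price_taking_bids(cost_coeffs: List[float]) -> List[float]:
    """Bids of price-taking tenants with assumed costs c_i(s) = a_i * s**2 / 2.

    A price-taker reduces load until its marginal cost a_i * s equals the
    price p, i.e. s = p / a_i. The assumed linear supply function
    s_i(p) = b_i * p reproduces this with b_i = 1 / a_i.
    """
    return [1.0 / a for a in cost_coeffs]


def clear_market(bids: List[float], target: float) -> Tuple[float, List[float], List[float]]:
    """Clear a market where tenant i offers s_i(p) = b_i * p of load reduction.

    Returns the uniform clearing price, each tenant's reduction, and the
    payment the operator makes to each tenant.
    """
    total = sum(bids)
    if total <= 0:
        raise ValueError("at least one tenant must submit a positive bid")
    price = target / total  # solves sum_i b_i * p == target
    reductions = [b * price for b in bids]
    payments = [price * r for r in reductions]  # operator pays p per unit reduced
    return price, reductions, payments
```

Take three tenants with a = [1, 2, 4] and a 7-unit target. `clear_market(price_taking_bids([1.0, 2.0, 4.0]), 7.0)` clears at price 4.0 with reductions [4.0, 2.0, 1.0] and payments [16.0, 8.0, 4.0]. Every tenant ends at the same marginal cost a_i * s_i = 4, which is the cost-minimizing split under these assumed costs. The abstract's guarantees also cover price-anticipating tenants, whose strategic bids this sketch does not model.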
Optimizing Resource Management in Cloud Analytics Services
The fundamental challenge in the cloud today is how to build and optimize machine learning and data analytical services. Machine learning and data analytical platforms are changing computing infrastructure from expensive private data centers to easily accessible online services. These services pack user requests as jobs and run them on thousands of machines in parallel in geo-distributed clusters. The scale and the complexity of emerging jobs lead to increasing challenges for the clusters at all levels, from power infrastructure to system architecture and corresponding software framework design.
These challenges come in many forms. Today's clusters are built on commodity hardware, and hardware failures are unavoidable. Resource competition, network congestion, and mixed generations of hardware make the hardware environment complex and hard to model and predict. Such heterogeneity becomes a crucial roadblock for efficient parallelization at both the task level and the job level. Another challenge comes from the increasing complexity of the applications. For example, machine learning services run jobs made up of multiple tasks with complex dependency structures, which complicates framework design. The scale, especially when services span geo-distributed clusters, is another important hurdle for cluster design. Challenges also come from the power infrastructure, which is very expensive and accounts for more than 20% of the total cost of building a cluster. Optimizing power sharing to maximize facility utilization and smooth peak-hour usage is another roadblock for cluster design.
In this thesis, we focus on solutions to these challenges at the task level, at the job level, in geo-distributed data cloud design, and in power management for colocation data centers.
At the task level, a crucial hurdle to achieving predictable performance is stragglers, i.e., tasks that take significantly longer than expected to run. Speculative execution has been widely adopted to mitigate the impact of stragglers in simple workloads; we apply straggler mitigation to approximation jobs for the first time. We present GRASS, which carefully uses speculation to mitigate the impact of stragglers in approximation jobs. GRASS's design is based on the analysis of a model we develop to capture the optimal speculation levels for approximation jobs. Evaluations with production workloads from Facebook and Microsoft Bing in an EC2 cluster of 200 nodes show that GRASS increases the accuracy of deadline-bound jobs by 47% and speeds up error-bound jobs by 38%.
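The abstract does not give GRASS's speculation policy. The following is only a hypothetical, greedy, deadline-aware rule that illustrates the tradeoff it describes. When a slot frees up in a deadline-bound approximation job, the scheduler can start a pending task or launch a speculative copy of a straggler. Under this rule, an action counts only if it adds a task completion before the deadline, and the earliest such completion wins. The task-duration estimates and all names here are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RunningTask:
    task_id: str
    remaining: float  # estimated time left for the copy already running


def choose_for_free_slot(
    now: float,
    deadline: float,
    running: List[RunningTask],
    pending: List[Tuple[str, float]],  # (task_id, estimated duration), not yet started
    copy_duration: float,  # estimated duration of a fresh speculative copy
) -> Optional[Tuple[str, str]]:
    """Greedy choice for one free slot (hypothetical rule, not GRASS itself).

    Returns ("start", task_id), ("speculate", task_id), or None (leave idle).
    """
    candidates = []
    # Starting a pending task helps only if it can finish by the deadline.
    for task_id, duration in pending:
        if now + duration <= deadline:
            candidates.append((now + duration, "start", task_id))
    # A speculative copy helps only if the running copy would miss the
    # deadline but the new copy would make it.
    for t in running:
        if now + t.remaining > deadline and now + copy_duration <= deadline:
            candidates.append((now + copy_duration, "speculate", t.task_id))
    if not candidates:
        return None
    _, action, task_id = min(candidates)  # earliest extra completion wins
    return action, task_id
```

Suppose the deadline is 10, straggler t1 has 15 time units left, and a fresh copy would take 5. The rule then prefers speculating t1 over starting a pending task that needs 8. An error-bound variant would instead minimize the time to finish enough tasks.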
At the job level, task-level speculation mechanisms are designed and operated independently of job scheduling, even though scheduling a speculative copy of a task directly affects the resources available to other jobs. We therefore present Hopper, a job-level, speculation-aware scheduler that integrates the tradeoffs associated with speculation into job scheduling decisions, based on a model generalized from the task-level speculation model. We implement both centralized and decentralized prototypes of Hopper and show that coordinating scheduling and speculation yields improvements of 50% (66%) over state-of-the-art centralized (decentralized) schedulers and speculation strategies.
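To make the coupling between speculation and scheduling concrete, here is a hypothetical simplification that is not Hopper's actual policy. Each job's demand is inflated by the extra slots its speculative copies are expected to need, given as an assumed integer percentage `spec_pct`. Jobs are then served smallest inflated demand ("virtual size") first. Slots reserved for one job's speculative copies are visibly unavailable to the jobs behind it.

```python
from typing import Dict, Tuple


def allocate_slots(jobs: Dict[str, Tuple[int, int]], total_slots: int) -> Dict[str, int]:
    """Speculation-aware slot allocation (hypothetical simplification).

    jobs maps job_id -> (remaining_tasks, spec_pct), where spec_pct is the
    assumed percentage of extra slots the job needs for speculative copies.
    """
    def virtual_size(job_id: str) -> int:
        tasks, spec_pct = jobs[job_id]
        # tasks plus ceil(tasks * spec_pct / 100) slots for speculative copies
        return tasks + -(-tasks * spec_pct // 100)

    alloc: Dict[str, int] = {}
    free = total_slots
    # Smallest virtual size first; ties broken by job id for determinism.
    for job_id in sorted(jobs, key=lambda j: (virtual_size(j), j)):
        give = min(virtual_size(job_id), free)
        alloc[job_id] = give
        free -= give
    return alloc
```

Consider 15 slots and jobs A (4 tasks, 50%), B (10 tasks, 20%) and C (2 tasks, 100%). The virtual sizes are 6, 12 and 4, so C gets 4 slots, A gets 6, and B receives only the remaining 5. A scheduler that ignored speculation would have ranked and sized these jobs differently.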
As computing resources move from local clusters to geo-distributed cloud services, we expect the same transformation for data storage. We study two crucial pieces of a geo-distributed data cloud system: data acquisition and data placement. Starting with an optimal algorithm for a data cloud made up of a single data center, we propose a near-optimal, polynomial-time algorithm for a general geo-distributed data cloud. We show, via a case study, that the resulting design, Datum, is near-optimal (within 1.6%) in practical settings.
Efficient power management is a fundamental challenge for data centers that must provide reliable services. Power oversubscription in data centers is very common and may occasionally trigger an emergency when the aggregate power demand exceeds the capacity. We study power capping solutions for handling such emergencies in a colocation data center, where the operator supplies power to multiple tenants. We propose a novel market mechanism based on supply function bidding, called COOP, that financially incentivizes and coordinates tenants' power reduction to minimize total performance loss while satisfying multiple power capping constraints. We demonstrate that COOP is "win-win": it increases the operator's profit (through oversubscription) and reduces tenants' costs (through financial compensation for their power reduction during emergencies).
Communication-Aware Scheduling of Precedence-Constrained Tasks on Related Machines
Scheduling precedence-constrained tasks is a classical problem that has been
studied for more than fifty years. However, little progress has been made in
the setting where there are communication delays between tasks. Results for the
case of identical machines were derived nearly thirty years ago, and yet no
results for related machines have followed. In this work, we propose a new
scheduler, Generalized Earliest Time First (GETF), and provide the first
provable, worst-case approximation guarantees for the goals of minimizing both
the makespan and total weighted completion time of tasks with precedence
constraints on related machines with machine-dependent communication times.
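GETF's own rule is not spelled out in the abstract. As a reference point, here is a sketch of classic Earliest Time First (ETF) list scheduling, extended to related machines: task t runs for sizes[t] / speeds[m] on machine m. It repeatedly schedules the ready (task, machine) pair with the earliest possible start. Two simplifications are assumptions: a single constant communication delay applies when a predecessor ran on a different machine, and tasks are appended to the end of each machine's queue. The paper itself handles machine-dependent communication times.

```python
from typing import Dict, List, Tuple

Schedule = Dict[str, Tuple[int, float, float]]  # task -> (machine, start, finish)


def etf_schedule(
    sizes: Dict[str, float],
    preds: Dict[str, List[str]],
    speeds: List[float],
    comm: float,
) -> Schedule:
    """Earliest Time First list scheduling on related machines (sketch).

    sizes: work of each task; it takes sizes[t] / speeds[m] on machine m.
    preds: predecessors of each task (tasks missing from preds have none);
           assumed to describe a DAG.
    comm:  assumed constant delay for data sent between different machines.
    """
    machine_free = [0.0] * len(speeds)
    sched: Schedule = {}
    remaining = set(sizes)
    while remaining:
        # A task is ready once all of its predecessors have been scheduled.
        ready = [t for t in remaining if all(p in sched for p in preds.get(t, []))]
        best = None
        for t in ready:
            for m, speed in enumerate(speeds):
                # Inputs arrive when each predecessor finishes, plus comm if it ran elsewhere.
                data_ready = max(
                    (sched[p][2] + (0.0 if sched[p][0] == m else comm) for p in preds.get(t, [])),
                    default=0.0,
                )
                start = max(machine_free[m], data_ready)
                cand = (start, start + sizes[t] / speed, t, m)
                # Earliest start wins; ties go to the earlier finish (faster machine).
                if best is None or cand < best:
                    best = cand
        start, finish, t, m = best
        sched[t] = (m, start, finish)
        machine_free[m] = finish
        remaining.remove(t)
    return sched
```

Take a diamond DAG (a before b and c, both before d) with work 2, 4, 4, 2, machine speeds [2.0, 1.0], and delay 1.0. The sketch places task c on the slow machine (finishing at 6) because it can start there at time 2, even though the fast machine would have finished it at 5. The makespan is 8. Choices like this are why earliest-start rules need care on related machines.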
Communication-Aware Scheduling of Precedence-Constrained Tasks
Jobs in large-scale machine learning platforms are expressed using a computational graph of tasks with precedence constraints. To handle such precedence-constrained tasks that have machine-dependent communication demands in settings with heterogeneous service rates and communication times, we propose a new scheduling framework, Generalized Earliest Time First (GETF), that improves upon state-of-the-art results in the area. Specifically, we provide the first provable, worst-case approximation guarantee for the goal of minimizing the makespan of tasks with precedence constraints on related machines with machine-dependent communication times.
Quantum Correlation Sharing: A Review On Recent Progress From Nonlocality To Other Non-Classical Correlations
This review offers a comprehensive exploration and synthesis of recent
advancements in the domain of quantum correlation sharing facilitated through
sequential measurements. We initiate our inquiry by delving into the
interpretation of the joint probability, laying the foundation for an
examination of quantum correlations within the context of specific measurement
methods. The subsequent section meticulously explores nonlocal sharing under
diverse measurement strategies and scenarios, with a specific focus on
investigating the impact of these strategies on the dissemination of quantum
nonlocality. Key perspectives such as "asymmetry" and "weak value" are
scrutinized through detailed analyses across various scenarios, allowing us to
evaluate the potential of nonlocality sharing. We also provide a retrospective
overview of experimental endeavors associated with this phenomenon. The third
part of our exploration presents research findings on steering sharing,
offering clarity on the feasibility of steering sharing and summarizing the
distinctive properties of quantum steering sharing in different scenarios.
Continuing our journey, the fourth section delves into discussions on the
sharing of diverse quantum correlations, encompassing network nonlocality,
quantum entanglement, and quantum contextuality. Moving forward, the fifth
section conducts a comprehensive review of the progress in the application of
quantum correlation sharing, specifically based on sequential measurement
strategies. Applications such as quantum random access coding, random number
generation, and self-testing tasks are highlighted. Finally, we discuss and
list some of the key unresolved issues in this research field, and conclude the
entire article.