A multi-agent based system to enable strategic and operational design coordination
This paper presents two systems which individually focus on different aspects of design coordination, namely strategic and operational. The systems were developed in parallel and individually contain related models that represent specific frames from a Design Coordination Framework developed by Andreasen et al. [1]. The focus of the strategic design management system is the management of design tasks, decisions, information, goals and rationale within the design process, whereas the focus of the operational design coordination system is the coordination of tasks and activities with respect to the near-optimal utilisation of available resources. A common interface exists which enables the two systems to be integrated and used as a single system with the aim of managing both strategic and operational design coordination. Hence, the objective of this work is to enable the design process to be conducted in a timely and appropriate manner.
SQUASH: Simple QoS-Aware High-Performance Memory Scheduler for Heterogeneous Systems with Hardware Accelerators
Modern SoCs integrate multiple CPU cores and Hardware Accelerators (HWAs)
that share the same main memory system, causing interference among memory
requests from different agents. The result of this interference, if not
controlled well, is missed deadlines for HWAs and low CPU performance.
State-of-the-art mechanisms designed for CPU-GPU systems strive to meet a
target frame rate for GPUs by prioritizing the GPU close to the time when it
has to complete a frame. We observe two major problems when such an approach is
adapted to a heterogeneous CPU-HWA system. First, HWAs miss deadlines because
they are prioritized only close to their deadlines. Second, such an approach
does not consider the diverse memory access characteristics of different
applications running on CPUs and HWAs, leading to low performance for
latency-sensitive CPU applications and deadline misses for some HWAs, including
GPUs.
In this paper, we propose a Simple Quality of service Aware memory Scheduler
for Heterogeneous systems (SQUASH), that overcomes these problems using three
key ideas, with the goal of meeting deadlines of HWAs while providing high CPU
performance. First, SQUASH prioritizes a HWA when it is not on track to meet
its deadline any time during a deadline period. Second, SQUASH prioritizes HWAs
over memory-intensive CPU applications based on the observation that the
performance of memory-intensive applications is not sensitive to memory
latency. Third, SQUASH treats short-deadline HWAs differently as they are more
likely to miss their deadlines and schedules their requests based on worst-case
memory access time estimates.
Extensive evaluations across a wide variety of different workloads and
systems show that SQUASH achieves significantly better CPU performance than the
best previous scheduler while always meeting the deadlines for all HWAs,
including GPUs, thereby largely improving frame rates.
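The first two key ideas in the SQUASH abstract above can be illustrated with a minimal sketch. It is not the paper's implementation: the progress measure (completed requests against elapsed time in the period), the `Accelerator` fields, and the four-level ordering in `priority_class` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    """Hypothetical per-period bookkeeping for one hardware accelerator."""
    name: str
    total_requests: int      # memory requests needed in this deadline period
    completed_requests: int  # requests already served in this period
    period_cycles: int       # length of the deadline period
    elapsed_cycles: int      # cycles elapsed since the period started


def is_on_track(hwa: Accelerator) -> bool:
    """Assumed check: work done so far keeps pace with time elapsed."""
    expected = hwa.elapsed_cycles / hwa.period_cycles
    actual = hwa.completed_requests / hwa.total_requests
    return actual >= expected


def priority_class(kind: str, *, urgent: bool = False,
                   memory_intensive: bool = False) -> int:
    """Lower value is served first. The ordering is an assumption."""
    if kind == "hwa" and urgent:
        return 0  # an accelerator that has fallen behind its deadline
    if kind == "cpu" and not memory_intensive:
        return 1  # latency-sensitive CPU application
    if kind == "hwa":
        return 2  # accelerator that is on track
    return 3      # memory-intensive CPU application


behind = Accelerator("decoder", 1000, 300, 10_000, 5_000)
ahead = Accelerator("scaler", 1000, 700, 10_000, 5_000)
print(is_on_track(behind), is_on_track(ahead))
```

Checking progress throughout the period, rather than only near the deadline, is what lets a lagging accelerator be boosted early in this sketch.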
Tromino: Demand and DRF Aware Multi-Tenant Queue Manager for Apache Mesos Cluster
Apache Mesos, a two-level resource scheduler, provides resource sharing
across multiple users in a multi-tenant cluster environment. Computational
resources (e.g., CPU, memory, and disk) are distributed according to the
Dominant Resource Fairness (DRF) policy. Mesos frameworks (users) receive
resources based on their current usage and are responsible for scheduling their
tasks within the allocation. We have observed that multiple frameworks can
cause fairness imbalance in a multiuser environment. For example, a greedy
framework consuming more than its fair share of resources can deny resource
fairness to others. The user with the least Dominant Share is considered first
by the DRF module to get its resource allocation. However, the default DRF
implementation, in Apache Mesos' Master allocation module, does not consider
the overall resource demands of the tasks in the queue for each user/framework.
This lack of awareness can result in users without any pending task receiving
more resource offers while users with a queue of pending tasks starve due to
their high dominant shares. We have developed a policy-driven queue manager,
Tromino, for an Apache Mesos cluster where tasks for individual frameworks can
be scheduled based on each framework's overall resource demands and current
resource consumption. Tromino's awareness of Dominant Share and demand, and
scheduling based on these attributes, can reduce (1) the impact of unfairness
due to a framework-specific configuration, and (2) unfair waiting time due to
higher resource demand in a pending task queue. In the best case, Tromino can
significantly reduce the average waiting time of a framework by using the
proposed Demand-DRF aware policy.
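The dominant-share selection described in the Tromino abstract above can be sketched as follows. `dominant_share` and `next_by_drf` follow the standard DRF definition; `next_by_demand_drf` is a hypothetical demand-aware variant written for illustration, not Tromino's actual policy or the Mesos allocator API. The framework names and numbers are invented.

```python
from typing import Dict, Optional

Resources = Dict[str, float]


def dominant_share(usage: Resources, capacity: Resources) -> float:
    """Largest fraction of any single cluster resource held by a framework."""
    return max(usage.get(r, 0.0) / capacity[r] for r in capacity)


def next_by_drf(usages: Dict[str, Resources], capacity: Resources) -> str:
    """Plain DRF: offer to the framework with the smallest dominant share."""
    return min(usages, key=lambda f: dominant_share(usages[f], capacity))


def next_by_demand_drf(usages: Dict[str, Resources],
                       pending: Dict[str, int],
                       capacity: Resources) -> Optional[str]:
    """Hypothetical variant: ignore frameworks with no pending tasks."""
    waiting = [f for f in usages if pending.get(f, 0) > 0]
    if not waiting:
        return None
    return min(waiting, key=lambda f: dominant_share(usages[f], capacity))


capacity = {"cpu": 100.0, "mem": 400.0}
usages = {
    "idle": {"cpu": 5.0, "mem": 10.0},      # dominant share 0.05 (cpu)
    "busy": {"cpu": 40.0, "mem": 80.0},     # dominant share 0.40 (cpu)
    "greedy": {"cpu": 30.0, "mem": 240.0},  # dominant share 0.60 (mem)
}
pending = {"idle": 0, "busy": 12, "greedy": 3}

print(next_by_drf(usages, capacity))
print(next_by_demand_drf(usages, pending, capacity))
```

In this example plain DRF selects the framework with nothing queued, while the demand-aware variant selects the one with the lowest dominant share among those that have pending tasks.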
High-Throughput Computing on High-Performance Platforms: A Case Study
The computing systems used by LHC experiments have historically consisted of
the federation of hundreds to thousands of distributed resources, ranging from
small to mid-size resources. In spite of the impressive scale of the existing
distributed computing solutions, the federation of small to mid-size resources
will be insufficient to meet projected future demands. This paper is a case
study of how the ATLAS experiment has embraced Titan, a DOE leadership
facility, in conjunction with traditional distributed high-throughput computing
to reach sustained production scales of approximately 52M core-hours a year.
The three main contributions of this paper are: (i) a critical evaluation of
design and operational considerations to support the sustained, scalable and
production usage of Titan; (ii) a preliminary characterization of a next
generation executor for PanDA to support new workloads and advanced execution
modes; and (iii) early lessons for how current and future experimental and
observational systems can be integrated with production supercomputers and
other platforms in a general and extensible manner.
InterCloud: Utility-Oriented Federation of Cloud Computing Environments for Scaling of Application Services
Cloud computing providers have set up several data centers at different
geographical locations over the Internet in order to optimally serve the needs
of their customers around the world. However, existing systems do not support
mechanisms and policies for dynamically coordinating load distribution among
different Cloud-based data centers in order to determine the optimal location
for hosting application services to achieve reasonable QoS levels. Further,
Cloud computing providers are unable to predict the geographic distribution of
users consuming their services, hence the load coordination must happen
automatically, and the distribution of services must change in response to
changes in the load. To counter this problem, we advocate the creation of a
federated Cloud computing environment (InterCloud) that facilitates just-in-time,
opportunistic, and scalable provisioning of application services, consistently
achieving QoS targets under variable workload, resource and network conditions.
The overall goal is to create a computing environment that supports dynamic
expansion or contraction of capabilities (VMs, services, storage, and database)
for handling sudden variations in service demands.
This paper presents the vision, challenges, and architectural elements of
InterCloud for utility-oriented federation of Cloud computing environments. The
proposed InterCloud environment supports scaling of applications across
multiple vendor clouds. We have validated our approach by conducting a
rigorous performance evaluation study using the CloudSim toolkit. The results
demonstrate that the federated Cloud computing model has immense potential, as
it offers significant performance gains with regard to response time and cost
saving under dynamic workload scenarios.
Comment: 20 pages, 4 figures, 3 tables, conference pape
Building a Truly Distributed Constraint Solver with JADE
Real-life problems such as scheduling meetings between people at different
locations can be modelled as distributed Constraint Satisfaction Problems
(CSPs). Suitable and satisfactory solutions can then be found using constraint
satisfaction algorithms which can be exhaustive (backtracking) or otherwise
(local search). However, most research in this area has tested its algorithms
by simulation on a single PC with a single program entry point. The main
contribution of our work is the design and implementation of a truly
distributed constraint solver based on a local search algorithm, using the Java
Agent DEvelopment framework (JADE) to enable communication between agents on
different machines. In particular, we discuss design and implementation issues
related to a truly distributed constraint solver which might not be critical
when simulated on a single machine. Evaluation results indicate that our truly
distributed constraint solver works well within the observed limitations when
tested with various distributed CSPs. Our application can also incorporate any
constraint solving algorithm with little modification.
Comment: 7 page