30,392 research outputs found
Runtime-guided mitigation of manufacturing variability in power-constrained multi-socket NUMA nodes
This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493, SEV-2011-00067), by
the Spanish Ministry of Science and Innovation (contracts TIN2015-65316-P), by Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), by the RoMoL ERC Advanced Grant (GA 321253) and the European HiPEAC Network of Excellence. M. Moretó has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI-2012-15047. M. Casas is supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Cofund
programme of the Marie Curie Actions of the 7th R&D Framework Programme of the European Union (Contract 2013 BP B 00243). This work was also partially performed
under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-CONF-689878).
Finally, the authors are grateful to the reviewers for their valuable comments, to the RoMoL team, to Xavier Teruel and Kallia Chronaki from the Programming Models group
of BSC and the Computation Department of LLNL for their technical support and useful feedback.Peer ReviewedPostprint (published version
Dependable Distributed Computing for the International Telecommunication Union Regional Radio Conference RRC06
The International Telecommunication Union (ITU) Regional Radio Conference
(RRC06) established in 2006 a new frequency plan for the introduction of
digital broadcasting in European, African, Arab, CIS countries and Iran. The
preparation of the plan involved complex calculations under short deadline and
required dependable and efficient computing capability. The ITU designed and
deployed in-situ a dedicated PC farm, in parallel to the European Organization
for Nuclear Research (CERN) which provided and supported a system based on the
EGEE Grid. The planning cycle at the RRC06 required a periodic execution in the
order of 200,000 short jobs, using several hundreds of CPU hours, in a period
of less than 12 hours. The nature of the problem required dynamic
workload-balancing and low-latency access to the computing resources. We
present the strategy and key technical choices that delivered a reliable
service to the RRC06
InterCloud: Utility-Oriented Federation of Cloud Computing Environments for Scaling of Application Services
Cloud computing providers have setup several data centers at different
geographical locations over the Internet in order to optimally serve needs of
their customers around the world. However, existing systems do not support
mechanisms and policies for dynamically coordinating load distribution among
different Cloud-based data centers in order to determine optimal location for
hosting application services to achieve reasonable QoS levels. Further, the
Cloud computing providers are unable to predict geographic distribution of
users consuming their services, hence the load coordination must happen
automatically, and distribution of services must change in response to changes
in the load. To counter this problem, we advocate creation of federated Cloud
computing environment (InterCloud) that facilitates just-in-time,
opportunistic, and scalable provisioning of application services, consistently
achieving QoS targets under variable workload, resource and network conditions.
The overall goal is to create a computing environment that supports dynamic
expansion or contraction of capabilities (VMs, services, storage, and database)
for handling sudden variations in service demands.
This paper presents vision, challenges, and architectural elements of
InterCloud for utility-oriented federation of Cloud computing environments. The
proposed InterCloud environment supports scaling of applications across
multiple vendor clouds. We have validated our approach by conducting a set of
rigorous performance evaluation study using the CloudSim toolkit. The results
demonstrate that federated Cloud computing model has immense potential as it
offers significant performance gains as regards to response time and cost
saving under dynamic workload scenarios.Comment: 20 pages, 4 figures, 3 tables, conference pape
A Statistical Mechanical Load Balancer for the Web
The maximum entropy principle from statistical mechanics states that a closed
system attains an equilibrium distribution that maximizes its entropy. We first
show that for graphs with fixed number of edges one can define a stochastic
edge dynamic that can serve as an effective thermalization scheme, and hence,
the underlying graphs are expected to attain their maximum-entropy states,
which turn out to be Erdos-Renyi (ER) random graphs. We next show that (i) a
rate-equation based analysis of node degree distribution does indeed confirm
the maximum-entropy principle, and (ii) the edge dynamic can be effectively
implemented using short random walks on the underlying graphs, leading to a
local algorithm for the generation of ER random graphs. The resulting
statistical mechanical system can be adapted to provide a distributed and local
(i.e., without any centralized monitoring) mechanism for load balancing, which
can have a significant impact in increasing the efficiency and utilization of
both the Internet (e.g., efficient web mirroring), and large-scale computing
infrastructure (e.g., cluster and grid computing).Comment: 11 Pages, 5 Postscript figures; added references, expanded on
protocol discussio
Adaptive Dispatching of Tasks in the Cloud
The increasingly wide application of Cloud Computing enables the
consolidation of tens of thousands of applications in shared infrastructures.
Thus, meeting the quality of service requirements of so many diverse
applications in such shared resource environments has become a real challenge,
especially since the characteristics and workload of applications differ widely
and may change over time. This paper presents an experimental system that can
exploit a variety of online quality of service aware adaptive task allocation
schemes, and three such schemes are designed and compared. These are a
measurement driven algorithm that uses reinforcement learning, secondly a
"sensible" allocation algorithm that assigns jobs to sub-systems that are
observed to provide a lower response time, and then an algorithm that splits
the job arrival stream into sub-streams at rates computed from the hosts'
processing capabilities. All of these schemes are compared via measurements
among themselves and with a simple round-robin scheduler, on two experimental
test-beds with homogeneous and heterogeneous hosts having different processing
capacities.Comment: 10 pages, 9 figure
- …