30,392 research outputs found

    Runtime-guided mitigation of manufacturing variability in power-constrained multi-socket NUMA nodes

    This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493, SEV-2011-00067), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), by Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), by the RoMoL ERC Advanced Grant (GA 321253) and the European HiPEAC Network of Excellence. M. Moretó has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI-2012-15047. M. Casas is supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Cofund programme of the Marie Curie Actions of the 7th R&D Framework Programme of the European Union (Contract 2013 BP B 00243). This work was also partially performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-CONF-689878). Finally, the authors are grateful to the reviewers for their valuable comments, to the RoMoL team, to Xavier Teruel and Kallia Chronaki from the Programming Models group of BSC and the Computation Department of LLNL for their technical support and useful feedback. Peer Reviewed. Postprint (published version)

    Dependable Distributed Computing for the International Telecommunication Union Regional Radio Conference RRC06

    The International Telecommunication Union (ITU) Regional Radio Conference (RRC06) established in 2006 a new frequency plan for the introduction of digital broadcasting in European, African, Arab, CIS countries and Iran. The preparation of the plan involved complex calculations under a short deadline and required dependable and efficient computing capability. The ITU designed and deployed in situ a dedicated PC farm, in parallel with the European Organization for Nuclear Research (CERN), which provided and supported a system based on the EGEE Grid. The planning cycle at the RRC06 required a periodic execution on the order of 200,000 short jobs, using several hundred CPU hours, in a period of less than 12 hours. The nature of the problem required dynamic workload balancing and low-latency access to the computing resources. We present the strategy and key technical choices that delivered a reliable service to the RRC06.

    InterCloud: Utility-Oriented Federation of Cloud Computing Environments for Scaling of Application Services

    Cloud computing providers have set up several data centers at different geographical locations over the Internet in order to optimally serve the needs of their customers around the world. However, existing systems do not support mechanisms and policies for dynamically coordinating load distribution among different Cloud-based data centers in order to determine the optimal location for hosting application services to achieve reasonable QoS levels. Further, Cloud computing providers are unable to predict the geographic distribution of users consuming their services; hence the load coordination must happen automatically, and the distribution of services must change in response to changes in the load. To counter this problem, we advocate the creation of a federated Cloud computing environment (InterCloud) that facilitates just-in-time, opportunistic, and scalable provisioning of application services, consistently achieving QoS targets under variable workload, resource and network conditions. The overall goal is to create a computing environment that supports dynamic expansion or contraction of capabilities (VMs, services, storage, and database) for handling sudden variations in service demands. This paper presents the vision, challenges, and architectural elements of InterCloud for utility-oriented federation of Cloud computing environments. The proposed InterCloud environment supports scaling of applications across multiple vendor clouds. We have validated our approach by conducting a rigorous performance evaluation study using the CloudSim toolkit. The results demonstrate that the federated Cloud computing model has immense potential, as it offers significant performance gains with regard to response time and cost savings under dynamic workload scenarios. Comment: 20 pages, 4 figures, 3 tables, conference paper

    A Statistical Mechanical Load Balancer for the Web

    The maximum entropy principle from statistical mechanics states that a closed system attains an equilibrium distribution that maximizes its entropy. We first show that for graphs with a fixed number of edges one can define a stochastic edge dynamic that can serve as an effective thermalization scheme, and hence, the underlying graphs are expected to attain their maximum-entropy states, which turn out to be Erdős-Rényi (ER) random graphs. We next show that (i) a rate-equation based analysis of the node degree distribution does indeed confirm the maximum-entropy principle, and (ii) the edge dynamic can be effectively implemented using short random walks on the underlying graphs, leading to a local algorithm for the generation of ER random graphs. The resulting statistical mechanical system can be adapted to provide a distributed and local (i.e., without any centralized monitoring) mechanism for load balancing, which can have a significant impact in increasing the efficiency and utilization of both the Internet (e.g., efficient web mirroring) and large-scale computing infrastructure (e.g., cluster and grid computing). Comment: 11 pages, 5 Postscript figures; added references, expanded on protocol discussion
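The edge dynamic described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact update rule, so the code below assumes a simple global move: one uniformly chosen edge is relocated to a uniformly chosen vacant node pair. It does not implement the random-walk-based local variant the abstract mentions. The function names (`ring_edges`, `rewire_step`, `thermalize`) are hypothetical. Because each move is as likely as its reverse, the chain's stationary distribution is uniform over graphs with the same number of nodes and edges, which is the fixed-edge-count form of the ER ensemble.

```python
import random


def ring_edges(n):
    """Initial graph: a ring of n nodes, edges stored as (low, high) pairs."""
    return {(min(i, (i + 1) % n), max(i, (i + 1) % n)) for i in range(n)}


def rewire_step(n, edges, rng):
    """Move one uniformly chosen edge to a uniformly chosen vacant node pair."""
    old = rng.choice(sorted(edges))
    while True:
        u, v = rng.sample(range(n), 2)  # two distinct nodes, so no self-loops
        new = (min(u, v), max(u, v))
        if new not in edges:
            break
    edges.remove(old)
    edges.add(new)


def thermalize(n, edges, steps, seed=0):
    """Apply `steps` rewiring moves; the number of edges never changes."""
    if len(edges) >= n * (n - 1) // 2:
        raise ValueError("graph is complete; no vacant pair to rewire into")
    rng = random.Random(seed)
    edges = set(edges)  # work on a copy so the caller's set is untouched
    for _ in range(steps):
        rewire_step(n, edges, rng)
    return edges
```

Starting from a highly ordered graph such as `ring_edges(10)` and calling `thermalize(10, ring_edges(10), steps=200)` yields a graph with the same ten edges' worth of capacity but randomized placement.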

    Adaptive Dispatching of Tasks in the Cloud

    The increasingly wide application of Cloud Computing enables the consolidation of tens of thousands of applications in shared infrastructures. Thus, meeting the quality of service requirements of so many diverse applications in such shared resource environments has become a real challenge, especially since the characteristics and workload of applications differ widely and may change over time. This paper presents an experimental system that can exploit a variety of online, quality-of-service-aware adaptive task allocation schemes, and three such schemes are designed and compared. These are a measurement-driven algorithm that uses reinforcement learning; a "sensible" allocation algorithm that assigns jobs to sub-systems that are observed to provide a lower response time; and an algorithm that splits the job arrival stream into sub-streams at rates computed from the hosts' processing capabilities. All of these schemes are compared via measurements among themselves and with a simple round-robin scheduler, on two experimental test-beds with homogeneous and heterogeneous hosts having different processing capacities. Comment: 10 pages, 9 figures
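Three of the dispatching policies named in this abstract can be sketched in a few lines. This is an illustration under assumptions, not the paper's implementation: the abstract does not specify the weighting used by the "sensible" scheme, so the sketch assumes weights inversely proportional to the observed average response time, and it assumes the stream-splitting rates are directly proportional to host capacity. All function names are hypothetical, and the reinforcement-learning scheme is omitted.

```python
import itertools
import random


def round_robin(hosts):
    """Baseline: cycle through the hosts in a fixed order."""
    return itertools.cycle(hosts)


def capacity_split_probs(capacities):
    """Split the arrival stream in proportion to each host's capacity."""
    total = sum(capacities.values())
    return {host: cap / total for host, cap in capacities.items()}


def sensible_probs(avg_response_time):
    """Favor hosts with lower observed response time (assumed weight: 1 / time)."""
    weights = {host: 1.0 / t for host, t in avg_response_time.items()}
    total = sum(weights.values())
    return {host: w / total for host, w in weights.items()}


def pick_host(probs, rng=random):
    """Draw one host according to the given dispatch probabilities."""
    hosts = list(probs)
    return rng.choices(hosts, weights=[probs[h] for h in hosts], k=1)[0]
```

In use, a dispatcher would recompute `sensible_probs` as fresh response-time measurements arrive and call `pick_host` for each incoming job, whereas `capacity_split_probs` needs to be computed only once for a fixed set of hosts.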