208 research outputs found

Capturing the impact of external interference on HPC application performance

Author: Shah Aamer
Publication date: 01/01/2020

HPC applications are large software packages with high computation and storage requirements. To meet these requirements, the architectures of supercomputers are continuously evolving and their capabilities are continuously increasing. Present-day supercomputers have achieved petaflops of computational power by utilizing thousands to millions of compute cores, connected through specialized communication networks, and are equipped with petabytes of storage using a centralized I/O subsystem. While fulfilling the high resource demands of HPC applications, such a design also entails its own challenges. Applications running on these systems own the computation resources exclusively, but share the communication interconnect and the I/O subsystem with other concurrently running applications. Simultaneous access to these shared resources causes contention and inter-application interference, leading to degraded application performance. Inter-application interference is one of the sources of run-to-run variation. While other sources of variation, such as operating system jitter, have been investigated before, this doctoral thesis specifically focuses on inter-application interference and studies it from the perspective of an application. Variation in execution time not only causes uncertainty and affects user expectations (especially during performance analysis), but also causes suboptimal usage of HPC resources. Therefore, this thesis aims to evaluate inter-application interference, establish trends among applications under contention, and approximate the impact of external influences on the runtime of an application. To this end, this thesis first presents a method to correlate the performance of applications running side-by-side. The method divides the runtime of a system into globally synchronized, fine-grained time slices for which application performance data is recorded separately. 
The evaluation of the method demonstrates that correlating application performance data can identify inter-application interference. The thesis further uses the method to study I/O interference and shows that file access patterns are a significant factor in determining the interference potential of an application. This thesis also presents a technique to estimate the impact of external influences on an application run. The technique introduces the concept of intrinsic performance characteristics to cluster similar application execution segments. Anomalies within a cluster are attributed to external interference. An evaluation with several benchmarks shows high accuracy in estimating the impact of interference from a single application run. The contributions of this thesis will help establish interference trends and devise interference mitigation techniques. Similarly, estimating the impact of external interference will restore user expectations and help performance analysts separate application performance from external influence.
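The time-slice correlation idea described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the (timestamp, value) sample format, and the use of a Pearson coefficient are all assumptions made for illustration. Each application's samples are averaged per globally aligned time slice, and the per-slice series of two applications are then correlated.

```python
from collections import defaultdict
from math import sqrt


def slice_means(samples, slice_len):
    """Average (timestamp, value) samples per time slice.

    Slices are aligned to a shared clock origin (t = 0), which stands in
    for the globally synchronized slices mentioned in the abstract.
    """
    buckets = defaultdict(list)
    for t, value in samples:
        buckets[int(t // slice_len)].append(value)
    return {idx: sum(vals) / len(vals) for idx, vals in buckets.items()}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def correlate_apps(samples_a, samples_b, slice_len):
    """Correlate two applications over the time slices they both cover."""
    a = slice_means(samples_a, slice_len)
    b = slice_means(samples_b, slice_len)
    common = sorted(a.keys() & b.keys())
    if len(common) < 2:
        return 0.0
    return pearson([a[i] for i in common], [b[i] for i in common])


# Hypothetical data: A's throughput drops in the slices where B's I/O load rises.
app_a = [(0.1, 100), (1.2, 100), (2.3, 40), (3.4, 40)]
app_b = [(0.5, 0), (1.5, 0), (2.5, 80), (3.5, 80)]
r = correlate_apps(app_a, app_b, slice_len=1.0)
```

A strongly negative coefficient between one application's throughput and another's load on a shared resource is the kind of signal such a correlation would flag as possible interference; it does not by itself establish causation.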

Thermal/performance trade-off in network-on-chip architectures

Author: Corbetta Simone, Fornaciari William, Zoni Davide
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2012

Multi-core architectures are a promising paradigm to exploit the huge integration density reached by high-performance systems. At the same time, integration density and technology scaling are causing undesirable operating temperatures, which reduce reliability and increase cooling costs. Dynamic Thermal Management (DTM) approaches have been proposed in the literature to control the temperature profile at run-time, while design-time approaches generally provide floorplan-driven solutions to cope with temperature constraints. Nevertheless, a suitable approach to collect performance, thermal and reliability metrics has not yet been proposed. This work presents a novel methodology to jointly optimize the temperature/performance trade-off in reliable high-performance parallel architectures with security constraints achieved by physically isolating workloads on each core. The proposed methodology is based on a linear formal model relating temperature and duty-cycle on one side, and performance and duty-cycle on the other side. Extensive experimental results on real-world use-case scenarios show the quality of the proposed model, which is suitable for design-time system-wide optimization and can be used in conjunction with DTM techniques.
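As a rough illustration of what a linear temperature/performance model of this kind could look like, the sketch below writes both quantities as linear functions of the duty-cycle. The symbols $d$, $T_{\text{idle}}$, $\alpha$, $\beta$, and $T_{\max}$ are assumptions introduced here; the abstract does not give the paper's actual formulation.

```latex
% Assumed notation: d is the duty-cycle of a core, 0 <= d <= 1.
\begin{align}
  T(d) &= T_{\text{idle}} + \alpha\, d, \qquad \alpha > 0, \\
  P(d) &= \beta\, d, \qquad \beta > 0.
\end{align}
% Maximizing P(d) subject to a thermal limit T(d) <= T_max gives
\begin{equation}
  d^{*} = \min\!\left(1,\ \frac{T_{\max} - T_{\text{idle}}}{\alpha}\right),
  \qquad T_{\max} \ge T_{\text{idle}}.
\end{equation}
```

Under these assumptions, performance grows with the duty-cycle until the thermal limit binds, so the best design-time choice is the largest duty-cycle that keeps $T(d)$ at or below $T_{\max}$.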

Archivio istituzionale della ricerca - Politecnico di Milano

Control Plane Hardware Design for Optical Packet Switched Data Centre Networks

Author: Andreades Paris
Publication venue: UCL (University College London)
Publication date: 28/01/2020

Optical packet switching for intra-data centre networks is key to addressing traffic requirements. Photonic integration and wavelength division multiplexing (WDM) can overcome bandwidth limits in switching systems. A promising technology to build a nanosecond-reconfigurable photonic-integrated switch, compatible with WDM, is the semiconductor optical amplifier (SOA). SOAs are typically used as gating elements in a broadcast-and-select (B&S) configuration, to build an optical crossbar switch. For larger-size switching, a three-stage Clos network, based on crossbar nodes, is a viable architecture. However, the design of the switch control plane is one of the barriers to packet switching; it should run on packet timescales, which becomes increasingly challenging as line rates get higher. The scheduler, used for the allocation of switch paths, limits control clock speed. To this end, the research contribution was the design of highly parallel hardware schedulers for crossbar and Clos network switches. On a field-programmable gate array (FPGA), the minimum scheduler clock period achieved was 5.0 ns and 5.4 ns, for a 32-port crossbar and Clos switch, respectively. By using parallel path allocation modules, one per Clos node, a minimum clock period of 7.0 ns was achieved, for a 256-port switch. For scheduler application-specific integrated circuit (ASIC) synthesis, this reduces to 2.0 ns; a record result enabling scalable packet switching. Furthermore, the control plane was demonstrated experimentally. Moreover, a cycle-accurate network emulator was developed to evaluate switch performance. Results showed a switch saturation throughput at a traffic load 60% of capacity, with sub-microsecond packet latency, for a 256-port Clos switch, outperforming state-of-the-art optical packet switches.

System design approach to energy-efficient data centers

Author: Keville Kurt (Kurt Lawrence)
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2011

Thesis (S.M. in Engineering and Management)--Massachusetts Institute of Technology, Engineering Systems Division, System Design and Management Program, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 63-65).

Green HPC is the new standard for High Performance Computing (HPC). This has now become the primary interest among HPC researchers because of a renewed emphasis on Total Cost of Ownership (TCO) and the pursuit of higher performance. Quite simply, the cost of operating modern HPC equipment can rapidly outstrip the cost of acquisition. This phenomenon is recent and can be traced to inadequacies in modern CPU and datacenter systems design. This thesis analyzes the problem in its entirety and describes best-practice fixes for energy-inefficient HPC.

Workload characterization and synthesis for data center optimization

Author: Polfliet Stijn
Publication venue: Ghent University. Faculty of Engineering and Architecture
Publication date: 01/01/2013

IEEE 802.11 wireless LAN traffic analysis: a cross-layer approach

Author: Na Chen
Publication date: 01/01/2005

The deployment of broadband wireless data networks, e.g., wireless local area networks (WLANs) [29], has experienced tremendous growth in the last several years, and this trend continues to gain momentum. In fact, WLAN is becoming an indispensable component of the modern telecommunication infrastructure. Despite this optimistic outlook, however, little is known about the impact of the wireless channel on the characteristics of WLAN traffic. This dissertation characterizes the correlation structures of the WLAN channel with traffic statistics from a cross-layer point of view, and provides new measurement methodologies and statistical models for WLAN networks. Current WLAN standards are designed within the paradigm of the layered network architecture. For example, the architecture of IEEE 802.11 is almost identical to that of Ethernet. However, wireless networks are fundamentally different from their wired peers due to the shift of transmission media from cables to over-the-air radio waves. This transition exposes wireless systems to the influence of radio propagation, and more importantly, to the temporal and spatial fluctuations of the radio channel that can actually be propagated up to upper layers. However, the current WLAN architecture isolates network layers, and largely ignores this impact. Therefore, we believe that a cross-layer approach is necessary to understand and reflect this underlying impact of the channel on the upper layers of the network, especially in relation to WLAN traffic behavior. Measurement is one of the fundamental tools used to quantify radio propagation. As part of this dissertation, a complete framework for a measurement methodology, including hardware, software, and measurement procedures, is established. Characteristics of the propagation channel are estimated from measurement data, and the channel knowledge is applied to the upper layers for more realistic and accurate modeling.
In WLAN environments, knowledge of the traffic characteristics is essential for proper network provisioning, and for improving the performance of the IEEE 802.11 standard and network devices, e.g., to design improved MAC schemes, or to build better buffer scheduling algorithms with channel knowledge. Built upon extensive WLAN traffic traces, this dissertation work presents cross-layer models for WLAN throughput predictions, traffic statistics, and link layer characteristics. The main goal of this dissertation work is to experiment with and develop new methods for identifying channel characteristics. Using this knowledge, we show how to predict and improve WLAN performance. Within the framework of the developed cross-layer measurement methodology, we conducted extensive measurements in different physical environments and different settings such as office buildings and stores, and (1) show that the impact of the propagation channel can be quantified by using a simple large-scale channel metric (throughput over a longer period of time), and (2) demonstrate the existence of a Doppler effect within today's WLAN packet traffic at sub-second time scales. We also show real-world WLAN usage patterns from our measurement results. From this data, we conclude that the key issues in studying WLAN networks include accurate site-specific propagation channel modeling and real-time autonomous traffic control.

Electrical and Computer Engineering

Texas ScholarWorks

Energy-efficient and SLA-based management of IaaS Cloud Data Centers

Author: Altino Manuel Silva Sampaio
Publication date: 15/06/2015

Analysis of a Gluonic Penguin Decay with the BaBar Detector

Author: Ragghianti Gerald Conrad, Jr.
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/05/2007

This thesis presents a branching fraction analysis of the neutral B meson decay channel B → ϕK0s where the K0s decays to π0π0. The decay is dominated by gluonic penguin transitions, which have been very important for the main program of BABAR: the search for physics beyond the Standard Model. The decay channel has been established and is included in the CP analysis, which is sensitive to new physics. The data set consists of 227 million BB̅ pairs recorded by the BABAR detector at the Stanford Linear Accelerator Center. Sophisticated analysis techniques have been applied, primarily to suppress background from e+e- → quark/anti-quark reactions. The analysis of such rare decay channels with BABAR relies on the availability of a large set of computer-simulated data. For that purpose, a computer cluster has been built at the University of Tennessee as part of the distributed computing support work for BABAR. The design and performance of the cluster are a main subject of this thesis work.