Capturing the impact of external interference on HPC application performance
HPC applications are large software packages with high computation and storage requirements. To meet these requirements, the architectures of supercomputers are continuously evolving and their capabilities are continuously increasing. Present-day supercomputers have achieved petaflops of computational power by utilizing thousands to millions of compute cores, connected through specialized communication networks, and are equipped with petabytes of storage using a centralized I/O subsystem. While fulfilling the high resource demands of HPC applications, such a design also entails its own challenges. Applications running on these systems own the computation resources exclusively, but share the communication interconnect and the I/O subsystem with other concurrently running applications. Simultaneous access to these shared resources causes contention and inter-application interference, leading to degraded application performance.
Inter-application interference is one of the sources of run-to-run variation. While other sources of variation, such as operating system jitter, have been investigated before, this doctoral thesis specifically focuses on inter-application interference and studies it from the perspective of an application. Variation in execution time not only causes uncertainty and affects user expectations (especially during performance analysis), but also causes suboptimal usage of HPC resources. Therefore, this thesis aims to evaluate inter-application interference, establish trends among applications under contention, and approximate the impact of external influences on the runtime of an application.
To this end, this thesis first presents a method to correlate the performance of applications running side-by-side. The method divides the runtime of a system into globally synchronized, fine-grained time slices for which application performance data is recorded separately. The evaluation of the method demonstrates that correlating application performance data can identify inter-application interference. The thesis further uses the method to study I/O interference and shows that file access patterns are a significant factor in determining the interference potential of an application.
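The time-slice idea above can be illustrated with a minimal sketch. It is not the thesis's implementation: the sample format (timestamp, value), the slice length, and the use of a Pearson coefficient as the correlation measure are all assumptions made here for illustration. A strongly negative coefficient between two applications' per-slice throughput would be one possible hint of interference.

```python
from math import sqrt

def bin_into_slices(events, slice_len, n_slices):
    """Sum (timestamp, value) samples into fixed-length time slices.

    All applications must use the same slice_len and time origin so that
    slice i covers the same wall-clock interval for every application.
    Samples outside the covered range are dropped.
    """
    slices = [0.0] * n_slices
    for t, value in events:
        idx = int(t // slice_len)
        if 0 <= idx < n_slices:
            slices[idx] += value
    return slices

def pearson(xs, ys):
    """Pearson correlation of two equally long series (0.0 if one is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

# Hypothetical per-sample I/O throughput of two concurrently running applications.
app_a = bin_into_slices([(0.5, 10), (1.2, 5), (1.9, 5), (2.5, 2)], 1.0, 3)
app_b = bin_into_slices([(0.1, 1), (1.5, 4), (2.2, 9)], 1.0, 3)
coefficient = pearson(app_a, app_b)
```

Aligning slices globally is what makes the two series comparable; without a shared time origin, the coefficient would be meaningless.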
This thesis also presents a technique to estimate the impact of external influences on an application run. The technique introduces the concept of intrinsic performance characteristics to cluster similar application execution segments. Anomalies in the cluster are the result of external interference. An evaluation with several benchmarks shows high accuracy in estimating the impact of interference from a single application run.
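As a rough sketch of the clustering idea, assume each execution segment has already been reduced to a hashable "signature" standing in for its intrinsic performance characteristics, plus a measured duration. The signature, the median baseline, and the `tolerance` threshold are hypothetical choices made for this example, not the technique evaluated in the thesis.

```python
from collections import defaultdict
from statistics import median

def estimate_interference(segments, tolerance=1.2):
    """Estimate time lost to external influence in a single run.

    segments: list of (signature, duration) pairs. Segments with the same
    signature are assumed to do the same work, so they form one cluster.
    A segment is flagged when it takes longer than tolerance * the cluster
    median; its excess over the median counts as interference.
    Returns (total_excess_time, sorted_indices_of_flagged_segments).
    """
    clusters = defaultdict(list)
    for index, (signature, duration) in enumerate(segments):
        clusters[signature].append((index, duration))

    total_excess = 0.0
    flagged = []
    for members in clusters.values():
        baseline = median(duration for _, duration in members)
        for index, duration in members:
            if duration > tolerance * baseline:
                flagged.append(index)
                total_excess += duration - baseline
    return total_excess, sorted(flagged)

# Hypothetical run: the third "solve" step is slowed down by an outside influence.
run = [("solve", 1.0), ("solve", 1.0), ("solve", 3.0), ("write", 2.0), ("write", 2.0)]
lost_time, outliers = estimate_interference(run)
```

The median is used as the baseline here because it is robust to a few slowed-down segments; a cluster in which most segments suffer interference would defeat this simple choice.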
The contributions of this thesis will help establish interference trends and devise interference mitigation techniques. Similarly, estimating the impact of external interference will restore user expectations and help performance analysts separate application performance from external influence.
Thermal/performance trade-off in network-on-chip architectures
Multi-core architectures are a promising paradigm for exploiting the high integration density reached by high-performance systems. However, integration density and technology scaling are causing undesirable operating temperatures, with the net effect of reduced reliability and increased cooling costs. Dynamic Thermal Management (DTM) approaches have been proposed in the literature to control the temperature profile at run-time, while design-time approaches generally provide floorplan-driven solutions to cope with temperature constraints. Nevertheless, a suitable approach to collect performance, thermal and reliability metrics has not yet been proposed. This work presents a novel methodology to jointly optimize the temperature/performance trade-off in reliable high-performance parallel architectures with security constraints achieved by physically isolating workloads on each core. The proposed methodology is based on a linear formal model relating temperature and duty-cycle on one side, and performance and duty-cycle on the other side. Extensive experimental results on real-world use-case scenarios show the quality of the proposed model, which is suitable for design-time system-wide optimization to be used in conjunction with DTM techniques.
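To make the linear trade-off concrete, here is a minimal single-core sketch. It assumes steady-state temperature grows linearly with duty-cycle (T = t_idle + t_slope * d) and performance scales linearly with duty-cycle; the coefficient names and values are hypothetical, and the paper's actual model and its multi-core coupling are not reproduced here.

```python
def max_duty_cycle(t_idle, t_slope, t_max):
    """Largest duty-cycle d in [0, 1] with t_idle + t_slope * d <= t_max.

    t_idle:  assumed temperature at zero duty-cycle (degrees C)
    t_slope: assumed temperature rise from d = 0 to d = 1 (degrees C)
    t_max:   temperature constraint (degrees C)
    """
    if t_slope <= 0:
        raise ValueError("t_slope must be positive")
    d = (t_max - t_idle) / t_slope
    return max(0.0, min(1.0, d))

def performance(duty_cycle, perf_at_full):
    """Linear performance model: throughput proportional to duty-cycle."""
    return perf_at_full * duty_cycle

# Hypothetical core: 40 C when idle, 90 C at full duty-cycle, 80 C limit.
d_best = max_duty_cycle(t_idle=40.0, t_slope=50.0, t_max=80.0)
perf_best = performance(d_best, perf_at_full=100.0)
```

Because both relations are linear and performance increases with duty-cycle, the best operating point under a temperature cap sits exactly at the cap (or at full duty-cycle if the cap is never reached).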
Control Plane Hardware Design for Optical Packet Switched Data Centre Networks
Optical packet switching for intra-data centre networks is key to addressing traffic requirements. Photonic integration and wavelength division multiplexing (WDM) can overcome bandwidth limits in switching systems. A promising technology to build a nanosecond-reconfigurable photonic-integrated switch, compatible with WDM, is the semiconductor optical amplifier (SOA). SOAs are typically used as gating elements in a broadcast-and-select (B&S) configuration, to build an optical crossbar switch. For larger-size switching, a three-stage Clos network, based on crossbar nodes, is a viable architecture. However, the design of the switch control plane is one of the barriers to packet switching; it should run on packet timescales, which becomes increasingly challenging as line rates get higher. The scheduler, used for the allocation of switch paths, limits control clock speed. To this end, the research contribution was the design of highly parallel hardware schedulers for crossbar and Clos network switches. On a field-programmable gate array (FPGA), the minimum scheduler clock period achieved was 5.0 ns and 5.4 ns, for a 32-port crossbar and Clos switch, respectively. By using parallel path allocation modules, one per Clos node, a minimum clock period of 7.0 ns was achieved, for a 256-port switch. For scheduler application-specific integrated circuit (ASIC) synthesis, this reduces to 2.0 ns; a record result enabling scalable packet switching. Furthermore, the control plane was demonstrated experimentally. Moreover, a cycle-accurate network emulator was developed to evaluate switch performance. Results showed a switch saturation throughput at a traffic load of 60% of capacity, with sub-microsecond packet latency, for a 256-port Clos switch, outperforming state-of-the-art optical packet switches.
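For readers unfamiliar with crossbar scheduling, the sketch below shows one common textbook approach, round-robin output arbitration. It is an assumption chosen for illustration and is not claimed to be the scheduler designed in this work; in hardware, the per-output arbiters would operate in parallel within a clock cycle rather than in a sequential loop.

```python
def schedule_crossbar(requests, pointers):
    """Allocate crossbar paths for one scheduling round.

    requests: dict mapping input port -> requested output port.
    pointers: list with one round-robin pointer per output port; the pointer
              names the input port that currently has highest priority.
              It is updated in place after each grant.
    Returns a dict mapping output port -> granted input port.
    """
    n = len(pointers)
    grants = {}
    for out in range(n):
        contenders = [i for i, o in requests.items() if o == out]
        if not contenders:
            continue
        # The winner is the contender closest to the pointer, wrapping around.
        winner = min(contenders, key=lambda i: (i - pointers[out]) % n)
        grants[out] = winner
        # Move priority just past the winner so other inputs are served next.
        pointers[out] = (winner + 1) % n
    return grants

# Hypothetical 4-port switch: inputs 0 and 1 both want output 2.
ptrs = [0, 0, 0, 0]
reqs = {0: 2, 1: 2, 3: 0}
first_round = schedule_crossbar(reqs, ptrs)
second_round = schedule_crossbar(reqs, ptrs)
```

Advancing the pointer past each winner prevents a single input from starving the others when several keep requesting the same output.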
System design approach to energy-efficient data centers
Thesis (S.M. in Engineering and Management)--Massachusetts Institute of Technology, Engineering Systems Division, System Design and Management Program, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 63-65). Green HPC is the new standard for High Performance Computing (HPC). This has now become the primary interest among HPC researchers because of a renewed emphasis on Total Cost of Ownership (TCO) and the pursuit of higher performance. Quite simply, the cost of operating modern HPC equipment can rapidly outstrip the cost of acquisition. This phenomenon is recent and can be traced to the inadequacies in modern CPU and Datacenter systems design. This thesis analyzes the problem in its entirety and describes best practice fixes to solve the problems of energy-inefficient HPC. by Kurt Keville. S.M. in Engineering and Management
IEEE 802.11 wireless LAN traffic analysis: a cross-layer approach
The deployment of broadband wireless data networks, e.g., wireless local area
networks (WLANs) [29], has experienced tremendous growth in the last several
years, and this trend is continuously gaining momentum. In fact, WLAN is
becoming an indispensable component of the modern telecommunication infrastructure.
Despite this optimistic outlook, however, little is known about
the impact of the wireless channel on the characteristics of WLAN traffic.
This dissertation characterizes the correlation structures of the WLAN channel
with traffic statistics from a cross-layer point of view, and provides new measurement
methodologies and statistical models for WLAN networks.
Currently, WLAN standards are designed within the paradigm of the
layered network architecture. For example, the architecture of IEEE 802.11
is almost identical to that of Ethernet. However, wireless networks are fundamentally
different from their wired peers due to the shift of transmission media
from cables to over-the-air radio waves. This transition exposes wireless
systems to the influence of radio propagation, and more importantly, to the
temporal and spatial fluctuations of the radio channel that can actually be
propagated up to upper layers. However, the current WLAN architecture isolates
network layers, and largely ignores this impact. Therefore, we believe
that a cross-layer based approach is necessary to understand and reflect this
underlying impact of the channel to the upper layers of the network, especially
in relation to WLAN traffic behavior.
Measurement is one of the fundamental tools used to quantify radio
propagation. As part of this dissertation, a complete framework for a measurement
methodology, including hardware, software, and measurement procedures,
is established. Characteristics of the propagation channel are estimated
from measurement data, and the channel knowledge is applied to the upper
layers for more realistic and accurate modeling.
In WLAN environments, knowledge of the traffic characteristics is essential
for proper network provisioning, and for improving the performance
of the IEEE 802.11 standard and network devices, e.g., to design improved
MAC schemes, or to build better buffer scheduling algorithms with channel
knowledge, etc. Built upon extensive WLAN traffic traces, this dissertation
work presents cross-layer models for WLAN throughput predictions, traffic
statistics, and link layer characteristics.
The main goal of this dissertation work is to experiment with and develop
new methods for identifying channel characteristics. Using
this knowledge, we show how to predict and improve WLAN performance.
Within the framework of the developed cross-layer measurement methodology,
we conducted extensive measurements in different physical environments
and different settings such as office buildings and stores, and (1) show that
the impact of the propagation channel can be quantified by using a simple
large-scale channel metric (throughput over a longer period of time), and (2)
demonstrate the existence of a Doppler effect within today’s WLAN packet traffic
at sub-second time scales. We also show the real-world WLAN usage pattern
from our measurement results. From this data, we conclude that the key issues
in studying WLAN networks include accurate site-specific propagation channel
modeling and real-time autonomous traffic control.
Electrical and Computer Engineering
Analysis of a Gluonic Penguin Decay with the BaBar Detector
This thesis presents a branching fraction analysis of the neutral B meson decay channel B → ϕK0s where the K0s decays to π0π0. The decay is dominated by gluonic penguin transitions, which have been very important for the main program of BABAR: the search for physics beyond the Standard Model. The decay channel has been established and is included in the CP analysis, which is sensitive to new physics. The data set consists of 227 million BB̅ pairs recorded by the BABAR detector at the Stanford Linear Accelerator Center. Sophisticated analysis techniques have been applied primarily to suppress background from e+e- → quark/anti-quark reactions. The analysis of such rare decay channels with BABAR relies on the availability of a large set of computer simulated data. For that purpose a computer cluster has been built at the University of Tennessee as part of the distributed computing support work for BABAR. The design and performance of the cluster are a main subject of this thesis work.