Search CORE

21 research outputs found

Optimization of communication intensive applications on HPC networks

Author: Jain Nikhil
Publication venue
Publication date: 01/05/2016
Field of study

Communication is a necessary but overhead inducing component of parallel programming. Its impact on application design and performance is due to several related aspects of a parallel job execution: network topology, routing protocol, suitability of algorithm being used to the network, job placement, etc. This thesis is aimed at developing an understanding of how communication plays out on networks of high performance computing systems and exploring methods that can be used to improve communication performance of large scale applications. Broadly speaking, three topics have been studied in detail in this thesis. The first of these topics is task mapping and job placement on practical installations of torus and dragonfly networks. Next, use of supervised learning algorithms for conducting diagnostic studies of how communication evolves on networks is explored. Finally, efficacy of packet-level simulations for prediction-based studies of communication performance on different networks using different network parameters is analyzed. The primary contribution of this thesis is development of scalable diagnostic and prediction methods that can assist in the process of network designing, adapting applications to future systems, and optimizing execution of applications on existing systems. These meth- ods include a supervised learning approach, a functional modeling tool (called Damselfly), and a PDES-based packet level simulator (called TraceR), all of which are described in this thesis

Illinois Digital Environment for Access to Learning and Scholarship Repository

Recommended from our members

Latency-driven performance in data centres

Author: Popescu Diana Andreea
Publication venue: University of Cambridge
Publication date: 14/04/2019
Field of study

Data centre based cloud computing has revolutionised the way businesses use computing infrastructure. Instead of building their own data centres, companies rent computing resources and deploy their applications on cloud hardware. Providing customers with well-defined application performance guarantees is of paramount importance to ensure transparency and to build a lasting collaboration between users and cloud operators. A user’s application performance is subject to the constraints of the resources it has been allocated and to the impact of the network conditions in the data centre. In this dissertation, I argue that application performance in data centres can be improved through cluster scheduling of applications informed by predictions of application performance for given network latency, and measurements of current network latency in data centres between hosts. Firstly, I show how to use the Precision Time Protocol (PTP), through an open-source software implementation PTPd, to measure network latency and packet loss in data centres. I propose PTPmesh, which uses PTPd, as a cloud network monitoring tool for tenants. Furthermore, I conduct a measurement study using PTPmesh in different cloud providers, finding that network latency variability in data centres is still common. Normal latency values in data centres are in the order of tens or hundreds of microseconds, while unexpected events, such as network congestion or packet loss, can lead to latency spikes in the order of milliseconds. Secondly, I show that network latency matters for certain distributed applications even in small amounts of tens or hundreds of microseconds, significantly reducing their performance. I propose a methodology to determine the impact of network latency on distributed applications performance by injecting artificial delay into the network of an experimental setup. Based on the experimental results, I build functions that predict the performance of an application for a given network latency. Given the network latency variability observed in data centers, applications’ performance is determined by their placement within the data centre. Thirdly, I propose latency-driven, application performance-aware, cluster scheduling as a way to provide performance guarantees to applications. I introduce NoMora, a cluster scheduling architecture that leverages the predictions of application performance dependent upon network latency combined with dynamic network latency measurements taken between pairs of hosts in data centres to place applications. Moreover, I show that NoMora improves application performance by choosing better placements than other scheduling policies.MEASUREMENT FOR EUROPE: TRAINING AND RESEARCH FOR INTERNET COMMUNICATIONS SCIENCE, European Commission FP7 Marie Curie Innovative Training Networks (ITN) ENDEAVOUR, European Commission Horizon 2020 (H2020) Industrial Leadership (IL

Apollo (Cambridge)

On the design of efficient caching systems

Author: Saino L
Publication venue: UCL (University College London)
Publication date: 28/12/2015
Field of study

Content distribution is currently the prevalent Internet use case, accounting for the majority of global Internet traffic and growing exponentially. There is general consensus that the most effective method to deal with the large amount of content demand is through the deployment of massively distributed caching infrastructures as the means to localise content delivery traffic. Solutions based on caching have been already widely deployed through Content Delivery Networks. Ubiquitous caching is also a fundamental aspect of the emerging Information-Centric Networking paradigm which aims to rethink the current Internet architecture for long term evolution. Distributed content caching systems are expected to grow substantially in the future, in terms of both footprint and traffic carried and, as such, will become substantially more complex and costly. This thesis addresses the problem of designing scalable and cost-effective distributed caching systems that will be able to efficiently support the expected massive growth of content traffic and makes three distinct contributions. First, it produces an extensive theoretical characterisation of sharding, which is a widely used technique to allocate data items to resources of a distributed system according to a hash function. Based on the findings unveiled by this analysis, two systems are designed contributing to the abovementioned objective. The first is a framework and related algorithms for enabling efficient load-balanced content caching. This solution provides qualitative advantages over previously proposed solutions, such as ease of modelling and availability of knobs to fine-tune performance, as well as quantitative advantages, such as 2x increase in cache hit ratio and 19-33% reduction in load imbalance while maintaining comparable latency to other approaches. The second is the design and implementation of a caching node enabling 20 Gbps speeds based on inexpensive commodity hardware. We believe these contributions advance significantly the state of the art in distributed caching systems

UCL Discovery

Efficient Domain Partitioning for Stencil-based Parallel Operators

Author: Saxena Gaurav
Publication venue: University of Leeds
Publication date: 01/08/2018
Field of study

Partial Differential Equations (PDEs) are used ubiquitously in modelling natural phenomena. It is generally not possible to obtain an analytical solution and hence they are commonly discretized using schemes such as the Finite Difference Method (FDM) and the Finite Element Method (FEM), converting the continuous PDE to a discrete system of sparse algebraic equations. The solution of this system can be approximated using iterative methods, which are better suited to many sparse systems than direct methods. In this thesis we use the FDM to discretize linear, second order, Elliptic PDEs and consider parallel implementations of standard iterative solvers. The dominant paradigm in this field is distributed memory parallelism which requires the FDM grid to be partitioned across the available computational cores. The orthodox approach to domain partitioning aims to minimize only the communication volume and achieve perfect load-balance on each core. In this work, we re-examine and challenge this traditional method of domain partitioning and show that for well load-balanced problems, minimizing only the communication volume is insufficient for obtaining optimal domain partitions. To this effect we create a high-level, quasi-cache-aware mathematical model that quantifies cache-misses at the sub-domain level and minimizes them to obtain families of high performing domain decompositions. To our knowledge this is the first work that optimizes domain partitioning by analyzing cache misses, establishing a relationship between cache-misses and domain partitioning. To place our model in its true context, we identify and qualitatively examine multiple other factors such as the Least Recently Used policy, Cache Line Utilization and Vectorization, that influence the choice of optimal sub-domain dimensions. Since the convergence rate of point iterative methods, such as Jacobi, for uniform meshes is not acceptable at a high mesh resolution, we extend the model to Parallel Geometric Multigrid (GMG). GMG is a multilevel, iterative, optimal algorithm for numerically solving Elliptic PDEs. Adaptive Mesh Refinement (AMR) is another multilevel technique that allows local refinement of a global mesh based on parameters such as error estimates or geometric importance. We study a massively parallel, multiphysics, multi-resolution AMR framework called BoxLib, and implement and discuss our model on single level and adaptively refined meshes, respectively. We conclude that “close to 2-D” partitions are optimal for stencil-based codes on structured 3-D domains and that it is necessary to optimize for both minimizing cache-misses and communication. We advise that in light of the evolving hardware-software ecosystem, there is an imperative need to re-examine conventional domain partitioning strategies

White Rose E-theses Online

Hyperscale Data Processing With Network-Centric Designs

Author: Zhang Qizhen
Publication venue: ScholarlyCommons
Publication date: 01/01/2022
Field of study

Today’s largest data processing workloads are hosted in cloud data centers. Due to unprecedented data growth and the end of Moore’s Law, these workloads have ballooned to the hyperscale level, encompassing billions to trillions of data items and hundreds to thousands of machines per query. Enabling and expanding with these workloads are highly scalable data center networks that connect up to hundreds of thousands of networked servers. These massive scales fundamentally challenge the designs of both data processing systems and data center networks, and the classic layered designs are no longer sustainable. Rather than optimize these massive layers in silos, we build systems across them with principled network-centric designs. In current networks, we redesign data processing systems with network-awareness to minimize the cost of moving data in the network. In future networks, we propose new interfaces and services that the cloud infrastructure offers to applications and codesign data processing systems to achieve optimal query processing performance. To transform the network to future designs, we facilitate network innovation at scale. This dissertation presents a line of systems work that covers all three directions. It first discusses GraphRex, a network-aware system that combines classic database and systems techniques to push the performance of massive graph queries in current data centers. It then introduces data processing in disaggregated data centers, a promising new cloud proposal. It details TELEPORT, a compute pushdown feature that eliminates data processing performance bottlenecks in disaggregated data centers, and Redy, which provides high-performance caches using remote disaggregated memory. Finally, it presents MimicNet, a fine-grained simulation framework that evaluates network proposals at datacenter scale with machine learning approximation. These systems demonstrate that our ideas in network-centric designs achieve orders of magnitude higher efficiency compared to the state of the art at hyperscale

ScholarlyCommons@Penn

Online learning on the programmable dataplane

Author: Simpson Kyle Andrew
Publication venue
Publication date: 01/01/2022
Field of study

This thesis makes the case for managing computer networks with datadriven methods automated statistical inference and control based on measurement data and runtime observations—and argues for their tight integration with programmable dataplane hardware to make management decisions faster and from more precise data. Optimisation, defence, and measurement of networked infrastructure are each challenging tasks in their own right, which are currently dominated by the use of hand-crafted heuristic methods. These become harder to reason about and deploy as networks scale in rates and number of forwarding elements, but their design requires expert knowledge and care around unexpected protocol interactions. This makes tailored, per-deployment or -workload solutions infeasible to develop. Recent advances in machine learning offer capable function approximation and closed-loop control which suit many of these tasks. New, programmable dataplane hardware enables more agility in the network— runtime reprogrammability, precise traffic measurement, and low latency on-path processing. The synthesis of these two developments allows complex decisions to be made on previously unusable state, and made quicker by offloading inference to the network. To justify this argument, I advance the state of the art in data-driven defence of networks, novel dataplane-friendly online reinforcement learning algorithms, and in-network data reduction to allow classification of switchscale data. Each requires co-design aware of the network, and of the failure modes of systems and carried traffic. To make online learning possible in the dataplane, I use fixed-point arithmetic and modify classical (non-neural) approaches to take advantage of the SmartNIC compute model and make use of rich device local state. I show that data-driven solutions still require great care to correctly design, but with the right domain expertise they can improve on pathological cases in DDoS defence, such as protecting legitimate UDP traffic. In-network aggregation to histograms is shown to enable accurate classification from fine temporal effects, and allows hosts to scale such classification to far larger flow counts and traffic volume. Moving reinforcement learning to the dataplane is shown to offer substantial benefits to stateaction latency and online learning throughput versus host machines; allowing policies to react faster to fine-grained network events. The dataplane environment is key in making reactive online learning feasible—to port further algorithms and learnt functions, I collate and analyse the strengths of current and future hardware designs, as well as individual algorithms

Glasgow Theses Service

Evaluating and Improving Internet Load Balancing with Large-Scale Latency Measurements

Author: Pi Yibo
Publication venue
Publication date: 01/01/2021
Field of study

Load balancing is used in the Internet to distribute load across resources at different levels, from global load balancing that distributes client requests across servers at the Internet level to path-level load balancing that balances traffic across load-balanced paths. These load balancing algorithms generally work under certain assumptions on performance similarity. Specifically, global load balancing divides the Internet address space into client aggregations and assumes that clients in the same aggregation have similar performance to the same server; load-balanced paths are generally selected for load balancing as if they have similar performance. However, as performance similarity is typically achieved with similarity in path properties, e.g., topology and hop count, which do not necessarily lead to similar performance, performance between clients in the same aggregation and between load-balanced paths could differ significantly. This dissertation evaluates and improves global and path-level load balancing in terms of performance similarity. We achieve this with large-scale latency measurements, which not only allow us to systematically identify and evaluate the performance issues of Internet load balancing at scale, but also enable us to develop data-driven approaches to improve the performance. Specifically, this dissertation consists of three parts. First, we study the issues of existing client aggregations for global load balancing and then design AP-atoms, a data-driven client aggregation learned from passive large-scale latency measurements. Second, we show that the latency imbalance between load-balanced paths, previously deemed insignificant, is now both significant and prevalent. We present Flipr, a network prober that actively collects large-scale latency measurements to characterize the latency imbalance issue. Lastly, we design another network prober, Congi, that can detect congestion at scale and use Congi to study the congestion imbalance problem at scale. For both latency and congestion imbalance, we demonstrate that they could greatly affect the performance of various applications.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/168012/1/yibo_1.pd

Deep Blue Documents at the University of Michigan

An Initial Framework Assessing the Safety of Complex Systems

Author: Burton Simon
Garnett Philip
McDermid John Alexander
Weaver Rob
Publication venue: Zenodo
Publication date: 07/12/2020
Field of study

Trabajo presentado en la Conference on Complex Systems, celebrada online del 7 al 11 de diciembre de 2020.Atmospheric blocking events, that is large-scale nearly stationary atmospheric pressure patterns, are often associated with extreme weather in the mid-latitudes, such as heat waves and cold spells which have significant consequences on ecosystems, human health and economy. The high impact of blocking events has motivated numerous studies. However, there is not yet a comprehensive theory explaining their onset, maintenance and decay and their numerical prediction remains a challenge. In recent years, a number of studies have successfully employed complex network descriptions of fluid transport to characterize dynamical patterns in geophysical flows. The aim of the current work is to investigate the potential of so called Lagrangian flow networks for the detection and perhaps forecasting of atmospheric blocking events. The network is constructed by associating nodes to regions of the atmosphere and establishing links based on the flux of material between these nodes during a given time interval. One can then use effective tools and metrics developed in the context of graph theory to explore the atmospheric flow properties. In particular, Ser-Giacomi et al. [1] showed how optimal paths in a Lagrangian flow network highlight distinctive circulation patterns associated with atmospheric blocking events. We extend these results by studying the behavior of selected network measures (such as degree, entropy and harmonic closeness centrality)at the onset of and during blocking situations, demonstrating their ability to trace the spatio-temporal characteristics of these events.This research was conducted as part of the CAFE (Climate Advanced Forecasting of sub-seasonal Extremes) Innovative Training Network which has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 813844

Digital.CSIC

White Rose Research Online

Esprit '89. Proceedings of the 6th annual Esprit conference. Brussels, 27 November - 1 December 1989. EUR 12512 EN

Author
Publication venue
Publication date: 01/01/1989
Field of study

Archive of European Integration

An Approach Based on Particle Swarm Optimization for Inspection of Spacecraft Hulls by a Swarm of Miniaturized Robots

Author: Boghaert Johannes
Ebert Julia
Ekblaw Ariel
Haghighat Bahar
Liu Fanghzheng
Minsky-Primus Zev
Nagpal Radhika
Nisser Martin
Publication venue: Springer Verlag
Publication date: 29/10/2022
Field of study

The remoteness and hazards that are inherent to the operating environments of space infrastructures promote their need for automated robotic inspection. In particular, micrometeoroid and orbital debris impact and structural fatigue are common sources of damage to spacecraft hulls. Vibration sensing has been used to detect structural damage in spacecraft hulls as well as in structural health monitoring practices in industry by deploying static sensors. In this paper, we propose using a swarm of miniaturized vibration-sensing mobile robots realizing a network of mobile sensors. We present a distributed inspection algorithm based on the bio-inspired particle swarm optimization and evolutionary algorithm niching techniques to deliver the task of enumeration and localization of an a priori unknown number of vibration sources on a simplified 2.5D spacecraft surface. Our algorithm is deployed on a swarm of simulated cm-scale wheeled robots. These are guided in their inspection task by sensing vibrations arising from failure points on the surface which are detected by on-board accelerometers. We study three performance metrics: (1) proximity of the localized sources to the ground truth locations, (2) time to localize each source, and (3) time to finish the inspection task given a 75% inspection coverage threshold. We find that our swarm is able to successfully localize the present so

Proceedings - University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen