Performance of Network and Service Monitoring Frameworks
The efficiency and performance of management systems are becoming a hot
research topic within the networks and services management community. This
concern stems from the new challenges of large-scale managed systems, where the
management plane is integrated within the functional plane and where management
activities must carry accurate and up-to-date information. We defined a set
of primary and secondary metrics to measure the performance of a management
approach. Secondary metrics are derived from the primary ones and mainly
quantify the efficiency, the scalability, and the impact of management activities.
To validate our proposals, we have designed and developed a benchmarking
platform dedicated to the measurement of the performance of a JMX manager-agent
based management system. The second part of our work deals with the collection
of measurement data sets from our JMX benchmarking platform. We mainly studied
the effect of both the load and the number of agents on scalability, the impact
of management activities on the user-perceived performance of a managed server,
and the delays of JMX operations when carrying variable values. Our findings
show that most of these delays follow a Weibull statistical distribution. We
used this statistical model to study the behavior of a monitoring algorithm
proposed in the literature under a heavy-tailed delay distribution. In this case,
the view of the managed system on the manager side becomes noisy and out of
date.
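The consequence of a heavy-tailed delay distribution can be sketched in a few lines of Python. This is an illustrative simulation only: the Weibull shape and scale parameters below are assumptions, not values reported in the abstract, and the comparison of the mean against the 99th percentile simply shows why a polling manager occasionally sees a very stale view.

```python
import random

random.seed(42)

# Hypothetical parameters: a Weibull shape < 1 yields a heavy-tailed delay
# distribution, qualitatively matching the abstract's finding for JMX
# operation delays. Neither value comes from the paper.
SHAPE = 0.7    # assumed shape parameter
SCALE = 50.0   # assumed scale, in milliseconds

delays = [random.weibullvariate(SCALE, SHAPE) for _ in range(10_000)]

mean_delay = sum(delays) / len(delays)
p99 = sorted(delays)[int(0.99 * len(delays))]

# With a heavy tail, the 99th percentile dwarfs the mean: a manager polling
# its agents will occasionally observe a badly out-of-date system state.
print(f"mean delay : {mean_delay:.1f} ms")
print(f"99th pct   : {p99:.1f} ms")
print(f"tail ratio : {p99 / mean_delay:.1f}x")
```

Under these assumed parameters the tail-to-mean ratio is several-fold, which is the regime in which the monitoring algorithm studied in the abstract degrades.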
Preparing HPC Applications for the Exascale Era: A Decoupling Strategy
Production-quality parallel applications are often a mixture of diverse
operations, such as computation- and communication-intensive, regular and
irregular, tightly coupled and loosely linked operations. In the conventional
construction of parallel applications, each process performs all the
operations, which can be inefficient and seriously limit scalability,
especially at large scale. We propose a decoupling strategy to improve the
scalability of applications running on large-scale systems.
Our strategy separates application operations onto groups of processes and
enables a dataflow processing paradigm among the groups. This mechanism is
effective in reducing the impact of load imbalance and increases the parallel
efficiency by pipelining multiple operations. We provide a proof-of-concept
implementation using MPI, the de-facto programming system on current
supercomputers. We demonstrate the effectiveness of this strategy by decoupling
the reduce, particle communication, halo exchange and I/O operations in a set
of scientific and data-analytics applications. A performance evaluation on
8,192 processes of a Cray XC40 supercomputer shows that the proposed approach
can achieve up to a 4x performance improvement.
Comment: The 46th International Conference on Parallel Processing (ICPP-2017)
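The decoupling idea above can be sketched in plain Python. This is a single-node illustration with threads and a queue, not the paper's MPI implementation: one "compute" group produces chunks while a decoupled "I/O" group consumes them, so the two operation classes overlap in a dataflow pipeline instead of running serially in every process. Names and chunk contents are illustrative.

```python
import queue
import threading

work = queue.Queue()
written = []

def compute_group(n_chunks):
    """Stand-in for the computation-intensive process group."""
    for i in range(n_chunks):
        chunk = [i * 10 + j for j in range(10)]  # fake partial result
        work.put(chunk)
    work.put(None)  # sentinel: no more chunks will arrive

def io_group():
    """Stand-in for the decoupled I/O process group."""
    while True:
        chunk = work.get()
        if chunk is None:
            break
        written.append(sum(chunk))  # fake "write one chunk to disk"

io = threading.Thread(target=io_group)
io.start()
compute_group(4)   # producer runs while the consumer drains the queue
io.join()
print(written)     # one entry per flushed chunk
```

In the paper's setting the queue is replaced by message passing between MPI process groups, which additionally absorbs load imbalance: a slow I/O phase no longer stalls every compute process.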
Algorithmic Based Fault Tolerance Applied to High Performance Computing
We present a new approach to fault tolerance for High Performance Computing
systems. Our approach is based on a careful adaptation of the Algorithmic-Based
Fault Tolerance technique (Huang and Abraham, 1984) to the needs of parallel
distributed computation. We obtain a strongly scalable mechanism for fault
tolerance that can also detect and correct errors (bit flips) on the fly during a
computation. To assess the viability of our approach, we have developed a fault
tolerant matrix-matrix multiplication subroutine and we propose some models to
predict its running time. Our parallel fault-tolerant matrix-matrix
multiplication achieves 1.4 TFLOPS on 484 processors (cluster jacquard.nersc.gov)
and returns a correct result even when one process failure occurs. This
represents 65% of the machine's peak efficiency and less than 12% overhead with
respect to the fastest failure-free implementation. We predict (and have
observed) that, as the processor count increases, the overhead of the fault
tolerance drops significantly.
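The checksum idea behind Huang and Abraham's scheme can be sketched on a single node; the paper applies it across distributed processes, and the matrices and injected error below are purely illustrative. Appending a column-checksum row to A and a row-checksum column to B makes the product carry both checksums, so a single corrupted entry can be located and corrected.

```python
def matmul(A, B):
    """Naive dense matrix product (illustrative only)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add_column_checksum(A):
    # Append a row holding the sum of each column of A.
    return A + [[sum(col) for col in zip(*A)]]

def add_row_checksum(B):
    # Append a column holding the sum of each row of B.
    return [row + [sum(row)] for row in B]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

# Multiplying the augmented matrices yields the true product plus
# its row and column checksums.
C = matmul(add_column_checksum(A), add_row_checksum(B))

C[0][1] += 5  # inject a silent error (a "bit flip") into one entry

# Detection + correction: the row and column whose checksums disagree
# pinpoint the corrupted entry, and the row checksum restores it.
bad_i = next(i for i in range(2) if sum(C[i][:2]) != C[i][2])
bad_j = next(j for j in range(2)
             if sum(C[i][j] for i in range(2)) != C[2][j])
C[bad_i][bad_j] = C[bad_i][2] - sum(C[bad_i][j] for j in range(2)
                                    if j != bad_j)
print(C[0][1])  # corrected back to the true value, 22
```

Because the checksum relations are preserved by the multiplication itself, verification costs only O(n^2) on top of the O(n^3) product, which is why the overhead shrinks as the problem and processor count grow.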
A Survey on the Contributions of Software-Defined Networking to Traffic Engineering
Since the appearance of OpenFlow back in 2008, software-defined networking (SDN) has gained momentum. Although there are some discrepancies between the standards-developing organizations working with SDN about what SDN is and how it is defined, they all outline traffic engineering (TE) as a key application. One of the most common objectives of TE is congestion minimization, where techniques such as traffic splitting among multiple paths or advanced reservation systems are used. In this scenario, this manuscript surveys the role of a comprehensive list of SDN protocols in TE solutions, in order to assess how these protocols can benefit TE. The SDN protocols have been categorized using the SDN architecture proposed by the Open Networking Foundation, which differentiates among data-controller plane interfaces, application-controller plane interfaces, and management interfaces, in order to state how the interface type in which they operate influences TE. In addition, the impact of the SDN protocols on TE has been evaluated by comparing them with the path computation element (PCE)-based architecture. The PCE-based architecture was selected to measure the impact of SDN on TE because it is the most novel TE architecture to date, and because it already defines a set of metrics to measure the performance of TE solutions. We conclude that using the three types of interfaces simultaneously will result in more powerful and enhanced TE solutions, since they benefit TE in complementary ways.
European Commission through the Horizon 2020 Research and Innovation Programme (GN4) under Grant 691567
Spanish Ministry of Economy and Competitiveness under the Secure Deployment of Services Over SDN and NFV-based Networks Project S&NSEC under Grant TEC2013-47960-C4-3-
Securing the Participation of Safety-Critical SCADA Systems in the Industrial Internet of Things
In the past, industrial control systems were ‘air gapped’ and
isolated from more conventional networks. They used
specialist protocols, such as Modbus, that are very different
from TCP/IP. Individual devices used proprietary operating
systems rather than the more familiar Linux or Windows.
However, things are changing. There is a move for greater
connectivity – for instance so that higher-level enterprise
management systems can exchange information that helps
optimise production processes. At the same time, industrial
systems have been influenced by concepts from the Internet
of Things, where the information derived from sensors and
actuators in domestic and industrial components can be
addressed through network interfaces. This paper identifies a
range of cyber security and safety concerns that arise from
these developments. The closing sections introduce potential
solutions and identify areas for future research.