Frameworks for Protocol Implementation
This paper reports on the development of a catalogue of frameworks for protocol implementation. Frameworks are software structures developed for a specific application domain that can be reused in the implementation of many different concrete systems in that domain. By using frameworks we aim to increase the effectiveness of the protocol implementation process: when protocols are implemented directly from their specifications, we expect gains in the correctness and speed of the implementation process and in the maintainability of the resulting system. We argue that frameworks should match the concepts underlying the techniques used for specifying protocols. Consequently, we couple the development of frameworks for protocol implementation to the investigation of alternative design models for protocol specification. This paper presents the approach we have been using to develop frameworks and illustrates it with an example framework.
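As a minimal illustration of the idea that a framework should mirror the concepts of the specification technique (here, a state-transition model), consider the following Python sketch. It is hypothetical and not from the paper: a reusable base class captures the generic state-machine behaviour, and a concrete protocol only fills in its states, events and actions.

```python
# Hypothetical sketch: a reusable state-machine framework whose "hot spots"
# (states, events, actions) are filled in per protocol.

class ProtocolStateMachine:
    """Generic framework part: drives transitions; a concrete protocol
    supplies only its transition table."""

    def __init__(self, initial_state, transitions):
        # transitions: {(state, event): (next_state, action_or_None)}
        self.state = initial_state
        self.transitions = transitions

    def handle(self, event, payload=None):
        key = (self.state, event)
        if key not in self.transitions:
            raise ValueError(f"no transition for {event!r} in state {self.state!r}")
        next_state, action = self.transitions[key]
        if action is not None:
            action(payload)
        self.state = next_state


# Concrete protocol: a toy stop-and-wait sender expressed as framework data.
def send_frame(payload):
    print("sending", payload)

stop_and_wait = ProtocolStateMachine(
    initial_state="READY",
    transitions={
        ("READY", "send"): ("WAIT_ACK", send_frame),
        ("WAIT_ACK", "ack"): ("READY", None),
        ("WAIT_ACK", "timeout"): ("WAIT_ACK", send_frame),  # retransmit
    },
)

stop_and_wait.handle("send", b"hello")
stop_and_wait.handle("ack")
```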
A Flexible and Modular Framework for Implementing Infrastructures for Global Computing
We present a Java software framework for building infrastructures that support the development of applications for systems where mobility and network awareness are key issues. The framework is particularly useful for developing run-time support for languages oriented towards global computing. It enables platform designers to customize communication protocols and network architectures, and it guarantees transparency of name management and code mobility in distributed environments. The key features are illustrated by means of two simple case studies.
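The framework itself is in Java; purely to illustrate the kind of extension point the abstract describes (platform designers plugging in their own communication protocols while applications stay unaware of the concrete stack), here is a hypothetical Python-flavoured sketch. All names (ProtocolLayer, register_protocol, the registry) are invented:

```python
# Hypothetical sketch of a pluggable-protocol extension point; the names
# are invented for illustration, not taken from the framework's API.
import zlib

class ProtocolLayer:
    """Base hook: platform designers subclass this to customize the wire protocol."""
    def encode(self, message: bytes) -> bytes:
        return message

    def decode(self, data: bytes) -> bytes:
        return data


class CompressingLayer(ProtocolLayer):
    """Example customization: transparently compress all traffic."""
    def encode(self, message: bytes) -> bytes:
        return zlib.compress(message)

    def decode(self, data: bytes) -> bytes:
        return zlib.decompress(data)


REGISTRY = {}

def register_protocol(name: str, layer: ProtocolLayer) -> None:
    # The infrastructure looks protocols up by name, so application code
    # never depends on which concrete stack carries its messages.
    REGISTRY[name] = layer

register_protocol("plain", ProtocolLayer())
register_protocol("compressed", CompressingLayer())

wire = REGISTRY["compressed"].encode(b"mobile agent state")
assert REGISTRY["compressed"].decode(wire) == b"mobile agent state"
```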
The End of Slow Networks: It's Time for a Redesign
Next-generation high-performance RDMA-capable networks will require a fundamental rethinking of the design and architecture of modern distributed DBMSs. These systems are commonly designed and optimized under the assumption that the network is the bottleneck: the network is slow and "thin", and thus needs to be avoided as much as possible. Yet this assumption no longer holds true. With InfiniBand FDR 4x, the bandwidth available to transfer data across the network is in the same ballpark as the bandwidth of one memory channel, and it increases even further with the most recent EDR standard. Moreover, with the continuing advances in RDMA, latency is improving similarly fast. In this paper, we first argue that the "old" distributed database design is not capable of taking full advantage of the network. Second, we propose architectural redesigns for OLTP, OLAP and advanced analytical frameworks to take better advantage of the improved bandwidth, latency and RDMA capabilities. Finally, for each of the workload categories, we show that remarkable performance improvements can be achieved.
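As a rough back-of-the-envelope check of the "same ballpark" claim, the Python snippet below compares nominal link rates against one DDR3 memory channel. The figures are public specification numbers, not measurements from the paper:

```python
# Back-of-the-envelope bandwidth comparison; figures are nominal public
# specs, not results from the paper.

# InfiniBand 4x links: 4 lanes, effective data rate after line encoding.
fdr_gbit_s = 54.5   # FDR 4x: ~54.5 Gbit/s effective (64/66b encoding)
edr_gbit_s = 100.0  # EDR 4x: ~100 Gbit/s effective

fdr_gbyte_s = fdr_gbit_s / 8   # ~6.8 GB/s per direction
edr_gbyte_s = edr_gbit_s / 8   # ~12.5 GB/s per direction

# One DDR3-1600 memory channel: 64-bit bus * 1600 MT/s.
ddr3_gbyte_s = 64 / 8 * 1600e6 / 1e9   # = 12.8 GB/s

print(f"FDR 4x : {fdr_gbyte_s:.1f} GB/s")
print(f"EDR 4x : {edr_gbyte_s:.1f} GB/s")
print(f"DDR3   : {ddr3_gbyte_s:.1f} GB/s per channel")
# FDR is within ~2x of one memory channel and EDR essentially matches it,
# so the network can no longer be treated as the obvious bottleneck.
```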
GeantV: Results from the prototype of concurrent vector particle transport simulation in HEP
Full detector simulation was among the largest CPU consumers in all CERN experiment software stacks for the first two runs of the Large Hadron Collider (LHC). In the early 2010s, the projections were that simulation demands would scale linearly with the luminosity increase, compensated only partially by an increase of computing resources. Extending fast simulation approaches to more use cases, covering a larger fraction of the simulation budget, is only part of the solution, due to intrinsic precision limitations. The remainder corresponds to speeding up the simulation software by several factors, which is out of reach using simple optimizations on the current code base. In this context, the GeantV R&D project was launched, aiming to redesign the legacy particle transport codes so that they benefit from fine-grained parallelism features such as vectorization, as well as from increased code and data locality. This paper presents in detail the results and achievements of this R&D, as well as the conclusions and lessons learnt from the beta prototype.
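To illustrate the kind of fine-grained parallelism GeantV targeted (this is not project code, just a sketch of the idea), compare a scalar per-track loop with a vectorized structure-of-arrays step, where one contiguous array per field lets a single propagation step map onto SIMD lanes across many tracks at once:

```python
# Hypothetical sketch of scalar vs. vectorized track propagation;
# illustrates the SoA/vectorization idea, not GeantV itself.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
# Structure-of-arrays layout: one contiguous array per field.
pos = rng.normal(size=(n, 3))
direction = rng.normal(size=(n, 3))
direction /= np.linalg.norm(direction, axis=1, keepdims=True)
step = rng.uniform(0.1, 1.0, size=n)

def propagate_scalar(pos, direction, step):
    out = pos.copy()
    for i in range(len(step)):          # one track at a time
        out[i] += step[i] * direction[i]
    return out

def propagate_vector(pos, direction, step):
    return pos + step[:, None] * direction   # all tracks in one SIMD-friendly op

assert np.allclose(propagate_scalar(pos, direction, step),
                   propagate_vector(pos, direction, step))
```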
A New Simplified Federated Single Sign-on System
The work presented in this MPhil thesis addresses the challenge of delivering true single sign-on by developing a new simplified FSSO system that allows end-users to access desktop systems, web-based services/applications and non-web-based services/applications using one authentication process. The new system achieves this using two major components: an “Authentication Infrastructure Integration Program” (AIIP) and an “Integration of Desktop Authentication and Web-based Authentication” (IDAWA). The AIIP acquires Kerberos tickets (for end-users who have been authenticated by a Kerberos single sign-on system in one network domain) from Kerberos single sign-on systems in different network domains without establishing trust between these Kerberos single sign-on systems. The IDAWA is an extension to web-based authentication systems (i.e. the web portal), and it authenticates end-users by verifying the end-users' Kerberos tickets. This research also developed new criteria to determine whether an FSSO system can deliver true single sign-on to end-users (i.e. allow end-users to access desktop systems, web-based services/applications and non-web-based services/applications using one authentication process). The evaluation shows that the new simplified FSSO system (i.e. the combination of AIIP and IDAWA) can deliver true single sign-on to end-users. In addition, the evaluation shows that the new simplified FSSO system has advantages over existing FSSO systems, as it does not require additional modifications to network domains' existing non-web-based authentication infrastructures (i.e. Kerberos single sign-on systems) or their firewall rules.
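As a purely illustrative sketch of the AIIP idea of acquiring per-domain Kerberos tickets without cross-realm trust, the following script obtains a ticket in each realm independently using the standard MIT Kerberos kinit tool. The realms, keytab paths and principals are invented, and this is not the thesis's implementation:

```python
# Hypothetical sketch in the spirit of AIIP: authenticate to each realm
# separately (no cross-realm trust), one credential cache per realm.
# All paths and principals below are invented for illustration.
import subprocess

REALMS = {
    "ALPHA.EXAMPLE.COM": ("/etc/aiip/alpha.keytab", "alice@ALPHA.EXAMPLE.COM"),
    "BETA.EXAMPLE.COM":  ("/etc/aiip/beta.keytab",  "alice@BETA.EXAMPLE.COM"),
}

def acquire_tickets():
    for realm, (keytab, principal) in REALMS.items():
        # -k -t: authenticate with a keytab instead of a password;
        # -c: store each realm's credentials in its own cache.
        cache = f"/tmp/krb5cc_aiip_{realm}"
        subprocess.run(
            ["kinit", "-k", "-t", keytab, "-c", cache, principal],
            check=True,
        )
        print(f"ticket for {principal} stored in {cache}")

if __name__ == "__main__":
    acquire_tickets()
```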
Deep Learning Models on CPUs: A Methodology for Efficient Training
GPUs have been favored for training deep learning models due to their highly parallelized architecture. As a result, most studies on training optimization focus on GPUs. There is often a trade-off, however, between cost and efficiency when choosing the hardware for training. In particular, CPU servers could be beneficial if training on CPUs were more efficient, as they incur lower hardware-upgrade costs and make better use of existing infrastructure. This paper makes several contributions to research on training deep learning models using CPUs. First, it presents a method for optimizing the training of deep learning models on Intel CPUs, together with a toolkit called ProfileDNN, which we developed to improve performance profiling. Second, we describe a generic training optimization method that guides our workflow, and we explore several case studies in which we identified performance issues and then optimized the Intel Extension for PyTorch, resulting in an overall 2x training performance increase for the RetinaNet-ResNext50 model. Third, we show how the visualization capabilities of ProfileDNN enabled us to pinpoint bottlenecks and create a custom focal loss kernel that was two times faster than the official reference PyTorch implementation.
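For reference, focal loss itself is standard (Lin et al.): FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). Below is a minimal plain-PyTorch sketch of the binary case, not the paper's optimized custom kernel:

```python
# Minimal reference focal loss (binary case); a plain PyTorch sketch,
# not the optimized kernel described in the paper.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)            # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

logits = torch.randn(8, requires_grad=True)
targets = torch.randint(0, 2, (8,)).float()
loss = focal_loss(logits, targets)
loss.backward()
print(float(loss))
```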