368 research outputs found
Runtime support for irregular computation in MPI-based applications
In recent years there are increasing number of applications that have been using irregular computation models in various domains, such as computational chemistry, bioinformatics, nuclear reactor simulation and social network analysis. Due to the irregular and data-dependent communication patterns and sparse data structures involved in those applications, the traditional parallel programming model and runtime need to be carefully designed and implemented in order to accommodate the performance and scalability requirements of those irregular applications on large-scale systems.
The Message Passing Interface (MPI) is the industry standard communication library for high performance computing. However, whether MPI can serve as a suitable programming model / runtime for irregular applications or not is one of the most debated aspects in the community. The goal of this thesis is to investigate the suitability of MPI to irregular applications.
This thesis consists of two subtopics. The first subtopic focuses on improving MPI runtime to support the irregular applications from perspective of scalability and performance. The first three parts in this subtopic focus on MPI one-sided communication. In the first part, we present a thorough survey of current MPI one-sided implementations and illustrate scalability limitations in those implementations. In the second part, we propose a new design and implementation of MPI one-sided communication, called ScalaRMA, to effectively address those scalability limitations. The third part in this subtopic focuses on various issuing strategies in MPI one-sided communication. We propose an adaptive issuing strategy which can adaptively choose between delayed issuing strategy and eager issuing strategy in MPI runtime to achieve high performance based on current communication volume in MPI-based application. The last part in this subtopic is to tackle the scalability limitations in the virtual connection (VC) objects in MPI implementation. We propose a scalable design to reduce the memory consumption of VC objects in MPI runtime.
The second subtopic of this thesis focuses on improving MPI programming model to better support the irregular applications. Traditional two-sided data movement model in MPI standard designed for scientific computation provides a paradigm for user to specify how to move the data between processes, however, it does not provide interface to flexibly manage the computation, which means user needs to explicitly manage where the computation should be performed. This model is not well suited for irregular applications which involve irregular and data-dependent communication pattern. In this work, we combine Active Messages (AM), an alternative programming paradigm which is more suitable for irregular computations, with traditional MPI data movement model, and propose a generalized MPI-interoperable Active Messages framework (MPI-AM). The framework allows MPI-based applications to incrementally use AMs only when necessary, avoiding rewriting the entire MPI-based application. Such framework integrates data movement and computation together in the programming model and MPI can coordinate the computation and communication in a much more flexible manner. In this subtopic, we propose several strategies including message streaming, buffer management and asynchronous processing, in order to efficiently handle AMs inside MPI. We also propose subtle correctness semantics of MPI-AM to define how AMs can work correctly with other MPI messages in the system, from perspectives of memory consistency, concurrency, ordering and atomicity
An ontology enhanced parallel SVM for scalable spam filter training
This is the post-print version of the final paper published in Neurocomputing. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright @ 2013 Elsevier B.V.Spam, under a variety of shapes and forms, continues to inflict increased damage. Varying approaches including Support Vector Machine (SVM) techniques have been proposed for spam filter training and classification. However, SVM training is a computationally intensive process. This paper presents a MapReduce based parallel SVM algorithm for scalable spam filter training. By distributing, processing and optimizing the subsets of the training data across multiple participating computer nodes, the parallel SVM reduces the training time significantly. Ontology semantics are employed to minimize the impact of accuracy degradation when distributing the training data among a number of SVM classifiers. Experimental results show that ontology based augmentation improves the accuracy level of the parallel SVM beyond the original sequential counterpart
The Inter-cloud meta-scheduling
Inter-cloud is a recently emerging approach that expands cloud elasticity. By facilitating an adaptable setting, it purposes at the realization of a scalable resource provisioning that enables a diversity of cloud user requirements to be handled efficiently. This studyâs contribution is in the inter-cloud performance optimization of job executions using metascheduling concepts. This includes the development of the inter-cloud meta-scheduling (ICMS) framework, the ICMS optimal schemes and the SimIC toolkit. The ICMS model is an architectural strategy for managing and scheduling user services in virtualized dynamically inter-linked clouds. This is achieved by the development of a model that includes a set of algorithms, namely the Service-Request, Service-Distribution, Service-Availability and Service-Allocation algorithms. These along with resource management optimal schemes offer the novel functionalities of the ICMS where the message exchanging implements the job distributions method, the VM deployment offers the VM management features and the local resource management system details the management of the local cloud schedulers. The generated system offers great flexibility by facilitating a lightweight resource management methodology while at the same time handling the heterogeneity of different clouds through advanced service level agreement coordination. Experimental results are productive as the proposed ICMS model achieves enhancement of the performance of service distribution for a variety of criteria such as service execution times, makespan, turnaround times, utilization levels and energy consumption rates for various inter-cloud entities, e.g. users, hosts and VMs. For example, ICMS optimizes the performance of a non-meta-brokering inter-cloud by 3%, while ICMS with full optimal schemes achieves 9% optimization for the same configurations. The whole experimental platform is implemented into the inter-cloud Simulation toolkit (SimIC) developed by the author, which is a discrete event simulation framework
An In-Depth Analysis of the Slingshot Interconnect
The interconnect is one of the most critical components in large scale
computing systems, and its impact on the performance of applications is going
to increase with the system size. In this paper, we will describe Slingshot, an
interconnection network for large scale computing systems. Slingshot is based
on high-radix switches, which allow building exascale and hyperscale
datacenters networks with at most three switch-to-switch hops. Moreover,
Slingshot provides efficient adaptive routing and congestion control
algorithms, and highly tunable traffic classes. Slingshot uses an optimized
Ethernet protocol, which allows it to be interoperable with standard Ethernet
devices while providing high performance to HPC applications. We analyze the
extent to which Slingshot provides these features, evaluating it on
microbenchmarks and on several applications from the datacenter and AI worlds,
as well as on HPC applications. We find that applications running on Slingshot
are less affected by congestion compared to previous generation networks.Comment: To be published in Proceedings of The International Conference for
High Performance Computing Networking, Storage, and Analysis (SC '20) (2020
3rd EGEE User Forum
We have organized this book in a sequence of chapters, each chapter associated with an application or technical theme introduced by an overview of the contents, and a summary of the main conclusions coming from the Forum for the chapter topic. The first chapter gathers all the plenary session keynote addresses, and following this there is a sequence of chapters covering the application flavoured sessions. These are followed by chapters with the flavour of Computer Science and Grid Technology. The final chapter covers the important number of practical demonstrations and posters exhibited at the Forum. Much of the work presented has a direct link to specific areas of Science, and so we have created a Science Index, presented below. In addition, at the end of this book, we provide a complete list of the institutes and countries involved in the User Forum
Monitoring and Information Alignment in Pursuit of an IoT-Enabled Self-Sustainable Interoperability
To remain competitive with big corporations, small and medium-sized enterprises (SMEs) often need to be more dynamic, adapt to new business situations, react faster, and thereby survive in todayâs global economy. To do so, SMEs normally seek to create consortiums, thus gaining access to new and more opportunities. However, this strategy may also lead to complications. Due to the different sources of enterprise models and semantics, organizations are experiencing difficulties in seamlessly exchanging vital information via electronic means. In their attempt to address this issue, most seek to achieve interoperability by establishing peer-to-peer mappings with different business partners, or by using neutral data standards to regulate communications in optimized networks. Moreover, systems are more and more dynamic, frequently changing to answer new customerâs requirements, causing new interoperability problems and a reduction of efficiency. Another situation that is constantly changing is the devices used in the enterprises, as the Enterprise Information Systems, devices are used to register internal data, and to be used to monitor several aspects. These devices are constantly changing, following the evolution and growth of the market. So, it is important to monitor these devices and doing a model representation of them. This dissertation proposes a self-sustainable interoperable framework to monitor existing enterprise information systems and their devices, monitor the device/enterprise network for changes and automatically detecting model changes. With this, network harmonization disruptions are detected in a timely way, and possible solutions are suggested to regain the interoperable status, thus enhancing robustness for reaching sustainability of business networks along time
Performance evaluation of an open distributed platform for realistic traffic generation
Network researchers have dedicated a notable part of their efforts
to the area of modeling traffic and to the implementation of efficient traffic
generators. We feel that there is a strong demand for traffic generators
capable to reproduce realistic traffic patterns according to theoretical
models and at the same time with high performance. This work presents an open
distributed platform for traffic generation that we called distributed
internet traffic generator (D-ITG), capable of producing traffic (network,
transport and application layer) at packet level and of accurately replicating
appropriate stochastic processes for both inter departure time (IDT) and
packet size (PS) random variables. We implemented two different versions of
our distributed generator. In the first one, a log server is in charge of
recording the information transmitted by senders and receivers and these
communications are based either on TCP or UDP. In the other one, senders and
receivers make use of the MPI library. In this work a complete performance
comparison among the centralized version and the two distributed versions of
D-ITG is presented
- âŠ