
    Combined shared and distributed memory ab-initio computations of molecular-hydrogen systems in the correlated state: process pool solution and two-level parallelism

    An efficient computational scheme devised for investigating the ground-state properties of electronically correlated systems is presented. As an example, an $(H_2)_n$ chain is considered, with the long-range electron-electron interactions taken into account. The implemented procedure covers: (i) construction of the single-particle Wannier wave-function basis in the correlated state, (ii) calculation of the microscopic parameters, and (iii) ground-state energy optimization. The optimization loop is based on a highly effective process-pool solution, namely a root-workers approach. Hierarchical, two-level parallelism is applied: both the shared-memory (via Open Multi-Processing) and the distributed-memory (via Message Passing Interface) models are utilized. We discuss in detail how this approach yields a substantial increase in calculation speed, reaching a factor of 300 for the fully parallelized solution. Comment: 14 pages, 10 figures, 1 table
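The two-level scheme described above pairs a distributed-memory process pool with shared-memory threading inside each worker. The sketch below illustrates this generic root-workers pattern with MPI for task distribution and OpenMP for the per-task loop; the task itself (`evaluate_energy`) and all sizes are hypothetical placeholders, not the authors' code.

```cpp
// Generic root-workers sketch (hypothetical task and sizes, not the authors' code):
// level 1 = MPI process pool managed by the root, level 2 = OpenMP threads per task.
#include <mpi.h>
#include <omp.h>
#include <cstdio>
#include <vector>

// Hypothetical task: evaluate the ground-state energy for one parameter set.
double evaluate_energy(int task_id) {
    double energy = 0.0;
    // Level 2: shared-memory parallelism inside one worker process (OpenMP).
    #pragma omp parallel for reduction(+ : energy)
    for (int i = 1; i <= 1000000; ++i)
        energy += (0.1 * task_id) / i;          // placeholder for the real integrals
    return energy;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    const int kTasks = 64;                      // parameter sets to evaluate

    if (rank == 0) {                            // Level 1: root feeds the process pool.
        std::vector<double> results(kTasks, 0.0);
        int next = 0, active = 0, stop = -1;
        for (int w = 1; w < size; ++w)          // prime every worker (or stop it)
            if (next < kTasks) { MPI_Send(&next, 1, MPI_INT, w, 1, MPI_COMM_WORLD); ++next; ++active; }
            else               { MPI_Send(&stop, 1, MPI_INT, w, 0, MPI_COMM_WORLD); }
        while (active > 0) {                    // collect results, hand out new tasks
            double e; MPI_Status st;
            MPI_Recv(&e, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            results[st.MPI_TAG] = e; --active;  // workers echo the task id as the tag
            if (next < kTasks) { MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, 1, MPI_COMM_WORLD); ++next; ++active; }
            else               { MPI_Send(&stop, 1, MPI_INT, st.MPI_SOURCE, 0, MPI_COMM_WORLD); }
        }
        std::printf("collected %d energies, e.g. E_0 = %f\n", kTasks, results[0]);
    } else {                                    // workers: receive a task, compute, report
        for (;;) {
            int task; MPI_Status st;
            MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == 0) break;         // tag 0 = no more work
            double e = evaluate_energy(task);
            MPI_Send(&e, 1, MPI_DOUBLE, 0, /*tag=*/task, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}
```

Dynamic dispatch of this kind keeps all workers busy even when individual parameter evaluations take very different amounts of time, which is where the bulk of the reported speedup of such schemes typically comes from.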

    HEC: Collaborative Research: SAM^2 Toolkit: Scalable and Adaptive Metadata Management for High-End Computing

    The increasing demand for exabyte-scale storage capacity by high-end computing applications requires a higher level of scalability and dependability than that provided by current file and storage systems. The proposal deals with file-systems research for metadata management of scalable cluster-based parallel and distributed file storage systems in the HEC environment. It aims to develop a scalable and adaptive metadata management (SAM2) toolkit to extend the features of, and fully leverage the peak performance promised by, state-of-the-art cluster-based parallel and distributed file storage systems used by the high-performance computing community. There is a large body of research on scaling data movement and management; however, the need to scale up the attributes of cluster-based file systems and I/O, that is, metadata, has been underestimated. Understanding the characteristics of metadata traffic, and applying proper load-balancing, caching, prefetching and grouping mechanisms to metadata management accordingly, will lead to high scalability. It is anticipated that by appropriately plugging the scalable and adaptive metadata management components into state-of-the-art cluster-based parallel and distributed file storage systems, one could potentially increase the performance of applications and file systems, and help translate the promise of high peak performance of such systems into real application performance improvements. The project involves the following components:
    1. Develop multi-variable forecasting models to analyze and predict file metadata access patterns.
    2. Develop scalable and adaptive file name mapping schemes using the duplicative Bloom filter array technique to enforce load balance and increase scalability (a generic sketch of the lookup idea follows this abstract).
    3. Develop decentralized, locality-aware metadata grouping schemes to facilitate bulk metadata operations such as prefetching.
    4. Develop an adaptive cache coherence protocol using a distributed shared object model for client-side and server-side metadata caching.
    5. Prototype the SAM2 components in the state-of-the-art parallel virtual file system PVFS2 and a distributed storage data caching system, set up an experimental framework for a DOE CMS Tier 2 site at the University of Nebraska-Lincoln, and conduct benchmark, evaluation and validation studies.
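In general terms, a Bloom filter array for name mapping keeps one filter per metadata server summarizing the names that server owns, so a client can locate metadata without consulting a central directory. The sketch below is a minimal, generic illustration of that lookup idea under assumed filter sizes and hash choices; it is not the SAM2 design.

```cpp
// Illustrative Bloom filter array lookup (assumed sizes/hashes, not the SAM2 design):
// each metadata server publishes a filter over the file names it owns; clients
// probe the array to guess which server holds a file's metadata.
#include <bitset>
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

constexpr std::size_t kBits = 1 << 16;   // filter size per server
constexpr int kHashes = 4;               // hash probes per key

struct BloomFilter {
    std::bitset<kBits> bits;
    static std::size_t probe(const std::string& key, int i) {
        // Double hashing (h1 + i*h2) is a common way to derive k hash functions.
        std::size_t h1 = std::hash<std::string>{}(key);
        std::size_t h2 = std::hash<std::string>{}(key + "#salt");
        return (h1 + static_cast<std::size_t>(i) * h2) % kBits;
    }
    void add(const std::string& key) {
        for (int i = 0; i < kHashes; ++i) bits.set(probe(key, i));
    }
    bool maybe_contains(const std::string& key) const {
        for (int i = 0; i < kHashes; ++i)
            if (!bits.test(probe(key, i))) return false;
        return true;   // may still be a false positive
    }
};

int main() {
    std::vector<BloomFilter> servers(4);           // one filter per metadata server
    servers[2].add("/home/alice/data.h5");         // server 2 owns this name

    const std::string query = "/home/alice/data.h5";
    for (std::size_t s = 0; s < servers.size(); ++s)
        if (servers[s].maybe_contains(query))
            std::printf("metadata for %s is probably on server %zu\n", query.c_str(), s);
    // A miss in every filter means the name is definitely not mapped;
    // a hit must still be confirmed at the server because of false positives.
}
```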

    Distributed Cooperative Control of Multi-Agent Systems Under Detectability and Communication Constraints

    Cooperative control of multi-agent systems has recently gained widespread attention from the scientific community due to numerous applications in areas such as formation control of unmanned vehicles, cooperative attitude control of spacecraft, clustering of micro-satellites, and environmental monitoring and exploration by mobile sensor networks. The primary goal of a cooperative control problem for multi-agent systems is to design a decentralized control algorithm for each agent, relying on local coordination of their actions to exhibit a collective behavior. Common challenges encountered in the study of cooperative control problems are unavailable group-level information and the limited bandwidth of the shared communication. In this dissertation, we investigate one such cooperative control problem, namely cooperative output regulation, under various local- and global-level constraints arising from physical and communication limitations. The objective of the cooperative output regulation problem (CORP) for multi-agent systems is to design a distributed control strategy for the agents to synchronize their states with an external system, called the leader, in the presence of disturbance inputs. For the problem at hand, we additionally consider the scenario in which none of the agents can independently access the synchronization signal from its own view of the leader, and therefore the agents cannot achieve the group objective unless they cooperate with one another. To this end, we devise a novel distributed estimation algorithm to collectively gather the leader states under the discussed detectability constraint, and then use this estimate to synthesize a distributed control solution to the problem. Next, we extend our results on the CORP to the case of uncertain agent dynamics arising from modeling errors. In addition to the detectability constraint, we also assume that the local regulated error signals are not available to the agents for feedback, and thus none of the agents has all the measurements required to independently synthesize a control solution. By combining the distributed observer with a control law based on the internal model principle for the agents, we offer a solution to the robust CORP under these added constraints. In practical applications of multi-agent systems, it is difficult to consistently maintain reliable communication between the agents. Considering this communication challenge, we study the CORP for the case when agents are connected through a time-varying communication topology. Due to the detectability constraint that none of the agents can independently access all the leader states at any switching instant, we devise a distributed estimation algorithm for the agents to collectively reconstruct the leader states. Using this estimate, a distributed dynamic control solution is then offered to solve the CORP under the added communication constraint. Since a fixed communication network is a special case of its time-varying counterpart, the offered control solution can be viewed as a generalization of the former results. To validate the preceding theoretical results, we apply the control algorithms to a practical case study on synchronizing the positions of networked motors under time-varying communication. Based on our experimental results, we also demonstrate the uniqueness of the derived control solutions.
Another communication constraint affecting cooperative control performance is the presence of network delays. In this regard, we first study the distributed state estimation problem of an autonomous plant by a network of observers under heterogeneous time-invariant delays, and then extend it to the time-varying counterpart. Using a low-gain-based estimation technique, we derive a sufficient stability condition in terms of an upper bound on the low-gain parameter or the time delay to guarantee convergence of the estimation errors. Additionally, when the plant measurements are subject to bounded disturbances, we find that the local estimation errors also remain bounded. Lastly, using this estimation, we present a distributed control solution for a leader-follower synchronization problem of a multi-agent system. Finally, we present another case study concerning a synchronization control problem for a group of distributed generators in an islanded microgrid under unknown time-varying latency. Similar to the delayed-communication cases in the aforementioned works, we offer a low-gain-based distributed control protocol to synchronize the terminal voltages and inverter operating frequencies.
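For reference, a consensus-coupled distributed observer of the kind outlined in this abstract (each agent seeing only part of the leader, with the network jointly detectable) often takes the following generic form; the symbols $S$, $C_i$, $L_i$, $\mu$ and $a_{ij}$ are illustrative choices and not necessarily the dissertation's notation or exact construction.

```latex
% Illustrative sketch only, not the dissertation's exact construction.
% Leader dynamics and the partial output seen by agent i:
\[
  \dot{v} = S v, \qquad y_i = C_i\, v, \qquad i = 1,\dots,N,
\]
% where no single pair (C_i, S) need be detectable, but the stacked pair is.
% Each agent corrects its estimate with its own partial measurement of the
% leader and with its neighbours' estimates over the communication graph:
\[
  \dot{\hat v}_i = S \hat v_i + L_i\,(y_i - C_i \hat v_i)
                 + \mu \sum_{j \in \mathcal{N}_i} a_{ij}\,(\hat v_j - \hat v_i).
\]
```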

    High performance Java for multi-core systems

    [Abstract] Interest in Java within the High Performance Computing (HPC) community has been rising in recent years thanks to its noticeable performance improvements and its productivity features. In a context where the trend of increasing the number of cores per processor is leading to the generalization of many-core processors and accelerators, multithreading as an inherent feature of the language makes Java extremely attractive for exploiting the performance provided by multi- and many-core architectures. This PhD Thesis presents a thorough analysis of the current state of the art regarding multi- and many-core programming in Java and provides the design, implementation and evaluation of several solutions to enable Java for the many-core era. To achieve this, a shared-memory message-passing solution has been implemented that provides shared-memory programming with the scalability of distributed-memory paradigms, along with the benefits of a portable programming model that allows the developed codes to be run on distributed-memory systems. Moreover, representative collective operations, involving computation and communication among different processes or threads, have been optimized, also introducing into Java new features for scalability from the MPI 3.0 specification, namely nonblocking collectives. Regarding the exploitation of many-core architectures, the lack of direct Java support forces developers to resort to wrappers or higher-level solutions to translate Java code into CUDA or OpenCL. The most relevant of these solutions have been evaluated and thoroughly analyzed in terms of performance and productivity. Guidelines for taking advantage of shared-memory environments have been derived during the analysis and development of the proposed solutions, and the main conclusion is that the use of Java for shared-memory programming on multi- and many-core systems is not only productive but can also provide competitive, high-performance results. However, in order to effectively take advantage of the underlying multi- and many-core architectures, the key is the availability of optimized middleware that abstracts multithreading details from the user, like the one proposed in this Thesis, together with the optimization of common operations such as collective communications.
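The nonblocking collectives mentioned above originate in the MPI 3.0 specification; the thesis exposes an equivalent feature to Java programmers. The fragment below shows the underlying MPI 3.0 pattern using the native C API, purely as a reference for the concept rather than the thesis's Java interface.

```cpp
// MPI 3.0 nonblocking collective: start the operation, overlap it with
// independent computation, and complete it only when the result is needed.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = rank + 1.0, global = 0.0;
    MPI_Request req;
    MPI_Iallreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD, &req);

    double busywork = 0.0;                       // independent work overlapped
    for (int i = 0; i < 1000000; ++i) busywork += 1e-6 * i;   // with the reduction

    MPI_Wait(&req, MPI_STATUS_IGNORE);           // reduction result is now valid
    if (rank == 0) std::printf("sum = %f (busywork = %f)\n", global, busywork);
    MPI_Finalize();
    return 0;
}
```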

    Computationally efficient simulation in urban mechanised tunnelling based on multi-level BIM models

    The design of complex underground infrastructure projects involves various empirical, analytical or numerical models with different levels of complexity. The use of simulation models in the current state-of-the-art tunnel design process can be cumbersome when significant manual, time-consuming preparation and analysis and excessive computing resources are required. This paper addresses the challenges of minimising the user workload and computational time, as well as enabling real-time computations during construction. To ensure a seamless workflow during design and to minimise the computation time of the analysis, we propose a novel concept for BIM-based numerical simulations, enabling the modelling of the tunnel advance at different levels of detail in terms of geometrical representation, material modelling and modelling of the advancement process. To ensure computational efficiency, the simulation software has been developed with special emphasis on efficient implementation, including parallelisation strategies on shared- and distributed-memory systems. For real-time, on-demand calculations, simulation-based meta-models are integrated into the software platform. The components of the BIM-based multi-level simulation concept are described and evaluated in detail by means of representative numerical examples.
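One way to read the multi-level idea is as a dispatch between models of different fidelity, with a pre-trained surrogate (meta-model) answering real-time, on-demand queries and the full numerical model reserved for design-stage runs. The sketch below illustrates that dispatch with entirely hypothetical function names and thresholds; it is not the paper's software.

```cpp
// Hypothetical dispatch between a fast surrogate (meta-model) and the full
// numerical model, keyed on the available time budget (illustrative only).
#include <cstdio>

// Stand-ins for the real analysis components.
double surrogate_settlement(double advance_m) { return 0.0110 * advance_m; }  // trained offline
double full_fe_settlement(double advance_m)   { return 0.0105 * advance_m; }  // expensive run

double predict_settlement(double advance_m, double time_budget_s) {
    // Real-time, on-demand query during construction -> surrogate meta-model;
    // design-stage analysis with a generous budget   -> full numerical model.
    return (time_budget_s < 1.0) ? surrogate_settlement(advance_m)
                                 : full_fe_settlement(advance_m);
}

int main() {
    std::printf("on-demand estimate: %.4f m\n", predict_settlement(25.0, 0.5));
    std::printf("design-stage run:   %.4f m\n", predict_settlement(25.0, 3600.0));
}
```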

    Parallel symbolic state-space exploration is difficult, but what is the alternative?

    State-space exploration is an essential step in many modeling and analysis problems. Its goal is to find the states reachable from the initial state of a discrete-state model. The state space can be used to answer important questions, e.g., "Is there a dead state?" and "Can N become negative?", or as a starting point for sophisticated investigations expressed in temporal logic. Unfortunately, the state space is often so large that ordinary explicit data structures and sequential algorithms cannot cope, prompting the exploration of (1) parallel approaches using multiple processors, from simple workstation networks to shared-memory supercomputers, to satisfy the large memory and runtime requirements, and (2) symbolic approaches using decision diagrams to encode the large structured sets and relations manipulated during state-space generation. Both approaches have merits and limitations. Parallel explicit state-space generation is challenging, but almost linear speedup can be achieved; however, the analysis is ultimately limited by the memory and processors available. Symbolic methods are a heuristic that can efficiently encode many, but not all, functions over a structured and exponentially large domain; here the pitfalls are subtler: their performance varies widely depending on the class of decision diagram chosen, the state variable order, and obscure algorithmic parameters. As symbolic approaches are often much more efficient than explicit ones for many practical models, we argue for the need to parallelize symbolic state-space generation algorithms, so that we can realize the advantages of both approaches. This is a challenging endeavor, as the most efficient symbolic algorithm, Saturation, is inherently sequential. We conclude by discussing challenges, efforts, and promising directions toward this goal.
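For contrast with the symbolic, decision-diagram-based approach discussed above, the explicit exploration it is compared against amounts to a breadth-first reachability computation that stores every visited state individually. The sketch below is a minimal example on a hypothetical toy model, not any tool's implementation.

```cpp
// Minimal explicit-state reachability (the approach symbolic methods replace).
// The toy model and its next-state relation are hypothetical.
#include <cstdio>
#include <queue>
#include <unordered_set>
#include <vector>

using State = int;   // a toy model whose state fits in one integer

// Hypothetical next-state relation: two bounded counters packed into one int.
std::vector<State> successors(State s) {
    int a = s / 100, b = s % 100;
    std::vector<State> next;
    if (a < 99) next.push_back((a + 1) * 100 + b);                // increment a
    if (b < 99) next.push_back(a * 100 + (b + 1));                // increment b
    if (a > 0 && b > 0) next.push_back((a - 1) * 100 + (b - 1));  // joint decrement
    return next;
}

int main() {
    std::unordered_set<State> reached = {0};   // initial state: both counters 0
    std::queue<State> frontier;
    frontier.push(0);
    while (!frontier.empty()) {                // breadth-first exploration
        State s = frontier.front(); frontier.pop();
        for (State t : successors(s))
            if (reached.insert(t).second)      // true only for newly seen states
                frontier.push(t);
    }
    std::printf("reachable states: %zu\n", reached.size());
    // Every reachable state is stored individually; this is exactly what blows
    // up for large models and what decision diagrams encode implicitly instead.
}
```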

    Transparent and efficient shared-state management for optimistic simulations on multi-core machines

    Traditionally, the Logical Processes (LPs) forming a simulation model store their execution information in disjoint simulation states, forcing them to exchange events in order to communicate data with each other. In this work we propose the design and implementation of an extension to the traditional Time Warp (optimistic) synchronization protocol for parallel/distributed simulation, targeted at shared-memory/multi-core machines, allowing LPs to share parts of their simulation states by using global variables. In order to preserve optimism's intrinsic properties, global variables are transparently mapped to multi-version ones, so as to avoid any form of safety-predicate verification upon updates. Consistency of the execution is ensured via a new rollback scheme that is triggered upon detection of an incorrect read of a global variable. At the same time, execution efficiency is guaranteed by exploiting non-blocking algorithms to manage the multi-version variable lists. Furthermore, our proposal is integrated with the simulation model's code through software instrumentation, so that the application-level programmer does not need any specific API to mark global variables or to inform the simulation kernel of updates to them; full transparency is thus supported. An assessment of our proposal, comparing it with a traditional message-passing implementation of multi-version variables, is provided as well. © 2012 IEEE
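As a rough illustration of the multi-version mapping and the read-triggered rollback described above, the single-threaded sketch below stores timestamped versions of one shared variable and reports which LPs must roll back when a straggler write invalidates an earlier read. All names are illustrative; the actual system uses non-blocking version lists and transparent instrumentation rather than an explicit API.

```cpp
// Simplified, single-threaded sketch of a multi-version global variable for
// optimistic (Time Warp) simulation; illustrative only, not the paper's code.
#include <cstdio>
#include <iterator>
#include <map>
#include <utility>
#include <vector>

struct MultiVersionVar {
    std::map<double, double> versions;            // simulation time -> value
    std::vector<std::pair<int, double>> reads;    // (reader LP, read time)

    // A read at time t returns the newest version written at or before t.
    double read(int lp, double t) {
        reads.push_back({lp, t});
        auto it = versions.upper_bound(t);
        return (it == versions.begin()) ? 0.0 : std::prev(it)->second;
    }

    // A write at time t installs a new version; any other LP that already read
    // the variable at a later time consumed a now-stale value and must be
    // rolled back (the read-triggered rollback described in the abstract).
    std::vector<int> write(int lp, double t, double value) {
        versions[t] = value;
        std::vector<int> to_roll_back;
        for (auto& [reader, rt] : reads)
            if (rt > t && reader != lp) to_roll_back.push_back(reader);
        return to_roll_back;
    }
};

int main() {
    MultiVersionVar x;
    x.write(/*lp=*/0, /*t=*/1.0, 42.0);
    double v = x.read(/*lp=*/1, /*t=*/5.0);            // LP 1 reads the t=1.0 version
    auto victims = x.write(/*lp=*/2, /*t=*/3.0, 7.0);  // straggler write before that read
    std::printf("LP 1 read %.1f; LPs to roll back: %zu\n", v, victims.size());
}
```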