Overview of Swallow --- A Scalable 480-core System for Investigating the Performance and Energy Efficiency of Many-core Applications and Operating Systems
We present Swallow, a scalable many-core architecture, with a current
configuration of 480 x 32-bit processors.
Swallow is an open-source architecture, designed from the ground up to
deliver scalable increases in usable computational power to allow
experimentation with many-core applications and the operating systems that
support them.
Scalability is enabled by the creation of a tile-able system with a
low-latency interconnect, featuring an attractive communication-to-computation
ratio and the use of a distributed memory configuration.
We analyse the energy and computational and communication performances of
Swallow. The system provides 240GIPS with each core consuming 71--193mW,
dependent on workload. Power consumption per instruction is lower than almost
all systems of comparable scale.
We also show how the use of a distributed operating system (nOS) allows the
easy creation of scalable software to exploit Swallow's potential. Finally, we
show two use case studies: modelling neurons and the overlay of shared memory
on a distributed memory system.
Comment: An open source release of the Swallow system design and code will
follow and references to these will be added at a later date
Heracles: Fully Synthesizable Parameterized MIPS-Based Multicore System
Heracles is an open-source complete multicore system written in Verilog. It is fully parameterized and can be reconfigured and synthesized into different topologies and sizes. Each processing node has a 7-stage pipeline, fully bypassed, microprocessor running the MIPS-III ISA, a 4-stage input-buffer, virtual-channel router, and a local variable-size shared memory. Our design is highly modular with clear interfaces between the core, the memory hierarchy, and the on-chip network. In the baseline design, the microprocessor is attached to two caches, one instruction cache and one data cache, which are oblivious to the global memory organization. The memory system in Heracles can be configured as one single global shared memory (SM), or distributed shared memory (DSM), or any combination thereof. Each core is connected to the rest of the network of processors by a parameterized, realistic, wormhole router. We show different topology configurations of the system, and their synthesis results on the Xilinx Virtex-5 LX330T FPGA board. We also provide a small MIPS cross-compiler toolchain to assist in developing software for Heracles
AllScale API
Effectively implementing scientific algorithms in distributed memory parallel applications is a difficult task for domain scientists, as evidenced by the large number of domain-specific languages and libraries available today that attempt to facilitate the process. However, they usually provide a closed set of parallel patterns and are not open for extension without vast modifications to the underlying system. In this work, we present the AllScale API, a programming interface for developing distributed memory parallel applications with the ease of shared memory programming models. The AllScale API is closed for modification but open for extension, allowing new user-defined parallel patterns and data structures to be implemented based on existing core primitives and therefore fully supported in the AllScale framework. Focusing on high-level functionality directly offered to application developers, we present the design advantages of such an API, detail some of its specifications and evaluate it using three real-world use cases. Our results show that AllScale decreases the complexity of implementing scientific applications for distributed memory while attaining comparable or higher performance compared to MPI reference implementations
HP-CERTI: Towards a high performance, high availability open source RTI for composable simulations (04F-SIW-014)
Composing simulations of complex systems from already existing simulation components remains a challenging issue. Motivations for composable simulation include generation of a given federation driven by operational requirements provided "on the fly". The High Level Architecture, initially developed for designing fully distributed simulations, can be considered as an interoperability standard for composing simulations from existing components. Requirements for constructing such complex simulations are quite different from those discussed for distributed simulations. Although interoperability and reusability remain essential, both high performance and availability have also to be considered to fulfill the requirements of the end user. ONERA is currently designing a High Performance / High Availability HLA Run-time Infrastructure from its open source implementation of HLA 1.3 specifications. HP-CERTI is a software package including two main components: the first one, SHM-CERTI, provides an optimized version of CERTI based on a shared memory communication scheme; the second one, Kerrighed-CERTI, allows the deployment of CERTI through the control of the Kerrighed Single System Image operating system for clusters, currently designed by IRISA. This paper describes the design of both high performance and availability Runtime Infrastructures, focusing on the architecture of SHM-CERTI. This work is carried out in the context of the COCA (High Performance Distributed Simulation and Models Reuse) Project, sponsored by the DGA/STTC (Délégation Générale pour l'Armement/Service des Stratégies Techniques et des Technologies Communes) of the French Ministry of Defense
An overview of the wcd EST clustering tool
Summary: The wcd system is an open source tool for clustering expressed sequence tags (EST) and other DNA and RNA sequences. wcd allows efficient all-versus-all comparison of ESTs using either the d² distance function or edit distance, improving existing implementations of d². It supports merging, refinement and reclustering of clusters. It is "drop in" compatible with the StackPack clustering package. wcd supports parallelization under both shared memory and cluster architectures. It is distributed with an EMBOSS wrapper allowing wcd to be installed as part of an EMBOSS installation (and so provided by a web server)
iPrivacy: a Distributed Approach to Privacy on the Cloud
The increasing adoption of Cloud storage poses a number of privacy issues.
Users wish to preserve full control over their sensitive data and cannot accept
that it is accessible by the remote storage provider. Previous research was
made on techniques to protect data stored on untrusted servers; however we
argue that the cloud architecture presents a number of open issues. To handle
them, we present an approach where confidential data is stored in a highly
distributed database, partly located on the cloud and partly on the clients.
Data is shared in a secure manner using a simple grant-and-revoke permission of
shared data and we have developed a system test implementation, using an
in-memory RDBMS with row-level data encryption for fine-grained data access
control.
Comment: 13 pages, International Journal on Advances in Security 2011 vol.4 no
3 & 4. arXiv admin note: substantial text overlap with arXiv:1012.0759,
arXiv:1109.355
High Performance Regional Ocean Modeling with GPU Acceleration
The Regional Ocean Modeling System (ROMS) is an open-source, free-surface, primitive equation ocean model used by the scientific community for a diverse range of applications [1]. ROMS employs sophisticated numerical techniques, including a split-explicit time-stepping scheme that treats the fast barotropic (2D) and slow baroclinic (3D) modes separately for improved efficiency [2]. ROMS also contains a suite of data assimilation tools that allow the user to improve the accuracy of a simulation by incorporating observational data. These tools are based on four-dimensional variational methods [3], which generate reliable results but require more computational resources than a run without data assimilation. The implementation of ROMS supports two parallel computing models: a distributed memory model that uses the Message Passing Interface (MPI), and a shared memory model that uses OpenMP. Prior research has shown that portions of ROMS can also be executed on a General Purpose Graphics Processing Unit (GPGPU) to take advantage of the massively parallel architecture available on those systems [4]. This paper presents a comparison between two forms of parallelism. NVIDIA Kepler K20X GPUs were used for performance measurement of GPU parallelism using CUDA, while an Intel Xeon E5-2650 was used for shared memory parallelism using OpenMP. The implementation is benchmarked using idealistic marine conditions. Our experiments show that OpenMP was the fastest, followed closely by CUDA, while the normal serial version was considerably slower
iPrivacy : a distributed approach to privacy on the cloud
The increasing adoption of Cloud storage poses a number of privacy issues. Users wish to preserve full control over their sensitive data and cannot accept that it is accessible by the remote storage provider. Previous research was made on techniques to protect data stored on untrusted servers; however, we argue that the cloud architecture presents a number of open issues. To handle them, we present an approach where confidential data is stored in a highly distributed database, partly located on the cloud and partly on the clients. Data is shared in a secure manner using a simple grant-and-revoke permission of shared data and we have developed a system test implementation, using an in-memory Relational Data Base Management System with row-level data encryption for fine-grained data access control