396,797 research outputs found

    Overview of Swallow --- A Scalable 480-core System for Investigating the Performance and Energy Efficiency of Many-core Applications and Operating Systems

    Full text link
    We present Swallow, a scalable many-core architecture, with a current configuration of 480 x 32-bit processors. Swallow is an open-source architecture, designed from the ground up to deliver scalable increases in usable computational power to allow experimentation with many-core applications and the operating systems that support them. Scalability is enabled by the creation of a tile-able system with a low-latency interconnect, featuring an attractive communication-to-computation ratio and the use of a distributed memory configuration. We analyse the energy and computational and communication performances of Swallow. The system provides 240GIPS with each core consuming 71--193mW, dependent on workload. Power consumption per instruction is lower than almost all systems of comparable scale. We also show how the use of a distributed operating system (nOS) allows the easy creation of scalable software to exploit Swallow's potential. Finally, we show two use case studies: modelling neurons and the overlay of shared memory on a distributed memory system.Comment: An open source release of the Swallow system design and code will follow and references to these will be added at a later dat

    Heracles: Fully Synthesizable Parameterized MIPS-Based Multicore System

    Get PDF
    Heracles is an open-source complete multicore system written in Verilog. It is fully parameterized and can be reconfigured and synthesized into different topologies and sizes. Each processing node has a 7-stage pipeline, fully bypassed, microprocessor running the MIPS-III ISA, a 4-stage input-buffer, virtual-channel router, and a local variable-size shared memory. Our design is highly modular with clear interfaces between the core, the memory hierarchy, and the on-chip network. In the baseline design, the microprocessor is attached to two caches, one instruction cache and one data cache, which are oblivious to the global memory organization. The memory system in Heracles can be configured as one single global shared memory (SM), or distributed shared memory (DSM), or any combination thereof. Each core is connected to the rest of the network of processors by a parameterized, realistic, wormhole router. We show different topology configurations of the system, and their synthesis results on the Xilinx Virtex-5 LX330T FPGA board. We also provide a small MIPS cross-compiler toolchain to assist in developing software for Heracles

    AllScale API

    Get PDF
    Effectively implementing scientific algorithms in distributed memory parallel applications is a difficult task for domain scientists, as evident by the large number of domain-specific languages and libraries available today attempting to facilitate the process. However, they usually provide a closed set of parallel patterns and are not open for extension without vast modifications to the underlying system. In this work, we present the AllScale API, a programming interface for developing distributed memory parallel applications with the ease of shared memory programming models. The AllScale API is closed for a modification but open for an extension, allowing new user-defined parallel patterns and data structures to be implemented based on existing core primitives and therefore fully supported in the AllScale framework. Focusing on high-level functionality directly offered to application developers, we present the design advantages of such an API design, detail some of its specifications and evaluate it using three real-world use cases. Our results show that AllScale decreases the complexity of implementing scientific applications for distributed memory while attaining comparable or higher performance compared to MPI reference implementations

    HP-CERTI: Towards a high performance, high availability open source RTI for composable simulations (04F-SIW-014)

    Get PDF
    Composing simulations of complex systems from already existing simulation components remains a challenging issue. Motivations for composable simulation include generation of a given federation driven by operational requirements provided "on the fly". The High Level Architecture, initially developed for designing fully distributed simulations, can be considered as an interoperability standard for composing simulations from existing components. Requirements for constructing such complex simulations are quite different from those discussed for distributed simulations. Although interoperability and reusability remain essential, both high performance and availability have also to be considered to fulfill the requirements of the end user. ONERA is currently designing a High Performance / High Availability HLA Run-time Infrastructure from its open source implementation of HLA 1.3 specifications. HP-CERTI is a software package including two main components: the first one, SHM-CERTI, provides an optimized version of CERTI based on a shared memory communication scheme; the second one, Kerrighed-CERTI, allows the deployment of CERTI through the control of the Kerrighed Single System Image operating system for clusters, currently designed by IRISA. This paper describes the design of both high performance and availability Runtime Infrastructures, focusing on the architecture of SHM-CERTI. This work is carried out in the context of the COCA (High Performance Distributed Simulation and Models Reuse) Project, sponsored by the DGA/STTC (Délégation Générale pour l'Armement/Service des Stratégies Techniques et des Technologies Communes) of the French Ministry of Defense

    An overview of the wcd EST clustering tool

    Get PDF
    Summary: The wcd system is an open source tool for clustering expressed sequence tags (EST) and other DNA and RNA sequences. wcd allows efficient all-versus-all comparison of ESTs using either the d 2 distance function or edit distance, improving existing implementations of d 2. It supports merging, refinement and reclustering of clusters. It is ‘drop in’ compatible with the StackPack clustering package. wcd supports parallelization under both shared memory and cluster architectures. It is distributed with an EMBOSS wrapper allowing wcd to be installed as part of an EMBOSS installation (and so provided by a web server)

    iPrivacy: a Distributed Approach to Privacy on the Cloud

    Full text link
    The increasing adoption of Cloud storage poses a number of privacy issues. Users wish to preserve full control over their sensitive data and cannot accept that it to be accessible by the remote storage provider. Previous research was made on techniques to protect data stored on untrusted servers; however we argue that the cloud architecture presents a number of open issues. To handle them, we present an approach where confidential data is stored in a highly distributed database, partly located on the cloud and partly on the clients. Data is shared in a secure manner using a simple grant-and-revoke permission of shared data and we have developed a system test implementation, using an in-memory RDBMS with row-level data encryption for fine-grained data access controlComment: 13 pages, International Journal on Advances in Security 2011 vol.4 no 3 & 4. arXiv admin note: substantial text overlap with arXiv:1012.0759, arXiv:1109.355

    High Performance Regional Ocean Modeling with GPU Acceleration

    Get PDF
    The Regional Ocean Modeling System (ROMS) is an open-source, free-surface, primitive equation ocean model used by the scientific community for a diverse range of applications [1]. ROMS employs sophisticated numerical techniques, including a split-explicit time-stepping scheme that treats the fast barotropic (2D) and slow baroclinic (3D) modes separately for improved efficiency [2]. ROMS also contains a suite of data assimilation tools that allow the user to improve the accuracy of a simulation by incorporating observational data. These tools are based on four dimensional variational methods [3], which generate reliable results, but require more computational resources than without any assimilation of data. The implementation of ROMS supports two parallel computing models; a distributed memory model that utilizes Message Passing Interface (MPI), and a shared memory model that utilizes OpenMP. Prior research has shown that portions of ROMS can also be executed on a General Purpose Graphics Processing Unit (GPGPU) to take advantage of the massively parallel architecture available on those systems [4]. This paper presents a comparison between two forms of parallelism. NVIDIA Kepler K20X GPUs were used for performance measurement of GPU parallelism using CUDA while an Intel Xeon E5-2650 was used for shared memory parallelism using OpenMP. The implementation is benchmarked using idealistic marine conditions. Our experiments show that OpenMP was the fastest, followed closely by CUDA, while the normal serial version was considerably slower

    iPrivacy : a distributed approach to privacy on the cloud

    Get PDF
    The increasing adoption of Cloud storage poses a number of privacy issues. Users wish to preserve full control over their sensitive data and cannot accept that it is accessible by the remote storage provider. Previous research was made on techniques to protect data stored on untrusted servers; however we argue that the cloud architecture presents a number of open issues. To handle them, we present an approach where confidential data is stored in a highly distributed database, partly located on the cloud and partly on the clients. Data is shared in a secure manner using a simple grant-and-revoke permission of shared data and we have developed a system test implementation, using an in memory Relational Data Base Management System with row-level data encryption for fine-grained data access control
    • 

    corecore