25,266 research outputs found
Overview of Swallow --- A Scalable 480-core System for Investigating the Performance and Energy Efficiency of Many-core Applications and Operating Systems
We present Swallow, a scalable many-core architecture, with a current
configuration of 480 x 32-bit processors.
Swallow is an open-source architecture, designed from the ground up to
deliver scalable increases in usable computational power to allow
experimentation with many-core applications and the operating systems that
support them.
Scalability is enabled by the creation of a tile-able system with a
low-latency interconnect, featuring an attractive communication-to-computation
ratio and the use of a distributed memory configuration.
We analyse the energy and computational and communication performances of
Swallow. The system provides 240GIPS with each core consuming 71--193mW,
dependent on workload. Power consumption per instruction is lower than almost
all systems of comparable scale.
We also show how the use of a distributed operating system (nOS) allows the
easy creation of scalable software to exploit Swallow's potential. Finally, we
show two use case studies: modelling neurons and the overlay of shared memory
on a distributed memory system.Comment: An open source release of the Swallow system design and code will
follow and references to these will be added at a later dat
Memory Hierarchy Hardware-Software Co-design in Embedded Systems
The memory hierarchy is the main bottleneck in modern computer systems as the gap between the speed of the processor and the memory continues to grow larger. The situation in embedded systems is even worse. The memory hierarchy consumes a large amount of chip area and energy, which are precious resources in embedded systems. Moreover, embedded systems have multiple design objectives such as performance, energy consumption, and area, etc.
Customizing the memory hierarchy for specific applications is a very important way to take full advantage of limited resources to maximize the performance. However, the traditional custom memory hierarchy design methodologies are phase-ordered. They separate the application optimization from the memory hierarchy architecture design, which tend to result in local-optimal solutions. In traditional Hardware-Software co-design methodologies, much of the work has focused on utilizing reconfigurable logic to partition the computation. However, utilizing reconfigurable logic to perform the memory hierarchy design is seldom addressed.
In this paper, we propose a new framework for designing memory hierarchy for embedded systems. The framework will take advantage of the flexible reconfigurable logic to customize the memory hierarchy for specific applications. It combines the application optimization and memory hierarchy design together to obtain a global-optimal solution. Using the framework, we performed a case study to design a new software-controlled instruction memory that showed promising potential.Singapore-MIT Alliance (SMA
A Survey of Techniques for Architecting TLBs
“Translation lookaside buffer” (TLB) caches virtual to physical address translation information and is used
in systems ranging from embedded devices to high-end servers. Since TLB is accessed very frequently
and a TLB miss is extremely costly, prudent management of TLB is important for improving performance
and energy efficiency of processors. In this paper, we present a survey of techniques for architecting and
managing TLBs. We characterize the techniques across several dimensions to highlight their similarities and
distinctions. We believe that this paper will be useful for chip designers, computer architects and system
engineers
The future of computing beyond Moore's Law.
Moore's Law is a techno-economic model that has enabled the information technology industry to double the performance and functionality of digital electronics roughly every 2 years within a fixed cost, power and area. Advances in silicon lithography have enabled this exponential miniaturization of electronics, but, as transistors reach atomic scale and fabrication costs continue to rise, the classical technological driver that has underpinned Moore's Law for 50 years is failing and is anticipated to flatten by 2025. This article provides an updated view of what a post-exascale system will look like and the challenges ahead, based on our most recent understanding of technology roadmaps. It also discusses the tapering of historical improvements, and how it affects options available to continue scaling of successors to the first exascale machine. Lastly, this article covers the many different opportunities and strategies available to continue computing performance improvements in the absence of historical technology drivers. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'
Programming the Navier-Stokes computer: An abstract machine model and a visual editor
The Navier-Stokes computer is a parallel computer designed to solve Computational Fluid Dynamics problems. Each processor contains several floating point units which can be configured under program control to implement a vector pipeline with several inputs and outputs. Since the development of an effective compiler for this computer appears to be very difficult, machine level programming seems necessary and support tools for this process have been studied. These support tools are organized into a graphical program editor. A programming process is described by which appropriate computations may be efficiently implemented on the Navier-Stokes computer. The graphical editor would support this programming process, verifying various programmer choices for correctness and deducing values such as pipeline delays and network configurations. Step by step details are provided and demonstrated with two example programs
The C Object System: Using C as a High-Level Object-Oriented Language
The C Object System (Cos) is a small C library which implements high-level
concepts available in Clos, Objc and other object-oriented programming
languages: uniform object model (class, meta-class and property-metaclass),
generic functions, multi-methods, delegation, properties, exceptions, contracts
and closures. Cos relies on the programmable capabilities of the C programming
language to extend its syntax and to implement the aforementioned concepts as
first-class objects. Cos aims at satisfying several general principles like
simplicity, extensibility, reusability, efficiency and portability which are
rarely met in a single programming language. Its design is tuned to provide
efficient and portable implementation of message multi-dispatch and message
multi-forwarding which are the heart of code extensibility and reusability.
With COS features in hand, software should become as flexible and extensible as
with scripting languages and as efficient and portable as expected with C
programming. Likewise, Cos concepts should significantly simplify adaptive and
aspect-oriented programming as well as distributed and service-oriented
computingComment: 18
- …