2,914 research outputs found
Finding Morton-Like Layouts for Multi-Dimensional Arrays Using Evolutionary Algorithms
The layout of multi-dimensional data can have a significant impact on the
efficacy of hardware caches and, by extension, the performance of applications.
Common multi-dimensional layouts include the canonical row-major and
column-major layouts as well as the Morton curve layout. In this paper, we
describe how the Morton layout can be generalized to a very large family of
multi-dimensional data layouts with widely varying performance characteristics.
We posit that this design space can be efficiently explored using a
combinatorial evolutionary methodology based on genetic algorithms. To this
end, we propose a chromosomal representation for such layouts as well as a
methodology for estimating the fitness of array layouts using cache simulation.
We show that our fitness function correlates to kernel running time in real
hardware, and that our evolutionary strategy allows us to find candidates with
favorable simulated cache properties in four out of the eight real-world
applications under consideration in a small number of generations. Finally, we
demonstrate that the array layouts found using our evolutionary method perform
well not only in simulated environments but that they can effect significant
performance gains -- up to a factor ten in extreme cases -- in real hardware
Task-based adaptive multiresolution for time-space multi-scale reaction-diffusion systems on multi-core architectures
A new solver featuring time-space adaptation and error control has been
recently introduced to tackle the numerical solution of stiff
reaction-diffusion systems. Based on operator splitting, finite volume adaptive
multiresolution and high order time integrators with specific stability
properties for each operator, this strategy yields high computational
efficiency for large multidimensional computations on standard architectures
such as powerful workstations. However, the data structure of the original
implementation, based on trees of pointers, provides limited opportunities for
efficiency enhancements, while posing serious challenges in terms of parallel
programming and load balancing. The present contribution proposes a new
implementation of the whole set of numerical methods including Radau5 and
ROCK4, relying on a fully different data structure together with the use of a
specific library, TBB, for shared-memory, task-based parallelism with
work-stealing. The performance of our implementation is assessed in a series of
test-cases of increasing difficulty in two and three dimensions on multi-core
and many-core architectures, demonstrating high scalability
Custom Integrated Circuits
Contains reports on six research projects.U.S. Air Force - Office of Scientific Research (Contract F49620-84-C-0004)Analog Devices, Inc.Defense Advanced Research Projects Agency (Contract N00014-80-C-0622)National Science Foundation (Grant ECS83-10941
A Structured Design Methodology for High Performance VLSI Arrays
abstract: The geometric growth in the integrated circuit technology due to transistor scaling also with system-on-chip design strategy, the complexity of the integrated circuit has increased manifold. Short time to market with high reliability and performance is one of the most competitive challenges. Both custom and ASIC design methodologies have evolved over the time to cope with this but the high manual labor in custom and statistic design in ASIC are still causes of concern. This work proposes a new circuit design strategy that focuses mostly on arrayed structures like TLB, RF, Cache, IPCAM etc. that reduces the manual effort to a great extent and also makes the design regular, repetitive still achieving high performance. The method proposes making the complete design custom schematic but using the standard cells. This requires adding some custom cells to the already exhaustive library to optimize the design for performance. Once schematic is finalized, the designer places these standard cells in a spreadsheet, placing closely the cells in the critical paths. A Perl script then generates Cadence Encounter compatible placement file. The design is then routed in Encounter. Since designer is the best judge of the circuit architecture, placement by the designer will allow achieve most optimal design. Several designs like IPCAM, issue logic, TLB, RF and Cache designs were carried out and the performance were compared against the fully custom and ASIC flow. The TLB, RF and Cache were the part of the HEMES microprocessor.Dissertation/ThesisPh.D. Electrical Engineering 201
Custom Integrated Circuits
Contains reports on nine research projects.Analog Devices, Inc.International Business Machines, Inc.Joint Services Electronics Program (Contract DAALO03-86-K-0002)U.S. Air Force - Office of Scientific Research (Grant AFOSR 86-0164)Rockwell International CorporationOKI SemiconductorU.S. Navy - Office of Naval Research (Contract N00014-81-K-0742)Charles Stark Draper LaboratoryDARPA/U.S. Navy - Office of Naval Research (Contract N00014-80-C-0622)DARPA/U.S. Navy - Office of Naval Research (Contract N00014-87-K-0825)National Science Foundation (Grant ECS-83-10941)AT&T Bell Laboratorie
- …