425 research outputs found
Evaluating local indirect addressing in SIMD proc essors
In the design of parallel computers, there exists a tradeoff between the number and power of individual processors. The single instruction stream, multiple data stream (SIMD) model of parallel computers lies at one extreme of the resulting spectrum. The available hardware resources are devoted to creating the largest possible number of processors, and consequently each individual processor must use the fewest possible resources. Disagreement exists as to whether SIMD processors should be able to generate addresses individually into their local data memory, or all processors should access the same address. The tradeoff is examined between the increased capability and the reduced number of processors that occurs in this single instruction stream, multiple, locally addressed, data (SIMLAD) model. Factors are assembled that affect this design choice, and the SIMLAD model is compared with the bare SIMD and the MIMD models
Memory Access Patterns for Cellular Automata Using GPGPUs
Today\u27s graphical processing units have hundreds of individual processing cores that can be used for general purpose computation of mathematical and scientific problems. Due to their hardware architecture, these devices are especially effective when solving problems that exhibit a high degree of spatial locality. Cellular automata use small, local neighborhoods to determine successive states of individual elements and therefore, provide an excellent opportunity for the application of general purpose GPU computing. However, the GPU presents a challenging environment because it lacks many of the features of traditional CPUs, such as automatic, on-chip caching of data. To fully realize the potential of a GPU, specialized memory techniques and patterns must be employed to account for their unique architecture. Several techniques are presented which not only dramatically improve performance, but, in many cases, also simplify implementation. Many of the approaches discussed relate to the organization of data in memory or patterns for accessing that data, while others detail methods of increasing the computation to memory access ratio. The ideas presented are generic, and applicable to cellular automata models as a whole. Example implementations are given for several problems, including the Game of Life and Gaussian blurring, while performance characteristics, such as instruction and memory accesses counts, are analyzed and compared. A case study is detailed, showing the effectiveness of the various techniques when applied to a larger, real-world problem. Lastly, the reasoning behind each of the improvements is explained, providing general guidelines for determining when a given technique will be most and least effective
Structural Design using Cellular Automata
Traditional parallel methods for structural design do not scale well. This paper discusses the application of massively scalable cellular automata (CA) techniques to structural design. There are two sets of CA rules, one used to propagate stresses and strains, and one to perform design analysis. These rules can be applied serially,periodically,or concurrently, and Jacobi or Gauss-
Seidel style updating can be done. These options are compared with respect to convergence,speed, and stability
Designing a Novel Reversible Systolic Array Using QCA
Many efforts have been done about designing nano-based devices till today. One of these devices is Quantum Cellular Automata (QCA). Because of astonishing growth in VLSI circuits Designs in larger scales and necessity of feature size reduction, there is more need to design complicated control systems using nano-based devices. Besides, since there is a critical manner of temperature in QCA devices, complicated systems using these devices should be designed reversibly. This article has been proposed a novel architecture for QCA circuits in order to utilizing in complicated control systems based on systolic arrays with high throughput and least power dissipation
A model for Intelligent Random Access Memory architecture (IRAM): cellular automata algorithms on the Associative String Processing machine (ASTRA)
In the near future, the computer performance will be completely determined by how long it takes to access memory. There are bottle-necks in memory latency and memory-to processor interface bandwidth. The IRAM initiative could be the answer by putting Processor-In-Memory (PIM). Starting from the massively parallel processing concept, one reached a similar conclusion. The MPPC (Massively Parallel Processing Collaboration) project and the 8K processor ASTRA machine (Associative String Test bench for Research \& Applications) developed at CERN \cite{kuala} can be regarded as a forerunner of the IRAM concept. The computing power of the ASTRA machine, regarded as an IRAM with 64 one-bit processors on a 6464 bit-matrix memory chip machine, has been demonstrated by running statistical physics algorithms: one-dimensional stochastic cellular automata, as a simple model for dynamical phase transitions. As a relevant result for physics, the damage spreading of this model has been investigated
Fault tolerance issues in nanoelectronics
The astonishing success story of microelectronics cannot go on indefinitely. In fact, once
devices reach the few-atom scale (nanoelectronics), transient quantum effects are expected
to impair their behaviour. Fault tolerant techniques will then be required. The aim of this
thesis is to investigate the problem of transient errors in nanoelectronic devices. Transient
error rates for a selection of nanoelectronic gates, based upon quantum cellular automata
and single electron devices, in which the electrostatic interaction between electrons is used
to create Boolean circuits, are estimated. On the bases of such results, various fault tolerant
solutions are proposed, for both logic and memory nanochips. As for logic chips, traditional
techniques are found to be unsuitable. A new technique, in which the voting approach of
triple modular redundancy (TMR) is extended by cascading TMR units composed of
nanogate clusters, is proposed and generalised to other voting approaches. For memory
chips, an error correcting code approach is found to be suitable. Various codes are
considered and a lookup table approach is proposed for encoding and decoding. We are
then able to give estimations for the redundancy level to be provided on nanochips, so as to
make their mean time between failures acceptable. It is found that, for logic chips, space
redundancies up to a few tens are required, if mean times between failures have to be of the
order of a few years. Space redundancy can also be traded for time redundancy. As for
memory chips, mean times between failures of the order of a few years are found to imply
both space and time redundancies of the order of ten
A Search for Good Pseudo-random Number Generators : Survey and Empirical Studies
In today's world, several applications demand numbers which appear random but
are generated by a background algorithm; that is, pseudo-random numbers. Since
late century, researchers have been working on pseudo-random number
generators (PRNGs). Several PRNGs continue to develop, each one demanding to be
better than the previous ones. In this scenario, this paper targets to verify
the claim of so-called good generators and rank the existing generators based
on strong empirical tests in same platforms. To do this, the genre of PRNGs
developed so far has been explored and classified into three groups -- linear
congruential generator based, linear feedback shift register based and cellular
automata based. From each group, well-known generators have been chosen for
empirical testing. Two types of empirical testing has been done on each PRNG --
blind statistical tests with Diehard battery of tests, TestU01 library and NIST
statistical test-suite and graphical tests (lattice test and space-time diagram
test). Finally, the selected PRNGs are divided into groups and are
ranked according to their overall performance in all empirical tests
- …