6,496 research outputs found
Discontinuities in recurrent neural networks
This paper studies the computational power of various discontinuous
real computational models that are based on the classical analog
recurrent neural network (ARNN). This ARNN consists of finite number
of neurons; each neuron computes a polynomial net-function and a
sigmoid-like continuous activation-function.
The authors introduce ...
Postprint (published version)
An O(N squared) method for computing the eigensystem of N by N symmetric tridiagonal matrices by the divide and conquer approach
An efficient method is proposed to solve the eigenproblem of N by N symmetric tridiagonal (ST) matrices. Unlike standard eigensolvers, which require O(N cubed) operations to compute the eigenvectors of such ST matrices, the proposed method computes both the eigenvalues and the eigenvectors with only O(N squared) operations. The method is based on a serial implementation of the recently introduced divide and conquer (DC) algorithm. It exploits the fact that, with O(N squared) DC operations, one can compute the eigenvalues of an N by N ST matrix and a finite number of pairs of successive rows of its eigenvector matrix. The rest of the eigenvectors, all of them or one at a time, are computed by linear three-term recurrence relations. Numerical examples are presented which demonstrate the superiority of the proposed method: it saves an order of magnitude in execution time at the expense of a few orders of accuracy.
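The three-term recurrence mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' method: the paper starts from pairs of successive eigenvector rows produced by the DC step, whereas this sketch assumes the eigenvalue is already known, starts from an arbitrary first component of 1, and assumes all off-diagonal entries are nonzero. The function name and arguments are hypothetical.

```python
def eigenvector_by_recurrence(diag, off, lam):
    """Eigenvector of a symmetric tridiagonal matrix for a known eigenvalue.

    diag: the N diagonal entries; off: the N-1 off-diagonal entries
    (assumed nonzero); lam: an eigenvalue, assumed to be already computed.
    Row i of (T - lam*I) v = 0 gives
        off[i-1]*v[i-1] + (diag[i] - lam)*v[i] + off[i]*v[i+1] = 0,
    which is solved for v[i+1] one row at a time.
    """
    n = len(diag)
    v = [1.0]  # arbitrary starting value; the vector is normalised at the end
    if n > 1:
        v.append((lam - diag[0]) * v[0] / off[0])
    for i in range(1, n - 1):
        v.append(((lam - diag[i]) * v[i] - off[i - 1] * v[i - 1]) / off[i])
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]
```

Each eigenvector costs O(N) operations this way, so all N eigenvectors cost O(N squared). The recurrence can amplify rounding errors, which is consistent with the loss of accuracy the abstract reports.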
Object-oriented domain specific compilers for programming FPGAs
Published version
Customisable arithmetic hardware designs
Imperial Users only
Periodic orbits of the ensemble of Sinai-Arnold cat maps and pseudorandom number generation
We propose methods for constructing high-quality pseudorandom number
generators (RNGs) based on an ensemble of hyperbolic automorphisms of the unit
two-dimensional torus (Sinai-Arnold map or cat map) while keeping a part of the
information hidden. The single cat map provides the random properties expected
from a good RNG and is hence an appropriate building block for an RNG, although
unnecessary correlations are always present in practice. We show that
introducing hidden variables and introducing rotation in the RNG output,
accompanied with the proper initialization, dramatically suppress these
correlations. We analyze the mechanisms of the single-cat-map correlations
analytically and show how to diminish them. We generalize the Percival-Vivaldi
theory in the case of the ensemble of maps, find the period of the proposed RNG
analytically, and also analyze its properties. We present efficient practical
realizations for the RNGs and check our predictions numerically. We also test
our RNGs using the known stringent batteries of statistical tests and find that
the statistical properties of our best generators are not worse than those of
other best modern generators. Comment: 18 pages, 3 figures, 9 tables
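As a rough illustration of the building block named in the abstract above, the following Python sketch iterates a single cat map on an integer lattice and emits one bit per step from the x coordinate while keeping y hidden. It is only a toy under stated assumptions: the matrix ((2, 1), (1, 1)), the lattice size m, and the bit-extraction rule are choices made here, and the ensemble of maps, the output rotation, and the initialization proposed by the authors are not reproduced.

```python
def cat_map_step(x, y, m):
    """One step of the cat map (x, y) -> (2x + y, x + y) on an m-by-m lattice."""
    return (2 * x + y) % m, (x + y) % m


def orbit_period(x0, y0, m):
    """Number of steps until the orbit returns to its starting point.

    The map has determinant 1, so it is invertible modulo m and every
    orbit is periodic; the loop therefore terminates.
    """
    x, y = cat_map_step(x0, y0, m)
    steps = 1
    while (x, y) != (x0, y0):
        x, y = cat_map_step(x, y, m)
        steps += 1
    return steps


def bit_stream(x, y, m, count):
    """Emit one bit per step: 1 if x lies in the upper half of the lattice.

    Only x contributes to the output; y stays hidden (an assumption of
    this sketch, not the authors' scheme).
    """
    bits = []
    for _ in range(count):
        x, y = cat_map_step(x, y, m)
        bits.append(1 if 2 * x >= m else 0)
    return bits
```

Calling `orbit_period(1, 0, 5)` walks the orbit of the point (1, 0) on a 5-by-5 lattice; for realistic lattice sizes the period would be derived analytically, as the abstract describes, rather than by enumeration.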
Parametrizable Architecture for Function Recursive Evaluation
Paper submitted to the XVIII Conference on Design of Circuits and Integrated Systems (DCIS), Ciudad Real, Spain, 2003. This paper presents a function evaluation method based on a recursive expression of function convolution. The approach relies on a single parametrizable formula that provides function points by successive iteration. At the design level, it also proves suitable for developing architectural schemes that handle different speed and precision requirements. An architecture for reconfigurable FPGAs, based on serial distributed arithmetic, implements the design for fast prototyping. The case of combined trigonometric functions involved in rotation is analyzed within this framework. Compared with other methods, our proposal offers a good balance between speed and precision.
Processor evaluation for low power frequency converter product family
The aim of this thesis is to study processors for use in low power frequency converters. The processors under investigation must be on the market now or reach it in the near future. The purpose is to examine the suitability of a processor for an application in which price is an essential factor. Based on the requirements presented in this study, the most attractive processors were selected for closer review. After this review, the vendors were asked to provide, for testing, the processors that matched the technical requirements as closely as possible.
Testing was eventually carried out on five different processors, two of which were based on the same core. The aim of the testing was to investigate the suitability of the processors for their target task. Suitability was tested by executing test code that models a frequency converter application. The results give the time and the clock cycles spent in certain subroutines. In addition to performance, the testing covered the size of the code produced by each processor-specific compiler. The subroutines contained both arithmetic and logic operations, and this combination was used to assess the suitability of each processor as thoroughly as possible.
Implementation and Evaluation of Algorithmic Skeletons: Parallelisation of Computer Algebra Algorithms
This thesis presents design and implementation approaches for the parallel algorithms of computer algebra. We use algorithmic skeletons as well as further approaches, such as data parallel arithmetic and actors. We have implemented skeletons for divide and conquer algorithms and for some special parallel loops that we call "repeated computation with a possibility of premature termination". We introduce in this thesis a rational data parallel arithmetic. We focus on parallel symbolic computation algorithms; for these algorithms our arithmetic provides a generic parallelisation approach.
The implementation is carried out in Eden, a parallel functional programming language based on Haskell. This choice enables us to encode both the skeletons and the programs in the same language. Moreover, it allows us to refrain from using two different languages (one for the implementation and one for the interface) for our implementation of computer algebra algorithms.
Further, this thesis presents methods for evaluation and estimation of parallel execution times. We partition the parallel execution time into two components. One of them accounts for the quality of the parallelisation; we call it the "parallel penalty". The other is the sequential execution time. For the estimation, we predict both components separately, using statistical methods. This enables very confident estimations while using drastically fewer measurement points than other methods. We have applied both our evaluation and estimation approaches to the parallel programs presented in this thesis. We have also used existing estimation methods.
We developed divide and conquer skeletons for the implementation of fast parallel multiplication. We have implemented the Karatsuba algorithm, Strassen's matrix multiplication algorithm and the fast Fourier transform. The latter was used to implement polynomial convolution, which leads to a further fast multiplication algorithm. Specifically for our implementation of the Strassen algorithm, we have designed and implemented a divide and conquer skeleton based on actors. For the parallel fast Fourier transform, we not only used new divide and conquer skeletons but also developed a map-and-transpose skeleton, which enables good parallelisation of the Fourier transform. The parallelisation of Karatsuba multiplication shows very good performance. We have analysed the parallel penalty of our programs and compared it to the serial fraction, an approach known from the literature. We also performed execution time estimations of our divide and conquer programs.
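The divide and conquer structure of Karatsuba multiplication can be sketched sequentially in Python. This is not the thesis implementation, which is written in Eden with parallel skeletons; the base-case threshold of 16 and the split on bit length are assumptions made here for illustration, and the sketch handles non-negative integers only.

```python
def karatsuba(x, y):
    """Multiply two non-negative integers with Karatsuba's three-product scheme."""
    if x < 16 or y < 16:  # small operands: fall back to built-in multiplication
        return x * y
    # Split both operands at the same bit position.
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    # Three recursive products instead of four.
    low = karatsuba(x_lo, y_lo)
    high = karatsuba(x_hi, y_hi)
    mid = karatsuba(x_lo + x_hi, y_lo + y_hi) - low - high
    return (high << (2 * half)) + (mid << half) + low
```

The three recursive calls do not depend on one another, which is what makes the algorithm a natural fit for a divide and conquer skeleton that evaluates sub-problems in parallel.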
This thesis presents a parallel map+reduce skeleton scheme. It allows us to combine the usual parallel map skeletons, such as parMap, farm and workpool, with a premature termination property. We use this to implement the so-called "parallel repeated computation", a special form of a speculative parallel loop. We have implemented two probabilistic primality tests: the Rabin-Miller test and the Jacobi sum test. We parallelised both with our approach. We analysed the task distribution and stated the fitting configurations of the Jacobi sum test. We have shown formally that the Jacobi sum test can be implemented in parallel. Subsequently, we parallelised it, analysed the load balancing issues, and produced an optimisation. The latter enabled a good implementation, as verified using the parallel penalty. We have also estimated the performance of the tests for further input sizes and numbers of processing elements. The parallelisation of the Jacobi sum test and our generic parallelisation scheme for the repeated computation are our original contributions.
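The idea of a repeated computation with premature termination can be illustrated with a sequential Python sketch of the Rabin-Miller test: each round is independent, and the loop stops at the first round that proves compositeness. This is not the Eden skeleton from the thesis. The function names are hypothetical, and the fixed prime bases (2, 3, 5, 7) are an assumption made here to keep the example deterministic; the probabilistic test draws its bases at random.

```python
def is_witness(a, n):
    """Return True if base a proves that the odd number n > 2 is composite."""
    d, s = n - 1, 0
    while d % 2 == 0:  # write n - 1 as d * 2**s with d odd
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return False
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:
            return False
    return True


def is_probable_prime(n, bases=(2, 3, 5, 7)):
    """Rabin-Miller test with fixed prime bases (an assumption of this sketch)."""
    if n < 2:
        return False
    for p in bases:  # trial division by the bases; also guarantees n is odd below
        if n % p == 0:
            return n == p
    # any() short-circuits: the repeated computation ends at the first witness.
    return not any(is_witness(a, n) for a in bases)
```

In a parallel setting the rounds would be distributed over processing elements and the remaining rounds cancelled once one of them reports a witness, which is the speculative behaviour the abstract describes.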
The data parallel arithmetic was defined not only for integers, which is already known, but also for rationals. We handled the common factors of the numerator or denominator of the fraction with the modulus in a novel manner. This is required to obtain a true multiple-residue arithmetic, a novel result of our research. Using these mathematical advances, we have parallelised the determinant computation using Gaussian elimination. As always, we have performed task distribution analysis and estimation of the parallel execution time of our implementation. A similar computation in Maple emphasised the potential of our approach. Data parallel arithmetic enables parallelisation of entire classes of computer algebra algorithms.
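The already known integer case of multiple-residue arithmetic can be sketched in Python as follows. The sketch covers non-negative integers only and does not attempt the rational extension or the handling of common factors that the thesis contributes; the function names and the example moduli are assumptions made here. It needs Python 3.8 or later for `math.prod` and for `pow` with a negative exponent.

```python
from math import prod


def to_residues(x, moduli):
    """Represent x by its residues modulo pairwise coprime moduli."""
    return [x % m for m in moduli]


def multiply_residues(a, b, moduli):
    """Multiply component-wise; each component is independent of the others."""
    return [(ra * rb) % m for ra, rb, m in zip(a, b, moduli)]


def from_residues(residues, moduli):
    """Reconstruct the value modulo the product of the moduli (Chinese remainder theorem)."""
    total_modulus = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = total_modulus // m
        total += r * partial * pow(partial, -1, m)  # modular inverse of partial mod m
    return total % total_modulus
```

For example, with the moduli 7, 11 and 13 the product of 23 and 17 can be computed component by component and reconstructed, because the result is smaller than 7 * 11 * 13 = 1001. Since the components never interact until reconstruction, each one can be assigned to a different processing element.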
Summarising, this thesis presents and thoroughly evaluates new and existing design decisions for high-level parallelisations of computer algebra algorithms.