205 research outputs found

    Reconfigurable architectures for beyond 3G wireless communication systems

    Get PDF

    Efficient Algorithms for Large-Scale Image Analysis

    Get PDF
    This work develops highly efficient algorithms for analyzing large images. Applications include object-based change detection and screening. The algorithms are 10-100 times as fast as existing software, sometimes even outperforming FGPA/GPU hardware, because they are designed to suit the computer architecture. This thesis describes the implementation details and the underlying algorithm engineering methodology, so that both may also be applied to other applications

    Algorithms and architectures for MCMC acceleration in FPGAs

    Get PDF
    Markov Chain Monte Carlo (MCMC) is a family of stochastic algorithms which are used to draw random samples from arbitrary probability distributions. This task is necessary to solve a variety of problems in Bayesian modelling, e.g. prediction and model comparison, making MCMC a fundamental tool in modern statistics. Nevertheless, due to the increasing complexity of Bayesian models, the explosion in the amount of data they need to handle and the computational intensity of many MCMC algorithms, performing MCMC-based inference is often impractical in real applications. This thesis tackles this computational problem by proposing Field Programmable Gate Array (FPGA) architectures for accelerating MCMC and by designing novel MCMC algorithms and optimization methodologies which are tailored for FPGA implementation. The contributions of this work include: 1) An FPGA architecture for the Population-based MCMC algorithm, along with two modified versions of the algorithm which use custom arithmetic precision in large parts of the implementation without introducing error in the output. Mapping the two modified versions to an FPGA allows for more parallel modules to be instantiated in the same chip area. 2) An FPGA architecture for the Particle MCMC algorithm, along with a novel algorithm which combines Particle MCMC and Population-based MCMC to tackle multi-modal distributions. A proposed FPGA architecture for the new algorithm achieves higher datapath utilization than the Particle MCMC architecture. 3) A generic method to optimize the arithmetic precision of any MCMC algorithm that is implemented on FPGAs. The method selects the minimum precision among a given set of precisions, while guaranteeing a user-defined bound on the output error. By applying the above techniques to large-scale Bayesian problems, it is shown that significant speedups (one or two orders of magnitude) are possible compared to state-of-the-art MCMC algorithms implemented on CPUs and GPUs, opening the way for handling complex statistical analyses in the era of ubiquitous, ever-increasing data.Open Acces

    Domain specific high performance reconfigurable architecture for a communication platform

    Get PDF

    Custom optimization algorithms for efficient hardware implementation

    No full text
    The focus is on real-time optimal decision making with application in advanced control systems. These computationally intensive schemes, which involve the repeated solution of (convex) optimization problems within a sampling interval, require more efficient computational methods than currently available for extending their application to highly dynamical systems and setups with resource-constrained embedded computing platforms. A range of techniques are proposed to exploit synergies between digital hardware, numerical analysis and algorithm design. These techniques build on top of parameterisable hardware code generation tools that generate VHDL code describing custom computing architectures for interior-point methods and a range of first-order constrained optimization methods. Since memory limitations are often important in embedded implementations we develop a custom storage scheme for KKT matrices arising in interior-point methods for control, which reduces memory requirements significantly and prevents I/O bandwidth limitations from affecting the performance in our implementations. To take advantage of the trend towards parallel computing architectures and to exploit the special characteristics of our custom architectures we propose several high-level parallel optimal control schemes that can reduce computation time. A novel optimization formulation was devised for reducing the computational effort in solving certain problems independent of the computing platform used. In order to be able to solve optimization problems in fixed-point arithmetic, which is significantly more resource-efficient than floating-point, tailored linear algebra algorithms were developed for solving the linear systems that form the computational bottleneck in many optimization methods. These methods come with guarantees for reliable operation. We also provide finite-precision error analysis for fixed-point implementations of first-order methods that can be used to minimize the use of resources while meeting accuracy specifications. The suggested techniques are demonstrated on several practical examples, including a hardware-in-the-loop setup for optimization-based control of a large airliner.Open Acces

    An Architecture for Coexistence with Multiple Users in Frequency Hopping Cognitive Radio Networks

    Get PDF
    The radio frequency (RF) spectrum is a limited resource. Spectrum allotment disputes stem from this scarcity as many radio devices are con confined to a fixed frequency or frequency sequence. One alternative is to incorporate cognition within a configurable radio platform, therefore enabling the radio to adapt to dynamic RF spectrum environments. In this way, the radio is able to actively observe the RF spectrum, orient itself to the current RF environment, decide on a mode of operation, and act accordingly, thereby sharing the spectrum and operating in more flexible manner. This research presents a novel framework for incorporating several techniques for the purpose of adapting radio operation to the current RF spectrum environment. Specifically, this research makes six contributions to the field of cognitive radio: (1) the framework for a new hybrid hardware/software middleware architecture, (2) a framework for testing and evaluating clustering algorithms in the context of cognitive radio networks, (3) a new RF spectrum map representation technique, (4) a new RF spectrum map merging technique, (5) a new method for generating a random key-based adaptive frequency-hopping waveform, and (6) initial integration testing toward implementing the proposed system on a field-programmable gate array (FPGA)

    Particle Swarm Optimization

    Get PDF
    Particle swarm optimization (PSO) is a population based stochastic optimization technique influenced by the social behavior of bird flocking or fish schooling.PSO shares many similarities with evolutionary computation techniques such as Genetic Algorithms (GA). The system is initialized with a population of random solutions and searches for optima by updating generations. However, unlike GA, PSO has no evolution operators such as crossover and mutation. In PSO, the potential solutions, called particles, fly through the problem space by following the current optimum particles. This book represents the contributions of the top researchers in this field and will serve as a valuable tool for professionals in this interdisciplinary field

    Dynamical Systems in Spiking Neuromorphic Hardware

    Get PDF
    Dynamical systems are universal computers. They can perceive stimuli, remember, learn from feedback, plan sequences of actions, and coordinate complex behavioural responses. The Neural Engineering Framework (NEF) provides a general recipe to formulate models of such systems as coupled sets of nonlinear differential equations and compile them onto recurrently connected spiking neural networks – akin to a programming language for spiking models of computation. The Nengo software ecosystem supports the NEF and compiles such models onto neuromorphic hardware. In this thesis, we analyze the theory driving the success of the NEF, and expose several core principles underpinning its correctness, scalability, completeness, robustness, and extensibility. We also derive novel theoretical extensions to the framework that enable it to far more effectively leverage a wide variety of dynamics in digital hardware, and to exploit the device-level physics in analog hardware. At the same time, we propose a novel set of spiking algorithms that recruit an optimal nonlinear encoding of time, which we call the Delay Network (DN). Backpropagation across stacked layers of DNs dramatically outperforms stacked Long Short-Term Memory (LSTM) networks—a state-of-the-art deep recurrent architecture—in accuracy and training time, on a continuous-time memory task, and a chaotic time-series prediction benchmark. The basic component of this network is shown to function on state-of-the-art spiking neuromorphic hardware including Braindrop and Loihi. This implementation approaches the energy-efficiency of the human brain in the former case, and the precision of conventional computation in the latter case

    Ant colony optimization on runtime reconfigurable architectures

    Get PDF

    On the Analysis of Public-Key Cryptologic Algorithms

    Get PDF
    The RSA cryptosystem introduced in 1977 by Ron Rivest, Adi Shamir and Len Adleman is the most commonly deployed public-key cryptosystem. Elliptic curve cryptography (ECC) introduced in the mid 80's by Neal Koblitz and Victor Miller is becoming an increasingly popular alternative to RSA offering competitive performance due the use of smaller key sizes. Most recently hyperelliptic curve cryptography (HECC) has been demonstrated to have comparable and in some cases better performance than ECC. The security of RSA relies on the integer factorization problem whereas the security of (H)ECC is based on the (hyper)elliptic curve discrete logarithm problem ((H)ECDLP). In this thesis the practical performance of the best methods to solve these problems is analyzed and a method to generate secure ephemeral ECC parameters is presented. The best publicly known algorithm to solve the integer factorization problem is the number field sieve (NFS). Its most time consuming step is the relation collection step. We investigate the use of graphics processing units (GPUs) as accelerators for this step. In this context, methods to efficiently implement modular arithmetic and several factoring algorithms on GPUs are presented and their performance is analyzed in practice. In conclusion, it is shown that integrating state-of-the-art NFS software packages with our GPU software can lead to a speed-up of 50%. In the case of elliptic and hyperelliptic curves for cryptographic use, the best published method to solve the (H)ECDLP is the Pollard rho algorithm. This method can be made faster using classes of equivalence induced by curve automorphisms like the negation map. We present a practical analysis of their use to speed up Pollard rho for elliptic curves and genus 2 hyperelliptic curves defined over prime fields. As a case study, 4 curves at the 128-bit theoretical security level are analyzed in our software framework for Pollard rho to estimate their practical security level. In addition, we present a novel many-core architecture to solve the ECDLP using the Pollard rho algorithm with the negation map on FPGAs. This architecture is used to estimate the cost of solving the Certicom ECCp-131 challenge with a cluster of FPGAs. Our design achieves a speed-up factor of about 4 compared to the state-of-the-art. Finally, we present an efficient method to generate unique, secure and unpredictable ephemeral ECC parameters to be shared by a pair of authenticated users for a single communication. It provides an alternative to the customary use of fixed ECC parameters obtained from publicly available standards designed by untrusted third parties. The effectiveness of our method is demonstrated with a portable implementation for regular PCs and Android smartphones. On a Samsung Galaxy S4 smartphone our implementation generates unique 128-bit secure ECC parameters in 50 milliseconds on average
    • 

    corecore