131 research outputs found
Massively parallel split-step Fourier techniques for simulating quantum systems on graphics processing units
The split-step Fourier method is a powerful technique for solving partial differential equations and simulating ultracold atomic systems of various forms. In this body of work, we focus on several variations of this method to allow for simulations of one-, two-, and three-dimensional quantum systems, along with several notable methods for controlling these systems. In particular, we use quantum optimal control and shortcuts to adiabaticity to study the non-adiabatic generation of superposition states in strongly correlated one-dimensional systems, analyze chaotic vortex trajectories in two dimensions by using rotation and phase imprinting methods, and create stable, three-dimensional vortex structures in Bose–Einstein condensates through artificial magnetic fields generated by the evanescent field of an optical nanofiber. We also discuss algorithmic optimizations for implementing the split-step Fourier method on graphics processing units. All computational methods presented in this work are demonstrated on physical systems and have been incorporated into a state-of-the-art, open-source software suite known as GPUE, which is currently the fastest quantum simulator of its kind.
Okinawa Institute of Science and Technology Graduate University
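As a rough illustration of the split-step Fourier idea named in this abstract, the following minimal sketch evolves a one-dimensional wavefunction under the Schrödinger equation using Strang splitting (half potential step, full kinetic step in Fourier space, half potential step). It is not taken from GPUE: the function names, the units (hbar = m = 1), the harmonic test potential, and the naive O(N^2) DFT used in place of an FFT library are all assumptions made to keep the example self-contained.

```python
import cmath
import math

def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT library)."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        total = 0j
        for j, v in enumerate(values):
            total += v * cmath.exp(sign * 2j * math.pi * j * k / n)
        out.append(total / n if inverse else total)
    return out

def split_step(psi, potential, dx, dt, steps):
    """Evolve psi under i dpsi/dt = [-(1/2) d^2/dx^2 + V(x)] psi (hbar = m = 1)."""
    n = len(psi)
    # Angular wavenumbers in DFT ordering: non-negative first, then negative.
    k = [2.0 * math.pi * (j if j < n // 2 else j - n) / (n * dx) for j in range(n)]
    half_potential = [cmath.exp(-0.5j * v * dt) for v in potential]
    kinetic = [cmath.exp(-0.5j * kj * kj * dt) for kj in k]
    for _ in range(steps):
        psi = [h * p for h, p in zip(half_potential, psi)]     # half step in V
        psi_k = [e * p for e, p in zip(kinetic, dft(psi))]     # full step in T
        psi = dft(psi_k, inverse=True)
        psi = [h * p for h, p in zip(half_potential, psi)]     # half step in V
    return psi

def norm(psi, dx):
    """Discrete approximation of the integral of |psi|^2."""
    return sum(abs(p) ** 2 for p in psi) * dx

# Hypothetical test case: harmonic-oscillator ground state, which should stay stationary.
n, dx = 64, 0.25
x = [(j - n // 2) * dx for j in range(n)]
potential = [0.5 * xi * xi for xi in x]
psi0 = [math.pi ** -0.25 * math.exp(-0.5 * xi * xi) for xi in x]
psi = split_step(psi0, potential, dx, dt=0.01, steps=50)
print(norm(psi0, dx), norm(psi, dx))
```

Every factor applied in the loop has unit modulus and the transform pair is exact, so the norm is preserved up to rounding; a production code would replace `dft` with an FFT, which is where GPU acceleration pays off.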
A Comprehensive Survey on Particle Swarm Optimization Algorithm and Its Applications
Particle swarm optimization (PSO) is a heuristic global optimization method, originally proposed by Kennedy and Eberhart in 1995. It is now one of the most commonly used optimization techniques. This survey presents a comprehensive investigation of PSO. On one hand, we review advances in PSO, including its modifications (quantum-behaved PSO, bare-bones PSO, chaotic PSO, and fuzzy PSO), population topology (such as fully connected, von Neumann, ring, star, and random), hybridization (with genetic algorithm, simulated annealing, Tabu search, artificial immune system, ant colony algorithm, artificial bee colony, differential evolution, harmony search, and biogeography-based optimization), extensions (to multiobjective, constrained, discrete, and binary optimization), theoretical analysis (parameter selection and tuning, and convergence analysis), and parallel implementation (in multicore, multiprocessor, GPU, and cloud computing forms). On the other hand, we survey applications of PSO in the following eight fields: electrical and electronic engineering, automation control systems, communication theory, operations research, mechanical engineering, fuel and energy, medicine, chemistry, and biology. It is hoped that this survey will be beneficial for researchers studying PSO algorithms.
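For readers unfamiliar with the method surveyed above, here is a minimal sketch of one common PSO variant: global-best PSO with an inertia weight and a fully connected topology. The parameter values (`w`, `c1`, `c2`), the function names, and the sphere test function are illustrative assumptions, not recommendations drawn from the survey.

```python
import random

def pso(objective, dim, bounds, n_particles=30, iterations=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO: every particle is attracted to its own best position
    and to the best position found by the whole swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia term + cognitive pull + social pull.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp the position to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(p):
    """Simple convex test function with its minimum of 0 at the origin."""
    return sum(c * c for c in p)

best, best_val = pso(sphere, dim=2, bounds=(-5.0, 5.0))
print(best, best_val)
```

Swapping the single `gbest` for a per-neighbourhood best would give the ring or von Neumann topologies mentioned in the abstract.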
Affordable kilo-instruction processors
Several factors explain the slowdown of high-performance single-thread processor development. On the one hand, aggressive techniques such as superpipelining or out-of-order execution have a considerable impact on power consumption and design complexity. On the other hand, the increase in processor frequencies has led to a large disparity between processor speed and memory access time.
Although cache memories considerably reduce the number of accesses to main memory, the remaining accesses introduce latencies large enough to considerably decrease performance. Conventional techniques such as out-of-order execution, while effective in hiding L2 cache accesses, cannot hide latencies this large. Queues of hundreds of entries and thousands of registers would be necessary to prevent execution from stalling in the event of an L2 cache miss. Unfortunately, current technology cannot efficiently implement such structures monolithically, as access latency, power consumption, and area would all increase considerably. In this thesis we studied techniques that allow the processor to continue processing instructions in the event of main memory accesses. The conditions for such a processor to be implementable are that it should be based on structures of conventional size and that it should feature simple control logic. The challenge lies in designing a distributed processor with simple control. The design of this processor has been approached by analyzing the behavior of a processor with infinite resources. We have observed that execution follows a very interesting pattern based on execution locality. In numerical codes we observed that over 70% of all instructions do not depend on main memory accesses. This is important because it shows that there is always a large portion of instructions that can be executed shortly after decode. This allows us to propose a new kind of processor with two execution units. The first unit, the Cache Processor, processes memory-independent instructions at high speed. The second unit, the Memory Processor, processes instructions that depend on main memory accesses, but using relaxed scheduling logic, which allows it to scale to thousands of in-flight instructions. This proposal, named the Decoupled KILO-Instruction Processor (D-KIP), has several advantages.
On the one hand, it allows the construction of a kilo-instruction processor based on conventional structures and, on the other hand, it simplifies the design, as the interaction between the two execution units is minimal. In this thesis two implementations of this kind of processor are presented: the original D-KIP and the Flexible Heterogeneous MultiCore (FMC). The performance of these proposals is analyzed and compared to other proposals that increase memory-level parallelism, such as prefetching or runahead execution. It is observed that the FMC processor performs at the same level as a conventional processor with a window of around 1500 in-flight instructions. Further, the integration of the FMC processor into a multicore/multiprogrammed environment is studied. This thesis concludes with the proposal of a two-level Load/Store Queue for this kind of processor.
Ultrasound Imaging
In this book, we present a dozen state-of-the-art developments in ultrasound imaging, covering, for example, hardware implementation, transducers, beamforming, signal processing, elasticity measurement, and diagnosis. The editors would like to thank all the chapter authors, who dedicated their efforts to the publication of this book.
Low-Power Human-Machine Interfaces: Analysis And Design
Human-Machine Interaction (HMI) systems, once used for clinical applications, have recently reached a broader set of scenarios, such as industrial, gaming, learning, and health tracking thanks to advancements in Digital Signal Processing (DSP) and Machine Learning (ML) techniques. A growing trend is to integrate computational capabilities into wearable devices to reduce power consumption associated with wireless data transfer while providing a natural and unobtrusive way of interaction. However, current platforms can barely cope with the computational complexity introduced by the required feature extraction and classification algorithms without compromising the battery life and the overall intrusiveness of the system. Thus, highly-wearable and real-time HMIs are yet to be introduced.
Designing and implementing highly energy-efficient biosignal devices demands fine-tuning to meet the constraints typically required in everyday scenarios. This thesis work tackles these challenges in specific case studies, devising solutions based on bioelectrical signals, namely EEG and EMG, for advanced hand gesture recognition.
The implementation of these systems followed a complete analysis aimed at reducing the overall intrusiveness of the system through sensor design and miniaturization of the hardware implementation. Several solutions have been studied to cope with the computational complexity of the DSP algorithms, including commercial single-core and open-source Parallel Ultra Low Power architectures, which were also selected to reduce the overall system power consumption. By further adding energy harvesting techniques combined with firmware and hardware optimization, the systems achieved self-sustainable operation or a significant boost in battery life.
The HMI platforms presented are entirely programmable and provide the computational power to satisfy the requirements of the studied applications while employing only a fraction of the CPU resources, opening the prospect of applying more advanced paradigms in the next generation of real-time embedded biosignal processing.
Custom optimization algorithms for efficient hardware implementation
The focus is on real-time optimal decision making with application in advanced control systems. These computationally intensive schemes, which involve the repeated solution of (convex) optimization problems within a sampling interval, require more efficient computational methods than currently available for extending their application to highly dynamical systems and setups with resource-constrained embedded computing platforms. A range of techniques are proposed to exploit synergies between digital hardware, numerical analysis and algorithm design. These techniques build on top of parameterisable hardware code generation tools that generate VHDL code describing custom computing architectures for interior-point methods and a range of first-order constrained optimization methods. Since memory limitations are often important in embedded implementations, we develop a custom storage scheme for KKT matrices arising in interior-point methods for control, which reduces memory requirements significantly and prevents I/O bandwidth limitations from affecting the performance in our implementations. To take advantage of the trend towards parallel computing architectures and to exploit the special characteristics of our custom architectures we propose several high-level parallel optimal control schemes that can reduce computation time. A novel optimization formulation was devised for reducing the computational effort in solving certain problems independent of the computing platform used. In order to be able to solve optimization problems in fixed-point arithmetic, which is significantly more resource-efficient than floating-point, tailored linear algebra algorithms were developed for solving the linear systems that form the computational bottleneck in many optimization methods. These methods come with guarantees for reliable operation. We also provide finite-precision error analysis for fixed-point implementations of first-order methods that can be used to minimize the use of resources while meeting accuracy specifications. The suggested techniques are demonstrated on several practical examples, including a hardware-in-the-loop setup for optimization-based control of a large airliner.
Open Access
Acceleration Techniques for Sparse Recovery Based Plane-wave Decomposition of a Sound Field
Plane-wave decomposition by sparse recovery is a reliable and accurate technique that can be used for source localization, beamforming, etc. In this work, we introduce techniques to accelerate it. The method consists of two main algorithms: the spherical Fourier transformation (SFT) and sparse recovery. Of the two, sparse recovery is the more computationally intensive. We implement the SFT on an FPGA and the sparse recovery on a multithreaded computing platform, so that the multithreaded platform can be fully utilized for the sparse recovery. In addition, implementing the SFT on an FPGA helps to flexibly integrate the microphones and improves the portability of the microphone array. For implementing the SFT on an FPGA, we develop a scalable FPGA design model that enables the quick design of the SFT architecture on FPGAs. The model takes the number of microphones, the number of SFT channels, and the cost of the FPGA as inputs and outputs the design of a resource-optimized and cost-effective FPGA architecture. We then investigate the performance of the sparse recovery algorithm executed on various multithreaded computing platforms (i.e., chip-multiprocessor, multiprocessor, GPU, manycore). Finally, we investigate the influence of modifying the dictionary size on the computational performance and the accuracy of the sparse recovery algorithms. We introduce novel sparse-recovery techniques that use non-uniform dictionaries to improve the performance of the sparse recovery on a parallel architecture.
Physical Layer Modeling and Optimization of Silicon Photonic Interconnection Networks
The steady maturation of silicon photonics (SiP) technology suggests that optical interconnects may replace electrical wires for data movement over short distances in the future. The silicon photonics platform has been the subject of intensive research for more than a decade, and its prospects continue to improve as it benefits from the maturity of the CMOS manufacturing industry. SiP foundries all over the world, and particularly in the US (AIM Photonics), have been developing reliable photonic design kits (PDKs) that include fundamental SiP building blocks such as wavelength-selective modulators and tunable filters. Microring resonators (MRRs) are hailed as the most compact devices that can perform both modulation and demodulation in a wavelength division multiplexed (WDM) transceiver design. Although the use of WDM can reduce the number of fibers carrying data, it also makes the design of transceivers challenging. It is probably acceptable to achieve compactness at the expense of somewhat higher transceiver cost and power consumption. Nevertheless, these two metrics should remain close to their roadmap values for datacom applications. An increase of an order of magnitude is clearly not acceptable. For example, costs relative to bandwidth for an optical link in a data center interconnect will have to decrease from the current 1/Gbps. Additionally, the transceiver itself must remain compact.
The optical properties of SiP devices are subject to various design considerations, operation conditions, and optimization procedures. In this thesis, the general goal is to develop mathematical models that can accurately describe the thermo-optical and electro-optical behavior of individual SiP devices and then use these models to perform optimization on the parameters of such devices to maximize the capabilities of photonic links or photonic switch fabrics for datacom applications.
In Chapter 1, Introduction, we first provide an overview of the current state of the optical transceivers for data centers and datacom applications. Four main categories for optical interfaces (Pluggable transceivers, On-board optics, Co-packaged optics, monolithic integration) are briefly discussed. The structure of a silicon photonic link is also briefly introduced. Then the direction is shifted towards optical switching technologies where various technologies such as free space MEMS, liquid crystal on silicon (LCOS), SOA-based switches, and silicon-based switches are explored.
In Chapter 2, Silicon Photonic Waveguides, we present an extensive study of the silicon-on-insulator (SOI) waveguides that are the basic building blocks of all SiP devices. The dispersion of Si and SiO2 is modeled with the Sellmeier equation for the wavelength range 1500–1600 nm and is then used to calculate the TE and TM modes of a 2D slab waveguide. There are two reasons for studying 2D waveguides. First, the modes of these waveguides have closed-form solutions, and the modes of 3D waveguides can be approximated from 2D waveguides using the effective index method. Second, when the coupling of waveguides is studied and the concept of the curvature function of coupling is developed, the coupled modes of 2D waveguides are used to show that this approach has a small inherent error due to the discretization of the nonuniform coupling. The chapter finishes by describing the coefficients of the sensitivity of the optical modes of the waveguides to the geometrical and material parameters. Perturbation theory is briefly presented as a way to analytically examine the impact of small perturbations on the effective index of the modes.
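To make the Sellmeier dispersion model mentioned for Chapter 2 concrete, the sketch below evaluates the three-term Sellmeier equation, n^2(λ) = 1 + Σ B_i λ^2 / (λ^2 − C_i), for fused silica. The coefficients are the widely tabulated Malitson values for fused silica and are an assumption here; the thesis may use a different fit, and silicon would need its own coefficient set.

```python
import math

# Assumed coefficients: Malitson fit for fused silica, wavelength in micrometres.
SIO2_B = (0.6961663, 0.4079426, 0.8974794)
SIO2_C = (0.0684043 ** 2, 0.1162414 ** 2, 9.896161 ** 2)  # resonance wavelengths squared

def sellmeier_index(wavelength_um, b_coeffs, c_coeffs):
    """Refractive index from n^2 = 1 + sum_i B_i * lam^2 / (lam^2 - C_i)."""
    lam2 = wavelength_um ** 2
    n_squared = 1.0 + sum(b * lam2 / (lam2 - c) for b, c in zip(b_coeffs, c_coeffs))
    return math.sqrt(n_squared)

# Index of SiO2 across the 1500-1600 nm range discussed in the text.
for wavelength_nm in (1500, 1550, 1600):
    n = sellmeier_index(wavelength_nm / 1000.0, SIO2_B, SIO2_C)
    print(wavelength_nm, round(n, 4))

n_1550 = sellmeier_index(1.55, SIO2_B, SIO2_C)
```

The index decreases slowly with wavelength over this band (normal dispersion), which is the behaviour a slab-mode solver would take as its material input.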
In Chapter 3, Compact Modeling Approach, the concept of the scattering matrix of a multi-port silicon photonic device is presented. The elements of the S-matrix are complex numbers that describe the amplitude and phase relationships of the optical modes at the input and output ports. Based on the scattering matrix modeling of silicon photonic devices, two methods of solving photonic circuits are developed: the first is based on iteration for linear circuits, and the second is based on the construction of an equivalent signal flow graph (SFG) for the circuit. We show that the SFG approach is very efficient for circuits involving microring resonator structures. Not only does the SFG provide the solution for the transmission, it also provides the signal paths and the closed-form solution based on Mason's gain formula. We also show how the SFG method can be utilized to formulate the backscattering effects inside a ring resonator.
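The two solution strategies described for Chapter 3 can be illustrated on the simplest ring circuit, an all-pass microring coupled to one bus waveguide. The sketch below is not the thesis's solver: it assumes a lossless coupler with self-coupling `r` and cross-coupling `i*kappa` (with `r**2 + kappa**2 = 1`), a round-trip amplitude `a`, and a round-trip phase `phi`. It compares a closed form obtained from Mason's gain formula (one loop, two forward paths) with a brute-force sum over successive round trips.

```python
import cmath
import math

def ring_through_mason(r, a, phi):
    """Through-port amplitude from Mason's gain formula.
    Loop gain L = r*a*exp(i*phi); path 1 (straight through) has gain r and
    does not touch the loop; path 2 (via the ring) has gain (i*kappa)^2 * a*exp(i*phi)."""
    kappa2 = 1.0 - r * r
    trip = a * cmath.exp(1j * phi)
    loop = r * trip
    delta = 1.0 - loop
    path1, delta1 = r, 1.0 - loop
    path2, delta2 = -kappa2 * trip, 1.0
    return (path1 * delta1 + path2 * delta2) / delta

def ring_through_iterative(r, a, phi, round_trips=2000):
    """Same quantity obtained by summing the light that exits after each round trip."""
    kappa2 = 1.0 - r * r
    trip = a * cmath.exp(1j * phi)
    total = complex(r)              # light that never enters the ring
    circulating = -kappa2 * trip    # couples in, makes one trip, couples out
    for _ in range(round_trips):
        total += circulating
        circulating *= r * trip     # stays in the ring for one more trip
    return total

# Hypothetical parameters: critical coupling (r == a) evaluated on resonance.
r, a = 0.95, 0.95
t_mason = ring_through_mason(r, a, 0.0)
t_iter = ring_through_iterative(r, a, 0.0)
t_off = ring_through_mason(r, a, math.pi)
print(abs(t_mason) ** 2, abs(t_iter) ** 2, abs(t_off) ** 2)
```

The Mason expression simplifies to `(r - a*exp(i*phi)) / (1 - r*a*exp(i*phi))`, so the transmission vanishes on resonance at critical coupling, while the iterative sum needs many round trips to reach the same value for a high-finesse ring.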
In Chapter 4, Scalability of Silicon Photonic Switch Fabrics, we develop the models for electro-optic Mach-Zehnder switch elements (2×2). For the electro-optic properties, the empirical Soref equations are used to characterize how the loss and index of silicon change when the charge carrier density is changed. We then use our photonic circuit solver based on the iteration method to obtain accurate results for light propagation in large-scale switch topologies (e.g. 4×4 and 8×8). The concept of advanced path mapping based on physical layer evaluation of the switch fabric is introduced and used to develop the optimum routing tables for 4×4 and 8×8 Benes switch topologies.
In Chapter 5, Design Space of Microring Resonators, we introduce the concept of the curvature function of coupling to mathematically characterize the coupling coefficient of a ring resonator to a waveguide as a function of the geometrical parameters (ring radius, coupling gap, width and height of waveguides) and the wavelength. Extensive 2D and 3D FDTD simulations are carried out to validate our modeling approach. Experimental demonstrations are used not only to further validate our modeling of coupling, but also to extract an empirical power-law model for the bending loss of the ring resonators as a function of the radius. By combining these models, we present, for the first time, a full characterization of the design space of microring resonators. The value of this discussion becomes further apparent when the scalability of a silicon photonic link is studied. We show that the FSR of the rings determines the optical bandwidth but also impacts the properties of the ring resonators.
In Chapter 6, Thermo-optic Efficiency of Microheaters, we develop analytical models for the thermo-optic properties of SiP waveguides. The concept of the thermal impulse response is mathematically developed for integrated micro-heaters. The thermal impulse response is a key function that determines the tradeoff between heating efficiency and heating speed (thermal bandwidth), and it allows us to predict the pulse-width-modulation (PWM) optical response of the heater-waveguide system. One of the motivations behind this study was to find the highest possible efficiency for thermal tuning of microring resonators for use in evaluating the energy consumption of a photonic link. The results indicate 2 nm/mW, which is in agreement with the trends reported in the literature.
In Chapter 7, Crosstalk Penalty, we theoretically and experimentally investigate the optical crosstalk effects in microring-based silicon photonic interconnects. Both inter-channel crosstalk and intra-channel crosstalk are investigated and approximate equations are developed for their corresponding power penalties. Inclusion of the inter-channel crosstalk is an important part of our final analysis of a silicon photonic link.
In Chapter 8, Scalability of Silicon Photonic Links, we present the analysis of a WDM silicon photonic point-to-point link based on microring modulators and microring wavelength filters. Our approach is based on the power penalty analysis of non-return-to-zero (NRZ) signals and Gaussian noise statistics. All the necessary equations for the optical power penalty calculations are presented for microring modulators and filters. The first part of the analysis is based on various ideal assumptions, which lead to a maximum capacity of 2.1 Tb/s for the link. The second part of the analysis is carried out with more realistic assumptions on the photonic elements in the link, culminating in a maximum throughput of 800 Gb/s. We also provide estimates of the energy/bit metric of such links based on the optimized models of electronic circuits in 65 nm CMOS technology.
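As a small illustration of what a power penalty analysis under Gaussian noise statistics involves, the sketch below uses two standard textbook relations for NRZ on-off keying rather than the thesis's own equations: the bit error rate as a function of the Q factor, BER = 0.5·erfc(Q/√2), and the power penalty caused by a finite extinction ratio, 10·log10((ER+1)/(ER−1)), which assumes signal-independent noise and a comparison at fixed average power. The function names and the example values are hypothetical.

```python
import math

def ber_from_q(q):
    """Bit error rate of NRZ-OOK with Gaussian noise for a given Q factor."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))

def extinction_ratio_penalty_db(er_linear):
    """Power penalty (dB) of a finite extinction ratio ER = P1/P0, relative to
    an ideal modulator with P0 = 0, at fixed average power and with
    signal-independent noise."""
    if er_linear <= 1.0:
        raise ValueError("extinction ratio must exceed 1")
    return 10.0 * math.log10((er_linear + 1.0) / (er_linear - 1.0))

# Q of about 7 corresponds to a BER near 1e-12, a common link target.
ber = ber_from_q(7.03)
# A 10 dB extinction ratio (ER = 10 in linear units) costs a little under 1 dB.
penalty = extinction_ratio_penalty_db(10.0)
print(ber, penalty)
```

In a link budget of this kind, individual penalties in dB (extinction ratio, crosstalk, filter truncation, and so on) are typically summed and compared against the available optical power margin.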