513 research outputs found

    Algorithm Architecture Co-design for Dense and Sparse Matrix Computations

    With the end of Dennard scaling and Moore's law, architects have moved towards heterogeneous designs consisting of specialized cores to achieve higher performance and energy efficiency for a target application domain. Applications of linear algebra are ubiquitous in scientific computing, machine learning, statistics, and related fields, with matrix computations being fundamental to these linear-algebra-based solutions. Designing multiple dense (or sparse) matrix computation routines on the same platform is quite challenging. Adding to the complexity, dense and sparse matrix computations differ greatly in their storage and access patterns and are difficult to optimize on the same architecture. This thesis addresses this challenge and introduces a reconfigurable accelerator that supports both dense and sparse matrix computations efficiently. The reconfigurable architecture has been optimized to execute the following linear algebra routines: GEMV (Dense General Matrix Vector Multiplication), GEMM (Dense General Matrix Matrix Multiplication), TRSM (Triangular Matrix Solver), LU Decomposition, Matrix Inverse, SpMV (Sparse Matrix Vector Multiplication), and SpMM (Sparse Matrix Matrix Multiplication). It is a multicore architecture in which each core consists of a 2D array of processing elements (PEs). The PE array is of size 4x4 and is scheduled to perform 4x4 matrix updates efficiently. A sequence of such updates is used to solve a larger problem inside a core. A novel partitioned block compressed sparse data structure (PBCSC/PBCSR) is used to perform sparse kernel updates. Scalable partitioning and mapping schemes are presented that map input matrices of any given size to the multicore architecture. Design trade-offs related to the PE array dimension, the size of the local memory inside a core, and the bandwidth between the on-chip memories and the cores are presented. An optimal core configuration is developed from this analysis.
Synthesis results using a 7nm PDK show that the proposed accelerator can achieve a performance of up to 32 GOPS using a single core.
Dissertation/Thesis. Masters Thesis, Computer Engineering 201
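
The idea of solving a larger dense problem as a sequence of 4x4 updates can be sketched in software. The following is a minimal illustration only, assuming a plain tiled matrix multiply; the names `TILE`, `tile_update`, and `tiled_gemm` are hypothetical and do not come from the thesis, and the sketch says nothing about the accelerator's actual scheduling or its PBCSC/PBCSR sparse format.

```python
TILE = 4  # assumed tile edge, chosen to match the 4x4 PE array in the abstract


def tile_update(C, A, B, i0, j0, k0):
    """Accumulate one tile update: C[i0.., j0..] += A[i0.., k0..] * B[k0.., j0..].

    Tiles at the matrix edge are clipped, so any matrix size is accepted.
    """
    n_i = min(TILE, len(C) - i0)
    n_j = min(TILE, len(C[0]) - j0)
    n_k = min(TILE, len(B) - k0)
    for i in range(i0, i0 + n_i):
        for j in range(j0, j0 + n_j):
            acc = C[i][j]
            for k in range(k0, k0 + n_k):
                acc += A[i][k] * B[k][j]
            C[i][j] = acc


def tiled_gemm(A, B):
    """Compute C = A * B as a sequence of TILE x TILE updates."""
    m, p, n = len(A), len(B), len(B[0])
    C = [[0] * n for _ in range(m)]
    for i0 in range(0, m, TILE):
        for j0 in range(0, n, TILE):
            for k0 in range(0, p, TILE):
                tile_update(C, A, B, i0, j0, k0)
    return C


if __name__ == "__main__":
    A = [[i + j for j in range(6)] for i in range(5)]      # 5 x 6
    B = [[i * j + 1 for j in range(7)] for i in range(6)]  # 6 x 7
    for row in tiled_gemm(A, B):
        print(row)
```

Each call to `tile_update` touches at most a 4x4 block of the output, which is the unit of work the abstract attributes to one PE array; how such updates are distributed across cores is not modelled here.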

    Harnessing the power of GPUs for problems in real algebraic geometry

    This thesis presents novel parallel algorithms that leverage the power of GPUs (Graphics Processing Units) for exact computations with polynomials having large integer coefficients. The significance of such computations, especially in real algebraic geometry, is hard to overstate. On massively parallel architectures such as GPUs, the degree of data-level parallelism exposed by an algorithm is the main performance factor. We attain high efficiency by using structured matrix theory to realize the relevant operations on polynomials on the graphics hardware. A detailed complexity analysis, assuming the PRAM model, also confirms that our approach achieves a substantially better parallel complexity than the classical algorithms used for symbolic computations. Aside from the theoretical considerations, a large portion of this work is dedicated to the actual algorithm development and optimization techniques, where we pay close attention to the specifics of the graphics hardware. As a byproduct of this work, we have developed high-throughput modular arithmetic, which we expect to be useful for other GPU applications, in particular public-key cryptography. We further discuss algorithms for the solution of systems of polynomial equations, the topology computation of algebraic curves, and curve visualization, all of which can profit to the full extent from GPU acceleration. Extensive benchmarking on real data demonstrates the superiority of our algorithms over several state-of-the-art approaches available to date. This thesis is written in English.

    Efficient Algorithms for Solving Structured Eigenvalue Problems Arising in the Description of Electronic Excitations

    Matrices arising in linear-response time-dependent density functional theory and many-body perturbation theory, in particular in the Bethe-Salpeter approach, show a 2 × 2 block structure. The motivation to devise new algorithms, instead of using general-purpose eigenvalue solvers, comes from the need to solve large problems on high-performance computers. This requires parallelizable and communication-avoiding algorithms and implementations. We point out various novel directions for diagonalizing structured matrices. These include the solution of skew-symmetric eigenvalue problems in ELPA, as well as structure-preserving spectral divide-and-conquer schemes employing generalized polar decompositions.

    Design and Implementation of Efficient Algorithms for Wireless MIMO Communication Systems

    Over the last decade, one of the most important technological advances leading to the new generation of wireless broadband has been communication via multiple-input multiple-output (MIMO) systems. MIMO technologies have been adopted by many wireless standards such as LTE, WiMAX, and WLAN. This is mainly due to their ability to increase the maximum transmission rate, together with the reliability and coverage of current wireless communications, without requiring extra bandwidth or additional transmit power. However, the advantages provided by MIMO systems come at the expense of a substantial increase in the cost of deploying multiple antennas and in receiver complexity, which has a large impact on power consumption. For this reason, the design of low-complexity receivers is an important topic that is addressed throughout this thesis. First, the use of preprocessing techniques for the MIMO channel matrix is investigated, either to reduce the computational cost of optimal decoders or to improve the performance of suboptimal linear, SIC, or tree-search detectors. A detailed description is given of two widely used preprocessing techniques: the Lenstra-Lenstra-Lovász (LLL) method for lattice reduction (LR) and the VBLAST ZF-DFE algorithm. Both the complexity and the performance of the two methods have been evaluated and compared. In addition, a low-cost implementation of the VBLAST ZF-DFE algorithm is proposed and included in the evaluation. Second, a low-complexity tree-search-based MIMO detector has been developed, called the variable-breadth K-Best detector (VB K-Best).
The main idea of this method is to exploit the impact of the channel matrix condition number on data detection in order to reduce the complexity of the systems
Roger Varea, S. (2012). Design and Implementation of Efficient Algorithms for Wireless MIMO Communication Systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/16562 Palanci

    Tamper-Resistant Arithmetic for Public-Key Cryptography

    Cryptographic hardware has found many uses in ubiquitous and pervasive security devices with a small form factor, e.g. SIM cards, smart cards, electronic security tokens, and soon even RFIDs. With applications in banking, telecommunication, healthcare, e-commerce, and entertainment, these devices use cryptography to provide security services such as authentication, identification, and confidentiality to the user. However, the widespread adoption of these devices in the mass market and the lack of a physical security perimeter have increased the risk of theft, reverse engineering, and cloning. Despite the use of strong cryptographic algorithms, these devices often succumb to powerful side-channel attacks. These attacks give a motivated third party access to the inner workings of the device and therefore the opportunity to circumvent the protection of the cryptographic envelope. Apart from passive side-channel analysis, which has been the subject of intense research for over a decade, active tampering attacks such as fault analysis have recently gained increased attention from the academic and industrial research community. In this dissertation we address the question of how to protect cryptographic devices against this kind of attack. More specifically, we focus on public-key algorithms such as elliptic curve cryptography and their underlying arithmetic structure. In our research we address challenges such as the cost of implementation, the level of protection, and the error model in an adversarial situation. The approaches that we investigated all apply concepts from coding theory, in particular the theory of cyclic codes. This seems intuitive, since public-key cryptography and cyclic codes share finite field arithmetic as a common foundation.
The major contributions of our research are (a) a generalization of cyclic codes that allows embedding of finite fields into redundant rings under a ring homomorphism, (b) a new family of non-linear arithmetic residue codes with very high error detection probability, (c) a set of new low-cost arithmetic primitives for optimal extension field arithmetic based on robust codes, and (d) design techniques for tamper-resilient finite state machines.

    Proceedings of the Fifth NASA/NSF/DOD Workshop on Aerospace Computational Control

    The Fifth Annual Workshop on Aerospace Computational Control was one in a series of workshops sponsored by NASA, NSF, and the DOD. The purpose of these workshops is to address computational issues in the analysis, design, and testing of flexible multibody control systems for aerospace applications. The intention in holding these workshops is to bring together users, researchers, and developers of computational tools in aerospace systems (spacecraft, space robotics, aerospace transportation vehicles, etc.) for the purpose of exchanging ideas on the state of the art in computational tools and techniques

    Optical Space Division Multiplexing in Short Reach Multi-Mode Fiber Systems

    The application of space division multiplexing to fiber-optic communications is a promising approach to further increase the channel capacity of optical waveguides. This work focuses on short-reach, low-cost optical space division multiplexing systems with intensity modulation and direct detection (IM/DD). Here, different modes are utilized to generate spatial diversity in a multi-mode fiber. In such IM/DD systems, the process of square-law detection is inherently non-linear. In order to understand the channel characteristics, a system model is developed that shows under which conditions the system can be considered linear in baseband. It is shown that linearity applies in scenarios with low mode cross-talk. This enables the use of linear multiple-input multiple-output (MIMO) signal processing strategies for equalization purposes. In conditions with high mode cross-talk, significant interference occurs, and the transmitted information cannot be extracted at the receiver. Furthermore, a method to determine the power coupling coefficients between mode groups is presented that does not require the excitation of individual modes and hence can be realized with inexpensive components. In addition, different optical components are analyzed with respect to their suitability in MIMO setups with IM/DD. The conventional approach with single-mode fiber to multi-mode fiber offset launches and optical couplers, as well as a configuration that utilizes multi-segment detection, are feasible options for a (2x2) setup. It is further shown that conventional photonic lanterns are not suited for MIMO with IM/DD due to their low mode orthogonality during the multiplexing process. In order to enable higher-order MIMO configurations, devices for mode multiplexing and demultiplexing need to be developed that exhibit high mode orthogonality on the one hand and are low-cost on the other.

    MIMO Systems

    In recent years, it has become clear that MIMO communication systems appear inevitable in the accelerated evolution of high-data-rate applications, owing to their potential to dramatically increase spectral efficiency while simultaneously sending individual information to the corresponding users in wireless systems. This book intends to provide highlights of current research topics in the field of MIMO systems and to offer a snapshot of the recent advances and major issues faced today by researchers in MIMO-related areas. The book is written by specialists working in universities and research centers all over the world and covers the fundamental principles and main advanced topics of high-data-rate wireless communication systems over MIMO channels. Moreover, the book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity.

    Development of a New 3D Reconstruction Algorithm for Computed Tomography (CT)

    Model-based computed tomography (CT) image reconstruction is dominated by iterative algorithms. Although long reconstruction times remain a barrier in practical applications, techniques to speed up their convergence are under investigation and have obtained impressive results. In this thesis, a direct algorithm is proposed for model-based image reconstruction. The model-based approach relies on the construction of a model matrix that poses a linear system whose solution is the reconstructed image. The proposed algorithm consists of the QR decomposition of this matrix and the solution of the system by backward substitution. The cost of this image reconstruction technique is one matrix-vector multiplication and one backward substitution, since the model construction and the QR decomposition are performed only once: each image reconstruction corresponds to solving the same CT system for a different right-hand side. Several problems arise in the implementation of this algorithm, such as the exact calculation of a volume intersection, the definition of fill-in reduction strategies optimized for CT model matrices, and the exploitation of CT symmetries to reduce the size of the system. These problems have been detailed and solutions to overcome them have been proposed; as a result, a proof-of-concept implementation has been obtained. Reconstructed images have been analyzed and compared against the filtered backprojection (FBP) and maximum likelihood expectation maximization (MLEM) reconstruction algorithms, and the results show several benefits of the proposed algorithm.
Although high resolutions have not yet been achieved, the results obtained also demonstrate the promise of this algorithm, as great performance and scalability improvements would follow from the successful development of better fill-in reduction strategies or of additional symmetries in the CT geometry.
Iborra Carreres, A. (2015). Development of a New 3D Reconstruction Algorithm for Computed Tomography (CT) [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/59421 TESI
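
The "factor once, solve per image" structure described in this abstract can be sketched as follows. This is a small dense illustration under stated assumptions, not the thesis's implementation: the function names (`qr_decompose`, `back_substitute`, `reconstruct`) and the toy 4x3 matrix are hypothetical, modified Gram-Schmidt is swapped in as a simple QR method, and the sparse storage, fill-in reduction, and symmetry handling the abstract mentions are all omitted.

```python
import math


def qr_decompose(A):
    """Modified Gram-Schmidt QR of an m x n matrix with full column rank (m >= n).

    Returns Q as a list of n orthonormal columns and R as an n x n
    upper-triangular matrix, so that A = Q R.
    """
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    Q = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for i in range(j):
            # Project the partially orthogonalized vector onto column i of Q.
            R[i][j] = sum(q * x for q, x in zip(Q[i], v))
            v = [x - R[i][j] * q for x, q in zip(v, Q[i])]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        Q.append([x / R[j][j] for x in v])
    return Q, R


def back_substitute(R, y):
    """Solve the upper-triangular system R x = y."""
    n = len(R)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = y[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / R[i][i]
    return x


def reconstruct(Q, R, b):
    """Per-image work: one matrix-vector product (Q^T b), one back substitution."""
    y = [sum(q * bi for q, bi in zip(col, b)) for col in Q]
    return back_substitute(R, y)


if __name__ == "__main__":
    # Toy stand-in for a model matrix; a real CT model matrix is large and sparse.
    A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0], [1.0, 0.0, 1.0]]
    Q, R = qr_decompose(A)  # done once for the system
    for image in ([1.0, 2.0, 3.0], [0.5, -1.0, 2.0]):
        b = [sum(a * x for a, x in zip(row, image)) for row in A]  # simulated data
        print([round(v, 6) for v in reconstruct(Q, R, b)])
```

The factorization is computed once and reused for every right-hand side, which is the property the abstract relies on to keep the per-image cost to a matrix-vector product and a triangular solve.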

    Design of optimal equalizers and precoders for MIMO channels

    Channel equalization has been extensively studied as a method of combating ISI and ICI in high-speed MIMO data communication systems. This dissertation focuses on optimal channel equalization in the presence of non-white observation noise with unknown PSD but bounded power-norm. A worst-case approach to the optimal design of channel equalizers leads to an equivalent optimal H-infinity filtering problem for MIMO communication systems. An explicit design algorithm is derived which not only achieves the zero-forcing (ZF) condition, but also minimizes the RMS error between the transmitted symbols and the received symbols. The second part of this dissertation investigates the design of optimal precoders that minimize the bit error rate (BER) subject to a fixed transmit-power constraint for multiple-antenna downlink communication channels under the perfect reconstruction (PR) condition. Closed-form solutions are derived and an efficient design algorithm is proposed. The performance evaluations indicate that the optimal precoder design for multiple-antenna communication systems proposed herein is an attractive alternative to existing precoder design techniques.
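
As a rough illustration of the zero-forcing (ZF) condition mentioned in this abstract, the sketch below builds an equalizer W with W H = I for a memoryless 2x2 channel. It is a deliberate simplification under stated assumptions: the dissertation treats channels with ISI and a worst-case H-infinity design, neither of which is modelled here, and the names `zf_equalizer_2x2` and `apply_matrix` and the example channel values are hypothetical.

```python
def zf_equalizer_2x2(H):
    """Return W = H^{-1} for an invertible 2x2 channel matrix, so that W H = I."""
    (a, b), (c, d) = H
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("channel matrix is (near-)singular; ZF is not defined")
    return [[d / det, -b / det], [-c / det, a / det]]


def apply_matrix(M, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]


if __name__ == "__main__":
    H = [[1.0, 0.5], [0.2, 1.0]]     # assumed channel gains, illustration only
    symbols = [1.0, -1.0]            # transmitted symbols
    received = apply_matrix(H, symbols)  # noiseless observation
    W = zf_equalizer_2x2(H)
    print(apply_matrix(W, received))
```

In the noiseless case the equalizer output matches the transmitted symbols up to rounding. With noise present, plain ZF can amplify it, which is one reason a design that also controls the error, as the abstract describes, is of interest.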