323 research outputs found

    Flexible HLS-Based Implementation of the Karatsuba Multiplier Targeting Homomorphic Encryption Schemes

    Custom accelerators for high-precision integer arithmetic are increasingly used in compute-intensive applications, in particular homomorphic encryption schemes. This work seeks to advance a strategy for faster deployment of these accelerators using high-level synthesis (HLS). Insights from existing number theory software libraries and custom hardware accelerators are used to develop a scalable implementation of Karatsuba modular polynomial multiplication. The accelerator generated from this implementation by the high-level synthesis tool Vivado HLS achieves significant speedup over the implementations available in the highly optimized FLINT software library. This is an important first step towards a larger goal of enabling HLS-based homomorphic encryption in the cloud.
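    As an illustration of the technique named in this abstract, the following is a minimal Python sketch of Karatsuba polynomial multiplication with coefficients reduced modulo q. It is not the authors' HLS implementation. The function names, the `threshold` parameter, the power-of-two length requirement, and the choice to reduce only the coefficients (not modulo a polynomial) are assumptions made for the example.

```python
def schoolbook_mul(a, b, q):
    """Reference O(n^2) product of coefficient lists a and b, coefficients mod q."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % q
    return out


def karatsuba_mul(a, b, q, threshold=4):
    """Karatsuba product of two equal-length coefficient lists, coefficients mod q.

    Assumption for this sketch: len(a) == len(b) and the length is a power of two,
    so both halves always have the same size.
    """
    n = len(a)
    assert n == len(b) and n > 0 and n & (n - 1) == 0
    if n <= threshold:
        return schoolbook_mul(a, b, q)

    h = n // 2
    a_lo, a_hi = a[:h], a[h:]
    b_lo, b_hi = b[:h], b[h:]

    # Three half-size products instead of the four a naive split would need.
    low = karatsuba_mul(a_lo, b_lo, q, threshold)
    high = karatsuba_mul(a_hi, b_hi, q, threshold)
    a_sum = [(x + y) % q for x, y in zip(a_lo, a_hi)]
    b_sum = [(x + y) % q for x, y in zip(b_lo, b_hi)]
    mid = karatsuba_mul(a_sum, b_sum, q, threshold)

    # Recover the cross term: (a_lo + a_hi)(b_lo + b_hi) - low - high.
    mid = [(m - l - u) % q for m, l, u in zip(mid, low, high)]

    # Recombine: low + mid * x^h + high * x^(2h).
    out = [0] * (2 * n - 1)
    for i in range(2 * h - 1):
        out[i] = (out[i] + low[i]) % q
        out[i + h] = (out[i + h] + mid[i]) % q
        out[i + 2 * h] = (out[i + 2 * h] + high[i]) % q
    return out
```

    The threshold below which the sketch falls back to schoolbook multiplication is a tuning knob; a hardware design would typically choose it according to the available multiplier resources.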

    Advanced analytics through FPGA based query processing and deep reinforcement learning

    Today, vast streams of structured and unstructured data are incorporated in databases, and analytical processes are applied to discover patterns, correlations, trends and other useful relationships that feed into a broad range of decision-making processes. The amount of generated data has grown very large over the years, and conventional database processing methods from previous generations are no longer sufficient to provide satisfactory analytics performance and prediction accuracy. Thus, new methods are needed in a wide array of fields, from computer architecture, storage systems and network design to statistics and physics. This thesis proposes two methods to address the current challenges and meet the future demands of advanced analytics. First, we present AxleDB, a Field Programmable Gate Array based query processing system which constitutes the frontend of an advanced analytics system. AxleDB combines highly efficient accelerators with memory and storage, and provides a unified programmable environment. AxleDB is capable of offloading complex Structured Query Language queries from the host CPU. Experiments on a set of TPC-H queries show that AxleDB can perform full queries between 1.8x and 34.2x faster and 2.8x to 62.1x more energy efficiently than MonetDB and PostgreSQL on a single workstation node. Second, we introduce TauRieL, a novel deep reinforcement learning (DRL) based method for combinatorial problems. The design idea behind combining DRL and combinatorial problems is to apply the prediction capabilities of deep reinforcement learning and to use the universality of combinatorial optimization problems to explore general purpose predictive methods. TauRieL utilizes an actor-critic inspired DRL architecture that adopts ordinary feedforward nets. Furthermore, TauRieL performs online training, which unifies training and state space exploration. The experiments show that TauRieL can generate solutions two orders of magnitude faster and comes within 3% of the accuracy of state-of-the-art DRL on the Traveling Salesman Problem while searching for the shortest tour. We also show that TauRieL can be adapted to the Knapsack combinatorial problem. With a very minimal problem-specific modification, TauRieL can outperform a Knapsack-specific greedy heuristic. Postprint (published version).

    Accelerating FPGA-based evolution of wavelet transform filters by optimized task scheduling

    Adaptive embedded systems are required in various applications. This work addresses these needs in the area of adaptive image compression in FPGA devices. A simplified version of an evolution strategy is utilized to optimize the wavelet filters of a Discrete Wavelet Transform algorithm. We propose an adaptive image compression system in FPGA where an optimized memory architecture, parallel processing and optimized task scheduling reduce the time of evolution. The proposed solution has been extensively evaluated in terms of the quality of compression as well as the processing time. The proposed architecture reduces the time of evolution by 44% compared to our previous reports while keeping the quality of compression unchanged with respect to existing implementations. The system is able to find an optimized set of wavelet filters in less than 2 min whenever the type of input data changes.

    Ant colony optimization on runtime reconfigurable architectures


    Accelerating Homomorphic Encryption in the Cloud Environment through High-Level Synthesis and Reconfigurable Resources

    The recent surge in cloud services is revolutionizing the way that data is stored and processed. Everyone with an internet connection, from large corporations to small companies and private individuals, now has access to cutting-edge processing power and vast amounts of data storage. This rise in cloud computing and storage, however, has brought with it a need for a new type of security. In order to have access to cloud services, users must allow the service provider to have full access to their private, unencrypted data. Users are required to trust the integrity of the service provider and the security of its data centers. The recent development of fully homomorphic encryption schemes can offer a solution to this dilemma. These algorithms allow encrypted data to be used in computations without ever stripping the data of the protection of encryption. Unfortunately, the demanding memory requirements and computational complexity of the proposed schemes have hindered their wide-scale use. Custom hardware accelerators for homomorphic encryption could be implemented on the increasing number of reconfigurable hardware resources in the cloud, but the long development time required for these processors would lead to high production costs. This research seeks to develop a strategy for faster development of homomorphic encryption hardware accelerators using the process of High-Level Synthesis. Insights from existing number theory software libraries and custom hardware accelerators are used to develop a scalable, proof-of-concept software implementation of Karatsuba modular polynomial multiplication. This implementation was designed to be used with High-Level Synthesis to accelerate the large modular polynomial multiplication operations required by homomorphic encryption. The accelerator generated from this implementation by the High-Level Synthesis tool Vivado HLS achieved significant speedup over the implementations available in the highly optimized FLINT software library.

    A genetic parallel programming based logic circuit synthesizer.

    Lau, Wai Shing. Thesis (M.Phil.), Chinese University of Hong Kong, 2007. Thesis submitted in November 2006. Includes bibliographical references (leaves 85-94). Abstracts in English and Chinese.
    Contents:
    Abstract
    Acknowledgement
    Chapter 1. Introduction: 1.1 Field Programmable Gate Arrays; 1.2 FPGA technology mapping problem; 1.3 Motivations; 1.4 Contributions; 1.5 Thesis Organization
    Chapter 2. Background Study: 2.1 Deterministic approach to technology mapping problem (2.1.1 FlowMap; 2.1.2 DAOMap); 2.2 Stochastic approach (2.2.1 Bio-Inspired Methods for Multi-Level Combinational Logic Circuit Design; 2.2.2 A Survey of Combinational Logic Circuit Representations in stochastic algorithms); 2.3 Genetic Parallel Programming (2.3.1 Accelerating Phenomenon); 2.4 Chapter Summary
    Chapter 3. A GPP based Logic Circuit Synthesizer: 3.1 Overall system architecture; 3.2 Multi-Logic-Unit Processor; 3.3 The Genotype of a MLP program; 3.4 The Phenotype of a MLP program; 3.5 The Evolution Engine (3.5.1 The Dual-Phase Approach; 3.5.2 Genetic operators); 3.6 Chapter Summary
    Chapter 4. MLP in hardware: 4.1 Motivation; 4.2 Hardware Design and Implementation; 4.3 Experimental Settings; 4.4 Experimental Results and Evaluations; 4.5 Chapter Summary
    Chapter 5. Feasibility Study of Multi MLPs: 5.1 Motivation; 5.2 Overall Architecture; 5.3 Experimental settings; 5.4 Experimental results and evaluations; 5.5 Chapter Summary
    Chapter 6. A Hybridized GPPLCS: 6.1 Motivation; 6.2 Overall system architecture; 6.3 Experimental settings; 6.4 Experimental results and evaluations; 6.5 Chapter Summary
    Chapter 7. A Memetic GPPLCS: 7.1 Motivation; 7.2 Overall system architecture; 7.3 Experimental settings; 7.4 Experimental results and evaluations; 7.5 Chapter Summary
    Chapter 8. Conclusion: 8.1 Future work
    Bibliography

    The synthesis of application-specific machines using the Euler language

    A rapid prototyping environment, called SAMUEL, for creating custom computing machines is described. The custom computing machines are synthesized by a compiler from a general purpose algorithmic language and a library of Verilog opcode circuits. The opcode circuits implement the interpretation rules defined for the algorithmic language. The compiler produces as output a Verilog description of the custom computing machine. This description can be used for simulation, or for synthesis with commercial tools. The opcode library makes SAMUEL unique among other documented research work by raising the semantic level of the level 0 circuits. SAMUEL is also unique because the algorithmic language used is not a hardware description language, and it has not been modified in any way from the original language definition. Finally, SAMUEL is unique because the language chosen supports dynamic procedure definition. This allows a procedure to transform into a completely different procedure at runtime. This is language-supported reconfigurability, which enhances the current research trends in reconfigurable devices. Custom computing machines generated by SAMUEL can be described using the scheme given by Milutinovic as software translated, language corresponding, complex, directly executing architectural support for the high-level language Euler (1). The approach differs from other work, however, by exploiting the field programmability of gate arrays (and the freedom guaranteed by a simulation environment) to create custom computing machines that only support the required language opcodes. This is important when the limited real-estate space of programmable logic is considered. Averaged real-estate savings can be achieved by not implementing support for the entire language on every custom computing machine.

    Raw fabric hardware implementation and characterization

    Albert Sun. Thesis (M.Eng.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 109-110).
    The Raw architecture is scalable, improving performance not by pushing the limits of clock frequency, but by spreading computation across numerous simple, replicated tiles. The first Raw processors fabricated have 16 RISC processor tiles that share the workload. The Raw Fabric system extends Raw's scalability by weaving together multiple 16-tile Raw processors. The Raw Fabric is a modular and scalable system composed of two board types: one to house 4 Raw processors (Processor board) and one to handle communications (I/O board). The design is modular because it breaks down the system into smaller parts, and it is scalable because these modules may be combined to create large Fabrics. The ultimate goal is to produce a Raw Fabric with 16 Processor boards (equivalently, 64 Raw processors or 1024 tiles), though the current largest Fabric system includes one Processor board and 3 I/O boards. This thesis walks through the important design and implementation challenges and documents how they were solved. The most basic challenge was to design a system flexible enough to accommodate a variety of Fabric sizes. Next, the distribution of vital signals such as power and clock poses a problem unique to the Fabric system because of the possible size of the final product. Finally, the astounding number of signal wires running between boards presents a unique challenge in finding parts and designing the mechanical aspects. The intent of this thesis is to give the reader an idea of the considerations necessary for designing and implementing a system of this magnitude and level of flexibility.

    Reconfigurable Instruction Cell Architecture Reconfiguration and Interconnects
