101,479 research outputs found

    FPGAS and Database Engines Integration

    Get PDF
    This work focuses on the integration of an FPGA with a database engine. The purpose is to design and implement hardware-accelerated algorithms that are useful for the real-time work demands of modern databases in the world of data science and “Big Data” analysis. The work proposes an integration architecture of a commercial FPGA and an open source database. As an example of application, it is proposed the encryption and decryption of data using these services implemented in an FPGA. The algorithm used is the well-known 128-bit Advanced Encryption Standard (AES). The hardware-accelerated algorithm is used within the MySQL open-source database engine by using a UDF function in the API provided by the engine to test the algorithms implemented in hardware for processing large amounts of data. The FPGA is a commercial version of the manufacturer Xilinx.Consejo Nacional de Ciencia y Tecnologí

    Scout: a hardware-accelerated system for quantitatively driven visualization and analysis

    Get PDF
    Journal ArticleQuantitative techniques for visualization are critical to the successful analysis of both acquired and simulated scientific data. Many visualization techniques rely on indirect mappings, such as transfer functions, to produce the final imagery. In many situations, it is preferable and more powerful to express these mappings as mathematical expressions, or queries, that can then be directly applied to the data. In this paper, we present a hardware-accelerated system that provides such capabilities and exploits current graphics hardware for portions of the computational tasks that would otherwise be executed on the CPU. In our approach, the direct programming of the graphics processor using a concise data parallel language, gives scientists the capability to efficiently explore and visualize data sets

    Modeling of a hardware VLSI placement system: Accelerating the Simulated Annealing algorithm

    Get PDF
    An essential step in the automation of electronic design is the placement of the physical components on the target semiconductor die. The placement step presents the opportunity to reduce costs in terms of wire length and performance degradation; however it is compute intensive and is NP-complete in terms of obtaining an optimal solution. As designs have grown in complexity and gate count, obtaining an optimal solution is not feasible due to time to market constraints or sheer compute effort required. Heuristic algorithms allow for efficient but sub-optimal designs to be produced with a reduction in processing time. A widely used algorithm is Simulated Annealing (SA). The goal of this work was to develop a model that would enable an analysis into the feasibility of developing a hardware accelerated placement system which uses SA at its core. The SA heuristic was analyzed for possible improvements in efficiency with focus given to targeting the system for hardware. A solution implementing parallel computing with specialized hardware configurations inside a field programmable gate array (FPGA) was investigated as having the possibility to improve the efficiency of the SA-based algorithm. All supporting subsystems were also described for a hardware accelerated model. A large speedup was analytically shown from both accelerating the critical path of the SA algorithm as well as novel methods of improving SA\u27s efficiency. As data throughput requirements were not included in this work, the results presented may be optimistic for an overall system speedup. However, the results clearly show that future work is warranted in studying the concept of a hardware accelerated placement system

    Investigating the impact of image content on the energy efficiency of hardware-accelerated digital spatial filters

    Get PDF
    Battery-operated low-power portable computing devices are becoming an inseparable part of human daily life. One of the major goals is to achieve the longest battery life in such a device. Additionally, the need for performance in processing multimedia content is ever increasing. Processing image and video content consume more power than other applications. A widely used approach to improving energy efficiency is to implement the computationally intensive functions as digital hardware accelerators. Spatial filtering is one of the most commonly used methods of digital image processing. As per the Fourier theory, an image can be considered as a two-dimensional signal that is composed of spatially extended two-dimensional sinusoidal patterns called gratings. Spatial frequency theory states that sinusoidal gratings can be characterised by its spatial frequency, phase, amplitude, and orientation. This article presents results from our investigation into assessing the impact of these characteristics of a digital image on the energy efficiency of hardware-accelerated spatial filters employed to process the same image. Two greyscale images each of size 128 × 128 pixels comprising two-dimensional sinusoidal gratings at maximum spatial frequency of 64 cycles per image orientated at 0° and 90°, respectively, were processed in a hardware implemented Gaussian smoothing filter. The energy efficiency of the filter was compared with the baseline energy efficiency of processing a featureless plain black image. The results show that energy efficiency of the filter drops to 12.5% when the gratings are orientated at 0° whilst rises to 72.38% at 90°

    Hardware-accelerated interactive data visualization for neuroscience in Python.

    Get PDF
    Large datasets are becoming more and more common in science, particularly in neuroscience where experimental techniques are rapidly evolving. Obtaining interpretable results from raw data can sometimes be done automatically; however, there are numerous situations where there is a need, at all processing stages, to visualize the data in an interactive way. This enables the scientist to gain intuition, discover unexpected patterns, and find guidance about subsequent analysis steps. Existing visualization tools mostly focus on static publication-quality figures and do not support interactive visualization of large datasets. While working on Python software for visualization of neurophysiological data, we developed techniques to leverage the computational power of modern graphics cards for high-performance interactive data visualization. We were able to achieve very high performance despite the interpreted and dynamic nature of Python, by using state-of-the-art, fast libraries such as NumPy, PyOpenGL, and PyTables. We present applications of these methods to visualization of neurophysiological data. We believe our tools will be useful in a broad range of domains, in neuroscience and beyond, where there is an increasing need for scalable and fast interactive visualization

    Evaluating Rapid Application Development with Python for Heterogeneous Processor-based FPGAs

    Full text link
    As modern FPGAs evolve to include more het- erogeneous processing elements, such as ARM cores, it makes sense to consider these devices as processors first and FPGA accelerators second. As such, the conventional FPGA develop- ment environment must also adapt to support more software- like programming functionality. While high-level synthesis tools can help reduce FPGA development time, there still remains a large expertise gap in order to realize highly performing implementations. At a system-level the skill set necessary to integrate multiple custom IP hardware cores, interconnects, memory interfaces, and now heterogeneous processing elements is complex. Rather than drive FPGA development from the hardware up, we consider the impact of leveraging Python to ac- celerate application development. Python offers highly optimized libraries from an incredibly large developer community, yet is limited to the performance of the hardware system. In this work we evaluate the impact of using PYNQ, a Python development environment for application development on the Xilinx Zynq devices, the performance implications, and bottlenecks associated with it. We compare our results against existing C-based and hand-coded implementations to better understand if Python can be the glue that binds together software and hardware developers.Comment: To appear in 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM'17

    FPGA-accelerated machine learning inference as a service for particle physics computing

    Full text link
    New heterogeneous computing paradigms on dedicated hardware with increased parallelization, such as Field Programmable Gate Arrays (FPGAs), offer exciting solutions with large potential gains. The growing applications of machine learning algorithms in particle physics for simulation, reconstruction, and analysis are naturally deployed on such platforms. We demonstrate that the acceleration of machine learning inference as a web service represents a heterogeneous computing solution for particle physics experiments that potentially requires minimal modification to the current computing model. As examples, we retrain the ResNet-50 convolutional neural network to demonstrate state-of-the-art performance for top quark jet tagging at the LHC and apply a ResNet-50 model with transfer learning for neutrino event classification. Using Project Brainwave by Microsoft to accelerate the ResNet-50 image classification model, we achieve average inference times of 60 (10) milliseconds with our experimental physics software framework using Brainwave as a cloud (edge or on-premises) service, representing an improvement by a factor of approximately 30 (175) in model inference latency over traditional CPU inference in current experimental hardware. A single FPGA service accessed by many CPUs achieves a throughput of 600--700 inferences per second using an image batch of one, comparable to large batch-size GPU throughput and significantly better than small batch-size GPU throughput. Deployed as an edge or cloud service for the particle physics computing model, coprocessor accelerators can have a higher duty cycle and are potentially much more cost-effective.Comment: 16 pages, 14 figures, 2 table
    corecore