47 research outputs found

    An empirical evaluation of High-Level Synthesis languages and tools for database acceleration

    Get PDF
    High Level Synthesis (HLS) languages and tools are emerging as the most promising technique to make FPGAs more accessible to software developers. Nevertheless, picking the most suitable HLS for a certain class of algorithms depends on requirements such as area and throughput, as well as on programmer experience. In this paper, we explore the different trade-offs present when using a representative set of HLS tools in the context of Database Management Systems (DBMS) acceleration. More specifically, we conduct an empirical analysis of four representative frameworks (Bluespec SystemVerilog, Altera OpenCL, LegUp and Chisel) that we utilize to accelerate commonly-used database algorithms such as sorting, the median operator, and hash joins. Through our implementation experience and empirical results for database acceleration, we conclude that the selection of the most suitable HLS depends on a set of orthogonal characteristics, which we highlight for each HLS framework.Peer ReviewedPostprint (author’s final draft

    Embedded Machine Learning: Emphasis on Hardware Accelerators and Approximate Computing for Tactile Data Processing

    Get PDF
    Machine Learning (ML) a subset of Artificial Intelligence (AI) is driving the industrial and technological revolution of the present and future. We envision a world with smart devices that are able to mimic human behavior (sense, process, and act) and perform tasks that at one time we thought could only be carried out by humans. The vision is to achieve such a level of intelligence with affordable, power-efficient, and fast hardware platforms. However, embedding machine learning algorithms in many application domains such as the internet of things (IoT), prostheses, robotics, and wearable devices is an ongoing challenge. A challenge that is controlled by the computational complexity of ML algorithms, the performance/availability of hardware platforms, and the application\u2019s budget (power constraint, real-time operation, etc.). In this dissertation, we focus on the design and implementation of efficient ML algorithms to handle the aforementioned challenges. First, we apply Approximate Computing Techniques (ACTs) to reduce the computational complexity of ML algorithms. Then, we design custom Hardware Accelerators to improve the performance of the implementation within a specified budget. Finally, a tactile data processing application is adopted for the validation of the proposed exact and approximate embedded machine learning accelerators. The dissertation starts with the introduction of the various ML algorithms used for tactile data processing. These algorithms are assessed in terms of their computational complexity and the available hardware platforms which could be used for implementation. Afterward, a survey on the existing approximate computing techniques and hardware accelerators design methodologies is presented. Based on the findings of the survey, an approach for applying algorithmic-level ACTs on machine learning algorithms is provided. Then three novel hardware accelerators are proposed: (1) k-Nearest Neighbor (kNN) based on a selection-based sorter, (2) Tensorial Support Vector Machine (TSVM) based on Shallow Neural Networks, and (3) Hybrid Precision Binary Convolution Neural Network (BCNN). The three accelerators offer a real-time classification with monumental reductions in the hardware resources and power consumption compared to existing implementations targeting the same tactile data processing application on FPGA. Moreover, the approximate accelerators maintain a high classification accuracy with a loss of at most 5%

    Large-Block Multi-rate Streaming Sort

    Get PDF

    Hardware acceleration of the trace transform for vision applications

    Get PDF
    Computer Vision is a rapidly developing field in which machines process visual data to extract meaningful information. Digitised images in their pixels and bits serve no purpose of their own. It is only by interpreting the data, and extracting higher level information that a scene can be understood. The algorithms that enable this process are often complex, and data-intensive, limiting the processing rate when implemented in software. Hardware-accelerated implementations provide a significant performance boost that can enable real- time processing. The Trace Transform is a newly proposed algorithm that has been proven effective in image categorisation and recognition tasks. It is flexibly defined allowing the mathematical details to be tailored to the target application. However, it is highly computationally intensive, which limits its applications. Modern heterogeneous FPGAs provide an ideal platform for accelerating the Trace transform for real-time performance, while also allowing an element of flexibility, which highly suits the generality of the Trace transform. This thesis details the implementation of an extensible Trace transform architecture for vision applications, before extending this architecture to a full flexible platform suited to the exploration of Trace transform applications. As part of the work presented, a general set of architectures for large-windowed median and weighted median filters are presented as required for a number of Trace transform implementations. Finally an acceleration of Pseudo 2-Dimensional Hidden Markov Model decoding, usable in a person detection system, is presented. Such a system can be used to extract frames of interest from a video sequence, to be subsequently processed by the Trace transform. All these architectures emphasise the need for considered, platform-driven design in achieving maximum performance through hardware acceleration

    High-dimensional decoy-state quantum key distribution over 0.3 km of multicore telecommunication optical fibers

    Get PDF
    Multiplexing is a strategy to augment the transmission capacity of a communication system. It consists of combining multiple signals over the same data channel and it has been very successful in classical communications. However, the use of enhanced channels has only reached limited practicality in quantum communications (QC) as it requires the complex manipulation of quantum systems of higher dimensions. Considerable effort is being made towards QC using high-dimensional quantum systems encoded into the transverse momentum of single photons but, so far, no approach has been proven to be fully compatible with the existing telecommunication infrastructure. Here, we overcome such a technological challenge and demonstrate a stable and secure high-dimensional decoy-state quantum key distribution session over a 0.3 km long multicore optical fiber. The high-dimensional quantum states are defined in terms of the multiple core modes available for the photon transmission over the fiber, and the decoy-state analysis demonstrates that our technique enables a positive secret key generation rate up to 25 km of fiber propagation. Finally, we show how our results build up towards a high-dimensional quantum network composed of free-space and fiber based linksComment: Please see the complementary work arXiv:1610.01812 (2016

    Doctor of Philosophy

    Get PDF
    dissertationIn-memory big data applications are growing in popularity, including in-memory versions of the MapReduce framework. The move away from disk-based datasets shifts the performance bottleneck from slow disk accesses to memory bandwidth. MapReduce is a data-parallel application, and is therefore amenable to being executed on as many parallel processors as possible, with each processor requiring high amounts of memory bandwidth. We propose using Near Data Computing (NDC) as a means to develop systems that are optimized for in-memory MapReduce workloads, offering high compute parallelism and even higher memory bandwidth. This dissertation explores three different implementations and styles of NDC to improve MapReduce execution. First, we use 3D-stacked memory+logic devices to process the Map phase on compute elements in close proximity to database splits. Second, we attempt to replicate the performance characteristics of the 3D-stacked NDC using only commodity memory and inexpensive processors to improve performance of both Map and Reduce phases. Finally, we incorporate fixed-function hardware accelerators to improve sorting performance within the Map phase. This dissertation shows that it is possible to improve in-memory MapReduce performance by potentially two orders of magnitude by designing system and memory architectures that are specifically tailored to that end

    Estimation of power generation potential of nonwoody biomass species

    Get PDF
    In view of high energy potentials in non-woody biomass species and an increasing interest in their utilization for power generation, an attempt has been made in this study to assess the proximate analysis and energy content of different components of Sida rhombifolia, Xanthium strumarium, Anisomeles lamiaceae and Eupatorium coelestinum biomass species (both non-woody), and their impact on power generation and land requirement for energy plantations. The net energy content in Sida is the highest. Xanthium biomass species appears to have slightly higher calorific values in its components than those of Anisomeles and Eupatorium. The pattern of variation of calorific value in the components like stump, branch, leaf and bark is not identical for all the presently studied biomass species. In all these studied biomass species, the calorific values of leaves, in general, are after stump and branch. The data for proximate and ultimate analysis of the components of these species are very close to each other and hence it is very difficult to draw a concrete conclusion. However, it appears from the present work that Eupatorium and Xanthium biomass species have the highest fixed carbon and lowest volatile matter contents in their stumps than the stumps of the others. As for ash fusion temperature The Sida biomass species has the highest values of IDT, ST, HT and FT (7860- 1490˚C) for its ash, followed by Anisomeles (740-1441˚C). Xanthium and Eupatorium have lower values for IDT, ST, HT and FT (670- 1244˚C) for their ashes. The results have shown that approximately 4, 7, 6 and 2 hectares of land are required to generate 20,000 kWh/day electricity from Sida rhombifolia, Eupatorium coelestinum, Xanthium strumarium and Anisomeles lamiaceae biomass species. Coal samples, obtained from six different local mines, were also examined for their qualities and the results were compared with those of studied biomass materials. This comparison reveals much higher power output with negligible emission of suspended particulate matters (SPM) from biomass materials

    A low power selective median filter design

    Get PDF
    A selective median filter which consumes less power has been designed and different logics for majority bit evaluation has been applied and simulated in VHDL .It is rightly called as selective because an edge pixel detector [2] has been used to select those pixels which are to be processed through median filter. As for median value calculation; sorting of 3 x 3 window’s pixel values has been done using majority bit circuit [4].Different majority bit calculation method has been implemented and the result sorting circuit has been analyzed for power analysis. In this work a general median filter which uses binary sorting method known as Majority Voting Circuit (MVC) has been designed using VHDL and optimized using SYNOPSIS which has used 0.13μm CMOS technology .The digital design of sorting circuit saves approximately 60% of power comprising of cell leakage and dynamic power comparing to a mixed signal design of Floating gate based Majority bit median filter [4]. Before operating median filter on each pixel double derivative filter [2] has been applied to check whether it is an edge pixel or not. Overall this is a digital design of a mixed filter which preserves edges and removes noises as well.Low power techniques at logic level and algorithmic level have been embedded into this work. In our work we have also designed a small microprocessor using VHDL code. Later a memory (for the purpose of image storing) based Control Unit for single median value evaluation has been designed and simulated in XILINX. Here for sorting circuit a common logic based circuit (component) has been put forward. The power, latency or delay, area of whole design has been compared and tested with other designs
    corecore