23 research outputs found

    130nm CMOS SAR-ADC with Low Complexity Digital Control Logic

    Get PDF
    This paper reports on an original approach to design the digital control logic of a Successive Approximation Register Analog to Digital Converter, where no sequencers or code registers are used. It turns out a low complexity digital circuitry, which is applied to the design of a 130nm CMOS 8-bit SAR ADC. The simulations demonstrate that the proposed digital control logic correctly works leading to an Analog to Digital Converter exhibiting performances well aligned with the literature in terms of linearity, dissipated power, and energy spent per bit generation

    Design of Intellectual Property-Based Hardware Blocks Integrable with Embedded RISC Processors

    Get PDF
    The main focus of this thesis is to research methods, architecture, and implementation of hardware acceleration for a Reduced Instruction Set Computer (RISC) platform. The target platform is a single-core general-purpose embedded processor (the COFFEE core) which was developed by our group at Tampere University of Technology. The COFFEE core alone cannot meet the requirements of the modern applications due to the lack of several components of which the Memory Management Unit (MMU) is one of the prominent ones. Since the MMU is one of the main requirements of today’s processors, COFFEE with no MMU was not able to run an operating system. In the design of the MMU, we employed two additional micro-Translation-Lookaside Buffers (TLBs) to speed up the translation process, as well as minimizing congestions of the data/instruction address translations with a unified TLB. The MMU is tightly-coupled with the COFFEE RISC core through the Peripheral Control Block (PCB) interface of the core. The hardware implementation, alongside some optimization techniques and post synthesis results are presented, as well.Another intention of this work is to prepare a reconfigurable platform to send and receive data packets of the next generation wireless communications. Hence, we will further discuss a recently emerged wireless modulation technique known as Non-Contiguous Orthogonal Frequency Division Multiplexing (NC-OFDM), a promising technique to alleviate spectrum scarcity problem. However, one of the primary concerns in such systems is the synchronization. To that end, we developed a reconfigurable hardware component to perform as a synchronizer. The developed module exploits Partial Reconfiguration (PR) feature in order to reconfigure itself. Eventually, we will come up with several architectural choices for systems with different limiting factors such as power consumption, operating frequency, and silicon area. The synchronizer can be loosely-coupled via one of the available co-processor slots of the target processor, the COFFEE RISC core.In addition, we are willing to improve the versatility of the COFFEE core even in industrial use cases. Hence, we developed a reconfigurable hardware component capable of operating in the Controller Area Network (CAN) protocol. In the first step of this implementation, we mainly concentrate on receiving, decoding, and extracting the data segment of a CAN-based packet. Moreover, this hardware block can reconfigure itself on-the-fly to operate on different data frames. More details regarding hardware implementation issues, as well as post synthesis results are also presented. The CAN module is loosely-coupled with the COFFEE RISC processor through one of the available co-processor block

    Design and development from single core reconfigurable accelerators to a heterogeneous accelerator-rich platform

    Get PDF
    The performance of a platform is evaluated based on its ability to deal with the processing of multiple applications of different nature. In this context, the platform under evaluation can be of homogeneous, heterogeneous or of hybrid architecture. The selection of an architecture type is generally based on the set of different target applications and performance parameters, where the applications can be of serial or parallel nature. The evaluation is normally based on different performance metrics, e.g., resource/area utilization, execution time, power and energy consumption. This process can also include high-level performance metrics, e.g., Operations Per Second (OPS), OPS/Watt, OPS/Hz, Watt/Area etc. An example of architecture selection can be related to a wireless communication system where the processing of computationally-intensive signal-processing algorithms has strict execution-time constraints and in this case, a platform with special-purpose accelerators is relatively more suitable than a typical homogeneous platform. A couple of decades ago, it was expensive to plant many special-purpose accelerators on a chip as the cost per unit area was relatively higher than today. The utilization wall is also becoming a limiting factor in homogeneous multicore scaling which means that all the cores on a platform cannot be operated at their maximum frequency due to a possible thermal meltdown. In this case, some of the processing cores have to be turned-off or to be operated at very low frequencies making most of the part of the chip to stay underutilized. A possible solution lies in the use of heterogeneous multicore platforms where many application-specific cores operate at lower frequencies, therefore reducing power dissipation density and increasing other performance parameters. However, to achieve maximum flexibility in processing, a general-purpose flavor can also be introduced by adding a few Reduced Instruction-Set Computing (RISC) cores. A power class of heterogeneous multicore platforms is an accelerator-rich platform where many application-specific accelerators are loosely connected with each other for work load distribution or to execute the tasks independently. This research work spans from the design and development of three different types of template-based Coarse-Grain Reconfigurable Arrays (CGRAs), i.e., CREMA, AVATAR and SCREMA to a Heterogeneous Accelerator-Rich Platform (HARP). The accelerators generated from the three CGRAs could perform different lengths and types of Fast Fourier Transform (FFT), real and complex Matrix-Vector Multiplication (MVM) algorithms. CREMA and AVATAR were fixed CGRAs with eight and sixteen number of Processing Element (PE) columns, respectively. SCREMA could flex between four, eight, sixteen and thirty two number of PE columns. Many case studies were conducted to evaluate the performance of the reconfigurable accelerators generated from these CGRA templates. All of these CGRAs work in a processor/coprocessor model tightly integrated with a Direct Memory Access (DMA) device. Apart from these platforms, a reconfigurable Application-Specific Instruction-set Processor (rASIP) is also designed, tested for FFT execution under IEEE-802.11n timing constraints and evaluated against a processor/coprocessor model. It was designed by integrating AVATAR generated radix-(2, 4) FFT accelerator into the datapath of a RISC processor. The instruction set of the RISC processor was extended to perform additional operations related to AVATAR. As mentioned earlier, the underutilized part of the chip, now-a-days called Dark Silicon is posing many challenges for the designers. Apart from software optimizations, clock gating, dynamic voltage/frequency scaling and other high-level techniques, one way of dealing with this problem is to use many application-specific cores. In an effort to maximize the number of reconfigurable processing resources on a platform, the accelerator-rich architecture HARP was designed and evaluated in terms of different performance metrics. HARP is constructed on a Network-on-Chip (NoC) of 3x3 nodes where with every node, a CGRA of application-specific size is integrated other than the central node which is attached to a RISC processor. The RISC establishes synchronization between the nodes for data transfer and also performs the supervisory control. While using the NoC as the backbone of communication between the cores, it becomes possible for all the cores to address each other and also perform execution simultaneously and independently of each other. The performance of accelerators generated from CREMA, AVATAR and SCREMA templates were evaluated individually and also when attached to HARP's NoC nodes. The individual CGRAs show promising results in their own capacity but when integrated all together in the framework of HARP, interesting comparisons were established in terms of overall execution times, resource utilization, operating frequencies, power and energy consumption. In evaluating HARP, estimates and measurements were also made in some advanced performance metrics, e.g., in MOPS/mW and MOPS/MHz. The overall research work promotes the idea of heterogeneous accelerator-rich platform as a solution to current problems and future needs of industry and academia

    Power and Energy Aware Heterogeneous Computing Platform

    Get PDF
    During the last decade, wireless technologies have experienced significant development, most notably in the form of mobile cellular radio evolution from GSM to UMTS/HSPA and thereon to Long-Term Evolution (LTE) for increasing the capacity and speed of wireless data networks. Considering the real-time constraints of the new wireless standards and their demands for parallel processing, reconfigurable architectures and in particular, multicore platforms are part of the most successful platforms due to providing high computational parallelism and throughput. In addition to that, by moving toward Internet-of-Things (IoT), the number of wireless sensors and IP-based high throughput network routers is growing at a rapid pace. Despite all the progression in IoT, due to power and energy consumption, a single chip platform for providing multiple communication standards and a large processing bandwidth is still missing.The strong demand for performing different sets of operations by the embedded systems and increasing the computational performance has led to the use of heterogeneous multicore architectures with the help of accelerators for computationally-intensive data-parallel tasks acting as coprocessors. Currently, highly heterogeneous systems are the most power-area efficient solution for performing complex signal processing systems. Additionally, the importance of IoT has increased significantly the need for heterogeneous and reconfigurable platforms.On the other hand, subsequent to the breakdown of the Dennardian scaling and due to the enormous heat dissipation, the performance of a single chip was obstructed by the utilization wall since all cores cannot be clocked at their maximum operating frequency. Therefore, a thermal melt-down might be happened as a result of high instantaneous power dissipation. In this context, a large fraction of the chip, which is switched-off (Dark) or operated at a very low frequency (Dim) is called Dark Silicon. The Dark Silicon issue is a constraint for the performance of computers, especially when the up-coming IoT scenario will demand a very high performance level with high energy efficiency. Among the suggested solution to combat the problem of Dark-Silicon, the use of application-specific accelerators and in particular Coarse-Grained Reconfigurable Arrays (CGRAs) are the main motivation of this thesis work.This thesis deals with design and implementation of Software Defined Radio (SDR) as well as High Efficiency Video Coding (HEVC) application-specific accelerators for computationally intensive kernels and data-parallel tasks. One of the most important data transmission schemes in SDR due to its ability of providing high data rates is Orthogonal Frequency Division Multiplexing (OFDM). This research work focuses on the evaluation of Heterogeneous Accelerator-Rich Platform (HARP) by implementing OFDM receiver blocks as designs for proof-of-concept. The HARP template allows the designer to instantiate a heterogeneous reconfigurable platform with a very large amount of custom-tailored computational resources while delivering a high performance in terms of many high-level metrics. The availability of this platform lays an excellent foundation to investigate techniques and methods to replace the Dark or Dim part of chip with high-performance silicon dissipating very low power and energy. Furthermore, this research work is also addressing the power and energy issues of the embedded computing systems by tailoring the HARP for self-aware and energy-aware computing models. In this context, the instantaneous power dissipation and therefore the heat dissipation of HARP are mitigated on FPGA/ASIC by using Dynamic Voltage and Frequency Scaling (DVFS) to minimize the dark/dim part of the chip. Upgraded HARP for self-aware and energy-aware computing can be utilized as an energy-efficient general-purpose transceiver platform that is cognitive to many radio standards and can provide high throughput while consuming as little energy as possible. The evaluation of HARP has shown promising results, which makes it a suitable platform for avoiding Dark Silicon in embedded computing platforms and also for diverse needs of IoT communications.In this thesis, the author designed the blocks of OFDM receiver by crafting templatebased CGRA devices and then attached them to HARP’s Network-on-Chip (NoC) nodes. The performance of application-specific accelerators generated from templatebased CGRAs, the performance of the entire platform subsequent to integrating the CGRA nodes on HARP and the NoC traffic are recorded in terms of several highlevel performance metrics. In evaluating HARP on FPGA prototype, it delivers a performance of 0.012 GOPS/mW. Because of the scalability and regularity in HARP, the author considered its value as architectural constant. In addition to showing the gain and the benefits of maximizing the number of reconfigurable processing resources on a platform in comparison to the scaled performance of several state-of-the-art platforms, HARP’s architectural constant ensures application-independent figure of merit. HARP is further evaluated by implementing various sizes of Discrete Cosine transform (DCT) and Discrete Sine Transform (DST) dedicated for HEVC standard, which showed its ability to sustain Full HD 1080p format at 30 fps on FPGA. The author also integrated self-aware computing model in HARP to mitigate the power dissipation of an OFDM receiver. In the case of FPGA implementation, the total power dissipation of the platform showed 16.8% reduction due to employing the Feedback Control System (FCS) technique with Dynamic Frequency Scaling (DFS). Furthermore, by moving to ASIC technology and scaling both frequency and voltage simultaneously, significant dynamic power reduction (up to 82.98%) was achieved, which proved the DFS/DVFS techniques as one step forward to mitigate the Dark Silicon issue

    SMARAD - Centre of Excellence in Smart Radios and Wireless Research - Activity Report 2011 - 2013

    Get PDF
    Centre of Excellence in Smart Radios and Wireless Research (SMARAD), originally established with the name Smart and Novel Radios Research Unit, is aiming at world-class research and education in Future radio and antenna systems, Cognitive radio, Millimetre wave and THz techniques, Sensors, and Materials and energy, using its expertise in RF, microwave and millimeter wave engineering, in integrated circuit design for multi-standard radios as well as in wireless communications. SMARAD has the Centre of Excellence in Research status from the Academy of Finland since 2002 (2002-2007 and 2008-2013). Currently SMARAD consists of five research groups from three departments, namely the Department of Radio Science and Engineering, Department of Micro and Nanosciences, and Department of Signal Processing and Acoustics, all within the Aalto University School of Electrical Engineering. The total number of employees within the research unit is about 100 including 8 professors, about 30 senior scientists and about 40 graduate students and several undergraduate students working on their Master thesis. The relevance of SMARAD to the Finnish society is very high considering the high national income from exports of telecommunications and electronics products. The unit conducts basic research but at the same time maintains close co-operation with industry. Novel ideas are applied in design of new communication circuits and platforms, transmission techniques and antenna structures. SMARAD has a well-established network of co-operating partners in industry, research institutes and academia worldwide. It coordinates a few EU projects. The funding sources of SMARAD are diverse including the Academy of Finland, EU, ESA, Tekes, and Finnish and foreign telecommunications and semiconductor industry. As a by-product of this research SMARAD provides highest-level education and supervision to graduate students in the areas of radio engineering, circuit design and communications through Aalto University and Finnish graduate schools. During years 2011 – 2013, 18 doctor degrees were awarded to the students of SMARAD. In the same period, the SMARAD researchers published 197 refereed journal articles and 360 conference papers

    Design and Silicon Area Optimization of Time-Domain GNSS Receiver Baseband Architectures

    Get PDF
    The use of Global Navigation Satellite Systems (GNSSs) in a wide range of portable devices has exploded in the recent years. Demands for a lower cost while expecting longer battery life and better performance are constantly increasing. The general GNSS receiver operation and algorithms are already well studied in the literature, but the hardware architectures and designs have not been discussed in detail.This thesis introduces a high level gate count estimation method that provides good accuracy without requiring the hardware being fully specified. It is based on developing hierarchical models, which are parameterizable, while requiring minimal amount of information about the silicon technology used for the implementation. The average accuracy has been shown to be 4%.Three time-domain, real-time GNSS receiver baseband architectures are described with a discussion about various optimization methods for efficient implementation: the correlator, the matched filter, and the group correlator, which is a new architecture combining some of the features of the two first ones.Four use cases are defined for different GNSS operating modes: Acquisition, tracking, assisted GNSS, and the combination of the first three modes. A comparison is made for receiver basebands including all necessary blocks for full functionality to find out which of the three architectures provides the most silicon area efficient implementation.It is shown that the correlator offers good flexibility, but yields the highest silicon area for acquisition use cases. The matched filter is best suited for the acquisition, but has large overhead when it comes to tracking the signals. The group correlator offers a reasonably good flexibility and area efficiency in all use cases.The main contributions of the thesis are: Development of domain specific optimizations for GNSS receivers and an accurate gate count estimation method, which are applied for a quantitative comparison of different GNSS receiver architectures. The results show that no single architecture excels in all cases, and the best choice depends on the actual use case

    Architecture and network-on-chip implementation of a new hierarchical interconnection network

    Get PDF
    A Midimew-connected Mesh Network (MMN) is a minimal distance mesh with wrap-around links network of multiple basic modules (BMs), in which the BMs are 2D-mesh networks that are hierarchically interconnected for higher-level networks. In this paper, we present the architecture of the MMN, addressing of node, routing of message, and evaluate the static network performance of MMN, TESH, mesh and torus networks. In addition, we propose the network-on-chip (NoC) implementation of MMN. With innovative combination of diagonal and hierarchical structure, the MMN possesses several attractive features, including constant degree, small diameter, low cost, small average distance, moderate bisection width and high fault tolerant performance than that of other conventional and hierarchical interconnection networks. The simple architecture of MMN is also highly suitable for NoC implementation. To implement all the links of level-3 MMN, only four layers are needed which is feasible with current and future VLSI technologies

    Transaction Generator - Tool for Network-on-Chip Benchmarking

    Get PDF
    This thesis focuses on benchmarking on-chip communication networks. Multiprocessor System-on-Chips (MP-SoC) utilizing the Network-on-Chip (NoC) paradigm are becoming more prominent. Required communication capabilities differ considerably among the diverse set of application categories. A standard and commonly used benchmarking methodology for networks is needed to ease finding a suitable network topology and its configuration parameters for those applications. This thesis presents a simulation based tool Transaction Generator (TG) for evaluating NoCs. TG conforms the Open Core Protocol - International Partnership (OCP-IP) NoC benchmarking group's proposed methodology. TG relies on abstract task graphs made after real Multiprocessor System-on-Chip (MP-SoC) applications or synthetic test cases. TG simulates the workload tasks on Processing Elements (PEs) and generates the network traffc accordingly and collects statistics. TG was initially introduced in 2003 and this thesis presents the current state of the tool and the modifications made. The work for this thesis includes refactoring the whole program from a TCL and C++ SystemC 1 based code generator to a C++ SystemC 2 based dynamic simulation kernel. In addition to the refactoring, new features were implemented, such as memory modeling with the Accurate Dynamic Random Access Memory (DRAM) Model (ADM) package, possibility of simulating Mobile Computing System Lab (MCSL) NoC Tra c Patterns workload models and diversity to modeling the workload. The current implementation of TG consists of 10k lines of code for the simulator core, the result of this thesis, and 50k lines of code for the support programs and example NoC models. Thesis presents 3 example use cases requiring around 100 simulations, which can be executed and analyzed in a work day with the TG

    Design and Implementation of MIMO OFDM IEEE802.11n Receiver Blocks on Heterogeneous Multicore Architecture

    Get PDF
    In this thesis, the performance of a heterogeneous multicore platform in terms of technical capability is evaluated. Therefore, the choice of architecture in general can be based on a set of diverse applications. Selected applications can be parallel or serial in nature. Applications evaluation are often based on various performance metrics including the resource utilization and execution time. The wireless communication systems are expanded to accelerate their functions execution in both software and hardware. The embedded systems which involve several types of communication systems perform a large number of computations which require short execution time and minimized power consumption. Also, there is a growing demand for application-specific accelerators aiding general-purpose. One feasible way is to use heterogeneous multi-core platforms. Furthermore, many application-specific accelerators are loosely connected with each other. In this study, the implementation of Multiple-Input Multiple-Output (MIMO) Orthogonal Frequency Division Multiplexing (OFDM) receiver is evaluated by applying a Heterogeneous Multicore Architecture (HMA). The MIMO OFDM receiver is composed of computationally intensive and general-purpose processing tasks and can serve maximum coverage for evaluation of the HMA. The receiver blocks are designed by crafting template-based Coarse-grained Reconfigurable Array (CGRA) devices. In this case study, four streams (antennas) are proposed in order to process the data over CGRAs simultaneously. HMA nodes will be reconfigured at run-time in different blocks of the receiver. In this experimental work, according to the performance of each CGRA, the collective performance of the entire platform as well as NoC traffic is recorded considering the number of clock cycles and also several high-level performance criteria. The implementation of OFDM receiver scaled CGRAs to various dimensions. The data can also be exchanged between diverse nodes on the NoC structure by utilizing direct memory access (DMA) devices independently

    Design for energy-efficient and reliable fog-assisted healthcare IoT systems

    Get PDF
    Cardiovascular disease and diabetes are two of the most dangerous diseases as they are the leading causes of death in all ages. Unfortunately, they cannot be completely cured with the current knowledge and existing technologies. However, they can be effectively managed by applying methods of continuous health monitoring. Nonetheless, it is difficult to achieve a high quality of healthcare with the current health monitoring systems which often have several limitations such as non-mobility support, energy inefficiency, and an insufficiency of advanced services. Therefore, this thesis presents a Fog computing approach focusing on four main tracks, and proposes it as a solution to the existing limitations. In the first track, the main goal is to introduce Fog computing and Fog services into remote health monitoring systems in order to enhance the quality of healthcare. In the second track, a Fog approach providing mobility support in a real-time health monitoring IoT system is proposed. The handover mechanism run by Fog-assisted smart gateways helps to maintain the connection between sensor nodes and the gateways with a minimized latency. Results show that the handover latency of the proposed Fog approach is 10%-50% less than other state-of-the-art mobility support approaches. In the third track, the designs of four energy-efficient health monitoring IoT systems are discussed and developed. Each energy-efficient system and its sensor nodes are designed to serve a specific purpose such as glucose monitoring, ECG monitoring, or fall detection; with the exception of the fourth system which is an advanced and combined system for simultaneously monitoring many diseases such as diabetes and cardiovascular disease. Results show that these sensor nodes can continuously work, depending on the application, up to 70-155 hours when using a 1000 mAh lithium battery. The fourth track mentioned above, provides a Fog-assisted remote health monitoring IoT system for diabetic patients with cardiovascular disease. Via several proposed algorithms such as QT interval extraction, activity status categorization, and fall detection algorithms, the system can process data and detect abnormalities in real-time. Results show that the proposed system using Fog services is a promising approach for improving the treatment of diabetic patients with cardiovascular disease
    corecore