
    A proxy for reliable 5G (and beyond) mmWave communications. Contributions to multi-path scheduling for a reliability focused mmWave proxy

    Reliable, consistent, and very high data rate mobile communication will become especially important for future services, such as emergency communication. MmWave technology provides the needed capacity but lacks reliability because of the abrupt capacity changes that any single path experiences. Intelligently using a varying number of available mmWave paths, efficiently scheduling data across those paths (perhaps even through multi-operator agreements), and balancing mobile power consumption against path costs and the need for reliable, consistent quality will be critical to achieving this aim. This thesis considers the multipath scheduling problem in a mmWave proxy whose paths have dynamically changing characteristics. To address this problem, a hybrid scheduler is proposed and its performance is compared with the Round Robin, Random, and Highest Capacity First schedulers. Forward error correction is explored as a means of enhancing the scheduling. Keywords: Multipath Scheduling, mmWave Proxy, Forward Error Correction, beyond 5G
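
    A minimal sketch of the scheduling policies named above, under a simplified model in which each path advertises a per-slot capacity in packets: packets are assigned either in strict rotation (Round Robin) or to whichever path currently has the most remaining capacity (Highest Capacity First). The path names and capacity figures are hypothetical illustrations, not values from the thesis.

```python
import itertools

def round_robin(paths, packets):
    """Assign packets to paths in strict rotation, ignoring capacity."""
    cyc = itertools.cycle(paths)
    return {pkt: next(cyc) for pkt in packets}

def highest_capacity_first(paths, packets, capacity):
    """Send each packet on the path that currently reports the most spare capacity."""
    assignment, remaining = {}, dict(capacity)
    for pkt in packets:
        best = max(paths, key=lambda p: remaining[p])
        assignment[pkt] = best
        remaining[best] = max(0, remaining[best] - 1)
    return assignment

# Three hypothetical mmWave paths whose capacities change abruptly between slots.
paths = ["path_A", "path_B", "path_C"]
capacity = {"path_A": 5, "path_B": 1, "path_C": 3}   # packets per slot
packets = list(range(8))
print(round_robin(paths, packets))
print(highest_capacity_first(paths, packets, capacity))
```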

    Hardware Considerations for Signal Processing Systems: A Step Toward the Unconventional.

    As we progress into the future, signal processing algorithms are becoming more computationally intensive and power hungry, while the demand for mobile products and low-power devices is also increasing. An integrated ASIC solution is one of the primary ways chip developers can improve performance and add functionality while keeping the power budget low. This work discusses ASIC hardware for both conventional and unconventional signal processing systems, and how integration, error resilience, emerging devices, and new algorithms can be leveraged by signal processing systems to further improve performance and enable new applications. Specifically, this work presents three case studies: 1) a conventional and highly parallel mixed-signal cross-correlator ASIC for a weather satellite performing real-time synthetic aperture imaging, 2) an unconventional native stochastic computing architecture enabled by memristors, and 3) two unconventional sparse neural network ASICs for feature extraction and object classification. As improvements from technology scaling alone slow down and the demand for energy-efficient mobile electronics increases, such optimization techniques at the device, circuit, and system level will become more critical to advance signal processing capabilities in the future. PhD, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/116685/1/knagphil_1.pd
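
    As background for the second case study, the sketch below shows the textbook idea behind native stochastic computing: a value in [0, 1] is encoded as the fraction of 1s in a random bitstream, so a bitwise AND of two streams approximates multiplication. This is a generic software illustration of the principle, not the memristor-enabled architecture described in the dissertation.

```python
import random

def to_bitstream(p, length=4096):
    """Encode a probability p in [0, 1] as a random bitstream with P(bit = 1) = p."""
    return [1 if random.random() < p else 0 for _ in range(length)]

def stochastic_multiply(p_a, p_b, length=4096):
    """Multiply two encoded values by AND-ing their bitstreams and counting 1s."""
    a, b = to_bitstream(p_a, length), to_bitstream(p_b, length)
    return sum(x & y for x, y in zip(a, b)) / length

print(stochastic_multiply(0.5, 0.8))   # close to 0.4; accuracy grows with stream length
```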

    Performance and Memory Space Optimizations for Embedded Systems

    Embedded systems have three common principles: real-time performance, low power consumption, and low price (limited hardware). Embedded computers use chip multiprocessors (CMPs) to meet these expectations. However, one of the major problems is the lack of efficient software support for CMPs; in particular, automated code parallelizers are needed. The aim of this study is to explore various ways to increase performance, as well as to reduce resource usage and energy consumption, for embedded systems. We use code restructuring, loop scheduling, data transformation, code and data placement, and scratch-pad memory (SPM) management as our tools in different embedded system scenarios. The majority of our work is focused on loop scheduling. The main contributions of our work are the following. We propose a memory-saving strategy that exploits the value locality in array data by storing arrays in a compressed form. Based on the compressed forms of the input arrays, our approach automatically determines the compressed forms of the output arrays and also automatically restructures the code. We propose and evaluate a compiler-directed code scheduling scheme that considers both parallelism and data locality. It analyzes the code using a locality-parallelism graph representation and assigns the nodes of this graph to processors. We also introduce an Integer Linear Programming based formulation of the scheduling problem. We propose a compiler-based SPM-conscious loop scheduling strategy for array/loop based embedded applications. The method distributes loop iterations across parallel processors in an SPM-conscious manner: the compiler identifies potential SPM hits and misses and distributes loop iterations such that the processors have close execution times. We present an SPM management technique using Markov chain based data access modeling. We propose a compiler-directed integrated code and data placement scheme for 2-D mesh based CMP architectures: using a Code-Data Affinity Graph (CDAG) to represent the relationship between loop iterations and array data, it assigns sets of loop iterations to processing cores and sets of data blocks to on-chip memories. Finally, we present a memory bank aware dynamic loop scheduling scheme for array-intensive applications, whose goal is to minimize the number of memory banks needed for executing the group of loop iterations.
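
    The SPM-conscious loop scheduling idea can be pictured with a toy balancing routine: if the compiler can estimate a per-iteration cost depending on whether the iteration's data hits the scratch-pad or misses to off-chip memory, it can distribute iterations so the processors finish at roughly the same time. The greedy heuristic and the cost numbers below are illustrative assumptions, not the compiler algorithm evaluated in the thesis.

```python
import heapq

def spm_conscious_schedule(iteration_costs, num_procs):
    """Greedily assign loop iterations (each with an SPM-hit or SPM-miss cost)
    to processors so that their estimated execution times stay close."""
    heap = [(0.0, p) for p in range(num_procs)]   # (accumulated cost, processor id)
    heapq.heapify(heap)
    assignment = {p: [] for p in range(num_procs)}
    # Place expensive (SPM-miss) iterations first for a tighter balance.
    for it, cost in sorted(iteration_costs.items(), key=lambda kv: -kv[1]):
        load, proc = heapq.heappop(heap)
        assignment[proc].append(it)
        heapq.heappush(heap, (load + cost, proc))
    return assignment

# Hypothetical costs: 1 cycle-unit for an SPM hit, 10 for a miss to off-chip memory.
costs = {i: (1 if i % 4 else 10) for i in range(16)}
print(spm_conscious_schedule(costs, num_procs=4))
```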

    Novel LDPC coding and decoding strategies: design, analysis, and algorithms

    In this digital era, modern communication systems play an essential part in nearly every aspect of life, with examples ranging from mobile networks and satellite communications to the Internet and data transfer. Unfortunately, all communication systems in a practical setting are noisy, which means we can either improve the physical characteristics of the channel or find a systematic solution, i.e. error control coding. The history of error control coding dates back to 1948, when Claude Shannon published his celebrated work “A Mathematical Theory of Communication”, which built a framework for channel coding, source coding and information theory. For the first time, we saw evidence of the existence of channel codes that enable reliable communication as long as the information rate of the code does not surpass the so-called channel capacity. Nevertheless, in the following 60 years no codes were shown to closely approach this theoretical bound until the arrival of turbo codes and the renaissance of LDPC codes. As a strong contender to turbo codes, the advantages of LDPC codes include parallel implementation of decoding algorithms and, more crucially, graphical construction of codes. However, there are also drawbacks to LDPC codes, e.g. significant performance degradation due to the presence of short cycles, or very high decoding latency. In this thesis, we focus on the practical realisation of finite-length LDPC codes and devise algorithms to tackle those issues. Firstly, rate-compatible (RC) LDPC codes with short/moderate block lengths are investigated on the basis of optimising the graphical structure of the Tanner graph (TG), in order to achieve a variety of code rates (0.1 < R < 0.9) using only a single encoder-decoder pair. As is widely recognised in the literature, the presence of short cycles considerably reduces the overall performance of LDPC codes, which significantly limits their application in communication systems. To reduce the impact of short cycles effectively for different code rates, algorithms for counting short cycles and a graph-related metric called Extrinsic Message Degree (EMD) are applied in the development of the proposed puncturing and extension techniques. A complete set of simulations is carried out to demonstrate that the proposed RC designs can largely minimise the performance loss caused by puncturing or extension. Secondly, at the decoding end, we study novel decoding strategies that compensate for the negative effect of short cycles by reweighting part of the extrinsic messages exchanged between the nodes of a TG. The proposed reweighted belief propagation (BP) algorithms aim to implement efficient decoding, i.e. accurate signal reconstruction and low decoding latency, for LDPC codes via various design methods. A variable factor appearance probability belief propagation (VFAP-BP) algorithm is proposed, along with an improved version called the locally-optimized reweighted (LOW)-BP algorithm, both of which can be employed to significantly enhance decoding performance for regular and irregular LDPC codes. More importantly, the optimisation of reweighting parameters takes place only in an offline stage, so that no additional computational complexity is required during the real-time decoding process. Lastly, two iterative detection and decoding (IDD) receivers are presented for multiple-input multiple-output (MIMO) systems operating in a spatial multiplexing configuration. QR decomposition (QRD)-type IDD receivers utilise the proposed multiple-feedback (MF)-QRD or variable-M (VM)-QRD detection algorithm with a standard BP decoding algorithm, while knowledge-aided (KA)-type receivers are equipped with a simple soft parallel interference cancellation (PIC) detector and the proposed reweighted BP decoders. In the uncoded scenario, the proposed MF-QRD and VM-QRD algorithms are shown to approach optimal performance while requiring reduced computational complexity. In the LDPC-coded scenario, simulation results illustrate that the proposed QRD-type IDD receivers offer near-optimal performance after a small number of detection/decoding iterations, and the proposed KA-type IDD receivers significantly outperform receivers using alternative decoding algorithms while requiring similar decoding complexity.
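
    One ingredient mentioned above, counting short cycles in the Tanner graph, has a particularly compact form for length-4 cycles: every pair of check rows of the parity-check matrix that shares two or more variable columns contributes one cycle per shared column pair. The small matrix below is a made-up example; the counting and EMD algorithms used in the thesis are more general.

```python
from itertools import combinations

def count_4_cycles(H):
    """Count length-4 cycles in the Tanner graph of parity-check matrix H:
    each pair of rows sharing k >= 2 columns contributes k*(k-1)/2 cycles."""
    cycles = 0
    for r1, r2 in combinations(range(len(H)), 2):
        shared = sum(1 for a, b in zip(H[r1], H[r2]) if a and b)
        cycles += shared * (shared - 1) // 2
    return cycles

# Small hypothetical parity-check matrix (rows = check nodes, columns = variable nodes).
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0],
]
print(count_4_cycles(H))   # the short cycles that puncturing/extension designs try to avoid
```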

    Channel Coding in Molecular Communication

    This dissertation establishes and analyzes a complete molecular transmission system from a communication engineering perspective. Its focus is on diffusion-based molecular communication in an unbounded three-dimensional fluid medium. As a basis for the investigation of transmission algorithms, an equivalent discrete-time channel model (EDTCM) is developed, and the channel is characterized by an analytical derivation, a random walk based simulation, a trained artificial neural network (ANN), and a proof-of-concept testbed setup. The investigated transmission algorithms cover modulation schemes at the transmitter side, as well as channel equalizers and detectors at the receiver side. In addition to the evaluation of state-of-the-art techniques and the introduction of orthogonal frequency-division multiplexing (OFDM), of particular importance are the novel variable concentration shift keying (VCSK) modulation adapted to the diffusion-based transmission channel, the low-complexity adaptive threshold detector (ATD) working without explicit channel knowledge, the low-complexity soft-output piecewise linear detector (PLD), and the optimal a posteriori probability (APP) detector. To improve the error-prone information transmission, block codes, convolutional codes, line codes, spreading codes and spatial codes are investigated. The analysis is carried out under various approaches of normalization, and gains or losses compared to uncoded transmission are highlighted. In addition to state-of-the-art forward error correction (FEC) codes, novel line codes adapted to the error statistics of the diffusion-based channel are proposed. Moreover, the turbo principle is introduced into the field of molecular communication, where extrinsic information is exchanged iteratively between detector and decoder. By means of an extrinsic information transfer (EXIT) chart analysis, the potential of iterative processing is shown and the communication channel capacity is computed, which represents the theoretical performance limit for the system under investigation. In addition, the construction of an irregular convolutional code (IRCC) using the EXIT chart is presented and its performance capability is demonstrated. For the evaluation of all considered transmission algorithms, the bit error rate (BER) performance is chosen. The BER is determined by means of Monte Carlo simulations and, for some algorithms, by theoretical derivation.
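
    The equivalent discrete-time channel model and the adaptive threshold detector can be illustrated with a very small simulation: on-off keying of molecule releases, a few slowly decaying ISI taps standing in for the diffusion channel, and a threshold computed from the recent received mean. The tap values, noise model and averaging window below are assumptions made for illustration, not the quantities derived in the dissertation.

```python
import random

CHANNEL_TAPS = [1.0, 0.45, 0.2]   # hypothetical EDTCM taps: current symbol plus two ISI taps

def transmit(bits, molecules_per_one=100, noise_std=8.0):
    """On-off keying: release molecules for a 1, none for a 0; add counting noise."""
    received = []
    for k in range(len(bits)):
        signal = sum(CHANNEL_TAPS[j] * bits[k - j] * molecules_per_one
                     for j in range(len(CHANNEL_TAPS)) if k - j >= 0)
        received.append(max(0.0, signal + random.gauss(0.0, noise_std)))
    return received

def adaptive_threshold_detect(received, window=8):
    """Decide 1 if a sample exceeds the mean of the last `window` samples."""
    bits = []
    for k, r in enumerate(received):
        history = received[max(0, k - window):k] or [r]
        bits.append(1 if r > sum(history) / len(history) else 0)
    return bits

tx = [random.randint(0, 1) for _ in range(20)]
rx = adaptive_threshold_detect(transmit(tx))
print(sum(t != r for t, r in zip(tx, rx)), "bit errors out of", len(tx))
```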

    Design Techniques for Energy-Quality Scalable Digital Systems

    Energy efficiency is one of the key design goals in modern computing. Increasingly complex tasks are being executed in mobile devices and Internet of Things end-nodes, which are expected to operate for long time intervals, on the order of months or years, within the limited energy budgets provided by small form-factor batteries. Fortunately, many such tasks are error resilient, meaning that they can tolerate some relaxation in the accuracy, precision or reliability of internal operations without a significant impact on the overall output quality. The error resilience of an application may derive from a number of factors. The processing of analog sensor inputs measuring quantities from the physical world may not always require maximum precision, as the amount of information that can be extracted is limited by the presence of external noise. Outputs destined for human consumption may also contain small or occasional errors, thanks to the limited capabilities of our vision and hearing systems. Finally, some computational patterns commonly found in domains such as statistics, machine learning and operational research naturally tend to reduce or eliminate errors. Energy-Quality (EQ) scalable digital systems systematically trade off the quality of computations against energy efficiency, by relaxing the precision, the accuracy, or the reliability of internal software and hardware components in exchange for energy reductions. This design paradigm is believed to offer one of the most promising solutions to the pressing need for low-energy computing. Despite these high expectations, the current state of the art in EQ scalable design suffers from important shortcomings. First, the great majority of techniques proposed in the literature focus only on processing hardware and software components. Nonetheless, for many real devices, processing contributes only a small portion of the total energy consumption, which is dominated by other components (e.g. I/O, memory or data transfers). Second, in order to fulfill its promises and become diffused in commercial devices, EQ scalable design needs to achieve industrial-level maturity. This involves moving from purely academic research based on high-level models and theoretical assumptions to engineered flows compatible with existing industry standards. Third, the time-varying nature of error tolerance, both among different applications and within a single task, should become more central in the proposed design methods. This involves designing “dynamic” systems, in which the precision or reliability of operations (and consequently their energy consumption) can be tuned at runtime, rather than “static” solutions, in which the output quality is fixed at design time. This thesis introduces several new EQ scalable design techniques for digital systems that take the previous observations into account. Besides processing, the proposed methods apply the principles of EQ scalable design also to interconnects and peripherals, which are often relevant contributors to the total energy in sensor nodes and mobile systems respectively. Regardless of the target component, the presented techniques pay special attention to the accurate evaluation of the benefits and overheads deriving from EQ scalability, using industrial-level models, and to the integration with existing standard tools and protocols. Moreover, all the works presented in this thesis allow the dynamic reconfiguration of output quality and energy consumption.
    More specifically, the contribution of this thesis is divided into three parts. In a first body of work, the design of EQ scalable modules for processing hardware data paths is considered. Three design flows are presented, targeting different technologies and exploiting different ways to achieve EQ scalability, i.e. timing-induced errors and precision reduction. These works are inspired by previous approaches from the literature, namely Reduced-Precision Redundancy and Dynamic Accuracy Scaling, which are re-thought to make them compatible with standard Electronic Design Automation (EDA) tools and flows, providing solutions to overcome their main limitations. The second part of the thesis investigates the application of EQ scalable design to serial interconnects, which are the de facto standard for data exchanges between processing hardware and sensors. In this context, two novel bus encodings are proposed, called Approximate Differential Encoding and Serial-T0, that exploit the statistical characteristics of data produced by sensors to reduce the energy consumption on the bus at the cost of controlled data approximations. The two techniques achieve different results for data of different origins, but share the common features of allowing runtime reconfiguration of the allowed error and being compatible with standard serial bus protocols. Finally, the last part of the manuscript is devoted to the application of EQ scalable design principles to displays, which are often among the most energy-hungry components in mobile systems. The two proposals in this context leverage the emissive nature of Organic Light-Emitting Diode (OLED) displays to save energy by altering the displayed image, thus inducing an output quality reduction that depends on the amount of such alteration. The first technique implements an image-adaptive form of brightness scaling, whose outputs are optimized in terms of the balance between power consumption and similarity with the input. The second approach achieves concurrent power reduction and image enhancement by means of an adaptive polynomial transformation. Both solutions focus on minimizing the overheads associated with a real-time implementation of the transformations in software or hardware, so that these do not offset the savings in the display. For each of these three topics, results show that the aforementioned goal of building EQ scalable systems compatible with existing best practices and mature enough to be integrated in commercial devices can be effectively achieved. Moreover, they also show that very simple and similar principles can be applied to design EQ scalable versions of different system components (processing, peripherals and I/O), and to equip these components with knobs for the runtime reconfiguration of the energy versus quality tradeoff.
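
    The serial-bus part of the above can be sketched in a few lines: when a new sensor sample differs from the previously transmitted word by less than a runtime-configurable error bound, the previous word is resent so the line does not toggle. This is only a rough illustration of the Approximate Differential Encoding concept; the actual encoding, framing and energy model in the thesis are more involved.

```python
def approx_diff_encode(samples, max_error):
    """Resend the previous word whenever the new sample is within max_error of it,
    trading a bounded approximation error for fewer bit transitions on the bus."""
    encoded, last_sent = [], None
    for s in samples:
        if last_sent is not None and abs(s - last_sent) <= max_error:
            encoded.append(last_sent)        # bus keeps its previous value: fewer toggles
        else:
            encoded.append(s)
            last_sent = s
    return encoded

readings = [100, 101, 99, 102, 140, 141, 143, 90]    # hypothetical sensor samples
print(approx_diff_encode(readings, max_error=3))     # error knob reconfigurable at runtime
```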

    Reconfigurable architectures for beyond 3G wireless communication systems


    Hardware Acceleration of Deep Convolutional Neural Networks on FPGA

    Get PDF
    The rapid improvement in computation capability has made deep convolutional neural networks (CNNs) a great success in recent years on many computer vision tasks, with significantly improved accuracy. During the inference phase, many applications demand low-latency processing of one image under strict power consumption requirements, which reduces the efficiency of GPUs and other general-purpose platforms and brings opportunities for specific acceleration hardware, e.g. FPGAs, by customizing the digital circuit specifically for the deep learning inference algorithm. However, deploying CNNs on portable and embedded systems is still challenging due to large data volume, intensive computation, varying algorithm structures, and frequent memory accesses. This dissertation proposes a complete design methodology and framework to accelerate the inference process of various CNN algorithms on FPGA hardware with high performance, efficiency and flexibility. As convolution contributes most of the operations in CNNs, the convolution acceleration scheme significantly affects the efficiency and performance of a hardware CNN accelerator. Convolution involves multiply and accumulate (MAC) operations with four levels of loops. Without fully studying the convolution loop optimization before the hardware design phase, the resulting accelerator can hardly exploit data reuse and manage data movement efficiently. This work overcomes these barriers by quantitatively analyzing and optimizing the design objectives (e.g. memory access) of the CNN accelerator based on multiple design variables. An efficient dataflow and hardware architecture for CNN acceleration are proposed to minimize data communication while maximizing resource utilization to achieve high performance. Although great performance and efficiency can be achieved by customizing the FPGA hardware for each CNN model, significant effort and expertise are required, leading to long development times, which makes it difficult to catch up with the rapid development of CNN algorithms. In this work, we present an RTL-level CNN compiler that automatically generates customized FPGA hardware for the inference tasks of various CNNs, in order to enable high-level fast prototyping of CNNs from software to FPGA while keeping the benefits of low-level hardware optimization. First, a general-purpose library of RTL modules is developed to model different operations at each layer. The integration and dataflow of the physical modules are predefined in the top-level system template and reconfigured during compilation for a given CNN algorithm. The runtime control of layer-by-layer sequential computation is managed by the proposed execution schedule, so that even highly irregular and complex network topologies, e.g. GoogLeNet and ResNet, can be compiled. The proposed methodology is demonstrated with various CNN algorithms, e.g. NiN, VGG, GoogLeNet and ResNet, on two different standalone FPGAs, achieving state-of-the-art performance. Based on the optimized acceleration strategy, there are still many design options, e.g. the degree and dimension of computation parallelism, the size of on-chip buffers, and the external memory bandwidth, which impact the utilization of computation resources and data communication efficiency, and ultimately affect the performance and energy consumption of the accelerator. The large design space of the accelerator makes it impractical to explore the optimal design choice during the real implementation phase.
    Therefore, a performance model is proposed in this work to quantitatively estimate the accelerator performance and resource utilization. By this means, the performance bottleneck and design bounds can be identified and the optimal design option can be explored early in the design phase. Dissertation/Thesis. Doctoral Dissertation, Electrical Engineering, 201
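
    The four loop levels of convolution and the kind of estimate a performance model produces can be sketched as follows: the MAC count of a layer is the product of the four loop bounds, and a crude roofline-style model bounds the layer time by the larger of its compute time and its external-memory time. The layer shape, parallelism factor, clock frequency and bandwidth below are illustrative assumptions, not figures from the dissertation.

```python
def conv_layer_macs(out_ch, in_ch, out_h, out_w, k):
    """MAC count of one convolution layer; the four loop levels are output channels,
    input channels, output feature-map pixels, and the k x k kernel window."""
    return out_ch * in_ch * out_h * out_w * k * k

def estimate_layer_time(macs, data_bytes, pe_count, freq_hz, dram_bw_bytes_s):
    """Roofline-style bound: the layer is limited either by compute (MACs spread
    over parallel processing elements) or by external memory traffic."""
    compute_s = macs / (pe_count * freq_hz)
    memory_s = data_bytes / dram_bw_bytes_s
    return max(compute_s, memory_s)

# Hypothetical VGG-style layer: 256 -> 256 channels, 28x28 outputs, 3x3 kernels.
macs = conv_layer_macs(256, 256, 28, 28, 3)
secs = estimate_layer_time(macs,
                           data_bytes=4 * (256 * 28 * 28 + 256 * 256 * 9),
                           pe_count=1024, freq_hz=200e6, dram_bw_bytes_s=12.8e9)
print(f"{macs / 1e6:.1f} M MACs, roughly {secs * 1e3:.2f} ms for this layer")
```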
