
    Development of Low Power Image Compression Techniques

    The digital camera is the main medium for digital photography. The basic operation performed by a simple digital camera is to convert light energy into electrical energy; this energy is then converted into a digital format, and a compression algorithm is used to reduce the memory required to store the image. The compression algorithm is invoked frequently when capturing and storing images, which motivates the development of an efficient compression algorithm that gives the same results as existing algorithms with lower power consumption. As a result, a camera implementing the new algorithm can capture more images than before. 1) Discrete Cosine Transform (DCT) based JPEG is an accepted standard for lossy compression of still images. Quantisation is mainly responsible for the loss of image quality in the process of lossy compression. A new Energy Quantisation (EQ) method is proposed for speeding up the coding and decoding procedure while preserving image quality…
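
    For orientation, the sketch below shows the generic JPEG-style transform-and-quantise step on an 8×8 block that the abstract builds on. It is a minimal illustration only, not the proposed Energy Quantisation (EQ) method, and the uniform quantisation table is a placeholder rather than a standard JPEG table.

```python
# Minimal sketch of JPEG-style DCT quantisation on an 8x8 block.
# Assumption: a uniform placeholder quantisation table; real JPEG uses
# perceptually tuned tables, and the thesis' EQ method is not shown here.
import numpy as np
from scipy.fft import dctn, idctn

Q = np.full((8, 8), 16.0)  # placeholder quantisation table

def quantise_block(block):
    """Forward 2-D DCT of an 8x8 pixel block followed by lossy quantisation."""
    coeffs = dctn(block - 128.0, norm="ortho")  # level shift, then DCT-II
    return np.round(coeffs / Q)                 # this rounding is where quality is lost

def dequantise_block(qcoeffs):
    """Approximate inverse of quantise_block (exact up to the rounding loss)."""
    return idctn(qcoeffs * Q, norm="ortho") + 128.0

block = np.random.randint(0, 256, (8, 8)).astype(np.float64)
reconstructed = dequantise_block(quantise_block(block))
```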

    Realtime image noise reduction FPGA implementation with edge detection

    The purpose of this dissertation was to develop and implement, in a Field Programmable Gate Array (FPGA), a noise reduction algorithm for real-time sensor-acquired images. A Moving Average filter was chosen for its low computational cost, speed, good precision, and low-to-medium hardware resource utilization. The technique is simple to implement; however, if all pixels are indiscriminately filtered, the result is an undesirably blurry image. Since the human eye is more sensitive to contrast, a technique was introduced to preserve sharp contour transitions, which, in the author's opinion, is the dissertation's contribution. Both synthetic and real images were tested. The synthetic images, composed of both sharp and soft tone transitions, were generated with a purpose-built algorithm, while the real images were captured with an 8-kbit (8192 shades) high-resolution sensor scaled up to 10 × 10³ shades. A least-squares polynomial data-smoothing filter, Savitzky-Golay, was used for comparison. It can be adjusted using three degrees of freedom: the window frame length, which varies the size of the filtering neighbourhood between pixels; the derivative order, which varies the curviness; and the polynomial coefficients, which change the adaptability of the curve. The Moving Average filter permits only one degree of freedom, the window frame length. Tests revealed promising results with 2nd and 4th polynomial orders. Higher qualitative results were achieved with Savitzky-Golay, owing to its better preservation of signal characteristics, especially at high frequencies. The FPGA algorithms were implemented in 64-bit integer registers, serving two purposes: increasing precision, hence reducing the error compared with a floating-point implementation, and accommodating the registers' growing cumulative multiplications. Results were then compared with MATLAB's double-precision 64-bit floating-point computations to verify the error difference between the two. The comparison parameters used were Mean Squared Error, Signal-to-Noise Ratio and a Similarity coefficient.
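
    A minimal sketch of the edge-preserving moving-average idea is given below, assuming a simple gradient-magnitude threshold decides which pixels are smoothed; the dissertation's actual contour-detection rule and its fixed-point FPGA implementation are not reproduced here.

```python
# Edge-preserving moving average: smooth flat regions, keep strong transitions.
# Assumption: a gradient-magnitude threshold stands in for the contour detector
# described in the abstract; values and window size are illustrative only.
import numpy as np

def edge_preserving_moving_average(img, window=3, edge_threshold=200.0):
    """Smooth `img` with a moving-average kernel, leaving strong edges untouched."""
    img = img.astype(np.float64)
    pad = window // 2
    padded = np.pad(img, pad, mode="edge")

    # Plain moving average via a sliding-window sum.
    smoothed = np.zeros_like(img)
    for dy in range(window):
        for dx in range(window):
            smoothed += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    smoothed /= window * window

    # Crude edge detector: local gradient magnitude.
    gy, gx = np.gradient(img)
    edges = np.hypot(gx, gy) > edge_threshold

    # Filter flat regions, preserve contours.
    return np.where(edges, img, smoothed)
```

    For the comparison baseline, scipy.signal.savgol_filter exposes the same window-length, polynomial-order and derivative-order parameters described above, which makes it a convenient software reference for the Savitzky-Golay results.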

    Hardware / Software Architectural and Technological Exploration for Energy-Efficient and Reliable Biomedical Devices

    Nowadays, the ubiquity of smart appliances in our everyday lives is increasingly strengthening the links between humans and machines. Beyond making our lives easier and more convenient, smart devices are now playing an important role in personalized healthcare delivery. This technological breakthrough is particularly relevant in a world where population aging and unhealthy habits have made non-communicable diseases the leading cause of death worldwide, according to international public health organizations. In this context, smart health monitoring systems, termed Wireless Body Sensor Nodes (WBSNs), represent a paradigm shift in the healthcare landscape by greatly lowering the cost of long-term monitoring of chronic diseases, as well as improving patients' lifestyles. WBSNs are able to autonomously acquire biological signals and embed on-node Digital Signal Processing (DSP) capabilities to deliver clinically accurate health diagnoses in real time, even outside of a hospital environment. Energy efficiency and reliability are fundamental requirements for WBSNs, since they must operate for extended periods of time while relying on compact batteries. These constraints, in turn, call for carefully designed hardware and software architectures to host the execution of complex biomedical applications. In this thesis, I develop and explore novel solutions at the architectural and technological level of the integrated circuit design domain to enhance the energy efficiency and reliability of current WBSNs. Firstly, following a top-down approach driven by the characteristics of biomedical algorithms, I perform an architectural exploration of a heterogeneous and reconfigurable computing platform devoted to bio-signal analysis. By interfacing with a shared Coarse-Grained Reconfigurable Array (CGRA) accelerator, this domain-specific platform can achieve higher performance and energy savings, beyond the capabilities offered by a baseline multi-processor system. More precisely, I propose three CGRA architectures, each contributing differently to maximizing application parallelization. The proposed Single, Multi and Interleaved-Datapath CGRA designs allow the developed platform to achieve substantial energy savings of up to 37% when executing complex biomedical applications, with respect to a multi-core-only platform. Secondly, I investigate how the modeling of technology reliability issues in logic and memory components can be exploited to adequately adjust the frequency and supply voltage of a circuit, with the aim of optimizing its computing performance and energy efficiency. To this end, I propose a novel framework for analyzing the impact of workload-dependent Bias Temperature Instability (BTI) on the quality of biomedical application results. Remarkably, the framework is able to determine the range of safe circuit operating frequencies without introducing worst-case guard bands. Experiments highlight the possibility of safely raising the frequency up to 101% above the maximum obtained with classical static timing analysis. Finally, through the study of several well-known biomedical algorithms, I propose an approach that enables energy savings by dynamically and unequally protecting an under-powered data memory, in a new way compared to regular error protection schemes. This solution relies on the Dynamic eRror compEnsation And Masking (DREAM) technique, which reduces the energy consumed by traditional error correction codes by approximately 21%.
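
    To make the guard-band argument concrete, the toy sketch below contrasts a worst-case ageing margin with a workload-dependent one when choosing a safe clock frequency. The linear degradation factors are invented placeholders and do not come from the thesis' BTI framework.

```python
# Toy comparison of a worst-case ageing guard band vs. a workload-dependent one.
# Assumption: a simple linear delay-degradation factor; all numbers are
# placeholders for illustration, not results from the thesis.

def max_safe_frequency_mhz(nominal_delay_ns, degradation_fraction):
    """Highest clock frequency whose period still covers the aged critical path."""
    aged_delay_ns = nominal_delay_ns * (1.0 + degradation_fraction)
    return 1e3 / aged_delay_ns  # ns -> MHz

nominal = 10.0                                           # critical-path delay at time zero (ns)
worst_case = max_safe_frequency_mhz(nominal, 0.30)       # pessimistic lifetime guard band
workload_aware = max_safe_frequency_mhz(nominal, 0.05)   # degradation measured for this workload
print(worst_case, workload_aware)                        # the workload-aware bound allows a higher clock
```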

    Enhancing Real-time Embedded Image Processing Robustness on Reconfigurable Devices for Critical Applications

    Nowadays, image processing is increasingly used in several application fields, such as biomedical, aerospace, or automotive. Within these fields, image processing serves both non-critical and critical tasks. In automotive, for example, cameras are becoming key sensors for increasing car safety, driving assistance and driving comfort. They have been employed for infotainment (non-critical), as well as for some driver assistance tasks (critical), such as Forward Collision Avoidance, Intelligent Speed Control, or Pedestrian Detection. The complexity of these algorithms poses a challenge for real-time image processing systems, requiring computing capacity that is usually not available in embedded processors. Hardware acceleration is therefore crucial, and devices such as Field Programmable Gate Arrays (FPGAs) best fit the growing demand for computational capability. These devices can assist embedded processors by significantly speeding up computationally intensive software algorithms. Moreover, critical applications introduce strict requirements not only in terms of real-time constraints, but also from the device reliability and algorithm robustness points of view. Technology scaling is highlighting reliability problems related to aging phenomena and to the increasing sensitivity of digital devices to external radiation events that can cause transient or even permanent faults. These faults can lead to wrong information being processed or, in the worst case, to a dangerous system failure. In this context, the reconfigurable nature of FPGA devices can be exploited to increase system reliability and robustness by leveraging Dynamic Partial Reconfiguration features. The research work presented in this thesis focuses on the development of techniques for implementing efficient and robust real-time embedded image processing hardware accelerators and systems for mission-critical applications. Three main challenges have been faced and will be discussed, along with proposed solutions, throughout the thesis: (i) achieving real-time performance, (ii) enhancing algorithm robustness, and (iii) increasing the overall system's dependability. In order to ensure real-time performance, efficient FPGA-based hardware accelerators implementing selected image processing algorithms have been developed. The functionalities offered by the target technology and the algorithms' characteristics have been constantly taken into account while designing such accelerators, in order to efficiently tailor the algorithms' operations to the available hardware resources. On the other hand, the key idea for increasing the robustness of image processing algorithms is to introduce self-adaptivity features at the algorithm level, in order to maintain, or improve, the quality of results over a wide range of input conditions that are not always fully predictable at design time (e.g., noise level variations). This has been accomplished by measuring at run-time some characteristics of the input images and then tuning the algorithm parameters based on such estimations. The dynamic reconfiguration features of modern FPGAs have been extensively exploited in order to integrate run-time adaptivity into the designed hardware accelerators. Tools and methodologies have also been developed to increase overall system dependability during reconfiguration processes, thus providing safe run-time adaptation mechanisms. In addition, taking into account the target technology and the environments in which the developed hardware accelerators and systems may be employed, dependability issues have been analyzed, leading to the development of a platform for quickly assessing the reliability, and characterizing the behavior, of hardware accelerators implemented on reconfigurable FPGAs when they are affected by such faults.
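
    The run-time self-adaptivity idea can be sketched as follows, assuming a noise estimate derived from a Laplacian residual that selects the smoothing strength; the thesis realises this kind of loop in FPGA hardware with dynamic partial reconfiguration rather than in software.

```python
# Sketch of run-time parameter tuning driven by a measured input characteristic.
# Assumptions: noise is estimated via the median absolute deviation of a
# Laplacian residual, and the estimate picks a Gaussian blur strength; the
# thresholds and sigmas are illustrative placeholders.
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def estimate_noise_sigma(img):
    """Rough noise estimate: robust spread of the Laplacian residual."""
    residual = laplace(img.astype(np.float64))
    return 1.4826 * np.median(np.abs(residual - np.median(residual)))

def adaptive_denoise(img):
    """Tune the smoothing strength to the measured input noise level."""
    sigma_n = estimate_noise_sigma(img)
    blur = 0.5 if sigma_n < 5 else (1.5 if sigma_n < 20 else 3.0)
    return gaussian_filter(img.astype(np.float64), sigma=blur)
```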

    Simulation and implementation of novel deep learning hardware architectures for resource constrained devices

    Corey Lammie designed mixed-signal memristive-complementary metal–oxide–semiconductor (CMOS) and field programmable gate array (FPGA) hardware architectures, which were used to reduce the power and resource requirements of Deep Learning (DL) systems during both inference and training. Disruptive design methodologies, such as those explored in this thesis, can be used to facilitate the design of next-generation DL systems.

    Energy analysis and optimisation techniques for automatically synthesised coprocessors

    The primary outcome of this research project is the development of a methodology enabling fast, automated, early-stage power and energy analysis of configurable processors for system-on-chip platforms. Such capability is essential to the process of selecting energy-efficient processors during design-space exploration, when potential savings are highest. This has been achieved by developing dynamic and static energy consumption models for the constituent blocks within the processors. Several optimisations have been identified, specifically targeting the blocks that are most significant in terms of energy consumption. An instruction encoding mechanism reduces both the energy and area requirements of the instruction cache, while modifications to the multiplier unit reduce energy consumption during inactive cycles. Both techniques are demonstrated to offer substantial energy savings. The aforementioned techniques have undergone detailed evaluation and, based on the positive outcomes obtained, have been incorporated into Cascade, a system-on-chip coprocessor synthesis tool developed by Critical Blue, to provide automated analysis and optimisation of processor energy requirements. This thesis details the process of identifying and examining each method, along with the results obtained. Finally, a case study demonstrates the benefits of the developed functionality from the perspective of someone using Cascade to automate the creation of an energy-efficient configurable processor for system-on-chip platforms.
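
    A per-block energy model of the kind described above can be sketched as the sum of an activity-dependent dynamic term and a time-dependent static (leakage) term; the coefficients in the sketch below are placeholders, not the models developed for Cascade.

```python
# Sketch of a simple per-block energy model: dynamic (per-access) + static (leakage).
# Assumption: placeholder coefficients and a single clock domain; not Critical
# Blue's actual Cascade models.

def block_energy_nj(accesses, active_cycles, idle_cycles,
                    e_access_pj, p_leak_uw, clock_mhz):
    """Estimate one block's energy over a run, in nanojoules."""
    dynamic_nj = accesses * e_access_pj * 1e-3                      # pJ -> nJ
    cycle_time_us = 1.0 / clock_mhz                                 # one cycle, in microseconds
    static_nj = p_leak_uw * (active_cycles + idle_cycles) * cycle_time_us * 1e-3  # uW*us = pJ -> nJ
    return dynamic_nj + static_nj

# Example: an instruction cache over a short benchmark run (made-up figures).
print(block_energy_nj(accesses=120_000, active_cycles=150_000, idle_cycles=50_000,
                      e_access_pj=8.0, p_leak_uw=40.0, clock_mhz=200.0))
```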

    New Techniques for On-line Testing and Fault Mitigation in GPUs

    The abstract is provided in the attachment.

    Addressing concerns in performance prediction: the impact of data dependencies and denormal arithmetic in scientific codes

    To meet the increasing computational requirements of the scientific community, the use of parallel programming has become commonplace, and in recent years distributed applications running on clusters of computers have become the norm. Both parallel and distributed applications face the problem of predictive uncertainty and variations in runtime. Modern scientific applications have varying I/O, cache, and memory profiles that have significant and difficult-to-predict effects on their runtimes. Data-dependent sensitivities, such as the cost of denormal floating-point calculations, introduce further variations in runtime, hindering predictability. Applications with unpredictable performance, or with highly variable runtimes, can cause several problems. If the runtime of an application is unknown or varies widely, workflow schedulers cannot efficiently allocate it to compute nodes, leading to the under-utilisation of expensive resources. Similarly, a lack of accurate knowledge of the performance of an application on new hardware can lead to misguided procurement decisions. In heavily parallel applications, minor variations in runtime on individual nodes can have disproportionate effects on the overall application runtime. Even on a smaller scale, a lack of certainty about an application's runtime can preclude its use in real-time or time-critical applications such as clinical diagnosis. This thesis investigates two sources of data-dependent performance variability. The first source is algorithmic and is seen in a state-of-the-art C++ biomedical imaging application. The thesis identifies the cause of the variability in the application and develops a means of characterising it. This 'probe task' based model is adapted for use with a workflow scheduler, and the scheduling improvements it brings are examined. The second source of variability is more subtle, as it is micro-architectural in nature. Depending on the input data, two runs of an application executing exactly the same sequence of instructions and with exactly the same memory access patterns can have large differences in runtime due to deficiencies in common hardware implementations of denormal arithmetic. An exception-based profiler is written to detect occurrences of denormal arithmetic, and it is shown that this is insufficient to isolate the sources of denormal arithmetic in an application. A novel tool based on the Valgrind binary instrumentation framework is developed which can trace the origins of denormal values and the frequency of their occurrence in an application's data structures. This second tool is used to isolate and remove the cause of denormal arithmetic, first from a simple numerical code and then from a face recognition application.
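
    In the same spirit as the profiling described above, the sketch below counts subnormal (denormal) values in an array, assuming NumPy arrays stand in for the application's data structures; the thesis tool works at the binary-instrumentation level on top of Valgrind rather than in this way.

```python
# Sketch: locate denormal (subnormal) values in numerical data.
# Assumption: NumPy arrays represent the application's data structures; this is
# a software approximation, not the Valgrind-based tool described in the thesis.
import numpy as np

def count_subnormals(arr):
    """Return how many finite, non-zero entries of `arr` are subnormal doubles."""
    a = np.asarray(arr, dtype=np.float64)
    tiny = np.finfo(np.float64).tiny            # smallest positive normal double
    finite_nonzero = np.isfinite(a) & (a != 0.0)
    return int(np.count_nonzero(finite_nonzero & (np.abs(a) < tiny)))

# Example: repeated scaling drives values into the subnormal range,
# which is exactly the kind of data-dependent slowdown discussed above.
x = np.full(1000, 1e-300)
for _ in range(10):
    x *= 1e-2
print(count_subnormals(x))   # all 1000 entries are now subnormal
```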