65 research outputs found
Development of Low Power Image Compression Techniques
The digital camera is the main medium for digital photography. The basic operation performed by a simple digital camera is to convert light energy to electrical energy; this energy is then converted to a digital format, and a compression algorithm is used to reduce the memory required to store the image. The compression algorithm is invoked every time an image is captured and stored. This motivates the development of an efficient compression algorithm that gives the same results as existing algorithms with lower power consumption; a camera implementing the new algorithm can then capture more images than before. 1) Discrete Cosine Transform (DCT)-based JPEG is an accepted standard for lossy compression of still images. Quantisation is mainly responsible for the loss in image quality in the process of lossy compression. A new Energy Quantisation (EQ) method is proposed for speeding up the coding and decoding procedure while preserving image quality…
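The Energy Quantisation method itself is not detailed in the abstract, but the lossy quantisation step it targets can be illustrated with a minimal Python sketch of standard JPEG-style block coding; the 8×8 DCT and the luminance quantisation table below are the textbook versions, not the proposed EQ method.

```python
# Minimal sketch of JPEG-style block DCT and quantisation (illustrative only;
# the thesis' Energy Quantisation (EQ) method is not reproduced here).
import numpy as np
from scipy.fft import dctn, idctn

# Standard JPEG luminance quantisation table (stand-in, not the proposed EQ table).
Q = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

def compress_block(block):
    """DCT-transform an 8x8 pixel block and quantise the coefficients (the lossy step)."""
    coeffs = dctn(block.astype(float) - 128.0, norm="ortho")
    return np.round(coeffs / Q).astype(int)   # most high-frequency coefficients become 0

def decompress_block(qcoeffs):
    """Invert quantisation and DCT to reconstruct an approximate pixel block."""
    return idctn(qcoeffs * Q, norm="ortho") + 128.0
```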
Realtime image noise reduction FPGA implementation with edge detection
The purpose of this dissertation was to develop and implement, in a Field Programmable Gate Array (FPGA), a noise reduction algorithm for real-time sensor-acquired images. A Moving Average filter was chosen for its low computational cost, speed, good precision, and low-to-medium hardware resource utilization. The technique is simple to implement; however, if all pixels are indiscriminately filtered, the result is a blurry image, which is undesirable.
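A minimal sketch of the indiscriminate moving-average (box) filtering described above, using SciPy's uniform_filter; applying it to every pixel suppresses noise but also blurs edges, which is the problem the dissertation addresses.

```python
# Plain moving-average (box) filter: every pixel is replaced by the mean of its
# window x window neighbourhood, smoothing noise and edges alike.
import numpy as np
from scipy.ndimage import uniform_filter

def moving_average(image, window=3):
    """Box filter over a 2-D image; 'window' is the only degree of freedom."""
    return uniform_filter(image.astype(float), size=window)
```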
Since the human eye is more sensitive to contrast, a technique was introduced to preserve sharp contour transitions, which, in the author's opinion, is the dissertation's contribution. Both synthetic and real images were tested. The synthetic images, composed of both sharp and soft tone transitions, were generated with a purpose-built algorithm, while the real images were captured with a high-resolution 8-kbit (8192 shades) sensor and scaled up to 10 × 10³ shades.
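The abstract does not detail how contour transitions are detected and preserved, so the following sketch shows only one plausible thresholded variant (the threshold value is a hypothetical parameter), not the author's actual technique.

```python
# One plausible edge-preserving variant (assumption, not the dissertation's method):
# pixels whose local contrast exceeds a threshold are left unfiltered, so sharp
# transitions survive while flat regions are smoothed.
import numpy as np
from scipy.ndimage import uniform_filter

def edge_preserving_average(image, window=3, threshold=500.0):
    img = image.astype(float)
    smoothed = uniform_filter(img, size=window)
    local_contrast = np.abs(img - smoothed)        # crude edge measure (assumption)
    return np.where(local_contrast > threshold, img, smoothed)
```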
A least-squares polynomial data-smoothing filter, Savitzky-Golay, was used for comparison. It can be adjusted using three degrees of freedom: the window length, which sets the size of the pixel neighbourhood involved in the filtering; the derivative order, which varies the curviness; and the polynomial coefficients, which change how closely the curve adapts to the data. The Moving Average filter permits only one degree of freedom, the window length. Tests revealed promising results with 2nd- and 4th-order polynomials. Higher-quality results were achieved with Savitzky-Golay, owing to its better preservation of signal characteristics, especially at high frequencies.
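For reference, the Savitzky-Golay comparison filter is readily available in SciPy; the sketch below exposes the degrees of freedom mentioned above (window length, polynomial order, derivative order) with illustrative values rather than the ones used in the dissertation.

```python
# Savitzky-Golay smoothing of a noisy 1-D signal with 2nd- and 4th-order polynomials.
import numpy as np
from scipy.signal import savgol_filter

noisy_row = np.random.default_rng(0).normal(size=256).cumsum()   # synthetic signal row
smoothed_2nd = savgol_filter(noisy_row, window_length=11, polyorder=2, deriv=0)
smoothed_4th = savgol_filter(noisy_row, window_length=11, polyorder=4, deriv=0)
```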
The FPGA algorithms were implemented with 64-bit integer registers, serving two purposes: increasing precision, thereby reducing the error compared to a floating-point implementation, and accommodating the growth of cumulative multiplications in the registers. The results were then compared with MATLAB's double-precision 64-bit floating-point computations to verify the error difference between the two. The comparison metrics used were Mean Squared Error, Signal-to-Noise Ratio, and a Similarity coefficient.
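A small sketch of the comparison metrics named above, assuming straightforward definitions; in particular, the similarity coefficient is rendered here as a normalised correlation, which may differ from the exact definition used in the dissertation.

```python
# Metrics for comparing the fixed-point FPGA output against a double-precision reference.
import numpy as np

def mse(reference, result):
    return np.mean((reference - result) ** 2)

def snr_db(reference, result):
    noise = reference - result
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

def similarity(reference, result):
    # Normalised correlation (assumed definition of the similarity coefficient).
    return np.dot(reference.ravel(), result.ravel()) / (
        np.linalg.norm(reference) * np.linalg.norm(result))
```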
Hardware / Software Architectural and Technological Exploration for Energy-Efficient and Reliable Biomedical Devices
Nowadays, the ubiquity of smart appliances in our everyday lives is increasingly strengthening the links between humans and machines. Beyond making our lives easier and more convenient, smart devices are now playing an important role in personalized healthcare delivery. This technological breakthrough is particularly relevant in a world where population aging and unhealthy habits have made non-communicable diseases the leading cause of death worldwide, according to international public health organizations. In this context, smart health monitoring systems termed Wireless Body Sensor Nodes (WBSNs) represent a paradigm shift in the healthcare landscape by greatly lowering the cost of long-term monitoring of chronic diseases, as well as improving patients' lifestyles. WBSNs are able to autonomously acquire biological signals and embed on-node Digital Signal Processing (DSP) capabilities to deliver clinically-accurate health diagnoses in real-time, even outside of a hospital environment. Energy efficiency and reliability are fundamental requirements for WBSNs, since they must operate for extended periods of time while relying on compact batteries. These constraints, in turn, impose carefully designed hardware and software architectures for hosting the execution of complex biomedical applications. In this thesis, I develop and explore novel solutions at the architectural and technological level of the integrated circuit design domain to enhance the energy efficiency and reliability of current WBSNs.
Firstly, following a top-down approach driven by the characteristics of biomedical algorithms, I perform an architectural exploration of a heterogeneous and reconfigurable computing platform devoted to bio-signal analysis. By interfacing a shared Coarse-Grained Reconfigurable Array (CGRA) accelerator, this domain-specific platform can achieve higher performance and energy savings beyond the capabilities offered by a baseline multi-processor system. More precisely, I propose three CGRA architectures, each contributing differently to the maximization of application parallelization. The proposed Single, Multi and Interleaved-Datapath CGRA designs allow the developed platform to achieve substantial energy savings of up to 37% when executing complex biomedical applications, with respect to a multi-core-only platform.
Secondly, I investigate how the modeling of technology reliability issues in logic and memory components can be exploited to adequately adjust the frequency and supply voltage of a circuit, with the aim of optimizing its computing performance and energy efficiency. To this end, I propose a novel framework for workload-dependent Bias Temperature Instability (BTI) impact analysis on biomedical application results quality. Remarkably, the framework is able to determine the range of safe circuit operating frequencies without introducing worst-case guard bands. Experiments highlight the possibility to safely raise the frequency up to 101% above the maximum obtained with classical static timing analysis.
Finally, through the study of several well-known biomedical algorithms, I propose an approach that enables energy savings by dynamically and unequally protecting an under-powered data memory, in a new way compared to regular error protection schemes. This solution relies on the Dynamic eRror compEnsation And Masking (DREAM) technique, which reduces the energy consumed by traditional error correction codes by approximately 21%.
Enhancing Real-time Embedded Image Processing Robustness on Reconfigurable Devices for Critical Applications
Nowadays, image processing is increasingly used in several application fields, such as biomedical, aerospace, or automotive. Within these fields, image processing serves both non-critical and critical tasks. For example, in automotive applications, cameras are becoming key sensors for increasing car safety, driving assistance, and driving comfort. They have been employed for infotainment (non-critical), as well as for some driver assistance tasks (critical), such as Forward Collision Avoidance, Intelligent Speed Control, or Pedestrian Detection.
The complexity of these algorithms poses a challenge for real-time image processing systems, requiring high computing capacity that is usually not available in processors for embedded systems. Hardware acceleration is therefore crucial, and devices such as Field Programmable Gate Arrays (FPGAs) best fit the growing demand for computational capability. These devices can assist embedded processors by significantly speeding up computationally intensive software algorithms.
Moreover, critical applications introduce strict requirements not only in terms of real-time constraints, but also from the device reliability and algorithm robustness points of view. Technology scaling is highlighting reliability problems related to aging phenomena and to the increasing sensitivity of digital devices to external radiation events, which can cause transient or even permanent faults. These faults can lead to wrong information being processed or, in the worst case, to a dangerous system failure. In this context, the reconfigurable nature of FPGA devices can be exploited to increase system reliability and robustness by leveraging Dynamic Partial Reconfiguration features.
The research work presented in this thesis focuses on the development of techniques for implementing efficient and robust real-time embedded image processing hardware accelerators and systems for mission-critical applications. Three main challenges have been faced and will be discussed, along with the proposed solutions, throughout the thesis: (i) achieving real-time performance, (ii) enhancing algorithm robustness, and (iii) increasing the overall system's dependability.
In order to ensure real-time performance, efficient FPGA-based hardware accelerators implementing selected image processing algorithms have been developed. The functionalities offered by the target technology and the algorithms' characteristics have been constantly taken into account while designing such accelerators, in order to efficiently tailor the algorithms' operations to the available hardware resources.
On the other hand, the key idea for increasing the robustness of image processing algorithms is to introduce self-adaptivity features at the algorithm level, in order to maintain, or even improve, the quality of the results over a wide range of input conditions that are not always fully predictable at design time (e.g., noise level variations). This has been accomplished by measuring at run-time some characteristics of the input images, and then tuning the algorithm parameters based on such estimations; a sketch of this measure-then-tune idea follows below. The dynamic reconfiguration features of modern FPGAs have been extensively exploited in order to integrate run-time adaptivity into the designed hardware accelerators.
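The thesis does not specify which image characteristics are measured at run time, so the sketch below only illustrates the measure-then-tune idea using a common Laplacian-based noise estimator (Immerkaer's method) and a hypothetical mapping from noise level to filter window size.

```python
# Run-time noise estimation followed by parameter tuning (illustrative assumption,
# not the thesis' actual adaptation scheme).
import numpy as np
from scipy.ndimage import convolve

LAPLACIAN = np.array([[1, -2, 1],
                      [-2, 4, -2],
                      [1, -2, 1]], dtype=float)

def estimate_noise_sigma(image):
    """Immerkaer's fast noise-variance estimate from the Laplacian response."""
    h, w = image.shape
    response = convolve(image.astype(float), LAPLACIAN)
    return np.sqrt(np.pi / 2.0) * np.sum(np.abs(response)) / (6.0 * (h - 2) * (w - 2))

def choose_filter_window(sigma):
    """Map the estimated noise level to a denoising window size (thresholds are assumptions)."""
    if sigma < 2.0:
        return 3
    return 5 if sigma < 8.0 else 7
```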
Tools and methodologies have also been developed to increase overall system dependability during reconfiguration processes, thus providing safe run-time adaptation mechanisms. In addition, taking into account the target technology and the environments in which the developed hardware accelerators and systems may be employed, dependability issues have been analyzed, leading to the development of a platform for quickly assessing the reliability and characterizing the behavior of hardware accelerators implemented on reconfigurable FPGAs when they are affected by such faults.
Simulation and implementation of novel deep learning hardware architectures for resource constrained devices
Corey Lammie designed mixed-signal memristive-complementary metal–oxide–semiconductor (CMOS) and field programmable gate array (FPGA) hardware architectures, which were used to reduce the power and resource requirements of Deep Learning (DL) systems during both inference and training. Disruptive design methodologies, such as those explored in this thesis, can be used to facilitate the design of next-generation DL systems.
Energy analysis and optimisation techniques for automatically synthesised coprocessors
The primary outcome of this research project is the development of a methodology enabling fast automated early-stage power and energy analysis of configurable processors for system-on-chip platforms. Such capability is essential to the process of selecting energy efficient processors during design-space exploration, when potential savings are highest. This has been achieved by developing dynamic and static energy consumption models for the constituent blocks within the processors.
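A minimal sketch of the kind of block-level dynamic/static energy model described above; the block names and energy coefficients are illustrative assumptions, not values from the Cascade models.

```python
# Early-stage energy estimate: per-block dynamic energy per activation plus
# static (leakage) power integrated over runtime. All numbers are hypothetical.
DYNAMIC_ENERGY_PJ = {"icache": 12.0, "dcache": 15.0, "multiplier": 8.0, "regfile": 3.0}
STATIC_POWER_UW = {"icache": 40.0, "dcache": 55.0, "multiplier": 10.0, "regfile": 5.0}

def estimate_energy_pj(activations, runtime_us):
    """Total energy = sum of per-block dynamic energy + static power * runtime (uW * us = pJ)."""
    dynamic = sum(DYNAMIC_ENERGY_PJ[block] * count for block, count in activations.items())
    static = sum(STATIC_POWER_UW.values()) * runtime_us
    return dynamic + static

# Example: activity counts taken from a simulation trace (hypothetical numbers).
print(estimate_energy_pj(
    {"icache": 10_000, "dcache": 4_000, "multiplier": 1_200, "regfile": 18_000},
    runtime_us=50.0))
```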
Several optimisations have been identified, specifically targeting the blocks that are most significant in terms of energy consumption. An instruction encoding mechanism reduces both the energy and area requirements of the instruction cache, while modifications to the multiplier unit reduce energy consumption during inactive cycles. Both techniques are demonstrated to offer substantial energy savings.
The aforementioned techniques have undergone detailed evaluation and, based on the positive outcomes obtained, have been incorporated into Cascade, a system-on-chip coprocessor synthesis tool developed by Critical Blue, to provide automated analysis and optimisation of processor energy requirements. This thesis details the process of identifying and examining each method, along with the results obtained. Finally, a case study demonstrates the benefits of the developed functionality from the perspective of someone using Cascade to automate the creation of an energy-efficient configurable processor for system-on-chip platforms.
The application of artificial neural networks to interpret acoustic emissions from submerged arc welding
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.
Automated fusion welding processes play a fundamental role in modern manufacturing industries. The proliferation of joint geometries, together with the large permutation of associated process variable configurations, has given rise to research into complex system modelling and control strategies. Many of these techniques have involved monitoring not only the electrical characteristics of the process but also visual and acoustic information. Acoustic information derived from certain welding processes is well documented, as it is an established fact that skilled manual welders use such information as an aid to creating an optimum weld.
The experimental investigation presented in this thesis is dedicated to the feasibility of monitoring airborne acoustic emissions of Submerged Arc Welding (SAW) for diagnostic and real-time control purposes. The experimental method adopted for this research takes a cybernetic approach to data processing and interpretation in an attempt to replicate the robustness of human biological functions. A custom-designed audio hardware system was used to analyse signals obtained from bead-on-plate fusion welds on mild steel. Time and frequency domains were used in an attempt to establish salient characteristics or identify the signatures associated with changes of the process variables. The featured parameters were voltage/current and weld travel speed, due to their ease of validation. However, consideration has also been given to weld defect prediction due to process instabilities.
As the data proved to be highly correlated and erratic when subjected to off-line statistical analysis, extensive investigation was given to the application of artificial neural networks to signal processing and real-time control scenarios. As a consequence, a dedicated neural-based software system was developed, utilising supervised and unsupervised neural techniques to monitor the process. The research was aimed at proving the feasibility of monitoring the electrical process parameters and the stability of the welding process in real time. It was shown to be possible, by exploiting artificial neural networks, to generate a number of monitoring parameters indicative of the welding process state. The limitations of the present neural method and proposed developments are discussed, together with an overview of applied neural network technology and its impact on artificial intelligence and robotic control. Further developments are considered together with recommendations for future areas of research.
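As a rough illustration of the time/frequency-domain feature extraction described above, the sketch below reduces an acoustic-emission frame to band-energy features of the kind that could feed a neural monitor; the custom audio hardware and the actual network architecture from the thesis are not reproduced.

```python
# Frequency-domain feature extraction for one windowed acoustic-emission frame:
# the magnitude spectrum is split into coarse bands and summarised by band energy.
import numpy as np

def band_energies(frame, n_bands=16):
    """Return n_bands energy features from one 1-D audio frame (band count is an assumption)."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    bands = np.array_split(spectrum, n_bands)
    return np.array([np.sum(band ** 2) for band in bands])
```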
New Techniques for On-line Testing and Fault Mitigation in GPUs
The abstract is provided in the attachment.
Addressing concerns in performance prediction : the impact of data dependencies and denormal arithmetic in scientific codes
To meet the increasing computational requirements of the scientific community, the use of parallel programming has become commonplace, and in recent years distributed applications running on clusters of computers have become the norm.
Both parallel and distributed applications face the problem of predictive uncertainty and variations in runtime. Modern scientific applications have varying I/O, cache, and memory profiles that have significant and difficult-to-predict effects on their runtimes. Data-dependent sensitivities, such as the cost of denormal floating-point calculations, introduce further variations in runtime, hindering predictability.
Applications with unpredictable performance or highly variable runtimes can cause several problems. If the runtime of an application is unknown or varies widely, workflow schedulers cannot allocate it efficiently to compute nodes, leading to the under-utilisation of expensive resources. Similarly, a lack of accurate knowledge of the performance of an application on new hardware can lead to misguided procurement decisions. In heavily parallel applications, minor variations in runtime on individual nodes can have disproportionate effects on the overall application runtime. Even on a smaller scale, a lack of certainty about an application's runtime can preclude its use in real-time or time-critical applications such as clinical diagnosis.
This thesis investigates two sources of data-dependent performance variability. The first source is algorithmic and is seen in a state-of-the-art C++ biomedical imaging application. The thesis identifies the cause of the variability in the application and develops a means of characterising it. This 'probe task'-based model is adapted for use with a workflow scheduler, and the scheduling improvements it brings are examined.
The second source of variability is more subtle, as it is micro-architectural in nature. Depending on the input data, two runs of an application executing exactly the same sequence of instructions, and with exactly the same memory access patterns, can have large differences in runtime due to deficiencies in common hardware implementations of denormal arithmetic. An exception-based profiler is written to detect occurrences of denormal arithmetic, and it is shown how this is insufficient to isolate the sources of denormal arithmetic in an application. A novel tool based on the Valgrind binary instrumentation framework is developed which can trace the origins of denormal values and the frequency of their occurrence in an application's data structures. This second tool is used to isolate and remove the cause of denormal arithmetic, first from a simple numerical code and then from a face recognition application.
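A small sketch of detecting denormal (subnormal) values in an application's data, in the spirit of the profiling tools described above (the exception-based profiler and the Valgrind-based tracer themselves are not reproduced): a subnormal double is non-zero but smaller in magnitude than the smallest normal value.

```python
# Count subnormal doubles in an array: non-zero values whose magnitude is below
# the smallest positive normal double (np.finfo(float64).tiny).
import numpy as np

def count_denormals(array):
    a = np.asarray(array, dtype=np.float64)
    tiny = np.finfo(np.float64).tiny
    return int(np.count_nonzero((a != 0.0) & (np.abs(a) < tiny)))

# Example: multiplying tiny values drives results into the subnormal range.
values = np.full(1000, 1e-300) * 1e-10
print(count_denormals(values))   # all 1000 products (1e-310) are subnormal
```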