1,551 research outputs found
Systolic Array Implementations With Reduced Compute Time.
The goal of the research is the establishment of a formal methodology to develop computational structures more suitable for the changing nature of real-time signal processing and control applications. A major effort is devoted to the following question: Given a systolic array designed to execute a particular algorithm, what other algorithms can be executed on the same array? One approach for answering this question is based on a general model of array operations using graph-theoretic techniques. As a result, a systematic procedure is introduced that models array operations as a function of the compute cycle. As a consequence of the analysis, the dissertation develops the concept of fast algorithm realizations. This concept characterizes specific realizations that can be evaluated in a reduced number of cycles. It restricts the operations to remain in the same class but with reduced execution time. The concept takes advantage of the data dependencies of the algorithm at hand. This feature allows the modification of existing structures by reordering the input data. Applications of the principle allow optimum-time band and triangular matrix products on arrays designed for dense matrices. A second approach for analyzing the families of algorithms implementable in an array is based on the concept of array time-constrained operation. The principle uses the number of compute cycles as an additional degree of freedom to expand the class of transformations generated by a single array. A mathematical approach, based on concepts from multilinear algebra, is introduced to model the recursive transformations implemented in linear arrays at each compute cycle. The proposed representation is general enough to encompass a large class of signal processing and control applications. A complete analytical model of the linear maps implementable by the array at each compute cycle is developed.
The proposed methodology results in arrays that are more adaptable to the changing nature of operations. Lessons learned from analyzing existing arrays are used to design smart arrays for special algorithm realizations. Applications of the methodology include the design of flexible time structures and the ability to decompose a full-size array into subarrays implementing smaller-size problems.
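To make "array operations as a function of the compute cycle" concrete, the sketch below simulates a generic output-stationary systolic array for a matrix product and counts the useful multiply-accumulates in each cycle. The timing model (PE (i, j) meets a[i][k] and b[k][j] at cycle t = i + j + k) and the function name `systolic_matmul` are assumptions chosen for illustration; this is not the dissertation's model and it does not implement its input reordering.

```python
def systolic_matmul(a, b):
    """Simulate an n x n output-stationary systolic array computing a @ b.

    Assumed timing model (illustrative only): operands are skewed so that
    PE (i, j) sees a[i][k] and b[k][j] at cycle t = i + j + k.
    Returns the product and work[t], the number of multiply-accumulates
    with two nonzero operands performed in cycle t.
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]
    work = [0] * (3 * n - 2)  # a dense product occupies 3n - 2 cycles
    for t in range(3 * n - 2):
        for i in range(n):
            for j in range(n):
                k = t - i - j  # operand index reaching PE (i, j) at cycle t
                if 0 <= k < n and a[i][k] != 0 and b[k][j] != 0:
                    c[i][j] += a[i][k] * b[k][j]
                    work[t] += 1
    return c, work


dense = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
product_dense, work_dense = systolic_matmul(dense, dense)
product_sparse, work_sparse = systolic_matmul(identity, identity)
```

Comparing `work_dense` with `work_sparse` shows how many cycle slots sit idle when structured operands are fed to an array laid out for dense matrices. That idle time is the kind of slack a reordering of the input data would aim to remove.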
Generalized Methodology for Array Processor Design of Real-time Systems
Many techniques and design tools have been developed for mapping algorithms to array processors. Linear mapping is usually used for regular algorithms. Large and complex problems are not regular by nature, and regularization may cause a computational overhead that makes it impossible to meet real-time deadlines. In this paper, a systematic design methodology for mapping partially-regular as well as regular Dependence Graphs is presented. In this approach the set of all optimal solutions is generated under the given constraints. Due to the nature of the problem and the tight timing constraints of real-time systems, the set of alternative solutions is limited. An image processing example is discussed
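As a rough illustration of linear mapping for a regular Dependence Graph, the sketch below checks the usual validity conditions for a schedule vector and a projection direction, then enumerates all minimum-latency schedules by brute force. The matrix-product dependence vectors, the search bound, and all function names are assumptions for this example. The paper's own procedure, including its handling of partially-regular graphs, is not reproduced here.

```python
from itertools import product


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def is_valid(schedule, projection, dependences):
    """Causality: every dependence advances time by at least one step.
    Conflict-freedom: nodes projected onto one PE run at distinct times."""
    causal = all(dot(schedule, d) > 0 for d in dependences)
    conflict_free = dot(schedule, projection) != 0
    return causal and conflict_free


def latency(schedule, n):
    """Number of time steps spanned by an n x n x n index space."""
    times = [dot(schedule, p) for p in product(range(n), repeat=3)]
    return max(times) - min(times) + 1


def optimal_schedules(projection, dependences, n, bound=2):
    """Brute-force search over small integer schedule vectors (assumed bound)."""
    candidates = [
        s for s in product(range(-bound, bound + 1), repeat=3)
        if is_valid(s, projection, dependences)
    ]
    best = min(latency(s, n) for s in candidates)
    return [s for s in candidates if latency(s, n) == best]


# Assumed example: the regular dependence graph of a matrix product.
deps = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
proj = (0, 0, 1)
best = optimal_schedules(proj, deps, n=4)
```

For this example the search returns a single schedule, which matches the abstract's remark that tight constraints leave few alternative solutions.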
Blood flow velocity prediction in aorto-iliac stent grafts using computational fluid dynamics and Taguchi method.
Covered Endovascular Reconstruction of Aortic Bifurcation (CERAB) is a new technique to treat extensive aortoiliac occlusive disease with covered expandable stent grafts to rebuild the aortoiliac bifurcation. Post-stenting Doppler ultrasound (DUS) measurement of maximum peak systolic velocity (PSVmax) in the stented segment is widely used to determine patency and for follow-up surveillance because of its portability, affordability and ease of use. Anecdotally, changes in hemodynamics created by CERAB can lead to falsely high PSVmax, requiring CT angiography (CTA) for further assessment. Therefore, the value of DUS would be enhanced by a PSVmax prediction tool that indicates whether a measured PSVmax falls within the acceptable range of prediction. We have developed a prediction tool based on idealized models of aortoiliac bifurcations with various infra-renal PSV (PSVin), iliac to aortic area ratios (R) and aortoiliac bifurcation angles (a). The Taguchi method with orthogonal arrays (OA) was utilized to minimize the number of Computational Fluid Dynamics (CFD) simulations performed under physiologically realistic conditions. Analysis of Variance (ANOVA) and Multiple Linear Regression (MLR) analyses were performed to assess goodness of fit and to predict PSVmax. PSVin and R were found to contribute 94.06% and 3.36% respectively to PSVmax. The goodness of fit based on adjusted R² improved from 99.1% with a linear function to 99.9% with an exponential function. The PSVmax predictor based on the exponential model was evaluated with sixteen patient-specific cases, with a mean prediction error of 9.9% and a standard deviation of 6.4%. Eleven out of sixteen cases (69%) in our current retrospective studies would have avoided CTA if the proposed predictor had been used to screen out DUS-measured PSVmax with prediction error greater than 15%.
The predictor therefore has the potential to be used as a clinical tool to detect PSVmax more accurately post aortoiliac stenting and might reduce diagnostic errors and avoid unnecessary expense and risk from CTA follow-up imaging.
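The role of the orthogonal array can be sketched as follows. The abstract does not state which array or how many levels per factor were used, so the standard L9 design (three levels per factor) is assumed here purely for illustration. Level indices 0 to 2 stand in for the unspecified physical values of PSVin, R and the bifurcation angle.

```python
from collections import Counter
from itertools import combinations, product


def l9_array():
    """Standard L9 orthogonal array: 9 runs, up to 4 factors, 3 levels each."""
    return [(a, b, (a + b) % 3, (a + 2 * b) % 3)
            for a, b in product(range(3), repeat=2)]


def is_orthogonal(rows, levels=3):
    """True if every pair of columns contains each level pair equally often."""
    expected = len(rows) // (levels * levels)
    for c1, c2 in combinations(range(len(rows[0])), 2):
        counts = Counter((row[c1], row[c2]) for row in rows)
        if len(counts) != levels * levels:
            return False
        if any(v != expected for v in counts.values()):
            return False
    return True


# Assumed assignment of the first three columns to the three factors.
factors = ("PSVin", "R", "angle")
runs = [dict(zip(factors, row[:3])) for row in l9_array()]
full_factorial_size = 3 ** len(factors)
```

Under these assumptions the design needs 9 simulations where a full factorial sweep would need 27, while every pair of factor levels is still covered equally often.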
Efficient FPGA implementation and power modelling of image and signal processing IP cores
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Field Programmable Gate Arrays (FPGAs) are the technology of choice in a number of image and signal processing application areas such as consumer electronics, instrumentation, medical data processing and avionics due to their reasonable energy consumption, high performance, security, low design-turnaround time and reconfigurability. Low power FPGA devices are also emerging as competitive solutions for mobile and thermally constrained platforms. Most computationally intensive image and signal processing algorithms also consume a lot of power leading to a number of issues including reduced mobility, reliability concerns and increased design cost among others. Power dissipation has become one of the most important challenges, particularly for FPGAs. Addressing this problem requires optimisation and awareness at all levels in the design flow. The key achievements of the work presented in this thesis are summarised here. Behavioural level optimisation strategies have been used for implementing matrix product and inner product through the use of mathematical techniques such as Distributed Arithmetic (DA) and its variations including offset binary coding, sparse factorisation and novel vector level transformations. Applications to test the impact of these algorithmic and arithmetic transformations include the fast Hadamard/Walsh transforms and Gaussian mixture models. Complete design space exploration has been performed on these cores, and where appropriate, they have been shown to clearly outperform comparable existing implementations. At the architectural level, strategies such as parallelism, pipelining and systolisation have been successfully applied for the design and optimisation of a number of cores including colour space conversion, finite Radon transform, finite ridgelet transform and circular convolution. A pioneering study into the influence of supply voltage scaling for FPGA based designs, used in conjunction with performance enhancing strategies such as parallelism and pipelining has been performed. Initial results are very promising and indicated significant potential for future research in this area.
A key contribution of this work includes the development of a novel high level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called Functional Level Power Analysis and Modelling (FLPAM). FLPAM is scalable, platform independent and compares favourably with existing approaches. A hybrid, top-down design flow paradigm integrating FLPAM with commercially available design tools for systematic optimisation of IP cores has also been developed.
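For readers unfamiliar with Distributed Arithmetic, the following is a minimal software sketch of the textbook bit-serial form with fixed coefficients and two's complement inputs. It is not the thesis's implementation, and it omits the variations mentioned above (offset binary coding, sparse factorisation). The function names and the 8-bit word length are assumptions.

```python
def build_lut(coeffs):
    """Precompute the sum of coefficients selected by each address bit pattern."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_inner_product(coeffs, xs, bits=8):
    """Inner product sum(c_k * x_k) using lookups, shifts and adds only.

    Each x must fit in `bits`-bit two's complement. One bit plane of all
    inputs forms a LUT address per step; the sign-bit plane is subtracted.
    """
    lut = build_lut(coeffs)
    mask = (1 << bits) - 1
    words = [x & mask for x in xs]  # two's complement encoding of the inputs
    acc = 0
    for b in range(bits):
        addr = 0
        for k, w in enumerate(words):
            addr |= ((w >> b) & 1) << k  # bit b of input k becomes address bit k
        term = lut[addr] * (1 << b)
        acc += -term if b == bits - 1 else term
    return acc


coeffs = [3, -5, 7, 2]
xs = [10, -20, 127, -128]
result = da_inner_product(coeffs, xs)
```

The multiplications by the inputs disappear: the loop performs one table lookup per bit plane, which is why the technique maps well onto LUT-based FPGA fabric.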
A 2D DWT architecture suitable for the Embedded Zerotree Wavelet Algorithm
Digital imaging has had an enormous impact on industrial applications such as the Internet and video-phone systems, and demand for such applications continues to grow. In particular, the number of internet application users is growing at a near-exponential rate. The sharp increase in applications using digital images has placed much emphasis on the fields of image coding, storage, processing and communications. New techniques are continuously developed with the main aim of increasing efficiency. Image coding in particular is a field of great commercial interest. A digital image requires a large amount of data to be created. This large amount of data causes many problems when storing, transmitting or processing the image. Reducing the amount of data needed to represent an image is the main objective of image coding, and various techniques have been, and continue to be, developed to achieve it more efficiently. The JPEG image coding standard has enjoyed widespread acceptance, and the industry continues to explore its various implementation issues. However, recent research indicates that multiresolution-based image coding is a far superior alternative.
A recent development in the field of image coding is the use of the Embedded Zerotree Wavelet (EZW) as the technique to achieve image compression. One of the aims of this thesis is to explain how this technique is superior to other current coding standards. It will be seen that an essential part of this method of image coding is the use of multiresolution analysis, a subband system whereby the subbands are logarithmically spaced in frequency and represent an octave band decomposition. The block structure that implements this function is termed the two-dimensional Discrete Wavelet Transform (2D DWT). The 2D DWT can be realised by several architectures, and these are analysed in order to choose the most suitable architecture for the EZW coder. Finally, this architecture is implemented and verified using the Synopsys Behavioural Compiler, and recommendations are made based on experimental findings.
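The octave band decomposition described above can be sketched in software as a separable 2D DWT applied repeatedly to the low-pass subband. The Haar filter (pairwise average and difference) is assumed here only because it is the simplest wavelet; the thesis does not say which filters its architecture uses. The subband labels follow one common convention, and the input is assumed to be square with a side length divisible by 2 at every level.

```python
def haar_1d(seq):
    """One Haar analysis step: pairwise averages (low) and differences (high)."""
    half = len(seq) // 2
    low = [(seq[2 * i] + seq[2 * i + 1]) / 2 for i in range(half)]
    high = [(seq[2 * i] - seq[2 * i + 1]) / 2 for i in range(half)]
    return low, high


def filter_columns(block):
    """Apply haar_1d down each column and return (low, high) blocks."""
    lows, highs = [], []
    for col in zip(*block):
        lo, hi = haar_1d(list(col))
        lows.append(lo)
        highs.append(hi)
    # Transpose back so that each inner list is a row again.
    return [list(r) for r in zip(*lows)], [list(r) for r in zip(*highs)]


def haar_2d(image):
    """One level of the separable 2D DWT: filter rows, then columns."""
    row_low, row_high = [], []
    for row in image:
        lo, hi = haar_1d(row)
        row_low.append(lo)
        row_high.append(hi)
    ll, lh = filter_columns(row_low)
    hl, hh = filter_columns(row_high)
    return ll, lh, hl, hh


def octave_decompose(image, levels):
    """Recurse on the LL subband, keeping the detail subbands of each level."""
    details = []
    ll = image
    for _ in range(levels):
        ll, lh, hl, hh = haar_2d(ll)
        details.append((lh, hl, hh))
    return ll, details


flat = [[8] * 4 for _ in range(4)]
approx, details = octave_decompose(flat, levels=2)
```

Each level halves the resolution of the approximation, so the detail subbands across levels form the parent-child pyramid that a zerotree coder scans.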