8 research outputs found

    BioThreads: a novel VLIW-based chip multiprocessor for accelerating biomedical image processing applications

    We discuss BioThreads, a novel, configurable, extensible system-on-chip multiprocessor, and its use in accelerating biomedical signal processing applications such as imaging photoplethysmography (IPPG). BioThreads is derived from the LE1 open-source VLIW chip multiprocessor and efficiently handles instruction-, data- and thread-level parallelism. In addition, it supports a novel mechanism for the dynamic creation and allocation of software threads to uncommitted processor cores by implementing key POSIX Threads primitives directly in hardware, as custom instructions. In this study, the BioThreads core is used to accelerate the calculation of the oxygen saturation map of living tissue in an experimental setup consisting of a high-speed image acquisition system connected to an FPGA board and to a host system. Results demonstrate near-linear acceleration of the core kernels of the target blood perfusion assessment with an increasing number of hardware threads. The BioThreads processor was implemented in both standard-cell and FPGA technologies; in the first case, for an issue width of two, full real-time performance is achieved with 4 cores, whereas on a mid-range Xilinx Virtex-6 device this is achieved with 10 dual-issue cores. An 8-core LE1 VLIW FPGA prototype of the system executed 240 times faster than the scalar MicroBlaze processor, demonstrating the scalability of the proposed solution relative to a state-of-the-art, FPGA-vendor-provided soft CPU core.

    Parameterized Hardware Design on Reconfigurable Computers: An Image Processing Case Study

    Reconfigurable Computers (RCs) with hardware (FPGA) co-processors can achieve significant performance improvements over traditional microprocessor (μP)-based computers for many scientific applications. The potential speedup depends on the intrinsic parallelism of the target application as well as the characteristics of the target platform. In this work, we use image processing applications as a case study to demonstrate how hardware designs are parameterized by the co-processor architecture, particularly the data I/O, i.e., the local memory of the FPGA device and the interconnect between the FPGA and the μP. Applications that access data randomly must use the local memory; image registration is a typical example. On the other hand, an application such as edge detection can read data directly through the interconnect in a sequential fashion. Two image registration algorithms, the exhaustive search algorithm and the Discrete Wavelet Transform (DWT)-based search algorithm, are implemented in hardware, i.e., on the Xilinx Virtex-II Pro 50 of the Cray XD1 reconfigurable computer. The performance improvements of the hardware implementations are 10× and 2×, respectively. For the category of applications that directly access the interconnect, the hardware implementation of Canny edge detection achieves a 544× speedup.
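To make the idea of exhaustive-search registration concrete, here is a minimal pure-Python sketch under stated assumptions. The abstract does not specify the transform model or similarity measure, so this example assumes one-dimensional signals, integer translations only, and the mean absolute difference as the cost. The function name `exhaustive_shift_search` and its parameters are hypothetical.

```python
def exhaustive_shift_search(reference, floating, max_shift):
    """Try every integer shift in [-max_shift, max_shift] and return the one
    with the lowest mean absolute difference over the overlapping samples.

    Assumed setup (not from the paper): 1D signals, translation only.
    """
    best_shift = 0
    best_cost = float("inf")
    for shift in range(-max_shift, max_shift + 1):
        cost = 0
        count = 0
        for i, ref_value in enumerate(reference):
            j = i + shift  # index in the floating signal for this candidate shift
            if 0 <= j < len(floating):
                cost += abs(ref_value - floating[j])
                count += 1
        if count == 0:
            continue  # no overlap at this shift, nothing to compare
        mean_cost = cost / count
        if mean_cost < best_cost:
            best_cost = mean_cost
            best_shift = shift
    return best_shift


reference = [0, 0, 5, 9, 5, 0, 0, 0]
floating = [0, 0, 0, 0, 5, 9, 5, 0]  # the reference moved two samples to the right
print(exhaustive_shift_search(reference, floating, max_shift=3))
```

Because every candidate is evaluated independently of the others, the outer loop is the kind of work that a parallel hardware implementation could distribute, although the floating signal is read at shifted positions, which is consistent with the abstract's remark that registration needs random access to local memory.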

    Opto-physiological modeling applied to photoplethysmographic cardiovascular assessment

    This paper presents opto-physiological (OP) modeling and its application in cardiovascular assessment techniques based on photoplethysmography (PPG). Existing contact point measurement techniques, i.e., pulse oximetry probes, are compared with the next-generation non-contact and imaging implementations, i.e., non-contact reflection and camera-based PPG. The further development of effective physiological monitoring techniques relies on novel approaches to OP modeling that can better inform the design and development of sensing hardware and applicable signal processing procedures. With the help of finite-element optical simulation, fundamental research into OP modeling of photoplethysmography is being exploited towards the development of engineering solutions for practical biomedical systems. This paper reviews a body of research comprising two OP models that have led to significant progress in the design of transmission-mode pulse oximetry probes, and approaches to 3D blood perfusion mapping for the interpretation of cardiovascular performance.

    Hardware accelerated image processing to enable real-time adaptive radiotherapy

    The accuracy of radiotherapy is constrained by organ motion and deformation occurring between the acquisition of the CT and MR images used to plan the treatment and the time at which the treatment is delivered. Adaptive radiotherapy uses image data acquired at the time of treatment to adapt the original treatment plan to match the current patient anatomy. Currently, the image processing and dose calculation algorithms required to perform this plan adaptation cannot be executed in a clinically acceptable timeframe. Hardware acceleration has the potential to speed up these algorithms, making real-time adaptive radiotherapy a clinical possibility[1]. Hardware acceleration is a technique in which an algorithm is implemented on hardware that is better suited to that specific algorithm than general-purpose processors are, in order to reduce its execution time. This can be achieved using field programmable gate arrays (FPGAs), which are devices consisting of reconfigurable hardware, allowing their function to be customised for a specific application. These devices have been shown to be able to accelerate image processing algorithms pertinent to adaptive radiotherapy[2]. In this study, a global thresholding algorithm based on Otsu’s method combined with a three-dimensional mean filter was used to segment a series of CT images of a Modus QUASAR respiratory motion phantom into three unique classes. A Xilinx Zynq Z-7020 device consisting of a dual-core ARM Cortex-A9 central processing unit (CPU) coupled to an 85 000 logic cell FPGA was used to accelerate the algorithm by implementing sections of it in the reconfigurable hardware. The execution time of this implementation was compared to that of implementations running on an ARM CPU and an Intel Core-i5 CPU. The execution times of the implementations are shown in table 1. The hardware-accelerated implementation was found to execute nearly sixty times as fast as the un-accelerated algorithm.
    The hardware-accelerated implementation was also found to run around 14% faster than the implementation on the more powerful Intel Core-i5 CPU. Figure 1 shows an example of the segmentation results, where the blue contour represents the boundary between two of the classes. In the algorithms presented here, the overhead of transferring data to the hardware represents a significant proportion of the algorithm execution time. It is anticipated that greater acceleration will be possible for algorithms with greater computational complexity, because the data transfer overhead will represent a smaller proportion of the overall execution time. The requirement for fast processing in radiotherapy is likely to increase as the amount of data available to guide treatment more accurately increases through the use of techniques such as 4D CT and image-guided radiotherapy. FPGAs have been shown to be effective at accelerating certain algorithms required for real-time adaptive radiotherapy; however, more research is required to establish which algorithms will execute faster on other types of hardware, such as CPUs and graphics processing units (GPUs). It is likely that heterogeneous computing platforms, composed of a mixture of hardware architectures, will be used in the future implementation of real-time adaptive radiotherapy.
    References:
    1. K. Østergaard Noe, B. D. De Senneville, U. V. Elstrøm, K. Tanderup, T. S. Sørensen, “Acceleration and validation of optical flow based deformable registration for image-guided radiotherapy,” Acta Oncologica, vol. 47, no. 7, pp. 1286-1293, 2008
    2. O. Dandekar, R. Shekhar, “FPGA-Accelerated Deformable Image Registration for Improved Target-Delineation During CT-Guided Interventions,” IEEE Trans. Biomed. Circuits Syst., vol. 1, no. 2, pp. 116-127, 200
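The global thresholding step named in the radiotherapy abstract above can be illustrated with a minimal pure-Python sketch of Otsu's method. This is only a simplified illustration under stated assumptions, not the study's implementation: it handles the two-class case (the study segments into three classes, which would need a multi-level variant), it omits the three-dimensional mean filter, and it assumes integer intensities in the range 0 to `levels - 1`. The function name `otsu_threshold` is hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes the between-class variance,
    where class 0 holds intensities <= t and class 1 holds intensities > t.

    Two-class sketch only; integer intensities in [0, levels) are assumed.
    """
    # Build the intensity histogram.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    weight0 = 0   # number of pixels in class 0 so far
    sum0 = 0      # sum of intensities in class 0 so far
    best_t = 0
    best_var = -1.0
    for t in range(levels):
        weight0 += hist[t]
        sum0 += t * hist[t]
        weight1 = total - weight0
        if weight0 == 0:
            continue  # class 0 still empty
        if weight1 == 0:
            break     # class 1 empty from here on
        mean0 = sum0 / weight0
        mean1 = (sum_all - sum0) / weight1
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = weight0 * weight1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var = between
            best_t = t
    return best_t


# Two well-separated intensity groups; any returned t lies between them.
sample = [10] * 50 + [200] * 50
t = otsu_threshold(sample)
labels = [0 if p <= t else 1 for p in sample]
```

The histogram pass and the per-pixel labelling are both independent across pixels, which is one reason thresholding of this kind is often considered a candidate for implementation in reconfigurable hardware.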

    Parallelization of Non-Rigid Image Registration

    Non-rigid image registration finds use in medical applications ranging from diagnostics to minimally invasive image-guided interventions. Automatic non-rigid image registration algorithms are computationally intensive and can take hours to register two images. Although hierarchical volume subdivision-based algorithms are inherently faster than other non-rigid registration algorithms, they can still take a long time to register two images. We show a parallel implementation of one such previously reported and well-tested algorithm on a cluster of thirty-two processors, which reduces the registration time from hours to a few minutes. Mutual information (MI) is one of the most commonly used image similarity measures in medical image registration, and it is also used in the mentioned algorithm. In addition to the parallel implementation, we propose a new concept based on bit-slicing to accelerate the computation of MI on the cluster and, more generally, on any parallel computing platform such as graphics processing units (GPUs). GPUs are becoming increasingly common for general-purpose computing in medical imaging because their parallel processing power lets them execute algorithms faster. However, the standard implementation of MI does not map well to the GPU architecture, leading earlier investigators to compute only an inexact version of MI on the GPU to achieve speedup. The bit-slicing technique we have proposed enables us to demonstrate an exact implementation of MI on the GPU without adversely affecting the speedup.
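For readers unfamiliar with mutual information as a similarity measure, the following minimal pure-Python sketch computes it from a joint histogram of two equally sized intensity sequences. This is the standard histogram-based definition, not the bit-slicing technique the abstract proposes, and the function name `mutual_information` is hypothetical.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information, in bits, between two equal-length sequences of
    discrete intensities, estimated from their joint histogram."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    n = len(a)
    joint = Counter(zip(a, b))   # joint histogram of intensity pairs
    count_a = Counter(a)         # marginal histogram of a
    count_b = Counter(b)         # marginal histogram of b

    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), with counts substituted.
        mi += (c / n) * math.log2(c * n / (count_a[x] * count_b[y]))
    return mi


aligned = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])      # identical signals
unrelated = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])    # independent signals
```

Building the joint histogram requires many scattered increments into shared bins, which is the step that maps poorly onto a GPU and that the abstract's bit-slicing idea is aimed at.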

    Multiobjective Optimization for Reconfigurable Implementation of Medical Image Registration

    In real-time signal processing, a single application often has multiple computationally intensive kernels that can benefit from acceleration using custom or reconfigurable hardware platforms, such as field-programmable gate arrays (FPGAs). For adaptive utilization of resources at run time, FPGAs with capabilities for dynamic reconfiguration are emerging. In this context, it is useful for designers to derive sets of efficient configurations that trade off application performance against fabric resources. Such sets can be maintained at run time so that the best available design tradeoff is used. Finding a single, optimized configuration is difficult, and generating a family of optimized configurations suitable for different run-time scenarios is even more challenging. We present a novel multiobjective wordlength optimization strategy developed through FPGA-based implementation of a representative computationally intensive image processing application: medical image registration. Tradeoffs between FPGA resources and implementation accuracy are explored, and Pareto-optimized wordlength configurations are systematically identified. We also compare search methods for finding Pareto-optimized design configurations and demonstrate the applicability of search based on evolutionary techniques for identifying superior multiobjective tradeoff curves. We demonstrate the feasibility of this approach in the context of FPGA-based medical image registration; however, it may be adapted to a wide range of signal processing applications.
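The notion of a Pareto-optimized set of configurations can be shown with a short pure-Python sketch. It assumes each candidate configuration is summarized by two objectives that are both minimized, a resource cost and an error, and it uses a simple pairwise dominance check rather than the evolutionary search the abstract describes. The function name `pareto_front` and the example numbers are hypothetical.

```python
def pareto_front(configs):
    """Return the configurations that no other configuration dominates.

    Each configuration is a (resources, error) tuple; both are minimized.
    A configuration is dominated if another one is no worse in both
    objectives and strictly better in at least one.
    """
    front = []
    for candidate in configs:
        dominated = any(
            other[0] <= candidate[0]
            and other[1] <= candidate[1]
            and (other[0] < candidate[0] or other[1] < candidate[1])
            for other in configs
        )
        if not dominated:
            front.append(candidate)
    return front


# Hypothetical (resource units, error) pairs for four wordlength choices.
candidates = [(100, 0.5), (200, 0.2), (150, 0.6), (300, 0.2)]
front = pareto_front(candidates)
```

This brute-force check compares every pair and so scales quadratically with the number of candidates; it is meant only to define the target set that a search method would try to approximate.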

    FPGA-Accelerated Deformable Image Registration for Improved Target-Delineation During CT-Guided Interventions

    No full text

    Evaluation of the results of orthodontic treatment by non-rigid image registration and deformation-based morphometry

    The goal of this research was to find out whether the non-rigid registration of dental casts can be used in the evaluation of orthodontic treatment, and to develop a program that would at least partially automate the evaluation of the images. A further aim was to experiment with the evaluation of three-dimensional models of the casts. This research was limited to the evaluation of malocclusions within the dental arch; the relationships between the dental arches were not considered. This thesis was done at the University of Vaasa, in the Department of Electrical Engineering and Energy Technology, as part of the HammasSkanneri research project, whose aim is to automate the digitization and archiving of dental casts. This research used two-dimensional images of dental casts taken of orthodontically treated patients before and after treatment. Non-rigid registration was performed using a registration tool of the Fiji software. The accuracy of the registration was evaluated by measuring distances between manually inserted landmarks, and by comparing the linear and angular parameters of the registered images with those of the original target images. The displacements of the teeth were approximated with the help of deformation-based morphometry. The accuracy of registration is within reasonable error limits if the image is taken from directly above the cast and the registration is performed with the help of landmarks inserted by a human. Estimation of the changes showed that the movement of teeth can be coarsely measured using deformation-based morphometry based on change estimates that resemble the Jacobian estimates. A set of programs that partially automate the evaluation of the accuracy and the changes was developed.
    Three-dimensional imaging of the casts was unsuccessful, and thus the development of a 3D evaluation system was left as a future research topic.