
    Experimental Benchmarks and Initial Evaluation of the Performance of the PASM System Prototype

    The work reported here represents experiences with the PASM parallel processing system prototype during its first operational year. Most of the experiments were performed by students in the Fall semester of 1987. The first programming, and the first timing measurements, were made during the summer of 1987 by Sam Fineberg. The goal of the collection of experiments presented here was to undertake an Application-driven Architecture Study of the PASM system as a paradigm for parallel architecture evaluation in general. PASM was an excellent vehicle for experimenting with this evaluation technique due to its unique architectural features. Among these are:
    1. A reconfigurable, partitionable multistage circuit-switched network.
    2. Support for both SIMD and MIMD programs.
    3. Ability to execute hybrid SIMD/MIMD programs.
    4. An instruction queue which allows overlap of control-flow and data manipulation between micro-control (MC) units and processing elements (PEs). It had been hypothesized that superlinear speed-up over the number of PEs could be attained with this feature, and experimental results verified this.
    5. Support for barrier synchronization of MIMD tasks. This feature was exploited in some non-standard ways to show the ability to decouple variant length SIMD instructions into multiple MIMD streams for an overall performance benefit.
    This type of study is expected to continue in the future on PASM and other parallel machines at Purdue. This report should serve as a guide for this future work as well.

    Advancements and Breakthroughs in Ultrasound Imaging

    Ultrasonic imaging is a powerful diagnostic tool available to medical practitioners, engineers and researchers today. Owing to its relative safety and non-invasive nature, ultrasonic imaging has become one of the most rapidly advancing technologies. These rapid advances are directly related to parallel advancements in electronics, computing, and transducer technology, together with sophisticated signal processing techniques. This book focuses on state-of-the-art developments in ultrasonic imaging applications and underlying technologies presented by leading practitioners and researchers from many parts of the world.

    Comparison of diffusion tensor imaging by cardiovascular magnetic resonance and gadolinium enhanced 3D image intensity approaches to investigation of structural anisotropy in explanted rat hearts

    Background: Cardiovascular magnetic resonance (CMR) can, through the two methods 3D FLASH and diffusion tensor imaging (DTI), give complementary information on the local orientations of cardiomyocytes and their laminar arrays. Methods: Eight explanted rat hearts were perfused with Gd-DTPA contrast agent and fixative and imaged in a 9.4 T magnet by two types of acquisition: 3D fast low angle shot (FLASH) imaging, voxels 50 × 50 × 50 μm, and 3D spin echo DTI with monopolar diffusion gradients of 3.6 ms duration at 11.5 ms separation, voxels 200 × 200 × 200 μm. The sensitivity of each approach to imaging parameters was explored. Results: The FLASH data showed laminar alignments of voxels with high signal, in keeping with the presumed predominance of contrast in the interstices between sheetlets. They were analysed, using structure-tensor (ST) analysis, to determine the most (v₁^ST), intermediate (v₂^ST) and least (v₃^ST) extended orthogonal directions of signal continuity. The DTI data were analysed to determine the most (e₁^DTI), intermediate (e₂^DTI) and least (e₃^DTI) orthogonal eigenvectors of extent of diffusion. The correspondence between the FLASH and DTI methods was measured and appraised. The most extended direction of FLASH signal (v₁^ST) agreed well with that of diffusion (e₁^DTI) throughout the left ventricle (representative discrepancy in the septum of 13.3 ± 6.7°: median ± absolute deviation), and both were in keeping with the expected local orientations of the long axis of cardiomyocytes. However, the orientations of the least directions of FLASH signal continuity (v₃^ST) and diffusion (e₃^DTI) showed greater discrepancies of up to 27.9 ± 17.4°. Both FLASH (v₃^ST) and DTI (e₃^DTI) were compared with directly measured laminar arrays in the FLASH images. For FLASH, the discrepancy between the structure-tensor calculated v₃^ST and the directly measured FLASH laminar array normal was 9 ± 7° for the lateral wall and 7 ± 9° for the septum (median ± interquartile range), and for DTI the discrepancy between the calculated e₃^DTI and the directly measured FLASH laminar array normal was 22 ± 14° and 61 ± 53.4°, respectively. DTI was relatively insensitive to the number of diffusion directions and to time up to 72 hours post fixation, but was moderately affected by b-value (which was scaled by modifying diffusion gradient pulse strength with fixed gradient pulse separation). Optimal DTI parameters were b = 1000 s/mm² and 12 diffusion directions. FLASH acquisitions were relatively insensitive to the image processing parameters explored. Conclusions: We show that ST analysis of FLASH is a useful and accurate tool in the measurement of cardiac microstructure. While both the FLASH and the DTI approaches appear promising for mapping the alignments of myocytes throughout the myocardium, marked discrepancies between the cross-myocyte anisotropies deduced from each method call for consideration of their respective limitations.
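The structure-tensor step in this abstract can be illustrated with a minimal sketch. It assumes the common gradient-based definition of the structure tensor (the sum of outer products of intensity gradients), under which the eigenvector with the largest eigenvalue points across the laminae, that is, along the least extended direction of signal continuity. The function names, the synthetic volume, and the use of power iteration are illustrative assumptions; the abstract does not describe the authors' implementation.

```python
import math

def structure_tensor(volume):
    """Sum of gradient outer products over interior voxels of volume[z][y][x]."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    tensor = [[0.0] * 3 for _ in range(3)]
    for z in range(1, nz - 1):
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                # Central-difference gradient, ordered (x, y, z).
                g = (
                    (volume[z][y][x + 1] - volume[z][y][x - 1]) / 2.0,
                    (volume[z][y + 1][x] - volume[z][y - 1][x]) / 2.0,
                    (volume[z + 1][y][x] - volume[z - 1][y][x]) / 2.0,
                )
                for i in range(3):
                    for j in range(3):
                        tensor[i][j] += g[i] * g[j]
    return tensor

def dominant_eigenvector(matrix, iterations=200):
    """Power iteration: unit eigenvector belonging to the largest eigenvalue."""
    v = [1.0 / math.sqrt(3.0)] * 3
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break  # no intensity variation, so no preferred direction
        v = [c / norm for c in w]
    return v

# Hypothetical test volume: bright sheets tilted 45 degrees in the x-z plane,
# so the sheet normal is (1, 0, 1) / sqrt(2).
n = 10
volume = [[[math.sin(0.5 * (x + z)) for x in range(n)]
           for _ in range(n)]
          for z in range(n)]

sheet_normal = dominant_eigenvector(structure_tensor(volume))
```

In a real analysis the tensor would be computed per voxel over a smoothed neighbourhood rather than once for the whole volume, and all three eigenvectors would be extracted; the global sum here only keeps the sketch short.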

    Data compression techniques applied to high resolution high frame rate video technology

    An investigation is presented of video data compression applied to microgravity space experiments using High Resolution High Frame Rate Video Technology (HHVT). An extensive survey of video data compression methods described in the open literature was conducted. The survey examines compression methods employing digital computing. The results of the survey are presented. They include a description of each method and an assessment of image degradation and video data parameters. An assessment is made of present and near-term future technology for implementing video data compression in high-speed imaging systems. Results of the assessment are discussed and summarized. The results of a study of a baseline HHVT video system, and approaches for implementing video data compression, are presented. Case studies of three microgravity experiments are presented, and specific compression techniques and implementations are recommended.

    Latency and accuracy optimized mobile face detection

    Abstract. Face detection is a preprocessing step in many computer vision applications. Important factors are the accuracy, inference duration, and energy efficiency of the detection framework. Computationally light detectors that execute in real time are a requirement for many application areas, such as face tracking and recognition. Typical operating platforms in everyday use are smartphones and embedded devices, which have limited computation capacity. The capability of face detectors is comparable to that of a human in easy detection tasks. Under harder conditions, new challenges arise: current challenges in face detection include atypically posed and tiny faces, and partially occluded faces and dim or bright environments also pose difficulties for detection systems. State-of-the-art face detectors employ deep learning methods based on neural networks, which loosely imitate the mammalian brain. The most relevant technologies are convolutional neural networks, which are designed for local feature description. In this thesis, the main computational optimization approach is neural network quantization. The network models were delegated to digital signal processors and graphics processing units. Quantization was shown to reduce the latency of computation substantially. The most energy-efficient inference was achieved through digital signal processor delegation. Multithreading was used for inference acceleration. It reduced the energy consumption per algorithm run.
    Latency- and accuracy-optimized face detection on mobile devices. Abstract (translated from the Finnish). Face detection is a preprocessing step for many computer vision applications. Important properties of a face detector are accuracy, energy efficiency, and execution speed. Many applications require computationally light detectors that run in real time. Examples of such applications are face tracking and recognition systems. Common operating platforms are smartphones and embedded systems, whose computing capacity is limited. The accuracy of face detectors matches human ability in easy detection tasks. Current problems in face detection relate to atypical poses and especially small faces. Partial occlusion of faces, and dark and bright environments, also make detection harder. Neural networks are used in artificial intelligence systems whose starting point has been to model the functioning of the mammalian brain. Convolutional neural networks are specialized in analysing local features. The computational optimization method used in this thesis is neural network quantization. Execution of the neural networks was delegated to digital signal processors and graphics processors. Quantization was shown to reduce computation time substantially, and the highest energy efficiency was achieved with the digital signal processor. Execution speed was increased with multithreading, which was found to reduce energy consumption.
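To make the idea of neural network quantization concrete, here is a minimal sketch of generic affine 8-bit quantization of a list of weights. The abstract does not say which quantization scheme the thesis used, so the scheme, the function names, and the example weights are all assumptions for illustration only.

```python
def quantize_int8(values):
    """Affine-quantize floats to int8 codes; returns (codes, scale, zero_point)."""
    # The represented range is widened to include 0.0 so that zero maps
    # exactly onto an integer code.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(-128 - lo / scale)
    codes = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point

def dequantize(codes, scale, zero_point):
    """Map int8 codes back to approximate float values."""
    return [(q - zero_point) * scale for q in codes]

# Hypothetical layer weights.
weights = [-1.0, -0.25, 0.0, 0.5, 1.55]
codes, scale, zero_point = quantize_int8(weights)
restored = dequantize(codes, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored in one byte instead of four, and the round-trip error stays within about one quantization step (`scale`). Integer arithmetic on such codes is what lets DSP and GPU delegates run inference faster and with less energy, at some cost in accuracy.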

    Simulating Developmental Cardiac Morphology in Virtual Reality Using a Deformable Image Registration Approach

    While virtual reality (VR) has potential in enhancing cardiovascular diagnosis and treatment, the prerequisite labor-intensive image segmentation remains an obstacle to seamlessly simulating 4-dimensional (4-D, 3-D + time) imaging data in an immersive, physiological VR environment. We applied deformable image registration (DIR) in conjunction with 3-D reconstruction and VR implementation to recapitulate developmental cardiac contractile function from light-sheet fluorescence microscopy (LSFM). This method addressed inconsistencies that would arise from independent segmentations of time-dependent data, thereby enabling the creation of a VR environment that fluently simulates cardiac morphological changes. By analyzing myocardial deformation at high spatiotemporal resolution, we interfaced quantitative computations with 4-D VR. We demonstrated that our LSFM-captured images, followed by DIR, yielded average Dice similarity coefficients of 0.92 ± 0.05 (n = 510) and 0.93 ± 0.06 (n = 240) when compared to ground truth images obtained from Otsu thresholding and manual segmentation, respectively. The resulting VR environment simulates a wide-angle zoomed-in view of motion in live embryonic zebrafish hearts, in which the cardiac chambers undergo structural deformation throughout the cardiac cycle. Thus, this technique allows for an interactive micro-scale VR visualization of developmental cardiac morphology to enable high-resolution simulation for both basic and clinical science.
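The Dice similarity coefficient reported in this abstract measures the overlap between two segmentations as 2|A ∩ B| / (|A| + |B|). The sketch below computes it for two small binary masks; the function name, the mask layout, and the convention for two empty masks are assumptions, not details taken from the paper.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity 2|A∩B| / (|A| + |B|) for two equally sized binary masks."""
    flat_a = [bool(v) for row in mask_a for v in row]
    flat_b = [bool(v) for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for a, b in zip(flat_a, flat_b) if a and b)
    total = sum(flat_a) + sum(flat_b)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / total

def square_mask(size, top, left, side):
    """Hypothetical helper: size x size mask with a filled side x side square."""
    return [[1 if top <= r < top + side and left <= c < left + side else 0
             for c in range(size)]
            for r in range(size)]

registered = square_mask(8, top=2, left=2, side=4)  # e.g. a DIR-propagated mask
reference = square_mask(8, top=3, left=2, side=4)   # e.g. a ground-truth mask
score = dice_coefficient(registered, reference)     # 2 * 12 / (16 + 16) = 0.75
```

A score of 1.0 means the masks coincide and 0.0 means they do not overlap at all, so the values of about 0.92 to 0.93 quoted above indicate close agreement with the ground truth.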

    Indexed dependence metadata and its applications in software performance optimisation

    To achieve continued performance improvements, modern microprocessor design is tending to concentrate an increasing proportion of hardware on computation units with less automatic management of data movement and extraction of parallelism. As a result, architectures increasingly include multiple computation cores and complicated, software-managed memory hierarchies. Compilers have difficulty characterizing the behaviour of a kernel in a general enough manner to enable automatic generation of efficient code in any but the most straightforward of cases. We propose the concept of indexed dependence metadata to improve application development and mapping onto such architectures. The metadata represent both the iteration space of a kernel and the mapping of that iteration space from a given index to the set of data elements that iteration might use: thus the dependence metadata is indexed by the kernel’s iteration space. This explicit mapping allows the compiler or runtime to optimise the program more efficiently, and improves the program structure for the developer. We argue that this form of explicit interface specification reduces the need for premature, architecture-specific optimisation. It improves program portability, supports intercomponent optimisation and enables generation of efficient data movement code. We offer the following contributions: an introduction to the concept of indexed dependence metadata as a generalisation of stream programming, a demonstration of its advantages in a component programming system, the decoupled access/execute model for C++ programs, and how indexed dependence metadata might be used to improve the programming model for GPU-based designs. 
Our experimental results with prototype implementations show that indexed dependence metadata supports automatic synthesis of double-buffered data movement for the Cell processor and enables aggressive loop fusion optimisations in image processing, linear algebra and multigrid application case studies.

    Mathematics and Digital Signal Processing

    Modern computer technology has opened up new opportunities for the development of digital signal processing methods. The applications of digital signal processing have expanded significantly and today include audio and speech processing, sonar, radar, and other sensor array processing, spectral density estimation, statistical signal processing, digital image processing, signal processing for telecommunications, control systems, biomedical engineering, and seismology, among others. This Special Issue is aimed at wide coverage of the problems of digital signal processing, from mathematical modeling to the implementation of problem-oriented systems. The basis of digital signal processing is digital filtering. Wavelet analysis implements multiscale signal processing and is used to solve applied problems of de-noising and compression. Processing of visual information, including image and video processing and pattern recognition, is actively used in robotic systems and industrial process control today. Improving digital signal processing circuits and developing new signal processing systems can improve the technical characteristics of many digital devices. The development of new methods of artificial intelligence, including artificial neural networks and brain-computer interfaces, opens up new prospects for the creation of smart technology. This Special Issue contains the latest technological developments in mathematics and digital signal processing. The stated results are of interest to researchers in the field of applied mathematics and developers of modern digital signal processing systems.