467 research outputs found

    DPP-PMRF: Rethinking Optimization for a Probabilistic Graphical Model Using Data-Parallel Primitives

    We present a new parallel algorithm for probabilistic graphical model optimization. The algorithm relies on data-parallel primitives (DPPs), which provide portable performance across hardware architectures. We evaluate results on CPUs and GPUs for an image segmentation problem. Compared to a serial baseline, we observe runtime speedups of up to 13X (CPU) and 44X (GPU). We also compare our performance to a reference, OpenMP-based algorithm, and find speedups of up to 7X (CPU). Comment: LDAV 2018, October 201
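
    To make the term "data-parallel primitives" concrete, the following is a minimal serial sketch of three common primitives (map, reduce, exclusive scan) and of stream compaction built from them. It is not the DPP-PMRF algorithm; the function names `dpp_map`, `dpp_reduce`, `dpp_exclusive_scan` and `dpp_compact` are hypothetical and chosen only for illustration.

```python
from itertools import accumulate
from operator import add


def dpp_map(fn, xs):
    """Apply fn to every element independently (trivially parallel)."""
    return [fn(x) for x in xs]


def dpp_reduce(op, xs, init):
    """Combine all elements with an associative operator."""
    acc = init
    for x in xs:
        acc = op(acc, x)
    return acc


def dpp_exclusive_scan(xs):
    """Running sums shifted right by one: out[i] = sum(xs[:i])."""
    if not xs:
        return []
    return [0] + list(accumulate(xs))[:-1]


def dpp_compact(xs, keep):
    """Stream compaction expressed with map, scan and reduce.

    Each kept element learns its output slot from the scan, so the
    final scatter has no dependencies between elements.
    """
    flags = dpp_map(lambda x: 1 if keep(x) else 0, xs)
    offsets = dpp_exclusive_scan(flags)
    total = dpp_reduce(add, flags, 0)
    out = [None] * total
    for x, flag, offset in zip(xs, flags, offsets):
        if flag:
            out[offset] = x
    return out


if __name__ == "__main__":
    print(dpp_exclusive_scan([1, 0, 1, 1]))
    print(dpp_compact([5, 2, 8, 1, 9], lambda x: x > 4))
```

    The loops here run serially; the point of writing an algorithm in terms of such primitives is that each one can be swapped for a parallel CPU or GPU implementation without changing the calling code.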

    Primitives and design of the intelligent pixel multimedia communicator

    Communication systems are an ever more essential component of our modern global society. Mobile communications systems are still in a state of rapid advancement and growth. Technology is constantly evolving at a rapid pace in ever more diverse areas and the emerging mobile multimedia based communication systems offer new challenges for both current and future technologies. To realise the full potential of mobile multimedia communication systems there is a need to explore new options to solve some of the fundamental problems facing the technology. In particular, the complexity of such a system within an infrastructure framework that is inherently limited by its power sources and has very restricted transmission bandwidth demands new methodologies and approaches

    Enabling human physiological sensing by leveraging intelligent head-worn wearable systems

    This thesis explores the challenges of enabling human physiological sensing by leveraging head-worn wearable computer systems. In particular, we want to answer a fundamental question, i.e., could we leverage head-worn wearables to enable accurate and socially-acceptable solutions to improve human healthcare and prevent life-threatening conditions in our daily lives? To that end, we will study the techniques that utilise the unique advantages of wearable computers to (1) facilitate new sensing capabilities to capture various biosignals from the brain, the eyes, facial muscles, sweat glands, and blood vessels, (2) address motion artefacts and environmental noise in real-time with signal processing algorithms and hardware design techniques, and (3) enable long-term, high-fidelity biosignal monitoring with efficient on-chip intelligence and pattern-driven compressive sensing algorithms. We first demonstrate the ability to capture the activities of the user's brain, eyes, facial muscles, and sweat glands by proposing WAKE, a novel behind-the-ear biosignal sensing wearable. By studying the human anatomy in the ear area, we propose a wearable design to capture brain waves (EEG), eye movements (EOG), facial muscle contractions (EMG), and sweat gland activities (EDA) with a minimal number of sensors. Furthermore, we introduce a Three-fold Cascaded Amplifying (3CA) technique and signal processing algorithms to tame the motion artefacts and environmental noises for capturing high-fidelity signals in real time. We devise a machine-learning model based on the captured signals to detect microsleep with a high temporal resolution. Second, we will discuss our work on developing an efficient Pattern-dRiven Compressive Sensing framework (PROS) to enable long-term biosignal monitoring on low-power wearables. 
The system introduces tiny on-chip pattern recognition primitives (TinyPR) and a novel pattern-driven compressive sensing technique (PDCS) that exploits the sparsity of biosignals. They provide the ability to capture high-fidelity biosignals with an ultra-low power footprint. This development will unlock long-term healthcare applications on wearable computers, such as epileptic seizure monitoring, microsleep detection, etc. These applications were previously impractical on energy and resource-constrained wearable computers due to the limited battery lifetime, slow response rate, and inadequate biosignal quality. Finally, we will further explore the possibility of capturing the activities of a blood vessel (i.e., superficial temporal artery) lying deep inside the user's ear using an ear-worn wearable computer. The captured optical pulse signals (PPG) are used to develop a frequent and comfortable blood pressure monitoring system called eBP. In contrast to existing devices, eBP introduces a novel in-ear wearable system design and algorithms to eliminate the need to block the blood flow inside the ear, alleviating the user's discomfort

    Multiple dataset visualization (MDV) framework for scalar volume data

    Many applications require comparative analysis of multiple datasets representing different samples, conditions, time instants, or views in order to develop a better understanding of the scientific problem/system under consideration. One effective approach for such analysis is visualization of the data. In this PhD thesis, we propose an innovative multiple dataset visualization (MDV) approach in which two or more datasets of a given type are rendered concurrently in the same visualization. MDV is an important concept for the cases where it is not possible to make an inference based on one dataset, and comparisons between many datasets are required to reveal cross-correlations among them. The proposed MDV framework, which deals with some fundamental issues that arise when several datasets are visualized together, follows a multithreaded architecture consisting of three core components: data preparation/loading, visualization and rendering. The visualization module, the major focus of this study, currently deals with isosurface extraction and texture-based rendering techniques. For isosurface extraction, our all-in-memory approach keeps the datasets under consideration and the corresponding geometric data in memory. Alternatively, the only-polygons- or points-in-memory approach keeps only the geometric data in memory. To address the issues related to storage and computation, we develop adaptive data coherency and multiresolution schemes. The inter-dataset coherency scheme exploits the similarities among datasets to approximate portions of the isosurfaces of datasets using the isosurface of one or more reference datasets, whereas the intra/inter-dataset multiresolution scheme processes the selected portions of each data volume at varying levels of resolution. The graphics hardware-accelerated approaches adopted for MDV include volume clipping, isosurface extraction and volume rendering, which use 3D textures and advanced per-fragment operations. 
With appropriate user-defined threshold criteria, we find that various MDV techniques maintain a linear time-N relationship, improve the geometry generation and rendering time, and increase the maximum N that can be handled (N: number of datasets). Finally, we justify the effectiveness and usefulness of the proposed MDV by visualizing 3D scalar data (representing electron density distributions in magnesium oxide and magnesium silicate) from parallel quantum mechanical simulation

    Hardware Acceleration of the Embedded Zerotree Wavelet Algorithm

    The goal of this project was to gain experience in designing and implementing a microelectronic system to accelerate the execution of a time-consuming software algorithm, the Embedded Zerotree Wavelet (EZW), which is used in multimedia applications. The algorithm was implemented using MATLAB to be certain it was fully understood and to serve as a validation reference. Then, the algorithm was mapped into a hardware description language, VHDL, and its resulting implementation verified with the golden reference. The hardware description was then targeted to a field-programmable gate array (FPGA). Significant acceleration was achieved since the hardware implementation in an FPGA (Xilinx Virtex-1000E using an 8.315 MHz clock) ran 10,000 times faster than the MATLAB implementation on a SUN-220 workstation. Additional speedup exploiting the parallel capabilities of the FPGA was not achieved since the EZW algorithm utilizes only sequential operations

    GaAs Implementation of FIR Filter

    This thesis discusses the findings of the final year project involving Gallium Arsenide implementation of a triangular FIR filter to perform discrete wavelet transforms. The overall characteristics of Gallium Arsenide technology - its construction, behaviour and electrical characteristics as they apply to VLSI technology - were investigated in this project. An in-depth understanding of its architecture is required to be able to understand the various design techniques employed. A comparison of Silicon and GaAs performance and other characteristics has also been made to fully justify the choice of this material for system implementation. A lot of research and active interest has gone into the field of image and video compression. Wavelet-based image transformation is one of the very efficient compression techniques used. An analysis of discrete wavelet transformations and the required triangular FIR filter was done to be able to produce a transform algorithm and the related filter architecture. Finally, the filter architecture was implemented as a VLSI design and layout. A variety of functional blocks required for the architecture were designed, tested and analysed. All these blocks were integrated to produce a model of a complete filter cell. The filter implementation was designed to be self-timed - without a system clock. Self-timed systems have considerable advantages over clocked architectures. Various design styles and handshaking mechanisms involved in designing a self-timed system were analysed and designed. There are many avenues still to explore. One of them is the VHDL analysis of filter architecture. Further development on this project would involve integration of higher-level logic and formation of a complete filter array

    Performance Optimization Strategies for Transactional Memory Applications

    This thesis presents tools for Transactional Memory (TM) applications that cover multiple TM systems (Software, Hardware, and hybrid TM) and use information of all different layers of the TM software stack. Therefore, this thesis addresses a number of challenges to extract static information, information about the run time behavior, and expert-level knowledge to develop these new methods and strategies for the optimization of TM applications

    Efficient reconfigurable architectures for 3D medical image compression

    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Recently, the more widespread use of three-dimensional (3-D) imaging modalities, such as magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and ultrasound (US), has generated a massive amount of volumetric data. This has provided an impetus to the development of other applications, in particular telemedicine and teleradiology. In these fields, medical image compression is important since both efficient storage and transmission of data through high-bandwidth digital communication lines are of crucial importance. Despite their advantages, most 3-D medical imaging algorithms are computationally intensive with matrix transformation as the most fundamental operation involved in the transform-based methods. Therefore, there is a real need for high-performance systems, whilst keeping architectures flexible to allow for quick upgradeability with real-time applications. Moreover, in order to obtain efficient solutions for large medical volumes data, an efficient implementation of these operations is of significant importance. Reconfigurable hardware, in the form of field programmable gate arrays (FPGAs), has been proposed as a viable system building block in the construction of high-performance systems at an economical price. Consequently, FPGAs seem an ideal candidate to harness and exploit their inherent advantages such as massive parallelism capabilities, multimillion gate counts, and special low-power packages. The key achievements of the work presented in this thesis are summarised as follows. Two architectures for 3-D Haar wavelet transform (HWT) have been proposed based on transpose-based computation and partial reconfiguration suitable for 3-D medical imaging applications. These applications require continuous hardware servicing, and as a result dynamic partial reconfiguration (DPR) has been introduced. 
A comparative study of both non-partial and partial reconfiguration implementations has shown that DPR offers many advantages and leads to a compelling solution for implementing computationally intensive applications such as 3-D medical image compression. Using DPR, several large systems are mapped to small hardware resources, and the area, power consumption as well as maximum frequency are optimised and improved. Moreover, an FPGA-based architecture of the finite Radon transform (FRAT) with three design strategies has been proposed: direct implementation of pseudo-code with a sequential or pipelined description, and a block random access memory (BRAM)-based method. An analysis with various medical imaging modalities has been carried out. The image de-noising implementation using FRAT shows promising results in reducing Gaussian white noise in medical images. In terms of hardware implementation, promising trade-offs on maximum frequency, throughput and area are also achieved. Furthermore, a novel hardware implementation of a 3-D medical image compression system with context-based adaptive variable length coding (CAVLC) has been proposed. An evaluation of the 3-D integer transform (IT) and the discrete wavelet transform (DWT) with lifting scheme (LS) for transform blocks reveals that the 3-D IT demonstrates better computational complexity than the 3-D DWT, whilst the 3-D DWT with LS exhibits a lossless compression that is significantly useful for medical image compression. Additionally, an architecture of CAVLC that is capable of compressing high-definition (HD) images in real-time without any buffer between the quantiser and the entropy coder is proposed. Through a judicious parallelisation, promising results have been obtained with limited resources. In summary, this research is tackling the issues of massive 3-D medical volumes data that requires compression as well as hardware implementation to accelerate the slowest operations in the system. 
Results obtained also reveal a significant achievement in terms of the architecture efficiency and applications performance. Ministry of Higher Education Malaysia (MOHE), Universiti Tun Hussein Onn Malaysia (UTHM) and the British Counci
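
    As a rough software illustration of the transpose-based Haar wavelet computation mentioned in this abstract, the sketch below applies a one-level, unnormalised Haar step (pairwise averages, then pairwise differences) along the rows of a 2-D block, transposes, and applies the same row step again. It is an assumption-laden toy, not the thesis's FPGA architecture: the names `haar_1d`, `transpose` and `haar_2d` are hypothetical, and a 3-D version would repeat the same row step along the third axis.

```python
def haar_1d(signal):
    """One Haar level: pairwise averages followed by pairwise differences."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = range(0, len(signal), 2)
    averages = [(signal[i] + signal[i + 1]) / 2 for i in pairs]
    details = [(signal[i] - signal[i + 1]) / 2 for i in pairs]
    return averages + details


def transpose(matrix):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(row) for row in zip(*matrix)]


def haar_2d(block):
    """Separable one-level Haar transform using only row passes.

    The column pass is done by transposing, reusing the row routine,
    and transposing back, which is the idea behind transpose-based
    computation.
    """
    rows_done = [haar_1d(row) for row in block]
    cols_done = [haar_1d(col) for col in transpose(rows_done)]
    return transpose(cols_done)


if __name__ == "__main__":
    print(haar_1d([4, 2, 6, 8]))
    print(haar_2d([[1, 1], [1, 1]]))
```

    Reusing one row routine for every axis is what makes the approach attractive in hardware, where a single 1-D datapath can then serve all dimensions.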

    Algorithms and Architectures for Secure Embedded Multimedia Systems

    Embedded multimedia systems provide real-time video support for applications in entertainment (mobile phones, internet video websites), defense (video-surveillance and tracking) and public-domain (tele-medicine, remote and distant learning, traffic monitoring and management). With the widespread deployment of such real-time embedded systems, there has been an increasing concern over the security and authentication of the multimedia data concerned. While several (software) algorithms and hardware architectures have been proposed in the research literature to support multimedia security, these fail to address embedded applications whose performance specifications have tighter constraints on computational power and available hardware resources. The goals of this dissertation research are twofold: 1. To develop novel algorithms for joint video compression and encryption. The proposed algorithms reduce the computational requirements of multimedia encryption algorithms. We propose an approach that uses the compression parameters instead of the compressed bitstream for video encryption. 2. Hardware acceleration of the proposed algorithms over reconfigurable computing platforms such as FPGA and over VLSI circuits. We use signal processing knowledge to make the algorithms suitable for hardware optimizations and try to reduce the critical path of circuits using hardware-specific optimizations. The proposed algorithms ensure a considerable level of security for low-power embedded systems such as portable video players and surveillance cameras. These schemes have zero or little compression losses and preserve the desired properties of the compressed bitstream in the encrypted bitstream to ensure secure and scalable transmission of videos over heterogeneous networks. They also support indexing, search and retrieval in secure multimedia digital libraries. 
This property is not only crucial for police and armed forces to retrieve information about a suspect from a large video database of surveillance feeds, but also extremely helpful for data centers (such as those used by YouTube, AOL and Metacafe) in reducing the computation cost in search and retrieval of desired videos

    Multiresolution Techniques for Real–Time Visualization of Urban Environments and Terrains

    In recent times we are witnessing a steep increase in the availability of data coming from real–life environments. Nowadays, virtually everyone connected to the Internet may have instant access to a tremendous amount of data coming from satellite elevation maps, airborne time-of-flight scanners and digital cameras, street–level photographs and even cadastral maps. As for other, more traditional types of media such as pictures and videos, users of digital exploration software expect commodity hardware to exhibit good performance for interactive purposes, regardless of the dataset size. In this thesis we propose novel solutions to the problem of rendering large terrain and urban models on commodity platforms, both for local and remote exploration. Our solutions build on the concept of multiresolution representation, where alternative representations of the same data with different accuracy are used to selectively distribute the computational power, and consequently the visual accuracy, where it is most needed on the basis of the user’s point of view. In particular, we will introduce an efficient multiresolution data compression technique for planar and spherical surfaces applied to terrain datasets which is able to handle huge amounts of information at a planetary scale. We will also describe a novel data structure for compact storage and rendering of urban entities such as buildings to allow real–time exploration of cityscapes from a remote online repository. Moreover, we will show how recent technologies can be exploited to transparently integrate virtual exploration and general computer graphics techniques with web applications