
    A C++-embedded Domain-Specific Language for programming the MORA soft processor array

    MORA is a novel platform for high-level FPGA programming of streaming vector and matrix operations, aimed at multimedia applications. It consists of a soft array of pipelined, low-complexity SIMD processors-in-memory (PIM). We present a Domain-Specific Language (DSL) for high-level programming of the MORA soft processor array. The DSL is embedded in C++, providing designers with a familiar language framework and the ability to compile designs using a standard compiler for functional testing before generating the FPGA bitstream using the MORA toolchain. The paper discusses the MORA-C++ DSL and the compilation route into assembly for the MORA machine, and provides examples to illustrate the programming model and performance.
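
    The paper's embedding approach is not reproduced here, but as an illustration of how a C++-embedded DSL can capture operations for later lowering to a processor array, the following is a minimal sketch using operator overloading to record an expression graph. All names (Node, Expr, lower) are hypothetical, not the actual MORA-C++ API.

```cpp
#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch of an embedded DSL: operator overloading records an
// expression graph instead of computing immediately, so a back end could
// later map each node onto a processor in the array.
struct Node {
    std::string op;                           // "input:<name>", "add", "mul"
    std::vector<std::shared_ptr<Node>> args;  // operand subexpressions
};
using Expr = std::shared_ptr<Node>;

Expr input(const std::string& name) {
    return std::make_shared<Node>(Node{"input:" + name, {}});
}
Expr operator+(Expr a, Expr b) { return std::make_shared<Node>(Node{"add", {a, b}}); }
Expr operator*(Expr a, Expr b) { return std::make_shared<Node>(Node{"mul", {a, b}}); }

// Stand-in for a real code generator: prints the graph in postfix order.
void lower(const Expr& e) {
    for (const auto& a : e->args) lower(a);
    std::cout << e->op << '\n';
}

int main() {
    // y = a * b + c, written in plain C++ but captured as a graph.
    Expr a = input("a"), b = input("b"), c = input("c");
    Expr y = a * b + c;
    lower(y);  // a real back end would emit processor-array assembly here
}
```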

    Secure and efficient storage of multimedia content in public cloud environments using joint compression and encryption

    Cloud computing is a paradigm with many areas still unexplored, ranging from the technological component to the definition of new business models, but it is revolutionizing the way we design, implement and manage the entire information technology infrastructure. Infrastructure as a Service is the delivery of computing infrastructure, typically a virtual data centre, along with a set of APIs that allow applications to control, in an automatic way, the resources they wish to use. The choice of service provider, and the way that provider applies its business model, may lead to a higher or lower cost in operating and maintaining applications with that supplier. In this context, this work set out to carry out a literature review on cloud computing and on the secure storage and transmission of multimedia content, using lossless compression, in public cloud environments, and to implement such a system by building an application that manages data in public cloud environments (Dropbox and MEO Cloud). The application built during this dissertation meets the objectives set.

    The system provides the user with a wide range of data-management functions in public cloud environments; the user only has to log in to the system with his/her credentials. After login, an access token is generated through the OAuth 1.0 authorization protocol; this token is generated only with the consent of the user and allows the application to access the user's data/files without having to use the credentials themselves. With this token the framework can operate and unlock the full potential of its functions. The application also offers compression and encryption functions so that the user can make the most of his/her cloud storage system securely. The compression function uses the LZMA compression algorithm; the user only needs to choose the files to be compressed. For encryption, the AES (Advanced Encryption Standard) algorithm is used with a 128-bit symmetric key defined by the user.

    The research is built in two distinct and complementary parts: the first part consists of the theoretical foundation, and the second part is the development of the computer application in which the data are managed, compressed, stored and transmitted in various cloud computing environments. The theoretical framework is organized into two chapters: chapter 2, "Background on Cloud Storage", and chapter 3, "Data Compression". Through the theoretical foundation we sought to demonstrate the relevance of the research, convey some of the pertinent theories and introduce, whenever possible, existing research in the area. The second part of the work was devoted to the development of the application in a cloud environment. We showed how we built the application and presented its features, advantages and data-safety standards. Finally, we reflect on the results in light of the theoretical framework from the first part and the platform development. We consider the outcome positive and in line with the goals we set out to achieve.
    This research has some limitations: we believe that the time available for its completion was scarce, and the implementation of the platform could benefit from the addition of other features. In future research it would be appropriate to continue the project by expanding the capabilities of the application, testing its operation with other users and carrying out comparative tests.

    Fundação para a Ciência e a Tecnologia (FCT)
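
    As a concrete illustration of the joint compression-and-encryption pipeline described in this abstract (LZMA followed by AES-128), here is a minimal sketch assuming liblzma and OpenSSL's EVP interface. The dissertation does not specify a cipher mode or key handling, so AES-128-CBC with a caller-supplied key and IV is an assumption of this sketch.

```cpp
#include <lzma.h>         // liblzma (xz-utils), assumed available
#include <openssl/evp.h>  // OpenSSL EVP interface, assumed available
#include <cstdint>
#include <stdexcept>
#include <vector>

// Compress with LZMA, then encrypt with AES-128 (compress-then-encrypt:
// encrypting first would destroy the redundancy compression relies on).
std::vector<uint8_t> compress_then_encrypt(const std::vector<uint8_t>& data,
                                           const uint8_t key[16],
                                           const uint8_t iv[16]) {
    // 1) LZMA compression via liblzma's one-shot helper.
    std::vector<uint8_t> packed(lzma_stream_buffer_bound(data.size()));
    size_t out_pos = 0;
    if (lzma_easy_buffer_encode(LZMA_PRESET_DEFAULT, LZMA_CHECK_CRC64, nullptr,
                                data.data(), data.size(),
                                packed.data(), &out_pos, packed.size()) != LZMA_OK)
        throw std::runtime_error("LZMA compression failed");
    packed.resize(out_pos);

    // 2) AES-128-CBC encryption of the compressed bytes (mode is an
    //    assumption; the dissertation only states AES with a 128-bit key).
    std::vector<uint8_t> enc(packed.size() + 16);  // room for padding block
    int len1 = 0, len2 = 0;
    EVP_CIPHER_CTX* ctx = EVP_CIPHER_CTX_new();
    EVP_EncryptInit_ex(ctx, EVP_aes_128_cbc(), nullptr, key, iv);
    EVP_EncryptUpdate(ctx, enc.data(), &len1, packed.data(), (int)packed.size());
    EVP_EncryptFinal_ex(ctx, enc.data() + len1, &len2);
    EVP_CIPHER_CTX_free(ctx);
    enc.resize(len1 + len2);
    return enc;
}
```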

    Efficient architectures of heterogeneous FPGA-GPU for 3-D medical image compression

    The development of three-dimensional (3-D) imaging modalities such as magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and ultrasound (US) has generated a massive amount of volumetric image data. An existing survey reveals a substantial gap for further research in exploiting reconfigurable computing for 3-D medical image compression. This research proposes an FPGA-based co-processing solution to accelerate such medical imaging systems. The Haar wavelet transform (HWT) block is implemented on the sbRIO-9632 FPGA prototyping board, which carries a Spartan-3 (XC3S2000) chip. Analysis and performance evaluation of the 3-D images were conducted. Furthermore, a novel architecture is presented for the context-based adaptive binary arithmetic coder (CABAC), the advanced entropy coding tool employed by the main and higher profiles of H.264/AVC. This research focuses on a GPU implementation of CABAC and a comparative study of 3-D medical image compression systems with and without the discrete wavelet transform (DWT). Implementation results on MRI and CT images show the GPU significantly outperforming a single-threaded CPU implementation. Overall, CT and MRI modalities with DWT outperform those without the DWT stage in terms of compression ratio, peak signal-to-noise ratio (PSNR) and latency. For heterogeneous computing, MRI images of various sizes and formats, such as JPEG and DICOM, were evaluated. Evaluation results are shown for each memory iteration; transfers from GPU to CPU consume more bandwidth and throughput, while for a 786,486-byte JPEG image the bandwidth consumed in the two directions tends to balance. Bandwidth is relative to transfer size: larger transfers incur higher latency and throughput. Next, an OpenCL implementation of concurrent tasks on a dedicated FPGA is presented. Findings reveal that OpenCL in batch-processing mode with AOC techniques offers substantial results: the amount of logic, area, registers and memory increases proportionally with the number of batches, because the kernel block is replicated according to the batch number, so the memory banks grow correspondingly. It was found through comparative study that the tree-balanced and loop-unrolled architecture achieves better results in terms of local memory, latency and throughput.
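
    Since the evaluation above reports compression ratio and PSNR, a minimal sketch of how these two figures are conventionally computed may be useful; the PSNR formula 10·log10(255²/MSE) for 8-bit data is standard, and the function names are illustrative only.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Compression ratio: original size divided by compressed size.
double compression_ratio(size_t original_bytes, size_t compressed_bytes) {
    return static_cast<double>(original_bytes) / compressed_bytes;
}

// PSNR in dB for 8-bit samples: 10 * log10(255^2 / MSE).
// Assumes both images are the same, non-zero size.
double psnr(const std::vector<uint8_t>& original,
            const std::vector<uint8_t>& reconstructed) {
    double mse = 0.0;
    for (size_t i = 0; i < original.size(); ++i) {
        double d = double(original[i]) - double(reconstructed[i]);
        mse += d * d;
    }
    mse /= original.size();
    if (mse == 0.0) return INFINITY;  // identical images
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}
```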

    Design and Implementation of IDCT/IDST-Specific Accelerators for HEVC Standard on Heterogeneous Accelerator-Rich Platform

    High Efficiency Video Coding (HEVC) is important for image processing, reducing bandwidth and increasing video quality, and there are different methods by which it can be implemented. This thesis focuses on the design and implementation of application-specific accelerators for the IDCT/IDST algorithms of the HEVC standard. These algorithms are inherently parallel tasks, which makes them suitable for execution on heterogeneous multicore platforms, using accelerators for power-efficient processing. In this study, Coarse-Grained Reconfigurable Arrays (CGRAs) are used as a template for an accelerator. The CGRA plays one of the major roles in a Heterogeneous Accelerator-Rich Platform (HARP), as it is capable of accelerating non-parallel loops with low loop counts. This thesis explores various algorithms for the IDCT and IDST with different designs and templates, reaching a unique final architecture. The intended final output is a 4-point IDST together with a 4/8-point IDCT. Another feature is the use of different dimensions for the CGRA template in order to obtain a different type of accelerator. The CGRAs are combined in a successive arrangement with Reduced Instruction Set Computer (RISC) cores over a Network-on-Chip (NoC). The aim is to study the performance of the accelerators used for the IDCT and the IDST, evaluated through the data movement across the NoC and through the accelerators' clock-cycle counts, in order to assess the efficiency of the system. The results show that a 4-point IDST and IDCT can be computed in 56 clock cycles, while the 8-point IDCT can be computed in 64 cycles. An important factor considered in this study is power and energy consumption. The dynamic power dissipated in routing data reached 4.03 mW, while the energy consumption was 1.76 μJ for the 4-point system (IDCT and IDST) and 3.06 μJ for the 8-point IDCT. Processing Elements (PEs) are used to implement the transform algorithm, and the units operate at 200 MHz. Finally, these results show that 1080p video at 30 frames per second can be attained using an FPGA.
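
    As a software reference for the 4-point transforms discussed above, the sketch below computes a 4-point inverse transform as a product with the transposed forward basis, using the widely documented HEVC 4-point integer DCT and DST-VII coefficient matrices; the standard's intermediate scaling and clipping stages are omitted.

```cpp
#include <array>
#include <cstdio>

// Widely documented HEVC 4-point integer transform bases (forward matrices).
constexpr int DCT4[4][4] = {
    { 64,  64,  64,  64 },
    { 83,  36, -36, -83 },
    { 64, -64, -64,  64 },
    { 36, -83,  83, -36 },
};
constexpr int DST4[4][4] = {
    { 29,  55,  74,  84 },
    { 74,  74,   0, -74 },
    { 84, -29, -74,  55 },
    { 55, -84,  74, -29 },
};

// out = M^T * in  (one 4-point inverse transform, no scaling/clipping).
std::array<int, 4> inverse4(const int m[4][4], const std::array<int, 4>& in) {
    std::array<int, 4> out{};
    for (int i = 0; i < 4; ++i)
        for (int k = 0; k < 4; ++k)
            out[i] += m[k][i] * in[k];  // transpose: m[k][i], not m[i][k]
    return out;
}

int main() {
    std::array<int, 4> coeffs{64, 0, 0, 0};  // DC-only coefficient block
    auto r = inverse4(DCT4, coeffs);         // flat reconstruction for the DCT
    std::printf("IDCT: %d %d %d %d\n", r[0], r[1], r[2], r[3]);
    auto s = inverse4(DST4, coeffs);
    std::printf("IDST: %d %d %d %d\n", s[0], s[1], s[2], s[3]);
}
```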

    High-Level Design of an Adaptive Distributed Controller for Partial Dynamic Reconfiguration in FPGA

    Controlling dynamic and partial reconfiguration has become one of the key issues in modern embedded systems design: the reconfiguration controller can significantly affect system performance. The controller has to handle three major tasks efficiently at runtime: observation (monitoring), taking reconfiguration decisions, and notifying decisions to the rest of the system so that they can be realized. In this paper we present a novel high-level approach for modelling, using the MARTE UML profile, modular and flexible distributed controllers for dynamic reconfiguration management. This approach permits component/model reuse and allows systematic code generation. It consequently makes reconfigurable systems design less tedious and reduces time to market.
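
    To make the three controller tasks concrete, here is a minimal, hypothetical sketch of a monitor/decide/notify loop in plain C++; none of these names come from the paper, and a controller generated from MARTE models would target FPGA reconfiguration interfaces rather than console output.

```cpp
#include <functional>
#include <iostream>
#include <string>

// Hypothetical sketch of the three controller tasks named in the abstract:
// observe (monitor), decide, and notify the rest of the system.
struct ReconfigController {
    std::function<double()> monitor;                 // e.g. reads a load sensor
    std::function<std::string(double)> decide;       // picks a configuration
    std::function<void(const std::string&)> notify;  // triggers reconfiguration

    void step() {
        double observation = monitor();
        std::string decision = decide(observation);
        if (!decision.empty()) notify(decision);
    }
};

int main() {
    ReconfigController ctrl{
        [] { return 0.9; },  // fake load reading
        [](double load) { return load > 0.8 ? std::string("fast_bitstream")
                                            : std::string(); },
        [](const std::string& cfg) { std::cout << "load " << cfg << '\n'; },
    };
    ctrl.step();  // a real distributed controller would run this continuously
}
```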

    Efficient reconfigurable architectures for 3D medical image compression

    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Recently, the more widespread use of three-dimensional (3-D) imaging modalities, such as magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and ultrasound (US), has generated a massive amount of volumetric data. This has provided an impetus for the development of other applications, in particular telemedicine and teleradiology. In these fields, medical image compression is important, since both efficient storage and transmission of data through high-bandwidth digital communication lines are of crucial importance. Despite their advantages, most 3-D medical imaging algorithms are computationally intensive, with matrix transformation as the most fundamental operation involved in the transform-based methods. Therefore, there is a real need for high-performance systems, whilst keeping architectures flexible to allow for quick upgradeability with real-time applications. Moreover, in order to obtain efficient solutions for large volumes of medical data, an efficient implementation of these operations is of significant importance. Reconfigurable hardware, in the form of field programmable gate arrays (FPGAs), has been proposed as a viable system building block in the construction of high-performance systems at an economical price. Consequently, FPGAs seem an ideal candidate to harness their inherent advantages, such as massive parallelism capabilities, multimillion gate counts, and special low-power packages. The key achievements of the work presented in this thesis are summarised as follows. Two architectures for the 3-D Haar wavelet transform (HWT) have been proposed, based on transpose-based computation and partial reconfiguration, suitable for 3-D medical imaging applications. These applications require continuous hardware servicing, and as a result dynamic partial reconfiguration (DPR) has been introduced. A comparative study of both non-partial and partial reconfiguration implementations has shown that DPR offers many advantages and leads to a compelling solution for implementing computationally intensive applications such as 3-D medical image compression. Using DPR, several large systems are mapped onto small hardware resources, and the area, power consumption and maximum frequency are optimised and improved. Moreover, an FPGA-based architecture of the finite Radon transform (FRAT) with three design strategies has been proposed: direct implementation of pseudo-code with a sequential or pipelined description, and a block random access memory (BRAM)-based method. An analysis with various medical imaging modalities has been carried out. Results obtained for an image de-noising implementation using FRAT are promising in reducing Gaussian white noise in medical images. In terms of hardware implementation, promising trade-offs between maximum frequency, throughput and area are also achieved. Furthermore, a novel hardware implementation of a 3-D medical image compression system with context-based adaptive variable length coding (CAVLC) has been proposed. An evaluation of the 3-D integer transform (IT) and the discrete wavelet transform (DWT) with lifting scheme (LS) for the transform blocks reveals that the 3-D IT demonstrates better computational complexity than the 3-D DWT, whilst the 3-D DWT with LS exhibits a lossless compression that is significantly useful for medical image compression.
    Additionally, an architecture of CAVLC that is capable of compressing high-definition (HD) images in real time, without any buffer between the quantiser and the entropy coder, is proposed. Through judicious parallelisation, promising results have been obtained with limited resources. In summary, this research tackles the issue of massive 3-D medical data volumes that require compression, as well as hardware implementation to accelerate the slowest operations in the system. The results obtained also reveal significant achievements in terms of architecture efficiency and application performance.

    Ministry of Higher Education Malaysia (MOHE), Universiti Tun Hussein Onn Malaysia (UTHM) and the British Council
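
    The 1-D building block of the 3-D HWT discussed above is simple enough to sketch: each level replaces sample pairs with averages and differences, and the 3-D transform applies this separably along each axis. Below is a minimal, unnormalised single-level version in software (generic, not the thesis's transpose-based hardware architecture).

```cpp
#include <vector>

// One level of the 1-D Haar wavelet transform: pairwise averages
// (approximation) followed by pairwise differences (detail).
// Assumes an even-length signal; integer-friendly, unnormalised variant.
std::vector<int> haar1d(const std::vector<int>& x) {
    const size_t half = x.size() / 2;
    std::vector<int> out(x.size());
    for (size_t i = 0; i < half; ++i) {
        out[i]        = (x[2 * i] + x[2 * i + 1]) / 2;  // average
        out[half + i] =  x[2 * i] - x[2 * i + 1];       // difference
    }
    return out;
}
// A 3-D HWT applies haar1d separably along the x, y and z axes of the
// volume; the thesis realises this with transpose-based hardware.
```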

    Energy efficient hardware acceleration of multimedia processing tools

    The world of mobile devices is experiencing an ongoing trend of feature enhancement and general-purpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be accelerated by a set of underpinning hardware blocks. Based on the survey that this thesis presents on modern video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at the algorithmic level in order to design reusable, optimised hardware acceleration cores. To support these conclusions, the work in this thesis focuses on two of the basic enabling technologies behind mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high-level techniques such as redundant computation elimination, parallelism and low-switching computation structures. Both architectures compare favourably against the relevant prior art in the literature. The SA-DCT/IDCT technologies are instances of a more general computation: both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution-search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is itself amenable to hardware acceleration; another is an early-exit mechanism that achieves large search-space reductions. Results show an improvement over state-of-the-art algorithms, with future potential for even greater savings.
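
    The CMM formulation above rests on the fact that multiplication by a known constant decomposes into shifts and adds, with common subexpressions shared across matrix rows. A minimal illustration with a single hard-coded constant follows; the decomposition shown is generic, not one found by the thesis's genetic-programming search.

```cpp
#include <cstdint>
#include <cstdio>

// Multiplierless constant multiplication: 83 = 64 + 16 + 2 + 1, so
// x * 83 becomes three shifts and three adds. Hardware CMM designs pick
// such decompositions for every constant in the matrix and share
// intermediate terms across rows to cut the total adder count.
int32_t times83(int32_t x) {
    return (x << 6) + (x << 4) + (x << 1) + x;
}

int main() {
    std::printf("%d\n", times83(7));  // prints 581 == 7 * 83
}
```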

    Improving the Efficacy of Context-Aware Applications

    In this dissertation, we explore methods for enhancing the context-awareness capabilities of modern computers, including mobile devices, tablets, wearables, and traditional computers. Advancements include proposed methods for fusing information from multiple logical sensors, localizing nearby objects using depth sensors, and building models to better understand the content of 2D images. First, we propose a system called Unagi, designed to incorporate multiple logical sensors into a single framework that allows context-aware application developers to easily test new ideas and create novel experiences. Unagi is responsible for collecting data, extracting features, and building personalized models for each individual user. We demonstrate the utility of the system with two applications: adaptive notification filtering and a network content prefetcher. We also thoroughly evaluate the system with respect to predictive accuracy, temporal delay, and power consumption. Next, we discuss a set of techniques that can be used to accurately determine the location of objects near a user in 3D space using a mobile device equipped with both depth and inertial sensors. Using a novel chaining approach, we are able to locate objects farther away than the standard range of the depth sensor without compromising localization accuracy. Empirical testing shows our method is capable of localizing objects 30 m from the user with an error of less than 10 cm. Finally, we demonstrate a set of techniques that allow a multi-layer perceptron (MLP) to learn resolution-invariant representations of 2D images, including the proposal of an MCMC-based technique to improve the selection of pixels for mini-batches used for training. We also show that a deep convolutional encoder could be trained to output a resolution-independent representation in constant time, and we discuss several potential applications of this research, including image resampling, image compression, and security.
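
    The chaining approach mentioned above amounts to composing relative pose estimates so that an object beyond the depth sensor's range can still be expressed in the user's original frame. Below is a toy 2-D sketch of that transform composition (illustrative only; the dissertation's method also fuses inertial data and operates in 3D).

```cpp
#include <cmath>
#include <cstdio>

const double kPi = 3.141592653589793;

// A 2-D rigid transform: rotation by theta, then translation by (tx, ty).
struct Pose2 {
    double theta, tx, ty;
    // Compose: apply `o` in this pose's frame (this ∘ o).
    Pose2 operator*(const Pose2& o) const {
        double c = std::cos(theta), s = std::sin(theta);
        return {theta + o.theta,
                tx + c * o.tx - s * o.ty,
                ty + s * o.tx + c * o.ty};
    }
};

int main() {
    // Chaining: the user measures an anchor 3 m ahead (within sensor range),
    // moves there, turns 90 degrees left, then measures the object 2 m ahead.
    Pose2 user_to_anchor{kPi / 2, 3.0, 0.0};
    Pose2 anchor_to_obj{0.0, 2.0, 0.0};
    Pose2 user_to_obj = user_to_anchor * anchor_to_obj;
    // The object ends up 3 m ahead and 2 m left of the starting frame.
    std::printf("object at (%.1f, %.1f)\n", user_to_obj.tx, user_to_obj.ty);
}
```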