
    Introducing the Quantum Research Kernels: Lessons from Classical Parallel Computing

    Quantum computing represents a paradigm shift for computation, requiring an entirely new computer architecture. However, there is much that can be learned from traditional classical computer engineering. In this paper, we describe the Parallel Research Kernels (PRK), a tool that was very useful for designing classical parallel computing systems. The PRK are simple kernels written to expose bottlenecks that limit classical parallel computing performance. We hypothesize that an analogous tool for quantum computing, Quantum Research Kernels (QRK), may similarly aid the co-design of software and hardware for quantum computing systems, and we give a few examples of representative QRKs.
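
    A research kernel in the PRK sense is deliberately tiny: one well-understood operation whose measured rate isolates a single bottleneck. As a flavor of the approach, here is a minimal Python sketch in that spirit; the transpose-style kernel and the bandwidth accounting are illustrative assumptions, not taken from the paper.

```python
import time
import numpy as np

def transpose_kernel(order: int, iterations: int) -> float:
    """Toy analogue of a parallel research kernel: a bandwidth-bound
    matrix transpose whose timing exposes memory-system bottlenecks."""
    a = np.arange(order * order, dtype=np.float64).reshape(order, order)
    b = np.zeros_like(a)
    t0 = time.perf_counter()
    for _ in range(iterations):
        b += a.T        # strided reads stress the memory hierarchy
        a += 1.0        # keep the source changing between iterations
    elapsed = time.perf_counter() - t0
    bytes_moved = 3.0 * a.nbytes * iterations  # rough traffic estimate
    return bytes_moved / elapsed / 1e9         # effective GB/s

print(f"effective bandwidth: {transpose_kernel(2048, 10):.1f} GB/s")
```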

    Hard real-time, pixel-parallel rendering of light field videos using steered mixture-of-experts

    Steered Mixture-of-Experts (SMoE) is a novel framework for the approximation, coding, and description of image modalities such as light field images and video. The longer-term goal is to arrive at a representation for Six Degrees-of-Freedom (6DoF) image data. Previous research has shown the feasibility of real-time pixel-parallel rendering of static light field images. Each pixel is independently reconstructed by the kernels that lie in its vicinity, and the number of kernels involved forms the bottleneck on the achievable framerate. The goal of this paper is twofold. Firstly, we introduce pixel-level rendering of light field video, as previous work only rendered static content. Secondly, we investigate rendering using a predefined number of most significant kernels, which lets us meet hard real-time constraints by trading off reconstruction quality.
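
    To make the framerate/quality trade-off concrete, here is a hedged Python sketch of reconstructing one pixel from only its K most significant kernels; the isotropic kernels and all parameter values are hypothetical (actual SMoE kernels are steered, i.e. anisotropic).

```python
import numpy as np

def render_pixel(x, centers, bandwidths, colors, k):
    """Reconstruct one pixel from its k most significant kernels.
    Capping k bounds the per-pixel work, which is what makes a hard
    real-time budget possible, at some cost in quality."""
    d2 = np.sum((centers - x) ** 2, axis=1)
    w = np.exp(-0.5 * d2 / bandwidths ** 2)     # kernel significance at x
    top = np.argpartition(w, -k)[-k:]           # keep the k largest weights
    w_top = w[top]
    return (w_top @ colors[top]) / w_top.sum()  # normalized blend

# Hypothetical model: 500 isotropic kernels over a 64x64 image plane.
rng = np.random.default_rng(0)
centers = rng.uniform(0, 64, size=(500, 2))
bandwidths = rng.uniform(1.0, 4.0, size=500)
colors = rng.uniform(0, 1, size=(500, 3))
print(render_pixel(np.array([32.0, 32.0]), centers, bandwidths, colors, k=8))
```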

    Static partitioning and mapping of kernel-based applications over modern heterogeneous architectures

    Heterogeneous architectures are being used extensively to improve system processing capabilities. Critical functions of each application (kernels) can be mapped to different computing devices (i.e., CPUs, GPGPUs, accelerators) to maximize performance. However, best performance can only be achieved if kernels are accurately mapped to the right device. Moreover, in some cases those kernels could be split and executed over several devices at the same time to maximize the use of compute resources on heterogeneous parallel architectures. In this paper, we define a static partitioning model based on profiling information from previous executions. This model follows a quantitative approach which computes the optimal match according to user-defined constraints. We test different scenarios to evaluate our model: single-kernel and multi-kernel applications. Experimental results show that our static partitioning model can increase the performance of parallel applications by deploying not only different kernels over different devices but also a single kernel over multiple devices. This avoids idle compute resources on heterogeneous platforms and enhances overall performance.
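
    A minimal sketch of the profiling-driven idea, assuming the profile reduces to one measured throughput number per device; the paper's model optimizes under user-defined constraints, so this proportional split is only the simplest instance.

```python
def split_work(total_items, profiled_throughput):
    """Split one kernel's iteration space across devices in proportion to
    throughput (items/s) measured in earlier profiling runs; a
    proportional split roughly equalizes the devices' finish times."""
    total_rate = sum(profiled_throughput.values())
    shares = {dev: int(total_items * rate / total_rate)
              for dev, rate in profiled_throughput.items()}
    # Hand any rounding remainder to the fastest device.
    fastest = max(profiled_throughput, key=profiled_throughput.get)
    shares[fastest] += total_items - sum(shares.values())
    return shares

# Hypothetical profile: the GPU runs this kernel 4x faster than the CPU.
print(split_work(1_000_000, {"cpu": 2.0e6, "gpu": 8.0e6}))
# -> {'cpu': 200000, 'gpu': 800000}
```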

    Steered mixture-of-experts for light field images and video : representation and coding

    Research in light field (LF) processing has increased heavily over the last decade, largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids. These grids are then further decorrelated through hybrid DPCM/transform techniques. However, such 2-D regular grids are less suited for high-dimensional data such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about the light rays arriving at a certain region from any angle. The global model thus consists of a set of kernels that define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application to 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparably to the state of the art at low-to-mid bitrates with respect to subjective visual quality of 4-D LF images. For 5-D LF video, we observe superior decorrelation and coding performance, with gains of a factor of 4x in bitrate for the same quality. At least equally important, our method inherently provides functionality for LF rendering that is lacking in other state-of-the-art techniques: (1) full zero-delay random access, (2) light-weight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution.
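
    The following Python sketch evaluates such a continuous kernel model at a single coordinate. The locally linear (gradient) experts and all shapes and values are assumptions for illustration, not the paper's exact parameterization.

```python
import numpy as np

def smoe_reconstruct(x, mus, covs_inv, means, slopes):
    """Evaluate an SMoE-style model at coordinate x: each kernel i is a
    steered Gaussian gate over the coordinate space, and its expert
    predicts the locally linear color means[i] + slopes[i] @ (x - mus[i])."""
    d = x - mus                                         # (N, dim)
    g = np.exp(-0.5 * np.einsum('nd,nde,ne->n', d, covs_inv, d))
    w = g / g.sum()                                     # normalized gates
    preds = means + np.einsum('ncd,nd->nc', slopes, d)  # (N, channels)
    return w @ preds                                    # blended color

# Hypothetical model: 50 kernels over 4-D light field coordinates, RGB.
rng = np.random.default_rng(1)
N, dim, C = 50, 4, 3
mus = rng.uniform(0, 1, (N, dim))
covs_inv = np.broadcast_to(np.eye(dim) * 40.0, (N, dim, dim))
means = rng.uniform(0, 1, (N, C))
slopes = rng.normal(0, 0.1, (N, C, dim))
print(smoe_reconstruct(rng.uniform(0, 1, dim), mus, covs_inv, means, slopes))
```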

    FASTCUDA: Open Source FPGA Accelerator & Hardware-Software Codesign Toolset for CUDA Kernels

    Using FPGAs as hardware accelerators that communicate with a central CPU is becoming common practice in the embedded design world, but there is as yet no standard methodology and toolset to facilitate this path. On the other hand, languages such as CUDA and OpenCL provide standard development environments for Graphics Processing Unit (GPU) programming. FASTCUDA is a platform that provides the necessary software toolset, hardware architecture, and design methodology to efficiently adapt the CUDA approach into a new FPGA design flow. With FASTCUDA, the CUDA kernels of a CUDA-based application are partitioned into two groups with minimal user intervention: those that are compiled and executed in parallel software, and those that are synthesized and implemented in hardware. A modern low-power FPGA can provide the processing power (via numerous embedded micro-CPUs) and the logic capacity for both the software and hardware implementations of the CUDA kernels. This paper describes the system requirements and the architectural decisions behind the FASTCUDA approach.
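
    The abstract does not spell out the partitioning criteria, but the flavor of such a hardware/software split can be sketched with a hypothetical cost model: move to hardware the kernels offering the best speedup per unit of FPGA area, within an area budget. This is a sketch under those assumptions, not FASTCUDA's actual algorithm.

```python
def partition_kernels(kernels, area_budget):
    """Greedy HW/SW split under a hypothetical cost model: hardware gets
    the kernels with the best speedup per unit of FPGA area until the
    budget is exhausted; everything else runs as parallel software."""
    hw, sw, used = [], [], 0.0
    ranked = sorted(kernels,
                    key=lambda k: (k["sw_ms"] - k["hw_ms"]) / k["area"],
                    reverse=True)
    for k in ranked:
        if k["sw_ms"] > k["hw_ms"] and used + k["area"] <= area_budget:
            hw.append(k["name"])
            used += k["area"]
        else:
            sw.append(k["name"])
    return hw, sw

kernels = [  # hypothetical profiling and synthesis estimates
    {"name": "fir",  "sw_ms": 9.0, "hw_ms": 1.5, "area": 0.30},
    {"name": "sort", "sw_ms": 4.0, "hw_ms": 3.5, "area": 0.40},
    {"name": "crc",  "sw_ms": 2.0, "hw_ms": 0.2, "area": 0.05},
]
print(partition_kernels(kernels, area_budget=0.5))  # (['crc', 'fir'], ['sort'])
```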

    HeteroCore GPU to exploit TLP-resource diversity


    Kernel-based Image Reconstruction from Scattered Radon Data

    Computerized tomography requires suitable numerical methods for the approximation of a bivariate function f from a finite set of discrete Radon data, each of whose samples represents one line integral of f. In standard reconstruction methods, specific assumptions concerning the geometry of the Radon lines are usually made. In relevant applications of image reconstruction, however, such assumptions are often too restrictive. In this case, one would prefer reconstruction methods that allow for arbitrary distributions of scattered Radon lines. This paper proposes a novel image reconstruction method for scattered Radon data, which combines kernel-based scattered data approximation with a well-adapted regularization of the Radon transform. The result is a very flexible numerical algorithm for image reconstruction that works for arbitrary distributions of Radon lines. This is in contrast to the classical filtered back projection, which essentially relies on a regular distribution of the Radon lines, e.g. parallel beam geometry. The good performance of the kernel-based image reconstruction method is illustrated by numerical examples and comparisons.
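
    One reason kernels suit this problem: the Radon transform of an isotropic Gaussian has a closed form, so scattered line integrals yield an ordinary linear system in the kernel coefficients. The Python sketch below uses plain ridge regularization in place of the paper's well-adapted regularization of the Radon transform; all sizes and values are illustrative.

```python
import numpy as np

def gaussian_line_integrals(centers, sigma, angles, offsets):
    """Radon transform of isotropic Gaussians: the integral of
    exp(-|x - c|^2 / (2 sigma^2)) along the line {x : x . n = t},
    with unit normal n, equals
    sqrt(2 pi) * sigma * exp(-(t - c . n)^2 / (2 sigma^2))."""
    normals = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (M, 2)
    proj = normals @ centers.T                                    # (M, N)
    return (np.sqrt(2 * np.pi) * sigma
            * np.exp(-0.5 * ((offsets[:, None] - proj) / sigma) ** 2))

# Hypothetical setup: recover kernel coefficients from scattered lines.
rng = np.random.default_rng(2)
centers = rng.uniform(-1, 1, (100, 2))    # kernel centers in the image
sigma = 0.15
angles = rng.uniform(0, np.pi, 400)       # arbitrary (scattered) lines
offsets = rng.uniform(-1.2, 1.2, 400)
A = gaussian_line_integrals(centers, sigma, angles, offsets)
y = A @ rng.normal(size=100)              # synthetic Radon measurements
lam = 1e-3                                # simple ridge regularization
coef = np.linalg.solve(A.T @ A + lam * np.eye(100), A.T @ y)
print(np.linalg.norm(A @ coef - y) / np.linalg.norm(y))  # small residual
```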

    Acceleration computing process in wavelength scanning interferometry

    Optical interferometry has been widely explored for surface measurement because of its non-contact, high-accuracy interrogation. Some interferometric techniques, such as white light interferometry and wavelength scanning interferometry (WSI), can measure both rough and smooth surfaces. WSI can measure large discontinuous surface profiles without phase ambiguity problems. However, WSI typically needs to capture hundreds of interferograms at different wavelengths to evaluate the surface finish of a sample, and processing this large amount of data takes a long time with traditional CPU programming. This paper presents a parallel programming model that exploits data parallelism to accelerate the analysis of the captured data. The parallel implementation is based on the CUDAℱ C programming model developed by NVIDIA. Additionally, the paper explains the mathematical algorithm used to evaluate the surface profiles. The computing time and accuracy obtained from the CUDA program, using a GeForce GTX 280 graphics processing unit (GPU), were compared to those obtained from a sequential Matlab program running on an IntelÂź Coreℱ2 Duo CPU. The results of measuring a step-height sample show that the parallel programming capability of the GPU can greatly accelerate floating-point throughput compared to a multicore CPU.
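
    For a flavor of why this workload maps so well to a GPU, here is a vectorized Python sketch of one common WSI analysis, per-pixel frequency estimation; the paper's exact algorithm and all numbers below are assumptions.

```python
import numpy as np

def wsi_height_map(frames, wavenumbers):
    """Per-pixel frequency analysis for WSI: at each pixel the intensity
    varies with wavenumber k as A + B*cos(2*h*k + phi), so the dominant
    FFT bin along the frame axis encodes the height h. Every pixel is
    independent -- the data parallelism a thread-per-pixel GPU
    implementation exploits."""
    n = frames.shape[0]
    dk = wavenumbers[1] - wavenumbers[0]          # uniform k-step assumed
    spectrum = np.abs(np.fft.rfft(frames - frames.mean(axis=0), axis=0))
    peak = spectrum[1:].argmax(axis=0) + 1        # skip the DC bin
    freq = peak / (n * dk)                        # cycles per unit k
    return np.pi * freq                           # phase 2*h*k -> h = pi*f

# Synthetic check: two pixels at heights 50 and 120 um, k scan in rad/um.
k = np.linspace(7.8, 8.2, 256)
h_true = np.array([[50.0, 120.0]])
frames = 1 + np.cos(2 * h_true[None] * k[:, None, None])
print(wsi_height_map(frames, k))  # ~[[46.9, 117.4]], true within one FFT bin
```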
    • 

    corecore