1,467 research outputs found

    High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures

    Get PDF
    This article presents two high-efficient parallel realizations of the context-based adaptive variable length coding (CAVLC) based on heterogeneous multicore processors. By optimizing the architecture of the CAVLC encoder, three kinds of dependences are eliminated or weaken, including the context-based data dependence, the memory accessing dependence and the control dependence. The CAVLC pipeline is divided into three stages: two scans, coding, and lag packing, and be implemented on two typical heterogeneous multicore architectures. One is a block-based SIMD parallel CAVLC encoder on multicore stream processor STORM. The other is a component-oriented SIMT parallel encoder on massively parallel architecture GPU. Both of them exploited rich data-level parallelism. Experiments results show that compared with the CPU version, more than 70 times of speedup can be obtained for STORM and over 50 times for GPU. The implementation of encoder on STORM can make a real-time processing for 1080p @30fps and GPU-based version can satisfy the requirements for 720p real-time encoding. The throughput of the presented CAVLC encoders is more than 10 times higher than that of published software encoders on DSP and multicore platforms

    Space-division multiplexing for fiber-wireless communications

    Full text link
    We envision the application of optical Space-division Multiplexing (SDM) to the next generation fiber-wireless communications as a firm candidate to increase the end user capacity and provide adaptive radiofrequency-photonic interfaces. This approach relies on the concept of fiber-distributed signal processing, where the SDM fiber provides not only radio access distribution but also broadband microwave photonics signal processing. In particular, we present two different SDM fiber technologies: dispersion-engineered heterogeneous multicore fiber links and multicavity devices built upon the selective inscription of gratings in homogeneous multicore fibers.Comment: 4 pages, 20th International Conference on Transparent Optical Networks (ICTON), Girona (Spain), 2017. arXiv admin note: text overlap with arXiv:1810.1213

    Evaluating Cache Coherent Shared Virtual Memory for Heterogeneous Multicore Chips

    Full text link
    The trend in industry is towards heterogeneous multicore processors (HMCs), including chips with CPUs and massively-threaded throughput-oriented processors (MTTOPs) such as GPUs. Although current homogeneous chips tightly couple the cores with cache-coherent shared virtual memory (CCSVM), this is not the communication paradigm used by any current HMC. In this paper, we present a CCSVM design for a CPU/MTTOP chip, as well as an extension of the pthreads programming model, called xthreads, for programming this HMC. Our goal is to evaluate the potential performance benefits of tightly coupling heterogeneous cores with CCSVM

    Toward optimised skeletons for heterogeneous parallel architecture with performance cost model

    Get PDF
    High performance architectures are increasingly heterogeneous with shared and distributed memory components, and accelerators like GPUs. Programming such architectures is complicated and performance portability is a major issue as the architectures evolve. This thesis explores the potential for algorithmic skeletons integrating a dynamically parametrised static cost model, to deliver portable performance for mostly regular data parallel programs on heterogeneous archi- tectures. The rst contribution of this thesis is to address the challenges of program- ming heterogeneous architectures by providing two skeleton-based programming libraries: i.e. HWSkel for heterogeneous multicore clusters and GPU-HWSkel that enables GPUs to be exploited as general purpose multi-processor devices. Both libraries provide heterogeneous data parallel algorithmic skeletons including hMap, hMapAll, hReduce, hMapReduce, and hMapReduceAll. The second contribution is the development of cost models for workload dis- tribution. First, we construct an architectural cost model (CM1) to optimise overall processing time for HWSkel heterogeneous skeletons on a heterogeneous system composed of networks of arbitrary numbers of nodes, each with an ar- bitrary number of cores sharing arbitrary amounts of memory. The cost model characterises the components of the architecture by the number of cores, clock speed, and crucially the size of the L2 cache. Second, we extend the HWSkel cost model (CM1) to account for GPU performance. The extended cost model (CM2) is used in the GPU-HWSkel library to automatically nd a good distribution for both a single heterogeneous multicore/GPU node, and clusters of heteroge- neous multicore/GPU nodes. Experiments are carried out on three heterogeneous multicore clusters, four heterogeneous multicore/GPU clusters, and three single heterogeneous multicore/GPU nodes. The results of experimental evaluations for four data parallel benchmarks, i.e. sumEuler, Image matching, Fibonacci, and Matrix Multiplication, show that our combined heterogeneous skeletons and cost models can make good use of resources in heterogeneous systems. Moreover using cores together with a GPU in the same host can deliver good performance either on a single node or on multiple node architectures

    ECG Signal Reconstruction on the IoT-Gateway and Efficacy of Compressive Sensing Under Real-time Constraints

    Get PDF
    Remote health monitoring is becoming indispensable, though, Internet of Things (IoTs)-based solutions have many implementation challenges, including energy consumption at the sensing node, and delay and instability due to cloud computing. Compressive sensing (CS) has been explored as a method to extend the battery lifetime of medical wearable devices. However, it is usually associated with computational complexity at the decoding end, increasing the latency of the system. Meanwhile, mobile processors are becoming computationally stronger and more efficient. Heterogeneous multicore platforms (HMPs) offer a local processing solution that can alleviate the limitations of remote signal processing. This paper demonstrates the real-time performance of compressed ECG reconstruction on ARM's big.LITTLE HMP and the advantages they provide as the primary processing unit of the IoT architecture. It also investigates the efficacy of CS in minimizing power consumption of a wearable device under real-time and hardware constraints. Results show that both the orthogonal matching pursuit and subspace pursuit reconstruction algorithms can be executed on the platform in real time and yield optimum performance on a single A15 core at minimum frequency. The CS extends the battery life of wearable medical devices up to 15.4% considering ECGs suitable for wellness applications and up to 6.6% for clinical grade ECGs. Energy consumption at the gateway is largely due to an active internet connection; hence, processing the signals locally both mitigates system's latency and improves gateway's battery life. Many remote health solutions can benefit from an architecture centered around the use of HMPs, a step toward better remote health monitoring systems.Peer reviewedFinal Published versio

    Optimizing soft error reliability through scheduling on heterogeneous multicore processors

    Get PDF
    Reliability to soft errors is an increasingly important issue as technology continues to shrink. In this paper, we show that applications exhibit different reliability characteristics on big, high-performance cores versus small, power-efficient cores, and that there is significant opportunity to improve system reliability through reliability-aware scheduling on heterogeneous multicore processors. We monitor the reliability characteristics of all running applications, and dynamically schedule applications to the different core types in a heterogeneous multicore to maximize system reliability. Reliability-aware scheduling improves reliability by 25.4 percent on average (and up to 60.2 percent) compared to performance-optimized scheduling on a heterogeneous multicore processor with two big cores and two small cores, while degrading performance by 6.3 percent only. We also introduce a novel system-level reliability metric for multiprogram workloads on (heterogeneous) multicores. We provide a trade-off analysis among reliability-, power- and performance-optimized scheduling, and evaluate reliability-aware scheduling under performance constraints and for unprotected L1 caches. In addition, we also extend our scheduling mechanisms to multithreaded programs. The hardware cost in support of our reliability-aware scheduler is limited to 296 bytes per core

    Demonstration of distributed radiofrequency signal processing on heterogeneous multicore fibres

    Full text link
    [EN] We experimentally demonstrate for the first-time to our knowledge distributed radiofrequency signal processing performed by a heterogeneous multicore fibre link. A trench-assisted 7-core fibre, where each core presents a different chromatic dispersion behaviour, is custom-engineered to operate as a 2D sampled true time delay line.This research was supported by the ERC Consolidator Grant 724663, Spanish MINECO Project TEC2016-80150-R, Spanish MINECO scholarship BES-2015-073359 for S. García and Spanish MINECO fellowship RYC-2014-16247 for I. Gasulla.García-Cortijo, S.; Ureña-Gisbert, M.; Gasulla Mestre, I. (2019). Demonstration of distributed radiofrequency signal processing on heterogeneous multicore fibres. IEEE. 1-4. https://doi.org/10.1049/cp.2019.0749S1
    corecore