746 research outputs found

    IMPLEMENTATION OF MOTION ESTIMATION BASED ON HETEROGENEOUS PARALLEL COMPUTING SYSTEM WITH OPENC

    Get PDF
    International audienceHeterogeneous computing system increases the performance of parallel computing in many domain of general purpose computing with CPU, GPU and other accelerators. Open Computing Language (OpenCL) is the first open, royaltyfree standard for heterogenous computing on multi hardware platforms. In this paper, we propose a parallel Motion Estimation (ME) algorithm implemented using OpenCL and present several optimization strategies applied in our OpenCL implementation of the motion estimation. In the same time, we implement the proposed algorithm on our heterogeneous computing system which contains one CPU and one GPU, and propose one method to determine the balance to distribute the workload in heterogeneous computing system with OpenCL. According to experiments, our motion estimator with achieves 100 to 150 speed-up compared with its implementation with C code executed by single CPU core and our proposed method obtains obviously enhancement of performance in based on our heterogeneous computing system

    Adsorption of MultiLamellar tubes with a temperature tunable diameter at the air-water interface: a process driven by the bulk properties

    Get PDF
    The behavior at the air/water interface of multilamellar tubes made of the ethanolamine salt of the 12-hydroxy stearic acid as a function of the temperature has been investigated using Neutron Reflectivity. Those tubes are known to exhibit a temperature tunable diameter in the bulk. We have observed multilamellar tubes adsorbed at the air/water interface by specular neutron reflectivity. Interestingly, at the interface, the adsorbed tubes exhibit the same behavior than in the bulk upon heating. There is however a peculiar behavior at around 50\degree for which the increase of the diameter of the tubes at the interface yields an unfolding of those tubes into a multilamellar layer. Upon further heating, the tubes re-fold and their diameter re-decrease after what they melt as observed in the bulk. All structural transitions at the interface are nevertheless shown to be quasi-completely reversible. This provides to the system a high interest for its interfacial properties because the structure at the air/water interface can be tuned easily by the temperature

    Advanced list scheduling heuristic for task scheduling with communication contention for parallel embedded systems

    No full text
    WOSInternational audienceModern embedded systems tend to use multiple cores or processors for processing parallel applications. This paper indeed aims at task scheduling with communication contention for parallel embedded systems and proposes three advanced techniques to improve the list scheduling heuristic. Five groups of node levels (two existing groups and three new groups) are firstly used as node priorities to generate node lists. Then the critical child technique improves the selection of a processor in the scheduling process. Finally, the communication delay technique enlarges the idle time intervals on communication links. We also propose an advanced dynamic list scheduling heuristic by combining the three techniques. Experimental results show that the combined advanced dynamic heuristic is efficient to shorten the schedule length for most of the randomly generated DAGs in the cases of medium and high communication. Our method accelerates an application up to 80% in the case of high communication and can also reduce the use of hardware resources

    A List Scheduling Heuristic with New Node Priorities and Critical Child Technique for Task Scheduling with Communication Contention

    Get PDF
    International audienceTask scheduling is an important aspect for parallel programming. In this paper, the program to be scheduled is modeled as a Directed Acyclic Graph (DAG), and we target parallel embedded systems of multiple processors connected by buses and switches. This paper presents improvements for list scheduling heuristics with communication contention. We use new node priorities (top level and bottom level) to sort nodes and use an advanced technique of critical child to select a processor to execute a node. Experimental results show that our method is effective to reduce the schedule length, and the performance is greatly improved in the cases of medium and high communication. Since the communication cost is increasing from medium to high in modern applications like digital communication and video compression, our method will work well for scheduling these applications on parallel embedded systems

    A List Scheduling Heuristic with New Node Priorities and Critical Child Technique for Task Scheduling with Communication Contention

    Get PDF
    International audienceTask scheduling is an important aspect for parallel programming. In this paper, the program to be scheduled is modeled as a Directed Acyclic Graph (DAG), and we target parallel embedded systems of multiple processors connected by buses and switches. This paper presents improvements for list scheduling heuristics with communication contention. We use new node priorities (top level and bottom level) to sort nodes and use an advanced technique of critical child to select a processor to execute a node. Experimental results show that our method is effective to reduce the schedule length, and the performance is greatly improved in the cases of medium and high communication. Since the communication cost is increasing from medium to high in modern applications like digital communication and video compression, our method will work well for scheduling these applications on parallel embedded systems

    Heuristique statique améliorée d'ordonnancement de tâches: impact sur le tri des tâches et sur l'allocation de processeur

    Get PDF
    National audienceL'ordonnancement de tâches est une étape importante dans le prototypage rapide d'applications de traitement d'images sur des systèmes parallèles embarqués. Nous présentons ainsi dans cet article une heuristique statique améliorée d'ordonnancement par liste : d'une part, cette heuristique intègre de nouvelles règles de priorité de tâches, tenant compte de la contention sur les communications entre tâches ; d'autre part, cette heuristique affine l'allocation d'un processeur à une tâche courante, en impactant le choix du processeur par un ordonnancement partiel de la tâche successeur critique (" critical child ") à la tâche courante. Nos résultats expérimentaux soulignent une accélération effective de l'application implantée, dans un contexte de moyenne comme de forte communication

    Heuristique statique améliorée d'ordonnancement de tâches: impact sur le tri des tâches et sur l'allocation de processeur

    Get PDF
    National audienceL'ordonnancement de tâches est une étape importante dans le prototypage rapide d'applications de traitement d'images sur des systèmes parallèles embarqués. Nous présentons ainsi dans cet article une heuristique statique améliorée d'ordonnancement par liste : d'une part, cette heuristique intègre de nouvelles règles de priorité de tâches, tenant compte de la contention sur les communications entre tâches ; d'autre part, cette heuristique affine l'allocation d'un processeur à une tâche courante, en impactant le choix du processeur par un ordonnancement partiel de la tâche successeur critique (" critical child ") à la tâche courante. Nos résultats expérimentaux soulignent une accélération effective de l'application implantée, dans un contexte de moyenne comme de forte communication

    Selection of SNP subsets for association studies in candidate genes: comparison of the power of different strategies to detect single disease susceptibility locus effects

    Get PDF
    BACKGROUND: The recent advances in genotyping and molecular techniques have greatly increased the knowledge of the human genome structure. Millions of polymorphisms are reported and freely available in public databases. As a result, there is now a need to identify among all these data, the relevant markers for genetic association studies. Recently, several methods have been published to select subsets of markers, usually Single Nucleotide Polymorphisms (SNPs), that best represent genetic polymorphisms in the studied candidate gene or region. RESULTS: In this paper, we compared four of these selection methods, two based on haplotype information and two based on pairwise linkage disequilibrium (LD). The methods were applied to the genotype data on twenty genes with different patterns of LD and different numbers of SNPs. A measure of the efficiency of the different methods to select SNPs was obtained by comparing, for each gene and under several single disease susceptibility models, the power to detect an association that will be achieved with the selected SNP subsets. CONCLUSION: None of the four selection methods stands out systematically from the others. Methods based on pairwise LD information turn out to be the most interesting methods in a context of association study in candidate gene. In a context where the number of SNPs to be tested in a given region needs to be more limited, as in large-scale studies or wide genome scans, one of the two methods based on haplotype information, would be more suitable

    Implementation of Stereo Matching Using High Level Compiler for Parallel Computing Acceleration

    Get PDF
    International audienceHeterogeneous computing system increases the performance of parallel computing in many domain of general purpose computing with CPU, GPU and other accelerators. With Hardware developments, the software developments like Compute Unified Device Architecture(CUDA) and Open Computing Language (OpenCL) try to offer a simple and visualized tool for parallel computing. But it turn out to be more difficult than programming on CPU platform for optimization of performance. For one kind of parallel computing application, there are different configuration and parameters for various hardware platforms. In this paper, we apply the Hybrid Multi-cores Parallel Programming(HMPP) to automatic-generates tunable code for GPU platform and show the result of implementation of Stereo Matching with detailed comparison with C code version and manual CUDA version. The experimental results show that the default and optimized HMPP have the approximative 1 compared with CUDA implementation. And the HMPP workbench can greatly reduce the time of application development using parallel computing device
    • …
    corecore