39 research outputs found

    Elaboración de estrategias paralelas para búsquedas por similitud en espacios métricos

    Performing a sequential search for specific elements over large volumes of information is not an appropriate approach, because the resulting response times do not meet users' demands. One way to address this problem is to take advantage of advances in software and hardware technology to design and implement new algorithms that search more efficiently, such as parallel similarity searches. This task is not easy, because it involves: i) studying different data structures that can be parallelized; ii) selecting a parallel computing model whose infrastructure eases the programmer's task; and iii) developing algorithms that use the chosen data structures efficiently and make the most of the capabilities of the selected parallel computing model and its associated infrastructure. This article describes a line of research that addresses these topics and whose main objective is the development of parallel similarity search algorithms, based on efficient indexing structures, whose efficiency scales linearly with the number of processors available in networks of moderate size. Track: Distributed and Parallel Processing. Red de Universidades con Carreras en Informática (RedUNCI).

    Um Algoritmo Inter-Procedural para Análise de Largura de Variáveis

    In this project we developed an inter-procedural algorithm capable of processing programs with millions of assembly instructions. Unlike many previous works, our algorithm handles comparisons between variables without resorting to costly algorithms. We obtain flow sensitivity by using as our intermediate representation the e-SSA (Extended Static Single Assignment) form described by Bodik. We also show that processing the strongly connected components of the graph in topological order not only reduces the program's running time but also increases its precision. We implemented our technique in LLVM, an industrial-strength compiler, and were able to process about four million assembly instructions in a few seconds.

    Adding Discretionary Access to Remote Method Invocation

    This paper describes the implementation of an object-oriented middleware that allows the application developer to regulate the use of individual remote methods by means of access control lists. This platform has been implemented as an instance of Arcademis, a framework for middleware development. The objective of this case study is twofold. First, to demonstrate how frameworks and design patterns can be combined to facilitate the implementation of distributed software. Second, to point out similarities between the architecture of object-oriented middleware, such as Java RMI, and distributed authentication systems, such as Kerberos, in order to argue that discretionary access control can be added to commercial middleware platforms as a natural extension of the remote method invocation paradigm.

    Divergence Analysis with Affine Constraints

    The rise of graphics processing units in high-performance computing is bringing renewed interest in code optimization techniques that target SIMD processors. Many of these optimizations rely on divergence analyses, which classify variables as uniform, if they have the same value on every thread, or divergent, if they might not. This paper introduces a new kind of divergence analysis that is able to represent variables as affine functions of thread identifiers. We have implemented our divergence analysis with affine constraints on top of Ocelot, an open source compiler, and use it to analyze a suite of 177 CUDA kernels from well-known benchmarks. These experiments show that our algorithm reports 4% fewer divergent variables than the previous state-of-the-art algorithm of Coutinho et al. Furthermore, we can mark about one fourth of all divergent variables as affine functions of thread identifiers. In addition to the novel divergence analysis, we also introduce the notion of a divergence-aware register allocator. This allocator uses information from our analysis to either rematerialize affine variables or move uniform variables to shared memory. As a testimony of its effectiveness, our divergence-aware allocator produces GPU code that is 29.70% faster than the code produced by Ocelot's register allocator.
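The idea of tracking variables as affine functions of the thread identifier can be illustrated with a small sketch. This is not the paper's algorithm or lattice: it is a simplified assumption in which each abstract value is `base + stride * tid` with known integer coefficients, and all names (`Affine`, `TID`, `const`, `add`, `mul`, `classify`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affine:
    """Abstract value meaning base + stride * tid (simplified assumption)."""
    base: int
    stride: int


# None stands for a value that is not a known affine function of tid.
TID = Affine(0, 1)


def const(c):
    """A constant is the same on every thread, so its stride is 0."""
    return Affine(c, 0)


def add(x, y):
    """The sum of two affine values is affine."""
    if x is None or y is None:
        return None
    return Affine(x.base + y.base, x.stride + y.stride)


def mul(x, y):
    """A product stays affine only if one operand does not depend on tid."""
    if x is None or y is None:
        return None
    if x.stride == 0:
        return Affine(x.base * y.base, x.base * y.stride)
    if y.stride == 0:
        return Affine(x.base * y.base, x.stride * y.base)
    return None  # for example tid * tid is not affine


def classify(x):
    """'affine' here means divergent, but expressible as base + stride * tid."""
    if x is None:
        return "divergent"
    return "uniform" if x.stride == 0 else "affine"


index = add(mul(const(4), TID), const(8))  # models 4 * tid + 8
print(index, classify(index))
print(classify(add(const(1), const(2))))
print(classify(mul(TID, TID)))
```

In this simplified model, a value with stride 0 is uniform, and a value such as `4 * tid + 8` could be recomputed from the thread identifier instead of being kept in a register, which is the intuition behind rematerializing affine variables.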

    Register Allocation after Classical SSA Elimination is NP-complete

    Chaitin proved that register allocation is equivalent to graph coloring and hence NP-complete. Recently, Bouchez, Brisk, and Hack have proved independently that the interference graph of a program in static single assignment (SSA) form is chordal and therefore colorable in linear time. Can we use the result of Bouchez et al. to do register allocation in polynomial time by first transforming the program to SSA form, then performing register allocation, and finally doing the classical SSA elimination that replaces φ-functions with copy instructions? In this paper we show that the answer is no, unless P = NP: register allocation after classical SSA elimination is NP-complete. Chaitin’s proof technique does not work for programs after classical SSA elimination; instead we use a reduction from the graph coloring problem for circular arc graphs.
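To make the claim that chordal graphs are easy to color more concrete, here is a minimal sketch of one standard approach: compute a maximum cardinality search order and color greedily along it, which yields an optimal coloring when the graph is chordal. It is not taken from the paper, the function names and the example graph are hypothetical, and this simple version runs in quadratic time rather than the linear time mentioned in the abstract.

```python
def mcs_order(graph):
    """Maximum cardinality search: repeatedly pick the unvisited vertex
    that has the most already-visited neighbors."""
    weight = {v: 0 for v in graph}
    order = []
    while weight:
        v = max(weight, key=weight.get)
        order.append(v)
        del weight[v]
        for u in graph[v]:
            if u in weight:
                weight[u] += 1
    return order


def greedy_color(graph, order):
    """Give each vertex the smallest color not used by a colored neighbor."""
    color = {}
    for v in order:
        used = {color[u] for u in graph[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


# Hypothetical interference graph: two triangles sharing vertex "c" (chordal).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d", "e"},
    "d": {"c", "e"},
    "e": {"c", "d"},
}
colors = greedy_color(graph, mcs_order(graph))
print(colors)
```

Here each color stands for a register. The largest clique in the example has three vertices, so three registers are both necessary and sufficient.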

    Data and Instruction Uniformity in Minimal Multi-Threading

    Simultaneous Multi-Threading (SMT) is a hardware model in which different threads share the same instruction fetching unit. This model is a compromise between high parallelism and low hardware cost. Minimal Multi-Threading (MMT) is a technique recently proposed to share instructions and execution between threads in an SMT machine. In this paper we propose new ways to explore redundancies in the MMT execution model. First, we propose and evaluate a new thread reconvergence heuristic that handles function calls better than previous approaches. Second, we demonstrate the existence of substantial regularity in inter-thread memory access patterns. We validate our results on the four data-parallel applications present in the PARSEC benchmark suite. The new thread reconvergence heuristic is, on average, 82% more efficient than MMT's original reconvergence method. Furthermore, about 69% to 87% of all the memory addresses are either the same for all the threads, or are affine expressions of the thread identifier. This observation motivates the design of newly proposed hardware that benefits from regularity in inter-thread memory accesses.

    Software for Static Prediction of Silent Stores

    A store operation is called silent if it writes to memory a value that is already there. The ability to detect silent stores is important, because they might indicate performance bugs, enable code optimizations, or reveal opportunities for automatic parallelization, for instance. Silent stores are traditionally detected via profiling tools. In this project, we depart from this methodology and instead explore the following question: is it possible to predict silentness by analyzing the syntax of programs? The process of building an answer to this question is interesting in itself, given the stochastic nature of silent stores, which depend on data and coding style. To build such an answer, we have developed a methodology to classify store operations in terms of syntactic features of programs. Based on such features, we develop different kinds of predictors, some of which go well beyond what any trivial approach could achieve. To illustrate how static prediction can be employed in practice, we use it to optimize programs running on non-volatile memory systems. Webpage: http://www.lirmm.fr/continuum-project/pages/s3a.htm
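The definition of a silent store can be illustrated with a short sketch. It replays a trace of stores dynamically, in the style of the profiling approach the abstract contrasts itself with, and is not the static predictor developed in the project. The function name, the trace format, and the addresses are hypothetical.

```python
def count_silent_stores(trace, memory=None):
    """Replay (address, value) stores and count those that write a value
    already present at that address."""
    memory = {} if memory is None else dict(memory)
    silent = 0
    for address, value in trace:
        if address in memory and memory[address] == value:
            silent += 1  # the store does not change the memory contents
        memory[address] = value
    return silent


# Hypothetical trace: the second and the last store repeat an existing value.
trace = [(0x10, 1), (0x10, 1), (0x14, 7), (0x10, 2), (0x14, 7)]
print(count_silent_stores(trace))
```

In this sketch a store to an address that has never been written is treated as non-silent unless an initial memory image is supplied through the optional `memory` argument.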