
5 research outputs found

Parallel computing on clusters: a communication performance evaluation tool (Cómputo paralelo en clusters herramienta de evaluación de rendimiento de las comunicaciones)

Authors: Barbieri Andrés, Tinetti Fernando Gustavo
Publication date: 26/10/2012

Computer networks used for parallel computing (clusters) are being applied successfully in many areas. One of the major problems that parallel applications face in this environment is communication performance, since communication time is very large relative to the number of operations any computer can perform locally. Moreover, both the local network topology and the available message-passing programming tools have their own communication overhead, which is not always easy to evaluate or predict a priori. This article presents a tool that estimates the communication performance between the processes of a parallel application using the best-known freely available message-passing libraries. In particular, it focuses on point-to-point communications (one process communicates with another, basically using the send-receive primitives) and on one of the most widely used collective communications, broadcast, which sends data from one process to n other processes of the parallel application.

Track: Languages. Red de Universidades con Carreras en Informática (RedUNCI).

Servicio de Difusión de la Creación Intelectual
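To make the quantities in the abstract above concrete, here is a minimal Python sketch. It assumes the common linear model t(m) = α + βm for a point-to-point message of m bytes. This model, the function names, and the sample timings are illustrative assumptions; they are not the authors' tool. The sketch fits α (latency) and β (inverse bandwidth) from ping-pong style timings by least squares. It then predicts the time to broadcast to n receivers under two hypothetical broadcast algorithms: a flat loop of sends from the root, and a binomial tree.

```python
"""Sketch: fit a linear point-to-point cost model and predict broadcast time.

Assumption (not from the paper): t(m) = alpha + beta * m, with m in bytes.
"""
import math


def fit_linear_cost(sizes, times):
    """Least-squares fit of t = alpha + beta * m.

    sizes: message sizes in bytes; times: one-way times in seconds
    (for example, half of a measured ping-pong round trip).
    Returns (alpha, beta): latency in s and per-byte cost in s/byte.
    """
    n = len(sizes)
    if n < 2 or len(times) != n:
        raise ValueError("need at least two (size, time) pairs")
    mean_m = sum(sizes) / n
    mean_t = sum(times) / n
    sxx = sum((m - mean_m) ** 2 for m in sizes)
    sxy = sum((m - mean_m) * (t - mean_t) for m, t in zip(sizes, times))
    beta = sxy / sxx
    alpha = mean_t - beta * mean_m
    return alpha, beta


def flat_broadcast_time(alpha, beta, m, n):
    """Hypothetical flat broadcast: the root sends to each of n receivers in turn."""
    return n * (alpha + beta * m)


def tree_broadcast_time(alpha, beta, m, n):
    """Hypothetical binomial-tree broadcast: the number of processes holding the
    data doubles each round, so n receivers (n + 1 processes) need
    ceil(log2(n + 1)) rounds."""
    rounds = math.ceil(math.log2(n + 1)) if n > 0 else 0
    return rounds * (alpha + beta * m)


if __name__ == "__main__":
    # Hypothetical timings (not data from the paper): 50 us latency, 1e-8 s/byte.
    sizes = [0, 1024, 65536, 1048576]
    times = [50e-6 + 1e-8 * m for m in sizes]
    alpha, beta = fit_linear_cost(sizes, times)
    print(f"alpha = {alpha * 1e6:.1f} us, bandwidth = {1 / beta / 1e6:.1f} MB/s")
    for n in (1, 7, 15):
        m = 65536
        print(n, flat_broadcast_time(alpha, beta, m, n),
              tree_broadcast_time(alpha, beta, m, n))
```

Under this model the tree needs ⌈log2(n+1)⌉ rounds instead of n sequential sends. This shows how the broadcast algorithm a library uses can noticeably change the cost, independently of raw network speed.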

Parallel computing on clusters: a communication performance evaluation tool (Cómputo paralelo en clusters herramienta de evaluación de rendimiento de las comunicaciones)

Authors: Barbieri Andrés, Tinetti Fernando Gustavo
Publication date: 01/10/2002

Centro de Servicios en Gestión de Información

Hierarchical Implementation of Aggregate Functions

Author: Quevedo Pablo
Publication venue: UKnowledge
Publication date: 01/01/2017

Most HPC systems use hierarchical designs that let programmers exploit multiple levels of parallelism. Combining multiple multi-core/multi-processor computers into a cluster supports both fine-grain and large-grain parallel computation. Aggregate function communications provide an easy-to-use and efficient set of mechanisms for communicating and coordinating between processing elements, but the model originally targeted only fine-grain parallel hardware. This work shows that a hierarchical implementation of aggregate functions is a viable alternative to MPI (the standard Message Passing Interface library) for programming clusters that provide both fine-grain and large-grain execution. The performance of a prototype implementation is evaluated and compared to that of MPI.

University of Kentucky
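The hierarchical idea in the abstract above can be illustrated with a two-level reduction. The following Python sketch is a hypothetical model, not the thesis's prototype. The cluster is a list of nodes, and each node is a list of per-core values. The aggregate function combines all values with an associative operator and returns the result to every core. The function name and data layout are assumptions made for illustration.

```python
"""Sketch of a two-level (hierarchical) aggregate function.

Hypothetical model, not the thesis's implementation: every core contributes
one value and every core receives the combined result.
"""
from functools import reduce
import operator


def hierarchical_aggregate(cluster, op=operator.add):
    """cluster: list of nodes, each a non-empty list of per-core values.
    op: an associative binary operator (addition by default)."""
    # Level 1 (fine grain): combine the values of the cores inside each node.
    node_partials = [reduce(op, cores) for cores in cluster]
    # Level 2 (large grain): combine one partial result per node across nodes,
    # so only len(cluster) values cross node boundaries.
    total = reduce(op, node_partials)
    # Distribute: every core of every node receives the same result.
    return [[total] * len(cores) for cores in cluster]


if __name__ == "__main__":
    print(hierarchical_aggregate([[1, 2], [3, 4, 5]]))
    print(hierarchical_aggregate([[7], [2, 9]], max))
```

For example, `hierarchical_aggregate([[1, 2], [3, 4, 5]])` returns `[[15, 15], [15, 15, 15]]`. Only one partial value per node crosses node boundaries, so inter-node traffic grows with the number of nodes rather than with the number of cores.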

KENTUCKY'S ADAPTER FOR PARALLEL EXECUTION AND RAPID SYNCHRONIZATION

Author: Mitta Swetha
Publication venue: UKnowledge
Publication date: 01/01/2007

As network hardware has become faster, even inefficient communication and synchronization mechanisms have often proven fast enough, but better models are needed to support future systems. The aggregate function communication model, and the KAPERS design and implementation presented in this thesis, provide more efficient ways to implement a wide range of higher-level communication and synchronization operations. The main contributions of this work center on a new way to use FPGA-based memory in an aggregate function network (AFN). The basic functions were designed and implemented with modal encoding to create a global memory that supports variable-length objects and object addresses. New and enhanced algorithms were written for the new AFN architecture. This thesis also details the KAPERS prototype hardware implementation.

University of Kentucky

A CUSTOM ARCHITECTURE FOR DIGITAL LOGIC SIMULATION

Author: Ahn Jiyong
Publication date: 22/03/2002

As VLSI technology advances, designers can pack larger circuits into a single chip. According to the International Technology Roadmap for Semiconductors, in 2005 VLSI circuit technology will produce chips with 200 million transistors in total, 40 million logic gates, clock rates of 2 to 3.5 GHz, and 160 W of power consumption. Recently, Intel announced that it will produce a billion-transistor processor before 2010. However, current design methodologies can handle only tens of millions of transistors in a single design. In this thesis, we focus on the problem of simulating large digital devices at the gate level. While many software solutions for gate-level simulation exist, their performance is limited by the underlying general-purpose workstation architecture. This research defines an architecture designed specifically for gate-level logic simulation that is at least an order of magnitude faster than software running on a workstation. We present a custom processor and memory architecture that can simulate a gate-level design orders of magnitude faster than software simulation while maintaining four levels of signal strength. New primitives are presented and shown to significantly reduce the complexity of simulation. Unlike most simulators, which use only zero-delay or unit-delay models, this research provides a mechanism to handle a more complex full-timing delay model with picosecond accuracy. Experimental results and a working prototype are also presented.

D-Scholarship@Pitt
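The last abstract contrasts zero- or unit-delay models with a full-timing delay model. A small event-driven simulator can show that difference. This Python sketch uses a generic software technique, not the custom architecture described in the thesis. It assumes two-valued logic rather than the four signal-strength levels the abstract mentions, and a per-gate transport delay in picoseconds. It also uses a hypothetical gate format `(output_net, function, input_nets, delay_ps)`.

```python
"""Sketch: event-driven gate-level simulation with per-gate transport delays.

Assumptions (not the thesis's design): two-valued logic (0/1), transport
delays in picoseconds, gates given as (output_net, fn, input_nets, delay_ps).
"""
import heapq


def simulate(gates, initial, stimuli, until_ps):
    """gates: list of (out, fn, inputs, delay_ps).
    initial: dict net -> 0/1; must define every net read by a gate.
    stimuli: list of (time_ps, net, value) input changes.
    Returns the list of (time_ps, net, value) changes that occurred."""
    values = dict(initial)
    # Map each net to the gates that read it (its fanout).
    fanout = {}
    for gate in gates:
        for net in gate[2]:
            fanout.setdefault(net, []).append(gate)
    # Event queue ordered by time; the sequence number breaks ties in FIFO order.
    queue = [(t, i, net, v) for i, (t, net, v) in enumerate(stimuli)]
    heapq.heapify(queue)
    seq = len(queue)
    trace = []
    while queue and queue[0][0] <= until_ps:
        t, _, net, v = heapq.heappop(queue)
        if values.get(net) == v:
            continue  # no value change, so nothing propagates
        values[net] = v
        trace.append((t, net, v))
        # Re-evaluate each fanout gate now; its output changes after its delay.
        for out, fn, inputs, delay in fanout.get(net, []):
            new_value = fn(*(values[n] for n in inputs))
            heapq.heappush(queue, (t + delay, seq, out, new_value))
            seq += 1
    return trace


if __name__ == "__main__":
    gates = [("n1", lambda x, y: x & y, ("a", "b"), 40),  # AND, 40 ps
             ("y", lambda x: 1 - x, ("n1",), 25)]         # NOT, 25 ps
    init = {"a": 0, "b": 1, "n1": 0, "y": 1}
    print(simulate(gates, init, [(100, "a", 1)], 1000))
```

In this example, raising input `a` at 100 ps changes `n1` at 140 ps and `y` at 165 ps. Setting every delay to 0 or 1 would reduce the same loop to a zero-delay or unit-delay model.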