14 research outputs found

    PM-PVM: a portable multithreaded PVM

    PM-PVM is a portable implementation of PVM designed to work on SMP architectures supporting multithreading. PM-PVM portability is achieved by implementing the PVM functionality on top of a reduced set of parallel programming primitives. Within PM-PVM, PVM tasks are mapped onto threads and the message passing functions are implemented using shared memory. Three approaches to implementing the PVM message passing functions have been adopted. In the first, a single copy of the message in memory is shared by all destination tasks. The second replicates the message for every destination task but requires less synchronization. The third combines features of the previous two. Experimental results comparing the performance of PM-PVM and PVM applications running on a 4-processor SPARCstation 20 under Solaris 2.5 show that PM-PVM can produce execution times up to 54% shorter than PVM.

    Avaliação do desempenho de um procedimento de extração de atributos texturais baseado em análise de Fourier [Performance evaluation of a textural feature extraction procedure based on Fourier analysis]

    This paper presents a performance analysis of a Fourier-based textural feature extraction procedure. The textural classifications are based on the Euclidean distance method and are carried out in two experiments. The analysis was done using a software tool called "Ambiente de Processamento de Sinais e Imagens" (APSI), developed for this purpose.
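The Euclidean distance classification mentioned above can be sketched as a minimum-distance classifier. This is only an illustration under assumptions: the abstract does not describe the feature vectors or the class set, so the labels, the values in `class_means`, and the function name `nearest_class` are hypothetical.

```python
import math

def nearest_class(feature, class_means):
    """Return the label whose mean feature vector is closest in Euclidean distance."""
    return min(class_means, key=lambda label: math.dist(feature, class_means[label]))

# Hypothetical per-class mean feature vectors (illustrative values only,
# not taken from the paper).
class_means = {
    "smooth": [0.9, 0.1, 0.0],
    "coarse": [0.4, 0.4, 0.2],
}

# Classify one hypothetical feature vector.
label = nearest_class([0.5, 0.3, 0.2], class_means)
```

In a Fourier-based procedure the feature vector would typically be derived from the image spectrum, but how the paper computes it is not stated here.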

    A comparative analysis of cache memory architectures for the multiplus multiprocessor

    This paper analyses some design alternatives for the MULTIPLUS cache memory subsystem architecture. MULTIPLUS is a high performance multiprocessor system under development at NCE/UFRJ. The analysis is carried out using a simulator which supports different cache configurations. The simulation experiments cover three situations: a system without cache, and cached systems using the write-back and write-through control policies. The graphical results show the system behaviour in terms of the average bus occupation ratio and the average processor cycle length.
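To make the three simulated situations concrete, the following toy model counts bus transactions for a memory access trace. It is a rough sketch and not the simulator described in the abstract: the direct-mapped organization, one-word lines, write-allocate behaviour, the function name `bus_transactions`, and the trace are all assumptions made for illustration.

```python
def bus_transactions(trace, policy, num_lines=4):
    """Count bus transactions for a direct-mapped cache with one-word lines.

    trace  : list of ("r" | "w", address) pairs
    policy : "none", "write-through", or "write-back"
    """
    if policy == "none":
        return len(trace)          # without a cache, every access uses the bus
    tags = [None] * num_lines
    dirty = [False] * num_lines
    count = 0
    for op, addr in trace:
        idx = addr % num_lines
        if tags[idx] != addr:      # miss
            if policy == "write-back" and dirty[idx]:
                count += 1         # write the evicted dirty line back
            count += 1             # fetch the line (write-allocate assumed)
            tags[idx] = addr
            dirty[idx] = False
        if op == "w":
            if policy == "write-through":
                count += 1         # every write is propagated to memory
            else:
                dirty[idx] = True  # defer the memory update until eviction
    return count

# Hypothetical trace: three writes followed by three reads of one address.
trace = [("w", 0)] * 3 + [("r", 0)] * 3
counts = {p: bus_transactions(trace, p) for p in ("none", "write-through", "write-back")}
```

Fewer bus transactions generally mean lower bus occupation, which is one of the two metrics the paper reports; the model says nothing about processor cycle length.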

    Uma avaliação do impacto das operações de E/S no desempenho do multiprocessador Multiplus [An evaluation of the impact of I/O operations on the performance of the Multiplus multiprocessor]

    This paper discusses the impact of I/O operations on the performance of the processing nodes within Multiplus, a shared-memory multiprocessor under development at NCE/UFRJ. The different aspects of this impact are evaluated through simulation experiments that model the operation of the I/O system within the Multiplus architecture.

    O subsistema de memória de massa do multiprocessador multiplus [The mass storage subsystem of the Multiplus multiprocessor]

    This paper presents the architecture of the mass storage I/O subsystem proposed for MULTIPLUS, a high performance scientific multiprocessor under development at NCE/UFRJ. The paper first examines the I/O bottleneck problem, which results from a gap in the technological evolution of processors and mass storage devices, and identifies the parameters that determine I/O subsystem performance. It then discusses the advantages and drawbacks of concentrated and distributed I/O subsystem architectures. Finally, a distributed architecture suitable for parallel scientific applications is proposed for the MULTIPLUS I/O subsystem, and the internal organization of the I/O processors is presented and justified through numerical analysis.

    Avaliação de uma arquitetura SPARC com cache de desvio e barramento tipo Harvard [Evaluation of a SPARC architecture with a branch target cache and a Harvard bus]

    Variations in the SPARC architecture are studied in this paper, with particular emphasis on the use of a branch target cache and a Harvard bus. A simulator that works on a cycle-by-cycle basis has been developed to measure the performance of several configurations. The results obtained are reported in this paper.

    NCESPARC+: an implementation of a SPARC architecture with hardware support to multithreading for the multiplus multiprocessor

    NCESPARC+ is an implementation of the SPARC V8 architecture with hardware support for a variable number of thread contexts, which is under development for use within the framework of the Multiplus distributed shared-memory multiprocessor. It is expected to provide an efficient and automatic mechanism to hide the latency of busy-waiting synchronization loops, cache-coherence protocol and remote memory access operations within the Multiplus multiprocessor. NCESPARC+ performs context-switching in at most four processor cycles whenever there is an instruction cache miss, a data dependency on the destination operand of a pending load instruction, or a busy-waiting synchronization loop. It has a decoupled architecture which allows the main pipeline to process instructions from a given context while the Memory Interface Unit performs memory access operations related to that same context or to any other context. Results of simulation experiments show the impact of some architectural parameters on NCESPARC+ processor performance and demonstrate that the use of multiple thread contexts can effectively produce a much better utilization of the processor when long latency operations are performed. In addition, NCESPARC+ processor performance with a single context is superior to that of a standard implementation of the SPARC architecture due to its decoupled architecture.

    Multiplus: a modular high-performance multiprocessor

    The MULTIPLUS project, currently under development at NCE/UFRJ, Brazil, aims at the study of parallel processing problems in MIMD environments. The project includes the development of a parallel shared-memory architecture and a UNIX-like operating system called MULTIPLIX. The MULTIPLUS architecture uses an inverted n-cube multistage network to interconnect clusters of processing nodes designed around a double-bus system. As a consequence, the architecture is partitionable and modular. It can easily and efficiently support configurations ranging from workstations to powerful parallel supercomputers with up to 2048 processing nodes. The MULTIPLIX operating system provides MULTIPLUS with an efficient computing environment for parallel scientific applications. MULTIPLIX uses the concept of thread, implements busy-waiting synchronization primitives very efficiently, and carefully considers data locality and scientific processing requirements in the policies adopted for memory management and thread scheduling.

    Uma proposta de implementação do algoritmo de Lee no multiprocessador Multiplus [A proposal for implementing Lee's algorithm on the Multiplus multiprocessor]

    This paper discusses some issues concerning the implementation of Lee's routing algorithm on MULTIPLUS, a multiprocessor under development at NCE/UFRJ. MULTIPLUS supports up to 2048 processing nodes which can be organized into 32 clusters consisting of up to 8 processing nodes. The overall global memory address space is 32 Gbytes, which is physically distributed into modules of up to 32 Mbytes that are local to the processing nodes. The proposed implementation takes advantage of the memory hierarchy and partitioning within the MULTIPLUS architecture to efficiently exploit the intrinsic parallelism of the expansion and reset phases of the algorithm. In addition, the proposed implementation is able to route in parallel nets that are constrained to non-overlapping layout areas.
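For readers unfamiliar with Lee's algorithm, the phases named above can be sketched sequentially as follows. This is a minimal single-threaded illustration and not the parallel MULTIPLUS implementation proposed in the paper; the grid encoding and the function name `lee_route` are assumptions, and the source cell is assumed to be free.

```python
from collections import deque

def lee_route(grid, src, dst):
    """Route src -> dst on a grid (0 = free, 1 = blocked); return the path or None."""
    rows, cols = len(grid), len(grid[0])
    dist = {src: 0}
    frontier = deque([src])
    # Expansion phase: label cells with their wavefront distance from src.
    while frontier and dst not in dist:
        r, c = frontier.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                frontier.append((nr, nc))
    if dst not in dist:
        return None
    # Backtrace phase: walk from dst to src through strictly decreasing labels.
    path = [dst]
    while path[-1] != src:
        r, c = path[-1]
        path.append(next(n for n in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
                         if dist.get(n) == dist[(r, c)] - 1))
    # Reset phase: the labels are simply discarded, since dist is local to this call.
    return path[::-1]

# Hypothetical 3x3 layout with a wall across most of the middle row.
grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
path = lee_route(grid, (0, 0), (2, 0))
```

Cells on the same wavefront can be labelled independently of one another, and the reset touches each labelled cell independently, which is the kind of intrinsic parallelism in the expansion and reset phases that the abstract refers to.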

    The multiplus/mulplix project: current status and perspectives

    The MULTIPLUS/MULPLIX project aims at the development of a modular distributed shared-memory parallel architecture able to support up to 1024 processing elements based on SPARC microprocessors, and at the implementation of MULPLIX, a Unix-like operating system which provides a suitable parallel programming environment for the MULTIPLUS architecture. The project includes research efforts in five areas: parallel architectures, operating systems, CMOS IC design, parallel programming environments and parallel algorithms. This technical report first presents an overview of the MULTIPLUS architecture and describes in detail the current implementation of its four basic hardware modules: the Processing Element, the I/O Processor, the Multistage Interconnection Network and the Network Interface. Second, the MULPLIX operating system definition is reviewed and the parallel programming primitives available within MULPLIX are presented. Next, developments in the area of CMOS IC designs for use within the MULTIPLUS architecture are described. The implementations of the PVM and Pthreads parallel programming libraries within the MULPLIX system are also discussed. Finally, the main results achieved with the parallelization of Simulated Annealing and Genetic algorithms are discussed.