23 research outputs found

    The RDT network Router Chip

    Get PDF
    Design Outline. The RDT network router chip is a versatile router for the massively parallel computer prototype JUMP-1, which is currently under development in a collaboration between seven Japanese universities. The major goal of this project is to establish techniques for building an efficient distributed shared memory on a massively parallel processor. For this purpose, reduced hierarchical bit-map directory (RHBD) schemes are used. In order to implement RHBD schemes efficiently, we proposed a novel interconnection network, the RDT (Recursive Diagonal Torus). Using 0.5 µm BiCMOS SOG technology, the chip can transfer all packets synchronized with a single CPU clock (60 MHz). Long coaxial cables (4 m at maximum) are driven directly by the chip's ECL interface. Using dual-port RAM, the packet buffers allow a flit of a packet to be pushed and pulled simultaneously. A mixed design approach combining schematics and VHDL permitted the development of this complicated chip, with 90,522 gates, in a year. 2 JUMP-1 and the RDT. JUMP-1 [1] consists of clusters connected with an interconnection network, the RDT. 2.1 Interconnection network RD
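The dual-port packet buffer described above, where one flit can be pushed while another is pulled in the same cycle, can be sketched as a toy model. This is an illustrative sketch only, not the chip's actual design; the class and method names are hypothetical.

```python
from collections import deque

class FlitBuffer:
    """Toy model of a dual-port packet buffer: within one clock cycle,
    a flit can be pulled from the head while another is pushed at the
    tail (the property dual-port RAM provides in hardware)."""

    def __init__(self, depth):
        self.depth = depth        # buffer capacity in flits
        self.slots = deque()

    def cycle(self, flit_in=None):
        """Model one clock cycle: pull the oldest flit (if any) and,
        simultaneously, push the incoming flit (if there is room).
        Returns the outgoing flit or None."""
        flit_out = self.slots.popleft() if self.slots else None
        if flit_in is not None and len(self.slots) < self.depth:
            self.slots.append(flit_in)
        return flit_out

buf = FlitBuffer(depth=4)
first = buf.cycle("head")   # buffer was empty: push only, nothing out
second = buf.cycle("body")  # push "body" and pull "head" in one cycle
```

In hardware the pull and push ports operate truly concurrently; the sequential Python model only preserves the per-cycle semantics.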

    Prospects and tendencies of multiprocessor systems development

    Get PDF
    Prospects and the main tendencies in the development and design of modern multiprocessor systems are considered. It is shown that, at present, the most promising solutions are personal computing clusters, which represent a strategic direction for the development of computing.

    Concurrency in Blockchain Based Smartpool with Transactional Memory

    Full text link
    Blockchain is the buzzword of today's technological world, an undeniably ingenious invention of the 21st century. The term was first coined and used by the cryptocurrency Bitcoin. Since then, Bitcoin and blockchain have become so popular that adoption has surged and the price of Bitcoin leaped to a staggering level over the last year. Today several other cryptocurrencies have adopted blockchain technology. A blockchain in cryptocurrencies is formed by chaining blocks, which are created by nodes called miners through a process called Proof of Work (PoW). Mining pools are collections of miners that collectively try to solve a puzzle. However, most mining pools are centralized. P2Pool was the first decentralized mining pool in Bitcoin, but it is not popular because the number of messages exchanged among the miners is a scalar multiple of the number of shares. SmartPool is a decentralized mining pool with throughput equal to that of a traditional pool; however, its verification of blocks is done sequentially. We propose a non-blocking concurrency mechanism for the verification of blocks in a decentralized mining pool. The smart contract in SmartPool is executed concurrently using a transactional memory approach, without locks. Since the SmartPool mining implemented in Ethereum can be applied to Bitcoin, the concurrency method we propose for Ethereum smart contracts is applicable in Bitcoin as well.
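The non-blocking, transactional-memory-style verification idea can be sketched as an optimistic retry loop: each worker verifies its share without holding a lock, then commits the result only if no other worker committed in the meantime. This is a minimal STM-flavoured sketch, not SmartPool's actual contract code; `TVar`, `verify_share`, and the evenness check are all hypothetical.

```python
import threading

class TVar:
    """Minimal versioned cell: reads are lock-free, and the commit is a
    short compare-and-swap-style step, so no lock is held while the
    (potentially slow) verification work runs."""

    def __init__(self, value):
        self._value, self._version = value, 0
        self._guard = threading.Lock()   # protects only the brief commit

    def read(self):
        return self._value, self._version

    def commit(self, expected_version, new_value):
        with self._guard:
            if self._version != expected_version:
                return False             # another thread won: caller retries
            self._value, self._version = new_value, self._version + 1
            return True

verified = TVar(set())

def verify_share(share):
    # stand-in check: pretend a share is valid when its value is even
    return share % 2 == 0

def record(share):
    while True:                          # optimistic retry loop
        current, version = verified.read()
        if not verify_share(share):      # verification runs lock-free
            return
        if verified.commit(version, current | {share}):
            return                       # committed; otherwise retry

threads = [threading.Thread(target=record, args=(s,)) for s in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# after all joins, verified holds exactly the even shares 0, 2, 4, 6
```

A real transactional-memory runtime would track read/write sets per transaction; the versioned commit above captures only the optimistic, lock-avoiding structure.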

    Factores de rendimiento en entornos multicore

    Get PDF
    This work reflects a research study on the detection of factors that affect performance in multicore environments. Because of the wide variety of multicore architectures, we defined a framework consisting of a specific architecture, a programming model based on data parallelism, and Single Program Multiple Data applications. Having defined the framework, we evaluated the performance factors with special attention to the programming model. For this reason, we analyzed the thread library and the OpenMP API to detect those functions that are candidates for tuning, allowing applications to behave adaptively to the computing environment; their proper use should improve application performance.
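The tuning idea above, exposing the degree of parallelism as an adjustable knob, the way OpenMP reads `OMP_NUM_THREADS` or `omp_set_num_threads`, can be sketched in Python as an analogy (the abstract's actual experiments use OpenMP and a thread library, not Python; `spmd_sum` is an illustrative name).

```python
import os
from concurrent.futures import ThreadPoolExecutor

def spmd_sum(data, num_workers):
    """Single Program Multiple Data pattern: every worker runs the same
    reduction over its own slice of the data."""
    chunk = max(1, len(data) // num_workers)
    pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return sum(pool.map(sum, pieces))

# the tunable factor: choose the worker count from the environment,
# analogous to OpenMP honouring OMP_NUM_THREADS
workers = int(os.environ.get("OMP_NUM_THREADS", os.cpu_count() or 1))
total = spmd_sum(list(range(100)), workers)   # result is independent of tuning
```

The point of such a knob is that the result never changes with the worker count, only the performance does, which is what makes the function safe to tune adaptively.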

    Characterization of applications in new architectures

    Get PDF
    English: Computer science is continuously evolving to improve the development of applications and to let programmers achieve better productivity. One key issue is the ability to reuse work previously done by others. Currently, the same or similar algorithms and libraries are used in many different kinds of applications (weather forecasting, physics simulations, artificial-intelligence decision making, entertainment programs, etc.), regardless of which kind of input data is processed and how. Growing requirements mean that a single processor is no longer enough for high-performance applications, forcing processors to cooperate in a synchronized way. This leads to the idea of granularity: a program must be divided into subtasks (a.k.a. threads) which communicate to exchange data and coordinate their activities, in order to distribute the workload and improve application performance. When those requirements are high enough, the application must be migrated to a new and more powerful platform, with negligible modifications or without any changes to either the algorithm or the source code. Throughout this document we show how to achieve that goal and which techniques, resources, and steps have been chosen. The application Kratos and the platform Mare Nostrum are the main elements of this document, chosen to help ease future migrations, whether with other applications or on similar or different platforms.

    ADAM : a decentralized parallel computer architecture featuring fast thread and data migration and a uniform hardware abstraction

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002. Includes bibliographical references (p. 247-256). The furious pace of Moore's Law is driving computer architecture into a realm where the speed of light is the dominant factor in system latencies. The number of clock cycles needed to span a chip is increasing, while the number of bits that can be accessed within a clock cycle is decreasing. Hence, it is becoming more difficult to hide latency. One alternative is to reduce latency by migrating threads and data, but the overhead of existing implementations has so far made migration an unserviceable solution. I present an architecture, implementation, and mechanisms that reduce the overhead of migration to the point where migration is a viable supplement to other latency-hiding mechanisms, such as multithreading. The architecture is abstract, and presents programmers with a simple, uniform, fine-grained multithreaded parallel programming model with implicit memory management. In other words, the spatial nature and implementation details (such as the number of processors) of a parallel machine are entirely hidden from the programmer. Compiler writers are encouraged to devise programming languages for the machine that guide a programmer to express their ideas in terms of objects, since objects exhibit an inherent physical locality of data and code. The machine implementation can then leverage this locality to automatically distribute data and threads across the physical machine using a set of high-performance migration mechanisms. An implementation of this architecture can migrate a null thread in 66 cycles - over a factor of 1000 improvement over previous work. Performance also scales well; the time required to move a typical thread is only 4 to 5 times that of a null thread. Data migration performance is similar, and scales linearly with data block size. 
Since the performance of the migration mechanism is on par with that of an L2 cache, the implementation simulated in my work has no data caches and relies instead on multithreading and the migration mechanism to hide and reduce access latencies. by Andrew "bunnie" Huang. Ph.D.
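The figures quoted in the abstract (66 cycles for a null thread, 4-5x that for a typical thread, data cost linear in block size) amount to a simple cost model, sketched below. Only the 66-cycle constant and the 4-5x factor come from the abstract; the per-word data cost is an assumed placeholder, not a number from the thesis.

```python
NULL_THREAD_CYCLES = 66  # null-thread migration cost quoted in the abstract

def thread_migration_cycles(payload_factor=4.5):
    """A typical thread costs roughly 4-5x a null thread; the exact
    factor is a free parameter of this sketch."""
    return NULL_THREAD_CYCLES * payload_factor

def data_migration_cycles(block_words, cycles_per_word=1.0):
    """Data migration scales linearly with block size; cycles_per_word
    is an assumed constant for illustration only."""
    return block_words * cycles_per_word

typical = thread_migration_cycles(4)       # lower end of the 4-5x range
big_block = data_migration_cycles(1024)    # linear in the block size
```

Under this model the break-even question the thesis raises, when migrating is cheaper than repeatedly paying a remote-access latency, reduces to comparing these costs against the expected number of remote accesses avoided.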

    High Performance Computing (HPC) and High-End Data Analysis (HDA): An Analysis and Comparison of the Two Worlds

    Get PDF
    The main subject of this project is parallel systems, and in particular the analysis and comparison of two different worlds within that field: high performance computing and high-end data analysis. By the first term we mean the fast solution of really complex problems demanding huge computational power, while by the second we mean the fast and effective management of the huge amounts of data that are a phenomenon of our age. The purpose of this project is not only to give a full analysis of the two worlds described above but also to examine whether there is a need for convergence between them. The project also presents in depth the way these two worlds work in practice. 
For high performance computing this is demonstrated through code run on a real cluster with real parallel tools under real conditions. For high-end data analysis the Hadoop framework was chosen, analyzed in depth, and installed from scratch on a personal machine. Last but not least, the conclusions of this project are that the need for convergence should be taken as given, and that many steps have already been taken in that direction.
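The HDA side described above is built on Hadoop's MapReduce model, whose canonical illustration is word counting: a map phase emits (key, 1) pairs and a reduce phase sums them per key. The sketch below shows only the programming model in miniature; the document names, counts, and function names are illustrative, and a real Hadoop job would distribute both phases across a cluster.

```python
from collections import Counter
from itertools import chain

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word, as in the
    classic MapReduce word-count example."""
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    """Reduce step: sum the emitted counts per key (word)."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

docs = ["hpc meets hda", "hda tools like hadoop", "hpc clusters"]
pairs = chain.from_iterable(map_phase(d) for d in docs)
counts = reduce_phase(pairs)   # per-word totals across all documents
```

Because each map call depends only on its own document and reduction is per key, both phases parallelize trivially, which is exactly the property Hadoop exploits at scale.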