222 research outputs found

    Composite Iterative Algorithm and Architecture for q-th Root Calculation

    Get PDF
    An algorithm for the q-th root extraction, being q any integer, is presented in this paper. The algorithm is based on an optimized implementation of X^{1/q} by a sequence of parallel and/or overlapped operations: (1) reciprocal, (2) digit-recurrence logarithm, (3) left-to-right carry-free multiplication and (4) on-line exponential. A detailed error analysis and two architectures are proposed, for low precision q and for higher precision q. The execution time and hardware requirements are estimated for single and double precision floating-point computations for several radices; this helps to determine which radices result in the most efficient implementations. The architectures proposed improve the features of other architectures for q-th root extraction.Dans cet article, nous présentons un algorithme matériel pour l'extraction de la racine q-ième d'un nombre X, où q est un entier naturel non nul. Cet algorithme est basé sur une implantation optimisée de la fonction X^{1/q} par une séquence d'opérations parallèles et/ou superposées: (1) réciproque, (2) logarithme chiffre par chiffre, (3) multiplication de gauche-à-droite sans propagation de retenue et (4) exponentielle en ligne. Une analyse détaillée des erreurs et deux architectures sont proposées, pour q de basse précision et pour q de précision plus haute. Le temps d'exécution et les composants matériels à utiliser sont estimés pour des calculs en virgule flottante simple et double précision et pour plusieurs bases. Cette étude aide à déterminer quelles bases mènent aux implantations les plus efficaces. Les architectures proposées améliorent les caractéristiques d'architectures précédentes destinées à l'extraction des racines

    Computing cube root of a positive number

    Get PDF
    Proposed here is a new algorithm to compute the cube root of large positive integer. The algorithm is based on the implementation of long division method also known as manual method we usually use to find the square root of a number. To implement the long division method, the given number is first represented in a radix-10 representa and then Bino’s Model of Multiplication is used to systematically implement the long division method. A representa is a special array to represent a number in the form of an array so as to enable us to treat the representas in the same way as we treat numbers. This simplifies the difficulty of dealing large numbers in a computer. Also, at the same time it simplifies the implementation of long division method to find the cube root of positive number, ranging from single digit number to arbitrarily large positive number such as RSA challenge numbers. The algorithm can be used to compute cube root of a non-perfect cube number up to desired precision and each computed digit of cube root gives the best precision. Cube root of 2, 5, 10 up to 30 digits and integer parts of cube roots of first few and last few RSA challenge numbers are also provided in the experimental result to show that the algorithm works perfectly to compute the cube root of any positive integer, however small or large it may be.Keywords:Bino‘s Model of Multiplication, Convolution, Cube of a large number, Large number manipulation, Long division method, RSA challenge numbers, Representa, Cube root computatio

    Evaluating the communications capabilities of the generalized hypercube interconnection network

    Get PDF
    This thesis presents results of evaluating the communications capabilities of the generalized hypercube interconnection network. The generalized hypercube has outstanding topological properties, but it has not been implemented in a large scale because of its very high wiring complexity. For this reason, this network has not been studied extensively in the past. However, recent and expected technological advancements will soon render this network viable for massively parallel systems. We first present implementations of randomized many-to-all broadcasting and multicasting on generalized hypercubes, using as the basis the one-to-all broadcast algorithm presented in [3]. We test the proposed implementations under realistic communication traffic patterns and message generations, for the all-port model of communication. Our results show that the size of the intermediate message buffers has a significant effect on the total communication time, and this effect becomes very dramatic for large systems with large numbers of dimensions. We also propose a modification of this multicast algorithm that applies congestion control to improve its performance. The results illustrate a significant improvement in the total execution time and a reduction in the number of message contentions, and also prove that the generalized hypercube is a very versatile interconnection network

    Performance analysis of wormhole routing in multicomputer interconnection networks

    Get PDF
    Perhaps the most critical component in determining the ultimate performance potential of a multicomputer is its interconnection network, the hardware fabric supporting communication among individual processors. The message latency and throughput of such a network are affected by many factors of which topology, switching method, routing algorithm and traffic load are the most significant. In this context, the present study focuses on a performance analysis of k-ary n-cube networks employing wormhole switching, virtual channels and adaptive routing, a scenario of especial interest to current research. This project aims to build upon earlier work in two main ways: constructing new analytical models for k-ary n-cubes, and comparing the performance merits of cubes of different dimensionality. To this end, some important topological properties of k-ary n-cubes are explored initially; in particular, expressions are derived to calculate the number of nodes at/within a given distance from a chosen centre. These results are important in their own right but their primary significance here is to assist in the construction of new and more realistic analytical models of wormhole-routed k-ary n-cubes. An accurate analytical model for wormhole-routed k-ary n-cubes with adaptive routing and uniform traffic is then developed, incorporating the use of virtual channels and the effect of locality in the traffic pattern. New models are constructed for wormhole k-ary n-cubes, with the ability to simulate behaviour under adaptive routing and non-uniform communication workloads, such as hotspot traffic, matrix-transpose and digit-reversal permutation patterns. The models are equally applicable to unidirectional and bidirectional k-ary n-cubes and are significantly more realistic than any in use up to now. With this level of accuracy, the effect of each important network parameter on the overall network performance can be investigated in a more comprehensive manner than before. Finally, k-ary n-cubes of different dimensionality are compared using the new models. The comparison takes account of various traffic patterns and implementation costs, using both pin-out and bisection bandwidth as metrics. Networks with both normal and pipelined channels are considered. While previous similar studies have only taken account of network channel costs, our model incorporates router costs as well thus generating more realistic results. In fact the results of this work differ markedly from those yielded by earlier studies which assumed deterministic routing and uniform traffic, illustrating the importance of using accurate models to conduct such analyses

    A computer-aided design for digital filter implementation

    Get PDF
    Imperial Users onl

    An algorithm for redundant binary bit-pipelined rational arithmetic

    Full text link

    Distributed Duplicate Removal

    Get PDF
    Ziel der verteilten Duplikaterkennung ist die Identifikation von Elementen, welche mehrfach in einer großen, über mehrere Rechenknoten verteilten Datenmenge vorkommen. Sanders et al. [48] präsentieren einen verteilten Algorithmus, welcher dieses Problem in einer besonders kommunikationseffizienten Art und Weise löst. In einer Vorverarbeitungsphase werden mit Hilfe eines verteilten, platz-effizienten Bloom Filters zunächst möglichst viele distinkte Elemente als solche identifiziert und somit die Gesamtmenge der noch zu betrachtenden Elemente stark reduziert. Da hierbei jedoch auch falsch positive Ergebnisse auftreten, müssen alle als potentiell nicht distinkt erkannten Elemente in einer zweiten Phase noch einmal überprüft werden. Hierzu wird ein klassischer Hash-basierter Algorithmus zur verteilten Duplikaterkennung angewendet. Die vorliegende Arbeit ergänzt die theoretische Analyse durch eine praktische Evaluation. Wir erarbeiten hierzu eine effiziente Implementierung für Shared-Nothing Systeme. Besonders rechenintensive Schritte des Algorithmus werden zusätzlich durch Shared-Memory-Programmierung innerhalb eines Knotens parallelisiert. Die Ergebnisse unserer experimentellen Untersuchung untermauern die durch die Theorie vorhergesagten Vorteile des Algorithmus. Unsere Implementierung ist signifikant schneller als der am besten geeignete klassische Ansatz solange die Eingabedaten zu weniger als 50% aus Duplikaten bestehen. Wird der Algorithmus auf Datensätzen ausgeführt, die zu weniger als 10% aus Duplikaten bestehen, so ist das gesamte Kommunikationsvolumen zudem mehr als eine Größenordnung kleiner als das des klassischen Konkurrenten

    Serial-data computation in VLSI

    Get PDF
    corecore