
    Performance evaluation of an open distributed platform for realistic traffic generation

    Network researchers have dedicated a notable part of their efforts to modeling traffic and to implementing efficient traffic generators. We believe there is a strong demand for traffic generators that can reproduce realistic traffic patterns according to theoretical models while also achieving high performance. This work presents an open distributed platform for traffic generation, called Distributed Internet Traffic Generator (D-ITG), which produces traffic at packet level (at the network, transport and application layers) and accurately replicates appropriate stochastic processes for both the inter-departure time (IDT) and packet size (PS) random variables. We implemented two versions of the distributed generator. In the first, a log server records the information transmitted by senders and receivers, and these communications are based on either TCP or UDP. In the second, senders and receivers use the MPI library. This work presents a complete performance comparison among the centralized version and the two distributed versions of D-ITG.
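    To make the IDT/PS idea concrete, here is a minimal sketch of a packet schedule in which each packet's departure time and size are drawn from separate random variables. It is not D-ITG's interface: the function `generate_flow`, the `Packet` record, and the choice of exponential IDT (Poisson departures) with uniform PS are hypothetical, picked only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Packet:
    send_time: float  # seconds since the start of the flow
    size: int         # payload size in bytes


def generate_flow(duration, mean_rate_pps, size_min, size_max, seed=None):
    """Hypothetical packet schedule: exponential IDT, uniform PS.

    Exponential inter-departure times yield Poisson departures with mean
    rate `mean_rate_pps`; packet sizes are uniform on [size_min, size_max].
    Both distributions are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded so an experiment can be replayed
    packets = []
    t = 0.0
    while True:
        t += rng.expovariate(mean_rate_pps)  # IDT sample with mean 1/rate
        if t >= duration:
            break
        packets.append(Packet(t, rng.randint(size_min, size_max)))
    return packets


if __name__ == "__main__":
    flow = generate_flow(duration=10.0, mean_rate_pps=100.0,
                         size_min=64, size_max=1500, seed=42)
    print(len(flow), "packets, mean size",
          sum(p.size for p in flow) / len(flow))
```

    Keeping IDT and PS as independent samplers means either distribution can be swapped (for example for a heavy-tailed one) without touching the scheduling loop.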

    Parallel symbolic state-space exploration is difficult, but what is the alternative?

    State-space exploration is an essential step in many modeling and analysis problems. Its goal is to find the states reachable from the initial state of a discrete-state model. The state space can be used to answer important questions, e.g., "Is there a dead state?" and "Can N become negative?", or as a starting point for sophisticated investigations expressed in temporal logic. Unfortunately, the state space is often so large that ordinary explicit data structures and sequential algorithms cannot cope, prompting the exploration of (1) parallel approaches using multiple processors, from simple workstation networks to shared-memory supercomputers, to satisfy large memory and runtime requirements, and (2) symbolic approaches using decision diagrams to encode the large structured sets and relations manipulated during state-space generation. Both approaches have merits and limitations. Parallel explicit state-space generation is challenging, but almost linear speedup can be achieved; however, the analysis is ultimately limited by the available memory and processors. Symbolic methods are a heuristic that can efficiently encode many, but not all, functions over a structured and exponentially large domain. Here the pitfalls are subtler: performance varies widely depending on the class of decision diagram chosen, the state variable order, and obscure algorithmic parameters. Because symbolic approaches are often much more efficient than explicit ones for many practical models, we argue for the need to parallelize symbolic state-space generation algorithms, so that we can realize the advantages of both approaches. This is a challenging endeavor, as the most efficient symbolic algorithm, Saturation, is inherently sequential. We conclude by discussing challenges, efforts, and promising directions toward this goal.
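    As a point of reference for the explicit approach, the sketch below performs a sequential breadth-first reachability search and reports dead states (states with no successors), answering the first question quoted above. It is explicit, not symbolic, and it is not Saturation. The toy model, two processes acquiring two locks in opposite order, and the names `explore`, `lock_successors` and `LOCK_ORDER` are hypothetical.

```python
from collections import deque

# Hypothetical toy model: process 0 takes lock "a" then "b"; process 1 takes "b" then "a".
# A state is (pc0, pc1, owner_a, owner_b); an owner is None or a process id.
LOCK_ORDER = {0: ("a", "b"), 1: ("b", "a")}


def lock_successors(state):
    pcs = list(state[:2])
    owners = {"a": state[2], "b": state[3]}
    for i in (0, 1):
        if pcs[i] < 2:
            lock = LOCK_ORDER[i][pcs[i]]
            if owners[lock] is None:  # acquire the next lock if it is free
                new_owners = dict(owners)
                new_owners[lock] = i
                new_pcs = list(pcs)
                new_pcs[i] += 1
                yield (new_pcs[0], new_pcs[1], new_owners["a"], new_owners["b"])
        else:  # holds both locks: release them and restart
            new_owners = {k: (None if v == i else v) for k, v in owners.items()}
            new_pcs = list(pcs)
            new_pcs[i] = 0
            yield (new_pcs[0], new_pcs[1], new_owners["a"], new_owners["b"])


def explore(initial, successors):
    """Explicit BFS: return (set of reachable states, list of dead states)."""
    seen = {initial}
    frontier = deque([initial])
    dead = []
    while frontier:
        s = frontier.popleft()
        succ = list(successors(s))
        if not succ:
            dead.append(s)  # no enabled transition: a dead state
        for t in succ:
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen, dead


if __name__ == "__main__":
    states, dead = explore((0, 0, None, None), lock_successors)
    print(len(states), "reachable states; dead:", dead)
```

    The `seen` set is exactly the structure that outgrows memory on realistic models; parallel explicit tools partition it across processors, while symbolic tools replace it with a decision diagram.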

    YADL: a general-purpose SDSM system

    Thesis digitized by the Direction des bibliothèques de l'Université de Montréal.

    ATCOM: Automatically tuned collective communication system for SMP clusters.

    Conventional implementations of collective communications are based on point-to-point communications, and their optimization has focused on the efficiency of those communication algorithms. However, point-to-point communications are not the optimal choice for modern clusters of SMPs because of their two-level communication structure. In recent years, a few research efforts have investigated efficient collective communications for SMP clusters. This dissertation focuses on platform-independent algorithms and implementations in this area.
    There are two main approaches to implementing efficient collective communications for clusters of SMPs: using shared-memory operations for intra-node communications, and overlapping inter-node and intra-node communications. The former fully exploits the hardware-based shared memory of an SMP, and the latter takes advantage of the inherent hierarchy of communications within a cluster of SMPs. Previous studies focused on SMP clusters from specific vendors, and the methods they proposed are not portable to other systems. Because performance optimization is very complicated and the development process is very time-consuming, self-tuning, platform-independent implementations are highly desirable. As shown in this dissertation, such an implementation can significantly outperform other portable point-to-point-based implementations and some platform-specific implementations.
    The dissertation describes the architecture of the platform-independent implementation in detail. It has four system components: shared-memory-based collective communications, overlapping mechanisms for inter-node and intra-node communications, a prediction-based tuning module, and a micro-benchmark-based tuning module. Each component is carefully designed with automatic tuning in mind.
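    The two-level structure can be illustrated with a broadcast schedule: one leader per node takes part in an inter-node binomial tree, then each leader hands the data to its local ranks through what would be a shared-memory copy. This is a sketch under stated assumptions, not ATCOM's algorithm: the block rank-to-node mapping, the function name `hierarchical_bcast_schedule`, and the `("inter" | "intra", src, dst)` schedule format are hypothetical, and the overlap of the two phases and the tuning modules are not modeled.

```python
def hierarchical_bcast_schedule(num_nodes, ranks_per_node, root=0):
    """Return an ordered list of (phase, src, dst) transfers for a two-level broadcast.

    Assumes block mapping: rank r lives on node r // ranks_per_node.
    """
    def node_of(rank):
        return rank // ranks_per_node

    leaders = [n * ranks_per_node for n in range(num_nodes)]
    root_node = node_of(root)
    leaders[root_node] = root  # the root acts as leader of its own node
    schedule = []

    # Phase 1: binomial-tree broadcast among node leaders (inter-node messages).
    order = [root] + [l for n, l in enumerate(leaders) if n != root_node]
    have = 1  # order[0:have] already hold the data
    while have < len(order):
        for i in range(min(have, len(order) - have)):
            schedule.append(("inter", order[i], order[i + have]))
        have *= 2

    # Phase 2: each leader copies to its local ranks (shared memory on a real SMP).
    for n, leader in enumerate(leaders):
        for r in range(n * ranks_per_node, (n + 1) * ranks_per_node):
            if r != leader:
                schedule.append(("intra", leader, r))
    return schedule


if __name__ == "__main__":
    sched = hierarchical_bcast_schedule(num_nodes=4, ranks_per_node=8, root=5)
    inter = sum(1 for phase, _, _ in sched if phase == "inter")
    print(inter, "inter-node messages,", len(sched) - inter, "intra-node copies")
```

    Only `num_nodes - 1` transfers cross the network; everything else stays inside a node, which is the property the shared-memory approach exploits.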

    Performance factors associated with SPMD (Factores de rendimiento asociados a SPMD)

    Many parallel/distributed applications currently use the SPMD paradigm. Achieving good performance in a parallel application of this type is a major challenge, given the large number of existing applications. This goal is not easy to attain, because there is a wide variety of hardware configurations, and both the nature of the problems and the ways of implementing them can vary. Consequently, if the software/hardware combination is not adequately considered, problems inherent to an iterative application without a defined control hierarchy can arise under this paradigm. In SPMD, all processes execute the same code but compute a different section of the input data. One solution to a potential performance problem is a load-balancing strategy that evens out the computation across processes. In this work we analyze the CG benchmark with heterogeneous loads in order to detect possible performance problems in a real application. One factor that determines performance in this application is the number of nonzero elements contained in the matrix section assigned to each process. We determine that it is possible to define a load-balancing strategy that can be implemented dynamically, and we demonstrate experimentally that application performance can be significantly improved with this strategy.
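    Since the abstract names the nonzero count per process as the driving factor, a minimal sketch of the balancing criterion is to split the rows of a CSR matrix into contiguous blocks with roughly equal nonzeros rather than equal row counts. This is a static split and is not the authors' dynamic strategy; the function names and the CSR `row_ptr` input are assumptions. A dynamic variant could rerun the same split with measured per-row costs in place of nonzero counts.

```python
import bisect


def balance_rows_by_nnz(row_ptr, num_procs):
    """Split rows [0, n) into num_procs contiguous (start, end) blocks.

    row_ptr is a CSR row-pointer array of length n + 1, so row_ptr[i] is the
    number of nonzeros in rows 0..i-1. Each cut is placed at the first row
    boundary whose prefix nonzero count reaches an equal share of the total.
    """
    n_rows = len(row_ptr) - 1
    total = row_ptr[-1]
    bounds = [0]
    for p in range(1, num_procs):
        target = total * p / num_procs
        # Search only boundaries >= the previous cut, so blocks stay ordered.
        bounds.append(bisect.bisect_left(row_ptr, target, bounds[-1], n_rows))
    bounds.append(n_rows)
    return list(zip(bounds[:-1], bounds[1:]))


def block_loads(row_ptr, blocks):
    """Nonzeros assigned to each block."""
    return [row_ptr[end] - row_ptr[start] for start, end in blocks]


if __name__ == "__main__":
    # One heavy row followed by nine light rows.
    row_ptr = [0, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
    blocks = balance_rows_by_nnz(row_ptr, 2)
    print(blocks, block_loads(row_ptr, blocks))
```

    With this input the nonzero-based split gives loads of 20 and 9, while an equal-row split gives 24 and 5, so the slowest process has less work per CG iteration.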

    Performance Optimization of a Structured CFD Code - GHOST on Commodity Cluster Architectures

    This thesis focuses on optimizing the performance of GHOST, an in-house, structured, 2D CFD code, on commodity cluster architectures. The basic philosophy of the work is to optimize the cache usage of the code through efficient coding techniques, without changing the underlying numerical algorithm. The optimization techniques implemented and the resulting changes in performance are presented. Two techniques implemented earlier to tune the performance of this code, external and internal blocking, are reviewed, followed by further tuning to circumvent the problems associated with these blocking techniques. To establish the generality of the optimization techniques, they are then tested on a more complicated test case. All the techniques presented in this thesis were tested on steady, laminar test cases. The results show that the optimized versions of the code achieve better performance on the variety of commodity cluster architectures chosen in this study.
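    Cache blocking in general means reordering loops so that a small tile of the grid is reused while it is still in cache, without changing the arithmetic. The abstract does not define GHOST's "external" and "internal" blocking, so the sketch below shows only generic loop tiling of a hypothetical 5-point Jacobi sweep; the function names and tile sizes `bi`, `bj` are assumptions. Python cannot show the cache benefit itself (that appears in compiled Fortran or C); it shows the restructuring and checks that results are unchanged.

```python
import random


def jacobi_sweep(u):
    """One 5-point Jacobi sweep on the interior of a 2D grid (list of lists)."""
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]  # boundary values are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
    return new


def jacobi_sweep_blocked(u, bi, bj):
    """Same sweep, traversed tile by tile (bi x bj) to improve cache reuse."""
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]
    for ii in range(1, n - 1, bi):          # tile origin, row direction
        for jj in range(1, m - 1, bj):      # tile origin, column direction
            for i in range(ii, min(ii + bi, n - 1)):
                for j in range(jj, min(jj + bj, m - 1)):
                    new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
    return new


if __name__ == "__main__":
    rng = random.Random(0)
    grid = [[rng.random() for _ in range(50)] for _ in range(40)]
    print(jacobi_sweep_blocked(grid, 8, 16) == jacobi_sweep(grid))
```

    Because a Jacobi sweep reads only the old grid, any tile order produces bit-identical results, which is why this kind of optimization can leave the numerical algorithm untouched.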

    CAM-robotics postprocessing for prototyping and machining in complex robotic cells (Postprocesamiento CAM-ROBOTICA orientado al prototipado y mecanizado en células robotizadas complejas)

    The main goal of this thesis is the study and implementation of postprocessors that adapt the toolpath generated by a Computer Aided Manufacturing (CAM) system to a complex eight-joint robotic workcell devoted to the rapid prototyping of 3D CAD-defined products. The workcell consists of a 6R industrial manipulator mounted on a linear track and synchronized with a rotary table. Achieving this main objective requires preliminary work. Each task carried out has its own methodology, objective and partial results, which complement one another:
    - The architecture of the workcell is described in depth, at both the displacement and joint-rate levels, for both direct and inverse resolution. The conditioning of the Jacobian matrix is used as a kinetostatic performance index to evaluate proximity to singular postures, which are analysed from a geometric point of view.
    - Before any machining, the additional external joints require in-situ calibration, usually in an industrial environment. A novel Non-contact Planar Constraint Calibration method is developed to estimate the configuration parameters of the external joints by means of a laser displacement sensor.
    - A first, original control is implemented by means of a fuzzy inference engine at the displacement level, integrated within the postprocessor of the CAM software.
    - Several Redundancy Resolution Schemes (RRS) at the joint-rate level are compared for configuring the postprocessor, dealing not only with the additional joints (intrinsic redundancy) but also with the redundancy due to the symmetry of the milling tool (functional redundancy).
    - The use of these schemes is optimized by adjusting two performance criterion vectors, related to singularity avoidance and to maintaining a preferred reference posture, as secondary tasks performed during path tracking. Two novel fuzzy inference engines actively adjust the weight of each joint in these tasks.
    Andrés De La Esperanza, FJ. (2011). Postprocesamiento CAM-ROBOTICA orientado al prototipado y mecanizado en células robotizadas complejas [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/10627
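    The Jacobian-conditioning index mentioned above is the ratio of the largest to the smallest singular value of the Jacobian, which grows without bound as the arm approaches a singular posture. The eight-joint workcell would need a full SVD; to keep the sketch self-contained it uses a hypothetical planar 2R arm (unit link lengths by default), for which the 2x2 condition number has a closed form. The function names are illustrative and do not come from the thesis.

```python
import math


def jacobian_2r(theta1, theta2, l1=1.0, l2=1.0):
    """Jacobian of the end-effector position (x, y) of a planar 2R arm."""
    s1, c1 = math.sin(theta1), math.cos(theta1)
    s12, c12 = math.sin(theta1 + theta2), math.cos(theta1 + theta2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def condition_number(j, eps=1e-12):
    """2-norm condition number sigma_max / sigma_min of a 2x2 matrix."""
    (a, b), (c, d) = j
    det = a * d - b * c
    if abs(det) < eps:
        return math.inf  # singular posture: sigma_min is (numerically) zero
    # Entries of the symmetric matrix J J^T, whose eigenvalues are sigma^2.
    p = a * a + b * b
    q = a * c + b * d
    r = c * c + d * d
    lam_max = (p + r + math.sqrt((p - r) ** 2 + 4 * q * q)) / 2
    # sigma_max * sigma_min = |det J|, so sigma_max / sigma_min = lam_max / |det J|.
    return lam_max / abs(det)


if __name__ == "__main__":
    for theta2 in (math.pi / 2, 0.5, 0.05, 0.0):
        print(f"theta2={theta2:.3f}  kappa={condition_number(jacobian_2r(0.3, theta2)):.2f}")
```

    For this arm the determinant is proportional to sin(theta2), so the index diverges when the elbow is fully stretched or folded; a redundancy resolution scheme can use such an index as a secondary objective that keeps the arm away from those postures.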

    An analysis of key generation efficiency of RSA cryptosystem in distributed environments

    Thesis (Master), Izmir Institute of Technology, Computer Engineering, Izmir, 2005. Includes bibliographical references (leaves: 68). Text in English; abstract in Turkish and English. ix, 74 leaves.
    As the volume of communication over networks, and especially over the Internet, grew, so did the need to secure these connections. Symmetric and asymmetric cryptosystems formed a good complementary approach to providing this security. Asymmetric cryptosystems were an excellent solution for distributing the keys used by the communicating parties, but they were very slow for the actual encryption and decryption of the data flowing between them. Symmetric cryptosystems therefore filled this gap and were used for encryption and decryption once the session keys had been exchanged securely. Parallelism is an active research topic in many fields and is used to tackle problems whose solutions take a considerable amount of time. Cryptography is no exception: computer scientists have found that parallelism can make the algorithms of asymmetric cryptosystems run faster, and experimental results have been promising so far. This thesis is based on the parallelization of a well-known public-key algorithm, RSA.
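    Most of the cost of RSA key generation is the search for large primes, which parallelizes naturally: each worker tests its own stripe of odd candidates. The sketch below uses Miller-Rabin testing and interleaved stripes; taking the minimum over the workers' first hits gives the same prime a sequential scan would find. It is an illustration, not the thesis's implementation: all names are hypothetical, `random` is not a cryptographically secure generator, and a thread pool does not speed up CPU-bound work in CPython (a `ProcessPoolExecutor` can be swapped in, which is why `functools.partial` is used instead of a lambda).

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor
from functools import partial

SMALL_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]


def is_probable_prime(n, rounds=20):
    """Miller-Rabin test with fixed-seed bases (reproducible, not secure)."""
    if n < 2:
        return False
    for p in SMALL_PRIMES:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:  # write n - 1 = d * 2^s with d odd
        d //= 2
        s += 1
    rng = random.Random(0)
    for _ in range(rounds):
        x = pow(rng.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # this base witnesses that n is composite
    return True


def search_stripe(start, worker, workers, limit):
    """Test odd candidates start + 2*(worker + k*workers) for k < limit."""
    for k in range(limit):
        n = start + 2 * (worker + k * workers)
        if is_probable_prime(n):
            return n
    return None


def find_prime(bits, workers=4, seed=None, limit=10_000):
    rng = random.Random(seed)
    start = rng.getrandbits(bits) | (1 << (bits - 1)) | 1  # top bit set, odd
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = list(pool.map(partial(search_stripe, start, workers=workers, limit=limit),
                             range(workers)))
    found = [p for p in hits if p is not None]
    if not found:
        raise RuntimeError("no prime found; increase limit")
    return min(found)  # smallest prime >= start (may exceed `bits` by one bit)


def generate_rsa_keypair(bits=512, e=65537, workers=4, seed=None):
    while True:
        p = find_prime(bits // 2, workers, seed)
        q = find_prime(bits // 2, workers, None if seed is None else seed + 1)
        phi = (p - 1) * (q - 1)
        if p != q and math.gcd(e, phi) == 1:
            break
        seed = None if seed is None else seed + 2
    d = pow(e, -1, phi)  # modular inverse, Python 3.8+
    return (p * q, e), (p * q, d)
```

    A quick round trip checks the keys: with `(n, e), (_, d) = generate_rsa_keypair(256, seed=1)`, `pow(pow(42, e, n), d, n)` returns 42.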