27 research outputs found

    Scalable parallel communications

    Get PDF
    Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulations studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCP's running in parallel provide high bandwidth service to a single application); and (3) coarse grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism) also with near linear speed-ups

    Multilevel Parallel Communications

    Get PDF
    The research reported in this thesis investigates the use of parallelism at multiple levels to realize high-speed networks that offer advantages in throughput, cost, reliability, and flexibility over alternative approaches. This research specifically considers use of parallelism at two levels: the upper level and the lower level. At the upper level, N protocol processors perform functions included in the transport and network layers. At the lower level, M channels provide data and physical layer functions. The resulting system provides very high bandwidth to an application. A key concept of this research is the use of replicated channels to provide a single, high bandwidth channel to a single application. The parallelism provided by the network is transparent to communicating applications, thus differentiating this strategy from schemes that provide a collection of disjoint channels between applications on different nodes. Another innovative aspect of this research is that parallelism is exploited at multiple layers of the network to provide high throughput not only at the physical layer, but also at upper protocol layers. Schedulers are used to distribute data from a single stream to multiple channels and to merge data from multiple channels to reconstruct a single coherent stream. High throughput is possible by providing the combined bandwidth of multiple channels to a single source and destination through use of parallelism at multiple protocol layers. This strategy is cost effective since systems can be built using standard technologies that benefit from the economies of a broad applications base. The exotic and revolutionary components needed in non-parallel approaches to build high speed networks are not required. The replicated channels can be used to achieve high reliability as well. Multilevel parallelism is flexible since the degree of parallelism provided at any level can be matched to protocol processing demands and application requirements

    Scheduling soft real-time jobs over dual non-real-time servers

    Get PDF
    In this paper, we consider soft real-time systems with redundant off-the-shelf processing components (e.g., CPU, disk, network), and show how applications can exploit the redundancy to improve the system's ability of meeting response time goals (soft deadlines). We consider two scheduling policies, one that evenly distributes load (Balance), and one that partitions load according to job slackness (Chop). We evaluate the effectiveness of these policies through analysis and simulation. Our results show that by intelligently distributing jobs by their slackness amount the servers, Chop can significantly improve real-time performance. ©1996 IEEE.published_or_final_versio

    Overview on: sequencing in mixed model flowshop production line with static and dynamic context

    Get PDF
    In the present work a literature overview was given on solution techniques considering basic as well as more advanced and consequently more complex arrangements of mixed model flowshops. We first analyzed the occurrence of setup time/cost; existing solution techniques are mainly focused on permutation sequences. Thereafter we discussed objectives resulting in the introduction of variety of methods allowing resequencing of jobs within the line. The possibility of resequencing within the line ranges from 1) offline or intermittent buffers, 2) parallel stations, namely flexible, hybrid or compound flowshops, 3) merging and splitting of parallel lines, 4) re-entrant flowshops, to 5) change job attributes without physically interchanging the position. In continuation the differences in the consideration of static and dynamic demand was studied. Also intermittent setups are possible, depending on the horizon and including the possibility of resequencing, four problem cases were highlighted: static, semi dynamic, nearly dynamic and dynamic case. Finally a general overview was given on existing solution methods, including exact and approximation methods. The approximation methods are furthermore divided in two cases, know as heuristics and methaheuristic

    Resource Management in Multimedia Networked Systems

    Get PDF
    Error-free multimedia data processing and communication includes providing guaranteed services such as the colloquial telephone. A set of problems have to be solved and handled in the control-management level of the host and underlying network architectures. We discuss in this paper \u27resource management\u27 at the host and network level, and their cooperation to achieve global guaranteed transmission and presentation services, which means end-to-end guarantees. The emphasize is on \u27network resources\u27 (e.g., bandwidth, buffer space) and \u27host resources\u27 (e.g., CPU processing time) which need to be controlled in order to satisfy the Quality of Service (QoS) requirements set by the users of the multimedia networked system. The control of the specified resources involves three actions: (1) properly allocate resources (end-to-end) during the multimedia call establishment, so that traffic can flow according to the QoS specification; (2) control resource allocation during the multimedia transmission; (3) adapt to changes when degradation of system components occurs. These actions imply the necessity of: (a) new services, such as admission services, at the hosts and intermediate network nodes; (b) new protocols for establishing connections which satisfy QoS requirements along the path from send to receiver(s), such as resource reservation protocol; (c) new control algorithms for delay, rate and error control; (d) new resource monitoring protocols for reporting system changes, such as resource administration protocol; (e) new adaptive schemes for dynamic resource allocation to respond to system changes; and (f) new architectures at the hosts and switches to accommodate the resource management entities. This article gives an overview of services, mechanisms and protocols for resource management as outlined above

    Accuracy and Dynamics of Hash-Based Load Balancing Algorithms for Multipath Internet Routing

    Get PDF
    This paper studies load balancing for multipath Internet routing. We focus on hash-based load balancing algorithms that work on the flow level to avoid packet reordering which is detrimental for the throughput of transport layer protocols like TCP. We propose a classification of hash-based load balancing algorithms, review existing ones and suggest new ones. Dynamic algorithms can actively react to load imbalances which causes route changes for some flows and thereby again packet reordering. Therefore, we investigate the load balancing accuracy and flow reassignment rate of load balancing algorithms. Our exhaustive simulation experiments show that these performance measures depend significantly on the traffic properties and on the algorithms themselves. As a consequence, our results should be taken into account for the application of load balancing in practice

    Simulation and analysis of adaptive routing and flow control in wide area communication networks

    Get PDF
    This thesis presents the development of new simulation and analytic models for the performance analysis of wide area communication networks. The models are used to analyse adaptive routing and flow control in fully connected circuit switched and sparsely connected packet switched networks. In particular the performance of routing algorithms derived from the L(_R-I) linear learning automata model are assessed for both types of network. A novel architecture using the INMOS Transputer is constructed for simulation of both circuit and packet switched networks in a loosely coupled multi- microprocessor environment. The network topology is mapped onto an identically configured array of processing centres to overcome the processing bottleneck of conventional Von Neumann architecture machines. Previous analytic work in circuit switched work is extended to include both asymmetrical networks and adaptive routing policies. In the analysis of packet switched networks analytic models of adaptive routing and flow control are integrated to produce a powerful, integrated environment for performance analysis The work concludes that routing algorithms based on linear learning automata have significant potential in both fully connected circuit switched networks and sparsely connected packet switched networks

    FPGA acceleration of sequence analysis tools in bioinformatics

    Full text link
    Thesis (Ph.D.)--Boston UniversityWith advances in biotechnology and computing power, biological data are being produced at an exceptional rate. The purpose of this study is to analyze the application of FPGAs to accelerate high impact production biosequence analysis tools. Compared with other alternatives, FPGAs offer huge compute power, lower power consumption, and reasonable flexibility. BLAST has become the de facto standard in bioinformatic approximate string matching and so its acceleration is of fundamental importance. It is a complex highly-optimized system, consisting of tens of thousands of lines of code and a large number of heuristics. Our idea is to emulate the main phases of its algorithm on FPGA. Utilizing our FPGA engine, we quickly reduce the size of the database to a small fraction, and then use the original code to process the query. Using a standard FPGA-based system, we achieved 12x speedup over a highly optimized multithread reference code. Multiple Sequence Alignment (MSA)--the extension of pairwise Sequence Alignment to multiple Sequences--is critical to solve many biological problems. Previous attempts to accelerate Clustal-W, the most commonly used MSA code, have directly mapped a portion of the code to the FPGA. We use a new approach: we apply prefiltering of the kind commonly used in BLAST to perform the initial all-pairs alignments. This results in a speedup of from 8Ox to 190x over the CPU code (8 cores). The quality is comparable to the original according to a commonly used benchmark suite evaluated with respect to multiple distance metrics. The challenge in FPGA-based acceleration is finding a suitable application mapping. Unfortunately many software heuristics do not fall into this category and so other methods must be applied. One is restructuring: an entirely new algorithm is applied. Another is to analyze application utilization and develop accuracy/performance tradeoffs. Using our prefiltering approach and novel FPGA programming models we have achieved significant speedup over reference programs. We have applied approximation, seeding, and filtering to this end. The bulk of this study is to introduce the pros and cons of these acceleration models for biosequence analysis tools

    Multistage Packet-Switching Fabrics for Data Center Networks

    Get PDF
    Recent applications have imposed stringent requirements within the Data Center Network (DCN) switches in terms of scalability, throughput and latency. In this thesis, the architectural design of the packet-switches is tackled in different ways to enable the expansion in both the number of connected endpoints and traffic volume. A cost-effective Clos-network switch with partially buffered units is proposed and two packet scheduling algorithms are described. The first algorithm adopts many simple and distributed arbiters, while the second approach relies on a central arbiter to guarantee an ordered packet delivery. For an improved scalability, the Clos switch is build using a Network-on-Chip (NoC) fabric instead of the common crossbar units. The Clos-UDN architecture made with Input-Queued (IQ) Uni-Directional NoC modules (UDNs) simplifies the input line cards and obviates the need for the costly Virtual Output Queues (VOQs). It also avoids the need for complex, and synchronized scheduling processes, and offers speedup, load balancing, and good path diversity. Under skewed traffic, a reliable micro load-balancing contributes to boosting the overall network performance. Taking advantage of the NoC paradigm, a wrapped-around multistage switch with fully interconnected Central Modules (CMs) is proposed. The architecture operates with a congestion-aware routing algorithm that proactively distributes the traffic load across the switching modules, and enhances the switch performance under critical packet arrivals. The implementation of small on-chip buffers has been made perfectly feasible using the current technology. This motivated the implementation of a large switching architecture with an Output-Queued (OQ) NoC fabric. The design merges assets of the output queuing, and NoCs to provide high throughput, and smooth latency variations. An approximate analytical model of the switch performance is also proposed. To further exploit the potential of the NoC fabrics and their modularity features, a high capacity Clos switch with Multi-Directional NoC (MDN) modules is presented. The Clos-MDN switching architecture exhibits a more compact layout than the Clos-UDN switch. It scales better and faster in port count and traffic load. Results achieved in this thesis demonstrate the high performance, expandability and programmability features of the proposed packet-switches which makes them promising candidates for the next-generation data center networking infrastructure

    Multistage Packet-Switching Fabrics for Data Center Networks

    Get PDF
    Recent applications have imposed stringent requirements within the Data Center Network (DCN) switches in terms of scalability, throughput and latency. In this thesis, the architectural design of the packet-switches is tackled in different ways to enable the expansion in both the number of connected endpoints and traffic volume. A cost-effective Clos-network switch with partially buffered units is proposed and two packet scheduling algorithms are described. The first algorithm adopts many simple and distributed arbiters, while the second approach relies on a central arbiter to guarantee an ordered packet delivery. For an improved scalability, the Clos switch is build using a Network-on-Chip (NoC) fabric instead of the common crossbar units. The Clos-UDN architecture made with Input-Queued (IQ) Uni-Directional NoC modules (UDNs) simplifies the input line cards and obviates the need for the costly Virtual Output Queues (VOQs). It also avoids the need for complex, and synchronized scheduling processes, and offers speedup, load balancing, and good path diversity. Under skewed traffic, a reliable micro load-balancing contributes to boosting the overall network performance. Taking advantage of the NoC paradigm, a wrapped-around multistage switch with fully interconnected Central Modules (CMs) is proposed. The architecture operates with a congestion-aware routing algorithm that proactively distributes the traffic load across the switching modules, and enhances the switch performance under critical packet arrivals. The implementation of small on-chip buffers has been made perfectly feasible using the current technology. This motivated the implementation of a large switching architecture with an Output-Queued (OQ) NoC fabric. The design merges assets of the output queuing, and NoCs to provide high throughput, and smooth latency variations. An approximate analytical model of the switch performance is also proposed. To further exploit the potential of the NoC fabrics and their modularity features, a high capacity Clos switch with Multi-Directional NoC (MDN) modules is presented. The Clos-MDN switching architecture exhibits a more compact layout than the Clos-UDN switch. It scales better and faster in port count and traffic load. Results achieved in this thesis demonstrate the high performance, expandability and programmability features of the proposed packet-switches which makes them promising candidates for the next-generation data center networking infrastructure
    corecore