
    Novel techniques in large scaleable ATM switches

    Bibliography: p. 172-178. This dissertation explores the research area of large-scale ATM switches. The requirements for an ATM switch are determined through an overview of the ATM network architecture. These requirements lead to the discussion of an abstract ATM switch, which distinguishes the components that scale automatically with increasing switch size (the Input Modules and Output Modules) from those that do not (the Connection Admission Control and Switch Management systems, as well as the Cell Switch Fabric). An architecture is suggested that may yield scalable Switch Management and Connection Admission Control functions. However, the main thrust of the dissertation is confined to the cell switch fabric. The fundamental mathematical limits of ATM switches and of buffer placement are presented next, emphasising the desirability of output buffering. This is followed by an overview of the possible routing strategies in a multi-stage interconnection network. A variety of space division switches are then considered, which leads to a discussion of the hypercube fabric, a novel switching technique. The hypercube fabric achieves good performance with O(N(log₂N)²) scaling. The output module, resequencing, cell scheduling and output buffering technique are then presented, leading to a complete description of the proposed ATM switch. Various traffic models are used to quantify the switch's performance: a simple exponential inter-arrival time model, a locality of reference model, and a self-similar, bursty, multiplexed Variable Bit Rate (VBR) model. FIFO queueing is simple to implement in an ATM switch; however, more responsive queueing strategies can improve performance. An associative memory is presented which allows the separate queues in the ATM switch to be logically combined into a single FIFO queue.
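The abstract does not give the memory's internal organisation here, so the following is only a hypothetical software model of the idea, not the thesis's circuit. Each row holds a cell's routing tag, a priority and an arrival sequence number; a read for output port p matches every row against p (a real content-addressable memory compares all rows at once; a linear scan stands in for that here) and returns the highest-priority, oldest match. The class and field names are invented for illustration; the default of 32 rows mirrors the evaluated version, but the 8-bit priority and 10-bit routing widths are not enforced.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    route: int      # destination output port (routing tag)
    priority: int   # larger value = more urgent (assumed convention)
    seq: int        # arrival order, keeps FIFO order within a priority
    payload: str


class AssociativeQueue:
    """Hypothetical model: one shared buffer searched by content, so a
    single physical memory behaves like a set of per-output FIFO queues."""

    def __init__(self, rows: int = 32):
        self.capacity = rows
        self.rows: List[Row] = []
        self._next_seq = 0

    def write(self, route: int, priority: int, payload: str) -> bool:
        if len(self.rows) >= self.capacity:
            return False  # buffer full: the cell is dropped
        self.rows.append(Row(route, priority, self._next_seq, payload))
        self._next_seq += 1
        return True

    def read(self, route: int) -> Optional[str]:
        # Stand-in for a parallel associative match on the routing tag.
        matches = [r for r in self.rows if r.route == route]
        if not matches:
            return None
        # Highest priority first; among equals, the lowest sequence number.
        best = max(matches, key=lambda r: (r.priority, -r.seq))
        self.rows.remove(best)
        return best.payload
```

Because a read selects by content rather than by position, a cell waiting for a busy port never blocks a later cell bound for an idle one; for example, after writing cells for ports 3 and 5, `read(5)` returns the port-5 cell even though a port-3 cell arrived first.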
The associative memory is described in detail, and its feasibility is shown by laying out the integrated circuit masks and performing an analogue simulation of the IC's performance in SPICE3. Although the original design required optimisation, the feasibility of the approach is demonstrated with a 15 ns write time and a 160 ns read time for a 32-row, 8-priority-bit, 10-routing-bit version of the memory. This is achieved with 2 µm technology; more advanced technologies may yield even better performance. The various traffic models and switch models are simulated in a number of runs. The results show that the hypercube outperforms a Clos network of equivalent technology and approaches the performance of an ideal reference fabric. The associative memory yields a significant performance advantage in the hypercube network and a modest advantage in the Clos network. The performance of the switches degrades with increasing traffic density, increasing locality of reference, increasing variance in the cell rate and increasing burst length. Interestingly, the fabrics show no real degradation in response to increasing self-similarity in the traffic. The appendices present suggestions on how redundancy, reliability and multicasting can be achieved in the hypercube fabric, provide an overview of integrated circuits, and give a brief description of commercial ATM switching products. Lastly, they provide a road map to the simulation code in the form of descriptions of the functionality found in each file within the source tree. This is intended as a starting point for anyone wishing to modify or extend the simulation system developed for this thesis.
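The abstract does not say which routing strategy the hypercube fabric uses. For orientation only, this sketch shows the textbook dimension-order ("e-cube") route in an n-dimensional hypercube, an assumption rather than the thesis's method: node addresses are n-bit integers, and each hop flips the lowest-order bit in which the current and destination addresses still differ. The hop count equals the Hamming distance between the addresses, so no route is longer than log₂N hops.

```python
from typing import List


def ecube_route(src: int, dst: int, dims: int) -> List[int]:
    """Node sequence from src to dst in a 2**dims-node hypercube,
    using dimension-order (e-cube) routing, lowest dimension first."""
    n_nodes = 1 << dims
    if not (0 <= src < n_nodes and 0 <= dst < n_nodes):
        raise ValueError("address out of range")
    path = [src]
    node = src
    for bit in range(dims):
        mask = 1 << bit
        if (node ^ dst) & mask:   # addresses differ in this dimension
            node ^= mask          # cross the link along that dimension
            path.append(node)
    return path


# Example: 000 -> 101 in a 3-cube goes 000 -> 001 -> 101.
print(ecube_route(0b000, 0b101, 3))  # [0, 1, 5]
```

Fixing the order in which dimensions are corrected makes the route deterministic; adaptive schemes instead choose among the differing dimensions at each hop to avoid congested links.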

    Approaches to parallel performance prediction


    Path switching over multirate Benes network.

    Mui Sze Wai. Thesis (M.Phil.), Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 62-65). Abstracts in English and Chinese.
    Chapter 1. Introduction
      1.1 Evolution of Multirate Networks
      1.2 Some Results from Previous Work
      1.3 Multirate Traffic on Benes Network
      1.4 Organization
    Chapter 2. Background Knowledge on Benes Network and Path Switching
      2.1 Benes Network
        2.1.1 Construction of Large Switching Fabrics
        2.1.2 Routing in Benes Network
        2.1.3 Performance when Operated as a Large Switch Fabric
      2.2 Path Switching
        2.2.1 Basic Concept of Path Switching
        2.2.2 Capacity Allocation and Route Assignment
    Chapter 3. Path Switching over Benes Network
      3.1 The Model of path-switched Benes Network
      3.2 Module-to-Module Implementation
        3.2.1 The First Stage (Input Module)
        3.2.2 The Middle Stage (Central Module)
        3.2.3 The Last Stage (Output Module)
      3.3 Port-to-Port Implementation
        3.3.1 Uniform Traffic
        3.3.2 Multirate Traffic
      3.4 Closing remarks
    Chapter 4. Performance Analysis
      4.1 Traffic Constraints and Performance Guarantees
        4.1.1 Arrival Curve and Service Curve
        4.1.2 Delay Bound and Backlog Bound
      4.2 Service Guarantees
      4.3 Deterministic Bounds
        4.3.1 Delay
        4.3.2 Backlog at Input Module
        4.3.3 Backlog at Output Module
    Chapter 5. Simulation Results
      5.1 Uniform Traffic
      5.2 Multirate Traffic
    Chapter 6. Conclusions and Future Research
      6.1 Suggestions for future research
    Bibliography
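Chapter 4 of the outline covers arrival curves, service curves, and delay and backlog bounds. The table of contents does not say which curves the thesis uses. Assuming the common textbook pairing of a token-bucket arrival curve α(t) = σ + ρt and a rate-latency service curve β(t) = R·max(0, t − T), the standard network-calculus results are a delay bound D = T + σ/R (the maximum horizontal distance between the curves) and a backlog bound B = σ + ρT (the maximum vertical distance), both valid when ρ ≤ R. The sketch below simply evaluates those formulas; the class names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenBucket:
    sigma: float  # burst tolerance, e.g. in cells
    rho: float    # sustained arrival rate, cells per time unit


@dataclass(frozen=True)
class RateLatency:
    rate: float     # R: guaranteed service rate
    latency: float  # T: worst-case latency before service starts


def _check_stable(a: TokenBucket, s: RateLatency) -> None:
    if a.rho > s.rate:
        raise ValueError("unbounded: arrival rate exceeds service rate")


def delay_bound(a: TokenBucket, s: RateLatency) -> float:
    """Maximum horizontal deviation between arrival and service curves."""
    _check_stable(a, s)
    return s.latency + a.sigma / s.rate


def backlog_bound(a: TokenBucket, s: RateLatency) -> float:
    """Maximum vertical deviation between arrival and service curves."""
    _check_stable(a, s)
    return a.sigma + a.rho * s.latency


# Example: sigma = 10, rho = 2, R = 5, T = 3 gives D = 3 + 10/5 = 5
# and B = 10 + 2*3 = 16.
print(delay_bound(TokenBucket(10, 2), RateLatency(5, 3)))    # 5.0
print(backlog_bound(TokenBucket(10, 2), RateLatency(5, 3)))  # 16
```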

    A Framework for File Format Fuzzing with Genetic Algorithms

    Secure software, meaning software free from vulnerabilities, is desirable in today's marketplace. Consumers are beginning to value a product's security posture as well as its functionality. Software development companies are recognizing this trend, and they are factoring security into their entire software development lifecycle. Secure development practices like threat modeling, static analysis, safe programming libraries, run-time protections, and software verification are being mandated during product development. Mandating these practices improves a product's security posture before customer delivery, and these practices increase the difficulty of discovering and exploiting vulnerabilities. Since the 1980s, security researchers have uncovered software defects by fuzz testing applications. In fuzz testing's infancy, randomly generated data could discover multiple defects quickly. However, as software matures and software development companies integrate secure development practices into their development life cycles, fuzzers must apply more sophisticated techniques to retain their ability to uncover defects. Fuzz testing must evolve, and fuzz testing practitioners must devise new algorithms to exercise an application in unexpected ways. This dissertation's objective is to create a proof-of-concept genetic algorithm fuzz testing framework to exercise an application's file format parsing routines. The framework includes multiple genetic algorithm variations, provides a configuration scheme, and correlates data gathered from static and dynamic analysis to guide negative test case evolution. Experiments conducted for this dissertation illustrate the effectiveness of a genetic algorithm fuzzer in comparison to standard fuzz testing tools. The experiments showcase a genetic algorithm fuzzer's ability to discover multiple unique defects within a limited number of negative test cases.
These experiments also highlight an application's increased execution time when fuzzed with a genetic algorithm. To combat the increased execution time, a distributed architecture is implemented, and additional experiments demonstrate a decrease in execution time that makes it comparable to standard fuzz testing tools. A final set of experiments provides guidance on fitness function selection for a CHC genetic algorithm fuzzer with different population size configurations.
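The abstract does not describe the framework's genome encoding, operators or fitness functions. As a hypothetical illustration of the general technique only, the sketch below evolves byte strings against a toy parser and scores each input by how many distinct branches it reaches. The target, its branch labels and all parameters are invented; a real harness would obtain coverage from static and dynamic instrumentation. This is a plain elitist generational GA, not the CHC variant the abstract mentions.

```python
import random
from typing import List, Set


def toy_parser(data: bytes) -> Set[str]:
    """Hypothetical target: returns the branch labels an input exercises."""
    hit = {"entry"}
    if len(data) >= 4 and data[:2] == b"FZ":
        hit.add("magic")
        version = data[2]
        hit.add("v1" if version == 1 else "v_other")
        if version == 1 and data[3] > 200:
            hit.add("big_len")  # a rarely reached path, where defects often hide
    return hit


def fitness(data: bytes) -> int:
    return len(toy_parser(data))  # coverage-style fitness


def mutate(data: bytes, rng: random.Random, rate: float = 0.1) -> bytes:
    out = bytearray(data)
    for i in range(len(out)):
        if rng.random() < rate:
            out[i] = rng.randrange(256)  # random byte replacement
    return bytes(out)


def crossover(a: bytes, b: bytes, rng: random.Random) -> bytes:
    cut = rng.randrange(1, min(len(a), len(b)))  # one-point crossover
    return a[:cut] + b[cut:]


def evolve(seed: bytes, pop_size: int = 40, generations: int = 200,
           rng_seed: int = 1) -> bytes:
    rng = random.Random(rng_seed)
    # Start from the seed file plus heavily mutated variants of it.
    pop: List[bytes] = [seed] + [mutate(seed, rng, 0.5)
                                 for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 4]  # keep the fittest quarter unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = elite + children
    return max(pop, key=fitness)
```

Because the elite survive each generation unchanged, the best fitness in the population can never decrease; the cost is that every candidate must be executed against the target, which is one source of the increased execution time the abstract reports.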

    OMICRON : a parallel computer architecture for declarative languages

    Imperial Users only

    Research in progress in applied mathematics, numerical analysis, fluid mechanics, and computer science

    This report summarizes research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, fluid mechanics, and computer science during the period October 1, 1993 through March 31, 1994. The major categories of the current ICASE research program are: (1) applied and numerical mathematics, including numerical analysis and algorithm development; (2) theoretical and computational research in fluid mechanics in selected areas of interest to LaRC, including acoustics and combustion; (3) experimental research in transition and turbulence and aerodynamics involving LaRC facilities and scientists; and (4) computer science.

    Optical flow switched networks

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Includes bibliographical references (p. 253-279). In the four decades since optical fiber was introduced as a communications medium, optical networking has revolutionized the telecommunications landscape. It has enabled the Internet as we know it today and is central to the realization of Network-Centric Warfare in the defense world. Sustained exponential growth in communications bandwidth demand, however, requires continued innovation in optical networking to ensure cost-effective communications in the future. In this thesis, we present Optical Flow Switching (OFS) as a key enabler of scalable future optical networks. The general idea behind OFS (agile, end-to-end, all-optical connections) is decades old, perhaps as old as the field of optical networking itself. However, in the absence of an application for it, OFS remained an underdeveloped idea, with little understanding of how it could be implemented, how well it would perform, and how much it would cost relative to other architectures. This thesis contributes partial answers to these three broad questions. With respect to implementation, we address the physical layer design of OFS in metro-area and access networks, and develop sensible scheduling algorithms for OFS communication. Our performance study comprises a comparative capacity analysis for the wide area, as well as an analytical approximation of the throughput-delay tradeoff offered by OFS for inter-MAN communication. Lastly, with regard to the economics of OFS, we employ an approximate capital expenditure model, which enables a throughput-cost comparison of OFS with other prominent candidate architectures. Our conclusions indicate that OFS offers a significant advantage over other architectures in economic scalability.
In particular, for sufficiently heavy traffic, OFS handles large transactions at far lower cost than other optical network architectures. In light of the increasing importance of large transactions in both commercial and defense networks, we conclude that OFS may be crucial to the future viability of optical networking. By Guy E. Weichenberg. Ph.D.
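The abstract does not detail the thesis's scheduling algorithms. The following is a minimal, hypothetical sketch of one simple policy in the spirit of flow switching: each large transfer is sent whole over a dedicated end-to-end lightpath, requests are served first-come first-served, and each is given the wavelength that becomes free earliest. The function name, wavelength count, sizes and line rate are illustrative assumptions; a real OFS scheduler must also handle wavelength continuity across links and signalling delay.

```python
import heapq
from typing import List, Tuple


def schedule_flows(requests: List[Tuple[float, float]], wavelengths: int,
                   line_rate: float) -> List[Tuple[int, float, float]]:
    """requests: (arrival_time, size) pairs, served in arrival order.
    Returns (wavelength, start_time, queueing_delay) per request,
    in arrival order."""
    # Min-heap of (time the wavelength becomes free, wavelength index).
    free_at = [(0.0, w) for w in range(wavelengths)]
    heapq.heapify(free_at)
    schedule = []
    for arrival, size in sorted(requests):
        t_free, w = heapq.heappop(free_at)       # earliest-free wavelength
        start = max(arrival, t_free)
        finish = start + size / line_rate        # whole flow on one lightpath
        heapq.heappush(free_at, (finish, w))
        schedule.append((w, start, start - arrival))
    return schedule


# Two 10-unit flows occupy both wavelengths at t = 0, so a small flow
# arriving at t = 1 waits until t = 10.
print(schedule_flows([(0.0, 10.0), (0.0, 10.0), (1.0, 5.0)], 2, 1.0))
```

Even this toy exposes the throughput-delay tension the abstract mentions: keeping wavelengths fully booked maximises throughput, but new transfers then queue for a lightpath.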

    The fast multipole method at exascale

    This thesis presents a top-to-bottom analysis of designing and implementing fast algorithms for current and future systems. We present new analysis, algorithmic techniques, and implementations of the Fast Multipole Method (FMM) for solving N-body problems. We target the FMM because it is broadly applicable to a variety of scientific particle simulations used to study electromagnetic, fluid, and gravitational phenomena, among others. Importantly, the FMM has asymptotically optimal time complexity with guaranteed approximation accuracy. As such, it is among the most attractive solutions for scalable particle simulation on future extreme-scale systems. We specifically address two key challenges. The first challenge is how to engineer fast code for today's platforms. We present the first in-depth study of multicore optimizations and tuning for the FMM, along with a systematic approach for transforming a conventionally parallelized FMM into a highly tuned one. We introduce novel optimizations that significantly improve the within-node scalability of the FMM, thereby enabling high performance on multicore and manycore systems. The second challenge is how to understand scalability on future systems. We present a new algorithmic complexity analysis of the FMM that considers both intra- and inter-node communication costs. Using these models, we present results for choosing the optimal algorithmic tuning parameter. This analysis also yields the surprising prediction that although the FMM is largely compute-bound today, and therefore highly scalable on current systems, the trajectory of processor architecture designs, absent significant changes, could cause it to become communication-bound as early as the year 2015. This prediction suggests the utility of our analysis approach, which directly relates algorithmic and architectural characteristics, for enabling a new kind of high-level algorithm-architecture co-design.
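A complete FMM (spatial tree, upward and downward passes, translation operators) does not fit in a short example, so this sketch shows only its basic building block, using the classic 2-D complex-variable formulation of Greengard and Rokhlin rather than anything specific to this thesis. Sources q_i at complex positions z_i inside a disk centred at z_c are compressed into p coefficients, and the potential φ(z) = Σ q_i log(z − z_i) at a well-separated target is evaluated as φ(z) ≈ Q log(z − z_c) + Σ_{k=1}^{p} a_k / (z − z_c)^k, with Q = Σ q_i and a_k = −Σ q_i (z_i − z_c)^k / k. Only the real part (the physical potential) is compared, which sidesteps branch cuts of the complex logarithm.

```python
import cmath
import random
from typing import List, Sequence


def multipole_coeffs(zs: Sequence[complex], qs: Sequence[float],
                     zc: complex, p: int) -> List[complex]:
    """a_k = -sum_i q_i (z_i - zc)^k / k for k = 1..p (index 0 unused)."""
    a = [0j] * (p + 1)
    for z, q in zip(zs, qs):
        d = z - zc
        term = 1 + 0j
        for k in range(1, p + 1):
            term *= d
            a[k] -= q * term / k
    return a


def eval_multipole(a: List[complex], total_q: float, zc: complex,
                   z: complex) -> float:
    """Real part of Q log(z - zc) + sum_k a_k / (z - zc)^k."""
    w = z - zc
    val = total_q * cmath.log(w)
    inv = 1 + 0j
    for k in range(1, len(a)):
        inv /= w
        val += a[k] * inv
    return val.real


def direct(zs: Sequence[complex], qs: Sequence[float], z: complex) -> float:
    """O(N) reference sum of q_i log|z - z_i|."""
    return sum(q * cmath.log(z - zi) for zi, q in zip(zs, qs)).real


rng = random.Random(0)
zs = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
qs = [rng.uniform(-1, 1) for _ in zs]
coeffs = multipole_coeffs(zs, qs, 0j, p=20)
target = 10 + 3j
error = abs(eval_multipole(coeffs, sum(qs), 0j, target)
            - direct(zs, qs, target))
```

Once the coefficients exist, each far-field evaluation costs O(p) instead of O(N); the full FMM adds a hierarchy of such expansions and translation operators between them to reach overall O(N) work at a fixed accuracy.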
To demonstrate the scientific significance of the FMM, we present two applications: direct simulation of blood, which is a multi-scale multi-physics problem, and large-scale biomolecular electrostatics. MoBo (Moving Boundaries) is the infrastructure for the direct numerical simulation of blood. It comprises two key algorithmic components, one of which is the FMM. We were able to simulate blood flow using Stokesian dynamics on 200,000 cores of Jaguar, a petaflop system, and achieve a sustained performance of 0.7 Petaflop/s. The second application, which we propose as future work in this thesis, is biomolecular electrostatics, where we solve for the electrical potential using the boundary-integral formulation discretized with boundary element methods (BEM). The computational kernel in solving the large linear system is a dense matrix-vector multiply, which we propose can be calculated using our scalable FMM. We propose to begin with the two-dielectric problem, where the electrostatic field is calculated using two continuum dielectric media: the solvent and the molecule. This is only a first step toward solving biologically challenging problems that have more than two dielectric media, ion-exclusion layers, and solvent-filled cavities. Finally, given the difficulty of producing high-performance scalable code, productivity is a key concern. Recently, numerical algorithms have been redesigned to take advantage of the architectural features of emerging multicore processors. These new classes of algorithms express fine-grained asynchronous parallelism and hence reduce the cost of synchronization. We performed the first extensive performance study of a recently proposed parallel programming model called Concurrent Collections (CnC). In CnC, the programmer expresses her computation in terms of application-specific operations, partially ordered by semantic scheduling constraints.
The CnC model is well suited to expressing asynchronous-parallel algorithms, so we evaluate CnC using two dense linear algebra algorithms written in this style for execution on state-of-the-art multicore systems. Our CnC implementations were able to match, and in some cases exceed, competing vendor-tuned and domain-specific library codes. We combine these two distinct research efforts by expressing the FMM in CnC; our approach tries to marry performance with productivity, which will be critical on future systems. Looking forward, we would like to extend this work to distributed-memory machines, specifically by implementing the FMM in the new distributed CnC, distCnC, to express fine-grained parallelism that would require significant effort in alternative models. Ph.D.
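The sketch below is not the Intel CnC API; it is a hypothetical, single-threaded Python model of the idea described above. Steps consume and produce tagged data items, items are written at most once, and a step may fire as soon as every item it reads exists, so the only ordering is the partial order induced by data dependencies. The function name and the tuple layout for steps are invented for illustration; a real CnC runtime would execute all ready steps in parallel.

```python
from typing import Callable, Dict, Hashable, List, Tuple

# (step name, tags of items it reads, function, tag of item it writes)
Step = Tuple[str, List[Hashable], Callable[..., object], Hashable]


def run_dataflow(steps: List[Step],
                 items: Dict[Hashable, object]) -> Dict[Hashable, object]:
    """Fire each step once all of its input items have been put."""
    pending = list(steps)
    while pending:
        ready = [s for s in pending if all(t in items for t in s[1])]
        if not ready:
            raise RuntimeError("deadlock: unsatisfiable dependencies")
        for step in ready:  # a parallel runtime could run these concurrently
            name, inputs, fn, out = step
            if out in items:
                raise RuntimeError(f"{name}: item {out!r} put twice")
            items[out] = fn(*(items[t] for t in inputs))
            pending.remove(step)
    return items


# Prefix sums; steps are listed in reverse on purpose, since declaration
# order does not matter, only data dependencies do.
xs = [3, 1, 4, 1, 5]
items = {("x", i): v for i, v in enumerate(xs)}
steps: List[Step] = [(f"sum{i}", [("s", i - 1), ("x", i)],
                      lambda a, b: a + b, ("s", i)) for i in range(4, 0, -1)]
steps.append(("sum0", [("x", 0)], lambda a: a, ("s", 0)))
result = run_dataflow(steps, items)
```

The single-assignment check mirrors CnC's dynamic single-assignment rule for items, which is what lets a runtime execute ready steps in any order without changing the result.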

    Throughput and Delay on the Packet Switched Internet

    The Internet has become a vital part of modern everyday life. Services delivered by the Internet are used by people across the planet every moment of every day. The Internet has proven a positive force, improving the lives of billions of people worldwide, and this power relies on its ability to deliver life-improving services. In my doctoral work, culminating in this dissertation, I have striven to sustain and increase the Internet's ability to deliver these services and to have a positive effect upon humanity. The overarching purpose of this dissertation is to improve the Internet's ability to deliver life-improving services. I have further divided this purpose into two goals: to improve the ability of applications operating in challenging network conditions to gain their fair share of bandwidth, and to reduce the delay with which these services are delivered. Every service delivered by the Internet consists of Internet objects that are delivered through communication paths across the Internet. The delivery of these objects is defined by two characteristics: throughput and delay. Throughput determines how much of an object can be delivered over a period of time, and delay determines how long it takes to deliver an object. These two characteristics determine the Internet's ability to deliver objects across communication paths. Improving them increases the ability of the Internet to deliver objects and thus improves its capability to deliver life-improving services. To accomplish these goals, I present projects in three areas of effort: (1) increasing the ability of applications operating in challenging conditions to achieve their fair share of bandwidth; (2) synthesizing the knowledge required to address the effort to reduce delay; and (3) developing protocols that reduce the delay encountered in the communication paths of the Internet. In this dissertation I present projects in these three areas that accomplish the two goals (increase bandwidth and reduce delay) and so achieve the purpose of improving the Internet's ability to deliver essential, life-improving services. These projects, and their organization into areas of effort, goals and purpose, are my contributions to the networking sciences.
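The abstract does not define "fair share". One standard definition, assumed here for illustration rather than taken from the dissertation, is max-min fairness on a single bottleneck link, computed by progressive filling: flows that demand less than an equal split get their full demand, and the capacity they leave unused is re-split among the remaining flows.

```python
from typing import List


def max_min_fair(demands: List[float], capacity: float) -> List[float]:
    """Max-min fair allocation of one link's capacity (progressive filling)."""
    alloc = [0.0] * len(demands)
    remaining = sorted(range(len(demands)), key=lambda i: demands[i])
    cap = capacity
    while remaining:
        share = cap / len(remaining)   # equal split of what is left
        i = remaining[0]               # smallest outstanding demand
        if demands[i] <= share:
            alloc[i] = demands[i]      # fully satisfied; release the rest
            cap -= demands[i]
            remaining.pop(0)
        else:
            for j in remaining:        # all remaining flows are bottlenecked
                alloc[j] = share
            break
    return alloc


# A 12-unit link shared by flows wanting 2, 8 and 10 units:
# the small flow gets 2, the other two split the remaining 10.
print(max_min_fair([2, 8, 10], 12))  # [2, 5.0, 5.0]
```

Under this definition, an application "operating in challenging conditions" falls short of its fair share whenever it receives less than the value the computation assigns it.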

    Techniques for Transparent Parallelization of Discrete Event Simulation Models

    Simulation is a powerful technique for representing the evolution of real-world phenomena or systems over time. It has been used extensively in many research fields (from medicine and biology to economics and disaster rescue) to study the behaviour of complex systems during their evolution (symbiotic simulation) or before their actual realization (what-if analysis). A traditional way to achieve high-performance simulations is to employ Parallel Discrete Event Simulation (PDES) techniques, which partition the simulation model into Logical Processes (LPs) that can execute events in parallel on different CPUs and/or CPU cores, and which rely on synchronization mechanisms to achieve causally consistent execution of simulation events. As is well recognized, the optimistic synchronization approach, namely the Time Warp protocol, which relies on rollback to recover from timestamp-order violations that can occur because events are processed without block-until-safe policies, is likely to favour speedup in general application and architectural contexts. However, the optimistic PDES paradigm implicitly relies on a programming model that departs from traditional sequential-style programming, given that there is no notion of a global address space (fully accessible while processing events at any LP). Furthermore, there is the underlying assumption that the code associated with event handlers cannot execute unrecoverable operations, given the speculative nature of their processing. Moreover, even when event handlers never execute unrecoverable actions, a means to actually undo their actions when requested must be devised and implemented within the software stack.
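To make the rollback requirement concrete, here is a minimal sketch of one logical process under the Time Warp idea, using the simplest state-saving policy: a snapshot before every event. It is an illustration under stated assumptions, not the thesis's runtime. The event handler is a hypothetical order-sensitive toy (it appends a digit to the state), and anti-messages, GVT computation and fossil collection are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LogicalProcess:
    """Optimistic LP: executes events as they arrive, checkpoints state,
    and rolls back when a straggler (timestamp < local virtual time)
    arrives. Anti-messages and fossil collection are omitted."""
    state: int = 0
    lvt: float = 0.0
    # (timestamp, value) of executed events, in timestamp order
    processed: List[Tuple[float, int]] = field(default_factory=list)
    # state snapshot taken just before each processed event
    snapshots: List[int] = field(default_factory=list)
    rollbacks: int = 0

    def _execute(self, ts: float, value: int) -> None:
        self.snapshots.append(self.state)
        self.state = self.state * 10 + value  # order-sensitive toy handler
        self.processed.append((ts, value))
        self.lvt = ts

    def receive(self, ts: float, value: int) -> None:
        if ts >= self.lvt:
            self._execute(ts, value)          # optimistic: run immediately
            return
        # Straggler: undo every event with a later timestamp.
        self.rollbacks += 1
        undone = []
        while self.processed and self.processed[-1][0] > ts:
            undone.append(self.processed.pop())
            self.state = self.snapshots.pop()  # restore pre-event state
        self.lvt = self.processed[-1][0] if self.processed else 0.0
        # Re-execute the straggler, then the undone events, in order.
        for ts_i, value_i in [(ts, value)] + sorted(undone):
            self._execute(ts_i, value_i)


lp = LogicalProcess()
for ts, v in [(1, 1), (3, 3), (5, 5), (2, 2)]:  # the event at t=2 is late
    lp.receive(ts, v)
print(lp.state)  # 1235, the same as sequential processing in timestamp order
```

Checkpointing every event keeps rollback trivial at the cost of memory; periodic or incremental state saving trades that memory for extra re-execution or bookkeeping.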
On the other hand, sequential-style programming is an easy paradigm for developing simulation code, since it does not require the programmer to reason about memory partitioning (and therefore message passing) or speculative (concurrent) processing of the application. In this thesis, we present methodological and technical innovations showing how innovative runtime mechanisms make it possible for programmers to implement their simulation models in a fully sequential way while the underlying simulation framework executes them in parallel according to speculative processing techniques. Some of the approaches we provide are applicable to either shared- or distributed-memory systems, while others are specifically tailored to multi/many-core architectures. While developing these supports, we clearly show their effect on performance, which is negligible, allowing fruitful exploitation of the available computing power. In the end, we highlight the clear benefits on the programming model tha