
    Design and validation of a scalable Digital Wireless Channel Emulator using an FPGA computing cluster

    A Digital Wireless Channel Emulator (DWCE) is a system capable of emulating the RF environment for a group of wireless devices. The use of DWCEs with networking radios is hampered by the inability to efficiently scale a DWCE to a large number of nodes. If such a large-scale DWCE existed, significant time and money could be saved by testing networking radios in a laboratory before running lengthy and costly field tests. The repeatability of a laboratory environment makes it possible to investigate and resolve issues more quickly and efficiently, so that the performance of the radios is known with a high degree of certainty before they are brought to the field. This dissertation investigates the use of an FPGA cluster configured as a distributed system to provide the computational and network structure needed to scale a DWCE to support 1250 or more wireless devices, approximately two orders of magnitude more than any other documented system. In this dissertation, the term "scale" as applied to a DWCE is defined as an increase in three key factors: the number of wireless devices, the signal bandwidth emulated, and the fidelity of the emulation. It is possible to trade off and reduce any one of these to increase the other two. This dissertation shows a DWCE that can increase all of these factors in an efficient manner and thoroughly investigates the fidelity of the emulation it produces.
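    To see why emulating 1250 devices demands a computing cluster, a rough back-of-envelope sketch of the pairwise compute load helps; the formula and all parameter values below (20 MHz bandwidth, 16-tap channels) are illustrative assumptions, not figures from the dissertation.

```python
# Illustrative estimate of DWCE compute load. All parameter values are
# hypothetical; the abstract defines "scale" as devices x bandwidth x
# fidelity but does not give this formula.

def dwce_gmacs(num_devices, bandwidth_hz, fir_taps):
    """Rough multiply-accumulate rate (GMAC/s) for pairwise channel emulation.

    Assumes every directed device pair is emulated with a tapped-delay-line
    (FIR) channel model sampled at the signal bandwidth.
    """
    directed_links = num_devices * (num_devices - 1)
    macs_per_second = directed_links * bandwidth_hz * fir_taps
    return macs_per_second / 1e9

# Scaling from a small system to the 1250-node target shows why a single
# FPGA is insufficient and a distributed cluster is needed.
for n in (16, 128, 1250):
    print(f"{n:5d} devices -> {dwce_gmacs(n, 20e6, 16):,.0f} GMAC/s")
```

    Because the number of links grows quadratically with the number of devices, raising fidelity or bandwidth at a fixed node count is far cheaper than adding nodes, which is the tradeoff the abstract describes.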

    Parallel simulation techniques for telecommunication network modelling

    In this thesis, we consider the application of parallel simulation to the performance modelling of telecommunication networks. A largely automated approach was first explored, using a parallelizing compiler to speed up the simulation of simple models of circuit-switched networks. This yielded reasonable results for relatively little effort compared with other approaches. However, more complex simulation models of packet- and cell-based telecommunication networks, requiring the use of discrete event techniques, need an alternative approach. A critical review of parallel discrete event simulation indicated that a distributed model components approach using conservative or optimistic synchronization would be worth exploring. Experiments were therefore conducted using simulation models of queueing networks and Asynchronous Transfer Mode (ATM) networks to explore the speed-up achievable with this approach. Specifically, it is shown that these techniques can successfully speed up the execution of useful telecommunication network simulations. A detailed investigation demonstrated that conservative synchronization performs very well for applications with good lookahead properties and sufficient message traffic density and, given such properties, will significantly outperform optimistic synchronization. Optimistic synchronization, however, gives reasonable speed-up for models with a wider range of such properties and can be optimized for speed-up and memory usage at run time. It is thus confirmed as more generally applicable, particularly as model development is somewhat easier than for conservative synchronization. This has to be balanced against the more difficult task of developing and debugging an optimistic synchronization kernel and the application models.
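    The lookahead property mentioned above is what lets a conservative kernel commit events safely. The following minimal sketch, with invented channel values, shows the safe-time rule used by Chandy-Misra-Bryant style conservative synchronization: a logical process (LP) may only execute events timestamped below the minimum, over its input channels, of each channel's clock plus its lookahead.

```python
# Minimal sketch of the conservative safe-time rule. Channel clocks,
# lookaheads, and the sample events are all hypothetical.

import heapq

def safe_time(channel_clocks, lookaheads):
    """Upper bound on event timestamps this LP may safely process."""
    return min(t + la for t, la in zip(channel_clocks, lookaheads))

def execute_safe_events(event_queue, channel_clocks, lookaheads):
    """Pop and 'execute' every locally queued event proven safe."""
    bound = safe_time(channel_clocks, lookaheads)
    executed = []
    while event_queue and event_queue[0][0] < bound:
        executed.append(heapq.heappop(event_queue))
    return executed

events = [(3.0, "cell arrival"), (9.5, "timeout"), (14.0, "cell arrival")]
heapq.heapify(events)
# Two input channels have advanced to t=5 and t=8, with lookaheads 6 and 2,
# so the safe bound is min(5+6, 8+2) = 10 and two events can execute.
print(execute_safe_events(events, [5.0, 8.0], [6.0, 2.0]))
```

    Larger lookaheads raise the safe bound and let the LP run further ahead, which is why the thesis finds conservative synchronization performing so well for models with good lookahead and dense message traffic.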

    Improving the Scalability of High Performance Computer Systems

    Improving the performance of future computing systems will depend on the ability to increase the scalability of current technology. New paths need to be explored, as operating principles that applied until now are becoming irrelevant for upcoming computer architectures. Scaling the number of cores, processors, and nodes within a system appears to be the only feasible way to achieve Exascale performance. To accomplish this goal, we propose three novel techniques addressing different layers of computer systems. The Tightly Coupled Cluster technique significantly improves inter-node communication within compute clusters. By improving latency by an order of magnitude over existing solutions, the cost of communication is considerably reduced. This makes it possible to exploit fine-grained parallelism within applications, thereby extending scalability considerably. The mechanism virtually moves the network interconnect into the processor, bypassing the latency of the I/O interface and rendering protocol conversions unnecessary. The technique is implemented entirely in firmware and kernel-layer software using off-the-shelf AMD processors. We present a proof-of-concept implementation and real-world benchmarks to demonstrate the superior performance of our technique. In particular, our approach achieves a software-to-software communication latency of 240 ns between two remote compute nodes. The second part of the dissertation introduces a new framework for scalable Networks-on-Chip. A novel rapid prototyping methodology is proposed that substantially accelerates design and implementation. Due to its flexibility and modularity, it covers a large application space, ranging from Systems-on-Chip to high-performance many-core processors. The Network-on-Chip compiler generates complex networks in the form of synthesizable register transfer level code from an abstract design description. Our engine supports different target technologies, including Field Programmable Gate Arrays and Application Specific Integrated Circuits. The framework makes it possible to build large designs while minimizing development and verification effort. Many topologies and routing algorithms are supported by partitioning the tasks into several layers and by introducing a protocol-agnostic architecture. We provide a thorough evaluation of the design that shows excellent results regarding performance and scalability. The third part of the dissertation addresses the processor-memory interface within computer architectures. The increasing compute power of many-core processors leads to an equally growing demand for memory bandwidth and capacity. Current processor designs exhibit physical limitations that restrict the scalability of main memory. To address this issue, we propose a memory extension technique that attaches large amounts of DRAM to the processor via a low pin count interface using high-speed serial transceivers. Our technique transparently integrates the extension memory into the system architecture by providing full cache coherency, so applications can use the extension memory through regular shared-memory programming techniques. By supporting daisy-chained memory extension devices and by introducing the asymmetric probing approach, the proposed mechanism ensures high scalability. We furthermore propose a DMA offloading technique to improve the performance of the processor-memory interface. The design has been implemented in a Field Programmable Gate Array based prototype, and driver software and firmware modifications have been developed to bring up the prototype in a Linux-based system. We show microbenchmarks that prove the feasibility of our design.
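    Latency figures like the 240 ns quoted above are typically obtained with a ping-pong microbenchmark that halves the averaged round-trip time. The sketch below shows only that measurement structure, in plain Python over TCP; commodity sockets are orders of magnitude slower than the firmware-level path the dissertation describes, and the host, port, and message size are hypothetical.

```python
# Generic ping-pong latency microbenchmark sketch. This illustrates the
# measurement method only; it does not reproduce the Tightly Coupled
# Cluster's 240 ns path. Peer address and port are hypothetical.

import socket
import time

def pingpong_latency(peer, iters=10_000, serve=False, port=5000):
    """Return a one-way latency estimate: half the averaged round-trip time."""
    if serve:                          # echo side: bounce every message back
        srv = socket.create_server(("", port))
        conn, _ = srv.accept()
        for _ in range(iters):
            conn.sendall(conn.recv(64))
        return None
    conn = socket.create_connection((peer, port))
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    start = time.perf_counter()
    for _ in range(iters):
        conn.sendall(b"x" * 8)         # minimal payload
        conn.recv(64)                  # wait for the echo
    elapsed = time.perf_counter() - start
    return elapsed / iters / 2         # one-way latency in seconds
```

    Averaging over many iterations amortizes timer overhead, which matters most when the quantity being measured is only hundreds of nanoseconds.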

    Treemaps: Visualizing Hierarchical and Categorical Data

    Treemaps are a graphical method for the visualization of hierarchical and categorical data sets. Treemap presentations of data shift mental workload from the cognitive to the perceptual systems, taking advantage of the human visual processing system to increase the bandwidth of the human-computer interface. Efficient use of display space allows for the simultaneous presentation of thousands of data records, as well as facilitating the presentation of semantic information. Treemaps let users see the forest and the trees by providing local detail in the context of a global overview, providing a visually engaging environment in which to analyze, search, explore, and manipulate large data sets. The treemap method of hierarchical visualization is, at its core, based on the property of containment, a fundamental idea that powerfully encapsulates many of our reasons for constructing information hierarchies. All members of the treemap family of algorithms partition multi-dimensional display spaces based on weighted hierarchical data sets. In addition to generating treemaps and standard traditional hierarchical diagrams, the treemap algorithms extend non-hierarchical techniques such as bar and pie charts into the domain of hierarchical presentation. Treemap algorithms can be used to generate bar charts, outlines, traditional 2-D node-and-link diagrams, pie charts, cone trees, cam trees, drum trees, etc. Generating existing diagrams via treemap transformations is an exercise meant to show the power, ease, and generality with which alternative presentations can be generated from the basic treemap algorithms. Two controlled experiments with novice treemap users and real data highlight the strengths of treemaps. The first experiment, with 12 subjects, compares the Macintosh TreeViz™ implementation of treemaps with the UNIX command line for questions dealing with a 530-node file hierarchy. Treemaps are shown to significantly reduce user performance times for global file comparison tasks. A second experiment, with 40 subjects, compares treemaps with dynamic outlines for questions dealing with the allocation of funds in the 1992 US Budget (a 357-node budget hierarchy). Treemap users are 50% faster overall and as much as 8 times faster for specific questions.
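    The weighted space partitioning described above can be made concrete with the original slice-and-dice layout: each node's rectangle is divided among its children in proportion to their weights, alternating split direction with depth. A minimal sketch, using an invented hierarchy:

```python
# Minimal slice-and-dice treemap layout sketch. The sample hierarchy and
# the 100x100 display space are invented for illustration.

def treemap(node, x, y, w, h, depth=0, out=None):
    """node = (label, weight, children). Emits (label, rect) per node."""
    out = out if out is not None else []
    label, weight, children = node
    out.append((label, (round(x, 2), round(y, 2), round(w, 2), round(h, 2))))
    if children:
        total = sum(c[1] for c in children)
        offset = 0.0
        for child in children:
            frac = child[1] / total
            if depth % 2 == 0:   # split horizontally at even depths
                treemap(child, x + offset * w, y, w * frac, h, depth + 1, out)
            else:                # split vertically at odd depths
                treemap(child, x, y + offset * h, w, h * frac, depth + 1, out)
            offset += frac
    return out

tree = ("root", 100, [("a", 60, []), ("b", 40, [("b1", 30, []), ("b2", 10, [])])])
for label, rect in treemap(tree, 0, 0, 100, 100):
    print(label, rect)
```

    Containment falls out of the recursion directly: every child rectangle lies inside its parent's, so nesting in the picture mirrors nesting in the hierarchy.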

    Active Processor Scheduling Using Evolutionary Algorithms

    The allocation of processes to processors has long been of interest to engineers. The processor allocation problem considered here assigns multiple applications onto a computing system. With this algorithm, researchers could more efficiently examine real-time sensor data like that used by United States Air Force digital signal processing efforts, or real-time aerosol hazard detection as examined by the Department of Homeland Security. Different choices for the design of a load balancing algorithm are examined in both the problem and algorithm domains. Evolutionary algorithms are used to find near-optimal solutions. These algorithms incorporate multiobjective, coevolutionary, and parallel principles to create an effective and efficient algorithm for real-world allocation problems. Three evolutionary algorithms (EAs) are developed. The primary algorithm generates a solution to the processor allocation problem. This allocation EA is capable of evaluating objectives in both an aggregate single-objective and a Pareto multiobjective manner. The other two EAs are designed for fine-tuning returned allocation EA solutions. One coevolutionary algorithm is used to optimize the parameters of the allocation algorithm. This meta-EA is parallelized using a coarse-grain approach to improve performance. Experiments are conducted that validate the improved effectiveness of the parallelized algorithm. A Pareto multiobjective approach is used to optimize both effectiveness and efficiency objectives. The other coevolutionary algorithm generates difficult allocation problems for testing the capabilities of the allocation EA. The effectiveness of both coevolutionary algorithms for optimizing the allocation EA is examined quantitatively using standard statistical methods, and the allocation EA's objective tradeoffs are analyzed and compared.
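    As a rough illustration of how such an allocation EA operates, the sketch below evolves a task-to-processor mapping against a single aggregate objective (makespan); the task costs and EA parameters are invented, and the Pareto multiobjective and coevolutionary machinery of the actual work is omitted for brevity.

```python
# Toy allocation EA sketch: chromosome = task-to-processor mapping,
# fitness = load of the most heavily loaded processor (minimized).
# Task costs, processor count, and EA parameters are hypothetical.

import random

TASK_COST = [4, 7, 3, 9, 2, 6, 5, 8]   # per-task processing cost
NUM_PROCS = 3

def fitness(chrom):
    """Makespan: max total cost on any one processor (lower is better)."""
    load = [0] * NUM_PROCS
    for task, proc in enumerate(chrom):
        load[proc] += TASK_COST[task]
    return max(load)

def evolve(pop_size=30, gens=100, mut_rate=0.1):
    pop = [[random.randrange(NUM_PROCS) for _ in TASK_COST]
           for _ in range(pop_size)]
    for _ in range(gens):
        # Tournament selection followed by per-gene mutation.
        parents = [min(random.sample(pop, 3), key=fitness)
                   for _ in range(pop_size)]
        pop = [[random.randrange(NUM_PROCS) if random.random() < mut_rate else g
                for g in p] for p in parents]
    return min(pop, key=fitness)

best = evolve()
print(best, "makespan:", fitness(best))
```

    A Pareto variant would keep a set of mutually non-dominated allocations across several such objectives rather than a single best scalar score.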

    Data Bandwidth Reduction Techniques For Distributed Embedded Simulation

    Maintaining coherence between the independent views of multiple participants at distributed locations is essential in an Embedded Simulation environment. Currently, the Distributed Interactive Simulation (DIS) protocol maintains coherence by broadcasting the entity state streams from each simulation station. In this dissertation, a novel alternative to DIS that replaces the transmitting sources with local sources is developed, validated, and assessed by analytical and experimental means. The proposed Concurrent Model approach reduces the communication burden to the transmission of only synchronization and model-update messages. Necessary and sufficient conditions for the correctness of Concurrent Models in a discrete event simulation environment are established by developing Behavioral Congruence ΔB(EL, ER) and Temporal Congruence ΔT(t, ER) functions, which indicate model discrepancies with respect to the simulation time t and the local and remote entity state streams EL and ER, respectively. Performance benefits were quantified in terms of the bandwidth reduction ratio BR = N/I, obtained by comparing the OneSAF Testbed Semi-Automated Forces (OTBSAF) simulator under DIS, requiring a total of N bits, with a testbed modified for the Concurrent Model approach, which required I bits. In the experiments conducted, a range of 100 ≤ BR ≤ 294 was obtained, representing two orders of magnitude reduction in simulation traffic. Investigation showed that the models rely heavily on the priority data structure of the discrete event simulation and that performance of the overall simulation can be enhanced by an additional 6% by improving the queue management. A low run-time overhead, self-adapting storage policy called the Smart Priority Queue (SPQ) was developed and evaluated within the Concurrent Model. The proposed SPQ policies employ a low-complexity linear queue for near-head activities and a rapid-indexing variable bin-width calendar queue for distant events. The SPQ configuration is determined by monitoring queue access behavior using cost scoring factors and then applying heuristics to adjust the organization of the underlying data structures. Results indicate that optimizing storage to the spatial distribution of queue access can decrease HOLD operation cost by between 25% and 250% relative to existing algorithms such as calendar queues. Taken together, these techniques provide an entity state generation mechanism capable of overcoming the challenges of Embedded Simulation in harsh mobile communications environments with restricted bandwidth, increased message latency, and extended message drop-outs.
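    The SPQ's division of labor can be sketched as a two-tier structure: a small sorted list for near-head events (cheap HOLD operations) and coarse calendar-style buckets for distant ones. The horizon and bucket width below are hypothetical constants; the real SPQ adapts its configuration from monitored access behavior, as described above.

```python
# Two-tier priority queue sketch in the spirit of the SPQ: near-head events
# live in a sorted list, distant events in coarse time buckets that migrate
# forward as the horizon advances. Thresholds are hypothetical.

import bisect

class TwoTierQueue:
    def __init__(self, horizon=10.0, bin_width=10.0):
        self.near = []                 # sorted (time, event) pairs
        self.far = {}                  # bucket index -> list of (time, event)
        self.horizon = horizon
        self.bin_width = bin_width

    def enqueue(self, t, ev):
        if not self.near or t < self.near[0][0] + self.horizon:
            bisect.insort(self.near, (t, ev))
        else:
            self.far.setdefault(int(t // self.bin_width), []).append((t, ev))

    def dequeue(self):
        t, ev = self.near.pop(0)
        # Pull the earliest far bucket into the near list when it runs low.
        if len(self.near) < 2 and self.far:
            for item in sorted(self.far.pop(min(self.far))):
                bisect.insort(self.near, item)
        return t, ev

q = TwoTierQueue()
for t in (1.0, 3.5, 42.0, 2.0, 55.0):
    q.enqueue(t, f"ev@{t}")
print([q.dequeue() for _ in range(5)])   # events come out in time order
```

    Keeping the hot region small and sorted makes the common near-term HOLD cheap, while the bucketed far region avoids scanning events that will not run for a long time.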

    Technology 2004, Vol. 2

    Proceedings from symposia of the Technology 2004 Conference, November 8-10, 1994, Washington, DC. Volume 2 features papers on computers and software, virtual reality simulation, environmental technology, video and imaging, medical technology and life sciences, robotics and artificial intelligence, and electronics.