223 research outputs found

    Submicron Systems Architecture Project: Semiannual Technical Report

    No abstract available

    Submicron Systems Architecture Project : Semiannual Technical Report

    The Mosaic C is an experimental fine-grain multicomputer based on single-chip nodes. The Mosaic C chip includes 64KB of fast dynamic RAM, a processor, a packet interface, ROM for bootstrap and self-test, and a two-dimensional self-timed router. The chip architecture provides low-overhead and low-latency handling of message packets, and high memory and network bandwidth. Sixty-four Mosaic chips are packaged by tape-automated bonding (TAB) in an 8 x 8 array on circuit boards that can, in turn, be arrayed in two dimensions to build arbitrarily large machines. These 8 x 8 boards are now in prototype production under a subcontract with Hewlett-Packard. We are planning to construct a 16K-node Mosaic C system from 256 of these boards. The suite of Mosaic C hardware also includes host-interface boards and high-speed communication cables. The hardware developments and activities of the past eight months are described in section 2.1. The programming system that we are developing for the Mosaic C is based on the same message-passing, reactive-process computational model that we have used with earlier multicomputers, but the model is implemented for the Mosaic in a way that supports fine-grain concurrency. A process executes only in response to receiving a message, and may in execution send messages, create new processes, and modify its persistent variables before it either exits or becomes dormant in preparation for receiving another message. These computations are expressed in an object-oriented programming notation, a derivative of C++ called C+-. The computational model and the C+- programming notation are described in section 2.2. The Mosaic C runtime system, which is written in C+-, provides automatic process placement and highly distributed management of system resources. The Mosaic C runtime system is described in section 2.3.
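    The reactive-process model described in this abstract can be illustrated with a small single-threaded sketch. It is written in Python rather than C+-, and every name in it (`Runtime`, `Accumulator`, `Fanout`, `Squarer`, `receive`) is hypothetical and chosen for illustration only; it does not reflect the Mosaic C runtime or its process placement. Each process runs only when a message is delivered to it, may send messages, create processes, and update its own variables, and then either exits (returns `False`) or goes dormant (returns `True`).

```python
from collections import deque


class Runtime:
    """Toy message scheduler (hypothetical, not the Mosaic C runtime)."""

    def __init__(self):
        self.queue = deque()   # pending (process id, message) pairs
        self.processes = {}    # live processes, keyed by id
        self.next_id = 0

    def spawn(self, process):
        """Create a new process and return its id."""
        pid = self.next_id
        self.next_id += 1
        self.processes[pid] = process
        return pid

    def send(self, pid, message):
        """Queue a message for later delivery."""
        self.queue.append((pid, message))

    def run(self):
        """Deliver messages until none remain; return the number delivered."""
        delivered = 0
        while self.queue:
            pid, message = self.queue.popleft()
            process = self.processes.get(pid)
            if process is None:
                continue  # the target has already exited; drop the message
            stays_dormant = process.receive(self, pid, message)
            if not stays_dormant:
                del self.processes[pid]  # the process chose to exit
            delivered += 1
        return delivered


class Accumulator:
    """Keeps a persistent total and goes dormant after each message."""

    def __init__(self):
        self.total = 0

    def receive(self, rt, self_id, message):
        self.total += message
        return True  # dormant, waiting for the next message


class Squarer:
    """Squares one value, sends the result on, then exits."""

    def receive(self, rt, self_id, message):
        value, reply_to = message
        rt.send(reply_to, value * value)
        return False


class Fanout:
    """Creates one Squarer per value, then exits."""

    def receive(self, rt, self_id, message):
        values, reply_to = message
        for value in values:
            worker = rt.spawn(Squarer())
            rt.send(worker, (value, reply_to))
        return False
```

    Sending `([1, 2, 3], acc_id)` to a `Fanout` process creates three short-lived `Squarer` processes, and the dormant `Accumulator` ends up holding the sum of squares.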

    Compile-Time Estimation of Communication Costs in Multicomputers

    Coordinated Science Laboratory was formerly known as Control Systems Laboratory. Office of Naval Research / N00014-91-J-1096; National Science Foundation / NSF MIP 86-57563 PYI; National Aeronautics and Space Administration / NASA NAG 1-61

    Submicron Systems Architecture Project: Semiannual Technical Report

    No abstract available

    Hardware Implementation Of Processor Allocator For Mesh Connected Chip Multiprocessors

    Advances in semiconductor process technology and the current demand for highly parallel computing have led to the advent of Chip Multiprocessors (CMPs). A CMP is the integration of two or more independent processor cores, which can read and execute program instructions, onto a single integrated circuit die. CMPs are the main computing platforms for research and development in parallel and high-performance computing environments. They offer minimum inter-core communication latencies because the processor cores are present on a single chip. The Operating System (OS) plays a key role in using a CMP effectively. The OS should support a multi-user environment in which jobs are executed in parallel on different cores. This is handled by the processor management system of the OS. The processor management system consists of a Job Scheduler (JS) and a Processor Allocator (PA). The JS arranges the jobs in a queue in an order determined by the scheduling policy employed, thereby specifying the job that is to be executed next. The PA selects an appropriate set of processors to execute the job scheduled by the JS. Efficient design of a PA is crucial if one is to harness the full computational power of a CMP in large parallel computing systems. This thesis deals with the processor allocation part of the processor management system. The aim of this thesis is the hardware implementation of a PA for a mesh-connected CMP. The PA is implemented and a synthesis report is presented which shows the amount of logic utilized. Many contiguous and non-contiguous allocation strategies have been proposed for mesh networks in recent years. The Improvised First Fit algorithm is used to select the appropriate set of processors for executing an incoming job in this hardware implementation. This algorithm is a contiguous allocation algorithm, has complete sub-mesh recognition ability, and uses a bit-map approach.
The JS is assumed to employ a First Come First Serve (FCFS) policy to schedule the jobs. This thesis also acts as the basis for the hardware implementation of PAs that use other allocation algorithms in different topologies.
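    As a rough software illustration of contiguous, bit-map-based allocation on a mesh, the sketch below scans a busy bitmap in row-major order and takes the first free sub-mesh that fits. It is a plain first-fit scan, not the thesis's Improvised First Fit algorithm or its hardware design. The function names (`first_fit`, `allocate`, `release`) and the choice to also try the rotated shape are assumptions made for this example.

```python
from typing import List, Optional, Tuple

Bitmap = List[List[int]]  # busy[r][c] is 1 if the core is allocated, 0 if free


def is_free(busy: Bitmap, row: int, col: int, h: int, w: int) -> bool:
    """True if every core in the h x w block at (row, col) is free."""
    return all(busy[r][c] == 0
               for r in range(row, row + h)
               for c in range(col, col + w))


def first_fit(busy: Bitmap, h: int, w: int) -> Optional[Tuple[int, int, int, int]]:
    """Return (row, col, height, width) of the first free sub-mesh, or None.

    Assumption for this sketch: the requested shape is tried first, then
    its 90-degree rotation, each scanned in row-major order.
    """
    rows, cols = len(busy), len(busy[0])
    shapes = [(h, w)] if h == w else [(h, w), (w, h)]
    for sh, sw in shapes:
        for row in range(rows - sh + 1):
            for col in range(cols - sw + 1):
                if is_free(busy, row, col, sh, sw):
                    return row, col, sh, sw
    return None


def allocate(busy: Bitmap, h: int, w: int) -> Optional[Tuple[int, int, int, int]]:
    """Find a free sub-mesh with first_fit and mark its cores busy."""
    found = first_fit(busy, h, w)
    if found is None:
        return None
    row, col, sh, sw = found
    for r in range(row, row + sh):
        for c in range(col, col + sw):
            busy[r][c] = 1
    return found


def release(busy: Bitmap, row: int, col: int, h: int, w: int) -> None:
    """Mark the cores of a finished job as free again."""
    for r in range(row, row + h):
        for c in range(col, col + w):
            busy[r][c] = 0
```

    Under an FCFS policy, a scheduler would call `allocate` for the job at the head of the queue and wait for a `release` whenever it returns `None`.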

    System level modelling and design of hypergraph based wireless system area networks for multi-computer systems

    This thesis deals with two issues pertaining to wireless multicomputer interconnection networks, namely topology and Medium Access Control (MAC). It argues that a new channel assignment technique based on regular low-dimensional hypergraph networks, the dual radio wireless hypermesh, represents a promising high-performance wireless interconnection network for future multicomputers, as an alternative to shared communication medium networks and/or ordinary wireless mesh networks, which have been widely used in current wireless networks. The focus of this work is on improving the network throughput while maintaining a relatively low latency of a wireless network system. By means of a Carrier Sense Multiple Access (CSMA) based design of the MAC protocol, and based on the desirable features of the hypermesh network topology, a relatively high-performance network has been introduced. Compared to the CSMA shared communication channel model, which is currently the de facto MAC protocol for most wireless networks, our design is shown to achieve a significant increase in network throughput with lower average network latency for large numbers of communication nodes. SystemC models of the proposed wireless hypermesh, validated through mathematical models, are then introduced. The analysis has been incorporated in a SystemC design methodology which facilitates the integration of communication modelling into the design modelling at the early stages of system development. Another important application of SystemC modelling techniques is to perform meaningful comparative studies of different protocols or new implementations, to determine which communication scenario performs better, and to modify models to test system sensitivity and tune performance. The effects of different design parameters (e.g., packet sizes, number of nodes) have been studied throughout this work.
The results show that the proposed structure outperforms the existing shared medium network structure and can support a relatively higher number of wirelessly connected computers than conventional networks.

    A design methodology for portable software on parallel computers

    This final report for research that was supported by grant number NAG-1-995 documents our progress in addressing two difficulties in parallel programming. The first difficulty is developing software that will execute quickly on a parallel computer. The second difficulty is transporting software between dissimilar parallel computers. In general, we expect that more hardware-specific information will be included in software designs for parallel computers than in designs for sequential computers. This inclusion is an instance of portability being sacrificed for high performance. New parallel computers are being introduced frequently. Trying to keep one's software on the current high-performance hardware, a software developer almost continually faces yet another expensive software transportation. The problem of the proposed research is to create a design methodology that helps designers to more precisely control both portability and hardware-specific programming details. The proposed research emphasizes programming for scientific applications. We completed our study of the parallelizability of a subsystem of the NASA Earth Radiation Budget Experiment (ERBE) data processing system. This work is summarized in section two. A more detailed description is provided in Appendix A ('Programming Practices to Support Eventual Parallelism'). Mr. Chrisman, a graduate student, wrote and successfully defended a Ph.D. dissertation proposal which describes our research associated with the issues of software portability and high performance. The list of research tasks is specified in the proposal. The proposal 'A Design Methodology for Portable Software on Parallel Computers' is summarized in section three and is provided in its entirety in Appendix B. We are currently studying a proposed subsystem of the NASA Clouds and the Earth's Radiant Energy System (CERES) data processing system. This software is the proof-of-concept for the Ph.D. dissertation.
We have implemented and measured the performance of a portion of this subsystem on the Intel iPSC/2 parallel computer. These results are provided in section four. Our future work is summarized in section five, our acknowledgements are stated in section six, and references for published papers associated with NAG-1-995 are provided in section seven.

    High-level asynchronous system design using the ACK framework

    Journal Article. Designing asynchronous circuits is becoming easier as a number of design styles are making the transition from research projects to real, usable tools. However, designing asynchronous "systems" is still a difficult problem. We define asynchronous systems to be medium to large digital systems whose descriptions include both datapath and control, that may involve non-trivial interface requirements, and whose control is too large to be synthesized in one large controller. ACK is a framework for designing high performance asynchronous systems of this type. In ACK we advocate an approach that begins with procedural level descriptions of control and datapath and results in a hybrid system that mixes a variety of hardware implementation styles including burst-mode AFSMs, macromodule circuits, and programmable control. We present our views on what makes asynchronous high level system design different from lower level circuit design, motivate our ACK approach, and demonstrate using an example system design.

    Design and resource management of reconfigurable multiprocessors for data-parallel applications

    FPGA (Field-Programmable Gate Array)-based custom reconfigurable computing machines have established themselves as low-cost and low-risk alternatives to ASIC (Application-Specific Integrated Circuit) implementations and general-purpose microprocessors in accelerating a wide range of computation-intensive applications. Most often they are Application Specific Programmable Circuits (ASPCs), which are developer programmable instead of user programmable. The major disadvantages of ASPCs are minimal programmability, and significant time and energy overheads caused by required hardware reconfiguration when the problem size exceeds the available reconfigurable resources; these problems are expected to become more serious with increases in the FPGA chip size. On the other hand, dominant high-performance computing systems, such as PC clusters and SMPs (Symmetric Multiprocessors), suffer from high communication latencies and/or scalability problems. This research introduces low-cost, user-programmable and reconfigurable MultiProcessor-on-a-Programmable-Chip (MPoPC) systems for high-performance, low-cost computing. It also proposes a relevant resource management framework that deals with performance, power consumption and energy issues. These semi-customized systems significantly reduce runtime device reconfiguration by employing user-programmable processing elements that are reusable for different tasks in large, complex applications. For the sake of illustration, two different types of MPoPCs with hardware FPUs (floating-point units) are designed and implemented for credible performance evaluation and modeling: the coarse-grain MIMD (Multiple-Instruction, Multiple-Data) CG-MPoPC machine based on a processor IP (Intellectual Property) core and the mixed-mode (MIMD, SIMD or M-SIMD) variant-grain HERA (HEterogeneous Reconfigurable Architecture) machine.
In addition to alleviating the above difficulties, MPoPCs can offer several performance and energy advantages to our data-parallel applications when compared to ASPCs; they are simpler and more scalable, and require less verification time and cost. Various common computation-intensive benchmark algorithms, such as matrix-matrix multiplication (MMM) and LU factorization, are studied and their parallel solutions are shown for the two MPoPCs. The performance is evaluated with large sparse real-world matrices primarily from power engineering. We expect even further performance gains on MPoPCs in the near future by employing ever-improving FPGAs. The innovative nature of this work has the potential to guide research in this emerging field of high-performance, low-cost reconfigurable computing. The largest advantage of reconfigurable logic lies in its large degree of hardware customization and reconfiguration, which allows reusing the resources to match the computation and communication needs of applications. Therefore, a major effort in the presented design methodology for mixed-mode MPoPCs, like HERA, is devoted to effective resource management. A two-phase approach is applied. A mixed-mode weighted Task Flow Graph (w-TFG) is first constructed for any given application, where tasks are classified according to their most appropriate computing mode (e.g., SIMD or MIMD). At compile time, an architecture is customized and synthesized for the TFG using an Integer Linear Programming (ILP) formulation and a parameterized hardware component library. Various run-time scheduling schemes with different performance-energy objectives are proposed. A system-level energy model for HERA, which is based on low-level implementation data and run-time statistics, is proposed to guide performance-energy trade-off decisions. A parallel power flow analysis technique based on Newton's method is proposed and employed to verify the methodology.