12 research outputs found

    Study of Various Motherboards

    Get PDF
    Not available

    Systematic Design Methods for Efficient Off-Chip DRAM Access

    No full text
    Typical design flows for digital hardware take, as their input, an abstract description of computation and data transfer between logical memories. No existing commercial high-level synthesis tool demonstrates the ability to map logical memory inferred from a high level language to external memory resources. This thesis develops techniques for doing this, specifically targeting off-chip dynamic memory (DRAM) devices. These are a commodity technology in widespread use with standardised interfaces. In use, the bandwidth of an external memory interface and the latency of memory requests asserted on it may become the bottleneck limiting the performance of a hardware design. Careful consideration of this is especially important when designing with DRAMs, whose latency and bandwidth characteristics depend upon the sequence of memory requests issued by a controller. Throughout the work presented here, we pursue exact compile-time methods for designing application-specific memory systems with a focus on guaranteeing predictable performance through static analysis. This contrasts with much of the surveyed existing work, which considers general purpose memory controllers and optimized policies which improve performance in experiments run using simulation of suites of benchmark codes. The work targets loop-nests within imperative source code, extracting a mathematical representation of the loop-nest statements and their associated memory accesses, referred to as the ‘Polytope Model’. We extend this mathematical representation to represent the physical DRAM ‘row’ and ‘column’ structures accessed when performing memory transfers. From this augmented representation, we can automatically derive DRAM controllers which buffer data in on-chip memory and transfer data in an efficient order. Buffering data and exploiting ‘reuse’ of data is shown to enable up to 50× reduction in the quantity of data transferred to external memory. 
The reordering of memory transactions, exploiting knowledge of the physical layout of the DRAM device, is shown to allow up to a 4× improvement in the efficiency of those data transfers.
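The effect described above can be illustrated with a toy model (not the thesis's toolchain; the row size, array dimensions, and open-row policy below are assumptions): counting DRAM row activations shows why reordering requests to group them by physical row improves transfer efficiency.

```python
# Hypothetical sketch: estimate DRAM row activations under an
# open-row policy for a given request order. Row size and array
# dimensions are invented for illustration.
def row_activations(addresses, row_size=1024):
    """Count row-buffer misses: each change of DRAM row forces a
    precharge + activate, costing latency and bandwidth."""
    activations = 0
    open_row = None
    for addr in addresses:
        row = addr // row_size
        if row != open_row:      # row-buffer miss
            activations += 1
            open_row = row
    return activations

# Column-major walk over a 64x64 row-major array of 4-byte elements:
# consecutive requests land in different DRAM rows.
n = 64
stride_walk = [(i * n + j) * 4 for j in range(n) for i in range(n)]
reordered = sorted(stride_walk)  # group requests by ascending row
```

Here `row_activations(stride_walk)` yields 1024 activations while the reordered sequence needs only 16, illustrating how transaction reordering alone can multiply the efficiency of the same set of transfers.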

    Low-Power Embedded Design Solutions and Low-Latency On-Chip Interconnect Architecture for System-On-Chip Design

    Get PDF
    This dissertation presents three design solutions to support several key system-on-chip (SoC) issues to achieve low-power and high performance. These are: 1) joint source and channel decoding (JSCD) schemes for low-power SoCs used in portable multimedia systems, 2) efficient on-chip interconnect architecture for massive multimedia data streaming on multiprocessor SoCs (MPSoCs), and 3) data processing architecture for low-power SoCs in distributed sensor network (DSS) systems and its implementation. The first part includes a low-power embedded low density parity check code (LDPC) - H.264 joint decoding architecture to lower the baseband energy consumption of a channel decoder using joint source decoding and dynamic voltage and frequency scaling (DVFS). A low-power multiple-input multiple-output (MIMO) and H.264 video joint detector/decoder design that minimizes energy for portable, wireless embedded systems is also designed. In the second part, a link-level quality of service (QoS) scheme using unequal error protection (UEP) for low-power network-on-chip (NoC) and low latency on-chip network designs for MPSoCs is proposed. This part contains WaveSync, a low-latency focused network-on-chip architecture for globally-asynchronous locally-synchronous (GALS) designs and a simultaneous dual-path routing (SDPR) scheme utilizing path diversity present in typical mesh topology network-on-chips. SDPR is akin to having a higher link width but without the significant hardware overhead associated with simple bus width scaling. The last part shows data processing unit designs for embedded SoCs. We propose a data processing and control logic design for a new radiation detection sensor system generating data at or above Peta-bits-per-second level. Implementation results show that the intended clock rate is achieved within the power target of less than 200mW. 
We also present a digital signal processing (DSP) accelerator supporting configurable MAC, FFT, FIR, and 3-D cross product operations for embedded SoCs. It consumes 12.35 mW and occupies 0.167 mm² of area at 333 MHz.
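The path-diversity idea behind SDPR can be sketched as follows (a hypothetical illustration, not the dissertation's implementation; the coordinate scheme and flit-splitting policy are invented): in a 2-D mesh, XY and YX dimension-ordered routing give two internally disjoint paths between any two nodes that differ in both coordinates, so a packet's flits can be striped across both paths at once.

```python
# Illustrative sketch of dual-path routing in a 2-D mesh NoC.
def xy_path(src, dst):
    """Dimension-ordered route: travel along X first, then Y."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

def yx_path(src, dst):
    """Travel along Y first, then X: interior-disjoint from XY."""
    return [(x, y) for (y, x) in xy_path(src[::-1], dst[::-1])]

def split_flits(flits):
    """Stripe flits across the two paths: evens on XY, odds on YX."""
    return flits[0::2], flits[1::2]
```

Because the two paths share only the source and destination routers, striping flits across them approximates a doubled link width without widening any physical link, which is the trade-off the abstract describes.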

    The design and construction of high performance garbage collectors

    Get PDF
    Garbage collection is a performance-critical component of modern language implementations. The performance of a garbage collector depends in part on major algorithmic decisions, but also significantly on implementation details and techniques which are often incidental in the literature. In this dissertation I look in detail at the performance characteristics of garbage collection on modern architectures. My thesis is that a thorough understanding of the characteristics of the heap to be collected, coupled with measured performance of various design alternatives on a range of modern architectures, provides insights that can be used to improve the performance of any garbage collection algorithm. The key contributions of this work are: 1) a new analysis technique (replay collection) for measuring the performance of garbage collection algorithms; 2) a novel technique for applying software prefetch to non-moving garbage collectors that achieves significant performance gains; and 3) a comprehensive analysis of object scanning techniques, cataloguing and comparing the performance of the known methods, and leading to a new technique that optimizes performance without significant cost to the runtime environment. These contributions are applicable to a wide range of garbage collectors, and can provide significant, measurable speedups in a space where each implementer has previously had to trust intuition or their own benchmarking. The methodologies and implementation techniques contributed in this dissertation have the potential to make a significant improvement to the performance of every garbage collector.
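The software-prefetch contribution for non-moving collectors can be sketched, under an assumed object layout and FIFO depth, as a mark loop in which objects pass through a small queue, so that a prefetch issued at enqueue time has had time to complete by the time the object is dequeued and scanned:

```python
from collections import deque

# Hedged sketch of a FIFO prefetch buffer in a mark phase. The object
# layout (dicts with a 'refs' list) and the FIFO depth are invented;
# in a real collector the prefetch would be a hardware hint such as
# GCC's __builtin_prefetch, issued on the object's header address.
def prefetch(obj):
    pass  # stand-in: Python has no cache-prefetch primitive

def mark_with_prefetch(roots, fifo_depth=8):
    marked = set()
    stack = list(roots)   # conventional mark stack
    fifo = deque()        # prefetch buffer between stack and scan
    while stack or fifo:
        # Keep the FIFO full so prefetches are issued far ahead of use.
        while stack and len(fifo) < fifo_depth:
            obj = stack.pop()
            prefetch(obj)         # issue prefetch at enqueue time
            fifo.append(obj)
        obj = fifo.popleft()      # cache line should be warm by now
        if id(obj) not in marked:
            marked.add(id(obj))
            stack.extend(obj['refs'])
    return marked
```

The design point is that the FIFO decouples the address-discovery step from the scanning step by roughly `fifo_depth` objects, which is what lets the prefetch hide memory latency in a non-moving collector.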

    Placement of dynamic data objects over heterogeneous memory organizations in embedded systems

    Get PDF
    Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadores y Automática, defended on 24-11-2015.

    Unified on-chip multi-level cache management scheme using processor opcodes and addressing modes.

    Get PDF
    by Stephen Siu-ming Wong. Thesis (M.Phil.)--Chinese University of Hong Kong, 1996. Includes bibliographical references (leaves 164-170). Contents: 1. Introduction (Cache Memory; System Performance; Cache Performance; Cache Prefetching; Organization of Dissertation). 2. Related Work (Memory Hierarchy; Cache Memory Management: Configuration, Replacement Algorithms, Write Back Policies, Cache Miss Types, Prefetching; Locality: Spatial vs. Temporal, Instruction Cache vs. Data Cache; Why Not a Large L1 Cache?: Critical Time Path, Hardware Cost; Trend to have L2 Cache On Chip: Examples, Dedicated L2 Bus; Hardware Prefetch Algorithms: One Block Look-ahead, Chen's RPT and similar algorithms; Software Based Prefetch Algorithm: Prefetch Instruction; Hybrid Prefetch Algorithm: Stride CAM Prefetching). 3. Simulator (Multi-level Memory Hierarchy Simulator: Multi-level Memory Support, Non-blocking Cache, Cycle-by-cycle Simulation, Cache Prefetching Support). 4. Proposed Algorithms (SIRPA: Rationale, Architecture Model; Line Concept: Rationale, Improvement Over "Pure" Algorithm, Architectural Model; Combined L1-L2 Cache Management: Rationale, Feasibility; Combine SIRPA with Default Prefetch: Rationale, Improvement Over "Pure" Algorithm, Architectural Model). 5. Results (Benchmarks Used: SPEC92int and SPEC92fp; Configurations Tested: Prefetch Algorithms, Cache Sizes, Cache Block Sizes, Cache Set Associativities, Bus Width, Speed and Other Parameters; Validity of Results: Total Instructions and Cycles, Total Reference to Caches; Overall MCPI Comparison: Cache Size Effect, Cache Block Size Effect, Set Associativity Effect, Hardware Prefetch Algorithms, Software Based Prefetch Algorithms; L2 Cache and Main Memory MCPI Comparison: Cache Size Effect, Cache Block Size Effect, Set Associativity Effect). 6. Conclusion. 7. Future Directions (Prefetch Buffer; Dissimilar L1-L2 Management; Combined LRU/MRU Replacement Policy; N Loops Look-ahead).

    Development and implementation of a combined discrete and finite element multibody dynamics simulation environment

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Civil and Environmental Engineering, 2001. Includes bibliographical references (p. [195]-198) and index. Some engineering applications and physical phenomena involve multiple bodies that undergo large displacements involving collisions between the bodies. Considering the difficulties and cost associated with conducting physical experiments of such systems, there is a demand for numerical simulation capabilities. The discrete element methods (DEM) are numerical techniques that have been specifically developed to facilitate simulations of distinct bodies that interact with each other through contact forces. In DEM the simulated bodies are typically assumed to be infinitely rigid. However, there are multibody systems for which it is useful to take into account the deformability of the simulated bodies. The objective of this research is to incorporate deformability in DEM, enabling the evaluation of the stress and strain distributions within simulated bodies during simulation. In order to achieve this goal, an Updated Lagrangian (UL) Finite Element (FE) formulation and an explicit time integration scheme have been employed together with some simplifying assumptions to linearize this highly nonlinear contact problem and obtain solutions with realistic computational cost. An object-oriented, extendable computational tool has been built specifically to allow us to simulate multiple distinct bodies that interact through contact forces, allowing selected bodies to be deformable. Database technology has also been utilized in order to efficiently handle the huge amounts of computed results. By Petros Komodromos. Ph.D.
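A minimal 1-D sketch of the rigid-body DEM starting point described above (all masses, stiffnesses, radii, and time steps are invented for illustration): two discs interact only through a penalty contact force proportional to their overlap, and motion is advanced with an explicit time-integration scheme.

```python
# Toy DEM: two equal-mass rigid discs on a line, a linear penalty
# spring acting on overlap, and explicit (symplectic Euler) stepping.
def simulate(x1, v1, x2, v2, r=0.5, k=1e4, m=1.0, dt=1e-4, steps=20000):
    for _ in range(steps):
        overlap = (r + r) - (x2 - x1)            # > 0 when discs touch
        f = k * overlap if overlap > 0 else 0.0  # penalty contact force
        a1, a2 = -f / m, f / m                   # equal and opposite
        v1 += a1 * dt; v2 += a2 * dt             # update velocities
        x1 += v1 * dt; x2 += v2 * dt             # then positions
    return v1, v2

# Two discs launched toward each other collide and, for equal masses
# and no damping, approximately exchange velocities.
v1, v2 = simulate(0.0, 1.0, 3.0, -1.0)
```

The penalty-spring linearization stands in for the contact treatment the abstract alludes to; replacing each rigid disc with an Updated Lagrangian FE mesh is precisely the extension the thesis pursues.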

    NASA Tech Briefs, June 1996

    Get PDF
    Topics: New Computer Hardware; Electronic Components and Circuits; Electronic Systems; Physical Sciences; Materials; Computer Programs; Mechanics; Machinery/Automation; Manufacturing/Fabrication; Mathematics and Information Sciences; Books and Reports

    Fundamentals

    Get PDF
    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters.
