129 research outputs found

    Design of multimedia processor based on metric computation

    Get PDF
    Media-processing applications, such as signal processing, 2D and 3D graphics rendering, and image compression, are the dominant workloads in many embedded systems today. The real-time constraints of these media applications place taxing demands on today's processors, which must also meet low-cost, low-power, and short design-time targets. To meet these challenges, a fast and efficient strategy is to upgrade a low-cost general-purpose processor core. The approach is based on customizing a general RISC processor core according to the requirements of the target multimedia application. Thus, if the extra cost is justified, the general-purpose processor (GPP) core can be reinforced with instruction-level coprocessors, coarse-grain dedicated hardware, ad hoc memories, or additional GPP cores, so that the final design is tailored to the application requirements. The proposed approach consists of three main steps: the first is the analysis of the targeted application using efficient metrics; the second is the selection of an appropriate architecture template based on the results and recommendations of the first step; the third is the architecture generation. The approach is evaluated on various image and video algorithms, demonstrating its feasibility.
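    A minimal sketch of how such a metric-driven flow could be organized, assuming hypothetical metrics (compute intensity, data-level parallelism, working-set size) and threshold values that are not taken from the paper:

```python
# Hypothetical sketch of the three-step flow: (1) compute application
# metrics, (2) map them to an architecture template, (3) hand the
# resulting configuration to architecture generation. Metric names and
# thresholds are illustrative assumptions, not the paper's.
from dataclasses import dataclass


@dataclass
class AppMetrics:
    compute_intensity: float    # operations per byte of memory traffic
    data_parallelism: float     # fraction of independent loop iterations
    working_set_kb: int         # estimated working-set size


def select_template(m: AppMetrics) -> dict:
    """Step 2: choose how to extend the base RISC core."""
    template = {"base_core": "generic_risc", "coprocessor": None,
                "dedicated_hw": False, "scratchpad_kb": 0}
    if m.data_parallelism > 0.8:
        template["coprocessor"] = "simd_unit"         # exploit data-level parallelism
    if m.compute_intensity > 16:
        template["dedicated_hw"] = True               # coarse-grain accelerator pays off
    if m.working_set_kb <= 64:
        template["scratchpad_kb"] = m.working_set_kb  # small ad hoc on-chip memory
    return template


if __name__ == "__main__":
    # Step 1 would come from profiling the application; example numbers here.
    metrics = AppMetrics(compute_intensity=24.0, data_parallelism=0.9,
                         working_set_kb=32)
    print(select_template(metrics))    # step 3 consumes this configuration
```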

    Software and hardware methods for memory access latency reduction on ILP processors

    Get PDF
    While microprocessors have doubled their speed every 18 months, the performance of memory systems has continued to lag behind. To address the speed gap between CPU and memory, a standard multi-level caching organization is built so that data can be served quickly before they have to be fetched from the DRAM core. The existence of these caches in a computer system, such as L1, L2, L3, and the DRAM row buffers, does not mean that data locality is automatically exploited. Effective use of the memory hierarchy depends mainly on how data are allocated and how memory accesses are scheduled. In this dissertation, we propose several novel software and hardware techniques to effectively exploit data locality and significantly reduce memory access latency.

    We first present a case study at the application level that restructures memory-intensive programs by using program-specific knowledge. This study targets bit-reversals, a set of data-reordering operations used extensively in scientific computing programs such as the FFT, whose special access pattern can cause severe cache conflicts. We propose several software methods, including padding and blocking, to restructure the programs and reduce those conflicts; our methods outperform existing ones on both uniprocessor and multiprocessor systems.

    The access latency of the DRAM core has become increasingly long relative to CPU speed, making memory accesses an execution bottleneck. To reduce the frequency of DRAM core accesses and thereby shorten the overall memory access latency, we conducted three studies at this level of the memory hierarchy. First, motivated by our evaluation of the performance role of the DRAM row buffer and our analysis of the causes of its access conflicts, we propose a simple and effective memory interleaving scheme that reduces or even eliminates row buffer conflicts. Second, we propose a fine-grain priority scheduling scheme that reorders data accesses on multi-channel memory systems, effectively exploiting the available bus bandwidth and access concurrency. Finally, we evaluate the design of cached DRAM and its organization alternatives for ILP processors, and we propose a new memory hierarchy that uses cached DRAM to build a very large off-chip cache; this structure outperforms a standard memory system with an off-chip L3 cache for memory-intensive applications.

    Memory access latency has become a major performance bottleneck for memory-intensive applications, and as long as DRAM remains the most cost-effective technology for main memory, the problem will persist. The studies in this dissertation address this issue with software and hardware schemes that are effective and directly applicable to real-world memory system designs and implementations. They also provide guidance for application programmers seeking to understand the performance implications of the memory system, and for system architects optimizing memory hierarchies.
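    As an illustration of the bit-reversal conflict problem and the padding remedy mentioned above, the following sketch models a direct-mapped cache and counts how many exchanged pairs (i, bit-reverse(i)) map to the same cache set with and without a small per-row pad; the cache geometry and padded layout are assumed for illustration and are not the dissertation's implementation:

```python
# Illustrative model (not the dissertation's code) of why padding helps
# bit-reversal reordering: without padding, many exchanged elements map
# to the same set of a direct-mapped cache; a small per-row pad skews
# the address mapping. Cache geometry and layout below are assumptions.
LINE_BYTES = 64
NUM_SETS = 128            # direct-mapped cache: set = (addr // line) % sets
ELEM_BYTES = 8            # double-precision element


def bit_reverse(i: int, bits: int) -> int:
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def cache_set(index: int, pad_elems: int, row_len: int) -> int:
    # View the 1-D array as rows of row_len elements, each followed by a pad.
    row, col = divmod(index, row_len)
    addr = (row * (row_len + pad_elems) + col) * ELEM_BYTES
    return (addr // LINE_BYTES) % NUM_SETS


def conflicting_pairs(n_bits: int, pad: int, row_len: int = 128) -> int:
    n = 1 << n_bits
    count = 0
    for i in range(n):
        j = bit_reverse(i, n_bits)
        if i < j and cache_set(i, pad, row_len) == cache_set(j, pad, row_len):
            count += 1        # i and bit_reverse(i) would evict each other
    return count


if __name__ == "__main__":
    print("conflicting pairs, no padding:", conflicting_pairs(14, pad=0))
    print("conflicting pairs, pad = 1   :", conflicting_pairs(14, pad=1))
```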

    MINING AND VERIFICATION OF TEMPORAL EVENTS WITH APPLICATIONS IN COMPUTER MICRO-ARCHITECTURE RESEARCH

    Get PDF
    Computer simulation programs are essential tools for scientists and engineers who need to understand a particular system of interest. As expected, the complexity of the software increases with the depth of the model used. In addition to the exigent demands of software engineering, verification of simulation programs is especially challenging because the models represented are complex and ridden with unknowns that developers discover in an iterative process. Managing such complexity requires advanced verification techniques that continually match the intended model against the implemented one. The main goal of this research is therefore to design a useful verification and validation framework that can identify model representation errors and is applicable to generic simulators. The framework developed and implemented here consists of two parts. The first part is the First-Order Logic Constraint Specification Language (FOLCSL), which enables users to specify the invariants of the model under consideration. From the first-order logic specification, the FOLCSL translator automatically synthesizes a verification program that reads the event trace generated by a simulator and signals whether all invariants are respected. The second part mines the temporal flow of events using a newly developed representation called the State Flow Temporal Analysis Graph (SFTAG). While the first part seeks assurance of implementation correctness by checking that the model invariants hold, the second part derives an extended model of the implementation and thus enables a deeper understanding of what was implemented. The main application studied in this work is the validation of the timing behavior of micro-architecture simulators; the study includes SFTAGs generated for a wide set of benchmark programs and their analysis using several artificial intelligence algorithms. The case studies and experiments conducted show how this work improves computer architecture research and verification processes.
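    A minimal sketch of the kind of checker that could be synthesized from such an invariant specification, assuming an illustrative trace format (one "cycle event instruction" entry per event) and an example invariant (every issued instruction is committed exactly once); neither is FOLCSL's actual syntax:

```python
# Minimal sketch of a synthesized trace checker: scan a simulator event
# trace and report any violation of the invariant "every issued
# instruction is committed exactly once". The trace format and the
# invariant are illustrative assumptions, not the framework's.
from collections import defaultdict


def check_trace(lines):
    outstanding = defaultdict(int)     # instruction id -> issues not yet committed
    violations = []
    for lineno, line in enumerate(lines, 1):
        fields = line.split()
        if len(fields) != 3:
            continue                   # ignore malformed or unrelated lines
        _cycle, event, insn = fields
        if event == "issue":
            outstanding[insn] += 1
        elif event == "commit":
            if outstanding[insn] == 0:
                violations.append((lineno, f"commit of {insn} without prior issue"))
            else:
                outstanding[insn] -= 1
    for insn, n in outstanding.items():
        if n:
            violations.append((None, f"{insn} issued but never committed ({n}x)"))
    return violations


if __name__ == "__main__":
    trace = ["1 issue i0", "2 issue i1", "4 commit i0", "7 commit i1"]
    for where, msg in check_trace(trace) or [(None, "all invariants hold")]:
        print(where, msg)
```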

    FBsim and the Fully Buffered DIMM Memory System Architecture

    Get PDF
    As DRAM device data rates increase to keep pace with ever-increasing memory request rates, parallel-bus limitations and cost constraints require a sharp decrease in the load on the multi-drop buses between the devices and the memory controller, limiting the memory system's scalability and leaving it unable to meet the capacity requirements of modern server and workstation applications. A new technology, the Fully Buffered DIMM (FB-DIMM) architecture, is being introduced to address these challenges. FB-DIMM uses narrower, faster, buffered point-to-point channels to meet memory capacity and throughput requirements at the price of latency. This study provides a detailed look at the proposed architecture and its adoption, introduces an FB-DIMM simulation model, the FBsim simulator, and uses it to explore the design space of the new technology, identifying and experimentally demonstrating some of its strengths, weaknesses, and limitations, and uncovering future paths for academic research in the field.
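    A back-of-the-envelope sketch of the latency cost of the daisy-chained point-to-point channel described above: each DIMM farther from the controller adds a buffer pass-through delay on both the command and data paths. All timing values are assumptions for illustration, not figures from FBsim:

```python
# Back-of-the-envelope model of the latency/capacity trade-off described
# above: commands and data traverse a daisy chain of buffers, so DIMMs
# farther down the channel pay extra pass-through delay in both
# directions. All timing values are assumptions, not FBsim parameters.
DRAM_ACCESS_NS = 45.0     # device access time once the command arrives (assumed)
BUFFER_HOP_NS = 2.0       # per-buffer pass-through delay (assumed)
SERIALIZE_NS = 6.0        # serialization on the narrow channel links (assumed)


def fbdimm_read_latency(dimm_position: int) -> float:
    """Idle-channel read latency to the DIMM at a 1-based position on the chain."""
    hops = dimm_position - 1                          # buffers crossed before the target
    command_path = SERIALIZE_NS + hops * BUFFER_HOP_NS
    return_path = SERIALIZE_NS + hops * BUFFER_HOP_NS
    return command_path + DRAM_ACCESS_NS + return_path


if __name__ == "__main__":
    for pos in (1, 4, 8):             # FB-DIMM channels support up to 8 DIMMs
        print(f"DIMM {pos}: ~{fbdimm_read_latency(pos):.1f} ns")
```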

    Three Essays on Strategic Behavior and Policy.

    Full text link
    This dissertation contains three chapters that study the strategic behavior of economic agents in environments shaped by legal and regulatory policy. Two chapters examine strategic behavior as it pertains to the market outcomes of firms competing as oligopolists; a third studies the negotiation outcomes of a firm and a consumer bargaining in the shadow of the law. Each chapter extends or applies fundamental models of strategic behavior to develop testable implications, then exploits the historical change induced by a policy intervention to empirically test the model's mechanisms, conditions, and predictions, and finally interprets the meaning of the empirical estimates for policy toward antitrust and other areas of law. The first chapter analyzes the sustainability and consequences of collusion in a high-technology industry with learning-by-doing. It shows how the dynamic cost savings that occur through learning, combined with multiproduct competition, can reverse the standard predictions of static supergame models of collusion. It illustrates this by building an oligopolistic model of learning-by-doing embedded within a supergame, and it estimates the price change attributable to collusion using firm-level data from before, during, and after explicit collusion in the electronic memory chip market. The second chapter assesses the predictions of the Priest-Klein model of pre-trial settlement bargaining. It applies the model to an auto insurance negotiation setting to show how the parties' uncertainty about a legal standard drives the likelihood of claim settlement. It then exploits variation across states and over time in the adoption of tort liability for insurance bad faith to find that tort liability significantly increased the likelihood of a claim settling with litigation in the short term but significantly reduced this likelihood in the long term. The third chapter uses data from a price-fixing cartel in the Korean petrochemical industry to test empirical screens of collusion. It exploits the information disclosed by the Korean antitrust regulator to formulate a hypothesis about the effect of collusion on price-cost asymmetry, and it uses price data from before and after the breakdown of the cartel to find evidence consistent with a change from high to low price-cost asymmetry across the cartel's collapse.
    PhD, Business Administration, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/113443/1/dasmat_1.pd

    The Future of Semiconductor Intellectual Property Architectural Blocks in Europe

    Get PDF
    Semiconductor intellectual property (IP) blocks, also known as IP cores, are reusable design components that are used to build advanced integrated circuits (ICs). It is typically impossible to create new IC designs without pre-designed IP blocks as a starting point. These design components are called "intellectual property" blocks because they are traded as rights to use and copy the design. Firms that focus on this business model are often called "chipless" semiconductor firms. IP cores are perhaps the most knowledge-intensive link in the information economy value chain; they define the capabilities of billions of electronic devices produced every year. As all products become increasingly intelligent and embedded with information processing and communication capabilities, future developments in semiconductor IP will have a profound impact on the overall knowledge economy and society. At present, the IC industry is approaching the most fundamental technological disruption in its history. The rapid incremental innovation that has led to exponential growth in the number of transistors on a chip, and has expanded the applications of ICT to all areas of human life, is about to end. This discontinuity, the end of semiconductor scaling, opens up new business opportunities and shifts the focus of ICT research to new areas. The main objective of this study is to describe the current state and potential future developments in semiconductor IP, and to relate the outcomes of the study to policy discussions relevant to the EU and its Member States.
    JRC.J.4 - Information Society

    Power management techniques for conserving energy in multiple system components

    Get PDF
    Energy consumption is a limiting constraint for both embedded and high-performance systems. The CPU core, caches, and memory contribute a large fraction of the energy consumption in most computing systems, so reducing the energy consumed in these components can significantly reduce the system's overall energy consumption. However, applying multiple independent power management policies in the same system (one for each component) may cause them to interfere with one another and, in some cases, increase the combined energy consumption.

    In this dissertation, I present three power management techniques that each target more than a single component, with the focus on reducing the total energy consumption of processor, cache, and memory combinations. First, I present a memory-aware processor power management technique that uses collaboration between the OS and the compiler; its objectives are to (1) finish the application's execution before its deadline and (2) minimize the combined energy consumption of the processor and the memory. Second, I present an Integrated Dynamic Voltage Scaling (IDVS) technique for processor and on-chip cache power management in multi-voltage-domain systems; IDVS coordinates power management decisions across voltage domains rather than applying them in isolation within each domain. Third, I present a Power-Aware Cached DRAM (PA-CDRAM) memory organization that reduces the energy consumption of DRAM memory and off-chip caches; PA-CDRAM exploits the high internal memory bandwidth by bringing the off-chip caches "closer" to the memory, which also improves overall performance.

    The techniques in this thesis highlight the importance of designing power management schemes that consider multiple components and their interactions (in terms of power and performance) rather than applying multiple isolated power management policies. This work should lay the foundation for further research in integrated power management, where a single power manager controls many system components.
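    A minimal sketch of the first technique's objective, choosing a processor frequency that meets the deadline while minimizing the combined CPU and memory energy; the workload split and the power model are illustrative assumptions rather than the dissertation's model:

```python
# Minimal sketch of the memory-aware DVS objective: among frequencies
# that still meet the deadline, choose the one with the lowest combined
# CPU + memory energy, since running the CPU slower also stretches the
# time the memory stays powered. Workload and power numbers are assumed.
CPU_CYCLES = 4e8          # cycles of on-chip work in the task (assumed)
MEM_STALL_S = 0.15        # memory stall time, roughly frequency-independent (assumed)
DEADLINE_S = 0.60         # task deadline (assumed)
MEM_POWER_W = 1.2         # memory power while the task runs (assumed)


def exec_time(freq_hz: float) -> float:
    return CPU_CYCLES / freq_hz + MEM_STALL_S


def total_energy(freq_hz: float) -> float:
    f_ghz = freq_hz / 1e9
    cpu_power = 0.4 + 1.5 * f_ghz ** 3            # static + dynamic (V tracks f)
    cpu_time = CPU_CYCLES / freq_hz
    return cpu_power * cpu_time + MEM_POWER_W * exec_time(freq_hz)


if __name__ == "__main__":
    candidates_hz = [f * 1e9 for f in (0.6, 0.8, 1.0, 1.2, 1.6, 2.0)]
    feasible = [f for f in candidates_hz if exec_time(f) <= DEADLINE_S]
    best = min(feasible, key=total_energy)
    print(f"chosen frequency: {best / 1e9:.1f} GHz, "
          f"time {exec_time(best):.2f} s, energy {total_energy(best):.2f} J")
```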