
    Speculative Data Distribution in Shared Memory Multiprocessors

    This work explores the possibility of using speculation at the directories in a cache-coherent non-uniform memory access multiprocessor architecture to improve performance by forwarding data to their destinations before requests are sent. It improves on previous consumer prediction techniques, showing how to construct a predictor that can trade off accuracy against coverage. This dissertation then explores the correct time to perform consumer prediction and shows how a directory protocol can incorporate such a scheme. The consumer-prediction-enhanced protocol that is developed reduces the runtime of a set of scientific benchmarks by 10%-20% without substantially affecting the runtime of the others; the benchmarks that benefit feature simple phased behavior and regularly distribute data to more than two processors. This work then explores the interaction of consumer prediction with two other forms of prediction: migratory prediction and last-touch prediction. It demonstrates a mechanism by which migratory prediction can be implemented using only the storage elements already present in a consumer predictor. Combining this migratory predictor with a consumer predictor produces greater speedups than either achieves individually. Finally, the signatures of the last-touch predictor can be applied to improve the performance of consumer prediction.
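
    As a rough illustration of the directory-side mechanism described above, the following Python sketch keeps a per-block history of past readers and predicts future consumers once a confidence threshold is met. The class and method names (ConsumerPredictor, record_access, predict) and the threshold-based policy are assumptions for illustration, not the dissertation's actual design, though the threshold is where the accuracy/coverage trade-off mentioned in the abstract would surface.

```python
# Hypothetical sketch of a directory-side consumer predictor, loosely based on
# the idea described in the abstract above (not the dissertation's design).
# Names such as ConsumerPredictor, record_access, and predict are illustrative.

from collections import defaultdict


class ConsumerPredictor:
    """Tracks which processors consumed each block and predicts future consumers."""

    def __init__(self, confidence_threshold=2):
        # Per-block history: processor id -> number of times it read the block.
        self.history = defaultdict(lambda: defaultdict(int))
        # A higher threshold trades coverage for accuracy.
        self.threshold = confidence_threshold

    def record_access(self, block_addr, reader_id):
        """Called by the directory when it services a read request."""
        self.history[block_addr][reader_id] += 1

    def predict(self, block_addr, writer_id):
        """Return the processors to which new data could be pushed speculatively."""
        counts = self.history.get(block_addr, {})
        return {p for p, n in counts.items() if n >= self.threshold and p != writer_id}


# Example: after a producer writes block 0x80, the directory could forward the
# fresh data to the predicted consumers before they issue read requests.
predictor = ConsumerPredictor(confidence_threshold=2)
for reader in (1, 2, 1, 2, 3):
    predictor.record_access(0x80, reader)
print(predictor.predict(0x80, writer_id=0))   # -> {1, 2}
```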

    A consistency architecture for hierarchical shared caches

    Hierarchical Cache Consistency (HCC) is a scalable cache-consistency architecture for chip multiprocessors in which caches are shared hierarchically. HCC’s cache-consistency protocol is embedded in the message-routing network that interconnects the caches, providing a distributed and scalable alternative to bus-based and directory-based consistency mechanisms. The HCC consistency protocol is “progressive” in that every message makes monotonic progress without timeouts, retries, negative acknowledgments, or retreating in any way. The latency is at most proportional to the diameter of the network. For HCC with a binary fat-tree network, the protocol requires at most 13 bits of additional state per cache line, no matter how large the system. We prove that the HCC protocol is deadlock free and provides sequential consistency.
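
    To make the hierarchical idea concrete, here is a small, hypothetical Python sketch in which each internal node of a binary cache tree keeps two presence bits per line, and invalidations walk only the subtrees that actually hold copies, so the work scales with tree height rather than system size. This is a generic illustration of hierarchical sharing state, not HCC's actual protocol or its 13-bit per-line encoding.

```python
# A minimal, hypothetical illustration of hierarchical sharing state in a
# binary tree of caches. This is NOT HCC's 13-bit per-line encoding; it only
# shows how constant-sized per-line state at each tree level lets invalidations
# follow the tree, bounding latency by the tree height (network diameter).


class TreeNode:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right
        # Two bits of sharing state per cache line at this node:
        # "is the line cached somewhere in my left / right subtree?"
        self.present = {}  # line -> [left_bit, right_bit]

    def record_fill(self, line, from_left):
        bits = self.present.setdefault(line, [False, False])
        bits[0 if from_left else 1] = True

    def invalidate(self, line):
        """Walk only the subtrees that actually hold the line."""
        hops = 1
        bits = self.present.pop(line, None)
        if bits:
            if bits[0] and self.left:
                hops += self.left.invalidate(line)
            if bits[1] and self.right:
                hops += self.right.invalidate(line)
        return hops  # grows with tree height, not with processor count


# Example: a two-level tree where line 0x40 is cached in both subtrees.
leaf_a, leaf_b = TreeNode(), TreeNode()
root = TreeNode(left=leaf_a, right=leaf_b)
root.record_fill(0x40, from_left=True)
root.record_fill(0x40, from_left=False)
print(root.invalidate(0x40))  # 3 hops: the root plus both children
```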

    Computer Aided Verification

    This open access two-volume set LNCS 13371 and 13372 constitutes the refereed proceedings of the 34th International Conference on Computer Aided Verification, CAV 2022, which was held in Haifa, Israel, in August 2022. The 40 full papers presented together with 9 tool papers and 2 case studies were carefully reviewed and selected from 209 submissions. The papers were organized in the following topical sections: Part I: invited papers; formal methods for probabilistic programs; formal methods for neural networks; software verification and model checking; hyperproperties and security; formal methods for hardware, cyber-physical, and hybrid systems. Part II: probabilistic techniques; automata and logic; deductive verification and decision procedures; machine learning; synthesis and concurrency. This is an open access book.

    Interactive High Performance Volume Rendering

    This thesis is about Direct Volume Rendering on high performance computing systems. As direct rendering methods do not create a lower-dimensional geometric representation, the whole scientific dataset must be kept in memory. Thus, this family of algorithms has a tremendous resource demand. Direct Volume Rendering algorithms are in general well suited to implementation on dedicated graphics hardware. Nevertheless, high performance computing systems often do not provide resources for hardware-accelerated rendering, so the visualization algorithm must be implemented for the available general-purpose hardware. Ever-growing datasets, which would otherwise have to be copied from the compute system to the scientist's workstation, and the need to review intermediate simulation results make porting Direct Volume Rendering to high performance computing systems highly relevant. The contribution of this thesis is twofold. As the first contribution, after devising a software architecture for general implementations of Direct Volume Rendering on highly parallel platforms, parallelization issues and implementation details for various modern architectures are discussed, resulting in a highly parallel implementation that targets several platforms. The second contribution is concerned with the display phase of the “Distributed Volume Rendering Pipeline”. Rendering on a high performance computing system typically implies displaying the rendered result at a remote location. This thesis presents a remote rendering technique that is capable of hiding latency and can thus be used in an interactive environment.
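
    For reference, the computational core of ray-casting-style Direct Volume Rendering is front-to-back compositing of samples along each viewing ray; the sketch below shows that kernel in plain Python with early ray termination. It is a generic, simplified illustration (the function name and parameters are assumptions), not the thesis's parallel implementation.

```python
# A minimal ray-casting sketch: front-to-back compositing of samples along one
# ray, with early ray termination. Generic Direct Volume Rendering logic for
# illustration only, not the thesis's highly parallel implementation.

def composite_ray(samples, opacity_cutoff=0.99):
    """samples: iterable of (color, alpha) pairs ordered front to back,
    where color is an (r, g, b) tuple and alpha is in [0, 1]."""
    acc_color = [0.0, 0.0, 0.0]
    acc_alpha = 0.0
    for color, alpha in samples:
        weight = (1.0 - acc_alpha) * alpha      # contribution of this sample
        for i in range(3):
            acc_color[i] += weight * color[i]
        acc_alpha += weight
        if acc_alpha >= opacity_cutoff:         # early ray termination
            break
    return tuple(acc_color), acc_alpha


# Example: two semi-transparent samples followed by an opaque one.
print(composite_ray([((1, 0, 0), 0.3), ((0, 1, 0), 0.5), ((0, 0, 1), 1.0)]))
```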

    Camera Creatures: Rhetorics of Light and Emerging Media

    Camera Creatures addresses the new media landscape in which cameras, in most situations, outnumber pens. The dissertation argues that despite the accessibility and power of imagemaking devices, there persists in the humanities and social sciences a hesitation to engage the possibilities for composing with optical media. A number of factors contributing to this trend are addressed, including the preference for image analysis over imagemaking practices, persistent assumptions of the camera's mechanical objectivity, and a tendency to teach visual invention as collage. As a counter-measure, a proposal is made for investment in the mediation of light, or 'photonic rhetorics.' To explore these effects in visual communication and the possibility of bringing them into practice, three emerging camera technologies are examined. The first, the photo app, focuses on the controversy surrounding embedded journalists who use social networks and the Hipstamatic camera phone application to relay stories of U.S. Marines deployed in Afghanistan. The chapter argues that the filters and shooting styles of these mobile apps encourage fluencies in the persuasive effects of light. The second camera technology, the video clip, addresses the long take as the predominant technique of everyday video-making. Film theory, video sharing trends, and circadian science contribute to a discussion of the rhythms of long-take shooting and its capability to expose both visual habits and the contingencies capable of disrupting them. The third site turns to video game 'shooters' and the virtual camera's construction of 'surrogate vision,' which the author argues is a critical tool for understanding the future of mediated interactivity in both physical and digital landscapes. The dissertation concludes with a pedagogical section devoted to conscientious cheating. Alongside theories of deliberate practice, 'cheating' is repurposed for education, offering new ways of testing the 'rules' of optical composition while discovering opportunities to intervene in light's constant mediation of perception.

    ICASE

    This report summarizes research conducted at the Institute for Computer Applications in Science and Engineering in the areas of (1) applied and numerical mathematics, including numerical analysis and algorithm development; (2) theoretical and computational research in fluid mechanics in selected areas of interest, including acoustics and combustion; (3) experimental research in transition and turbulence and aerodynamics involving Langley facilities and scientists; and (4) computer science.

    Analysing and Reducing Costs of Deep Learning Compiler Auto-tuning

    Deep Learning (DL) is significantly impacting many industries, including automotive, retail and medicine, enabling autonomous driving, recommender systems and genomics modelling, amongst other applications. At the same time, demand for complex and fast DL models is continually growing. The most capable models tend to exhibit the highest operational costs, primarily due to their large computational resource footprint and inefficient utilisation of the computational resources employed by DL systems. In an attempt to tackle these problems, DL compilers and auto-tuners emerged, automating the traditionally manual task of DL model performance optimisation. While auto-tuning improves model inference speed, it is a costly process, which limits its wider adoption within DL deployment pipelines. The high operational costs associated with DL auto-tuning have multiple causes. During operation, DL auto-tuners explore large search spaces consisting of billions of tensor programs to propose candidates that improve DL model inference latency. Subsequently, DL auto-tuners measure candidate performance in isolation on the target device, which constitutes the majority of auto-tuning compute time. Suboptimal candidate proposals, combined with their serial measurement on an isolated target device, lead to prolonged optimisation time and reduced resource availability, ultimately reducing the cost-efficiency of the process. In this thesis, we investigate the reasons behind prolonged DL auto-tuning and quantify their impact on optimisation costs, revealing directions for improved DL auto-tuner design. Based on these insights, we propose two complementary systems: Trimmer and DOPpler. Trimmer improves tensor program search efficacy by filtering out poorly performing candidates and controls end-to-end auto-tuning using cost objectives that monitor optimisation cost. Simultaneously, DOPpler breaks long-held assumptions about serial candidate measurement by successfully parallelising measurements intra-device, with minimal penalty to optimisation quality. Through extensive experimental evaluation of both systems, we demonstrate that they significantly improve the cost-efficiency of auto-tuning (by up to 50.5%) across a plethora of tensor operators, DL models, auto-tuners and target devices.
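
    The cost-aware filtering idea can be pictured as a two-stage loop: rank candidates with a cheap cost model, discard the poorly predicted ones, and measure the remainder on the device until a budget is exhausted. The Python sketch below is a hypothetical illustration of that idea; the function and parameter names are assumptions, not Trimmer's or DOPpler's actual interfaces.

```python
# Hypothetical sketch of cost-aware candidate filtering for auto-tuning: drop
# candidates a cheap cost model predicts to be slow, then measure the rest on
# the device until a time budget runs out. Names are illustrative assumptions,
# not the actual interfaces of Trimmer or DOPpler.

def tune(candidates, predict_latency, measure_latency,
         keep_fraction=0.2, budget_seconds=60.0):
    """candidates: iterable of tensor-program candidates.
    predict_latency: cheap cost-model estimate (seconds).
    measure_latency: expensive on-device measurement (seconds)."""
    # 1. Filter: keep only the most promising fraction according to the model.
    ranked = sorted(candidates, key=predict_latency)
    shortlist = ranked[:max(1, int(len(ranked) * keep_fraction))]

    # 2. Measure under a budget, tracking the best candidate found so far.
    best, best_latency, spent = None, float("inf"), 0.0
    for cand in shortlist:
        if spent >= budget_seconds:
            break
        latency = measure_latency(cand)
        spent += latency
        if latency < best_latency:
            best, best_latency = cand, latency
    return best, best_latency
```

    In this framing, DOPpler's contribution would correspond to replacing the serial measurement loop with concurrent intra-device measurements; the sketch keeps the loop serial for simplicity.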

    Data systems elements technology assessment and system specifications, issue no. 2

    The ability to satisfy the objectives of future NASA Office of Applications programs depends on technology advances in a number of areas of data systems. The hardware and software technology of end-to-end systems (data processing elements through ground processing, dissemination, and presentation) are examined in terms of state of the art, trends, and projected developments in the 1980 to 1985 timeframe. Capability is considered in terms of elements that are either commercially available or that can be implemented from commercially available components with minimal development.