
    Data Pump Architecture Simulator and Performance Model

    The Data Pump Architecture (DPA) is a novel non-von-Neumann computer architecture emphasizing efficient use of on-chip SRAM and off-chip DRAM bandwidth. The DPA is parameterized by local memory size, memory bandwidth, vector length, and number of compute processors, allowing the architecture configuration to be balanced for a given set of computations. The Data Pump Architecture Simulator and Performance Model is a functional simulator providing an implementation of the DPA as a software library. Simulation at this level provides the user with the ability to test algorithms for correctness as well as performance. A hardware designer may then use the Simulator as a tool to experimentally determine the effects of architecture parameters and software algorithms through performance data provided by the Simulator's model. The benefit of the approach described here, compared to alternatives such as logic/gate-level or RTL/HDL software emulation, is primarily a reduction in the time needed 1) for a designer to modify and compile an HDL design and 2) to perform the simulation. The Simulator serves as a bridge between the application-specific hardware and software design. The result is the ability to use the Simulator as part of a framework for rapidly investigating the construction of an optimal system architecture for a given set of algorithms. Investigations with the Walsh-Hadamard transform (WHT), a prototypical digital signal processing transform, are shown. The SPIRAL (www.spiral.net) code generation system is used to explore the space of WHT algorithms while the Simulator is used to explore the performance and trade-offs of different DPA configurations. Ph.D., Computer Science -- Drexel University, 201
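    As a rough illustration of the kind of trade-off such a parameterized configuration exposes, the sketch below estimates whether a transform of a given size is compute-bound or bandwidth-bound under a simple roofline-style cost model. The parameter names and cost formulas are assumptions made for this illustration; this is not the Simulator's actual performance model.

```python
# Toy, roofline-style estimate for a parameterized architecture configuration.
# Illustrative sketch only; not the DPA Simulator's actual cost model.

def estimate_cycles(n, ops_per_element, bytes_per_element,
                    num_processors, ops_per_cycle,
                    bandwidth_bytes_per_cycle, local_mem_bytes):
    """Return (cycles, limiting factor) for a size-n transform."""
    compute_cycles = n * ops_per_element / (num_processors * ops_per_cycle)

    # If the working set exceeds local memory, assume it must be streamed
    # from DRAM once per pass over the data.
    working_set = n * bytes_per_element
    passes = max(1, -(-working_set // local_mem_bytes))  # ceiling division
    memory_cycles = passes * working_set / bandwidth_bytes_per_cycle

    if compute_cycles >= memory_cycles:
        return compute_cycles, "compute-bound"
    return memory_cycles, "bandwidth-bound"

if __name__ == "__main__":
    # Example: a 2^20-point WHT-like transform, ~log2(n) ops per element.
    cycles, limit = estimate_cycles(n=1 << 20, ops_per_element=20,
                                    bytes_per_element=8, num_processors=4,
                                    ops_per_cycle=4,
                                    bandwidth_bytes_per_cycle=16,
                                    local_mem_bytes=256 * 1024)
    print(f"~{cycles:,.0f} cycles ({limit})")
```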

    AxBench: A Benchmark Suite for Approximate Computing Across the System Stack

    Research areas: Approximate computing, Computer architecture. As the end of Dennard scaling looms, both the semiconductor industry and the research community are exploring innovative solutions that allow energy efficiency and performance to continue to scale. Approximate computing has become one of the viable techniques to perpetuate the historical improvements in the computing landscape. As approximate computing attracts more attention in the community, having a general, diverse, and representative set of benchmarks to evaluate different approximation techniques becomes necessary. In this paper, we develop and introduce AxBench, a general, diverse, and representative multi-framework set of benchmarks for CPUs, GPUs, and hardware design, comprising 29 benchmarks in total. We judiciously select and develop each benchmark to cover a diverse set of domains such as machine learning, scientific computation, signal processing, image processing, robotics, and compression. AxBench comes with the necessary annotations to mark the approximable region of code and the application-specific quality metric to assess the output quality of each application. AxBench, with this set of annotations, facilitates the evaluation of different approximation techniques. To demonstrate its effectiveness, we evaluate three previously proposed approximation techniques using AxBench benchmarks: loop perforation [1] and neural processing units (NPUs) [2–4] on CPUs and GPUs, and Axilog [5] on dedicated hardware. We find that (1) NPUs offer higher performance and energy efficiency as compared to loop perforation on both CPUs and GPUs, (2) while NPUs provide considerable efficiency gains on CPUs, there still remains significant opportunity to be explored by other approximation techniques, (3) unlike on CPUs, NPUs offer the full benefits of approximate computation on GPUs, and (4) considerable opportunity remains to be explored by innovative approximate computation techniques at the hardware level after applying Axilog.
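    Loop perforation, one of the techniques evaluated with AxBench, trades output quality for reduced work by skipping a fraction of loop iterations and checking the result against an application-specific quality metric. The sketch below is a minimal, generic illustration of that idea; the workload, data, and error metric are invented for the example and are not taken from AxBench.

```python
# Minimal illustration of loop perforation: skip a fraction of the loop
# iterations and measure the quality loss with an application-level metric.
# Generic sketch; not AxBench code.

def mean_exact(values):
    return sum(values) / len(values)

def mean_perforated(values, skip_factor):
    """The perforated loop: process only every skip_factor-th element."""
    kept = values[::skip_factor]
    return sum(kept) / len(kept)

def relative_error(approx, exact):
    """A simple application-specific quality metric."""
    return abs(approx - exact) / abs(exact)

if __name__ == "__main__":
    data = [float(i % 97) for i in range(100_000)]
    exact = mean_exact(data)
    for skip in (2, 4, 8):
        approx = mean_perforated(data, skip)
        print(f"skip={skip}: ~{100 // skip}% of the work, "
              f"relative error {relative_error(approx, exact):.4%}")
```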

    Towards a table top quantum computer

    Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 1999. Includes bibliographical references (leaves 135-139). In the early 1990s, quantum computing proved to be an enticing theoretical possibility but an extremely difficult experimental challenge. Two advances have made experimental quantum computing demonstrable: quantum error correction, and bulk thermal quantum computing using nuclear magnetic resonance (NMR). Simple algorithms have been implemented on large, commercial NMR spectrometers that are expensive and cumbersome. The goal of this project is to construct a table-top quantum computer that can match and eventually exceed the performance of commercial machines. This computer should be an inexpensive, easy-to-use machine that can be considered more a computer than its "supercomputer" counterparts. For this thesis, the goal is to develop a low-cost, table-top quantum computer capable of implementing the simple quantum algorithms demonstrated thus far in the community, but that is also amenable to the many scaling issues of practical quantum computing. Understanding these scaling issues requires developing a theoretical understanding of the signal enhancement techniques and fundamental noise sources of this powerful but delicate system. Complementary to quantum computing, this high-performance but low-cost NMR machine will be useful for a number of medical, low-cost sensing, and tagging applications due to the unique properties of NMR: the ability to sense and manipulate the information content of materials on macroscopic and microscopic scales. Yael G. Maguire. S.M.

    Exploring Processor and Memory Architectures for Multimedia

    Multimedia has become one of the cornerstones of 21st-century society and, when combined with mobility, has enabled a tremendous societal evolution. However, joining these two concepts introduces many technical challenges. These range from having sufficient performance for handling multimedia content to having the battery stamina for acceptable mobile usage. Projecting where we are heading, we see these issues becoming ever more challenging with increased mobility as well as advancements in multimedia content, such as the introduction of stereoscopic 3D and augmented reality. The increased performance needs for handling multimedia come not only from an ongoing step-up in resolution, from QVGA (320x240) to Full HD (1920x1080), a 27x increase in less than half a decade, but also from codec evolution (MPEG-2 to H.264 AVC) that adds to the computational load. To meet these performance challenges there have been processing and memory architecture advances (SIMD, out-of-order superscalarity, multicore processing and heterogeneous multilevel memories) in the mobile domain, in conjunction with ever-increasing operating frequencies (200MHz to 2GHz) and on-chip memory sizes (128KB to 2-3MB). At the same time there is an increase in requirements for mobility, placing higher demands on battery-powered systems despite the steady increase in battery capacity (500 to 2000mAh). This leaves a negative net result in terms of battery capacity versus performance advances. In order to make optimal use of these architectural advances and to meet the power limitations in mobile systems, there is a need for an overall approach to how best to utilize these systems. The right trade-off between performance and power is crucial. On top of these constraints, the flexibility aspects of the system need to be addressed. All this makes it very important to reach the right architectural balance in the system. The first goal of this thesis is to examine multimedia applications and propose a flexible solution that can meet the architectural requirements in a mobile system. The second is to propose an automated methodology for optimally mapping multimedia data and instructions onto a heterogeneous multilevel memory subsystem. The proposed methodology uses constraint programming to solve a multidimensional optimization problem. Results from this work indicate that using today's most advanced mobile processor technology together with a multi-level heterogeneous on-chip memory subsystem can meet the performance requirements for handling multimedia. By utilizing the automated optimal memory mapping method presented in this thesis, lower total power consumption can be achieved whilst performance for multimedia applications is improved, through enhanced memory management. This is achieved through reduced external accesses and better reuse of memory objects. The automatic method shows high accuracy, up to 90%, for predicting multimedia memory accesses for a given architecture.
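    To give a flavour of the mapping problem described above (placing data objects into a heterogeneous memory hierarchy to minimize access energy under capacity constraints), the sketch below solves a tiny instance by exhaustive search. The thesis formulates the real problem with constraint programming; the object names, sizes, access counts, and energy costs here are invented for the example.

```python
# Toy memory-mapping problem: place data objects in on-chip SRAM or off-chip
# DRAM to minimize access energy under an SRAM capacity limit. Solved by
# exhaustive search over a tiny instance; the thesis formulates the real
# problem with constraint programming. All numbers are illustrative.

from itertools import product

# (name, size in KB, accesses per frame)
objects = [("luma_block", 16, 40_000),
           ("chroma_block", 8, 20_000),
           ("coeff_buffer", 32, 5_000),
           ("motion_vectors", 4, 15_000)]

SRAM_CAPACITY_KB = 32
ENERGY_PER_ACCESS = {"sram": 1.0, "dram": 20.0}  # arbitrary energy units

best = None
for assignment in product(("sram", "dram"), repeat=len(objects)):
    sram_used = sum(size for (_, size, _), mem in zip(objects, assignment)
                    if mem == "sram")
    if sram_used > SRAM_CAPACITY_KB:
        continue  # violates the capacity constraint
    energy = sum(accesses * ENERGY_PER_ACCESS[mem]
                 for (_, _, accesses), mem in zip(objects, assignment))
    if best is None or energy < best[0]:
        best = (energy, assignment)

energy, assignment = best
for (name, _, _), mem in zip(objects, assignment):
    print(f"{name:15s} -> {mem}")
print(f"total access energy: {energy:,.0f} units")
```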

    Indexed dependence metadata and its applications in software performance optimisation

    To achieve continued performance improvements, modern microprocessor design is tending to concentrate an increasing proportion of hardware on computation units with less automatic management of data movement and extraction of parallelism. As a result, architectures increasingly include multiple computation cores and complicated, software-managed memory hierarchies. Compilers have difficulty characterizing the behaviour of a kernel in a general enough manner to enable automatic generation of efficient code in any but the most straightforward of cases. We propose the concept of indexed dependence metadata to improve application development and mapping onto such architectures. The metadata represent both the iteration space of a kernel and the mapping of that iteration space from a given index to the set of data elements that iteration might use: thus the dependence metadata is indexed by the kernel’s iteration space. This explicit mapping allows the compiler or runtime to optimise the program more efficiently, and improves the program structure for the developer. We argue that this form of explicit interface specification reduces the need for premature, architecture-specific optimisation. It improves program portability, supports intercomponent optimisation and enables generation of efficient data movement code. We offer the following contributions: an introduction to the concept of indexed dependence metadata as a generalisation of stream programming, a demonstration of its advantages in a component programming system, the decoupled access/execute model for C++ programs, and how indexed dependence metadata might be used to improve the programming model for GPU-based designs. Our experimental results with prototype implementations show that indexed dependence metadata supports automatic synthesis of double-buffered data movement for the Cell processor and enables aggressive loop fusion optimisations in image processing, linear algebra and multigrid application case studies
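    One way to picture indexed dependence metadata: alongside the kernel, an access function maps each point of the iteration space to the data elements that point may read, so a compiler or runtime can stage those elements before execution. The Python sketch below is an illustrative analogue of that decoupled access/execute split; the real system targets C++ components, and the stencil kernel and blocking scheme here are assumptions made for the example.

```python
# Illustrative analogue of indexed dependence metadata: an "access" function
# maps an iteration index to the input elements that iteration reads, so a
# runtime can stage (prefetch) each block of iterations before executing it.
# Sketch only; the actual system is a C++ component programming model.

def access(i, n, radius=1):
    """Dependence metadata for a 1-D stencil: indices read by iteration i."""
    return range(max(0, i - radius), min(n, i + radius + 1))

def execute(i, staged):
    """The compute kernel: average of the staged neighbourhood."""
    values = staged[i]
    return sum(values) / len(values)

def run(data, block_size=4):
    n = len(data)
    out = []
    for start in range(0, n, block_size):
        block = range(start, min(start + block_size, n))
        # Use the metadata to gather everything the block needs up front;
        # this is where double-buffered data movement could be issued.
        staged = {i: [data[j] for j in access(i, n)] for i in block}
        out.extend(execute(i, staged) for i in block)
    return out

if __name__ == "__main__":
    print(run([1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0]))
```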

    A novel spatio-temporal scheme for reducing the rate of false positives in bloom filter based URL-caching

    Master's thesis in information and communication technology 2010 – Universitetet i Agder, Grimstad. Achieving efficient use of available resources is an important problem in the field of web mining. Monitoring and analyzing the web is extremely resource demanding, and therefore, more efficient use of resources often translates directly into improved web monitoring coverage and accuracy. One important subproblem is to reduce the memory consumption of the URL cache in a web crawler system. Utilizing the space-efficient Bloom filter data structure as a URL cache will reduce memory consumption. However, the Bloom filter introduces false positives, leading to loss of valuable web content when the filter is utilized as a URL cache in a web crawler system. Based on this problem of false positives, this thesis proposes three novel strategies, namely a temporal, a spatial, and a spatio-temporal strategy, each aiming to reduce the false positive rate introduced by the Bloom filter. During testing and evaluation of the strategies, we discovered that both the spatial and the temporal strategy are able to reduce the false positive rate of the Bloom filter. The two strategies were then combined to test whether it is possible to further decrease the false positive probability. Testing and evaluation of the combined strategy shows that it does yield a reduction in the false positive probability.
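    For context, the sketch below shows a plain Bloom filter used as a URL cache and how false positives arise (a never-inserted URL that the filter nevertheless reports as seen, and which a crawler would therefore wrongly skip). The hash construction and sizes are arbitrary, and the temporal, spatial, and spatio-temporal strategies proposed in the thesis are not reproduced here.

```python
# A plain Bloom filter used as a URL cache, illustrating false positives.
# Sketch only; the temporal/spatial strategies from the thesis are not shown.

import hashlib

class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive k hash positions by salting a single hash function.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

if __name__ == "__main__":
    cache = BloomFilter(num_bits=1024, num_hashes=3)
    for i in range(200):
        cache.add(f"http://example.org/page/{i}")  # URLs already crawled

    # Probe with URLs that were never inserted; any hit is a false positive,
    # i.e. a page the crawler would wrongly treat as already visited.
    probes = [f"http://example.org/new/{i}" for i in range(1_000)]
    false_positives = sum(1 for url in probes if url in cache)
    print(f"observed false positive rate: {false_positives / len(probes):.2%}")
```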

    Preserving privacy in edge computing

    Edge computing or fog computing enables real-time services for smart application users by storing data and services at the edge of the network. Edge devices in edge computing handle data storage and service provisioning. Therefore, edge computing has become a new norm for several delay-sensitive smart applications such as automated vehicles, ambient-assisted living, emergency response services, precision agriculture, and smart electricity grids. Despite this great potential, privacy threats are the main barriers to the success of edge computing. Attackers can leak private or sensitive information of data owners and modify service-related data to hamper service provisioning in edge computing-based smart applications. This research addresses privacy issues of the heterogeneous smart application data stored in edge data centers. From there, this study focuses on the development of privacy-preserving models for user-generated smart application data in edge computing and for edge service-related data, such as Quality-of-Service (QoS) data, to ensure unbiased service provisioning. We begin by developing privacy-preserving techniques for user data generated by smart applications using steganography, a data hiding technique. In steganography, sensitive user information is hidden within the non-sensitive portion of the data before the smart application data are outsourced, producing stego data for storage in the edge data center. A steganography approach must be reversible or lossless to be useful in privacy-preserving techniques. In this research, we focus on numerical (sensor data) and textual (DNA sequence and text) data steganography. Existing steganography approaches for numerical data are irreversible. Hence, we introduce a lossless or reversible numerical data steganography approach using Error Correcting Codes (ECC). Modern lossless text steganography approaches are mainly application-specific and lack imperceptibility, and DNA steganography requires a reference DNA sequence for the reconstruction of the original DNA sequence. Therefore, we present the first blind and lossless DNA sequence steganography approach based on the nucleotide substitution method in this study. In addition, a text steganography method is proposed that uses invisible characters and compression-based encoding to ensure reversibility and higher imperceptibility. Different experiments are conducted to validate the proposed methods in these studies. Searching the stored stego data in the edge data center without disclosing sensitive information is challenging. We present a privacy-preserving search framework for stego data in the edge data center that includes two methods. First, we present a keyword-based privacy-preserving search method that allows a user to send a search query as a hash string. However, this method does not support range queries. Therefore, we develop a range search method on stego data using an order-preserving encryption (OPE) scheme. In both cases, the search service provider retrieves the corresponding stego data without revealing any sensitive information. Several experiments are conducted to evaluate the performance of the framework. Finally, we present a privacy-preserving service computation framework using a Fully Homomorphic Encryption (FHE) based cryptosystem for ensuring the service provider's privacy during service selection and composition.
    Our contributions are twofold. First, we introduce a privacy-preserving service selection model based on encrypted Quality-of-Service (QoS) values of edge services for ensuring privacy. QoS values are encrypted using FHE. A distributed computation model for service selection using MapReduce is designed to improve efficiency. Second, we develop a composition model for edge services based on the functional relationship among edge services to optimize the service selection process. Various experiments are performed in both centralized and distributed computing environments to evaluate the performance of the proposed framework using a synthetic QoS dataset.
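    As a rough illustration of the keyword-based search idea (a user queries with a hash string rather than the keyword itself), the sketch below indexes stego records under keyed keyword hashes so that the search provider can match a hashed query without learning the keyword. The HMAC construction, key handling, and record layout are assumptions made for this example; the framework's actual protocol, the OPE-based range search, and the FHE components are not reproduced here.

```python
# Rough illustration of keyword search over hashed keywords: records are
# indexed under keyed keyword hashes, and the search provider matches a
# hashed query without seeing the keyword itself. Sketch only; the thesis's
# actual protocol, OPE range search, and FHE components are not shown.

import hashlib
import hmac

SECRET_KEY = b"owner-search-key"  # assumed to be shared with authorized users

def keyword_tag(keyword):
    """Deterministic keyed hash of a keyword (the 'hash string' query)."""
    return hmac.new(SECRET_KEY, keyword.lower().encode(), hashlib.sha256).hexdigest()

class SearchProvider:
    """Stores only opaque tags and record identifiers, never keywords."""
    def __init__(self):
        self.index = {}  # tag -> list of record ids

    def add_record(self, record_id, tags):
        for tag in tags:
            self.index.setdefault(tag, []).append(record_id)

    def search(self, tag):
        return self.index.get(tag, [])

if __name__ == "__main__":
    provider = SearchProvider()
    # The data owner uploads stego records indexed by hashed keywords.
    provider.add_record("stego-001", [keyword_tag("temperature"), keyword_tag("ward-3")])
    provider.add_record("stego-002", [keyword_tag("heart-rate")])

    # An authorized user queries with the hash string, not the keyword.
    print(provider.search(keyword_tag("temperature")))     # ['stego-001']
    print(provider.search(keyword_tag("blood-pressure")))  # []
```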

    Understanding Quantum Technologies 2022

    Understanding Quantum Technologies 2022 is a creative-commons ebook that provides a unique 360-degree overview of quantum technologies, from science and technology to geopolitical and societal issues. It covers quantum physics history, quantum physics 101, gate-based quantum computing, quantum computing engineering (including quantum error correction and quantum computing energetics), quantum computing hardware (all qubit types, including quantum annealing and quantum simulation paradigms; history, science, research, implementation and vendors), quantum enabling technologies (cryogenics, control electronics, photonics, components fabs, raw materials), quantum computing algorithms, software development tools and use cases, unconventional computing (potential alternatives to quantum and classical computing), quantum telecommunications and cryptography, quantum sensing, quantum technologies around the world, the societal impact of quantum technologies, and even quantum fake sciences. The main audience is computer science engineers, developers and IT specialists, as well as quantum scientists and students who want to acquire a global view of how quantum technologies work, particularly quantum computing. This version is an extensive update to the 2021 edition published in October 2021. Comment: 1132 pages, 920 figures, Letter format.