136 research outputs found

    Error properties of Hartley transform algorithms

    Get PDF
    Originally presented as author's thesis (M.S. -- Massachusetts Institute of Technology), 1985."References": leaves [196]-197.Supported in part by the Advanced Research Projects Agency monitored by ONR under contract no. N00014-81-K-0742 Supported in part by the National Science Foundation under grant ECS-8407285Avideh Zakhor

    Hardware-Oriented Cache Management for Large-Scale Chip Multiprocessors

    Get PDF
    One of the key requirements to obtaining high performance from chip multiprocessors (CMPs) is to effectively manage the limited on-chip cache resources shared among co-scheduled threads/processes. This thesis proposes new hardware-oriented solutions for distributed CMP caches. Computer architects are faced with growing challenges when designing cache systems for CMPs. These challenges result from non-uniform access latencies, interference misses, the bandwidth wall problem, and diverse workload characteristics. Our exploration of the CMP cache management problem suggests a CMP caching framework (CC-FR) that defines three main approaches to solve the problem: (1) data placement, (2) data retention, and (3) data relocation. We effectively implement CC-FR's components by proposing and evaluating multiple cache management mechanisms.Pressure and Distance Aware Placement (PDA) decouples the physical locations of cache blocks from their addresses for the sake of reducing misses caused by destructive interferences. Flexible Set Balancing (FSB), on the other hand, reduces interference misses via extending the life time of cache lines through retaining some fraction of the working set at underutilized local sets to satisfy far-flung reuses. PDA implements CC-FR's data placement and relocation components and FSB applies CC-FR's retention approach.To alleviate non-uniform access latencies and adapt to phase changes in programs, Adaptive Controlled Migration (ACM) dynamically and periodically promotes cache blocks towards L2 banks close to requesting cores. ACM lies under CC-FR's data relocation category. Dynamic Cache Clustering (DCC), on the other hand, addresses diverse workload characteristics and growing non-uniform access latencies challenges via constructing a cache cluster for each core and expands/contracts all clusters synergistically to match each core's cache demand. DCC implements CC-FR's data placement and relocation approaches. Lastly, Dynamic Pressure and Distance Aware Placement (DPDA) combines PDA and ACM to cooperatively mitigate interference misses and non-uniform access latencies. Dynamic Cache Clustering and Balancing (DCCB), on the other hand, combines DCC and FSB to employ all CC-FR's categories and achieve higher system performance. Simulation results demonstrate the effectiveness of the proposed mechanisms and show that they compare favorably with related cache designs

    The End of Slow Networks: It's Time for a Redesign

    Full text link
    Next generation high-performance RDMA-capable networks will require a fundamental rethinking of the design and architecture of modern distributed DBMSs. These systems are commonly designed and optimized under the assumption that the network is the bottleneck: the network is slow and "thin", and thus needs to be avoided as much as possible. Yet this assumption no longer holds true. With InfiniBand FDR 4x, the bandwidth available to transfer data across network is in the same ballpark as the bandwidth of one memory channel, and it increases even further with the most recent EDR standard. Moreover, with the increasing advances of RDMA, the latency improves similarly fast. In this paper, we first argue that the "old" distributed database design is not capable of taking full advantage of the network. Second, we propose architectural redesigns for OLTP, OLAP and advanced analytical frameworks to take better advantage of the improved bandwidth, latency and RDMA capabilities. Finally, for each of the workload categories, we show that remarkable performance improvements can be achieved

    The Design and Implementation of FFTW3

    Full text link

    Demonstration of Inexact Computing Implemented in the JPEG Compression Algorithm using Probabilistic Boolean Logic applied to CMOS Components

    Get PDF
    Probabilistic computing offers potential improvements in energy, performance, and area compared with traditional digital design. This dissertation quantifies energy and energy-delay tradeoffs in digital adders, multipliers, and the JPEG image compression algorithm. This research shows that energy demand can be cut in half with noisesusceptible16-bit Kogge-Stone adders that deviate from the correct value by an average of 3 in 14 nanometer CMOS FinFET technology, while the energy-delay product (EDP) is reduced by 38 . This is achieved by reducing the power supply voltage which drives the noisy transistors. If a 19 average error is allowed, the adders are 13 times more energy-efficient and the EDP is reduced by 35 . This research demonstrates that 92 of the color space transform and discrete cosine transform circuits within the JPEG algorithm can be built from inexact components, and still produce readable images. Given the case in which each binary logic gate has a 1 error probability, the color space transformation has an average pixel error of 5.4 and a 55 energy reduction compared to the error-free circuit, and the discrete cosine transformation has a 55 energy reduction with an average pixel error of 20

    Design of application specific instruction set processors for the EFT and FHT algorithms

    Get PDF
    Cataloged from PDF version of article.Orthogonal Frequency Division Multiplexing (OFDM) is a multicarrier transmission technique which is used in many digital communication systems. In this technique, Fast Fourier Transformation (FFT) and inverse FFT (IFFT) are kernel processing blocks which are used for data modulation and demodulation respectively. Another algorithm which can be used for multi-carrier transmission is the Fast Hartley Transform algorithm. The FHT is a real valued transformation and can give significantly better results than FFT algorithm in terms of energy efficiency, speed and die area. This thesis presents Application Specific Instruction Set Processors (ASIP) for the FFT and FHT algorithms. ASIPs combine the flexibility of general purpose processors and efficiency of application specific integrated circuits (ASIC). Programmability makes the processor flexible and special instructions, memory architecture and pipeline makes the processor efficient. In order to design a low power processor we have selected the recently proposed cached FFT algorithm which outperforms standard FFT. For the cached FFT algorithm we have designed two ASIPs one having a single execution unitAtak, OğuzhanM.S

    LexiDB: Patterns & Methods for Corpus Linguistic Database Management

    Get PDF
    LexiDB is a tool for storing, managing and querying corpus data. In contrast to other database management systems (DBMSs), itis designed specifically for text corpora. It improves on other corpus management systems (CMSs) because data can be added anddeleted from corpora on the fly with the ability to add live data to existing corpora. LexiDB sits between these two categories ofDBMSs and CMSs, more specialised to language data than a general-purpose DBMS but more flexible than a traditional static corpusmanagement system. Previous work has demonstrated the scalability of LexiDB in response to the growing need to be able to scale outfor ever-growing corpus datasets. Here, we present the patterns and methods developed in LexiDB for storage, retrieval and querying ofmulti-level annotated corpus data. These techniques are evaluated and compared to an existing CMS (Corpus Workbench CWB - CQP)and indexer (Lucene). We find that LexiDB consistently outperforms existing tools for corpus queries. This is particularly apparent withlarge corpora and when handling queries with large result sets

    A Location-Aware Middleware Framework for Collaborative Visual Information Discovery and Retrieval

    Get PDF
    This work addresses the problem of scalable location-aware distributed indexing to enable the leveraging of collaborative effort for the construction and maintenance of world-scale visual maps and models which could support numerous activities including navigation, visual localization, persistent surveillance, structure from motion, and hazard or disaster detection. Current distributed approaches to mapping and modeling fail to incorporate global geospatial addressing and are limited in their functionality to customize search. Our solution is a peer-to-peer middleware framework based on XOR distance routing which employs a Hilbert Space curve addressing scheme in a novel distributed geographic index. This allows for a universal addressing scheme supporting publish and search in dynamic environments while ensuring global availability of the model and scalability with respect to geographic size and number of users. The framework is evaluated using large-scale network simulations and a search application that supports visual navigation in real-world experiments

    Signal processing applications of massively parallel charge domain computing devices

    Get PDF
    The present invention is embodied in a charge coupled device (CCD)/charge injection device (CID) architecture capable of performing a Fourier transform by simultaneous matrix vector multiplication (MVM) operations in respective plural CCD/CID arrays in parallel in O(1) steps. For example, in one embodiment, a first CCD/CID array stores charge packets representing a first matrix operator based upon permutations of a Hartley transform and computes the Fourier transform of an incoming vector. A second CCD/CID array stores charge packets representing a second matrix operator based upon different permutations of a Hartley transform and computes the Fourier transform of an incoming vector. The incoming vector is applied to the inputs of the two CCD/CID arrays simultaneously, and the real and imaginary parts of the Fourier transform are produced simultaneously in the time required to perform a single MVM operation in a CCD/CID array

    Acta Cybernetica : Volume 25. Number 2.

    Get PDF
    corecore