
    The Capacity of Write-Once Memory with a Writing Cost Function

    Storage media with restrictions on rewriting have existed throughout history. For example, once a hole has been punched in a punch card, it cannot be closed again. Likewise, in flash memory, the charge stored in a cell, the memory element, cannot be reduced without a special operation called block erasure. In 1982, R.L. Rivest and A. Shamir introduced the Write-Once Memory (WOM) as a model of storage media subject to such restrictions, and they showed how to maximize the amount of message data that can be written when a WOM is rewritten multiple times; such schemes are called WOM codes. In the original WOM each memory element is binary, but in 1984 A. Fiat and A. Shamir proposed the generalized WOM, which allows multi-level elements. In a generalized WOM, the states an element may take and the permitted transitions between them are specified by a directed acyclic graph. One interesting question about WOM is how many messages can be written under the best possible coding. As an answer to this question, F. Fu and A.J. Han Vinck determined in 1999 the capacity region and the maximum sum-rate of an arbitrary generalized WOM. In this work we propose the Write-Constrained Memory (WCM), a further extension of the generalized WOM. A key feature of the WCM is that it accounts for the cost of state transitions. In practice, a state-transition cost represents a physical quantity such as time or energy, and situations arise in which it must be limited for some reason. As one such limitation, we propose an average cost constraint for the WCM. The main result of this work is the determination, for an arbitrary WCM rewritten a constant number of times, of the capacity region and the maximum sum-rate under the average cost constraint. (The University of Electro-Communications, 201)
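
    The classic example of a WOM code, usually credited to Rivest and Shamir's original paper, writes two successive 2-bit messages into only three binary write-once cells. The minimal Python sketch below illustrates that construction; it is provided for illustration and is not taken from the work summarized above.

```python
# Minimal sketch of the classic binary WOM code: two successive 2-bit
# messages are stored in three write-once cells whose bits can only
# change from 0 to 1.

FIRST_GEN = {0: (0, 0, 0), 1: (1, 0, 0), 2: (0, 1, 0), 3: (0, 0, 1)}
# Second-generation codewords are the bitwise complements of the first.
SECOND_GEN = {m: tuple(1 - b for b in c) for m, c in FIRST_GEN.items()}

def decode(state):
    """Recover the most recently written 2-bit message from the cells."""
    if sum(state) <= 1:                   # first-generation codeword
        return {c: m for m, c in FIRST_GEN.items()}[state]
    return {c: m for m, c in SECOND_GEN.items()}[state]

def write(state, message):
    """Return a new cell state encoding `message`, never clearing a 1."""
    if decode(state) == message:
        return state                      # already encodes the message
    target = FIRST_GEN[message] if sum(state) == 0 else SECOND_GEN[message]
    assert all(t >= s for s, t in zip(state, target)), "illegal 1 -> 0 flip"
    return target

cells = (0, 0, 0)
cells = write(cells, 2)                   # first write:  message 2 -> 010
cells = write(cells, 1)                   # second write: message 1 -> 011
assert decode(cells) == 1
```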

    CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-core Architectures

    The use of graphics processing units (GPUs) in high-performance parallel computing continues to become more prevalent, often as part of a heterogeneous system. For years, CUDA has been the de facto programming environment for nearly all general-purpose GPU (GPGPU) applications. In spite of this, the framework is available only on NVIDIA GPUs, traditionally requiring reimplementation in other frameworks in order to utilize additional multi- or many-core devices. On the other hand, OpenCL provides an open and vendor-neutral programming environment and runtime system. With implementations available for CPUs, GPUs, and other types of accelerators, OpenCL therefore holds the promise of a “write once, run anywhere” ecosystem for heterogeneous computing. Given the many similarities between CUDA and OpenCL, manually porting a CUDA application to OpenCL is typically straightforward, albeit tedious and error-prone. In response to this issue, we created CU2CL, an automated CUDA-to-OpenCL source-to-source translator that possesses a novel design and makes clever reuse of the Clang compiler framework. Currently, the CU2CL translator covers the primary constructs found in the CUDA runtime API, and we have successfully translated many applications from the CUDA SDK and the Rodinia benchmark suite. The performance of applications automatically translated by CU2CL is on par with that of their manually ported counterparts.
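
    As a rough illustration of the kind of surface-level mapping a CUDA-to-OpenCL translator must perform, the toy Python sketch below substitutes a handful of CUDA identifiers with their closest OpenCL counterparts. It is purely illustrative: CU2CL itself rewrites the Clang AST rather than raw text, and the mapping table here is a simplification invented for this example.

```python
import re

# Toy illustration only: a real translator such as CU2CL operates on the
# Clang AST; textual substitution like this handles only trivial cases.
CUDA_TO_OPENCL = {
    r"\bcudaMalloc\b":   "clCreateBuffer",        # device allocation
    r"\bcudaMemcpy\b":   "clEnqueueWriteBuffer",  # host<->device copy (direction-dependent)
    r"\bcudaFree\b":     "clReleaseMemObject",    # deallocation
    r"\b__global__\b":   "__kernel",              # kernel qualifier
    r"\b__shared__\b":   "__local",               # on-chip shared memory
    r"\bthreadIdx\.x\b": "get_local_id(0)",       # thread index within work-group
    r"\bblockIdx\.x\b":  "get_group_id(0)",       # block / work-group index
}

def translate(cuda_source: str) -> str:
    """Apply the naive keyword mapping to a CUDA source string."""
    for pattern, replacement in CUDA_TO_OPENCL.items():
        cuda_source = re.sub(pattern, replacement, cuda_source)
    return cuda_source

print(translate("__global__ void add(float *a) { a[threadIdx.x] += 1.0f; }"))
```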

    Deterministic History-Independent Strategies for Storing Information on Write-Once Memories

    Motivated by the challenging task of designing “secure” vote storage mechanisms, we study information storage mechanisms that operate in extremely hostile environments. In such environments, the majority of existing techniques for information storage and for security are susceptible to powerful adversarial attacks. We propose a mechanism for storing a set of at most K elements from a large universe of size N on write-once memories in a manner that does not reveal the insertion order of the elements. We consider a standard model for write-once memories, in which the memory is initialized to the all-0 state, and the only operation allowed is flipping bits from 0 to 1. Whereas previously known constructions were either inefficient (requiring Θ(K^2) memory), randomized, or employed cryptographic techniques which are unlikely to be available in hostile environments, we eliminate each of these undesirable properties. The total amount of memory used by the mechanism is linear in the number of stored elements and poly-logarithmic in the size of the universe of elements. We also demonstrate a connection between secure vote storage mechanisms and one of the classical distributed computing problems: conflict resolution in multiple-access channels. By establishing a tight connection with the basic building block of our mechanism, we construct the first deterministic and non-adaptive conflict resolution algorithm whose running time is optimal up to poly-logarithmic factors.
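
    To make the storage model concrete, the sketch below keeps a set on a write-once bit array in an insertion-order-independent way by setting hash-selected positions, essentially a Bloom filter. It is shown only to illustrate the write-once and history-independence constraints; unlike the construction summarized above it is randomized and approximate, and all names are invented for this example.

```python
import hashlib

class WriteOnceSet:
    """Toy write-once, insertion-order-independent set (a Bloom filter).

    Bits only ever change from 0 to 1, and the final memory contents
    depend only on which elements were inserted, not on their order.
    Unlike the deterministic, exact construction described above, this
    sketch is randomized and may report false positives.
    """

    def __init__(self, num_bits=1024, num_hashes=4):
        self.bits = [0] * num_bits
        self.num_hashes = num_hashes

    def _positions(self, element):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{element}".encode()).hexdigest()
            yield int(digest, 16) % len(self.bits)

    def insert(self, element):
        for pos in self._positions(element):
            self.bits[pos] = 1            # only 0 -> 1 flips are allowed

    def probably_contains(self, element):
        return all(self.bits[pos] for pos in self._positions(element))

s = WriteOnceSet()
for vote in ["alice", "bob", "carol"]:    # any insertion order yields the same bits
    s.insert(vote)
assert s.probably_contains("bob")
```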

    DALiuGE: A Graph Execution Framework for Harnessing the Astronomical Data Deluge

    The Data Activated Liu Graph Engine - DALiuGE - is an execution framework for processing large astronomical datasets at a scale required by the Square Kilometre Array Phase 1 (SKA1). It includes an interface for expressing complex data reduction pipelines consisting of both data sets and algorithmic components, and an implementation run-time to execute such pipelines on distributed resources. By mapping the logical view of a pipeline to its physical realisation, DALiuGE separates the concerns of multiple stakeholders, allowing them to collectively optimise large-scale data processing solutions in a coherent manner. The execution in DALiuGE is data-activated, where each individual data item autonomously triggers the processing on itself. Such decentralisation also makes the execution framework very scalable and flexible, supporting pipeline sizes ranging from less than ten tasks running on a laptop to tens of millions of concurrent tasks on the second fastest supercomputer in the world. DALiuGE has been used in production for reducing interferometry data sets from the Karl G. Jansky Very Large Array and the Mingantu Ultrawide Spectral Radioheliograph, and is being developed as the execution framework prototype for the Science Data Processor (SDP) consortium of the Square Kilometre Array (SKA) telescope. This paper presents a technical overview of DALiuGE and discusses case studies from the CHILES and MUSER projects that use DALiuGE to execute production pipelines. In a companion paper, we provide an in-depth analysis of DALiuGE's scalability to very large numbers of tasks on two supercomputing facilities. Comment: 31 pages, 12 figures, currently under review by Astronomy and Computing.
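
    As a toy illustration of data-activated execution, where a completed data item itself triggers the tasks that consume it, consider the minimal Python sketch below. The class and method names are invented for this example and do not correspond to the actual DALiuGE API.

```python
# Toy sketch of data-activated execution: a data item notifies its
# consumers when it is completed, and each consumer runs as soon as all
# of its inputs are complete.  Names are illustrative only.

class DataItem:
    def __init__(self, name):
        self.name, self.value, self.consumers = name, None, []

    def complete(self, value):
        self.value = value
        for consumer in self.consumers:   # the data item drives execution
            consumer.input_completed()

class Task:
    def __init__(self, func, inputs, output):
        self.func, self.inputs, self.output = func, inputs, output
        self.pending = len(inputs)
        for item in inputs:
            item.consumers.append(self)

    def input_completed(self):
        self.pending -= 1
        if self.pending == 0:             # all inputs ready: run and emit output
            self.output.complete(self.func(*[i.value for i in self.inputs]))

raw = DataItem("raw")
calibrated = DataItem("calibrated")
Task(lambda v: [x * 2 for x in v], [raw], calibrated)
raw.complete([1, 2, 3])                   # completing the input triggers the task
print(calibrated.value)                   # -> [2, 4, 6]
```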

    External Memory Algorithms for Factoring Sparse Matrices

    We consider the factorization of sparse symmetric matrices in the context of a two-layer storage system: disk/core. When the core is sufficiently large the factorization can be performed in-core. In this case we must read the input, compute, and write the output, in this sequence. On the other hand, when the core is not large enough, the factorization becomes out-of-core, which means that data movement and computation must be interleaved. We identify two major out-of-core factorization scenarios: read-once/write-once (R1/W1) and read-many/write-many (RM/WM). The former requires minimum traffic, exactly as much as the in-core factorization: reading the input and writing the output. More traffic is required for the latter. We investigate three issues: the size of the core that determines the boundary between the two out-of-core scenarios, the in-core data structure reorganizations required by the R1/W1 factorization, and the traffic required by the RM/WM factorization. We use three common factorization algorithms: left-looking, right-looking and multifrontal. In the R1/W1 scenario, our results indicate that for problems with good separators, such as those coming from the discretization of partial differential equations, ordered with nested dissection, right-looking and multifrontal factorization perform slightly better than left-looking factorization. There are, however, applications for which multifrontal is a bad choice, requiring too much temporary storage. On the other hand, right-looking factorization should be avoided in the RM/WM scenario. Left-looking is a good choice, but only if data is blocked along one dimension. Multifrontal performs well for both one and two dimensional blocks as long as not too much storage is required. We also explore a framework for a software implementation. We have implemented an in-core solver that relies on some object-oriented constructs. Most of the code is written in C++, except for some kernels written in Fortran 77. We intend to add out-of-core functionality to the code and data movement is a major concern. Implicit data movement represents the easy way, but, as some of our experiments show, good performance can be achieved only with explicit data movement. This complicates the code and we expect a substantial effort in order to implement an efficient out-of-core solver.
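
    To make the left-looking versus right-looking distinction concrete, the Python sketch below contrasts the two update orders for a dense Cholesky factorization. It deliberately ignores sparsity, supernodes, fill-reducing orderings, and out-of-core data movement, which are the actual subject of the work above; it only shows when updates are gathered (left-looking) versus pushed (right-looking).

```python
import numpy as np

def cholesky_left_looking(A):
    """Left-looking Cholesky: each column gathers updates from columns
    to its left just before it is factored."""
    n = A.shape[0]
    L = np.tril(A.astype(float))
    for j in range(n):
        for k in range(j):                      # pull updates from the left
            L[j:, j] -= L[j, k] * L[j:, k]
        L[j, j] = np.sqrt(L[j, j])
        L[j + 1:, j] /= L[j, j]
    return L

def cholesky_right_looking(A):
    """Right-looking Cholesky: each factored column immediately pushes
    its update into the trailing submatrix."""
    n = A.shape[0]
    L = np.tril(A.astype(float))
    for j in range(n):
        L[j, j] = np.sqrt(L[j, j])
        L[j + 1:, j] /= L[j, j]
        for k in range(j + 1, n):               # push updates to the right
            L[k:, k] -= L[k, j] * L[k:, j]
    return L

A = np.array([[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]])
L = cholesky_left_looking(A)
assert np.allclose(L @ L.T, A)
assert np.allclose(cholesky_right_looking(A), L)
```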

    BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments

    Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing (HPC) techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems (SWfMS) and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. This framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives by using queries to a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high performance, reducing up to 98% of the case studies' execution time. We also show how the application of machine learning techniques can enrich the analysis process.
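
    As an illustration of the kind of performance-oriented provenance query such a framework abstracts behind its web application, the sketch below runs a simple aggregate over a hypothetical relational provenance schema. The table and column names are invented for this example and do not reflect BioWorkbench's actual database layout.

```python
import sqlite3

# Hypothetical provenance schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity_execution (
        workflow   TEXT,
        activity   TEXT,
        started_at REAL,
        ended_at   REAL
    );
    INSERT INTO activity_execution VALUES
        ('SwiftPhylo', 'align',      0.0,   120.5),
        ('SwiftPhylo', 'build_tree', 120.5, 300.0),
        ('SwiftGECKO', 'compare',    0.0,   450.2);
""")

# A typical performance query over provenance: total runtime per activity.
query = """
    SELECT workflow, activity, SUM(ended_at - started_at) AS seconds
    FROM activity_execution
    GROUP BY workflow, activity
    ORDER BY seconds DESC
"""
for workflow, activity, seconds in conn.execute(query):
    print(f"{workflow:11s} {activity:11s} {seconds:8.1f} s")
```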

    Overview of Caching Mechanisms to Improve Hadoop Performance

    In today's distributed computing environments, large amounts of data are generated from different sources at high velocity, rendering the data difficult to capture, manage, and process within existing relational databases. Hadoop is a tool to store and process large datasets in a parallel manner across a cluster of machines in a distributed environment. Hadoop brings many benefits like flexibility, scalability, and high fault tolerance; however, it faces some challenges in terms of data access time, I/O operations, and duplicate computations resulting in extra overhead, resource wastage, and poor performance. Many researchers have utilized caching mechanisms to tackle these challenges. For example, they have presented approaches to improve data access time, enhance data locality rate, remove repetitive calculations, reduce the number of I/O operations, decrease the job execution time, and increase resource efficiency. In the current study, we provide a comprehensive overview of caching strategies to improve Hadoop performance. Additionally, a novel classification is introduced based on cache utilization. Using this classification, we analyze the impact on Hadoop performance and discuss the advantages and disadvantages of each group. Finally, a novel hybrid approach called Hybrid Intelligent Cache (HIC) that combines the benefits of two methods from different groups, H-SVM-LRU and CLQLMRS, is presented. Experimental results show that our hybrid method achieves an average improvement of 31.2% in job execution time.
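
    The abstract describes the hybrid cache only at a high level, so the Python sketch below is a generic illustration of one way a learned reuse predictor can steer an otherwise LRU-style block cache. All names are invented for this example; it does not reproduce H-SVM-LRU, CLQLMRS, or the HIC method.

```python
from collections import OrderedDict

class PredictiveLRUCache:
    """Generic sketch: an LRU-style cache whose eviction prefers blocks
    that a learned model predicts are unlikely to be reused."""

    def __init__(self, capacity, reuse_predictor):
        self.capacity = capacity
        self.predict_reuse = reuse_predictor   # block_id -> probability of reuse
        self.blocks = OrderedDict()            # block_id -> cached data

    def get(self, block_id, load_from_hdfs):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # LRU bookkeeping on a hit
            return self.blocks[block_id]
        data = load_from_hdfs(block_id)        # miss: fetch, evicting if full
        if len(self.blocks) >= self.capacity:
            self._evict()
        self.blocks[block_id] = data
        return data

    def _evict(self):
        # Look at the least-recently-used blocks (at least two of them)
        # and evict the one the model considers least likely to be reused.
        window = list(self.blocks)[: max(2, len(self.blocks) // 2)]
        victim = min(window, key=self.predict_reuse)
        del self.blocks[victim]

cache = PredictiveLRUCache(capacity=2,
                           reuse_predictor=lambda b: 0.9 if b == "hot" else 0.1)
for block in ["hot", "cold1", "cold2", "hot"]:
    cache.get(block, load_from_hdfs=lambda b: f"<data of {b}>")
print(list(cache.blocks))                      # 'hot' survives eviction
```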