367 research outputs found
The Capacity of Write-Once Memory with a Writing Cost Function
書換に制限を有する記憶媒体というのは古今東西存在する.例えば,パンチカードは一度穴を開けたら塞ぐことができない.またフラッシュメモリにおいては,ブロック消去と呼ばれる特別な操作を行わなければ,記憶素子であるセル内部の電荷を減らすことができない.R.L. RivestとA. Shamirは1982年に,そのような制限を有する記憶媒体のモデルとしてWrite-Once Memory (WOM)を導入した.そして彼らは,WOMを複数回にわたり書換える際,書込可能なメッセージの量をなるべく多くするための工夫とはどのようなものかを提示した.そのような工夫は,WOM符号化と呼ばれている.オリジナルのWOMでは記憶素子の取りうる状態が二値であったが,その後1984年にA. FiatとA. Shamirにより,多値を扱える一般化WOMが提案された.一般化WOMでは,素子の取りうる状態および状態間の可能な遷移を,無閉路有向グラフによって指定する.WOMにおいて興味深い問題の一つは,最良の符号化によりどれだけ多くのメッセージを書込めるかということである.この問題に対する解答として,F. FuとA.J. Han Vinckは1999年に,任意の一般化WOMにおける容量域と最大和率を決定した.本研究では,一般化WOMをさらに拡張したWrite-Constrained Memory (WCM)を提案する.WCMの大きな特徴の一つは,それが状態遷移のコストを考慮することである.実用的には,状態遷移コストは時間やエネルギーといった物理量としての意味を持つものであり,何らかの理由でそれを制限しなければならないという状況が想定される.本研究ではそのような制限の一つとして,WCMに対する平均コスト制約というものを提案する.本研究における主要な結果は,任意のWCMにおいて,定数回にわたり書換える際,平均コスト制約を満足する容量域および最大和率を決定したことである.電気通信大学201
CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-core Architectures
The use of graphics processing units (GPUs) in
high-performance parallel computing continues to become more
prevalent, often as part of a heterogeneous system. For years,
CUDA has been the de facto programming environment for
nearly all general-purpose GPU (GPGPU) applications. In spite
of this, the framework is available only on NVIDIA GPUs,
traditionally requiring reimplementation in other frameworks
in order to utilize additional multi- or many-core devices.
On the other hand, OpenCL provides an open and vendorneutral
programming environment and runtime system. With
implementations available for CPUs, GPUs, and other types of
accelerators, OpenCL therefore holds the promise of a “write
once, run anywhere” ecosystem for heterogeneous computing.
Given the many similarities between CUDA and OpenCL,
manually porting a CUDA application to OpenCL is typically
straightforward, albeit tedious and error-prone. In response
to this issue, we created CU2CL, an automated CUDA-to-
OpenCL source-to-source translator that possesses a novel design
and clever reuse of the Clang compiler framework. Currently,
the CU2CL translator covers the primary constructs found in
CUDA runtime API, and we have successfully translated many
applications from the CUDA SDK and Rodinia benchmark suite.
The performance of our automatically translated applications via
CU2CL is on par with their manually ported countparts
Deterministic History-Independent Strategies for Storing Information on Write-Once Memories
Motivated by the challenging task of designing ``secure\u27\u27 vote storage mechanisms, we study information storage mechanisms that operate in extremely hostile environments. In such environments, the majority of existing techniques for information storage and for security are susceptible to powerful adversarial attacks. We propose a mechanism for storing a set of at most elements from a large universe of size on write-once memories in a manner that does not reveal the insertion order of the elements. We consider a standard model for write-once memories, in which the memory is initialized to the all \u27s state, and the only operation allowed is flipping bits from to . Whereas previously known constructions were either inefficient (required memory), randomized, or employed cryptographic techniques which are unlikely to be available in hostile environments, we eliminate each of these undesirable properties. The total amount of memory used by the mechanism is linear in the number of stored elements and poly-logarithmic in the size of the universe of elements.
We also demonstrate a connection between secure vote storage mechanisms and one of the classical distributed computing problems: conflict resolution in multiple-access channels. By establishing a tight connection with the basic building block of our mechanism, we construct the first deterministic and non-adaptive conflict resolution algorithm whose running time is optimal up to poly-logarithmic factors
Recommended from our members
Automatic data/program partitioning using the single assignment principle
Loosely-coupled MIMD architectures do not suffer from memory contention; hence large numbers of processors may be utilized. The main problem, however, is how to partition data and programs in order to exploit the available parallelism. In this paper we show that efficient schemes for automatic data/program partitioning and synchronization may be employed if single assignment is used. Using simulations of program loops common to scientific computations (the Livermore Loops), we demonstrate that only a small fraction of data accesses are remote and thus the degradation in network performance due to multiprocessing is minimal
DALiuGE: A Graph Execution Framework for Harnessing the Astronomical Data Deluge
The Data Activated Liu Graph Engine - DALiuGE - is an execution framework for
processing large astronomical datasets at a scale required by the Square
Kilometre Array Phase 1 (SKA1). It includes an interface for expressing complex
data reduction pipelines consisting of both data sets and algorithmic
components and an implementation run-time to execute such pipelines on
distributed resources. By mapping the logical view of a pipeline to its
physical realisation, DALiuGE separates the concerns of multiple stakeholders,
allowing them to collectively optimise large-scale data processing solutions in
a coherent manner. The execution in DALiuGE is data-activated, where each
individual data item autonomously triggers the processing on itself. Such
decentralisation also makes the execution framework very scalable and flexible,
supporting pipeline sizes ranging from less than ten tasks running on a laptop
to tens of millions of concurrent tasks on the second fastest supercomputer in
the world. DALiuGE has been used in production for reducing interferometry data
sets from the Karl E. Jansky Very Large Array and the Mingantu Ultrawide
Spectral Radioheliograph; and is being developed as the execution framework
prototype for the Science Data Processor (SDP) consortium of the Square
Kilometre Array (SKA) telescope. This paper presents a technical overview of
DALiuGE and discusses case studies from the CHILES and MUSER projects that use
DALiuGE to execute production pipelines. In a companion paper, we provide
in-depth analysis of DALiuGE's scalability to very large numbers of tasks on
two supercomputing facilities.Comment: 31 pages, 12 figures, currently under review by Astronomy and
Computin
External Memory Algorithms for Factoring Sparse Matrices
We consider the factorization of sparse symmetric matrices in the context of a two-layer storage system: disk/core. When the core is sufficiently large the factorization can be performed in-core. In this case we must read the input, compute, and write the output, in this sequence. On the other hand, when the core is not large enough, the factorization becomes out-of-core, which means that data movement and computation must be interleaved.
We identify two major out-of-core factorization scenarios: read-once/write-once (R1/W1) and read-many/write-many (RM/WM). The former requires minimum traffic, exactly as much as the in-core factorization: reading the input and writing the output. More traffic is required for the latter. We investigate three issues: the size of the core that determines the boundary between the two out-of-core scenarios, the in-core data structure reorganizations required by the R1/W1 factorization and the traffic required by the RM/WM factorization. We use three common factorization algorithms: left-looking, right-looking and multifrontal.
In the R1/W1 scenario, our results indicate that for problems with good separators, such as those coming from the discretization of partial differential equations, ordered with nested dissection, right-looking and multifrontal factorization perform slightly better than left-looking factorization. There are, however, applications for which multifrontal is a bad choice, requiring too much temporary storage. On the other hand, right-looking factorization should be avoided in the RM/WM scenario. Left-looking is a good choice, but only if data is blocked along one dimension. Multifrontal performs well for both one and two dimensional blocks as long as not too much storage is required.
We also explore a framework for a software implementation. We have implemented an in-core solver that relies on some object-oriented constructs. Most of the code is written in C++, except for some kernels written in Fortran 77. We intend to add out-of-core functionality to the code and data movement is a major concern. Implicit data movement represents the easy way, but, as some of our experiments show, good performance can be achieved only with explicit data movement. This complicates the code and we expect a substantial effort in order to implement an efficient out-of-core solver
BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments
Advances in sequencing techniques have led to exponential growth in
biological data, demanding the development of large-scale bioinformatics
experiments. Because these experiments are computation- and data-intensive,
they require high-performance computing (HPC) techniques and can benefit from
specialized technologies such as Scientific Workflow Management Systems (SWfMS)
and databases. In this work, we present BioWorkbench, a framework for managing
and analyzing bioinformatics experiments. This framework automatically collects
provenance data, including both performance data from workflow execution and
data from the scientific domain of the workflow application. Provenance data
can be analyzed through a web application that abstracts a set of queries to
the provenance database, simplifying access to provenance information. We
evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree
assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a
RASopathy analysis workflow. We analyze each workflow from both computational
and scientific domain perspectives, by using queries to a provenance and
annotation database. Some of these queries are available as a pre-built feature
of the BioWorkbench web application. Through the provenance data, we show that
the framework is scalable and achieves high-performance, reducing up to 98% of
the case studies execution time. We also show how the application of machine
learning techniques can enrich the analysis process
Overview of Caching Mechanisms to Improve Hadoop Performance
Nowadays distributed computing environments, large amounts of data are
generated from different resources with a high velocity, rendering the data
difficult to capture, manage, and process within existing relational databases.
Hadoop is a tool to store and process large datasets in a parallel manner
across a cluster of machines in a distributed environment. Hadoop brings many
benefits like flexibility, scalability, and high fault tolerance; however, it
faces some challenges in terms of data access time, I/O operation, and
duplicate computations resulting in extra overhead, resource wastage, and poor
performance. Many researchers have utilized caching mechanisms to tackle these
challenges. For example, they have presented approaches to improve data access
time, enhance data locality rate, remove repetitive calculations, reduce the
number of I/O operations, decrease the job execution time, and increase
resource efficiency. In the current study, we provide a comprehensive overview
of caching strategies to improve Hadoop performance. Additionally, a novel
classification is introduced based on cache utilization. Using this
classification, we analyze the impact on Hadoop performance and discuss the
advantages and disadvantages of each group. Finally, a novel hybrid approach
called Hybrid Intelligent Cache (HIC) that combines the benefits of two methods
from different groups, H-SVM-LRU and CLQLMRS, is presented. Experimental
results show that our hybrid method achieves an average improvement of 31.2% in
job execution time
- …