MOF-BC: A Memory Optimized and Flexible BlockChain for Large Scale Networks
BlockChain (BC) immutability ensures BC resilience against modification or
removal of the stored data. In large scale networks like the Internet of Things
(IoT), however, this feature significantly increases BC storage size and raises
privacy challenges. In this paper, we propose a Memory Optimized and Flexible
BC (MOF-BC) that enables the IoT users and service providers to remove or
summarize their transactions and age their data and to exercise the "right to
be forgotten". To increase privacy, a user may employ multiple keys for
different transactions. To allow for the removal of stored transactions, all
keys would need to be stored, which complicates key management and storage.
MOF-BC introduces the notion of a Generator Verifier (GV) which is a signed
hash of a Generator Verifier Secret (GVS). The GV changes for each transaction
to provide privacy yet is signed by a unique key, thus minimizing the
information that needs to be stored. A flexible transaction fee model and a
reward mechanism are proposed to incentivize users to participate in optimizing
memory consumption. Qualitative security and privacy analysis demonstrates that
MOF-BC is resilient against several security attacks. Evaluation results show
that MOF-BC decreases BC memory consumption by up to 25% and the user cost by
more than two orders of magnitude compared to conventional BC instantiations.
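The per-transaction Generator Verifier described above can be sketched in a few lines. This is an illustrative reconstruction, not MOF-BC code: HMAC stands in for the asymmetric signature the paper implies, and all names (`signing_key`, `generator_verifier`) are hypothetical.

```python
import hashlib
import hmac
import os

# Hypothetical sketch of the GV idea: a GV is a "signed" hash of a
# per-transaction Generator Verifier Secret (GVS). HMAC stands in for a
# real signature scheme here; names are illustrative, not from MOF-BC.

signing_key = os.urandom(32)  # the user's single long-term key


def generator_verifier(gvs: bytes) -> bytes:
    """Return a GV: a signed hash of the per-transaction secret (GVS)."""
    digest = hashlib.sha256(gvs).digest()
    return hmac.new(signing_key, digest, hashlib.sha256).digest()


# A fresh GVS per transaction yields a different, unlinkable GV per
# transaction, yet only `signing_key` must be stored to later prove
# ownership when requesting removal or summarization.
gv1 = generator_verifier(os.urandom(16))
gv2 = generator_verifier(os.urandom(16))
assert gv1 != gv2
```

The design point this illustrates is the storage tradeoff: the user keeps one key rather than one key per transaction, while transactions remain unlinkable to outside observers.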
Job Management and Task Bundling
High Performance Computing is often performed on scarce and shared computing
resources. To ensure computers are used to their full capacity, administrators
often incentivize large workloads that are not possible on smaller systems.
Measurements in Lattice QCD frequently do not scale to machine-size workloads.
By bundling tasks together we can create large jobs suitable for gigantic
partitions. We discuss METAQ and mpi_jm, software developed to dynamically
group computational tasks together, that can intelligently backfill to consume
idle time without substantial changes to users' current workflows or
executables.
Comment: 8 pages, 3 figures, LATTICE 2017 proceedings
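The bundling idea can be sketched with a simple greedy packer. This is not METAQ or mpi_jm code; it is a minimal illustration, with invented task sizes, of how small tasks can backfill a large partition until its nodes are consumed.

```python
# Illustrative sketch (not METAQ/mpi_jm): greedily bundle small tasks so
# a large partition stays busy. Task sizes are node counts; all values
# are hypothetical.


def bundle(tasks, partition_nodes):
    """First-fit-decreasing packing of tasks into one large job.

    Returns (chosen tasks, idle nodes left over).
    """
    chosen, free = [], partition_nodes
    for nodes in sorted(tasks, reverse=True):
        if nodes <= free:  # backfill: this task fits in the idle nodes
            chosen.append(nodes)
            free -= nodes
    return chosen, free


jobs, idle = bundle([512, 256, 128, 128, 64], partition_nodes=1024)
# jobs == [512, 256, 128, 128]; idle == 0: the partition is fully packed
```

Real schedulers must also respect wall-time limits and task dependencies, but the packing step above is the core of turning many small measurements into one machine-size job.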
Instruction fetch architectures and code layout optimizations
The design of higher performance processors has been following two major trends: increasing the pipeline depth to allow faster clock rates, and widening the pipeline to allow parallel execution of more instructions. Designing a higher performance processor implies balancing all the pipeline stages to ensure that overall performance is not dominated by any of them. This means that a faster execution engine also requires a faster fetch engine, to ensure that it is possible to read and decode enough instructions to keep the pipeline full and the functional units busy. This paper explores the challenges faced by the instruction fetch stage for a variety of processor designs, from early pipelined processors to the more aggressive wide-issue superscalars. We describe the different fetch engines proposed in the literature, the performance issues involved, and some of the proposed improvements. We also show how compiler techniques that optimize the layout of the code in memory can be used to improve the fetch performance of the different engines described. Overall, we show how instruction fetch has evolved from fetching one instruction every few cycles, to fetching one instruction per cycle, to fetching a full basic block per cycle, to several basic blocks per cycle, tracing the evolution of the mechanisms surrounding the instruction cache and the different compiler optimizations used to better employ them.
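One simple flavor of the profile-guided code layout the abstract mentions can be sketched as follows: place each basic block's hottest not-yet-placed successor immediately after it, so the fetch engine reads straight-line code on the common path. The CFG and edge counts below are invented for illustration; production passes (e.g. Pettis-Hansen-style chaining) are more elaborate.

```python
# Toy sketch of profile-guided basic-block layout: follow the hottest
# fall-through chain from the entry block, then append cold blocks.
# The control-flow graph and profile counts are hypothetical.


def hot_path_layout(entry, succs):
    """succs: block -> list of (successor, edge_count). Returns a layout."""
    blocks = {entry} | {s for edges in succs.values() for s, _ in edges}
    layout, placed = [], set()
    block = entry
    while block is not None:
        layout.append(block)
        placed.add(block)
        # follow the hottest not-yet-placed successor (becomes fall-through)
        nxt = [(count, s) for s, count in succs.get(block, []) if s not in placed]
        block = max(nxt)[1] if nxt else None
    layout += sorted(blocks - placed)  # cold blocks go at the end
    return layout


cfg = {"A": [("B", 90), ("C", 10)], "B": [("D", 90)], "C": [("D", 10)]}
order = hot_path_layout("A", cfg)  # hot chain A, B, D placed contiguously
```

Laying the hot chain contiguously means a wide fetch engine retrieves several useful basic blocks per cache line, instead of stalling on a taken branch into a cold region.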
Exploring compression techniques for ROOT IO
ROOT provides a flexible format used throughout the HEP community. The
number of use cases - from an archival data format to end-stage analysis - has
required a number of tradeoffs to be exposed to the user. For example, a high
"compression level" in the traditional DEFLATE algorithm will result in a
smaller file (saving disk space) at the cost of slower decompression (costing
CPU time when read). At the scale of the LHC experiment, poor design choices
can result in terabytes of wasted space or wasted CPU time. We explore and
attempt to quantify some of these tradeoffs. Specifically, we explore: the use
of alternate compression algorithms to optimize for read performance; an
alternate method of compressing individual events to allow efficient random
access; and a new approach to whole-file compression. Quantitative results are
given, as well as guidance on how to make compression decisions for different
use cases.
Comment: Proceedings for the 22nd International Conference on Computing in High Energy and Nuclear Physics (CHEP 2016)
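The compression-level tradeoff described above is easy to observe directly with Python's `zlib`, which implements the same DEFLATE algorithm. The data below is synthetic; real event data compresses differently, so the numbers are illustrative only.

```python
import zlib

# Illustration of the level-vs-speed/size tradeoff: higher DEFLATE levels
# spend more CPU to produce (usually) smaller output. Synthetic data.

data = b"event:" + bytes(range(256)) * 400  # ~100 kB of semi-regular bytes

sizes = {}
for level in (1, 6, 9):
    compressed = zlib.compress(data, level)
    assert zlib.decompress(compressed) == data  # lossless round trip
    sizes[level] = len(compressed)
    print(f"level {level}: {len(compressed)} bytes")
```

Timing the `zlib.decompress` calls as well (e.g. with `time.perf_counter`) exposes the other side of the tradeoff the abstract highlights: a smaller archival file can cost CPU time every time it is read back.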