7,328 research outputs found
SL: a "quick and dirty" but working intermediate language for SVP systems
The CSA group at the University of Amsterdam has developed SVP, a framework
to manage and program many-core and hardware multithreaded processors. In this
article, we introduce the intermediate language SL, a common vehicle to program
SVP platforms. SL is designed as an extension to the standard C language (ISO
C99/C11). It includes primitive constructs to bulk create threads, bulk
synchronize on termination of threads, and communicate using word-sized
dataflow channels between threads. It is intended for use as target language
for higher-level parallelizing compilers. SL is a research vehicle; as of this
writing, it is the only interface language to program a main SVP platform, the
new Microgrid chip architecture. This article provides an overview of the
language, to complement a detailed specification available separately.Comment: 22 pages, 3 figures, 18 listings, 1 tabl
Extending and Implementing the Self-adaptive Virtual Processor for Distributed Memory Architectures
Many-core architectures of the future are likely to have distributed memory
organizations and need fine grained concurrency management to be used
effectively. The Self-adaptive Virtual Processor (SVP) is an abstract
concurrent programming model which can provide this, but the model and its
current implementations assume a single address space shared memory. We
investigate and extend SVP to handle distributed environments, and discuss a
prototype SVP implementation which transparently supports execution on
heterogeneous distributed memory clusters over TCP/IP connections, while
retaining the original SVP programming model
TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA
Memory consistency models (MCMs) which govern inter-module interactions in a
shared memory system, are a significant, yet often under-appreciated, aspect of
system design. MCMs are defined at the various layers of the hardware-software
stack, requiring thoroughly verified specifications, compilers, and
implementations at the interfaces between layers. Current verification
techniques evaluate segments of the system stack in isolation, such as proving
compiler mappings from a high-level language (HLL) to an ISA or proving
validity of a microarchitectural implementation of an ISA.
This paper makes a case for full-stack MCM verification and provides a
toolflow, TriCheck, capable of verifying that the HLL, compiler, ISA, and
implementation collectively uphold MCM requirements. The work showcases
TriCheck's ability to evaluate a proposed ISA MCM in order to ensure that each
layer and each mapping is correct and complete. Specifically, we apply TriCheck
to the open source RISC-V ISA, seeking to verify accurate, efficient, and legal
compilations from C11. We uncover under-specifications and potential
inefficiencies in the current RISC-V ISA documentation and identify possible
solutions for each. As an example, we find that a RISC-V-compliant
microarchitecture allows 144 outcomes forbidden by C11 to be observed out of
1,701 litmus tests examined. Overall, this paper demonstrates the necessity of
full-stack verification for detecting MCM-related bugs in the hardware-software
stack.Comment: Proceedings of the Twenty-Second International Conference on
Architectural Support for Programming Languages and Operating System
A METHOD AND SYSTEM FOR SELECTION OF PAYMENT CARDS FOR TRANSACTIONS AFTER PRE-STIPULATED TIME PERIODS
The present disclosure relates to a method and system for providing selection of payment cards for transactions after pre-stipulated time periods. The method includes generating a universal token for a consumer, such that the universal token is mapped to the consumer’s payment cards. Thereafter, the universal token is utilised for a transaction. In order for the transaction to be authorised, the universal token is authorised by issuing agencies. At an end of the pre-stipulated time period, the consumer may pay for purchases by choosing a payment card from a plurality of payment cards for each transaction made using the universal token. The transaction is ultimately cleared after the pre-stipulated time period using the payment card chosen by the consumer
The GPU vs Phi Debate: Risk Analytics Using Many-Core Computing
The risk of reinsurance portfolios covering globally occurring natural
catastrophes, such as earthquakes and hurricanes, is quantified by employing
simulations. These simulations are computationally intensive and require large
amounts of data to be processed. The use of many-core hardware accelerators,
such as the Intel Xeon Phi and the NVIDIA Graphics Processing Unit (GPU), are
desirable for achieving high-performance risk analytics. In this paper, we set
out to investigate how accelerators can be employed in risk analytics, focusing
on developing parallel algorithms for Aggregate Risk Analysis, a simulation
which computes the Probable Maximum Loss of a portfolio taking both primary and
secondary uncertainties into account. The key result is that both hardware
accelerators are useful in different contexts; without taking data transfer
times into account the Phi had lowest execution times when used independently
and the GPU along with a host in a hybrid platform yielded best performance.Comment: A modified version of this article is accepted to the Computers and
Electrical Engineering Journal under the title - "The Hardware Accelerator
Debate: A Financial Risk Case Study Using Many-Core Computing"; Blesson
Varghese, "The Hardware Accelerator Debate: A Financial Risk Case Study Using
Many-Core Computing," Computers and Electrical Engineering, 201
A Wait-free Multi-word Atomic (1,N) Register for Large-scale Data Sharing on Multi-core Machines
We present a multi-word atomic (1,N) register for multi-core machines
exploiting Read-Modify-Write (RMW) instructions to coordinate the writer and
the readers in a wait-free manner. Our proposal, called Anonymous Readers
Counting (ARC), enables large-scale data sharing by admitting up to
concurrent readers on off-the-shelf 64-bits machines, as opposed to the most
advanced RMW-based approach which is limited to 58 readers. Further, ARC avoids
multiple copies of the register content when accessing it---this affects
classical register's algorithms based on atomic read/write operations on single
words. Thus it allows for higher scalability with respect to the register size.
Moreover, ARC explicitly reduces improves performance via a proper limitation
of RMW instructions in case of read operations, and by supporting constant time
for read operations and amortized constant time for write operations. A proof
of correctness of our register algorithm is also provided, together with
experimental data for a comparison with literature proposals. Beyond assessing
ARC on physical platforms, we carry out as well an experimentation on
virtualized infrastructures, which shows the resilience of wait-free
synchronization as provided by ARC with respect to CPU-steal times, proper of
more modern paradigms such as cloud computing.Comment: non
Containers and Reproducibility in Scientific Research
Numerical reproducibility has received increased emphasis in the scientific community. One reason that makes scientific research difficult to repeat is that different computing platforms calculate mathematical operations differently. Software containers have been shown to improve reproducibility in some instances and provide a convenient way to deploy applications in a variety of computing environments. However, there are software patterns or idioms that produce inconsistent results because mathematical operations are performed in different orders in different environments resulting in reproducibility errors. The performance of software in containers and the performance of software that improves numeric reproducibility may be of concern for some scientists. An existing algorithm for reproducible sum reduction was implemented, the runtime performance of this implementation was found to be between 0.3x and 0.5x the speed of the non-reproducible sum reduction. Finally, to evaluate the impact of using a container on performance, the runtime performance of the WRF (Weather Research Forecasting) package was tested and found to be 0.98x of the performance in a native Linux environment
- …