C++ programming language for an abstract massively parallel SIMD architecture
The aim of this work is to define and implement an extended C++ language to support the SIMD programming paradigm. The C++ language has been extended to express the full potential of an abstract SIMD machine consisting of a central Control Processor and an N-dimensional toroidal array of Numeric Processors. Very few extensions have been added to standard C++, with the goal of minimising the programmer's effort in learning a new language while keeping the performance of the compiled code very high. The proposed language has been implemented as a port of the GNU C++ Compiler to a SIMD supercomputer.

Comment: 10 pages
Adapting the interior point method for the solution of linear programs on high performance computers
In this paper we describe a unified algorithmic framework for the interior point method (IPM) for solving linear programs (LPs) which allows us to adapt it over a range of high performance computer architectures. We set out the reasons why the IPM makes better use of high performance computer architecture than the sparse simplex method. In the inner iteration of the IPM a search direction is computed using Newton or higher order methods. Computationally this involves solving a sparse symmetric positive definite (SSPD) system of equations. The choice of direct and indirect methods for the solution of this system and the design of data structures to take advantage of coarse grain parallel and massively parallel computer architectures are considered in detail. Finally, we present experimental results of solving NETLIB test problems on examples of these architectures and put forward arguments as to why integration of the system within the sparse simplex method is beneficial.
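The inner iteration described above reduces to solving an SSPD linear system. As a minimal sketch of the direct approach the abstract mentions, here is a dense Cholesky factorisation followed by forward and back substitution in plain Python, applied to a toy 2×2 system with made-up numbers; the paper's implementations are sparse and parallel, which this does not attempt to show:

```python
import math

def cholesky(A):
    """Factor a symmetric positive definite matrix A as L * L^T.

    Returns the lower-triangular factor L as a list of lists.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_spd(A, b):
    """Solve A x = b for SPD A: factor, then two triangular solves."""
    n = len(A)
    L = cholesky(A)
    # Forward substitution: L y = b
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    # Back substitution: L^T x = y
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

# Toy SPD system, for illustration only
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = solve_spd(A, b)
```

The direct/indirect choice the paper discusses is between a factorisation like this one and iterative schemes such as conjugate gradients, whose trade-offs differ on parallel hardware.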
Content addressable memory project
A parameterized version of the tree processor was designed and tested (by simulation). The leaf processor design is 90 percent complete. We expect to complete and test a combination of tree and leaf cell designs in the next period. Work is proceeding on algorithms for the content addressable memory (CAM), and once the design is complete we will begin simulating algorithms for large problems. The following topics are covered: (1) the practical implementation of content addressable memory; (2) the design of a LEAF cell for the Rutgers CAM architecture; (3) a circuit design tool user's manual; and (4) the design and analysis of efficient hierarchical interconnection networks.
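The core CAM operation can be sketched as a software model: every stored word is compared against a search key in a single query. The class and method names below are invented for illustration, and real CAM hardware performs all comparisons simultaneously rather than in a loop:

```python
class CAM:
    """A software model of a content addressable memory (illustrative only)."""

    def __init__(self, width):
        self.width = width      # word width in bits
        self.words = []         # stored (word, tag) entries

    def write(self, word, tag):
        self.words.append((word, tag))

    def search(self, key, mask=0):
        """Return the tags of all stored words matching key on unmasked bits.

        Mask bits set to 1 mark "don't care" positions (ternary-CAM style).
        """
        care = ((1 << self.width) - 1) & ~mask
        return [tag for word, tag in self.words if (word ^ key) & care == 0]

cam = CAM(width=8)
cam.write(0b10110010, "entry-A")
cam.write(0b10110111, "entry-B")
hits = cam.search(0b10110010)                       # exact match
partial = cam.search(0b10110000, mask=0b00000111)   # low 3 bits don't care
```

In hardware, the per-word XOR-and-mask comparison happens in parallel match lines, which is what makes associative lookup constant-time.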
Architecture and Design of Medical Processor Units for Medical Networks
This paper introduces analogical and deductive methodologies for the design of medical processor units (MPUs). From a study of the evolution of numerous earlier processors, we derive the basis for the architecture of MPUs. These specialized processors perform unique medical functions encoded as medical operational codes (mopcs). From a pragmatic perspective, MPUs function very much like CPUs. Both processors have unique operation codes that command the hardware to perform a distinct chain of subprocesses upon operands and generate a specific result unique to the opcode and the operand(s). In medical environments, the MPU decodes the mopcs, executes a series of medical sub-processes, and sends out secondary commands to the medical machine. Whereas the operands in a typical computer system are numerical and logical entities, the operands in a medical machine are objects such as patients, blood samples, tissues, operating rooms, medical staff, medical bills, patient payments, etc. We follow the functional overlap between the two processors and evolve the design of medical computer systems and networks.

Comment: 17 pages
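The decode-and-execute flow described above can be sketched as a dispatch table mapping each mopc to its chain of sub-processes, mirroring how a CPU opcode triggers a fixed micro-operation sequence. Every mopc name and sub-process below is invented for illustration; the paper does not publish a concrete instruction set:

```python
# Hypothetical mopc table: opcode -> ordered chain of sub-processes.
MOPC_TABLE = {
    "BLOOD_PANEL": ["draw_sample", "centrifuge", "analyze", "report"],
    "SCHEDULE_OR": ["check_availability", "reserve_room", "notify_staff"],
}

def execute_mopc(mopc, operand, log):
    """Decode a mopc and run its sub-process chain on one operand,
    recording each secondary command sent to the medical machine."""
    for step in MOPC_TABLE[mopc]:
        log.append(f"{step}({operand})")
    return log

log = []
execute_mopc("BLOOD_PANEL", "patient-042", log)
```

The point of the analogy is that the operand here is an object (a patient) rather than a number, while the fetch-decode-execute structure is unchanged.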
Massively Parallel Computing at the Large Hadron Collider up to the HL-LHC
As the Large Hadron Collider (LHC) continues its upward progression in energy and luminosity towards the planned High-Luminosity LHC (HL-LHC) in 2025, the challenges the experiments face in processing increasingly complex events will also continue to grow. Improvements in computing technologies and algorithms will be a key part of the advances necessary to meet this challenge. Parallel computing techniques, especially those using massively parallel computing (MPC), promise to be a significant part of this effort. In these proceedings, we discuss these algorithms in the specific context of a particularly important problem: the reconstruction of charged particle tracks in the trigger algorithms of an experiment, in which high computing performance is critical for executing the track reconstruction in the available time. We discuss some areas where parallel computing has already shown benefits to the LHC experiments, and also demonstrate how an MPC-based trigger at the CMS experiment could not only improve performance, but also extend the reach of the CMS trigger system to capture events which are currently not practical to reconstruct at the trigger level.

Comment: 14 pages, 6 figures. Proceedings of the 2nd International Summer School on Intelligent Signal Processing for Frontier Research and Industry (INFIERI2014), to appear in JINST. Revised version in response to referee comments.
Adapting the interior point method for the solution of LPs on serial, coarse grain parallel and massively parallel computers
In this paper we describe a unified scheme for implementing an interior point method (IPM) algorithm over a range of computer architectures. In the inner iteration of the IPM a search direction is computed using Newton's method. Computationally this involves solving a sparse symmetric positive definite (SSPD) system of equations. The choice of direct and indirect methods for the solution of this system, and the design of data structures to take advantage of serial, coarse grain parallel and massively parallel computer architectures, are considered in detail. We put forward arguments as to why integration of the system within a sparse simplex solver is important, and outline how the system is designed to achieve this integration.
Distributed pre-computation for a cryptanalytic time-memory trade-off
Cryptanalytic tables often play a critical role in decryption efforts for ciphers where the key is not known. Using a cryptanalytic table allows a time-memory trade-off attack in which disk space or physical memory is traded for a shorter decryption time. For a cryptosystem with N keys, potential keys are generated and stored in a lookup table, thus reducing the time it takes to perform cryptanalysis of future keys and the space required to store them. The success rate of these lookup tables varies with the size of the key space, but can be calculated from the number of keys and the length of the chains used within the table. The up-front cost of generating the tables is typically ignored when calculating cryptanalysis time, as the work is assumed to have already been performed. As computers move from 32-bit to 64-bit architectures and as key lengths increase, the time it takes to pre-compute these tables rises exponentially. In some cases, the pre-computation time can no longer be ignored because it becomes infeasible to pre-compute the tables due to the sheer size of the key space. This thesis focuses on parallel techniques for generating pre-computed cryptanalytic tables in a heterogeneous environment and presents a working parallel application that makes use of the Message Passing Interface (MPI). The parallel implementation is designed to divide the workload for pre-computing a single table across multiple heterogeneous nodes with minimal overhead incurred from message passing. The result is an increase in pre-computation speed close to the sum of the computational power of the participating processors.
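The chain-based pre-computation the thesis parallelises can be sketched in a toy serial form. The key width, chain length, and reduction function below are illustrative assumptions (a 24-bit key space with truncated SHA-256 as the one-way function), and the MPI work division is only indicated in a comment:

```python
import hashlib

KEY_BITS = 24    # toy key space for illustration
CHAIN_LEN = 100  # chain length trades table size against lookup time

def h(key):
    """The one-way function being attacked (here: truncated SHA-256)."""
    digest = hashlib.sha256(key.to_bytes(3, "big")).digest()
    return int.from_bytes(digest[:3], "big")

def reduce_fn(value, step):
    """Map a hash output back into the key space; varying with the step
    index so that distinct chain columns use distinct reduction functions."""
    return (value + step) % (1 << KEY_BITS)

def build_chain(start):
    """Walk one hash/reduce chain and return its endpoint; only the
    (endpoint, start) pair is stored, and intermediate keys are recomputed
    at lookup time."""
    key = start
    for step in range(CHAIN_LEN):
        key = reduce_fn(h(key), step)
    return key

# In the MPI version, each node would build a disjoint slice of start
# points and send its (endpoint, start) pairs to a collector; here we
# build a few chains serially.
table = {build_chain(s): s for s in range(8)}
```

Because the table stores only endpoints, its memory cost is the number of chains, while its coverage grows with chain length; this is the trade-off the success-rate calculation in the abstract refers to.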
Monitoring Cluster on Online Compiler with Ganglia
Ganglia is an open source monitoring system for high performance computing (HPC) that collects the status of both the whole cluster and every node and reports it to the user. We use Ganglia to monitor spasi.informatika.lipi.go.id (SPASI), a customized Fedora 10 based cluster, for our cluster online compiler, CLAW (cluster access through web). Our experience shows that Ganglia is capable of presenting our cluster's status and allows us to track it.