4,009 research outputs found
Adapting the interior point method for the solution of LPs on serial, coarse grain parallel and massively parallel computers
In this paper we describe a unified scheme for implementing an interior point algorithm (IPM) over a range of computer architectures. In the inner iteration of the IPM a search direction is computed using Newton's method. Computationally this involves solving a sparse symmetric positive definite (SSPD) system of equations. The choice of direct and indirect methods for the solution of this system, and the design of data structures to take advantage of serial, coarse grain parallel and massively parallel computer architectures, are considered in detail. We put forward arguments as to why integration of the system within a sparse simplex solver is important and outline how the system is designed to achieve this integration
Polly's Polyhedral Scheduling in the Presence of Reductions
The polyhedral model provides a powerful mathematical abstraction to enable
effective optimization of loop nests with respect to a given optimization goal,
e.g., exploiting parallelism. Unexploited reduction properties are a frequent
reason for polyhedral optimizers to assume parallelism prohibiting dependences.
To our knowledge, no polyhedral loop optimizer available in any production
compiler provides support for reductions. In this paper, we show that
leveraging the parallelism of reductions can lead to a significant performance
increase. We give a precise, dependence based, definition of reductions and
discuss ways to extend polyhedral optimization to exploit the associativity and
commutativity of reduction computations. We have implemented a
reduction-enabled scheduling approach in the Polly polyhedral optimizer and
evaluate it on the standard Polybench 3.2 benchmark suite. We were able to
detect and model all 52 arithmetic reductions and achieve speedups up to
2.21 on a quad core machine by exploiting the multidimensional
reduction in the BiCG benchmark.Comment: Presented at the IMPACT15 worksho
Optimizing I/O for Big Array Analytics
Big array analytics is becoming indispensable in answering important
scientific and business questions. Most analysis tasks consist of multiple
steps, each making one or multiple passes over the arrays to be analyzed and
generating intermediate results. In the big data setting, I/O optimization is a
key to efficient analytics. In this paper, we develop a framework and
techniques for capturing a broad range of analysis tasks expressible in
nested-loop forms, representing them in a declarative way, and optimizing their
I/O by identifying sharing opportunities. Experiment results show that our
optimizer is capable of finding execution plans that exploit nontrivial I/O
sharing opportunities with significant savings.Comment: VLDB201
Synthesis Optimization on Galois-Field Based Arithmetic Operators for Rijndael Cipher
A series of experiments has been conducted to show that FPGA synthesis of Galois-Field (GF) based arithmetic operators can be optimized automatically to improve Rijndael Cipher throughput. Moreover, it has been demonstrated that efficiency improvement in GF operators does not directly correspond to the system performance at application level. The experiments were motivated by so many research works that focused on improving performance of GF operators. Each of the variants has the most efficient form in either time (fastest) or space (smallest occupied area) when implemented in FPGA chips. In fact, GF operators are not utilized individually, but rather integrated one to the others to implement algorithms. Contribution of this paper is to raise issue on GF-based application performance and suggest alternative aspects that potentially affect it. Instead of focusing on GF operator efficiency, system characteristics are worth considered in optimizing application performance
- …