Optimal algorithms for sensitivity analysis in associative multiplication problems
We consider efficient ways of determining the sensitivity of a product to changes in individual factors. The task is motivated by several interesting combinatorial and numeric problems which can be given a unified formulation as the problem of finding the (associative) product of N objects. Both deterministic and probabilistic changes to the factors are considered. We present algorithms for two kinds of deterministic variation schemes and obtain nontrivial lower bounds which demonstrate the algorithms to be optimal. For probabilistic choice of the parameter to be varied, it is shown that optimal ordered binary search trees or Huffman trees determine the optimal strategies. A number of unsolved problems are posed.
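For intuition, here is a minimal Python sketch of one standard technique for this kind of sensitivity query: after O(N) preprocessing with prefix and suffix products, the product with any single factor replaced can be recovered in O(1). This illustrates the problem setting only, not the paper's algorithms or lower bounds; the function names are ours.

```python
# Illustrative only: O(1) "what if factor i were y?" queries after O(N)
# preprocessing. Plain numeric multiplication is used for concreteness.

def build_sensitivity_tables(xs):
    n = len(xs)
    prefix = [1] * (n + 1)  # prefix[i] = xs[0] * ... * xs[i-1]
    suffix = [1] * (n + 1)  # suffix[i] = xs[i] * ... * xs[n-1]
    for i in range(n):
        prefix[i + 1] = prefix[i] * xs[i]
        suffix[n - 1 - i] = xs[n - 1 - i] * suffix[n - i]
    return prefix, suffix

def product_with_replacement(prefix, suffix, i, y):
    # Product of all factors with factor i replaced by y. The order
    # prefix * y * suffix is preserved, so this also works for
    # non-commutative associative operations.
    return prefix[i] * y * suffix[i + 1]

# Example: xs = [2, 3, 5]; replacing factor 1 by 7 gives 2 * 7 * 5 = 70.
prefix, suffix = build_sensitivity_tables([2, 3, 5])
assert product_with_replacement(prefix, suffix, 1, 7) == 70
```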
Asymptotic Expansions for Stationary Distributions of Perturbed Semi-Markov Processes
New algorithms for computing asymptotic expansions for stationary distributions of nonlinearly perturbed semi-Markov processes are presented. The algorithms are based on special techniques of sequential phase space reduction, which can be applied to processes with asymptotically coupled and uncoupled finite phase spaces. Comment: 83 pages
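For orientation, such expansions have, in the regular case, the following generic form (a sketch only; the coefficients, conditions, and the Laurent-type expansions treated in the paper are not reproduced here):

```latex
% Sketch of the generic form: for each state $i$ of the perturbed
% semi-Markov process, the stationary probability admits
\[
  \pi_i(\varepsilon) \;=\; \pi_i(0) + c_{i,1}\,\varepsilon
  + c_{i,2}\,\varepsilon^{2} + \cdots + c_{i,k}\,\varepsilon^{k}
  + o(\varepsilon^{k}) \quad \text{as } \varepsilon \to 0,
\]
% with coefficients $c_{i,j}$ computed recursively, one state at a time,
% during the sequential phase-space reduction.
```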
On the Complexity of the Cayley Semigroup Membership Problem
We investigate the complexity of deciding, given a multiplication table representing a semigroup S, a subset X of S, and an element t of S, whether t can be expressed as a product of elements of X. It is well known that this problem is NL-complete and that the more general Cayley groupoid membership problem, where the multiplication table is not required to be associative, is P-complete. For groups, the problem can be solved in deterministic log-space, which raised the question of determining the exact complexity of this variant. Barrington, Kadau, Lange and McKenzie showed that for Abelian groups and for certain solvable groups, the problem is contained in the complexity class FOLL, and they concluded that these variants are not hard for any complexity class containing Parity. The more general case of arbitrary groups remained open. In this work, we show that for both groups and commutative semigroups, the problem is solvable in qAC^0 (quasi-polynomial size circuits of constant depth with unbounded fan-in) and conclude that these variants are also not hard for any class containing Parity. Moreover, we prove that NL-completeness already holds for the classes of 0-simple semigroups and nilpotent semigroups. Together with our results on groups and commutative semigroups, this proves the existence of a natural class of finite semigroups which generates a variety of finite semigroups with NL-complete Cayley semigroup membership, while the Cayley semigroup membership problem for the class itself is not NL-hard. We also discuss applications of our technique to FOLL.
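To make the problem statement concrete, here is a small Python sketch of the membership problem itself, solved by naive closure; this illustrates the input format (a multiplication table), not the qAC^0 circuits or the hardness results of the paper. The names and the example semigroup are ours.

```python
from collections import deque

def cayley_membership(table, X, t):
    """Naive polynomial-time closure: elements are 0..n-1 and table[a][b]
    is the product a*b. Returns True iff t is a product of elements of X."""
    reachable = set(X)
    queue = deque(X)
    while queue:
        a = queue.popleft()
        for b in list(reachable):
            for p in (table[a][b], table[b][a]):
                if p not in reachable:
                    reachable.add(p)
                    queue.append(p)
    return t in reachable

# Example: the cyclic group Z_4 written as a multiplication table.
Z4 = [[(a + b) % 4 for b in range(4)] for a in range(4)]
assert cayley_membership(Z4, {1}, 0)      # 1+1+1+1 = 0 (mod 4)
assert not cayley_membership(Z4, {2}, 1)  # {2} generates only {2, 0}
```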
An investigation into adaptive power reduction techniques for neural hardware
In light of the growing applicability of Artificial Neural Networks (ANNs) in the signal processing field [1] and the present thrust of the semiconductor industry towards low-power SoCs for mobile devices [2], the power consumption of ANN hardware has become a very important implementation issue. Adaptability is a powerful and useful feature of neural networks. All current approaches to low-power ANN hardware are ‘non-adaptive’ with respect to the power consumption of the network (i.e. power reduction is not an objective of the adaptation/learning process). In the research work presented in this thesis, possible adaptive power reduction techniques are investigated which attempt to exploit the adaptability of neural networks in order to reduce power consumption. Three separate approaches to such adaptive power reduction are proposed: adaptation of size, adaptation of network weights, and adaptation of calculation precision. Initial case studies exhibit promising results with significant power reduction.
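None of the thesis's techniques are reproduced in this listing; purely as a hypothetical illustration of what "adaptation of size" could look like in software, the NumPy sketch below prunes small-magnitude weights, since fewer active multiply-accumulate units is one crude proxy for lower power in ANN hardware.

```python
import numpy as np

def prune_small_weights(W, keep_fraction=0.5):
    """Hypothetical 'adaptation of size' step (not from the thesis): keep
    only the largest-magnitude weights and zero the rest. In hardware,
    skipping zero weights means fewer active multiply-accumulate
    operations, a simplified proxy for reduced power consumption."""
    k = max(1, int(keep_fraction * W.size))
    threshold = np.sort(np.abs(W), axis=None)[-k]
    return np.where(np.abs(W) >= threshold, W, 0.0)

# Example: the larger half of the weights survives, the rest is zeroed.
W = np.array([[0.9, -0.1], [0.05, -0.7]])
print(prune_small_weights(W))  # [[ 0.9  0. ] [ 0.  -0.7]]
```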
GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU
High-performance implementations of graph algorithms are challenging to implement on new parallel hardware such as GPUs for three reasons: (1) the difficulty of devising graph building blocks, (2) load imbalance on parallel hardware, and (3) the low arithmetic intensity of graph problems. To address some of these challenges, GraphBLAS is an innovative, ongoing effort by the graph analytics community to propose building blocks based on sparse linear algebra, which allow graph algorithms to be expressed in a performant, succinct, composable, and portable manner. In this paper, we examine the performance challenges of a linear-algebra-based approach to building graph frameworks and describe new design principles for overcoming these bottlenecks. Among the new design principles is exploiting input sparsity, which allows users to write graph algorithms without specifying push or pull direction. Exploiting output sparsity allows users to tell the backend which values of the output of a single vectorized computation they do not want computed. Load balancing is an important feature for distributing work evenly among parallel workers; we describe the load-balancing features that are important for handling graphs with different characteristics. The design principles described in this paper have been implemented in "GraphBLAST", the first open-source high-performance linear algebra-based graph framework on NVIDIA GPUs. The results show that on a single GPU, GraphBLAST has on average at least an order of magnitude speedup over the previous GraphBLAS implementations SuiteSparse and GBTL, comparable performance to the fastest GPU hardwired primitives and the shared-memory graph frameworks Ligra and Gunrock, and better performance than any other GPU graph framework, while offering a simpler and more concise programming model. Comment: 50 pages, 14 figures, 14 tables
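As a concrete illustration of the linear-algebra formulation such frameworks build on, here is a BFS sketch written with scipy.sparse rather than the GraphBLAS API; the `(levels == -1)` mask plays the role of output sparsity, though GraphBLAST pushes such masks into the kernels instead of applying them after the product. The code is ours, not GraphBLAST's.

```python
import numpy as np
import scipy.sparse as sp

def bfs_levels(A, source):
    """BFS as repeated sparse matrix-vector products.
    A: scipy.sparse CSR adjacency matrix with A[i, j] != 0 iff edge i -> j.
    Returns the BFS level of every vertex (-1 if unreachable)."""
    n = A.shape[0]
    levels = np.full(n, -1)
    frontier = np.zeros(n)
    frontier[source] = 1.0
    level = 0
    while frontier.any():
        levels[frontier > 0] = level
        # Advance the frontier with one matrix-vector product, masking out
        # already-visited vertices: their output values are not wanted,
        # which is exactly the "output sparsity" opportunity.
        frontier = (A.T @ frontier) * (levels == -1)
        level += 1
    return levels

# Example: directed path 0 -> 1 -> 2.
A = sp.csr_matrix(np.array([[0., 1., 0.], [0., 0., 1.], [0., 0., 0.]]))
print(bfs_levels(A, 0))  # [0 1 2]
```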
Demystifying the Performance of HPC Scientific Applications on NVM-based Memory Systems
The emergence of high-density byte-addressable non-volatile memory (NVM) promises to accelerate data- and compute-intensive applications. Current NVM technologies have lower performance than DRAM and are thus often paired with DRAM in a heterogeneous main memory. Byte-addressable NVM hardware has recently become available. This work provides a timely evaluation of representative HPC applications from the "Seven Dwarfs" on NVM-based main memory. Our results quantify the effectiveness of DRAM-cached NVM for accelerating HPC applications and enabling large problems beyond the DRAM capacity. On uncached NVM, HPC applications exhibit three tiers of performance sensitivity: insensitive, scaled, and bottlenecked. We identify write throttling and concurrency control as the priorities in optimizing applications, and highlight that a change in concurrency may have diverging effects on read and write accesses in an application. Based on these findings, we explore two optimization approaches. First, we provide a prediction model that uses datasets from a small set of configurations to estimate performance at various concurrencies and data sizes, avoiding exhaustive search of the configuration space. Second, we demonstrate that write-aware data placement on uncached NVM could achieve x performance improvement with a 60% reduction in DRAM usage. Comment: 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2020)
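The paper's prediction model and placement policy are not reproduced here; as a toy sketch of the write-aware idea under stated assumptions (per-object sizes and profiled write intensities are available), a greedy policy keeps the most write-intensive objects in DRAM, since writes are the expensive operation on NVM.

```python
def write_aware_placement(objects, dram_capacity):
    """Toy sketch of write-aware data placement (not the paper's policy).
    objects: list of (name, size_bytes, write_intensity) tuples, where
    write_intensity is e.g. profiled writes per byte. Greedily keeps the
    most write-intensive objects in DRAM; everything else goes to NVM."""
    placement = {}
    remaining = dram_capacity
    for name, size, writes in sorted(objects, key=lambda o: o[2], reverse=True):
        if writes > 0 and size <= remaining:
            placement[name] = "DRAM"
            remaining -= size
        else:
            placement[name] = "NVM"
    return placement

# Example: the write-heavy buffer lands in DRAM, the read-mostly table on NVM.
print(write_aware_placement(
    [("lookup_table", 8 << 30, 0.01), ("output_buffer", 2 << 30, 5.0)],
    dram_capacity=4 << 30))
```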