
    Optimal algorithms for sensitivity analysis in associative multiplication problems

    We consider efficient ways of determining the sensitivity of a product to changes in individual factors. The task is motivated by several interesting combinatorial and numeric problems which can be given a unified formulation as the problem of finding the (associative) product of N objects. Both deterministic and probabilistic changes to the factors are considered. Algorithms are given for two kinds of deterministic variation scheme, and nontrivial lower bounds are obtained which demonstrate the algorithms to be optimal. For probabilistic choice of the parameter to be varied, it is shown that optimal ordered binary search trees or Huffman trees determine the optimal strategies. A number of unsolved problems are posed.
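    To make the problem concrete, here is a minimal Python sketch of the classic prefix/suffix-product trick for this kind of sensitivity query: after O(N) preprocessing, the product with any single factor replaced can be formed in O(1). This illustrates only the flavor of the problem; the paper's optimal algorithms and lower bounds are not reproduced here, and the names below are hypothetical.

        # Sketch: recomputing an associative product after one factor changes.
        # Naive recomputation costs O(N) per query; precomputed prefix and
        # suffix products answer each single-factor change in O(1).
        # (Hypothetical names; plain integers stand in for arbitrary
        # associative objects.)

        def build_prefix_suffix(xs, op, identity):
            n = len(xs)
            prefix = [identity] * (n + 1)   # prefix[i] = xs[0] * ... * xs[i-1]
            suffix = [identity] * (n + 1)   # suffix[i] = xs[i] * ... * xs[n-1]
            for i in range(n):
                prefix[i + 1] = op(prefix[i], xs[i])
            for i in range(n - 1, -1, -1):
                suffix[i] = op(xs[i], suffix[i + 1])
            return prefix, suffix

        def product_with_change(prefix, suffix, op, i, new_factor):
            # Product of all N factors with factor i replaced by new_factor.
            return op(op(prefix[i], new_factor), suffix[i + 1])

        xs = [2, 3, 5, 7]
        mul = lambda a, b: a * b
        prefix, suffix = build_prefix_suffix(xs, mul, 1)
        assert product_with_change(prefix, suffix, mul, 2, 10) == 2 * 3 * 10 * 7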

    Asymptotic Expansions for Stationary Distributions of Perturbed Semi-Markov Processes

    New algorithms for computing asymptotic expansions for stationary distributions of nonlinearly perturbed semi-Markov processes are presented. The algorithms are based on special techniques of sequential phase space reduction, which can be applied to processes with asymptotically coupled and uncoupled finite phase spaces.
    Comment: 83 pages.
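    As a toy analogue of such an expansion, the numpy sketch below computes the first-order term of the stationary distribution of a linearly perturbed discrete-time Markov chain P(eps) = P0 + eps*C, using the standard fundamental-matrix formula pi1 = pi0 C (I - P0 + 1 pi0)^(-1). This is classical perturbation theory for ordinary Markov chains, not the paper's phase-space-reduction algorithm for nonlinearly perturbed semi-Markov processes, and the matrices are invented for illustration.

        # First-order expansion pi(eps) ~ pi0 + eps*pi1 for a linearly
        # perturbed stochastic matrix P(eps) = P0 + eps*C.
        import numpy as np

        def stationary(P):
            # Left eigenvector of P for eigenvalue 1, normalized to sum 1.
            w, v = np.linalg.eig(P.T)
            pi = np.real(v[:, np.argmin(np.abs(w - 1))])
            return pi / pi.sum()

        P0 = np.array([[0.9, 0.1, 0.0],
                       [0.2, 0.7, 0.1],
                       [0.0, 0.3, 0.7]])
        C = np.array([[-0.1,  0.1,  0.0],   # rows of C sum to 0 so that
                      [ 0.0, -0.1,  0.1],   # P0 + eps*C stays stochastic
                      [ 0.1,  0.0, -0.1]])

        pi0 = stationary(P0)
        Z = np.linalg.inv(np.eye(3) - P0 + np.outer(np.ones(3), pi0))
        pi1 = pi0 @ C @ Z                   # first-order correction term

        eps = 1e-3
        exact = stationary(P0 + eps * C)
        print(np.max(np.abs(exact - (pi0 + eps * pi1))))  # O(eps^2) error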

    On the Complexity of the Cayley Semigroup Membership Problem

    We investigate the complexity of deciding, given a multiplication table representing a semigroup S, a subset X of S and an element t of S, whether t can be expressed as a product of elements of X. It is well known that this problem is NL-complete and that the more general Cayley groupoid membership problem, where the multiplication table is not required to be associative, is P-complete. For groups, the problem can be solved in deterministic log-space, which raised the question of determining its exact complexity. Barrington, Kadau, Lange and McKenzie showed that for Abelian groups and for certain solvable groups, the problem is contained in the complexity class FOLL, and they concluded that these variants are not hard for any complexity class containing Parity. The more general case of arbitrary groups remained open. In this work, we show that for groups and for commutative semigroups, the problem is solvable in qAC^0 (quasi-polynomial-size circuits of constant depth with unbounded fan-in) and conclude that these variants are also not hard for any class containing Parity. Moreover, we prove that NL-completeness already holds for the classes of 0-simple semigroups and nilpotent semigroups. Together with our results on groups and commutative semigroups, this proves the existence of a natural class of finite semigroups which generates a variety of finite semigroups with NL-complete Cayley semigroup membership, while the Cayley semigroup membership problem for the class itself is not NL-hard. We also discuss applications of our technique to FOLL.
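    For orientation, the decision problem itself is easy to state in code: compute the closure of X under the table's product and test whether t is reached. The Python sketch below does exactly this brute-force closure; it runs in polynomial time and says nothing about the fine-grained circuit and space classes (NL, FOLL, qAC^0) the paper is concerned with, and the example tables are made up.

        # Brute-force Cayley semigroup membership: given the multiplication
        # table of a finite semigroup S (table[a][b] = a*b), generators X and
        # a target t, compute the closure of X under the product and test t.
        from collections import deque

        def cayley_member(table, X, t):
            reached = set(X)
            queue = deque(X)
            while queue:
                a = queue.popleft()
                # Multiply a against everything reached so far, both orders;
                # each element is popped once, so all pairs are covered.
                for b in list(reached):
                    for c in (table[a][b], table[b][a]):
                        if c not in reached:
                            reached.add(c)
                            queue.append(c)
            return t in reached

        # Z/4Z under addition, generated by {1}: every element is reachable.
        table = [[(i + j) % 4 for j in range(4)] for i in range(4)]
        assert cayley_member(table, {1}, 3)
        # Generated by {2}, only the even residues are products: 1 is not.
        assert not cayley_member(table, {2}, 1)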

    An investigation into adaptive power reduction techniques for neural hardware

    In light of the growing applicability of Artificial Neural Networks (ANNs) in the signal processing field [1] and the present thrust of the semiconductor industry towards low-power SoCs for mobile devices [2], the power consumption of ANN hardware has become a very important implementation issue. Adaptability is a powerful and useful feature of neural networks, yet all current approaches to low-power ANN hardware are 'non-adaptive' with respect to the power consumption of the network (i.e. power reduction is not an objective of the adaptation/learning process). The research work presented in this thesis investigates possible adaptive power-reduction techniques, which attempt to exploit the adaptability of neural networks in order to reduce their power consumption. Three separate approaches for such adaptive power reduction are proposed: adaptation of size, adaptation of network weights and adaptation of calculation precision. Initial case studies exhibit promising results with significant power reduction.
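    The thesis text is not reproduced here, so the following Python sketch is only a guess at what "adaptation of calculation precision" might look like in practice: quantize a trained weight matrix to b bits and keep the smallest b whose output error stays within a tolerance, on the assumption that narrower arithmetic costs less power. The data, tolerance and quantizer are all hypothetical.

        # Hypothetical precision-adaptation loop: shrink the bit width of a
        # layer's weights until the output error exceeds a tolerance.
        import numpy as np

        def quantize(w, bits):
            # Uniform symmetric quantization of w to 2**bits levels.
            scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
            return np.round(w / scale) * scale

        rng = np.random.default_rng(0)
        w = rng.normal(size=(16, 8))      # stand-in trained layer weights
        x = rng.normal(size=(100, 16))    # stand-in validation inputs
        reference = np.tanh(x @ w)        # full-precision layer output

        for bits in range(2, 12):
            err = np.max(np.abs(np.tanh(x @ quantize(w, bits)) - reference))
            if err < 1e-2:                # tolerance: hypothetical budget
                print(f"smallest acceptable precision: {bits} bits")
                break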

    GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU

    High-performance implementations of graph algorithms are difficult to achieve on new parallel hardware such as GPUs because of three challenges: (1) the difficulty of coming up with graph building blocks, (2) load imbalance on parallel hardware, and (3) graph problems having low arithmetic intensity. To address some of these challenges, GraphBLAS is an innovative, ongoing effort by the graph analytics community to propose building blocks based on sparse linear algebra, which will allow graph algorithms to be expressed in a performant, succinct, composable and portable manner. In this paper, we examine the performance challenges of a linear-algebra-based approach to building graph frameworks and describe new design principles for overcoming these bottlenecks. Among the new design principles is exploiting input sparsity, which allows users to write graph algorithms without specifying push and pull direction. Exploiting output sparsity allows users to tell the backend which values of the output of a single vectorized computation they do not want computed. Load balancing is important for distributing work evenly amongst parallel workers; we describe the load-balancing features needed for handling graphs with different characteristics. The design principles described in this paper have been implemented in "GraphBLAST", the first open-source high-performance linear-algebra-based graph framework on NVIDIA GPUs. The results show that on a single GPU, GraphBLAST has on average at least an order-of-magnitude speedup over the previous GraphBLAS implementations SuiteSparse and GBTL, comparable performance to the fastest GPU hardwired primitives and to the shared-memory graph frameworks Ligra and Gunrock, and better performance than any other GPU graph framework, while offering a simpler and more concise programming model.
    Comment: 50 pages, 14 figures, 14 tables.
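    As a small illustration of the programming model (not GraphBLAST's actual C++/CUDA API), the following scipy sketch expresses BFS as repeated sparse matrix-vector products, with a mask of already-visited vertices playing the role of the output-sparsity hint that tells the backend which results need not be computed.

        # GraphBLAS-style BFS: each level is one sparse matrix-vector
        # product; masking out visited vertices is the "output sparsity".
        import numpy as np
        from scipy.sparse import csr_matrix

        def bfs_levels(A, source):
            n = A.shape[0]
            levels = np.full(n, -1)
            frontier = np.zeros(n, dtype=bool)
            frontier[source] = True
            depth = 0
            while frontier.any():
                levels[frontier] = depth
                # Next frontier = A^T @ frontier, masked by unvisited.
                reached = (A.T @ frontier).astype(bool)
                frontier = reached & (levels == -1)
                depth += 1
            return levels

        # Directed edges 0 -> 1, 0 -> 2, 1 -> 2 (A[i, j] = 1 for edge i->j).
        A = csr_matrix(np.array([[0, 1, 1],
                                 [0, 0, 1],
                                 [0, 0, 0]], dtype=np.int8))
        print(bfs_levels(A, 0))   # [0 1 1]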

    Demystifying the Performance of HPC Scientific Applications on NVM-based Memory Systems

    The emergence of high-density byte-addressable non-volatile memory (NVM) promises to accelerate data- and compute-intensive applications. Current NVM technologies have lower performance than DRAM and are therefore often paired with DRAM in a heterogeneous main memory; byte-addressable NVM hardware has recently become available. This work provides a timely evaluation of representative HPC applications from the "Seven Dwarfs" on NVM-based main memory. Our results quantify the effectiveness of DRAM-cached NVM for accelerating HPC applications and for enabling problems larger than the DRAM capacity. On uncached NVM, HPC applications exhibit three tiers of performance sensitivity: insensitive, scaled, and bottlenecked. We identify write throttling and concurrency control as the priorities in optimizing applications, and highlight that changes in concurrency may have diverging effects on read and write accesses. Based on these findings, we explore two optimization approaches. First, we provide a prediction model that uses datasets from a small set of configurations to estimate performance at various concurrency levels and data sizes, avoiding exhaustive search of the configuration space. Second, we demonstrate that write-aware data placement on uncached NVM can achieve a 22x performance improvement with a 60% reduction in DRAM usage.
    Comment: 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2020).
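    As an illustration of what "write-aware data placement" could mean, the following Python sketch greedily keeps the most write-intensive arrays (writes per byte) within a limited DRAM budget and leaves the rest on NVM. This is a plausible reading of the idea only; the paper's actual placement policy and its performance-prediction model are not reproduced here, and the array sizes and write counts are invented.

        # Toy greedy heuristic: rank arrays by write intensity and fill the
        # DRAM budget with the most write-heavy ones; the rest go to NVM,
        # which is assumed slower and more costly for writes.

        def place(arrays, dram_bytes):
            # arrays: list of (name, size_in_bytes, write_count)
            ranked = sorted(arrays, key=lambda a: a[2] / a[1], reverse=True)
            placement, free = {}, dram_bytes
            for name, size, _writes in ranked:
                if size <= free:
                    placement[name] = "DRAM"
                    free -= size
                else:
                    placement[name] = "NVM"
            return placement

        arrays = [("grid",   6 << 30, 2_000_000),   # write-heavy
                  ("lookup", 4 << 30,       100),   # read-mostly
                  ("buffer", 1 << 30,   500_000)]
        print(place(arrays, dram_bytes=8 << 30))
        # {'buffer': 'DRAM', 'grid': 'DRAM', 'lookup': 'NVM'}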