801 research outputs found

    Temporalized logics and automata for time granularity

    Suitable extensions of the monadic second-order theory of k successors have been proposed in the literature to capture the notion of time granularity. In this paper, we provide the monadic second-order theories of downward unbounded layered structures, which are infinitely refinable structures consisting of a coarsest domain and an infinite number of finer and finer domains, and of upward unbounded layered structures, which consist of a finest domain and an infinite number of coarser and coarser domains, with expressively complete and elementarily decidable temporal logic counterparts. We obtain this result in two steps. First, we define a new class of combined automata, called temporalized automata, which can be proved to be the automata-theoretic counterpart of temporalized logics, and show that relevant properties, such as closure under Boolean operations, decidability, and expressive equivalence with respect to temporal logics, transfer from component automata to temporalized ones. Then, we exploit the correspondence between temporalized logics and automata to reduce the task of finding the temporal logic counterparts of the given theories of time granularity to the easier one of finding their temporalized automata counterparts. Comment: Journal: Theory and Practice of Logic Programming. Journal Acronym: TPLP. Category: Paper for Special Issue (Verification and Computational Logic). Submitted: 18 March 2002, revised: 14 January 2003, accepted: 5 September 200

    Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators

    We show that DNN accelerator micro-architectures and their program mappings represent specific choices of loop order and hardware parallelism for computing the seven nested loops of DNNs, which enables us to create a formal taxonomy of all existing dense DNN accelerators. Surprisingly, the loop transformations needed to create these hardware variants can be precisely and concisely represented by Halide's scheduling language. By modifying the Halide compiler to generate hardware, we create a system that can fairly compare these prior accelerators. As long as proper loop blocking schemes are used, and the hardware can support mapping replicated loops, many different hardware dataflows yield similar energy efficiency with good performance. This is because the loop blocking can ensure that most data references stay on-chip with good locality and the processing units have high resource utilization. How resources are allocated, especially in the memory system, has a large impact on energy and performance. By optimizing hardware resource allocation while keeping throughput constant, we achieve up to 4.2X energy improvement for Convolutional Neural Networks (CNNs), and 1.6X and 1.8X improvement for Long Short-Term Memories (LSTMs) and multi-layer perceptrons (MLPs), respectively. Comment: Published as a conference paper at ASPLOS 202
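
As an illustration of the "seven nested loops" mentioned in the abstract above, the following is a minimal sketch of a dense convolution layer written as a plain loop nest. The abstract does not name the loops; the choice of batch, output channel, input channel, output row, output column, filter row, and filter column is an assumption based on the usual formulation of a convolution layer, and the function name `conv2d_seven_loops` is hypothetical. Unit stride and no padding are assumed. Reordering or blocking these loops changes data locality but not the result, which is the kind of choice a scheduling language expresses.

```python
def conv2d_seven_loops(inputs, weights):
    """Direct convolution as seven nested loops (illustrative sketch).

    inputs:  nested lists indexed [batch][in_channel][row][col]
    weights: nested lists indexed [out_channel][in_channel][filter_row][filter_col]
    Assumes unit stride and no padding.
    """
    batch = len(inputs)
    in_ch = len(inputs[0])
    height = len(inputs[0][0])
    width = len(inputs[0][0][0])
    out_ch = len(weights)
    f_h = len(weights[0][0])
    f_w = len(weights[0][0][0])
    out_h = height - f_h + 1
    out_w = width - f_w + 1

    output = [[[[0.0] * out_w for _ in range(out_h)]
               for _ in range(out_ch)]
              for _ in range(batch)]

    for b in range(batch):                      # 1: batch
        for k in range(out_ch):                 # 2: output channel
            for c in range(in_ch):              # 3: input channel
                for y in range(out_h):          # 4: output row
                    for x in range(out_w):      # 5: output column
                        for fy in range(f_h):   # 6: filter row
                            for fx in range(f_w):  # 7: filter column
                                output[b][k][y][x] += (
                                    inputs[b][c][y + fy][x + fx]
                                    * weights[k][c][fy][fx]
                                )
    return output
```

For example, a single 2x2 input image convolved with a single 2x2 filter of ones yields one output value equal to the sum of the four inputs.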

    A bibliography on parallel and vector numerical algorithms

    This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies that have been published in book form are also listed.

    A parallel dynamic programming algorithm for unranking set partitions

    In this paper, an O(n) parallel algorithm is presented for unranking set partitions in Hutchinson’s representation. A simple sequential algorithm is derived on the basis of a dynamic programming paradigm. In the parallel algorithm, processing is performed in a dedicated parallel architecture combining certain systolic and associative features. The algorithm consists of two phases. In the first phase, a coefficient table is created by systolic computations. Then, n subsequent elements of a partition codeword are computed, in O(1) time each, through associative search operations.
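
The two-phase idea described above (build a coefficient table, then emit the codeword element by element) can be sketched sequentially as follows. This is not the paper's parallel algorithm: it assumes the codeword is a restricted growth string (first element 0, each later element at most one more than the maximum so far), that ranks follow lexicographic order of these strings, and that the coefficient table counts completions. The names `completion_table` and `unrank_set_partition` are hypothetical.

```python
def completion_table(n):
    """table[r][k] = number of ways to fill r remaining positions
    when k blocks are already in use (dynamic programming)."""
    table = [[0] * (n + 2) for _ in range(n)]
    for k in range(1, n + 1):
        table[0][k] = 1  # nothing left to fill: exactly one completion
    for r in range(1, n):
        for k in range(1, n - r + 1):
            # next element reuses one of k blocks, or opens block k + 1
            table[r][k] = k * table[r - 1][k] + table[r - 1][k + 1]
    return table


def unrank_set_partition(n, rank):
    """Return the restricted growth string of length n with the given
    lexicographic rank (0-based)."""
    table = completion_table(n)
    total = table[n - 1][1]  # Bell number B(n)
    if not 0 <= rank < total:
        raise ValueError("rank out of range")
    codeword = [0]
    blocks = 1
    for i in range(1, n):
        remaining = n - 1 - i
        per_old_block = table[remaining][blocks]
        if rank < blocks * per_old_block:
            # element joins an existing block
            codeword.append(rank // per_old_block)
            rank %= per_old_block
        else:
            # element opens a new block
            rank -= blocks * per_old_block
            codeword.append(blocks)
            blocks += 1
    return codeword
```

For n = 3 the five codewords come out in the order 000, 001, 010, 011, 012, one per rank from 0 to 4.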

    Testability Properties of Divergent Trees

    The testability of a class of regular circuits called divergent trees is investigated under a functional fault model. Divergent trees include such practical circuits as decoders and demultiplexers. We prove that uncontrolled divergent trees are testable with a fixed number of test patterns (C-testable) if and only if the module function is surjective. Testable controlled trees are also surjective but require sensitizing vectors for error propagation. We derive the conditions for testing controlled divergent trees with a test set whose size is proportional to the number of levels p found in the tree (L-testability). By viewing a tree as overlapping arrays of various types, we also derive conditions for a controlled divergent tree to be C-testable. Typical decoders/demultiplexers are shown to only partially satisfy L- and C-testability conditions, but a design modification that ensures L-testability is demonstrated. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/43009/1/10836_2004_Article_146935.pd

    Wall Orientation and Shear Stress in the Lattice Boltzmann Model

    The wall shear stress is a quantity of profound importance for the clinical diagnosis of artery diseases. The lattice Boltzmann method is an easily parallelizable numerical method for solving flow problems, but it suffers from errors in the velocity field near the boundaries, which lead to errors in the wall shear stress and in normal vectors computed from the velocity. In this work we present a simple formula to calculate the wall shear stress in the lattice Boltzmann model and propose to compute wall normals, which are necessary to compute the wall shear stress, by taking the weighted mean over boundary facets lying in the vicinity of a wall element. We carry out several tests and observe an increase in the accuracy of computed normal vectors over other methods in two and three dimensions. Using the scheme, we compute the wall shear stress in an inclined and bent channel fluid flow and show a minor influence of the normal on the numerical error, implying that the main error arises from a corrupted velocity field near the staircase boundary. Finally, we calculate the wall shear stress in the human abdominal aorta in steady conditions using our method and compare the results with a standard finite volume solver and experimental data available in the literature. Applications of our ideas in a simplified protocol for data preprocessing in medical applications are discussed. Comment: 9 pages, 11 figure
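
The normal-estimation step described above (a weighted mean over boundary facets near a wall element) might look roughly like the sketch below. The abstract does not give the weights or the neighborhood definition, so the linear distance falloff, the `radius` cutoff, and the function name `weighted_wall_normal` are all assumptions made for illustration, not the authors' scheme.

```python
import math


def weighted_wall_normal(element, facets, radius):
    """Estimate a unit wall normal at `element`.

    element: coordinates of the wall element, e.g. (x, y) or (x, y, z)
    facets:  iterable of (center, unit_normal) pairs for boundary facets
    radius:  facets farther than this from `element` are ignored

    The weight 1 - d / radius is a hypothetical choice; any weight that
    decreases with distance could be substituted.
    """
    acc = [0.0] * len(element)
    for center, normal in facets:
        d = math.dist(element, center)
        if d > radius:
            continue  # facet lies outside the neighborhood
        w = 1.0 - d / radius
        for i, component in enumerate(normal):
            acc[i] += w * component
    length = math.sqrt(sum(a * a for a in acc))
    if length == 0.0:
        raise ValueError("no usable facets within radius")
    return tuple(a / length for a in acc)
```

With two equidistant facets whose normals point along x and y, the estimate points along the diagonal, which smooths the staircase effect of axis-aligned facets.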