801 research outputs found
Temporalized logics and automata for time granularity
Suitable extensions of the monadic second-order theory of k successors have
been proposed in the literature to capture the notion of time granularity. In
this paper, we provide the monadic second-order theories of downward unbounded
layered structures, which are infinitely refinable structures consisting of a
coarsest domain and an infinite number of finer and finer domains, and of
upward unbounded layered structures, which consist of a finest domain and an
infinite number of coarser and coarser domains, with expressively complete and
elementarily decidable temporal logic counterparts.
We obtain such a result in two steps. First, we define a new class of
combined automata, called temporalized automata, which can be proved to be the
automata-theoretic counterpart of temporalized logics, and show that relevant
properties, such as closure under Boolean operations, decidability, and
expressive equivalence with respect to temporal logics, transfer from component
automata to temporalized ones. Then, we exploit the correspondence between
temporalized logics and automata to reduce the task of finding the temporal
logic counterparts of the given theories of time granularity to the easier one
of finding temporalized automata counterparts of them.Comment: Journal: Theory and Practice of Logic Programming Journal Acronym:
TPLP Category: Paper for Special Issue (Verification and Computational Logic)
Submitted: 18 March 2002, revised: 14 Januari 2003, accepted: 5 September
200
Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators
We show that DNN accelerator micro-architectures and their program mappings
represent specific choices of loop order and hardware parallelism for computing
the seven nested loops of DNNs, which enables us to create a formal taxonomy of
all existing dense DNN accelerators. Surprisingly, the loop transformations
needed to create these hardware variants can be precisely and concisely
represented by Halide's scheduling language. By modifying the Halide compiler
to generate hardware, we create a system that can fairly compare these prior
accelerators. As long as proper loop blocking schemes are used, and the
hardware can support mapping replicated loops, many different hardware
dataflows yield similar energy efficiency with good performance. This is
because the loop blocking can ensure that most data references stay on-chip
with good locality and the processing units have high resource utilization. How
resources are allocated, especially in the memory system, has a large impact on
energy and performance. By optimizing hardware resource allocation while
keeping throughput constant, we achieve up to 4.2X energy improvement for
Convolutional Neural Networks (CNNs), 1.6X and 1.8X improvement for Long
Short-Term Memories (LSTMs) and multi-layer perceptrons (MLPs), respectively.Comment: Published as a conference paper at ASPLOS 202
A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also
A parallel dynamic programming algorithm for unranking set partitions
In this paper, an O(n) parallel algorithm is presented for unranking set partitions in Hutchinsonâs representation. A simple sequential algorithm is derived on the basis of a dynamic programming paradigm. In the parallel algorithm, processing is performed in a dedicated parallel architecture combining certain systolic and associative features. The algorithm consists of two phases. In the first phase, a coefficient table is created by systolic computations. Then, n subsequent elements of a partition codeword are computed, in O(1) time each, through associative search operations
Testability Properties of Divergent Trees
The testability of a class of regular circuits calleddivergent trees is investigated under a functional fault model. Divergent trees include such practical circuits as decoders anddemultiplexers. We prove that uncontrolled divergent trees aretestable with a fixed number of test patterns (C-testable) if andonly if the module function is surjective. Testable controlled treesare also surjective but require sensitizing vectors for errorpropagation. We derive the conditions for testing controlleddivergent trees with a test set whose size is proportional to thenumber of levels p found in the tree (L-testability). By viewing a tree as overlapping arrays of various types, we also deriveconditions for a controlled divergent tree to be C-testable. Typicaldecoders/demultiplexers are shown to only partially satisfy L- andC-testability conditions but a design modification that ensuresL-testability is demonstrated.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/43009/1/10836_2004_Article_146935.pd
Wall Orientation and Shear Stress in the Lattice Boltzmann Model
The wall shear stress is a quantity of profound importance for clinical
diagnosis of artery diseases. The lattice Boltzmann is an easily parallelizable
numerical method of solving the flow problems, but it suffers from errors of
the velocity field near the boundaries which leads to errors in the wall shear
stress and normal vectors computed from the velocity. In this work we present a
simple formula to calculate the wall shear stress in the lattice Boltzmann
model and propose to compute wall normals, which are necessary to compute the
wall shear stress, by taking the weighted mean over boundary facets lying in a
vicinity of a wall element. We carry out several tests and observe an increase
of accuracy of computed normal vectors over other methods in two and three
dimensions. Using the scheme we compute the wall shear stress in an inclined
and bent channel fluid flow and show a minor influence of the normal on the
numerical error, implying that that the main error arises due to a corrupted
velocity field near the staircase boundary. Finally, we calculate the wall
shear stress in the human abdominal aorta in steady conditions using our method
and compare the results with a standard finite volume solver and experimental
data available in the literature. Applications of our ideas in a simplified
protocol for data preprocessing in medical applications are discussed.Comment: 9 pages, 11 figure
- âŠ