Search CORE

801 research outputs found

Temporalized logics and automata for time granularity

Author: Franceschet M.
Montanari A.
Publication venue
Publication date: 17/11/2003
Field of study

Suitable extensions of the monadic second-order theory of k successors have been proposed in the literature to capture the notion of time granularity. In this paper, we provide the monadic second-order theories of downward unbounded layered structures, which are infinitely refinable structures consisting of a coarsest domain and an infinite number of finer and finer domains, and of upward unbounded layered structures, which consist of a finest domain and an infinite number of coarser and coarser domains, with expressively complete and elementarily decidable temporal logic counterparts. We obtain such a result in two steps. First, we define a new class of combined automata, called temporalized automata, which can be proved to be the automata-theoretic counterpart of temporalized logics, and show that relevant properties, such as closure under Boolean operations, decidability, and expressive equivalence with respect to temporal logics, transfer from component automata to temporalized ones. Then, we exploit the correspondence between temporalized logics and automata to reduce the task of finding the temporal logic counterparts of the given theories of time granularity to the easier one of finding temporalized automata counterparts of them.Comment: Journal: Theory and Practice of Logic Programming Journal Acronym: TPLP Category: Paper for Special Issue (Verification and Computational Logic) Submitted: 18 March 2002, revised: 14 Januari 2003, accepted: 5 September 200

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università degli Studi di Udine

Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators

Author: Bell Steven Emberton
Cao Kaidi
Gao Mingyu
Ha Heonjae
Horowitz Mark
Kozyrakis Christos
Liu Qiaoyi
Nayak Ankita
Pu Jing
Raina Priyanka
Setter Jeff Ou
Yang Xuan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 26/04/2020
Field of study

We show that DNN accelerator micro-architectures and their program mappings represent specific choices of loop order and hardware parallelism for computing the seven nested loops of DNNs, which enables us to create a formal taxonomy of all existing dense DNN accelerators. Surprisingly, the loop transformations needed to create these hardware variants can be precisely and concisely represented by Halide's scheduling language. By modifying the Halide compiler to generate hardware, we create a system that can fairly compare these prior accelerators. As long as proper loop blocking schemes are used, and the hardware can support mapping replicated loops, many different hardware dataflows yield similar energy efficiency with good performance. This is because the loop blocking can ensure that most data references stay on-chip with good locality and the processing units have high resource utilization. How resources are allocated, especially in the memory system, has a large impact on energy and performance. By optimizing hardware resource allocation while keeping throughput constant, we achieve up to 4.2X energy improvement for Convolutional Neural Networks (CNNs), 1.6X and 1.8X improvement for Long Short-Term Memories (LSTMs) and multi-layer perceptrons (MLPs), respectively.Comment: Published as a conference paper at ASPLOS 202

arXiv.org e-Print Archive

Crossref

A bibliography on parallel and vector numerical algorithms

Author: Ortega J. M.
Voigt R. G.
Publication venue
Publication date
Field of study

This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also

NASA Technical Reports Server

A parallel dynamic programming algorithm for unranking set partitions

Author: Kokosiński Zbigniew
Publication venue: Czasopismo Techniczne
Publication date: 21/04/2015
Field of study

In this paper, an O(n) parallel algorithm is presented for unranking set partitions in Hutchinson’s representation. A simple sequential algorithm is derived on the basis of a dynamic programming paradigm. In the parallel algorithm, processing is performed in a dedicated parallel architecture combining certain systolic and associative features. The algorithm consists of two phases. In the first phase, a coefficient table is created by systolic computations. Then, n subsequent elements of a partition codeword are computed, in O(1) time each, through associative search operations

Portal Czasopism Naukowych (E-Journals)

Testability Properties of Divergent Trees

Author: Blanton R. D. (Shawn)
Hayes John P. (John Patrick)
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/12/1997
Field of study

The testability of a class of regular circuits calleddivergent trees is investigated under a functional fault model. Divergent trees include such practical circuits as decoders anddemultiplexers. We prove that uncontrolled divergent trees aretestable with a fixed number of test patterns (C-testable) if andonly if the module function is surjective. Testable controlled treesare also surjective but require sensitizing vectors for errorpropagation. We derive the conditions for testing controlleddivergent trees with a test set whose size is proportional to thenumber of levels p found in the tree (L-testability). By viewing a tree as overlapping arrays of various types, we also deriveconditions for a controlled divergent tree to be C-testable. Typicaldecoders/demultiplexers are shown to only partially satisfy L- andC-testability conditions but a design modification that ensuresL-testability is demonstrated.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/43009/1/10836_2004_Article_146935.pd

Deep Blue Documents at the University of Michigan

Wall Orientation and Shear Stress in the Lattice Boltzmann Model

Author: Aidun
Anor
Artoli
Axner
Berger
Bernsdorf
Bonert
Boussel
Bouzidi
Boyd
Carallo
Caro
Cebral
Cecchi
Chopard
Christ
Cosgrove
Dai
d’Humieres
Gohil
Harvey
He
He
Higuera
Jou
Krüger
Krüger
Ku
Ku
Luo
Maciej Matyka
Malek
McNamara
Meyerson
Milner
Nesbitt
Nickolls
Pantos
Papaioannou
Paszkowiak
Patankar
Peng
Peters
Pohl
Pontrelli
Quarteroni
Reneman
Shaaban
Stahl
Tse
Tölke
Xian
Zbigniew Koza
Łukasz Mirosław
Publication venue: 'Elsevier BV'
Publication date: 14/03/2012
Field of study

The wall shear stress is a quantity of profound importance for clinical diagnosis of artery diseases. The lattice Boltzmann is an easily parallelizable numerical method of solving the flow problems, but it suffers from errors of the velocity field near the boundaries which leads to errors in the wall shear stress and normal vectors computed from the velocity. In this work we present a simple formula to calculate the wall shear stress in the lattice Boltzmann model and propose to compute wall normals, which are necessary to compute the wall shear stress, by taking the weighted mean over boundary facets lying in a vicinity of a wall element. We carry out several tests and observe an increase of accuracy of computed normal vectors over other methods in two and three dimensions. Using the scheme we compute the wall shear stress in an inclined and bent channel fluid flow and show a minor influence of the normal on the numerical error, implying that that the main error arises due to a corrupted velocity field near the staircase boundary. Finally, we calculate the wall shear stress in the human abdominal aorta in steady conditions using our method and compare the results with a standard finite volume solver and experimental data available in the literature. Applications of our ideas in a simplified protocol for data preprocessing in medical applications are discussed.Comment: 9 pages, 11 figure

arXiv.org e-Print Archive

Crossref