
    An O(n) time discrete relaxation architecture for real-time processing of the consistent labeling problem

    Technical report. Discrete relaxation techniques have proven useful in solving a wide range of problems in digital signal and digital image processing, artificial intelligence, operations research, and machine vision. Much work has been devoted to finding efficient hardware architectures. This paper shows that a conventional hardware design for a Discrete Relaxation Algorithm (DRA) suffers from O(n^2 m^3) time complexity and O(nmn^2) space complexity. By reformulating the DRA into a parallel computational tree and using a multiple tree-root pipelining scheme, the time complexity is reduced to O(nm), while the space complexity is reduced by a factor of 2. For certain relaxation processing, the space complexity can even be decreased to O(nm). Furthermore, a technique for dynamically configuring an architectural wavefront is used, which leads to an O(n) time, highly configurable DRA3 architecture.
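    The consistent labeling formulation behind this work has a simple software analogue. The sketch below is a plain sequential discrete relaxation (label pruning to a fixed point), not the paper's parallel tree or wavefront architecture; the objects, label sets, and compatibility predicate are illustrative assumptions.

```python
# Minimal sketch (not the paper's hardware design): sequential discrete
# relaxation for the consistent labeling problem. Each of the n objects keeps
# a set of candidate labels (out of m possible labels); a label survives only
# if every related object still offers a compatible label. `compatible` is an
# illustrative placeholder for the problem's compatibility relation.
def discrete_relaxation(labels, neighbors, compatible):
    changed = True
    while changed:                      # iterate until a fixed point is reached
        changed = False
        for i in range(len(labels)):
            keep = {li for li in labels[i]
                    if all(any(compatible(i, li, j, lj) for lj in labels[j])
                           for j in neighbors[i])}
            if keep != labels[i]:
                labels[i] = keep
                changed = True
    return labels

# Tiny example: three objects in a chain whose labels must differ from their neighbors'
labels    = [{0, 1}, {1}, {0, 1}]
neighbors = [[1], [0, 2], [1]]
print(discrete_relaxation(labels, neighbors, lambda i, li, j, lj: li != lj))
# -> [{0}, {1}, {0}]
```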

    Finite automata and composite realisations.

    SIGLE. Available from the British Library Document Supply Centre (BLDSC), DSC:D34350/81, United Kingdom.

    Emerging trends proceedings of the 17th International Conference on Theorem Proving in Higher Order Logics: TPHOLs 2004

    Technical report. This volume constitutes the proceedings of the Emerging Trends track of the 17th International Conference on Theorem Proving in Higher Order Logics (TPHOLs 2004), held September 14-17, 2004 in Park City, Utah, USA. The TPHOLs conference covers all aspects of theorem proving in higher order logics as well as related topics in theorem proving and verification. There were 42 papers submitted to TPHOLs 2004 in the full research category, each of which was refereed by at least 3 reviewers selected by the program committee. Of these submissions, 21 were accepted for presentation at the conference and publication in volume 3223 of Springer's Lecture Notes in Computer Science series. In keeping with longstanding tradition, TPHOLs 2004 also offered a venue for the presentation of work in progress, where researchers invite discussion by means of a brief introductory talk and then discuss their work at a poster session. The work-in-progress papers are collected in this volume, which is published as a 2004 technical report of the School of Computing at the University of Utah.

    Efficient Computation and FPGA implementation of Fully Homomorphic Encryption with Cloud Computing Significance

    Homomorphic encryption provides a unique security solution for cloud computing: it ensures not only that data stored in the cloud remain confidential, but also that processing by the cloud server does not compromise data privacy. The Fully Homomorphic Encryption (FHE) scheme proposed by Lopez-Alt, Tromer, and Vaikuntanathan (LTV), also known as the NTRU (Nth degree truncated polynomial ring) based method, is considered one of the most important FHE methods suitable for practical implementation. In this thesis, an efficient algorithm and architecture for LTV Fully Homomorphic Encryption is proposed. The conventional linear feedback shift register (LFSR) structure is expanded and modified to perform the truncated polynomial ring multiplication of the LTV scheme in parallel. Novel and efficient modular multiplier, modular adder, and modular subtractor designs are proposed to support high-speed processing of the LFSR operations. In addition, a family of special moduli is selected for high-speed computation of modular operations. Although the area complexity remains O(Nn^2), with no advantage at the circuit level, the proposed architecture effectively reduces the time complexity from O(N log N) to linear time, O(N), compared to the best existing works. An FPGA implementation of the proposed architecture for LTV FHE is achieved and demonstrated. An elaborate comparison of existing methods and the proposed work is presented, showing that the proposed work gains a significant speedup over existing works.
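    The core primitive the LFSR-based datapath accelerates is multiplication in a truncated polynomial ring. Below is a minimal software sketch of that product, assuming the classic NTRU ring Z_q[x]/(x^N - 1) with illustrative N and q; it is not the thesis's parallel hardware algorithm.

```python
# Minimal sketch (not the thesis's hardware algorithm): multiplication in the
# truncated polynomial ring Z_q[x]/(x^N - 1), i.e. a cyclic convolution with
# all arithmetic reduced modulo q. LTV-style rings use x^N + 1 instead, which
# only flips the sign of the wrap-around term.
def ring_multiply(a, b, q):
    """a, b: lists of N coefficients (degree < N); returns a*b mod (x^N - 1, q)."""
    N = len(a)
    c = [0] * N
    for i in range(N):
        for j in range(N):
            c[(i + j) % N] = (c[(i + j) % N] + a[i] * b[j]) % q   # x^N wraps to 1
    return c

# Example with illustrative parameters N = 8, q = 97
a = [1, 2, 0, 0, 0, 0, 0, 3]
b = [5, 0, 1, 0, 0, 0, 0, 0]
print(ring_multiply(a, b, q=97))
```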

    Symmetry in Finite Combinatorial Objects: Scalable Methods and Applications.

    Symmetries of combinatorial objects are known to complicate search algorithms, but such obstacles can often be removed by detecting symmetries early and discarding symmetric subproblems. Canonical labeling of combinatorial objects facilitates easy equivalence checking through quick matching. All existing canonical-labeling software also finds symmetries, but the fastest symmetry-finding software does not perform canonical labeling. In this thesis, we describe highly scalable symmetry-detection algorithms for two widely-used combinatorial objects: graphs and Boolean functions. Our algorithms are based on a decision tree that combines elements of group-theoretic computation with branching and backtracking search. Moreover, we contrast the search for graph symmetries and a canonical labeling to dissect typical algorithms and identify their similarities and differences. We develop a novel approach to graph canonical labeling where symmetries are found first and then used to speed up the canonical-labeling routines. Empirical results are given for graphs with millions of vertices and Boolean functions with hundreds of I/Os, where our algorithms can often find all symmetry group generators or a canonical labeling in seconds. Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/100003/1/hadik_1.pd
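    A brute-force illustration of what a canonical labeling buys: two graphs are isomorphic exactly when their canonical forms coincide. The sketch below is an exponential-time toy for tiny graphs, not the scalable decision-tree algorithms developed in the thesis.

```python
# Minimal sketch (not the thesis's scalable algorithm): canonical labeling of a
# tiny graph as the lexicographically smallest adjacency matrix over all vertex
# permutations. Equivalence checking then reduces to comparing canonical forms.
from itertools import permutations

def canonical_form(n, edges):
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    best = None
    for perm in permutations(range(n)):          # factorial blow-up: tiny n only
        mat = tuple(tuple(adj[perm[i]][perm[j]] for j in range(n)) for i in range(n))
        if best is None or mat < best:
            best = mat
    return best

# Two differently labeled 4-cycles have the same canonical form:
print(canonical_form(4, [(0, 1), (1, 2), (2, 3), (3, 0)]) ==
      canonical_form(4, [(0, 2), (2, 1), (1, 3), (3, 0)]))     # True
```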

    Coding approaches to fault tolerance in dynamic systems

    Also issued as Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references (p. 189-196). Sponsored through a contract with Sanders, a Lockheed Martin Company. Christoforos N. Hadjicostis.

    On the test complexity of VLSI-systems

    The wide use of computers in various fields of society makes it clear that computers must be ever more reliable. The reliability of a computer depends strongly on testing its basic components, VLSI systems. Testing reveals whether a VLSI system has been manufactured properly and behaves correctly. Generally speaking, a VLSI system is made up of a sequential circuit part and a combinational circuit part. Test generation for sequential circuits is usually much more difficult than for combinational ones, since the controllability and observability of sequential circuits are poor. To overcome this difficulty, several design techniques have been developed; using them, sequential logic can be designed so that testing it reduces to testing combinational logic. Hence the key to testing a VLSI system lies in testing its combinational logic part, and this thesis focuses on the test problem of the combinational part of VLSI systems. Testing a VLSI system mainly involves generating a test set and applying that test set to the system, so test complexity can be classified into the complexity of test set generation, estimated by the computational complexity of generating the test set, and the complexity of test set application, measured by the cardinality of the test set. Test generation approaches can be divided roughly into structural and functional methods: a structural method generates test patterns with reference to the concrete logic structure of the circuit, while a functional method produces test patterns without reference to it. With the rapid development of VLSI technology, circuit density is increasing dramatically, and the test of VLSI systems is becoming increasingly difficult and expensive. Although techniques such as design for testability, new fault models, and new test generation approaches have been proposed to moderate these problems, there is a great need for new design methodologies and test approaches. The complexity of test generation and application is related to the concrete structure of the system, and both theory and practical experience show that a universal method for treating all kinds of VLSI systems efficiently is unlikely to exist; one alternative is to develop a suitable method for each class of VLSI systems. This thesis studies extensively the test problems related to tree systems and to pseudoexhaustively and pseudorandomly testable circuit systems, and develops several techniques for generating optimal test sets for different kinds of circuits.
With the increasing use of VLSI systems, the demands on their reliability have risen steadily. The reliability of a VLSI system depends on its components, highly integrated circuits, whose manufacturing process is unfortunately extremely error-prone: according to unofficial figures, the defect rate for large circuits in a new manufacturing process exceeds 60%. Testing the circuits is therefore indispensable, yet such tests account for more than 25% of the total cost. A VLSI system normally contains both combinational and sequential circuits; with the help of test buses, the test problem for the sequential components can be reduced to the combinational case, so testing combinational circuits plays a major role, and this work considers the test problem of combinational circuits. A complete test of a circuit by applying all possible inputs is almost always impossible in practice. Assumptions about the most frequently occurring faults must therefore be made and collected in a fault model. The fault model most commonly used in practice is the single stuck-at fault model, which assumes that at most one line in the whole circuit is permanently fixed at a constant logic value (i.e. 0 or 1). Since this popular fault model cannot cover all faults that occur, this work additionally considers the more powerful single cell fault model. Test cost is determined by the cost of test generation and of test application. We define the test complexity of a circuit S as the minimum number of test patterns needed to test S under the given fault model. The emphasis of this work lies on investigating the test problems of tree-like circuits and of pseudoexhaustively and pseudorandomly testable circuits, as well as on developing methods for generating optimal sets of test patterns.
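    As a concrete illustration of the two cost measures, the sketch below generates a test set for single stuck-at faults in a toy combinational circuit by exhaustive search; the circuit, net names, and fault list are illustrative assumptions, not examples from the thesis.

```python
# Minimal sketch (illustrative, not a method from the thesis): exhaustive test
# generation for single stuck-at faults in a tiny combinational circuit
# y = (a AND b) OR c. A pattern detects a fault when the faulty circuit's
# output differs from the fault-free output; the size of the resulting test
# set is the test application complexity, its search cost the generation complexity.
from itertools import product

def circuit(a, b, c, stuck=None):
    """Evaluate the circuit, optionally forcing one net ('a','b','c','n1') to 0/1."""
    force = lambda name, value: stuck[1] if stuck and stuck[0] == name else value
    a, b, c = force("a", a), force("b", b), force("c", c)
    n1 = force("n1", a & b)                       # internal AND-gate output
    return n1 | c

faults = [(net, v) for net in ("a", "b", "c", "n1") for v in (0, 1)]
test_set = {}
for fault in faults:
    for pattern in product((0, 1), repeat=3):     # all 2^3 input patterns
        if circuit(*pattern) != circuit(*pattern, stuck=fault):
            test_set[fault] = pattern             # first detecting pattern found
            break
print(test_set)                                   # one detecting pattern per fault
```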

    FPGA-based Query Acceleration for Non-relational Databases

    Database management systems are an integral part of today's everyday life. Trends like smart applications, the internet of things, and business and social networks require applications to deal efficiently with data in various data models close to the underlying domain. Non-relational database systems therefore provide a wide variety of database models, such as graphs and documents. However, current non-relational database systems face performance challenges due to the end of Dennard scaling and, with it, of CPU performance scaling. Meanwhile, FPGAs have gained traction as accelerators for data management. Our goal is to tackle the performance challenges of non-relational database systems with FPGA acceleration and, at the same time, address design challenges of FPGA acceleration itself. We therefore split this thesis into two main lines of work: graph processing and flexible data processing. Because benchmarking practices for graph processing accelerators are lacking, we propose GraphSim, which reproduces the runtimes of these accelerators from a memory access model of each approach. Through this simulation environment, we extract three performance-critical accelerator properties: asynchronous graph processing, a compressed graph data structure, and multi-channel memory. Since these properties have not previously been combined in one system, we propose GraphScale, the first scalable, asynchronous graph processing accelerator working on a compressed graph, which outperforms all state-of-the-art graph processing accelerators. Focusing on accelerator flexibility, we propose PipeJSON as the first FPGA-based JSON parser for arbitrary JSON documents; PipeJSON parses at line speed, outperforming the fastest vectorized parsers for CPUs. Lastly, we propose the subgraph query processing accelerator GraphMatch, which outperforms state-of-the-art CPU systems for subgraph query processing and can flexibly switch queries at runtime in a matter of clock cycles.
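    Two of the accelerator properties named above, asynchronous processing and a compressed graph representation, have a simple software analogue. The sketch below runs a worklist-driven (barrier-free) BFS over a CSR-compressed graph; it only illustrates the processing model, not the GraphScale hardware, and the example graph is made up.

```python
# Minimal sketch (software analogue, not the GraphScale hardware design):
# asynchronous label propagation (here: BFS depth) over a CSR-compressed graph.
# Updates are consumed as soon as they are produced via a worklist, instead of
# waiting for synchronous iteration barriers.
from collections import deque

# CSR representation of a small directed graph (offset array + neighbor array):
# vertex i's neighbors live in neighbors[offsets[i]:offsets[i + 1]]
offsets   = [0, 2, 4, 5, 6]
neighbors = [1, 2, 2, 3, 3, 0]

INF = float("inf")
labels = [INF] * (len(offsets) - 1)
labels[0] = 0                         # source vertex
worklist = deque([0])
while worklist:                       # asynchronous: no per-level barrier
    u = worklist.popleft()
    for v in neighbors[offsets[u]:offsets[u + 1]]:
        if labels[u] + 1 < labels[v]:
            labels[v] = labels[u] + 1
            worklist.append(v)
print(labels)                         # BFS depths from vertex 0 -> [0, 1, 1, 2]
```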

    Robust and reliable hardware accelerator design through high-level synthesis

    System-on-chip design is becoming increasingly complex as technology scaling enables more and more functionality on a chip. This scaling-driven complexity has resulted in a variety of reliability and validation challenges including logic bugs, hot spots, wear-out, and soft errors. To make matters worse, as we reach the limits of Dennard scaling, efforts to improve system performance and energy efficiency have resulted in the integration of a wide variety of complex hardware accelerators in SoCs. Thus the challenge is to design complex, custom hardware that is efficient, but also correct and reliable. High-level synthesis shows promise to address the problem of complex hardware design by providing a bridge from the high-productivity software domain to the hardware design process. Much research has been done on high-level synthesis efficiency optimizations. This dissertation shows that high-level synthesis also has the power to address validation and reliability challenges through three automated solutions targeting three key stages in the hardware design and use cycle: pre-silicon debugging, post-silicon validation, and post-deployment error detection. Our solution for rapid pre-silicon debugging of accelerator designs is hybrid tracing: comparing a datapath-level trace of hardware execution with a reference software implementation at a fine temporal and spatial granularity to detect logic bugs. An integrated backtrace process delivers source-code meaning to the hardware designer, pinpointing the location of bug activation and providing a strong hint for potential bug fixes. Experimental results show that we are able to detect and aid in localization of logic bugs from both C/C++ specifications as well as the high-level synthesis engine itself. A variation of this solution tailored for rapid post-silicon validation of accelerator designs is hybrid hashing: inserting signature generation logic in a hardware design to create a heavily compressed signature stream that captures the internal behavior of the design at a fine temporal and spatial granularity for comparison with a reference set of signatures generated by high-level simulation to detect bugs. Using hybrid hashing, we demonstrate an improvement in error detection latency (time elapsed from when a bug is activated to when it manifests as an observable failure) of two orders of magnitude and a threefold improvement in bug coverage compared to traditional post-silicon validation techniques. Hybrid hashing also uncovered previously unknown bugs in the CHStone benchmark suite, which is widely used by the HLS community. Hybrid hashing incurs less than 10% area overhead for the accelerator it validates with negligible performance impact, and we also introduce techniques to minimize any possible intrusiveness introduced by hybrid hashing. Finally, our solution for post-deployment error detection is modulo-3 shadow datapaths: performing lightweight shadow computations in modulo-3 space for each main computation. We leverage the binding and scheduling flexibility of high-level synthesis to detect control errors through diverse binding and minimize area cost through intelligent checkpoint scheduling and modulo-3 reducer sharing. We introduce logic and dataflow optimizations to further reduce cost. We evaluated our technique with 12 high-level synthesis benchmarks from the arithmetic-oriented PolyBench benchmark suite using FPGA emulated netlist-level error injection. 
We observe coverages of 99.1% for stuck-at faults, 99.5% for soft errors, and 99.6% for timing errors with a 25.7% area cost and negligible performance impact. Leveraging a mean error detection latency of 12.75 cycles (4150× faster than an end-result check) for soft errors, we also explore a rollback recovery method with an additional area cost of 28.0%, observing a 175× increase in reliability against soft errors. While the area cost of our modulo shadow datapaths is much better than traditional modular redundancy approaches, we want to maximize the applicability of our approach. To this end, we take a dive into gate-level architectural design for modulo arithmetic functional units. We introduce new low-cost gate-level architectures for all four key functional units in a shadow datapath: (1) a modulo reduction algorithm that generates architectures consisting entirely of full-adder standard cells; (2) minimum-area modulo adder and subtractor architectures; (3) an array-based modulo multiplier design; and (4) a modulo equality comparator that handles the residue encoding produced by the above. We compare our new functional units to the previous state-of-the-art approach, observing a 12.5% reduction in area and a 47.1% reduction in delay for a 32-bit mod-3 reducer; that our reducer costs, which tend to dominate shadow datapath costs, do not increase with larger modulo bases; and that for modulo-15 and above, all of our modulo functional units have better area and delay than their previous counterparts. We also demonstrate the practicality of our approach by designing a custom shadow datapath for error detection of a multiply-accumulate functional unit, which has an area overhead of only 12% for a 32-bit main datapath and 2-bit modulo-3 shadow datapath. Taking our reliability solution further, we look at the bigger picture of modulo shadow datapaths combined with other solutions at different abstraction layers, looking to answer the following question: Given all of the existing reliability improvement techniques for application-specific hardware accelerators, what techniques or combinations of techniques are the most cost-effective? To answer this question, we consider a soft error fault model and empirically evaluate cross-layer combinations of ABFT, EDDI, and modulo shadow datapaths in the context of high-level synthesis; parity in logic synthesis; and flip-flop hardening techniques at the physical design level. We measure the reliability benefit and area, energy, and performance cost of each technique individually and for interesting technique combinations through FPGA-emulated fault injection and physical place-and-route. Our results show that a combination of parity and flip-flop hardening is the most cost-effective in general, with an average 1.3% area cost and 5.7% energy cost for a 50× improvement in reliability. The addition of modulo-3 shadow datapaths to this combination provides some additional benefit for some applications, even without considering its combinational logic, stuck-at fault, and timing error protection benefits. We also observe new efficiency challenges for ABFT and EDDI when used for hardware accelerators.
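    The modulo-3 shadow idea reduces to a residue check per main operation. The following sketch mirrors the multiply-accumulate example in software, with an artificial bit flip standing in for a soft error; it only illustrates the checking principle, not the synthesized shadow datapath or its gate-level functional units.

```python
# Minimal sketch (software illustration, not the dissertation's hardware): a
# modulo-3 shadow datapath for a multiply-accumulate. The main result is
# mirrored by a cheap computation on 2-bit mod-3 residues; a residue mismatch
# at the checkpoint flags an error. (Errors whose magnitude happens to be a
# multiple of 3 escape this single check.)
def mac_with_shadow(a, b, acc, inject_fault=False):
    result = a * b + acc                          # main datapath
    if inject_fault:
        result ^= 1 << 7                          # emulate a soft error: flip one bit
    shadow = ((a % 3) * (b % 3) + (acc % 3)) % 3  # shadow datapath in mod-3 space
    error_detected = (result % 3) != shadow       # modulo equality comparator
    return result, error_detected

print(mac_with_shadow(1234, 5678, 42))                      # residues agree -> False
print(mac_with_shadow(1234, 5678, 42, inject_fault=True))   # mismatch flagged -> True
```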

    Algebraic approach to hardware description and verification
