autoAx: An Automatic Design Space Exploration and Circuit Building Methodology utilizing Libraries of Approximate Components

Authors: Hanif Muhammad Abdullah; Mrazek Vojtech; Sekanina Lukas; Shafique Muhammad; Vasicek Zdenek
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/04/2019

Approximate computing is an emerging paradigm for developing highly energy-efficient computing systems such as various accelerators. Many libraries of elementary approximate circuits have already been proposed in the literature to simplify the design of approximate accelerators. Because these libraries contain from tens to thousands of approximate implementations of a single arithmetic operation, it is intractable to find an optimal combination of approximate circuits from a library even for an application consisting of only a few operations. An open problem is "how to effectively combine circuits from these libraries to construct complex approximate accelerators". This paper proposes a novel methodology for searching, selecting and combining the most suitable approximate circuits from a set of available libraries to generate an approximate accelerator for a given application. To enable fast design space generation and exploration, the methodology uses machine learning techniques to create computational models that estimate the overall quality of processing and the hardware cost without performing full synthesis at the accelerator level. Using the methodology, we construct hundreds of approximate accelerators (for a Sobel edge detector) showing different but relevant tradeoffs between the quality of processing and hardware cost, and identify the corresponding Pareto frontier. Furthermore, when searching for approximate implementations of a generic Gaussian filter consisting of 17 arithmetic operations, the proposed approach allows us to identify approximately $10^3$ highly important implementations from $10^{23}$ possible solutions in a few hours, whereas an exhaustive search would take four months on a high-end processor.

Comment: Accepted for publication at the Design Automation Conference 2019 (DAC'19), Las Vegas, Nevada, US
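
The Pareto frontier mentioned above can be illustrated with a minimal Python sketch. It assumes that every candidate accelerator already carries an estimated error and an estimated hardware cost (in autoAx these estimates come from learned models instead of full synthesis); the `Candidate` type, its fields and the sample values are hypothetical, and the function is a generic two-objective Pareto filter, not the autoAx implementation.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str     # hypothetical label for one combination of approximate circuits
    error: float  # estimated quality degradation (lower is better)
    cost: float   # estimated hardware cost, e.g. area or power (lower is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    return (a.error <= b.error and a.cost <= b.cost
            and (a.error < b.error or a.cost < b.cost))


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates, ordered by increasing error.

    After sorting by (error, cost), a candidate is kept only if its cost is
    strictly lower than every cost kept so far. Exact duplicates collapse
    to a single entry.
    """
    front: List[Candidate] = []
    best_cost = float("inf")
    for c in sorted(cands, key=lambda c: (c.error, c.cost)):
        if c.cost < best_cost:
            front.append(c)
            best_cost = c.cost
    return front


if __name__ == "__main__":
    cands = [
        Candidate("A", 0.10, 10.0),
        Candidate("B", 0.20, 5.0),
        Candidate("C", 0.30, 6.0),
        Candidate("D", 0.05, 20.0),
        Candidate("E", 0.20, 8.0),
    ]
    print([c.name for c in pareto_front(cands)])  # ['D', 'A', 'B']
```

Sorting first makes the filter O(n log n) instead of comparing every pair of candidates, which matters when hundreds of accelerators are generated.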

arXiv.org e-Print Archive

Crossref

Model-based symbolic design space exploration at the electronic system level: a systematic approach

Author: Neubauer Kai (gnd: 1256385433)
Publication venue: Universität Rostock, Rostock

In this thesis, a novel, fully systematic approach is proposed that addresses automated design space exploration at the electronic system level. The problem is formulated as a multi-objective optimization problem and is encoded symbolically using Answer Set Programming (ASP). Several specialized solvers are tightly coupled as background theories with the foreground ASP solver under the ASP modulo Theories (ASPmT) paradigm. By using the ASPmT paradigm, the search is executed fully systematically, and the disparate synthesis steps can be coupled to explore the search space effectively.

Rostocker Dokumentenserver

Analyzing and Predicting Processor Vulnerability to Soft Errors Using Statistical Techniques

Author: Duan Lide
Publication venue: LSU Digital Commons
Publication date: 01/01/2011

The shrinking processor feature size, lower threshold voltage and increasing on-chip transistor density make current processors highly vulnerable to soft errors. The Architectural Vulnerability Factor (AVF) reflects the probability that a raw soft error eventually causes a visible error in the program output, indicating the processor’s susceptibility to soft errors at the architectural level. Awareness of the AVF, both at the early design stage and during program runtime, is highly useful for designing reliable processors. However, measuring the AVF is extremely costly, incurring large overheads in hardware, computation, and power. The situation is further exacerbated in a multi-threaded processor environment, where resource contention and data sharing exist among different threads. Consequently, predicting the AVF from other, easily measured metrics is extremely attractive to computer designers. We propose a series of AVF modeling and prediction techniques based on advanced statistical methods. First, we use the Boosted Regression Trees (BRT) scheme to dynamically predict the AVF during program execution from a variety of performance metrics. This correlation is generalized across different workloads, program phases, and processor configurations on a single-threaded superscalar processor. Second, the AVF prediction is extended to multi-threaded processors, where inter-thread resource contention has significant and non-uniform impacts on different programs; we propose a two-level predictive mechanism using BRT as building blocks to characterize the contention behavior. Finally, we employ a rule search strategy named the Patient Rule Induction Method (PRIM) to explore a large processor design space at the early design stage. This lets us generate selective rules on important configuration parameters. These rules quantify the design space subregion yielding the lowest values of the response, thereby providing useful guidelines for designing reliable processors while achieving high performance.
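
The PRIM step can be sketched as follows. This is a minimal version of PRIM's top-down peeling only (the pasting phase is omitted), oriented toward a low mean response as described above; the function name, the box representation and the `alpha`/`min_support` defaults are illustrative assumptions, not taken from the dissertation.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple

Box = Dict[int, Tuple[float, float]]  # dimension index -> (low, high), inclusive


def prim_peel(X: Sequence[Sequence[float]], y: Sequence[float],
              alpha: float = 0.1, min_support: float = 0.1) -> Box:
    """Top-down PRIM peeling toward a box with a LOW mean response.

    X holds design points (one row per configuration) and y the response
    (e.g. an AVF value). Each step peels at least a fraction `alpha` of the
    points inside the box from the low or high end of one dimension, picking
    the peel with the lowest resulting mean. Peeling stops when no peel lowers
    the mean or the box would hold fewer than `min_support` of all points.
    """
    n_dims = len(X[0])
    box: Box = {d: (min(r[d] for r in X), max(r[d] for r in X)) for d in range(n_dims)}
    inside = list(range(len(X)))
    while True:
        best: Optional[Tuple[float, List[int], int, Tuple[float, float]]] = None
        for d in range(n_dims):
            vals = sorted(X[i][d] for i in inside)
            k = max(1, int(alpha * len(vals)))  # points to peel from one end
            if k >= len(vals):
                continue
            lo, hi = box[d]
            # Move a bound past at least k points; tied values are peeled together.
            new_lo = next((v for v in vals if v > vals[k - 1]), None)
            new_hi = next((v for v in reversed(vals) if v < vals[-k]), None)
            candidates = []
            if new_lo is not None:
                candidates.append((new_lo, hi))
            if new_hi is not None:
                candidates.append((lo, new_hi))
            for bounds in candidates:
                kept = [i for i in inside if bounds[0] <= X[i][d] <= bounds[1]]
                if len(kept) >= min_support * len(X):
                    m = mean(y[i] for i in kept)
                    if best is None or m < best[0]:
                        best = (m, kept, d, bounds)
        if best is None or best[0] >= mean(y[i] for i in inside):
            return box  # no admissible peel lowers the mean response
        _, inside, d, bounds = best
        box[d] = bounds
```

In the AVF setting, the rows of `X` would be processor configurations and `y` their AVF values; the returned per-parameter bounds are the kind of rule on configuration parameters that the abstract describes.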

Louisiana State University

Spatial CPU-GPU data structures for interactive rendering of large particle data

Author: Geringer Sergej
Publication date: 01/01/2017

In this work, I investigate the interactive visualization of arbitrarily large particle data sets that fit into system memory but not into GPU memory. With conventional rendering techniques, the interactivity of visualizations drops drastically when rendering tens or hundreds of millions of objects. At the same time, graphics hardware memory capacity limits the size of data sets that can be placed in GPU memory for rendering. To circumvent these obstacles, a progressive rendering approach is employed that gradually streams all particle data to the GPU and renders it, without reducing or altering the particle data itself. The particle data is rendered according to a visibility sorting derived from occlusion relations between different parts of the data set, leading to a rendering order of scene contents guided by their importance for the rendered image. I analyze and compare possible implementation choices for rendering particles as opaque spheres in OpenGL; this analysis forms the basis of the particle rendering application developed within this work. The application uses a multi-threaded architecture in which data preprocessing on a CPU thread and a rendering algorithm on a GPU thread ensure that the user can interact with the application at any time. In particular, it is guaranteed that the user can explore the particle data interactively, by keeping the latency from user input to seeing the effects of that input minimal. This is achieved by favoring user inputs over completeness of the rendered image at all stages of rendering. At the same time, the user receives immediate feedback on interactions through re-projection of all currently visible particles into the next rendered image. The re-projection is realized with an on-GPU cache of visible particles that is built during particle data streaming and rendering, and drawn upon user interaction using the most recent camera configuration given by the user's inputs. The combination of the developed techniques allows interactive exploration of particle data sets with up to 1.5 billion particles on a commodity computer.
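
A simplified sketch of progressive, budgeted streaming is shown below. It swaps the occlusion-based visibility sorting described above for plain front-to-back ordering by distance to the camera, and the cell layout, the per-frame `budget` and the function name are hypothetical; it only illustrates how all data can be streamed in importance order while each frame's workload stays bounded.

```python
import math
from typing import Iterator, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def progressive_batches(cells: Sequence[Tuple[Vec3, int]], camera: Vec3,
                        budget: int) -> Iterator[List[int]]:
    """Yield indices of particle cells in per-frame batches, nearest cell first.

    `cells` holds (cell_center, particle_count) pairs and `budget` is the
    hypothetical number of particles uploaded and drawn per frame. Distance to
    the camera is a simplified stand-in for an occlusion-based visibility order.
    Every cell is eventually yielded, so no particle data is dropped.
    """
    order = sorted(range(len(cells)), key=lambda i: math.dist(cells[i][0], camera))
    batch: List[int] = []
    used = 0
    for i in order:
        count = cells[i][1]
        if batch and used + count > budget:
            yield batch  # the caller draws this batch, then checks for user input
            batch, used = [], 0
        batch.append(i)
        used += count
    if batch:
        yield batch
```

A render loop would draw one batch per frame and, whenever the camera moves, discard the generator and create a new one for the new camera position, which mirrors the idea of favoring user input over image completeness.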

Exploring Alternative Restoration Techniques in Constraint Programming

Author: LIN YONG
Publication date: 24/01/2014

Master's, Master of Science

ScholarBank@NUS

Graphics Processing Unit-Based Computer-Aided Design Algorithms for Electronic Design Automation

Author: Han Yiding
Publication venue: DigitalCommons@USU
Publication date: 01/05/2014

Electronic design automation (EDA) tools are a specific set of software that plays an important role in modern integrated circuit (IC) design. These tools automate the various stages of the IC design process. This research focuses on the tools for two of these stages: floorplanning and global routing. Specifically, the goal of this study is to parallelize these two tools so that their execution time can be significantly shortened on modern multi-core and graphics processing unit (GPU) architectures. GPU hardware is a massively parallel architecture, enabling thousands of independent threads to execute concurrently. Although a small set of EDA tools can benefit from GPU acceleration, most algorithms in this field are designed with the single-core paradigm in mind. The floorplanning and global routing algorithms are among the latter, and they are difficult to speed up on the GPU because of their inherently sequential nature. This work parallelizes the floorplanning and global routing algorithms through a novel approach, resulting in significant speedups for both tools implemented on GPU hardware. Specifically, with a complete overhaul of the solution space and design space exploration, a GPU-based floorplanning algorithm achieves a 4-166x speedup while producing similar or improved solutions compared with the sequential algorithm. The GPU-based global routing algorithm is shown to achieve significant speedup over existing state-of-the-art routers while delivering competitive solution quality. Importantly, this parallel model for global routing produces a stable solution that is independent of the level of parallelism. In summary, this research shows that, through a design paradigm overhaul, sequential algorithms can also benefit from massively parallel architectures. The findings of this study have a positive impact on the efficiency and design quality of the modern EDA design flow.

DigitalCommons@USU

Complex Query Operators on Modern Parallel Architectures

Author: Zois Vasileios
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Identifying interesting objects in a large data collection is a fundamental problem for multi-criteria decision-making applications. In Relational Database Management Systems (RDBMS), the most popular complex query operators used to solve this type of problem are the Top-K selection operator and the Skyline operator. Top-K selection retrieves the k highest-ranking tuples from a given relation, as determined by a user-defined aggregation function. Skyline selection retrieves the tuples whose attributes offer Pareto-optimal trade-offs in a given relation. Efficient Top-K query processing entails minimizing tuple evaluations through elaborate processing schemes combined with sophisticated data structures that enable early termination. Skyline query evaluation relies on processing strategies geared towards early termination and incomparable tuple pruning. The rapid increase in memory capacity and decreasing costs have been the main drivers behind the development of main-memory database systems. Although migrating query processing in-memory has created many opportunities to reduce query latency, attaining such improvements has been very challenging due to the growing gap between processor and main memory speeds. The rapid proliferation of multi-core and many-core architectures has made this limitation easier to address. However, their use in real systems has been hindered by the lack of suitable parallel algorithms that focus on algorithmic efficiency. In this thesis, we study the Top-K and Skyline selection operators in depth, in the context of emerging parallel architectures. Our ultimate goal is to provide practical guidelines for developing work-efficient algorithms suitable for parallel main-memory processing. We concentrate on multi-core (CPU), many-core (GPU), and processing-in-memory (PIM) architectures, developing solutions optimized for high throughput and low latency. The first part of this thesis focuses on Top-K selection, presenting the details of early termination algorithms that we developed specifically for parallel architectures and various types of accelerators (i.e., GPU, PIM). The second part of this thesis concentrates on Skyline selection and the development of a massively parallel, load-balanced algorithm for PIM architectures. Our work consolidates performance results across different parallel architectures, using synthetic and real data with varying query parameters and distributions, for both of the aforementioned problems. The experimental results demonstrate throughput and query latency improvements of several orders of magnitude, thus validating the effectiveness of our proposed solutions for the Top-K and Skyline selection operators.
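
Early termination for Top-K selection can be illustrated with Fagin's Threshold Algorithm (TA), a classic sequential technique; the sketch below is not the parallel CPU/GPU/PIM algorithm developed in the thesis. It assumes a monotone aggregation function (the default is a plain sum) and uses in-memory Python lists to stand in for sorted and random access; the function name and data layout are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple


def threshold_topk(rows: Dict[str, Sequence[float]], k: int,
                   agg: Callable[[Sequence[float]], float] = sum
                   ) -> Tuple[List[Tuple[float, str]], int]:
    """Top-k tuples under a monotone aggregation function via the Threshold Algorithm.

    Returns (results sorted by descending score, sorted-access depth reached).
    """
    if k <= 0 or not rows:
        return [], 0
    m = len(next(iter(rows.values())))
    # One list per attribute, each sorted in descending attribute order.
    lists = [sorted(rows, key=lambda rid, i=i: rows[rid][i], reverse=True)
             for i in range(m)]
    heap: List[Tuple[float, str]] = []  # min-heap holding the current top-k
    seen = set()
    depth = 0
    for depth in range(1, len(rows) + 1):
        frontier = []  # last value read from each list at this depth
        for i in range(m):
            rid = lists[i][depth - 1]      # sorted access
            frontier.append(rows[rid][i])
            if rid in seen:
                continue
            seen.add(rid)
            score = agg(rows[rid])         # random access to all attributes
            if len(heap) < k:
                heapq.heappush(heap, (score, rid))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, rid))
        # No unseen tuple can score above the aggregate of the frontier values.
        if len(heap) == k and heap[0][0] >= agg(frontier):
            break
    return sorted(heap, reverse=True), depth
```

The returned depth shows how many rows of each sorted list were read before the threshold test allowed stopping, which is the quantity early termination tries to keep small.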

eScholarship - University of California

Test analysis & fault simulation of microfluidic systems

Author: Myers Thomas Oliver
Publication date: 01/10/2010

This work presents a design, simulation and test methodology for microfluidic systems, with particular focus on simulation for test. A Microfluidic Fault Simulator (MFS) has been created around COMSOL that allows a fault-free system model to undergo fault injection and provides test measurements. A post-MFS test analysis procedure is also described. A range of fault-free system simulations have been cross-validated against experimental work to gauge the accuracy of the fundamental simulation approach before further investigation and development of the simulation and test procedure. A generic mechanism, termed a fault block, has been developed to provide fault injection and a means of describing a low-abstraction behavioural fault model within the system. This technique has allowed the creation of a fault library containing a range of different microfluidic fault conditions. Each of the fault models has been cross-validated against experimental conditions or published results to determine its accuracy. Two test methods, namely impedance spectroscopy and Levich electrochemical sensors, have been investigated as general methods of microfluidic test, and each has been shown to be sensitive to a multitude of faults. Each method has been successfully implemented within the simulation environment and cross-validated by first-hand experimentation or published work. A test analysis procedure based on the Neyman-Pearson criterion has been developed to provide a probabilistic metric for each test applied to a given fault condition, giving a quantitative assessment of each test. These metrics are used to analyse the sensitivity of each test method, which is useful when determining which tests to employ in the final system. Furthermore, these probabilistic metrics may be combined to provide a fault coverage metric for the complete system. The complete MFS method has been applied to two system case studies: a hydrodynamic “Y” channel and a flow cytometry system for prognosing head and neck cancer. Decision trees are trained on the test measurement data and fault conditions as a means of classifying the system's fault condition. The classification rules created by the decision trees may be displayed graphically or as a set of rules that can be loaded into test instrumentation. During the course of this research, a high-voltage power supply instrument was developed to aid electro-osmotic experimentation, along with an impedance spectrometer to provide embedded test.
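
The Neyman-Pearson criterion can be made concrete for a single test and fault condition. The sketch assumes, purely for illustration, that a test measurement is Gaussian with the same standard deviation with and without the fault; the thesis's actual fault models and measurement distributions are not reproduced here, and the function name is hypothetical. For a fixed false-alarm probability `alpha` it returns the decision threshold and the resulting probability of detecting the fault.

```python
from statistics import NormalDist
from typing import Tuple


def neyman_pearson_test(mu_ok: float, mu_fault: float, sigma: float,
                        alpha: float) -> Tuple[float, float]:
    """Return (decision threshold, probability of detection) for false-alarm rate alpha.

    Assumes the measurement is Gaussian with standard deviation sigma in both
    the fault-free and faulty cases. With equal variances the likelihood ratio
    is monotone in the measurement, so the Neyman-Pearson test reduces to a
    single threshold placed on the side of the fault mean.
    """
    ok = NormalDist(mu_ok, sigma)
    fault = NormalDist(mu_fault, sigma)
    if mu_fault >= mu_ok:
        threshold = ok.inv_cdf(1.0 - alpha)  # P(x > t | fault-free) = alpha
        p_detect = 1.0 - fault.cdf(threshold)
    else:
        threshold = ok.inv_cdf(alpha)        # P(x < t | fault-free) = alpha
        p_detect = fault.cdf(threshold)
    return threshold, p_detect
```

For example, `neyman_pearson_test(0.0, 3.0, 1.0, 0.05)` places the threshold about 1.645 standard deviations above the fault-free mean; a larger separation between the two means yields a higher detection probability for the same false-alarm rate.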

Repository@Hull - Worktribe

An input centric paradigm for program dynamic optimizations and lifetime evolvement

Author: Tian Kai
Publication venue: W&M ScholarWorks
Publication date: 01/01/2012
Field of study

Accurately predicting program behaviors (e.g., memory locality, method calling frequency) is fundamental for program optimizations and runtime adaptations. Despite decades of remarkable progress, prior studies have not systematically exploited program inputs, a deciding factor of program behaviors, to help in program dynamic optimizations. Motivated by the strong and predictive correlations between program inputs and program behaviors uncovered by recent studies, this dissertation aims to bring program inputs into the focus of program behavior analysis and program dynamic optimization, cultivating a new paradigm named input-centric program behavior analysis and dynamic optimization.
The new optimization paradigm consists of three components, forming a three-layer pyramid. At the base is program input characterization, a component for resolving the complexity of raw program inputs and extracting important features. In the middle is input-behavior modeling, a component for recognizing and modeling the correlations between characterized input features and program behaviors. These two components constitute input-centric program behavior analysis, which (ideally) is able to predict the large-scope behaviors of a program's execution as soon as the execution starts. The top layer is input-centric adaptation, which capitalizes on the novel opportunities created by the first two components to enable proactive adaptation for program optimizations.
This dissertation aims to develop this paradigm in two stages. In the first stage, we concentrate on exploring the implications of program inputs for program behaviors and dynamic optimization. We construct the basic input-centric optimization framework based on offline training to realize the basic functionalities of the three major components of the paradigm. In the second stage, we focus on making the paradigm practical by addressing multi-faceted issues: handling input complexity, transparent training data collection, and predictive model evolvement across production runs. Together, the techniques proposed in this stage cultivate a lifelong continuous optimization scheme with cross-input adaptivity.
Fundamentally, the new optimization paradigm provides a new solution for program dynamic optimization. Together, the techniques proposed in the dissertation resolve the adaptivity-proactivity dilemma that has limited the effectiveness of existing optimization techniques. Its benefits are demonstrated through proactive dynamic optimizations in Jikes RVM and version selection using the IBM XL C Compiler, yielding significant performance improvements on a set of Java and C/C++ programs. It may open new opportunities for a broad range of runtime optimizations and adaptations. The evaluation results on both Java and C/C++ applications demonstrate that the new paradigm is promising for advancing the current state of program optimizations.
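
The three-layer pyramid can be sketched in a toy form. Every name below is hypothetical: the input feature is just a token count, the input-behavior model is an ordinary least-squares line fitted on offline training runs, and the adaptation is a single two-way choice of optimization level. The dissertation's actual feature extraction, models, and the Jikes RVM and IBM XL C integrations are not reproduced.

```python
from statistics import fmean
from typing import Callable, List


def input_feature(raw_input: str) -> float:
    """Layer 1, input characterization (hypothetical feature: token count)."""
    return float(len(raw_input.split()))


def fit_input_behavior_model(train_inputs: List[str],
                             observed: List[float]) -> Callable[[str], float]:
    """Layer 2, input-behavior modeling: least-squares line fitted on offline runs."""
    xs = [input_feature(s) for s in train_inputs]
    x_bar, y_bar = fmean(xs), fmean(observed)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, observed))
    slope = sxy / sxx  # requires at least two distinct feature values
    intercept = y_bar - slope * x_bar
    return lambda raw: slope * input_feature(raw) + intercept


def choose_opt_level(predict: Callable[[str], float], raw_input: str,
                     hot_threshold: float = 1000.0) -> str:
    """Layer 3, input-centric adaptation: decide at startup, before any profiling."""
    return "aggressive" if predict(raw_input) >= hot_threshold else "baseline"
```

The point of the structure is that `choose_opt_level` can run as soon as the input is known, before any runtime profile exists, which is the proactivity the abstract contrasts with purely reactive adaptation.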

College of William & Mary: W&M Publish