9 research outputs found

Computational Redundancy in Image Processing

Author: Khalvati Farzad
Publication venue: University of Waterloo
Publication date: 01/01/2008

This research presents a new performance improvement technique, window memoization, for software and hardware implementations of local image processing algorithms. Window memoization combines the memoization techniques proposed in software and hardware with a characteristic of image data, computational redundancy, to improve the performance (in software) and efficiency (in hardware) of local image processing algorithms. The computational redundancy of an image indicates the percentage of computations that can be skipped when performing a local image processing algorithm on the image. Our studies show that computational redundancy is inherited from two principal redundancies in image data: coding redundancy and interpixel redundancy. We have shown mathematically that the amount of coding and interpixel redundancy of an image has a positive effect on the computational redundancy of the image: higher coding and interpixel redundancy leads to higher computational redundancy. We have also demonstrated (mathematically and empirically) that the amount of coding and interpixel redundancy of an image has a positive effect on the speedup obtained for the image by window memoization in both software and hardware. Window memoization minimizes the number of redundant computations performed on an image by identifying similar neighborhoods of pixels in the image. It uses a memory, the reuse table, to store the results of previously performed computations. When a set of computations has to be performed for the first time, the computations are performed and the corresponding result is stored in the reuse table. When the same set of computations has to be performed again in the future, the previously calculated result is reused and the actual computations are skipped. Implementing the window memoization technique in software speeds up the computations required to complete an image processing task.
In software, we have developed an optimized architecture for window memoization and applied it to six image processing algorithms: Canny edge detector, morphological gradient, Kirsch edge detector, Trajkovic corner detector, median filter, and local variance. The typical speedups range from 1.2 to 7.9, with a maximum factor of 40. We have also presented a performance model to predict the speedups obtained by window memoization in software. In hardware, we have developed an optimized architecture that embodies the window memoization technique. Our hardware design for window memoization achieves high speedups with an overhead in hardware area that is significantly less than that of conventional performance improvement techniques. As case studies in hardware, we have applied window memoization to the Kirsch edge detector and median filter. The typical and maximum speedup factors in hardware are 1.6 and 1.8, respectively, with 40% less hardware in comparison to conventional optimization techniques.
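The reuse-table idea described in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's optimized architecture: the function names, the dictionary-based reuse table, the exact-match lookup (the abstract speaks of "similar" neighborhoods), and the example operator `local_range` are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]
Window = Tuple[int, ...]


def local_range(window: Window) -> int:
    """Hypothetical local operator: max minus min over the window."""
    return max(window) - min(window)


def apply_with_window_memoization(
    image: Image, op: Callable[[Window], int], radius: int = 1
) -> Tuple[Image, int, int]:
    """Apply `op` to every interior window, reusing results for repeated windows.

    Returns (output image, reuse-table hits, total windows processed).
    Border pixels are left at 0 to keep the sketch short.
    """
    height, width = len(image), len(image[0])
    output = [[0] * width for _ in range(height)]
    reuse_table: Dict[Window, int] = {}  # window contents -> stored result
    hits = 0
    total = 0
    for y in range(radius, height - radius):
        for x in range(radius, width - radius):
            window = tuple(
                image[y + dy][x + dx]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            )
            total += 1
            if window in reuse_table:
                hits += 1  # same window seen before: skip the computation
            else:
                reuse_table[window] = op(window)  # first time: compute and store
            output[y][x] = reuse_table[window]
    return output, hits, total
```

Under these assumptions, `hits / total` gives a rough measure of the fraction of computations skipped for a given image, which corresponds loosely to the computational redundancy the abstract describes.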

University of Waterloo's Institutional Repository

Microarchitectural techniques to reduce energy consumption in the memory hierarchy

Author: Ghosh Mrinmoy
Publication venue: Georgia Institute of Technology
Publication date: 03/04/2009

This thesis states that dynamic profiling of the memory reference stream can improve energy and performance in the memory hierarchy. The research presented in this thesis provides multiple instances of using lightweight hardware structures to profile the memory reference stream. The objective of this research is to develop microarchitectural techniques to reduce energy consumption at different levels of the memory hierarchy. Several simple and implementable techniques were developed as a part of this research. One of the techniques identifies and eliminates redundant refresh operations in DRAM and reduces DRAM refresh power. Another reduces leakage energy in L2 and higher-level caches for multiprocessor systems. The emphasis of this research has been to develop several techniques of obtaining energy savings in caches using a simple hardware structure called the counting Bloom filter (CBF). CBFs have been used to predict L2 cache misses and obtain energy savings by not accessing the L2 cache on a predicted miss. A simple extension of this technique allows CBFs to do way-estimation of set-associative caches to reduce energy in cache lookups. Another technique uses CBFs to track addresses in a virtual cache and reduce false synonym lookups. Finally, this thesis presents a technique to reduce dynamic power consumption in level-one caches using significance compression. The significant energy and performance improvements demonstrated by the techniques presented in this thesis suggest that this work will be of great value for designing memory hierarchies of future computing platforms.

Ph.D. Committee Chair: Lee, Hsien-Hsin S.; Committee Member: Chatterjee, Abhijit; Committee Member: Mukhopadhyay, Saibal; Committee Member: Pande, Santosh; Committee Member: Yalamanchili, Sudhaka
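The cache-miss prediction described in this abstract rests on a property of counting Bloom filters: a lookup that finds any zero counter proves the item is absent. The following Python sketch shows that property in software. The class name, the table size, and the use of SHA-256 as a stand-in hash are assumptions for illustration; a hardware CBF would hash address bits directly, and this is not the thesis's design.

```python
import hashlib
from typing import Iterator, List


class CountingBloomFilter:
    """Software sketch of a counting Bloom filter over cache-line addresses."""

    def __init__(self, num_counters: int = 1024, num_hashes: int = 3) -> None:
        self.counters: List[int] = [0] * num_counters
        self.num_hashes = num_hashes

    def _indexes(self, address: int) -> Iterator[int]:
        # Derive num_hashes counter indexes from the address (assumed hash scheme).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{address}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.counters)

    def insert(self, address: int) -> None:
        """Call when a line is brought into the cache."""
        for i in self._indexes(address):
            self.counters[i] += 1

    def remove(self, address: int) -> None:
        """Call when a line is evicted; only valid for addresses inserted earlier."""
        for i in self._indexes(address):
            self.counters[i] -= 1

    def may_contain(self, address: int) -> bool:
        """False means the line is definitely absent, so the lookup can be skipped."""
        return all(self.counters[i] > 0 for i in self._indexes(address))
```

In this sketch a `False` result from `may_contain` is a guaranteed miss, while a `True` result may still be a false positive, so a predictor built this way can only skip lookups, never produce a wrong hit.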

Scholarly Materials And Research @ Georgia Tech

Dynamic Orchestration of Massively Data Parallel Execution.

Author: Samadiarakhshbahar Mehrzad
Publication date: 01/01/2014

Graphics processing units (GPUs) are specialized hardware accelerators capable of rendering graphics much faster than conventional general-purpose processors. They are widely used in personal computers, tablets, mobile phones, and game consoles. Modern GPUs are not only efficient at manipulating computer graphics, but also are more effective than CPUs for algorithms where processing of large data blocks can be done in parallel. This is mainly due to their highly parallel architecture. While GPUs provide low-cost and efficient platforms for accelerating massively parallel applications, tedious performance tuning is required to maximize application execution efficiency. Achieving high performance requires the programmers to manually manage the amount of on-chip memory used per thread, the total number of threads per multiprocessor, the pattern of off-chip memory accesses, etc. In addition to a complex programming model, there is a lack of performance portability across various systems with different runtime properties. Programmers usually make assumptions about runtime properties when they write code and optimize that code based on those assumptions. However, if any of these properties changes during execution, the optimized code performs poorly. To alleviate these limitations, several implementations of the application are needed to maximize performance for different runtime properties. However, it is not practical for the programmer to write several different versions of the same code which are optimized for each individual runtime condition. In this thesis, we propose a static and dynamic compiler framework to take the burden of fine tuning different implementations of the same code off the programmer. This framework enables the programmer to write the program once and allow a static compiler to generate different versions of a data parallel application with several tuning parameters. 
The runtime system selects the best version and fine-tunes its parameters based on runtime properties such as device configuration, input size, dependency, and data values.

PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/108805/1/mehrzads_1.pd

CiteSeerX

Deep Blue Documents at the University of Michigan

Memory and functional unit design for vector microprocessors

Author: Boettcher Matthias
Publication date: 02/05/2014

Southampton (e-Prints Soton)

Dependable Embedded Systems

Authors: Dutt Nikil; Henkel Jörg
Publication venue: Springer Science and Business Media LLC

This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. The book introduces the most prominent reliability concerns from today’s point of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level, such as circuit level or system level alone, this book deals with the different reliability challenges across different levels, starting from the physical level all the way to the system level (cross-layer approaches). The book aims to demonstrate how new hardware/software co-design solutions can be proposed to effectively mitigate reliability degradation such as transistor aging, processor variation, temperature effects, soft errors, etc. It provides readers with the latest insights into novel, cross-layer methods and models with respect to dependability of embedded systems; describes cross-layer approaches that can leverage reliability through techniques that are proactively designed with respect to techniques at other layers; and explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many-core systems.

OAPEN Library

Kiel Declarative Programming Days 2013

Publication date: 01/01/2013

This report contains the papers presented at the Kiel Declarative Programming Days 2013, held in Kiel (Germany) during September 11-13, 2013. The Kiel Declarative Programming Days 2013 unified the following events:
* 20th International Conference on Applications of Declarative Programming and Knowledge Management (INAP 2013)
* 22nd International Workshop on Functional and (Constraint) Logic Programming (WFLP 2013)
* 27th Workshop on Logic Programming (WLP 2013)
All these events are centered around declarative programming, an advanced paradigm for the modeling and solving of complex problems. These specification and implementation methods attracted increasing attention over the last decades, e.g., in the domains of databases and natural language processing, for modeling and processing combinatorial problems, and for high-level programming of complex, in particular, knowledge-based systems.

MACAU: Open Access Repository of Kiel University

Effective testing for concurrency bugs

Author: Sousa da Fonseca Pedro José
Publication venue: Fakultät 6 - Naturwissenschaftlich-Technische Fakultät I. Fachrichtung 6.2 - Informatik
Publication date: 01/01/2015

In the current multi-core era, concurrency bugs are a serious threat to software reliability. As hardware becomes more parallel, concurrent programming will become increasingly pervasive. However, correct concurrent programming is known to be extremely challenging for developers and can easily lead to the introduction of concurrency bugs. This dissertation addresses this challenge by proposing novel techniques to help developers expose and detect concurrency bugs. We conducted a bug study to better understand the external and internal effects of real-world concurrency bugs. Our study revealed that a significant fraction of concurrency bugs qualify as semantic or latent bugs, which are two particularly challenging classes of concurrency bugs. Based on the insights from the study, we propose a concurrency bug detector, PIKE, that analyzes the behavior of program executions to infer whether concurrency bugs have been triggered during a concurrent execution. In addition, we present the design of a testing tool, SKI, that allows developers to test operating system kernels for concurrency bugs in a practical manner. SKI bridges the gap between user-mode testing and kernel-mode testing by enabling the systematic exploration of the kernel thread interleaving space. Our evaluation shows that both PIKE and SKI are effective at finding concurrency bugs.
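Systematic exploration of an interleaving space can be illustrated with a toy model in Python. The sketch below is not SKI and does not touch a kernel: it models two hypothetical threads, each performing a non-atomic increment (one read step, then one write step) on a shared counter, enumerates every schedule that preserves per-thread order, and reports the schedules that lose an update. All names here are assumptions for illustration.

```python
from itertools import combinations
from typing import Iterator, List, Tuple

Schedule = Tuple[str, ...]


def interleavings(steps_a: int, steps_b: int) -> Iterator[Schedule]:
    """Yield every schedule of thread ids 'A'/'B' that preserves per-thread order."""
    total = steps_a + steps_b
    for a_slots in combinations(range(total), steps_a):
        slots = set(a_slots)
        yield tuple("A" if i in slots else "B" for i in range(total))


def run_schedule(schedule: Schedule) -> int:
    """Execute two read-then-write increments in the given order; return the counter."""
    counter = 0
    local = {"A": 0, "B": 0}  # per-thread private copy of the counter
    step = {"A": 0, "B": 0}   # next step each thread will execute
    for tid in schedule:
        if step[tid] == 0:
            local[tid] = counter       # step 0: read the shared counter
        else:
            counter = local[tid] + 1   # step 1: write back the incremented value
        step[tid] += 1
    return counter


def find_buggy_schedules() -> List[Schedule]:
    """Return every schedule whose final counter differs from the expected value 2."""
    return [s for s in interleavings(2, 2) if run_schedule(s) != 2]
```

In this toy model only the two fully serialized schedules produce the expected count; every schedule in which both reads happen before either write loses an update, which is the kind of outcome an exhaustive exploration is meant to surface.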

Universaar

MPG.PuRe

32nd International Symposium on Theoretical Aspects of Computer Science: STACS '15, March 4 - 7, 2015, Garching, Germany

Author: STAC <32. 2015, Garching>
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/02/2015

Digitale Bibliothek Thüringen

LIPIcs, Volume 274, ESA 2023, Complete Volume

Authors: Farach-Colton Martin; Herman Grzegorz; Puglisi Simon J.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st Annual European Symposium on Algorithms (ESA 2023)
Publication date: 01/01/2023

Dagstuhl Research Online Publication Server