9 research outputs found
Computational Redundancy in Image Processing
This research presents a new performance improvement technique, window memoization, for software and hardware implementations of local image processing algorithms. Window memoization combines the memoization techniques proposed in software and hardware with a characteristic of image data, computational redundancy, to improve the performance (in software) and efficiency (in hardware) of local image processing algorithms.
The computational redundancy of an image indicates the percentage of computations that can be skipped when performing a local image processing algorithm on the image. Our studies show that computational redundancy is inherited from two principal redundancies in image data: coding redundancy and interpixel redundancy. We have shown mathematically that the amount of coding and interpixel redundancy of an image has a positive effect on the computational redundancy of the image where a higher coding and interpixel redundancy leads to a higher computational redundancy. We have also demonstrated (mathematically and empirically) that the amount of coding and interpixel redundancy of an image has a positive effect on the speedup obtained for the image by window memoization in both software and hardware.
Window memoization minimizes the number of redundant computations performed on an image by identifying similar neighborhoods of pixels in the image. It uses a memory, reuse table, to store the results of previously performed computations. When a set of computations has to be performed for the first time, the computations are performed and the corresponding result is stored in the reuse table. When the same set of computations has to be performed again in the future, the previously calculated result is reused and the actual computations are skipped.
Implementing the window memoization technique in software speeds up the computations required to complete an image processing task. In software, we have developed an optimized architecture for window memoization and applied it to six image processing algorithms: Canny edge detector, morphological gradient, Kirsch edge detector, Trajkovic corner detector, median filter, and local variance. The typical speedups range from 1.2 to 7.9 with a maximum factor of 40. We have also presented a performance model to predict the speedups obtained by window memoization in software.
In hardware, we have developed an optimized architecture that embodies the window memoization technique. Our hardware design for window memoization achieves high speedups with an overhead in hardware area that is significantly less than that of conventional performance improvement techniques. As case studies in hardware, we have applied window memoization to the Kirsch edge detector and median filter. The typical and maximum speedup factors in hardware are 1.6 and 1.8, respectively, with 40% less hardware in comparison to conventional optimization techniques
Microarchitectural techniques to reduce energy consumption in the memory hierarchy
This thesis states that dynamic profiling of the memory reference stream can improve energy
and performance in the memory hierarchy. The research presented in this theses provides
multiple instances of using lightweight hardware structures to profile the memory
reference stream. The objective of this research is to develop microarchitectural techniques
to reduce energy consumption at different levels of the memory hierarchy. Several simple
and implementable techniques were developed as a part of this research. One of the
techniques identifies and eliminates redundant refresh operations in DRAM and reduces
DRAM refresh power. Another, reduces leakage energy in L2 and higher level caches for
multiprocessor systems. The emphasis of this research has been to develop several techniques
of obtaining energy savings in caches using a simple hardware structure called the
counting Bloom filter (CBF). CBFs have been used to predict L2 cache misses and obtain
energy savings by not accessing the L2 cache on a predicted miss. A simple extension of
this technique allows CBFs to do way-estimation of set associative caches to reduce energy
in cache lookups. Another technique using CBFs track addresses in a Virtual Cache and
reduce false synonym lookups. Finally this thesis presents a technique to reduce dynamic
power consumption in level one caches using significance compression. The significant
energy and performance improvements demonstrated by the techniques presented in this
thesis suggest that this work will be of great value for designing memory hierarchies of
future computing platforms.Ph.D.Committee Chair: Lee, Hsien-Hsin S.; Committee Member: Cahtterjee,Abhijit; Committee Member: Mukhopadhyay, Saibal; Committee Member: Pande, Santosh; Committee Member: Yalamanchili, Sudhaka
Dynamic Orchestration of Massively Data Parallel Execution.
Graphics processing units (GPUs) are specialized hardware accelerators
capable of rendering graphics much faster than conventional
general-purpose processors. They are widely used in personal computers,
tablets, mobile phones, and game consoles. Modern GPUs are not only
efficient at manipulating computer graphics, but also are more effective
than CPUs for algorithms where processing of large data blocks can be done
in parallel. This is mainly due to their highly parallel architecture.
While GPUs provide low-cost and efficient
platforms for accelerating massively parallel applications, tedious
performance tuning is required to maximize application execution
efficiency. Achieving high performance requires the programmers to
manually manage the amount of on-chip memory used per thread, the total
number of threads per multiprocessor, the pattern of off-chip memory
accesses, etc.
In addition to a complex programming model, there is a lack of performance
portability across various systems with different runtime properties. Programmers usually make assumptions about
runtime properties when they write code and optimize that code based
on those assumptions. However, if any of these properties changes
during execution, the optimized code performs poorly. To alleviate these
limitations, several implementations of the application are needed to
maximize performance for different runtime properties. However, it
is not practical for the programmer to write several different versions of the
same code which are optimized for each individual runtime condition.
In this thesis, we propose a static and dynamic compiler framework to
take the burden of fine tuning different implementations of the same code
off the programmer. This framework enables the programmer to write the
program once and allow a static compiler to generate different versions of
a data parallel application with several tuning parameters. The runtime
system selects the best version and fine tunes its parameters based on
runtime properties such as device configuration, input size, dependency,
and data values.PhDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/108805/1/mehrzads_1.pd
Dependable Embedded Systems
This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. This book introduces the most prominent reliability concerns from today’s points of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level such circuit level or system level alone, the focus of this book is to deal with the different reliability challenges across different levels starting from the physical level all the way to the system level (cross-layer approaches). The book aims at demonstrating how new hardware/software co-design solution can be proposed to ef-fectively mitigate reliability degradation such as transistor aging, processor variation, temperature effects, soft errors, etc. Provides readers with latest insights into novel, cross-layer methods and models with respect to dependability of embedded systems; Describes cross-layer approaches that can leverage reliability through techniques that are pro-actively designed with respect to techniques at other layers; Explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many core systems
Kiel Declarative Programming Days 2013
This report contains the papers presented at the Kiel Declarative Programming Days 2013, held in Kiel (Germany) during September 11-13, 2013. The Kiel Declarative Programming Days 2013 unified the following events: * 20th International Conference on Applications of Declarative Programming and Knowledge Management (INAP 2013) * 22nd International Workshop on Functional and (Constraint) Logic Programming (WFLP 2013) * 27th Workshop on Logic Programming (WLP 2013) All these events are centered around declarative programming, an advanced paradigm for the modeling and solving of complex problems. These specification and implementation methods attracted increasing attention over the last decades, e.g., in the domains of databases and natural language processing, for modeling and processing combinatorial problems, and for high-level programming of complex, in particular, knowledge-based systems
Effective testing for concurrency bugs
In the current multi-core era, concurrency bugs are a serious threat to software reliability. As hardware becomes more parallel, concurrent programming will become increasingly pervasive. However, correct concurrent programming is known to be extremely challenging for developers and can easily lead to the introduction of concurrency bugs. This dissertation addresses this challenge by proposing novel techniques to help developers expose and detect concurrency bugs.
We conducted a bug study to better understand the external and internal effects of real-world concurrency bugs. Our study revealed that a significant fraction of concurrency bugs qualify as semantic or latent bugs, which are two particularly challenging classes of concurrency bugs. Based on the insights from the study, we propose a concurrency bug detector, PIKE that analyzes the behavior of program executions to infer whether concurrency bugs have been triggered during a concurrent execution. In addition, we present the design of a testing tool, SKI, that allows developers to test operating system kernels for concurrency bugs in a practical manner. SKI bridges the gap between user-mode testing and kernel-mode testing by enabling the systematic exploration of the kernel thread interleaving space. Our evaluation shows that both PIKE and SKI are effective at finding concurrency bugs.Im gegenwärtigen Multicore-Zeitalter sind Fehler aufgrund von Nebenläufigkeit eine ernsthafte Bedrohung der Zuverlässigkeit von Software. Mit der wachsenden Parallelisierung von Hardware wird nebenläufiges Programmieren nach und nach allgegenwärtig. Diese Art von Programmieren ist jedoch als äußerst schwierig bekannt und kann leicht zu Programmierfehlern führen. Die vorliegende Dissertation nimmt sich dieser Herausforderung an indem sie neuartige Techniken vorschlägt, die Entwicklern beim Aufdecken von Nebenläufigkeitsfehlern helfen.
Wir führen eine Studie von Fehlern durch, um die externen und internen Effekte von in der Praxis vorkommenden Nebenläufigkeitsfehlern besser zu verstehen. Diese ergibt, dass ein bedeutender Anteil von solchen Fehlern als semantisch bzw. latent zu charakterisieren ist -- zwei besonders herausfordernde Klassen von Nebenläufigkeitsfehlern. Basierend auf den Erkenntnissen der Studie entwickeln wir einen Detektor (PIKE), der Programmausführungen daraufhin analysiert, ob Nebenläufigkeitsfehler aufgetreten sind. Weiterhin präsentieren wir das Design eines Testtools (SKI), das es Entwicklern ermöglicht, Betriebssystemkerne praktikabel auf Nebenläufigkeitsfehler zu überprüfen. SKI füllt die Lücke zwischen Testen im Benutzermodus und Testen im Kernelmodus, indem es die systematische Erkundung der Kernel-Thread-Verschachtelungen erlaubt. Unsere Auswertung zeigt, dass sowohl PIKE als auch SKI effektiv Nebenläufigkeitsfehler finden
LIPIcs, Volume 274, ESA 2023, Complete Volume
LIPIcs, Volume 274, ESA 2023, Complete Volum