Search CORE

252 research outputs found

Improving Programming Support for Hardware Accelerators Through Automata Processing Abstractions

Author: Angstadt Kevin
Publication venue
Publication date
Field of study

The adoption of hardware accelerators, such as Field-Programmable Gate Arrays, into general-purpose computation pipelines continues to rise, driven by recent trends in data collection and analysis as well as pressure from challenging physical design constraints in hardware. The architectural designs of many of these accelerators stand in stark contrast to the traditional von Neumann model of CPUs. Consequently, existing programming languages, maintenance tools, and techniques are not directly applicable to these devices, meaning that additional architectural knowledge is required for effective programming and configuration. Current programming models and techniques are akin to assembly-level programming on a CPU, thus placing significant burden on developers tasked with using these architectures. Because programming is currently performed at such low levels of abstraction, the software development process is tedious and challenging and hinders the adoption of hardware accelerators. This dissertation explores the thesis that theoretical finite automata provide a suitable abstraction for bridging the gap between high-level programming models and maintenance tools familiar to developers and the low-level hardware representations that enable high-performance execution on hardware accelerators. We adopt a principled hardware/software co-design methodology to develop a programming model providing the key properties that we observe are necessary for success, namely performance and scalability, ease of use, expressive power, and legacy support. First, we develop a framework that allows developers to port existing, legacy code to run on hardware accelerators by leveraging automata learning algorithms in a novel composition with software verification, string solvers, and high-performance automata architectures. Next, we design a domain-specific programming language to aid programmers writing pattern-searching algorithms and develop compilation algorithms to produce finite automata, which supports efficient execution on a wide variety of processing architectures. Then, we develop an interactive debugger for our new language, which allows developers to accurately identify the locations of bugs in software while maintaining support for high-throughput data processing. Finally, we develop two new automata-derived accelerator architectures to support additional applications, including the detection of security attacks and the parsing of recursive and tree-structured data. Using empirical studies, logical reasoning, and statistical analyses, we demonstrate that our prototype artifacts scale to real-world applications, maintain manageable overheads, and support developers' use of hardware accelerators. Collectively, the research efforts detailed in this dissertation help ease the adoption and use of hardware accelerators for data analysis applications, while supporting high-performance computation.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/155224/1/angstadt_1.pd

Techniques For Accelerating Large-Scale Automata Processing

Author: Liu Hongyuan
Publication venue: W&M ScholarWorks
Publication date: 01/01/2022
Field of study

The big-data era has brought new challenges to computer architectures due to the large-scale computation and data. Moreover, this problem becomes critical in several domains where the computation is also irregular, among which we focus on automata processing in this dissertation. Automata are widely used in applications from different domains such as network intrusion detection, machine learning, and parsing. Large-scale automata processing is challenging for traditional von Neumann architectures. To this end, many accelerator prototypes have been proposed. Micron\u27s Automata Processor (AP) is an example. However, as a spatial architecture, it is unable to handle large automata programs without repeated reconfiguration and re-execution. We found a large number of automata states are never enabled in the execution but still configured on the AP chips, leading to its underutilization. To address this issue, we proposed a lightweight offline profiling technique to predict the never-enabled states and keep them out of the AP. Furthermore, we develop SparseAP, a new execution mode for AP to handle the misprediction efficiently. Our software and hardware co-optimization obtains 2.1x speedup over the baseline AP execution across 26 applications. Since the AP is not publicly available, we aim to reduce the performance gap between a general-purpose accelerator---Graphics Processing Unit (GPU) and AP. We identify excessive data movement in the GPU memory hierarchy and propose optimization techniques to reduce the data movement. Although our optimization techniques significantly alleviate these memory-related bottlenecks, a side effect of them is the static assignment of work to cores. This leads to poor compute utilization as GPU cores are wasted on idle automata states. Therefore, we propose a new dynamic scheme that effectively balances compute utilization with reduced memory usage. Our combined optimizations provide a significant improvement over the previous state-of-the-art GPU implementations of automata. Moreover, they enable current GPUs to outperform the AP across several applications while performing within an order of magnitude for the rest of them. To make automata processing on GPU more generic to tasks with different amounts of parallelism, we propose AsyncAP, a lightweight approach that scales with the input length. Threads run asynchronously in AsyncAP, alleviating the bottleneck of thread block synchronization. The evaluation and detailed analysis demonstrate that AsyncAP achieves significant speedup or at least comparable performance under various scenarios for most of the applications. The future work aims to design automatic ways to generate optimizations and mappings between automata and computation resources for different GPUs. We will broaden the scope of this dissertation to domains such as graph computing

College of William & Mary: W&M Publish

Space Communications: Theory and Applications. Volume 3: Information Processing and Advanced Techniques. A Bibliography, 1958 - 1963

Author: Bickford L. C.
Filipowsky R. F.
Publication venue
Publication date
Field of study

Annotated bibliography on information processing and advanced communication techniques - theory and applications of space communication

Potential and Challenges of Analog Reconfigurable Computation in Modern and Future CMOS

Author: Pänkäälä Mikko
Publication venue: Turku Centre for Computer Science
Publication date: 05/12/2014
Field of study

In this work, the feasibility of the floating-gate technology in analog computing platforms in a scaled down general-purpose CMOS technology is considered. When the technology is scaled down the performance of analog circuits tends to get worse because the process parameters are optimized for digital transistors and the scaling involves the reduction of supply voltages. Generally, the challenge in analog circuit design is that all salient design metrics such as power, area, bandwidth and accuracy are interrelated. Furthermore, poor flexibility, i.e. lack of reconfigurability, the reuse of IP etc., can be considered the most severe weakness of analog hardware. On this account, digital calibration schemes are often required for improved performance or yield enhancement, whereas high flexibility/reconfigurability can not be easily achieved. Here, it is discussed whether it is possible to work around these obstacles by using floating-gate transistors (FGTs), and analyze problems associated with the practical implementation. FGT technology is attractive because it is electrically programmable and also features a charge-based built-in non-volatile memory. Apart from being ideal for canceling the circuit non-idealities due to process variations, the FGTs can also be used as computational or adaptive elements in analog circuits. The nominal gate oxide thickness in the deep sub-micron (DSM) processes is too thin to support robust charge retention and consequently the FGT becomes leaky. In principle, non-leaky FGTs can be implemented in a scaled down process without any special masks by using “double”-oxide transistors intended for providing devices that operate with higher supply voltages than general purpose devices. However, in practice the technology scaling poses several challenges which are addressed in this thesis. To provide a sufficiently wide-ranging survey, six prototype chips with varying complexity were implemented in four different DSM process nodes and investigated from this perspective. The focus is on non-leaky FGTs, but the presented autozeroing floating-gate amplifier (AFGA) demonstrates that leaky FGTs may also find a use. The simplest test structures contain only a few transistors, whereas the most complex experimental chip is an implementation of a spiking neural network (SNN) which comprises thousands of active and passive devices. More precisely, it is a fully connected (256 FGT synapses) two-layer spiking neural network (SNN), where the adaptive properties of FGT are taken advantage of. A compact realization of Spike Timing Dependent Plasticity (STDP) within the SNN is one of the key contributions of this thesis. Finally, the considerations in this thesis extend beyond CMOS to emerging nanodevices. To this end, one promising emerging nanoscale circuit element - memristor - is reviewed and its applicability for analog processing is considered. Furthermore, it is discussed how the FGT technology can be used to prototype computation paradigms compatible with these emerging two-terminal nanoscale devices in a mature and widely available CMOS technology.Siirretty Doriast

Advanced Automation for Space Missions

Author: Freitas R. A., Jr.
Gilbreath W. P.
Publication venue
Publication date
Field of study

The feasibility of using machine intelligence, including automation and robotics, in future space missions was studied

Literature review of the remote sensing of natural resources

Author: Fears C. B.
Inglis M. H.
Publication venue
Publication date
Field of study

Abstracts of 596 documents related to remote sensors or the remote sensing of natural resources by satellite, aircraft, or ground-based stations are presented. Topics covered include general theory, geology and hydrology, agriculture and forestry, marine sciences, urban land use, and instrumentation. Recent documents not yet cited in any of the seven information sources used for the compilation are summarized. An author/key word index is provided

Quarterly literature review of the remote sensing of natural resources

Author: Fears C. B.
Inglis M. H.
Publication venue
Publication date
Field of study

The Technology Application Center reviewed abstracted literature sources, and selected document data and data gathering techniques which were performed or obtained remotely from space, aircraft or groundbased stations. All of the documentation was related to remote sensing sensors or the remote sensing of the natural resources. Sensors were primarily those operating within the 10 to the minus 8 power to 1 meter wavelength band. Included are NASA Tech Briefs, ARAC Industrial Applications Reports, U.S. Navy Technical Reports, U.S. Patent reports, and other technical articles and reports