Improving Programming Support for Hardware Accelerators Through Automata Processing Abstractions
The adoption of hardware accelerators, such as Field-Programmable Gate Arrays,
into general-purpose computation pipelines continues to rise, driven by recent
trends in data collection and analysis as well as pressure from challenging
physical design constraints in hardware. The architectural designs of many of
these accelerators stand in stark contrast to the traditional von Neumann model
of CPUs. Consequently, existing programming languages, maintenance tools, and
techniques are not directly applicable to these devices, meaning that additional
architectural knowledge is required for effective programming and configuration.
Current programming models and techniques are akin to assembly-level programming
on a CPU, placing a significant burden on developers tasked with using these
architectures. Because programming is currently performed at such low levels of
abstraction, the software development process is tedious and challenging, which
hinders the adoption of hardware accelerators.
This dissertation explores the thesis that theoretical finite automata provide a
suitable abstraction for bridging the gap between high-level programming models
and maintenance tools familiar to developers and the low-level hardware
representations that enable high-performance execution on hardware accelerators.
We adopt a principled hardware/software co-design methodology to develop a
programming model that provides the key properties we observe to be necessary
for success: performance and scalability, ease of use, expressive power, and
legacy support.
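To make the abstraction concrete, the sketch below simulates a nondeterministic finite automaton (NFA) over an input stream, the basic computation that automata-based accelerators perform in hardware. The machine, states, and input are illustrative assumptions, not artifacts from the dissertation.

```python
# Minimal NFA simulation illustrating automata as a programming abstraction.
# The machine and input below are illustrative examples only.

def run_nfa(transitions, start, accepting, stream):
    """Simulate an NFA, reporting every input offset where a match ends.

    transitions: dict mapping (state, symbol) -> set of next states
    start: set of initial states
    accepting: set of reporting (accepting) states
    """
    active = set(start)
    reports = []
    for offset, symbol in enumerate(stream):
        nxt = set()
        for state in active:
            nxt |= transitions.get((state, symbol), set())
        active = nxt | set(start)  # restart matching at every offset
        if active & accepting:
            reports.append(offset)
    return reports

# Recognize the pattern "ab" anywhere in the stream.
trans = {("q0", "a"): {"q1"}, ("q1", "b"): {"q2"}}
print(run_nfa(trans, {"q0"}, {"q2"}, "aabab"))  # [2, 4]
```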
First, we develop a framework that allows developers to port existing, legacy
code to run on hardware accelerators by leveraging automata learning algorithms
in a novel composition with software verification, string solvers, and
high-performance automata architectures. Next, we design a domain-specific
programming language to aid programmers writing pattern-searching algorithms and
develop compilation algorithms to produce finite automata, which support
efficient execution on a wide variety of processing architectures. Then, we
develop an interactive debugger for our new language, which allows developers to
accurately identify the locations of bugs in software while maintaining support
for high-throughput data processing. Finally, we develop two new
automata-derived accelerator architectures to support additional applications,
including the detection of security attacks and the parsing of recursive and
tree-structured data. Using empirical studies, logical reasoning, and
statistical analyses, we demonstrate that our prototype artifacts scale to
real-world applications, maintain manageable overheads, and support developers'
use of hardware accelerators. Collectively, the research efforts detailed in
this dissertation help ease the adoption and use of hardware accelerators for
data analysis applications, while supporting high-performance computation.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/155224/1/angstadt_1.pd
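As a rough illustration of the compile-to-automata idea in the abstract above, the sketch below lowers a tiny pattern language (literals with "|" alternation) into the NFA transition-table form used by run_nfa in the earlier sketch. The language, state-naming scheme, and example are assumptions for illustration, not the dissertation's actual domain-specific language.

```python
# Toy compiler from a tiny pattern language (literals and '|' alternation)
# to the NFA transition-table form used by run_nfa above. A sketch of the
# compile-to-automata idea, not the dissertation's actual language.

def compile_pattern(pattern):
    transitions, accepting = {}, set()
    start = {"s"}
    for i, branch in enumerate(pattern.split("|")):
        prev = "s"
        for j, ch in enumerate(branch):
            state = f"b{i}_{j}"
            transitions.setdefault((prev, ch), set()).add(state)
            prev = state
        accepting.add(prev)  # last state of each branch reports a match
    return transitions, start, accepting

trans, start, accept = compile_pattern("cat|car")
print(run_nfa(trans, start, accept, "the cart"))  # [6]: "car" ends at offset 6
```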
Techniques For Accelerating Large-Scale Automata Processing
The big-data era has brought new challenges to computer architectures due to large-scale computation and data. The problem becomes critical in domains where the computation is also irregular; this dissertation focuses on one such domain, automata processing. Automata are widely used in applications from different domains such as network intrusion detection, machine learning, and parsing. Large-scale automata processing is challenging for traditional von Neumann architectures, and many accelerator prototypes have been proposed, Micron's Automata Processor (AP) being one example. However, as a spatial architecture, the AP cannot handle large automata programs without repeated reconfiguration and re-execution. We found that a large number of automata states are never enabled during execution but are still configured on the AP chips, leading to underutilization. To address this issue, we propose a lightweight offline profiling technique to predict the never-enabled states and keep them off the AP. Furthermore, we develop SparseAP, a new execution mode for the AP that handles mispredictions efficiently. Our software and hardware co-optimization obtains a 2.1x speedup over the baseline AP execution across 26 applications.
Since the AP is not publicly available, we also aim to reduce the performance gap between a general-purpose accelerator, the Graphics Processing Unit (GPU), and the AP. We identify excessive data movement in the GPU memory hierarchy and propose optimization techniques to reduce it. Although these techniques significantly alleviate the memory-related bottlenecks, they have a side effect: work is assigned to cores statically, leading to poor compute utilization as GPU cores are wasted on idle automata states. Therefore, we propose a new dynamic scheme that effectively balances compute utilization with reduced memory usage. Our combined optimizations provide a significant improvement over the previous state-of-the-art GPU implementations of automata. Moreover, they enable current GPUs to outperform the AP across several applications while performing within an order of magnitude on the rest.
To make automata processing on GPUs more general across tasks with different amounts of parallelism, we propose AsyncAP, a lightweight approach that scales with the input length. Threads run asynchronously in AsyncAP, alleviating the bottleneck of thread-block synchronization. Evaluation and detailed analysis demonstrate that AsyncAP achieves significant speedups, or at least comparable performance, across various scenarios for most applications. Future work aims to design automatic ways to generate optimizations and mappings between automata and computation resources for different GPUs, and to broaden the scope of this dissertation to domains such as graph computing.
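One way to picture the offline profiling step: simulate the automaton over representative sample inputs, record every state that ever becomes enabled, and treat the remainder as candidates to leave off the chip. The sketch below is a simplified illustration under that assumption; the transition-table format, states, and inputs are invented for the example.

```python
# Simplified sketch of offline profiling for never-enabled states: simulate
# the automaton over sample inputs and flag states that never become active.

def profile_enabled_states(transitions, start, samples):
    enabled = set(start)
    for stream in samples:
        active = set(start)
        for symbol in stream:
            nxt = set()
            for state in active:
                nxt |= transitions.get((state, symbol), set())
            active = nxt | set(start)  # matching may restart at any offset
            enabled |= active
    return enabled

all_states = {"q0", "q1", "q2", "q3"}
trans = {("q0", "a"): {"q1"}, ("q1", "b"): {"q2"}, ("q2", "z"): {"q3"}}
enabled = profile_enabled_states(trans, {"q0"}, ["abab", "aab"])
never_enabled = all_states - enabled
print(never_enabled)  # {'q3'}: a candidate to keep off the accelerator
```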
Mining Biomedical Information from Scientific Literature
Joint doctoral programme MAP-i
The rapid evolution and proliferation of a world-wide computerized network,
the Internet, resulted in an overwhelming and constantly growing
amount of publicly available data and information, a fact that was also verified
in biomedicine. However, the lack of structure of textual data inhibits
its direct processing by computational solutions. Information extraction is
the text mining task that aims to automatically collect information
from unstructured text data sources. The goal of the work described in this
thesis was to build innovative solutions for biomedical information extraction
from scientific literature, through the development of simple software
artifacts for developers and biocurators, delivering more accurate, usable
and faster results. We started by tackling named entity recognition - a crucial
initial task - with the development of Gimli, a machine-learning-based
solution that follows an incremental approach to optimize extracted linguistic
characteristics for each concept type. Afterwards, Totum was built to
harmonize concept names provided by heterogeneous systems, delivering a
robust solution with improved performance results. This approach takes
advantage of heterogeneous corpora to deliver cross-corpus harmonization
that is not constrained to the characteristics of a single corpus. Since previous solutions
do not provide links to knowledge bases, Neji was built to streamline the
development of complex and custom solutions for biomedical concept name
recognition and normalization. This was achieved through a modular and
flexible framework focused on speed and performance, integrating a large
amount of processing modules optimized for the biomedical domain. To
offer on-demand heterogeneous biomedical concept identification, we developed
BeCAS, a web application, service and widget. We also tackled relation
mining by developing TrigNER, a machine-learning-based solution for
biomedical event trigger recognition, which applies an automatic algorithm
to obtain the best linguistic features and model parameters for each event
type. Finally, in order to assist biocurators, Egas was developed to support
rapid, interactive and real-time collaborative curation of biomedical documents,
through manual and automatic in-line annotation of concepts and
relations. Overall, the research work presented in this thesis contributed
to a more accurate update of current biomedical knowledge bases, towards
improved hypothesis generation and knowledge discovery.
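As a rough sketch of what feature-engineered, machine-learning-based NER looks like in practice, in the spirit of systems such as Gimli, the function below extracts per-token features that a sequence model would consume. The feature set is an illustrative assumption, not Gimli's actual configuration.

```python
# Illustrative token-level features for machine-learning-based biomedical NER.
# The feature set is an assumption for illustration only.

import re

def token_features(tokens, i):
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "suffix3": tok[-3:],
        "has_digit": any(c.isdigit() for c in tok),
        "has_dash": "-" in tok,
        "is_upper": tok.isupper(),
        # Word shape: digits -> 0, lowercase -> x, uppercase -> X.
        "shape": re.sub(r"[A-Z]", "X",
                        re.sub(r"[a-z]", "x",
                               re.sub(r"\d", "0", tok))),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
    }

tokens = "BRCA1 mutations increase cancer risk".split()
print(token_features(tokens, 0)["shape"])  # XXXX0
```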
Automatic Generation of Models of Microarchitectures
Detailed microarchitectural models are necessary to predict, explain, or optimize the performance of software running on modern microprocessors. Building such models often requires significant manual effort, as the documentation provided by hardware manufacturers is typically not precise enough. The goal of this thesis is to develop techniques for generating microarchitectural models automatically. In the first part, we focus on recent x86 microarchitectures. We implement a tool to accurately evaluate small microbenchmarks using hardware performance counters. We then describe techniques to automatically generate microbenchmarks for measuring the performance of individual instructions and for characterizing cache architectures. We apply our implementations to more than a dozen different microarchitectures. In the second part of the thesis, we study more general techniques to obtain models of hardware components. In particular, we propose the concept of gray-box learning, and we develop a learning algorithm for Mealy machines that exploits prior knowledge about the system to be learned. Finally, we show how this algorithm can be adapted to minimize incompletely specified Mealy machines, a well-known NP-complete problem. Our implementation outperforms existing exact minimization techniques by several orders of magnitude on a number of hard benchmarks; it is even competitive with state-of-the-art heuristic approaches.
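Learning algorithms for Mealy machines, including the gray-box variant described above, typically interact with the system under learning through output queries. The toy machine below illustrates that interface; the parity example is an assumption made for illustration, not a model from the thesis.

```python
# Toy Mealy machine, the kind of model such a learning algorithm infers.

class Mealy:
    def __init__(self, trans, out, start):
        self.trans, self.out, self.start = trans, out, start

    def query(self, word):
        """Output query: run the machine on `word`, return the output word."""
        state, outputs = self.start, []
        for sym in word:
            outputs.append(self.out[(state, sym)])
            state = self.trans[(state, sym)]
        return outputs

# Emits the parity ("even"/"odd") of the 1s seen so far.
trans = {("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "o", ("o", "1"): "e"}
out = {("e", "0"): "even", ("e", "1"): "odd",
       ("o", "0"): "odd", ("o", "1"): "even"}
m = Mealy(trans, out, "e")
print(m.query("101"))  # ['odd', 'odd', 'even']
```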
Fundamental Approaches to Software Engineering
This open access book constitutes the proceedings of the 23rd International Conference on Fundamental Approaches to Software Engineering, FASE 2020, which took place in Dublin, Ireland, in April 2020, and was held as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020. The 23 full papers, 1 tool paper, and 6 testing competition papers presented in this volume were carefully reviewed and selected from 81 submissions. The papers cover topics such as requirements engineering, software architectures, specification, software quality, validation, verification of functional and non-functional properties, model-driven development and model transformation, software processes, security, and software evolution.
Pseudo-contractions as Gentle Repairs
Updating a knowledge base to remove an unwanted consequence is a challenging task. Some of the original sentences must be either deleted or weakened in such a way that the sentence to be removed is no longer entailed by the resulting set. On the other hand, it is desirable that the existing knowledge be preserved as much as possible, minimising the loss of information. Several approaches to this problem can be found in the literature. In particular, when the knowledge is represented by an ontology, two different families of frameworks have been developed over the past decades with numerous ideas in common but with little interaction between the communities: applications of AGM-like Belief Change and justification-based Ontology Repair. In this paper, we investigate the relationship between pseudo-contraction operations and gentle repairs. Both aim to avoid the complete deletion of sentences when replacing them with weaker versions is enough to prevent the entailment of the unwanted formula. We show the correspondence between concepts on both sides and investigate under which conditions they are equivalent. Furthermore, we propose a unified notation for the two approaches, which might contribute to the integration of the two areas.
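A minimal propositional sketch of the "weaken rather than delete" idea: the knowledge base below entails an unwanted formula r, and replacing the sentence p ∧ q with the weaker p breaks the entailment while preserving more of the original knowledge than outright deletion would. The knowledge base and the choice of weakening are illustrative assumptions.

```python
# Toy propositional illustration of a "gentle repair": weaken a sentence
# instead of deleting it, just enough to break an unwanted entailment.

from itertools import product

def entails(kb, goal, atoms=("p", "q", "r")):
    """Brute-force propositional entailment over the given atoms."""
    for vals in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, vals))
        if all(f(v) for f in kb) and not goal(v):
            return False  # found a countermodel
    return True

p_and_q = lambda v: v["p"] and v["q"]
q_impl_r = lambda v: (not v["q"]) or v["r"]
r = lambda v: v["r"]

print(entails([p_and_q, q_impl_r], r))   # True: r is entailed
# Gentle repair: weaken p ∧ q to p rather than deleting it outright.
just_p = lambda v: v["p"]
print(entails([just_p, q_impl_r], r))    # False: entailment broken, p kept
```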