
    Automatic program analysis in a Prolog Intelligent Teaching System


    Improving Programming Support for Hardware Accelerators Through Automata Processing Abstractions

    The adoption of hardware accelerators, such as Field-Programmable Gate Arrays, into general-purpose computation pipelines continues to rise, driven by recent trends in data collection and analysis as well as by pressure from challenging physical design constraints in hardware. The architectural designs of many of these accelerators stand in stark contrast to the traditional von Neumann model of CPUs. Consequently, existing programming languages, maintenance tools, and techniques are not directly applicable to these devices, and additional architectural knowledge is required for effective programming and configuration. Current programming models and techniques are akin to assembly-level programming on a CPU, placing a significant burden on developers tasked with using these architectures. Because programming is performed at such low levels of abstraction, the software development process is tedious and challenging, which hinders the adoption of hardware accelerators. This dissertation explores the thesis that theoretical finite automata provide a suitable abstraction for bridging the gap between the high-level programming models and maintenance tools familiar to developers and the low-level hardware representations that enable high-performance execution on hardware accelerators. We adopt a principled hardware/software co-design methodology to develop a programming model providing the key properties that we observe are necessary for success, namely performance and scalability, ease of use, expressive power, and legacy support. First, we develop a framework that allows developers to port existing, legacy code to run on hardware accelerators by leveraging automata learning algorithms in a novel composition with software verification, string solvers, and high-performance automata architectures. Next, we design a domain-specific programming language to aid programmers writing pattern-searching algorithms, and we develop compilation algorithms to produce finite automata, which support efficient execution on a wide variety of processing architectures. Then, we develop an interactive debugger for our new language, which allows developers to accurately identify the locations of bugs in software while maintaining support for high-throughput data processing. Finally, we develop two new automata-derived accelerator architectures to support additional applications, including the detection of security attacks and the parsing of recursive and tree-structured data. Using empirical studies, logical reasoning, and statistical analyses, we demonstrate that our prototype artifacts scale to real-world applications, maintain manageable overheads, and support developers' use of hardware accelerators. Collectively, the research efforts detailed in this dissertation help ease the adoption and use of hardware accelerators for data analysis applications, while supporting high-performance computation.
    Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    https://deepblue.lib.umich.edu/bitstream/2027.42/155224/1/angstadt_1.pd
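
    As a minimal illustration of the abstraction this dissertation builds on, the sketch below encodes a pattern as a deterministic finite automaton: each input symbol costs exactly one state transition, regardless of pattern complexity, which is the property that maps well onto automata-processing accelerators. The toy example is ours, not the dissertation's language, compiler, or debugger.

```python
# A toy DFA over the alphabet {a, b} that accepts strings containing the
# substring "ab". Each input symbol costs one table lookup, independent of
# pattern complexity -- the property that lets automata map well onto
# spatial hardware accelerators. Illustrative only.

# States: 0 = no progress, 1 = just saw 'a', 2 = saw "ab" (accepting, absorbing)
TRANSITIONS = {
    (0, 'a'): 1, (0, 'b'): 0,
    (1, 'a'): 1, (1, 'b'): 2,
    (2, 'a'): 2, (2, 'b'): 2,
}
ACCEPTING = {2}

def matches(text: str) -> bool:
    state = 0
    for symbol in text:
        state = TRANSITIONS[(state, symbol)]  # one transition per symbol
    return state in ACCEPTING

assert matches("abba") and matches("bbab")
assert not matches("ba") and not matches("aaa")
```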

    Methodologies and Toolflows for the Predictable Design of Reliable and Low-Power NoCs

    There is today an unmistakable need to evolve design methodologies and toolflows for Network-on-Chip (NoC) based embedded systems. In particular, meeting low-power requirements is a more urgent dilemma than ever. Modern circuits feature billions of transistors, and neither power management techniques nor battery capacity can keep pace with the ever-higher integration capability of digital devices. Moreover, power concerns come together with the design issues of modern nanoscale silicon technology. On one hand, system failure rates are expected to increase exponentially at every technology node if integrated-circuit wear-out failure mechanisms are not compensated for; however, error detection and/or correction mechanisms have a non-negligible impact on network power. On the other hand, to meet stringent time-to-market deadlines, the design cycle of such a distributed and heterogeneous architecture must not be prolonged by unnecessary design iterations. Overall, there is a clear need to better discriminate among reliability strategies and interconnect topology solutions upfront, by ranking designs on power metrics. In this thesis, we tackle this challenge by proposing power-aware design technologies. Finally, we consider the most aggressive and disruptive methodology for embedded systems with ultra-low-power constraints: migrating the basic NoC building blocks to an asynchronous (or clockless) design style. We address this challenge by delivering a standard-cell design methodology and mainstream CAD toolflows, partially relaxing the requirement that asynchronous blocks be used only as hard macros.
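
    To make the reliability/power tension concrete, the sketch below shows a generic single-error-correcting Hamming(7,4) code: protecting every 4 data bits costs 3 extra bits per link transfer plus encode/decode logic, precisely the kind of overhead that a power-based ranking of reliability strategies must weigh. This is a textbook code chosen for illustration, not the specific error-protection scheme studied in the thesis.

```python
# Hamming(7,4): 4 data bits protected by 3 parity bits, correcting any
# single-bit error on the link at the cost of ~75% bit overhead plus logic.

def hamming74_encode(d):              # d: list of 4 data bits
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4                 # covers codeword positions 1,3,5,7
    p2 = d1 ^ d3 ^ d4                 # covers codeword positions 2,3,6,7
    p3 = d2 ^ d3 ^ d4                 # covers codeword positions 4,5,6,7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):              # c: list of 7 received bits
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based position of a single-bit error
    if syndrome:
        c = c[:]
        c[syndrome - 1] ^= 1          # correct the flipped bit
    return [c[2], c[4], c[5], c[6]]   # recovered data bits

word = [1, 0, 1, 1]
sent = hamming74_encode(word)
sent[4] ^= 1                          # inject a single-bit link error
assert hamming74_decode(sent) == word
```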

    Towards understanding the challenges faced by machine learning software developers and enabling automated solutions

    Modern software systems increasingly include machine learning (ML) as an integral component. However, we do not yet understand the difficulties faced by software developers when learning about ML libraries and using them within their systems. To fill that gap, this thesis reports on a detailed (manual) examination of 3,243 highly-rated Q&A posts related to ten ML libraries (Tensorflow, Keras, scikit-learn, Weka, Caffe, Theano, MLlib, Torch, Mahout, and H2O) on Stack Overflow, a popular online technical Q&A forum. Our findings reveal the urgent need for software engineering (SE) research in this area. The second part of the thesis focuses on the characteristics of Deep Neural Network (DNN) bugs. We study 2,716 high-quality posts from Stack Overflow and 500 bug-fix commits from GitHub about five popular deep learning libraries (Caffe, Keras, Tensorflow, Theano, and Torch) to understand the types of bugs, their root causes and impacts, the bug-prone stages of the deep learning pipeline, and whether common antipatterns appear in this buggy software. Our findings imply that repairing software that uses DNNs is one such unmistakable SE need where automated tools could be beneficial; however, we do not yet fully understand the challenges of repairing DNNs or the patterns used when repairing them manually. The third part of this thesis therefore presents a comprehensive study of bug-fix patterns to address these questions. We studied 415 repairs from Stack Overflow and 555 repairs from GitHub for the same five deep learning libraries to understand the challenges in repair and the bug-repair patterns. Our key findings reveal that DNN bug-fix patterns are distinctive compared to traditional bug-fix patterns, and that the most common patterns are fixing data dimensions and neural network connectivity. Finally, we propose an automatic technique to detect misuses of ML Application Programming Interfaces (APIs). We started with an empirical study to understand ML API misuses; it shows that ML API misuse is prevalent and distinct from non-ML API misuse. Inspired by these findings, we contribute Amimla (Api Misuse In Machine Learning Apis), an approach and tool for ML API misuse detection. Amimla relies on several technical innovations. First, we propose an abstract representation of ML pipelines for use in misuse detection. Second, we propose an abstract representation of neural networks for deep-learning-related APIs. Third, we develop a representation strategy for constraints on ML APIs. Finally, we develop a misuse detection strategy for both single- and multi-API misuses. Our experimental evaluation shows that Amimla achieves a high average accuracy of ∼80% on two benchmarks of misuses from Stack Overflow and GitHub.
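
    As a concrete illustration of the kind of API-usage constraint such a detector could encode, the sketch below shows a well-known scikit-learn misuse (fitting a preprocessing step on data that includes the test rows, leaking test statistics into training) and its repair. The example and the constraint are our own assumptions for exposition; Amimla's actual representations and detection rules are not shown here.

```python
# A hypothetical ML API-usage constraint: preprocessing statistics must be
# fitted on training data only, i.e. fit() must come after train_test_split().
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.rand(100, 4)
y = (X.sum(axis=1) > 2).astype(int)

# Misuse: the scaler sees the test rows, leaking their statistics into
# the training data (an ordering violation between fit() and the split).
scaler = StandardScaler().fit(X)
X_leaky = scaler.transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_leaky, y)

# Correct usage: split first, then fit the scaler on the training split only.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
```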

    Protecting Systems From Exploits Using Language-Theoretic Security

    Any computer program processing input from the user or network must validate that input. Input-handling vulnerabilities occur when the software component responsible for filtering malicious input---the parser---does not perform validation adequately. Consequently, parsers are among the most targeted components, since they defend the rest of the program from malicious input. This thesis adopts the Language-Theoretic Security (LangSec) principle to understand what tools and research are needed to prevent exploits that target parsers. LangSec proposes specifying the syntactic structure of the input format as a formal grammar; we then build a recognizer for this grammar to validate any input before the rest of the program acts on it. To ensure that these recognizers faithfully represent the data format, programmers often rely on parser generator or parser combinator tools to build them. This thesis advances several sub-fields of LangSec by proposing new techniques to find bugs in implementations, novel categorizations of vulnerabilities, and new parsing algorithms and tools that handle practical data formats. To this end, the thesis comprises five parts that tackle various tenets of LangSec. First, I categorize input-handling vulnerabilities and exploits using two frameworks: the mismorphisms framework, which helps us reason about the root causes leading to various vulnerabilities, and a categorization framework built from LangSec anti-patterns, such as parser differentials and insufficient input validation. We then built a catalog of more than 30 popular vulnerabilities to demonstrate these frameworks. Second, I built parsers for various Internet of Things and power grid network protocols and for the iccMAX file format using parser combinator libraries. The power grid protocol parsers were deployed and tested on power grid substation networks as an intrusion detection tool, and the iccMAX parser led to several corrections and modifications to the iccMAX specifications and reference implementations. Third, I present SPARTA, a novel tool that generates Rust code to type-check Portable Document Format (PDF) files. The type checker strictly enforces the constraints in the PDF specification to find deviations, and it has contributed to at least four significant clarifications and corrections to the PDF 2.0 specification and to various open-source PDF tools. In addition, we built a practical tool, PDFFixer, to dynamically patch type errors in PDF files. Fourth, I present ParseSmith, a tool for building verified parsers for real-world data formats; most available parsing tools either cannot handle practical formats or have not been verified for correctness. I built a verified parsing tool in Dafny that combines ideas from attribute grammars, data-dependent grammars, and parsing expression grammars to tackle constructs commonly seen in network formats, and I prove that our parsers run in linear time and always terminate for well-formed grammars. Finally, I provide the first systematic comparison of various data description languages (DDLs) and their parser generation tools. DDLs are used to describe and parse commonly used data formats, such as image formats. I conducted an expert-elicitation qualitative study to derive metrics for comparing the DDLs, and I also systematically compare them based on the sample data descriptions shipped with each DDL---checking for correctness and resilience.
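
    As a minimal illustration of the LangSec principle underlying all of this work, the sketch below specifies a toy input format as a grammar (a comma-separated list of decimal integers) and builds a recognizer that accepts or rejects raw input before the rest of the program acts on it. The grammar and recognizer are invented for exposition; this is not SPARTA, ParseSmith, or any parser from the thesis.

```python
# Toy LangSec-style recognizer for the grammar:
#   list    := integer ("," integer)*
#   integer := digit+
# The whole input must match; anything else is rejected up front.

def recognize_int_list(data: str) -> bool:
    pos = 0

    def integer() -> bool:
        nonlocal pos
        start = pos
        while pos < len(data) and data[pos].isdigit():
            pos += 1
        return pos > start            # at least one digit consumed

    if not integer():
        return False
    while pos < len(data) and data[pos] == ',':
        pos += 1
        if not integer():
            return False
    return pos == len(data)          # reject trailing garbage

assert recognize_int_list("1,23,456")
assert not recognize_int_list("1,,2") and not recognize_int_list("1,2;")
```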

    Developing Secure Software With C And C++: A Different Approach

    Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2005. As networked computing penetrates daily life, it plays a role at every level, from everyday tasks to the automation of government systems, and the security of these systems has become critical. For computing systems to be trustworthy, each and every one of their components must be secure, and software is perhaps the most important of those components. Every phase of the software lifecycle must be designed to result in a secure product. This thesis examines the software lifecycle from beginning to end and presents the new ideas it brings to each phase: what to pay attention to at each stage, which design alternatives exist, which of several methods is preferable, and which tools can be used. Complementary topics are included or referred to where relevant, so the thesis can serve as a reference for projects in different phases, such as implementation or maintenance. The template used is the waterfall lifecycle model, frequently used in software engineering, which divides a project into requirements analysis, design, implementation, verification, and maintenance. As new generations of programming languages emerge, the adoption of low-level languages such as C/C++ and assembly by new students is decreasing. As the resulting shortage of experienced staff takes hold, severe vulnerabilities appear in exactly the environments where developing secure software is already known to be hard; that the majority of the world's code base is still written in these languages makes the situation even more critical. Although most of the subjects in this thesis are language independent, C/C++ and assembly problems receive special attention for the reasons just mentioned. In conclusion, the thesis shows that security countermeasures must be taken in all phases of the software lifecycle to achieve a high level of security throughout the application, and it proposes new countermeasures, not previously applied in this context, for many of those phases.

    Space station data system analysis/architecture study. Task 2: Options development, DR-5. Volume 3: Programmatic options

    Get PDF
    Task 2 in the Space Station Data System (SSDS) Analysis/Architecture Study is the development of an information base that will support the conduct of trade studies and provide sufficient data for design/programmatic decisions. This volume identifies the preferred options in the programmatic category and characterizes them with respect to performance attributes, constraints, costs, and risks. The programmatic category covers the methods used to administer and manage the development, operation, and maintenance of the SSDS. The specific areas discussed include standardization/commonality; systems management; and systems development, including hardware procurement, software development, and system integration, test, and verification.