2,988 research outputs found

    Applying Formal Methods to Networking: Theory, Techniques and Applications

    Full text link
    Despite its great importance, modern network infrastructure is remarkable for the lack of rigor in its engineering. The Internet which began as a research experiment was never designed to handle the users and applications it hosts today. The lack of formalization of the Internet architecture meant limited abstractions and modularity, especially for the control and management planes, thus requiring for every new need a new protocol built from scratch. This led to an unwieldy ossified Internet architecture resistant to any attempts at formal verification, and an Internet culture where expediency and pragmatism are favored over formal correctness. Fortunately, recent work in the space of clean slate Internet design---especially, the software defined networking (SDN) paradigm---offers the Internet community another chance to develop the right kind of architecture and abstractions. This has also led to a great resurgence in interest of applying formal methods to specification, verification, and synthesis of networking protocols and applications. In this paper, we present a self-contained tutorial of the formidable amount of work that has been done in formal methods, and present a survey of its applications to networking.Comment: 30 pages, submitted to IEEE Communications Surveys and Tutorial

    Approximations in Learning & Program Analysis

    Get PDF
    In this work we compare and contrast the approximations made in the problems of Data Compression, Program Analysis and Supervised Machine Learning. G\uf6del\u2019s Incompleteness Theorem mandates that any formal system rich enough to include integers will have unprovable truths. Thus non computable problems abound, including, but not limited to, Program Analysis, Data Compression and Machine Learning. Indeed, it can be shown that there are more non-computable functions than computable. Due to non- computability, precise solutions for these problems are not feasible, and only approximate solutions may be computed. Presently, each of these problems of Data Compression, Machine Learning and Program Analysis is studied independently. Each problem has it\u2019s own multitude of abstractions, algorithms and notions of tradeoffs among the various parameters. It would be interesting to have a unified framework, across disciplines, that makes explicit the abstraction specifications and ensuing tradeoffs. Such a framework would promote inter-disciplinary research and develop a unified body of knowledge to tackle non-computable problems. As a small step to that larger goal, we propose an Information Oriented Model of Computation that allows comparing the approximations used in Data Compression, Program Analysis and Machine Learning. To the best of our knowledge, this is the first work to propose a method for systematic comparison of approximations across disciplines. The model describes computation as set reconstruction. Non-computability is then presented as inability to perfectly reconstruct sets. In an effort to compare and contrast the approximations, select algorithms for Data Compression, Machine Learning and Program Analysis are analyzed using our model. We were able to relate the problems of Data Compression, Machine Learning and Program Analysis as specific instances of the general problem of approximate set reconstruction. We demonstrate the use of abstract interpreters in compression schemes. We then compare and contrast the approximations in Program Analysis and Supervised Machine Learning. We demonstrate the use of ordered structures, fixpoint equations and least fixpoint approximation computations, all characteristic of Abstract Interpretation (Program Analysis) in Machine Learning algorithms. We also present the idea that widening, like regression, is an inductive learner. Regression generalizes known states to a hypothesis. Widening generalizes abstract states on a iteration chain to a fixpoint. While Regression usually aims to minimize the total error (sum of false positives and false negatives), Widening aims for soundness and hence errs on the side of false positives to have zero false negatives. We use this duality to derive a generic widening operator from regression on the set of abstract states. The results of the dissertation are the first steps towards a unified approach to approximate computation. Consequently, our preliminary results lead to a lot more interesting questions, some of which we have tried to discuss in the concluding chapter

    Formal Runtime Error Detection During Development in the Automotive Industry

    Full text link
    Modern automotive software is highly complex and consists of millions lines of code. For safety-relevant automotive software, it is recommended to use sound static program analysis to prove the absence of runtime errors. However, the analysis is often perceived as burdensome by developers because it runs for a long time and produces many false alarms. If the analysis is performed on the integrated software system, there is a scalability problem, and the analysis is only possible at a late stage of development. If the analysis is performed on individual modules instead, this is possible at an early stage of development, but the usage context of modules is missing, which leads to too many false alarms. In this case study, we present how automatically inferred contracts add context to module-level analysis. Leveraging these contracts with an off-the-shelf tool for abstract interpretation makes module-level analysis more precise and more scalable. We evaluate this framework quantitatively on industrial case studies from different automotive domains. Additionally, we report on our qualitative experience for the verification of large-scale embedded software projects.Comment: to be published in VMCAI 202

    A Review of Formal Methods applied to Machine Learning

    Full text link
    We review state-of-the-art formal methods applied to the emerging field of the verification of machine learning systems. Formal methods can provide rigorous correctness guarantees on hardware and software systems. Thanks to the availability of mature tools, their use is well established in the industry, and in particular to check safety-critical applications as they undergo a stringent certification process. As machine learning is becoming more popular, machine-learned components are now considered for inclusion in critical systems. This raises the question of their safety and their verification. Yet, established formal methods are limited to classic, i.e. non machine-learned software. Applying formal methods to verify systems that include machine learning has only been considered recently and poses novel challenges in soundness, precision, and scalability. We first recall established formal methods and their current use in an exemplar safety-critical field, avionic software, with a focus on abstract interpretation based techniques as they provide a high level of scalability. This provides a golden standard and sets high expectations for machine learning verification. We then provide a comprehensive and detailed review of the formal methods developed so far for machine learning, highlighting their strengths and limitations. The large majority of them verify trained neural networks and employ either SMT, optimization, or abstract interpretation techniques. We also discuss methods for support vector machines and decision tree ensembles, as well as methods targeting training and data preparation, which are critical but often neglected aspects of machine learning. Finally, we offer perspectives for future research directions towards the formal verification of machine learning systems

    A gentle introduction to formal verification of computer systems by abstract interpretation

    Get PDF
    International audienceWe introduce and illustrate basic notions of abstract interpretation theory and its applications by relying on the readers general scientific culture and basic knowledge of computer programming

    Real-time human ambulation, activity, and physiological monitoring:taxonomy of issues, techniques, applications, challenges and limitations

    Get PDF
    Automated methods of real-time, unobtrusive, human ambulation, activity, and wellness monitoring and data analysis using various algorithmic techniques have been subjects of intense research. The general aim is to devise effective means of addressing the demands of assisted living, rehabilitation, and clinical observation and assessment through sensor-based monitoring. The research studies have resulted in a large amount of literature. This paper presents a holistic articulation of the research studies and offers comprehensive insights along four main axes: distribution of existing studies; monitoring device framework and sensor types; data collection, processing and analysis; and applications, limitations and challenges. The aim is to present a systematic and most complete study of literature in the area in order to identify research gaps and prioritize future research directions

    Partial (In)Completeness in Abstract Interpretation

    Get PDF
    In the abstract interpretation framework, completeness represents an optimal simulation by the abstract operators over the behavior of the concrete operators. This corresponds to an ideal (often rare) feature where there is no loss of information accumulated in abstract computations with respect to the properties encoded by the underlying abstract domains. In this thesis, we deal with the opposite notion of completeness in abstract interpretation, that is, incompleteness, applied to two different contexts: static program analysis and formal languages over the Chomsky's hierarchy. In static program analysis, completeness is a very rare condition to be satisfied in practice and only the straightforward abstractions are complete for all programs, thus, we usually deal with incompleteness. For this reason, we introduce the notion of partial completeness. Partial completeness is a weaker notion of completeness which requires the imprecision of the analysis to be limited. A partially complete abstract interpretation allows some false alarms to be reported, but their number is bounded by a constant. We collect in partial completeness classes all the programs whose abstract interpretations share the same upper bound of imprecision. We then focus on the investigation of the computational limits of the class of partially complete programs with respect to a given abstract domain. Moreover, we show that the class of all partially complete programs is non-recursively enumerable, and its complement is productive whenever we allow an unlimited imprecision in the abstract domain. Finally, we formalize the local partial completeness class within which we require partial completeness only on some specific inputs. We prove that this last class of programs is a recursively enumerable set under a structural hypothesis on the underlying abstract domain, by showing an algorithm capable of proving the local partial completeness of a program with respect to a given abstract domain and an upper bound of imprecision. In formal language theory, we want to study a possible reformulation, by abstract interpretation, of classes of languages in the Chomsky's hierarchy, and, by exploiting the incompleteness of languages abstractions, we want to define separation results between classes of languages. To this end, we do a first step into this direction by studying the relation between indexed languages (recognized by indexed grammars) and context-free languages. Indexed grammars are a generalization of context-free grammars which recognize a proper subset of context-sensitive languages, the so called indexed languages. %The class of languages recognized by indexed grammars is called indexed languages and they correspond to the languages recognized by nested stack automata. For example, indexed grammars can recognize the language anbncnmidngeq1{a^nb^nc^n mid ngeq 1 } which is not context-free, but they cannot recognize (abn)nmidngeq1{ (ab^n)^n mid ngeq 1} which is context-sensitive. Indexed grammars identify a set of languages that are more expressive than context-free languages, while having decidability results that lie in between the ones of context-free and context-sensitive languages. We provide a fixpoint characterization of the languages recognized by an indexed grammar and we study possible ways to abstract, in the abstract interpretation sense, these languages and their grammars into context-free and regular languages. We formalize the separation class between indexed and context-free languages, i.e., all the languages that cannot be generated by a context-free grammar, as an instance of incompleteness of stack elimination abstraction over indexed grammars

    Automating test oracles generation

    Get PDF
    Software systems play a more and more important role in our everyday life. Many relevant human activities nowadays involve the execution of a piece of software. Software has to be reliable to deliver the expected behavior, and assessing the quality of software is of primary importance to reduce the risk of runtime errors. Software testing is the most common quality assessing technique for software. Testing consists in running the system under test on a finite set of inputs, and checking the correctness of the results. Thoroughly testing a software system is expensive and requires a lot of manual work to define test inputs (stimuli used to trigger different software behaviors) and test oracles (the decision procedures checking the correctness of the results). Researchers have addressed the cost of testing by proposing techniques to automatically generate test inputs. While the generation of test inputs is well supported, there is no way to generate cost-effective test oracles: Existing techniques to produce test oracles are either too expensive to be applied in practice, or produce oracles with limited effectiveness that can only identify blatant failures like system crashes. Our intuition is that cost-effective test oracles can be generated using information produced as a byproduct of the normal development activities. The goal of this thesis is to create test oracles that can detect faults leading to semantic and non-trivial errors, and that are characterized by a reasonable generation cost. We propose two ways to generate test oracles, one derives oracles from the software redundancy and the other from the natural language comments that document the source code of software systems. We present a technique that exploits redundant sequences of method calls encoding the software redundancy to automatically generate test oracles named CCOracles. We describe how CCOracles are automatically generated, deployed, and executed. We prove the effectiveness of CCOracles by measuring their fault-finding effectiveness when combined with both automatically generated and hand-written test inputs. We also present Toradocu, a technique that derives executable specifications from Javadoc comments of Java constructors and methods. From such specifications, Toradocu generates test oracles that are then deployed into existing test suites to assess the outputs of given test inputs. We empirically evaluate Toradocu, showing that Toradocu accurately translates Javadoc comments into procedure specifications. We also show that Toradocu oracles effectively identify semantic faults in the SUT. CCOracles and Toradocu oracles stem from independent information sources and are complementary in the sense that they check different aspects of the system undertest
    • 

    corecore