Verifying And Interpreting Neural Networks using Finite Automata
Verifying properties and interpreting the behaviour of deep neural networks
(DNNs) is an important task, given their ubiquitous use in applications,
including safety-critical ones, and their black-box nature. We propose an
automata-theoretic approach to tackling problems arising in DNN analysis. We show
that the input-output behaviour of a DNN can be captured precisely by a
(special) weak Büchi automaton of exponential size. We show how these automata can be
used to address common verification and interpretation tasks such as adversarial
robustness and minimum sufficient reasons. We report on a proof-of-concept
implementation that translates DNNs to automata on finite words for better
efficiency, at the cost of losing precision in the analysis.
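To make the reduction to automata concrete, the following Python sketch shows a generic automata-theoretic verification step on finite words; it is not the paper's DNN-to-automaton construction. A property holds when the product of a "system" automaton with a "bad behaviour" automaton accepts no word, and any accepted word is a counterexample. The two DFAs are toy stand-ins rather than automata derived from a network, and the names `DFA` and `find_counterexample` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, FrozenSet, Hashable, Optional, Tuple


@dataclass
class DFA:
    """Deterministic finite automaton with a total transition function (hypothetical helper)."""
    alphabet: Tuple[str, ...]
    delta: Dict[Tuple[Hashable, str], Hashable]
    start: Hashable
    accepting: FrozenSet[Hashable]


def find_counterexample(system: DFA, bad: DFA) -> Optional[str]:
    """BFS over the on-the-fly product of `system` and `bad`.

    Returns a shortest word accepted by both automata (a witness that the
    property fails), or None if L(system) ∩ L(bad) is empty (the property holds).
    """
    start = (system.start, bad.start)
    seen = {start}
    queue = deque([(start, "")])
    while queue:
        (p, q), word = queue.popleft()
        if p in system.accepting and q in bad.accepting:
            return word
        for sym in system.alphabet:
            nxt = (system.delta[(p, sym)], bad.delta[(q, sym)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + sym))
    return None


# Toy stand-ins (assumptions, not derived from any DNN):
# "system" accepts words with an even number of 1s.
system = DFA(("0", "1"),
             {("even", "0"): "even", ("even", "1"): "odd",
              ("odd", "0"): "odd", ("odd", "1"): "even"},
             "even", frozenset({"even"}))
# "bad" behaviour 1: the word ends in 1.
ends_in_1 = DFA(("0", "1"),
                {("n", "0"): "n", ("n", "1"): "y", ("y", "0"): "n", ("y", "1"): "y"},
                "n", frozenset({"y"}))
# "bad" behaviour 2: an odd number of 1s (disjoint from "system").
odd_ones = DFA(system.alphabet, system.delta, "even", frozenset({"odd"}))

print(find_counterexample(system, ends_in_1))  # a shortest witness
print(find_counterexample(system, odd_ones))   # None: property holds
```

Exploring the product breadth-first yields a shortest witness, and building it on the fly avoids materialising every state pair up front, which matters when the automata involved can be exponentially large.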
RML: Runtime Monitoring Language
Runtime verification is a relatively new software verification technique that aims to prove the correctness of a specific run of a program, rather than statically verifying the code. The program is instrumented to collect all the relevant information, and the resulting trace of events is inspected by a monitor that checks its compliance with a specification of the expected properties of the system under scrutiny. Many languages exist for formally expressing the expected behavior of a system, with different design choices and degrees of expressivity.
This thesis presents RML, a specification language designed for runtime verification, with the goal of being completely modular and independent of both the instrumentation and the kind of system being monitored. RML is highly expressive and allows one to express complex, parametric, non-context-free properties concisely. RML is compiled down to TC, a lower-level calculus, which is fully formalized with a deterministic, rewriting-based semantics.
To evaluate the approach, an open-source implementation has been developed, and several examples with Node.js programs have been tested. Benchmarks show that the monitors automatically generated from RML specifications can verify complex properties effectively and efficiently.
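As an illustration of what a monitor does with a trace, here is a hand-written Python monitor for a parametric property. It does not use RML syntax or the TC calculus, and the event names `open`/`close` and the helper names are hypothetical. The property: every resource must be opened before it is closed, must not be opened twice while open, and must be closed by the end of the trace. Each parameter value gets its own piece of state, which is the essence of a parametric property.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    name: str  # hypothetical event names: "open" or "close"
    arg: str   # the parameter, e.g. a resource identifier


def monitor(trace: Iterable[Event]) -> tuple[bool, str]:
    """Check the parametric property over a finite trace; return (verdict, reason)."""
    open_now: set[str] = set()  # per-parameter state: resources currently open
    for i, ev in enumerate(trace):
        if ev.name == "open":
            if ev.arg in open_now:
                return False, f"event {i}: {ev.arg} opened twice"
            open_now.add(ev.arg)
        elif ev.name == "close":
            if ev.arg not in open_now:
                return False, f"event {i}: {ev.arg} closed but not open"
            open_now.remove(ev.arg)
        # other events are irrelevant to this property and are ignored
    if open_now:
        return False, f"end of trace: still open {sorted(open_now)}"
    return True, "ok"


print(monitor([Event("open", "a"), Event("open", "b"),
               Event("close", "a"), Event("close", "b")]))
```

Unlike RML, whose semantics is defined by rewriting TC terms, this sketch keeps explicit mutable state, and the property it checks is a simple one rather than a non-context-free one.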
Survey on Instruction Selection: An Extensive and Modern Literature Review
Instruction selection is one of three optimisation problems involved in the
code generator backend of a compiler. The instruction selector is responsible
for transforming an input program from its target-independent representation
into a target-specific form by making the best use of the available machine
instructions. Hence instruction selection is a crucial part of efficient code
generation.
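One classic formulation of instruction selection is tree covering: the IR tree is tiled with patterns that each correspond to a machine instruction, and dynamic programming picks the cheapest tiling. The Python sketch below assumes a hypothetical instruction set (`li`, `add`, `addi`, `mul`, `ld`, `ld_off`) with made-up costs and a tuple-based IR; it illustrates only one of the many approaches the survey covers.

```python
from functools import lru_cache

# IR trees are nested tuples (op, *children); leaves are ("REG", name) or ("CONST", value).
# Hypothetical rules: (instruction, pattern, cost). In a pattern, "reg" matches any
# subtree that is computed into a register; ("CONST",) matches any constant leaf.
RULES = [
    ("li",     ("CONST",),                           1),
    ("add",    ("ADD", "reg", "reg"),                1),
    ("addi",   ("ADD", "reg", ("CONST",)),           1),
    ("mul",    ("MUL", "reg", "reg"),                3),
    ("ld",     ("LOAD", "reg"),                      2),
    ("ld_off", ("LOAD", ("ADD", "reg", ("CONST",))), 2),
]


def match(pattern, node, holes):
    """Return True if pattern matches node; collect subtrees under "reg" holes."""
    if pattern == "reg":
        holes.append(node)
        return True
    if pattern[0] != node[0]:
        return False
    if pattern == ("CONST",):
        return True
    if len(pattern) != len(node):
        return False
    return all(match(p, n, holes) for p, n in zip(pattern[1:], node[1:]))


@lru_cache(maxsize=None)
def best(node):
    """Minimum-cost cover: (total cost, instructions in emission order)."""
    if node[0] == "REG":
        return 0, ()  # value already in a register: no instruction needed
    candidates = []
    for name, pattern, cost in RULES:
        holes = []
        if match(pattern, node, holes):
            total, code = cost, ()
            for h in holes:  # children are covered optimally first (DP)
                c, sub = best(h)
                total += c
                code += sub
            candidates.append((total, code + (name,)))
    if not candidates:
        raise ValueError(f"no rule covers {node[0]}")
    return min(candidates)


print(best(("LOAD", ("ADD", ("REG", "r1"), ("CONST", 8)))))
```

For `LOAD(ADD(r1, 8))`, the single `ld_off` tile (cost 2) beats `addi` followed by `ld` (cost 3), which is the kind of choice a selector must make to use the available instructions well.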
Despite ongoing research since the late 1960s, the last comprehensive
survey of the field was written more than 30 years ago. As new approaches and
techniques have appeared since its publication, there is a need for a
new, up-to-date review of the current body of literature. This report addresses
that need by performing an extensive review and categorisation of existing
research. The report therefore supersedes and extends the previous surveys, and
also attempts to identify where future research should be directed.
Comment: Major changes:
- Merged the simulation chapter with the macro expansion chapter
- Addressed misunderstandings of several approaches
- Completely rewrote many parts of the chapters; strengthened the discussion of many approaches
- Revised the drawing of all trees and graphs to put the root at the top instead of at the bottom
- Added an appendix listing the approaches in a table
See the document for more information.
Taming Strings in Dynamic Languages - An Abstract Interpretation-based Static Analysis Approach
In recent years, dynamic languages such as JavaScript, Python and PHP have found applications in many fields, thanks to the many features they provide, the agility with which software can be deployed, and the apparent ease of learning them. Strings play a central role in dynamic languages: they can be implicitly converted to values of other types, used to access object properties, or transformed at run time into executable code. In particular, the possibility of dynamically generating code by transforming strings breaks the typical assumption in static program analysis that code is an immutable object, i.e., static. This is because the program's essential data structures, such as the control-flow graph and the system of equations associated with the program under analysis, are themselves dynamically mutating objects. In a sentence: "You can't check the code you don't see". For all these reasons, dynamic languages still pose a big challenge for static program analysis, making it drastically harder and less precise. The goal of this thesis is to tackle the problem of statically analyzing dynamic code by treating code as any other data structure that can be statically analyzed, and by treating the static analyzer as any other function that can be called recursively. Since, in dynamically generated code, the program code can be encoded as strings and then transformed into executable code, we first define a novel string abstraction, with its corresponding abstract semantics, that keeps enough information both to analyze string properties in general and to capture the possible executable strings that may be converted to code. This string abstraction allows us to distill from an abstract string value the executable program it expresses, so that the static analyzer can be called recursively on the synthesized program.
The final result of this thesis is an important first step towards a sound-by-construction abstract interpreter for real-world dynamic string-manipulation languages that also analyzes string-to-code statements, that is, the code that standard static analysis "can't see".
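The idea of distilling executable code from an abstract string value and recursively analyzing it can be sketched in Python as follows. This is a deliberate simplification and not the thesis's abstract domain: strings are abstracted as bounded finite sets (falling back to TOP, "any string", beyond a hypothetical bound `MAX_SIZE`), `ast.parse` stands in for "is this string executable code?", and the recursive "analysis" (the hypothetical `assigned_names`) merely collects the variables that `exec`-style generated code may assign.

```python
import ast
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set

MAX_SIZE = 4  # hypothetical bound: larger sets lose precision and become TOP


@dataclass(frozen=True)
class StrSet:
    """Abstract string: a finite set of possible values, or TOP when values is None."""
    values: Optional[FrozenSet[str]]

    @staticmethod
    def of(*ss: str) -> "StrSet":
        return StrSet(frozenset(ss)).normalize()

    def normalize(self) -> "StrSet":
        if self.values is not None and len(self.values) > MAX_SIZE:
            return TOP
        return self

    def join(self, other: "StrSet") -> "StrSet":  # least upper bound
        if self.values is None or other.values is None:
            return TOP
        return StrSet(self.values | other.values).normalize()

    def concat(self, other: "StrSet") -> "StrSet":  # abstract string concatenation
        if self.values is None or other.values is None:
            return TOP
        return StrSet(frozenset(a + b for a in self.values for b in other.values)).normalize()


TOP = StrSet(None)


def executable_programs(s: StrSet) -> Optional[List[ast.Module]]:
    """Distill the programs an abstract string may denote; None means 'can't see'."""
    if s.values is None:
        return None
    progs = []
    for v in sorted(s.values):
        try:
            progs.append(ast.parse(v))
        except SyntaxError:
            pass  # this value can never run as code (exec would raise)
    return progs


def assigned_names(s: StrSet) -> Optional[Set[str]]:
    """Toy recursive analysis of exec(s): which variables may the code assign?"""
    progs = executable_programs(s)
    if progs is None:
        return None  # unknown: a sound analysis must assume anything
    names = set()
    for p in progs:
        for node in ast.walk(p):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                names.add(node.id)
    return names


code = StrSet.of("x", "y").concat(StrSet.of(" = 1"))
print(assigned_names(code))
```

When the abstract value is TOP the code really cannot be seen, so a sound analysis must assume the generated code may do anything; a more precise string abstraction, such as the one the thesis proposes, aims to keep enough information to reach that point less often.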
Finite state models in information extraction
This dissertation studies the scientific field of information extraction, a subfield of artificial intelligence that combines and uses techniques and results from several areas of computer science. The term "information extraction" is used in two different contexts. When it refers to the scientific field, the acronym IE (from the English "Information Extraction") is used; when it refers to the actual process of extracting information from text, the full term "information extraction" is used.
Besides a survey of the state of the art in IE, the dissertation presents an original approach and method for
information extraction based on finite state transducers. Applying this method during the research produced a database of
microbial phenotypic and genotypic characteristics covering 2412 species and 873 genera,
intended for research in bioinformatics and genetics. The database and the method used to create it
are described in detail in several papers published in journals or presented at international conferences
(Pajić, 2011; Pajić et al., 2011a; Pajić et al., 2011b).
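To illustrate how a finite state transducer can drive extraction, here is a small Python sketch of a deterministic transducer over token classes that recognizes a hypothetical rule "grows at <number> °C" and outputs the number. The rule, the token classes, the `$` copy convention and the function names are assumptions made for this example; the dissertation's actual transducers and extraction rules are not reproduced here.

```python
from typing import Dict, List, Optional, Tuple


def token_class(tok: str) -> str:
    """Map a token to a hypothetical input class of the transducer."""
    t = tok.lower()
    if t in ("grows", "growth"):
        return "GROW"
    if t == "at":
        return "AT"
    if t.isdigit():
        return "NUM"
    if t in ("°c", "c"):
        return "UNIT"
    return "OTHER"


# Transitions: (state, token class) -> (next state, output).
# Output "$" means "copy the input token to the output"; "" emits nothing.
DELTA: Dict[Tuple[int, str], Tuple[int, str]] = {
    (0, "GROW"): (1, ""),
    (1, "AT"):   (2, ""),
    (2, "NUM"):  (3, "$"),
    (3, "UNIT"): (4, ""),
}
FINAL = {4}


def run_from(tokens: List[str], start: int) -> Optional[Tuple[int, str]]:
    """Run the transducer from tokens[start]; return (end index, output) if accepted."""
    state, out, i = 0, [], start
    while i < len(tokens) and (state, token_class(tokens[i])) in DELTA:
        state, action = DELTA[(state, token_class(tokens[i]))]
        if action == "$":
            out.append(tokens[i])
        i += 1
        if state in FINAL:
            return i, "".join(out)
    return None  # no accepting path: partial output is discarded


def extract(text: str) -> List[str]:
    """Apply the transducer at each token position, skipping past accepted matches."""
    tokens = text.split()
    results, i = [], 0
    while i < len(tokens):
        m = run_from(tokens, i)
        if m is None:
            i += 1
        else:
            i, value = m
            results.append(value)
    return results


print(extract("The strain grows at 37 °C and also growth at 42 C was observed"))
```

Applying the transducer at each token position and resuming after every accepted match is one simple way of running a transducer over free text; discarding the output of paths that do not reach a final state keeps the extracted data limited to complete rule matches, in line with the precision argument made for finite state methods.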
Section 1 introduces IE and describes the history and development of its methods.
It then classifies the textual resources from which information is extracted, as well as
the information itself. At the end of Section 1, IE is compared with other related
disciplines of computer science.
Section 2 presents the theoretical foundations of the dissertation: formal language theory and
finite state models (abstract automata), their mutual relationship, and their connection with IE.
The emphasis is on finite state models and the methods based on them. These methods are
more precise than other information extraction methods and are therefore indispensable when the
accuracy of the data extracted from text is of crucial importance. Several IE concepts (the
language of relevant information, the language of extracted information, and extraction rules)
are defined from the perspective of formal language theory. A basic property of the transduction
relation for a given extraction rule is formulated and proved. Finally, the language of information
context is defined and proved to be regular.
LIPIcs, Volume 261, ICALP 2023, Complete Volume