19 research outputs found

    Debugging Native Extensions of Dynamic Languages

    Full text link
    Many dynamic programming languages such as Ruby and Python enable developers to use so called native extensions, code implemented in typically statically compiled languages like C and C++. However, debuggers for these dynamic languages usually lack support for also debugging these native extensions. GraalVM can execute programs implemented in various dynamic programming languages and, by using the LLVM-IR interpreter Sulong, also their native extensions. We added support for source-level debugging to Sulong based on GraalVM's debugging framework by associating run-time debug information from the LLVM-IR level to the original program code. As a result, developers can now use GraalVM to debug source code written in multiple LLVM-based programming languages as well as programs implemented in various dynamic languages that invoke it in a common debugger front-end.Comment: 7 pages, 7 figures, accepted at 15th International Conference on Managed Languages & Runtimes (ManLang'18

    Sub-method, partial behavioral reflection with Reflectivity: Looking back on 10 years of use

    Full text link
    Context. Refining or altering existing behavior is the daily work of every developer, but that cannot be always anticipated, and software sometimes cannot be stopped. In such cases, unanticipated adaptation of running systems is of interest for many scenarios, ranging from functional upgrades to on-the-fly debugging or monitoring of critical applications. Inquiry. A way of altering software at run time is using behavioral reflection, which is particularly well-suited for unanticipated adaptation of real-world systems. Partial behavioral reflection is not a new idea, and for years many efforts have been made to propose a practical way of expressing it. All these efforts resulted in practical solutions, but which introduced a semantic gap between the code that requires adaptation and the expression of the partial behavior. For example, in Aspect-Oriented Programming, a pointcut description is expressed in another language, which introduces a new distance between the behavior expression (the Advice) and the source code in itself. Approach. Ten years ago, the idea of closing the gap between the code and the expression of the partial behavior led to the implementation of the Reflectivity framework. Using Reflectivity, developers annotate Abstract Syntax Tree (AST) nodes with meta-behavior which is taken into account by the compiler to produce behavioral variations. In this paper, we present Reflectivity, its API, its implementation and its usage in Pharo. We reflect on ten years of use of Reflectivity, and show how it has been used as a basic building block of many innovative ideas. Knowledge. Reflectivity brings a practical way of working at the AST level, which is a high-level representation of the source code manipulated by software developers. It enables a powerful way of dynamically add and modify behavior. Reflectivity is also a flexible mean to bridge the gap between the expression of the meta-behavior and the source code. This ability to apply unanticipated adaptation and to provide behavioral reflection led to many experiments and projects during this last decade by external users. Existing work use Reflectivity to implement reflective libraries or languages extensions, featherweight code instrumentation, dynamic software update, debugging tools and visualization and software analysis tools. Grounding. Reflectivity is actively used in research projects. During the past ten years, it served as a support, either for implementation or as a fundamental base, for many research work including PhD theses, conference, journal and workshop papers. Reflectivity is now an important library of the Pharo language, and is integrated at the heart of the platform. Importance. Reflectivity exposes powerful abstractions to deal with partial behavioral adaptation, while providing a mature framework for unanticipated, non-intrusive and partial behavioral reflection based on AST annotation. Furthermore, even if Reflectivity found its home inside Pharo, it is not a pure Smalltalk-oriented solution. As validation over the practical use of Reflectivity in dynamic object-oriented languages, the API has been ported to Python. Finally, the AST annotation feature of Reflectivity opens new experimentation opportunities about the control that developers could gain on the behavior of their own software

    Execution vs Parse-based Language Servers: Tradeoffs and Opportunities for Language-Agnostic Tooling for Dynamic Languages

    Get PDF
    With the wide adoption of the language server protocol, the desire to have IDE-style tooling even for niche and research languages has exploded. The Truffle language framework facilitates this desire by offering an almost zero-effort approach to language implementers to providing IDE features. However, this existing approach needs to execute the code being worked on to capture much of the information needed for an IDE, ideally with full unit-test coverage. To capture information more reliably and avoid the need to execute the code being worked on, we propose a new parse-based design for language servers. Our solution provides a language-agnostic interface for structural information, with which we can support most common IDE features for dynamic languages. Comparing the two approaches, we find that our new parse-based approach requires only a modest development effort for each language and has only minor tradeoffs for precision, for instance for code completion, compared to Truffle's execution-based approach. Further, we show that less than 1,000 lines of code capture enough details to provide much of the typical IDE functionality, with an order of magnitude less code than ad hoc language servers. We tested our approach for the custom parsers of Newspeak and SOM, as well as SimpleLanguage's ANTLR grammar without any changes to it. Combining both parse and execution-based approaches has the potential to provide good and precise IDE tooling for a wide range of languages with only small development effort. By itself, our approach would be a good addition to the many libraries implementing the language server protocol to enable low-effort implementations of IDE features

    Towards Self-Adaptable Languages

    Get PDF
    International audienceOver recent years, self-adaptation has become a concern for many software systems that have to operate in complex and changing environments. At the core of self-adaptation, there is a feedback loop and associated trade-off reasoning to decide on the best course of action. However, existing software languages do not abstract the development and execution of such feedback loops for self-adaptable systems. Developers have to fall back to ad-hoc solutions to implement self-adaptable systems, often with wide-ranging design implications (e.g., explicit MAPE-K loop). Furthermore, existing software languages do not capitalize on monitored usage data of a language and its modeling environment. This hinders the continuous and automatic evolution of a software language based on feedback loops from the modeling environment and runtime software system. To address the aforementioned issues, this paper introduces the concept of Self-Adaptable Language (SAL) to abstract the feedback loops at both system and language levels. We propose L-MODA (Language, Models, and Data) as a conceptual reference framework that characterizes the possible feedback loops abstracted into a SAL. To demonstrate SALs, we present emerging results on the abstraction of the system feedback loop into the language semantics. We report on the concept of Self-Adaptable Virtual Machines as an example of semantic adaptation in a language interpreter and present a roadmap for SALs

    Avatar: A Framework to Support Dynamic Security Analysis of Embedded Systems’ Firmwares

    Full text link

    Transparent and Precise Malware Analysis Using Virtualization: From Theory to Practice

    Get PDF
    Dynamic analysis is an important technique used in malware analysis and is complementary to static analysis. Thus far, virtualization has been widely adopted for building fine-grained dynamic analysis tools and this trend is expected to continue. Unlike User/Kernel space malware analysis platforms that essentially co-exist with malware, virtualization based platforms benefit from isolation and fine-grained instrumentation support. Isolation makes it more difficult for malware samples to disrupt analysis and fine-grained instrumentation provides analysts with low level details, such as those at the machine instruction level. This in turn supports the development of advanced analysis tools such as dynamic taint analysis and symbolic execution for automatic path exploration. The major disadvantage of virtualization based malware analysis is the loss of semantic information, also known as the semantic gap problem. To put it differently, since analysis takes place at the virtual machine monitor where only the raw system state (e.g., CPU and memory) is visible, higher level constructs such as processes and files must be reconstructed using the low level information. The collection of techniques used to bridge semantic gaps is known as Virtual Machine Introspection. Virtualization based analysis platforms can be further separated into emulation and hardware virtualization. Emulators have the advantages of flexibility of analysis tool development and efficiency for fine-grained analysis; however, emulators suffer from the transparency problem. That is, malware can employ methods to determine whether it is executing in an emulated environment versus real hardware and cease operations to disrupt analysis if the machine is emulated. In brief, emulation based dynamic analysis has advantages over User/Kernel space and hardware virtualization based techniques, but it suffers from semantic gap and transparency problems. These problems have been exacerbated by recent discoveries of anti-emulation malware that detects emulators and Android malware with two semantic gaps, Java and native. Also, it is foreseeable that malware authors will have a similar response to taint analysis. In other words, once taint analysis becomes widely used to understand how malware operates, the authors will create new malware that attacks the imprecisions in taint analysis implementations and induce false-positives and false-negatives in an effort to frustrate analysts. This dissertation addresses these problems by presenting concepts, methods and techniques that can be used to transparently and precisely analyze both desktop and mobile malware using virtualization. This is achieved in three parts. First, precise heterogeneous record and replay is presented as a means to help emulators benefit from the transparency characteristics of hardware virtualization. This technique is implemented in a tool called V2E that uses KVM for recording and TEMU for replaying and analysis. It was successfully used to analyze real-world anti-emulation malware that evaded analysis using TEMU alone. Second, the design of an emulation based Android malware analysis platform that uses virtual machine introspection to bridge both the Java and native level semantic gaps as well as seamlessly bind the two views together into a single view is presented. The core introspection and instrumentation techniques were implemented in a new analysis platform called DroidScope that is based on the Android emulator. It was successfully used to analyze two real-world Android malware samples that have cooperating Java and native level components. Taint analysis was also used to study their information ex-filtration behaviors. Third, formal methods for studying the sources of false-positives and false-negatives in dynamic taint analysis designs and for verifying the correctness of manually defined taint propagation rules are presented. These definitions and methods were successfully used to analyze and compare previously published taint analysis platforms in terms of false-positives and false-negatives

    Enhancing the efficiency and practicality of software transactional memory on massively multithreaded systems

    Get PDF
    Chip Multithreading (CMT) processors promise to deliver higher performance by running more than one stream of instructions in parallel. To exploit CMT's capabilities, programmers have to parallelize their applications, which is not a trivial task. Transactional Memory (TM) is one of parallel programming models that aims at simplifying synchronization by raising the level of abstraction between semantic atomicity and the means by which that atomicity is achieved. TM is a promising programming model but there are still important challenges that must be addressed to make it more practical and efficient in mainstream parallel programming. The first challenge addressed in this dissertation is that of making the evaluation of TM proposals more solid with realistic TM benchmarks and being able to run the same benchmarks on different STM systems. We first introduce a benchmark suite, RMS-TM, a comprehensive benchmark suite to evaluate HTMs and STMs. RMS-TM consists of seven applications from the Recognition, Mining and Synthesis (RMS) domain that are representative of future workloads. RMS-TM features current TM research issues such as nesting and I/O inside transactions, while also providing various TM characteristics. Most STM systems are implemented as user-level libraries: the programmer is expected to manually instrument not only transaction boundaries, but also individual loads and stores within transactions. This library-based approach is increasingly tedious and error prone and also makes it difficult to make reliable performance comparisons. To enable an "apples-to-apples" performance comparison, we then develop a software layer that allows researchers to test the same applications with interchangeable STM back ends. The second challenge addressed is that of enhancing performance and scalability of TM applications running on aggressive multi-core/multi-threaded processors. Performance and scalability of current TM designs, in particular STM desings, do not always meet the programmer's expectation, especially at scale. To overcome this limitation, we propose a new STM design, STM2, based on an assisted execution model in which time-consuming TM operations are offloaded to auxiliary threads while application threads optimistically perform computation. Surprisingly, our results show that STM2 provides, on average, speedups between 1.8x and 5.2x over state-of-the-art STM systems. On the other hand, we notice that assisted-execution systems may show low processor utilization. To alleviate this problem and to increase the efficiency of STM2, we enriched STM2 with a runtime mechanism that automatically and adaptively detects application and auxiliary threads' computing demands and dynamically partition hardware resources between the pair through the hardware thread prioritization mechanism implemented in POWER machines. The third challenge is to define a notion of what it means for a TM program to be correctly synchronized. The current definition of transactional data race requires all transactions to be totally ordered "as if'' serialized by a global lock, which limits the scalability of TM designs. To remove this constraint, we first propose to relax the current definition of transactional data race to allow a higher level of concurrency. Based on this definition we propose the first practical race detection algorithm for C/C++ applications (TRADE) and implement the corresponding race detection tool. Then, we introduce a new definition of transactional data race that is more intuitive, transparent to the underlying TM implementation, can be used for a broad set of C/C++ TM programs. Based on this new definition, we proposed T-Rex, an efficient and scalable race detection tool for C/C++ TM applications. Using TRADE and T-Rex, we have discovered subtle transactional data races in widely-used STAMP applications which have not been reported in the past
    corecore