68 research outputs found
Left Recursion in Parsing Expression Grammars
Parsing Expression Grammars (PEGs) are a formalism that can describe all
deterministic context-free languages through a set of rules that specify a
top-down parser for some language. PEGs are easy to use, and there are
efficient implementations of PEG libraries in several programming languages.
A frequently missed feature of PEGs is left recursion, which is commonly used
in Context-Free Grammars (CFGs) to encode left-associative operations. We
present a simple conservative extension to the semantics of PEGs that gives
useful meaning to direct and indirect left-recursive rules, and show that our
extensions make it easy to express left-recursive idioms from CFGs in PEGs,
with similar results. We prove the conservativeness of these extensions, and
also prove that they work with any left-recursive PEG.
PEGs can also be compiled to programs in a low-level parsing machine. We
present an extension to the semantics of the operations of this parsing machine
that let it interpret left-recursive PEGs, and prove that this extension is
correct with regards to our semantics for left-recursive PEGs.Comment: Extended version of the paper "Left Recursion in Parsing Expression
Grammars", that was published on 2012 Brazilian Symposium on Programming
Language
PARALLELIZING TIME-SERIES SESSION DATA ANALYSIS WITH A TYPE-ERASURE BASED DSEL
The Science Information Network (SINET) is a Japanese academic backbone network. SINET consists of more than 800 universities and research institutions. In the operation of a huge academic backbone network, more flexible querying technology is required to cope with massive time series session data and analysis of sophisticated cyber-attacks. This paper proposes a parallelizing DSEL (Domain Specific Embedded Language) processing for huge time-series session data. In our DESL, the function object is implemented by type erasure for constructing internal DSL for processing time-series data. Type erasure enables our parser to store function pointer and function object into the same *void type with class templates. We apply to scatter/gather pattern for concurrent DSEL parsing. Each thread parses DSEL to extract the tuple timestamp, source IP, and destination IP in the gather phase. In the scattering phase, we use a concurrent hash map to handle multiple thread outputs with our DSEL.
In the experiment, we have measured the elapsed time in parsing and inserting IPv4 address and timestamp data format ranging from 1,000 to 50,000 lines with 24-row items. We have also measured CPU idle time in processing 100,000,000 lines of session data with 5, 10 and 20 multiple threads. It has been turned out that the proposed method can work in feasible computing time in both cases
Well-Formed and Scalable Invasive Software Composition
Software components provide essential means to structure and organize software effectively. However, frequently, required component abstractions are not available in a programming language or system, or are not adequately combinable with each other. Invasive software composition (ISC) is a general approach to software composition that unifies component-like abstractions such as templates, aspects and macros. ISC is based on fragment composition, and composes programs and other software artifacts at the level of syntax trees. Therefore, a unifying fragment component model is related to the context-free grammar of a language to identify extension and variation points in syntax trees as well as valid component types. By doing so, fragment components can be composed by transformations at respective extension and variation points so that always valid composition results regarding the underlying context-free grammar are yielded. However, given a languageâs context-free grammar, the composition result may still be incorrect.
Context-sensitive constraints such as type constraints may be violated so that the program cannot be compiled and/or interpreted correctly. While a compiler can detect such errors after composition, it is difficult to relate them back to the original transformation step in the composition system, especially in the case of complex compositions with several hundreds of such steps. To tackle this problem, this thesis proposes well-formed ISCâan extension to ISC that uses reference attribute grammars (RAGs) to specify fragment component models and fragment contracts to guard compositions with context-sensitive constraints. Additionally, well-formed ISC provides composition strategies as a means to configure composition algorithms and handle interferences between composition steps.
Developing ISC systems for complex languages such as programming languages is a complex undertaking. Composition-system developers need to supply or develop adequate language and parser specifications that can be processed by an ISC composition engine. Moreover, the specifications may need to be extended with rules for the intended composition abstractions.
Current approaches to ISC require complete grammars to be able to compose fragments in the respective languages. Hence, the specifications need to be developed exhaustively before any component model can be supplied. To tackle this problem, this thesis introduces scalable ISCâa variant of ISC that uses island component models as a means to define component models for partially specified languages while still the whole language is supported. Additionally, a scalable workflow for agile composition-system development is proposed which supports a development of ISC systems in small increments using modular extensions.
All theoretical concepts introduced in this thesis are implemented in the Skeletons and Application Templates framework SkAT. It supports âclassicâ, well-formed and scalable ISC by leveraging RAGs as its main specification and implementation language. Moreover, several composition systems based on SkAT are discussed, e.g., a well-formed composition system for Java and a C preprocessor-like macro language. In turn, those composition systems are used as composers in several example applications such as a library of parallel algorithmic skeletons
Grammars for Indentation-Sensitive Parsing
Adams\u27 extension of parsing expression grammars enables specifying
indentation sensitivity using two non-standard grammar constructs - indentation by a binary relation and alignment. This paper is a theoretical study of Adams\u27 grammars. It proposes a
step-by-step transformation of well-formed Adams\u27 grammars for
elimination of the alignment construct from the grammar. The idea that alignment could be avoided was suggested by Adams but no
process for achieving this aim has been described before.
This paper also establishes general conditions that binary
relations used in indentation constructs must satisfy in order to enable efficient parsing
PyVerDetector: A Chrome Extension Detecting the Python Version of Stack Overflow Code Snippets
Over the years, Stack Overflow (SO) has accumulated numerous code snippets, with developers going to SO for problem solutions and code references. However, in the case of the Python programming language, Python 3 is not necessarily backward compatible with Python 2. The major implication of this versioning problem is that code written in Python 2 may not be interpreted by Python 3 without modifications. This issue may affect the usability of Python code snippets on SO. We investigate how many Python code snippets on SO suffer from version compatibility issues, and find that about 10% of the snippets exhibit this problem. Moreover, of the code snippets that are interpretable only by Python 2 or Python 3, less than 17% are tagged with the Python version.In this paper, we present a Chrome extension called PyVerDetector. This extension allows the user to select a given version of Python and verifies whether the code snippets on a given SO question are compatible with the user's selected Python version, providing error messages if not. The tool parses snippets and can determine versioning errors due to differences in syntax and also provides the user with a list of Python versions capable of interpreting each code snippet.Yang S., Kanda T., Pizzolotto D., et al. PyVerDetector: A Chrome Extension Detecting the Python Version of Stack Overflow Code Snippets. IEEE International Conference on Program Comprehension 2023-May, 25 (2023); https://doi.org/10.1109/ICPC58990.2023.00013
- âŠ