1,294 research outputs found
Parallel parsing made practical
The property of local parsability allows to parse inputs through inspecting only a bounded-length string around the current token. This in turn enables the construction of a scalable, data-parallel parsing algorithm, which is presented in this work. Such an algorithm is easily amenable to be automatically generated via a parser generator tool, which was realized, and is also presented in the following. Furthermore, to complete the framework of a parallel input analysis, a parallel scanner can also combined with the parser. To prove the practicality of a parallel lexing and parsing approach, we report the results of the adaptation of JSON and Lua to a form fit for parallel parsing (i.e. an operator-precedence grammar) through simple grammar changes and scanning transformations. The approach is validated with performance figures from both high performance and embedded multicore platforms, obtained analyzing real-world inputs as a test-bench. The results show that our approach matches or dominates the performances of production-grade LR parsers in sequential execution, and achieves significant speedups and good scaling on multi-core machines. The work is concluded by a broad and critical survey of the past work on parallel parsing and future directions on the integration with semantic analysis and incremental parsing
Producing power-law distributions and damping word frequencies with two-stage language models
Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statisticalmodels that can generically produce power laws, breaking generativemodels into two stages. The first stage, the generator, can be any standard probabilistic model, while the second stage, the adaptor, transforms the word frequencies of this model to provide a closer match to natural language. We show that two commonly used Bayesian models, the Dirichlet-multinomial model and the Dirichlet process, can be viewed as special cases of our framework. We discuss two stochastic processes-the Chinese restaurant process and its two-parameter generalization based on the Pitman-Yor process-that can be used as adaptors in our framework to produce power-law distributions over word frequencies. We show that these adaptors justify common estimation procedures based on logarithmic or inverse-power transformations of empirical frequencies. In addition, taking the Pitman-Yor Chinese restaurant process as an adaptor justifies the appearance of type frequencies in formal analyses of natural language and improves the performance of a model for unsupervised learning of morphology.48 page(s
Polynomial tuning of multiparametric combinatorial samplers
Boltzmann samplers and the recursive method are prominent algorithmic
frameworks for the approximate-size and exact-size random generation of large
combinatorial structures, such as maps, tilings, RNA sequences or various
tree-like structures. In their multiparametric variants, these samplers allow
to control the profile of expected values corresponding to multiple
combinatorial parameters. One can control, for instance, the number of leaves,
profile of node degrees in trees or the number of certain subpatterns in
strings. However, such a flexible control requires an additional non-trivial
tuning procedure. In this paper, we propose an efficient polynomial-time, with
respect to the number of tuned parameters, tuning algorithm based on convex
optimisation techniques. Finally, we illustrate the efficiency of our approach
using several applications of rational, algebraic and P\'olya structures
including polyomino tilings with prescribed tile frequencies, planar trees with
a given specific node degree distribution, and weighted partitions.Comment: Extended abstract, accepted to ANALCO2018. 20 pages, 6 figures,
colours. Implementation and examples are available at [1]
https://github.com/maciej-bendkowski/boltzmann-brain [2]
https://github.com/maciej-bendkowski/multiparametric-combinatorial-sampler
Lily: A parser generator for LL(1) languages
This paper discusses the design and implementation of Lily, a language for generating LL(1) language parsers, originally designed by Dr. Thomas J. Sager of the University of Missouri--Rolla. A method for the automatic generation of parser tables is described which creates small, highly optimized tables, suitable for conversion to minimal perfect hash functions.
An implementation of Lily is discussed with attention to design goals, implementation of parser table generation, and table optimization techniques. Proposals are made detailing possibilities for further augmentation of the system. Examples of Lily programs are given as well as a manual for the system
A Domain-Specific Language and Editor for Parallel Particle Methods
Domain-specific languages (DSLs) are of increasing importance in scientific
high-performance computing to reduce development costs, raise the level of
abstraction and, thus, ease scientific programming. However, designing and
implementing DSLs is not an easy task, as it requires knowledge of the
application domain and experience in language engineering and compilers.
Consequently, many DSLs follow a weak approach using macros or text generators,
which lack many of the features that make a DSL a comfortable for programmers.
Some of these features---e.g., syntax highlighting, type inference, error
reporting, and code completion---are easily provided by language workbenches,
which combine language engineering techniques and tools in a common ecosystem.
In this paper, we present the Parallel Particle-Mesh Environment (PPME), a DSL
and development environment for numerical simulations based on particle methods
and hybrid particle-mesh methods. PPME uses the meta programming system (MPS),
a projectional language workbench. PPME is the successor of the Parallel
Particle-Mesh Language (PPML), a Fortran-based DSL that used conventional
implementation strategies. We analyze and compare both languages and
demonstrate how the programmer's experience can be improved using static
analyses and projectional editing. Furthermore, we present an explicit domain
model for particle abstractions and the first formal type system for particle
methods.Comment: Submitted to ACM Transactions on Mathematical Software on Dec. 25,
201
- …