5,666 research outputs found
Programming errors in traversal programs over structured data
Traversal strategies \'a la Stratego (also \'a la Strafunski and 'Scrap Your
Boilerplate') provide an exceptionally versatile and uniform means of querying
and transforming deeply nested and heterogeneously structured data including
terms in functional programming and rewriting, objects in OO programming, and
XML documents in XML programming. However, the resulting traversal programs are
prone to programming errors. We are specifically concerned with errors that go
beyond conservative type errors; examples we examine include divergent
traversals, prematurely terminated traversals, and traversals with dead code.
Based on an inventory of possible programming errors we explore options of
static typing and static analysis so that some categories of errors can be
avoided. This exploration generates suggestions for improvements to strategy
libraries as well as their underlying programming languages. Haskell is used
for illustrations and specifications with sufficient explanations to make the
presentation comprehensible to the non-specialist. The overall ideas are
language-agnostic and they are summarized accordingly
Recommended from our members
Applying an abstract data structure description approach to parallelizing scientific pointer programs
Even though impressive progress has been made in the area of parallelizing scientific programs with arrays, the application of similar techniques to programs with pointer data structures has remained difficult. Unlike arrays which have a small number of well-defined properties that can be utilized by a parallelizing compiler, pointer data structures are used to implement a wide variety of structures that exhibit a much more diverse set of properties. The complexity and diversity of such properties means that, in general, scientific programs with pointer data structures cannot be effectively analyzed by an optimizing and parallelizing compiler.In order to provide a system in which the compiler can fully utilize the properties of different types of pointer data structures, we have developed a mechanism for the Abstract Description of Data Structures (ADDS). With our approach, the programmer can explicitly describe important properties such as dimensionality of the pointer data structure, independence of dimensions, and direction of traversal. These abstract descriptions of pointer data structures are then used by the compiler to guide analysis, optimization, and parallelization.In this paper we summarize the ADDS approach through the use of numerous examples of data structures used in scientific computations, we illustrate how such declarations are natural and non-tedious to specify, and we show how the ADDS declarations can be used to improve compile-time analysis. In order to demonstrate the viability of our approach, we show how such techniques can be used to parallelize an important class of scientific codes which naturally use recursive pointer data structures. In particular, we use our approach to develop the parallelization of an N-body simulation that is based on a relatively complicated pointer data structure, and we report the speedup results for a Sequent multiprocessor
The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share this data with the research community, but
some of these genomic data analysis problems require large scale computational
platforms to meet both the memory and computational requirements. These
applications differ from scientific simulations that dominate the workload on
high end parallel systems today and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high performance
genomics analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing
Trustworthy Refactoring via Decomposition and Schemes: A Complex Case Study
Widely used complex code refactoring tools lack a solid reasoning about the
correctness of the transformations they implement, whilst interest in proven
correct refactoring is ever increasing as only formal verification can provide
true confidence in applying tool-automated refactoring to industrial-scale
code. By using our strategic rewriting based refactoring specification
language, we present the decomposition of a complex transformation into smaller
steps that can be expressed as instances of refactoring schemes, then we
demonstrate the semi-automatic formal verification of the components based on a
theoretical understanding of the semantics of the programming language. The
extensible and verifiable refactoring definitions can be executed in our
interpreter built on top of a static analyser framework.Comment: In Proceedings VPT 2017, arXiv:1708.0688
Finding The Lazy Programmer's Bugs
Traditionally developers and testers created huge numbers of explicit tests, enumerating interesting cases, perhaps
biased by what they believe to be the current boundary conditions of the function being tested. Or at
least, they were supposed to.
A major step forward was the development of property testing. Property testing requires the user to write a few
functional properties that are used to generate tests, and requires an external library or tool to create test data
for the tests. As such many thousands of tests can be created for a single property. For the purely functional
programming language Haskell there are several such libraries; for example QuickCheck [CH00], SmallCheck
and Lazy SmallCheck [RNL08].
Unfortunately, property testing still requires the user to write explicit tests. Fortunately, we note there are
already many implicit tests present in programs. Developers may throw assertion errors, or the compiler may
silently insert runtime exceptions for incomplete pattern matches.
We attempt to automate the testing process using these implicit tests. Our contributions are in four main
areas: (1) We have developed algorithms to automatically infer appropriate constructors and functions needed
to generate test data without requiring additional programmer work or annotations. (2) To combine the
constructors and functions into test expressions we take advantage of Haskell's lazy evaluation semantics by
applying the techniques of needed narrowing and lazy instantiation to guide generation. (3) We keep the type
of test data at its most general, in order to prevent committing too early to monomorphic types that cause
needless wasted tests. (4) We have developed novel ways of creating Haskell case expressions to inspect elements
inside returned data structures, in order to discover exceptions that may be hidden by laziness, and to make
our test data generation algorithm more expressive.
In order to validate our claims, we have implemented these techniques in Irulan, a fully automatic tool for
generating systematic black-box unit tests for Haskell library code. We have designed Irulan to generate high
coverage test suites and detect common programming errors in the process
ć¨ăç¨ăăć§é ĺ丌ĺăăă°ăŠăăłă°
High-level abstractions for parallel programming are still immature. Computations on complicated data structures such as pointer structures are considered as irregular algorithms. General graph structures, which irregular algorithms generally deal with, are difficult to divide and conquer. Because the divide-and-conquer paradigm is essential for load balancing in parallel algorithms and a key to parallel programming, general graphs are reasonably difficult. However, trees lead to divide-and-conquer computations by definition and are sufficiently general and powerful as a tool of programming. We therefore deal with abstractions of tree-based computations. Our study has started from Matsuzakiâs work on tree skeletons. We have improved the usability of tree skeletons by enriching their implementation aspect. Specifically, we have dealt with two issues. We first have implemented the loose coupling between skeletons and data structures and developed a flexible tree skeleton library. We secondly have implemented a parallelizer that transforms sequential recursive functions in C into parallel programs that use tree skeletons implicitly. This parallelizer hides the complicated API of tree skeletons and makes programmers to use tree skeletons with no burden. Unfortunately, the practicality of tree skeletons, however, has not been improved. On the basis of the observations from the practice of tree skeletons, we deal with two application domains: program analysis and neighborhood computation. In the domain of program analysis, compilers treat input programs as control-flow graphs (CFGs) and perform analysis on CFGs. Program analysis is therefore difficult to divide and conquer. To resolve this problem, we have developed divide-and-conquer methods for program analysis in a syntax-directed manner on the basis of Rosenâs high-level approach. Specifically, we have dealt with data-flow analysis based on Tarjanâs formalization and value-graph construction based on a functional formalization. In the domain of neighborhood computations, a primary issue is locality. A naive parallel neighborhood computation without locality enhancement causes a lot of cache misses. The divide-and-conquer paradigm is known to be useful also for locality enhancement. We therefore have applied algebraic formalizations and a tree-segmenting technique derived from tree skeletons to the locality enhancement of neighborhood computations.éťć°é俥大ĺŚ201
Usability issues and design principles for visual programming languages
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.Despite two decades of empirical studies focusing on programmers and the problems with programming, usability of textual programming languages is still hard to achieve. Its younger relation, visual programming languages (VPLs) also share the same problem of poor usability. This research explores and investigates the usability issues relating to VPLs in order to suggest a set of design principles that emphasise usability. The approach adopted focuses on issues arising from the interaction and communication between the human (programmers), the computer (user interface), and the program. Being exploratory in nature, this PhD reviews the literature as a starting point for stimulating and developing research questions and hypotheses that experimental studies were conducted to investigate. However, the literature alone cannot provide a fully comprehensive list of possible usability problems in VPLs so that design principles can be confidently recommended. A commercial VPL was, therefore, holistically evaluated and a comprehensive list of usability problems was obtained from the research. Six empirical studies employing both quantitative and qualitative methodology were undertaken as dictated by the nature of the research. Five of these were controlled experiments and one was qualitative-naturalistic. The experiments studied the effect of a programming paradigm and of representation of program flow on novices' performances. The results indicated superiority of control-flow programs in relation to data-flow programs; a control-flow preference among novices; and in addition that directional representation does not affect performance while traversal direction does - due to cognitive demands imposed upon programmers. Results of the qualitative study included a list of 145 usability problems and these were further categorised into ten problem areas. These findings were integrated with other analytical work based upon the review of the literature in a structured fashion to form a checklist and a set of design principles for VPLs that are empirically grounded and evaluated against existing research in the literature. Furthermore, an extended framework for Cognitive Dimensions of Notations is also discussed and proposed as an evaluation method for diagrammatic VPLs on the basis of the qualitative study. The above consists of the major findings and deliverables of this research. Nevertheless, there are several other findings identified on the basis of the substantial amount of data obtained in the series of experiments carried out, which have made a novel contribution to knowledge in the fields of Human-Computer Interaction, Psychology of Programming, and Visual Programming Languages
- âŚ