Functional programming and graph algorithms
This thesis is an investigation of graph algorithms in the non-strict purely functional language Haskell. Emphasis is placed on the importance of achieving an asymptotic complexity as good as with conventional languages. This is achieved by using the monadic model for including actions on the state. Work on the monadic model was carried out at Glasgow University by Wadler, Peyton Jones, and Launchbury in the early nineties and has opened up many diverse application areas. One such area is the ability to express data structures that require sharing. Although the graphs themselves are not represented in this style, the data structures that graph algorithms use are. Several examples of stateful algorithms are given, including union/find for disjoint sets and the linear-time sort binsort.
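As a small illustration of this monadic style, here is a minimal sketch of union/find in Haskell's ST monad with path compression (an assumed rendering, not the thesis's own code; union by rank is omitted for brevity):

    import Control.Monad.ST (ST, runST)
    import Data.Array.ST (STUArray, newListArray, readArray, writeArray)

    -- Follow parent links to the representative, compressing the path.
    find :: STUArray s Int Int -> Int -> ST s Int
    find parent x = do
      p <- readArray parent x
      if p == x
        then return x
        else do
          r <- find parent p
          writeArray parent x r  -- path compression
          return r

    -- Merge the two sets containing x and y.
    union :: STUArray s Int Int -> Int -> Int -> ST s ()
    union parent x y = do
      rx <- find parent x
      ry <- find parent y
      writeArray parent rx ry

    -- Example over elements 0..9: after two unions, 0 and 2 are connected.
    connected :: Bool
    connected = runST $ do
      parent <- newListArray (0, 9) [0 .. 9]
      union parent 0 1
      union parent 1 2
      (==) <$> find parent 0 <*> find parent 2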
The graph algorithms presented are not new, but are traditional algorithms recast in a functional setting. Examples include strongly connected components, biconnected components, Kruskal's minimum-cost spanning tree, and Dijkstra's shortest paths. The presentation is lucid, giving more insight than usual. The functional setting allows for complete calculational-style correctness proofs, as is demonstrated with many examples.
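For illustration, here is a minimal purely functional sketch of Kruskal's algorithm over a simple persistent union/find (a Map of parent links; unlike the monadic version above, this does not achieve near-linear complexity):

    import qualified Data.Map as Map
    import Data.List (sortOn)

    type UF = Map.Map Int Int  -- parent links; roots are absent

    find :: UF -> Int -> Int
    find uf x = maybe x (find uf) (Map.lookup x uf)

    union :: Int -> Int -> UF -> UF
    union x y uf
      | rx == ry  = uf
      | otherwise = Map.insert rx ry uf
      where rx = find uf x
            ry = find uf y

    -- Edges are (weight, u, v); keep an edge iff it joins two components.
    kruskal :: [(Int, Int, Int)] -> [(Int, Int, Int)]
    kruskal es = go Map.empty (sortOn (\(w, _, _) -> w) es)
      where
        go _  []                 = []
        go uf (e@(_, u, v) : rest)
          | find uf u == find uf v = go uf rest
          | otherwise              = e : go (union u v uf) rest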
The benefits of using a functional language for expressing graph algorithms are quantified by looking at the issues of execution times, asymptotic complexity, correctness, and clarity, in comparison with traditional approaches. The intention is to be as objective as possible, pointing out both the weaknesses and the strengths of using a functional language.
Algebras for weighted search
Weighted search is an essential component of many fundamental and useful algorithms. Despite this, it is relatively underexplored as a computational effect, receiving not nearly as much attention as either depth- or breadth-first search. This paper explores the algebraic underpinning of weighted search, and demonstrates how to implement it as a monad transformer. The development first explores breadth-first search, which can be expressed as a polynomial over semirings. These polynomials are generalised to the free semimodule monad to capture a wide range of applications, including probability monads, polynomial monads, and monads for weighted search. Finally, a monad transformer based on the free semimodule monad is introduced. Applying optimisations to this type yields an implementation of pairing heaps, which is then used to implement Dijkstra's algorithm and efficient probabilistic sampling. The construction is formalised in Cubical Agda and implemented in Haskell.
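A minimal sketch of the underlying idea, assuming the simplest list-of-weighted-values representation (the paper's free semimodule monad is, roughly, this construction quotiented by the semimodule laws, and its transformer and heap-based versions are more refined):

    -- Weights drawn from a semiring: probabilities, costs, counts, ...
    class Semiring w where
      zero, one    :: w
      (<+>), (<.>) :: w -> w -> w

    -- Probabilities form a semiring under addition and multiplication.
    instance Semiring Double where
      zero = 0; one = 1
      (<+>) = (+); (<.>) = (*)

    -- A "polynomial" over a semiring: values paired with weights.
    newtype Weighted w a = Weighted { runWeighted :: [(a, w)] }

    instance Functor (Weighted w) where
      fmap f (Weighted xs) = Weighted [ (f x, w) | (x, w) <- xs ]

    instance Semiring w => Applicative (Weighted w) where
      pure x = Weighted [(x, one)]
      Weighted fs <*> Weighted xs =
        Weighted [ (f x, u <.> v) | (f, u) <- fs, (x, v) <- xs ]

    -- Sequencing multiplies weights along a path, as in weighted search.
    instance Semiring w => Monad (Weighted w) where
      Weighted xs >>= k =
        Weighted [ (y, u <.> v) | (x, u) <- xs, (y, v) <- runWeighted (k x) ]

    -- Example: a fair coin as a probability monad value.
    coin :: Weighted Double Bool
    coin = Weighted [(True, 0.5), (False, 0.5)]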
Elaboration in Dependent Type Theory
To be usable in practice, interactive theorem provers need to provide convenient and efficient means of writing expressions, definitions, and proofs. This involves inferring information that is often left implicit in an ordinary mathematical text, and resolving ambiguities in mathematical expressions. We refer to the process of passing from a quasi-formal and partially specified expression to a completely precise formal one as elaboration. We describe an elaboration algorithm for dependent type theory that has been implemented in the Lean theorem prover. Lean's elaborator supports higher-order unification, type class inference, ad hoc overloading, insertion of coercions, the use of tactics, and the computational reduction of terms. The interactions between these components are subtle and complex, and the elaboration algorithm has been carefully designed to balance efficiency and usability. We describe the central design goals, and the means by which they are achieved.
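A small example of the work the elaborator does (written here in Lean 4 syntax for concreteness; the paper describes an earlier version of Lean): the implicit type argument, the type class instance, and the interpretation of the numeric literal are all inferred rather than written by the user.

    -- The user writes only `x + x`; the elaborator infers α and the
    -- Add α instance needed to make sense of `+`.
    def double {α : Type} [Add α] (x : α) : α := x + x

    #check double (3 : Nat)  -- α is elaborated to Nat; the Add Nat instance is found
    #eval double 21          -- the literal 21 defaults to Nat; prints 42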
Towards MKM in the Large: Modular Representation and Scalable Software Architecture
MKM has been defined as the quest for technologies to manage mathematical knowledge. MKM "in the small" is well-studied, so the real problem is to scale up to large, highly interconnected corpora: "MKM in the large". We contend that advances in two areas are needed to reach this goal. We need representation languages that support incremental processing of all primitive MKM operations, and we need software architectures and implementations that implement these operations scalably on large knowledge bases.
We present instances of both in this paper: the MMT framework for modular theory-graphs that integrates meta-logical foundations, which forms the base of the next OMDoc version; and TNTBase, a versioned storage system for XML-based document formats. TNTBase becomes an MMT database by instantiating it with special MKM operations for MMT.
Comment: To appear in The 9th International Conference on Mathematical Knowledge Management: MKM 2010.
Parallel evaluation strategies for lazy data structures in Haskell
Conventional parallel programming is complex and error prone. To improve programmer productivity, we need to raise the level of abstraction with a higher-level programming model that hides many parallel coordination aspects. Evaluation strategies use non-strictness to separate the coordination and computation aspects of a Glasgow parallel Haskell (GpH) program. This allows the specification of high-level parallel programs, eliminating the low-level complexity of synchronisation and communication associated with parallel programming.
This thesis employs a data-structure-driven approach to parallelism, derived through generic parallel traversal and evaluation of sub-components of data structures. We focus on evaluation strategies over list, tree and graph data structures, allowing re-use across applications with minimal changes to the sequential algorithm.
In particular, we develop novel evaluation strategies for tree data structures, using core functional programming techniques for coordination control and non-strictness to achieve more flexible parallelism. We apply the notion of fuel as a resource that dictates parallelism generation; in particular, the bi-directional flow of fuel through a tree structure, implemented using a circular program definition, is a novel way of controlling parallel evaluation. This is the first use of circular programming in evaluation strategies, and it is complemented by a lazy function for bounding the size of sub-trees.
We extend these control mechanisms to graph structures and demonstrate performance improvements on several parallel graph traversals. We combine circularity for control, which improves the performance of strategies, with circularity for computation, using circular data structures. In particular, we develop a hybrid traversal strategy for graphs, exploiting breadth-first order to expose parallelism initially, and then proceeding in depth-first order to minimise the overhead associated with a full parallel breadth-first traversal.
The efficiency of the tree strategies is evaluated on a benchmark program and two non-trivial case studies: a Barnes-Hut algorithm for the n-body problem and sparse matrix multiplication, both using quad-trees. We also evaluate a graph search algorithm implemented using the various traversal strategies.
We demonstrate improved performance on a server-class multicore machine with up to 48 cores, with the advanced fuel-splitting mechanisms proving to be more flexible in throttling parallelism. To guide the behaviour of the strategies, we develop heuristics-based parameter selection to choose their specific control parameters.
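A minimal sketch of a depth-bounded tree strategy in this style, assuming the standard Control.Parallel.Strategies API (the thesis's fuel-based strategies are considerably more flexible than this fixed-depth throttle):

    import Control.Parallel.Strategies (Strategy, rpar, rseq, using)

    data Tree a = Leaf a | Node (Tree a) (Tree a)

    -- Spark subtrees in parallel down to depth d, then evaluate sequentially,
    -- applying the element strategy s at the leaves.
    parTreeDepth :: Int -> Strategy a -> Strategy (Tree a)
    parTreeDepth _ s (Leaf x) = Leaf <$> s x
    parTreeDepth d s (Node l r)
      | d <= 0    = Node <$> parTreeDepth 0 s l <*> parTreeDepth 0 s r
      | otherwise = do
          l' <- rpar (l `using` parTreeDepth (d - 1) s)  -- spark left subtree
          r' <- rseq (r `using` parTreeDepth (d - 1) s)  -- evaluate right here
          return (Node l' r')

    -- Usage (illustrative): t `using` parTreeDepth 4 rseq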
Jutge.org: characteristics and experiences
Jutge.org is an open educational online programming judge designed for students and instructors, featuring a repository of problems that is well organised by courses, topics and difficulty. Internally, Jutge.org uses a secure and efficient architecture and integrates modern verification techniques, formal methods, static code analysis and data mining. Jutge.org has been used extensively during the last decade at the Universitat Politècnica de Catalunya to strengthen the learn-by-doing approach in several courses. This paper presents the main characteristics of Jutge.org and shows its use and impact in a wide range of courses covering basic programming, data structures, algorithms, artificial intelligence, functional programming and circuit design.
Formal Derivation of Concurrent Garbage Collectors
Concurrent garbage collectors are notoriously difficult to implement correctly. Previous approaches to the issue of producing correct collectors have mainly been based on posit-and-prove verification or on the application of domain-specific templates and transformations. We show how to derive the upper reaches of a family of concurrent garbage collectors by refinement from a formal specification, emphasizing the application of domain-independent design theories and transformations. A key contribution is an extension to the classical lattice-theoretic fixpoint theorems to account for the dynamics of concurrent mutation and collection.
Comment: 38 pages, 21 figures. The short version of this paper appeared in the Proceedings of MPC 2010.
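For context, the classical result being extended is the Knaster-Tarski fixpoint theorem, which the paper adapts to account for the dynamics of concurrent mutation and collection. In LaTeX notation:

    If $(L, \sqsubseteq)$ is a complete lattice and $f : L \to L$ is monotone,
    then $f$ has a least fixed point
    $$\mu f \;=\; \bigsqcap \{\, x \in L \mid f(x) \sqsubseteq x \,\}.$$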
Spelling correction in the NLP system 'LOLITA': dictionary organisation and search algorithms
This thesis describes the design and implementation of a spelling correction system, and associated dictionaries, for the Natural Language Processing System 'LOLITA'. The dictionary storage is based upon a trie (M-ary tree) data structure. The design of the dictionary is described, and the way in which the data structure is implemented is also discussed. The spelling correction system makes use of the trie structure in order to limit repetition and 'garden path' searching. The spelling correction algorithms used are a variation on the 'reverse minimum edit-distance' technique. These algorithms have been modified to place more emphasis on generating corrections in order of likelihood. The system will correct up to two simple errors (i.e. insertion, omission, substitution or transposition of characters) per word. The individual algorithms are presented in turn, and their combination into a unified strategy to correct misspellings is demonstrated. The system was implemented in the programming language Haskell: a pure functional, class-based language with non-strict semantics and polymorphic type-checking. The use of several features of this language, in particular lazy evaluation, and their corresponding advantages over more traditional languages are described. The dictionaries and spelling correction facilities are in use in the LOLITA system. Issues pertaining to 'real word' error correction, arising from the system's use in an NLP context, are also discussed.
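A minimal sketch of such a trie dictionary in Haskell, assuming a Map from characters to sub-tries (the thesis's representation and its correction algorithms are more elaborate):

    import qualified Data.Map as Map
    import           Data.Map (Map)

    -- Each node records whether the path to it spells a word,
    -- plus one sub-trie per continuing character.
    data Trie = Trie { isWord :: Bool, children :: Map Char Trie }

    empty :: Trie
    empty = Trie False Map.empty

    insert :: String -> Trie -> Trie
    insert []      (Trie _ cs) = Trie True cs
    insert (c : w) (Trie b cs) =
      Trie b (Map.insert c (insert w (Map.findWithDefault empty c cs)) cs)

    member :: String -> Trie -> Bool
    member []      (Trie b _)  = b
    member (c : w) (Trie _ cs) = maybe False (member w) (Map.lookup c cs)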
A heuristic-based approach to code-smell detection
Encapsulation and data hiding are central tenets of the object-oriented paradigm. Deciding what data and behaviour to form into a class, and where to draw the line between its public and private details, can make the difference between a class that is an understandable, flexible and reusable abstraction and one which is not. This decision is a difficult one and may easily result in poor encapsulation, which can then have serious implications for a number of system qualities. It is often hard to identify such encapsulation problems within large software systems until they cause a maintenance problem (which is usually too late), and attempting to perform such analysis manually can be tedious and error prone. Two of the common encapsulation problems that can arise as a consequence of this decomposition process are data classes and god classes. Typically, these two problems occur together: data classes are lacking in functionality that has typically been sucked into an over-complicated and domineering god class. This paper describes the architecture of a tool, developed as a plug-in for the Eclipse IDE, which automatically detects data classes and god classes. The technique has been evaluated in a controlled study on two large open source systems, comparing the tool's results to similar work by Marinescu, who employs a metrics-based approach to detecting such features. The study provides some valuable insights into the strengths and weaknesses of the two approaches.
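To make the flavour of a metrics-based detection strategy concrete, here is a hedged Haskell sketch in the style of Marinescu's god-class rule; the metric names follow his work, but the thresholds are illustrative assumptions, not values from either paper:

    -- Class-level metrics, as would be computed by static analysis.
    data ClassMetrics = ClassMetrics
      { wmc  :: Int     -- Weighted Methods per Class (complexity)
      , atfd :: Int     -- Access To Foreign Data
      , tcc  :: Double  -- Tight Class Cohesion, in [0, 1]
      }

    -- A god class is complex, uses much foreign data, and is non-cohesive.
    isGodClass :: ClassMetrics -> Bool
    isGodClass m = atfd m > 5 && wmc m >= 47 && tcc m < 1 / 3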