MELT - a Translated Domain Specific Language Embedded in the GCC Compiler
The GCC free compiler is a very large piece of software, compiling source in several
languages for many targets on various systems. It can be extended by plugins,
which may take advantage of its power to provide extra specific functionality
(warnings, optimizations, source refactoring or navigation) by processing
various GCC internal representations (Gimple, Tree, ...). Writing plugins in C
is a complex and time-consuming task, while customizing GCC by embedding an existing
scripting language is impractical. We describe MELT, a specific
Lisp-like DSL which fits well into the existing GCC technology and offers
high-level features (functional, object, and reflexive programming; pattern
matching). MELT is translated to C fitted for GCC internals and provides
various features to facilitate this. This work shows that even huge legacy
software can be extended a posteriori by specifically tailored and translated
high-level DSLs.
Comment: In Proceedings DSL 2011, arXiv:1109.032
An investigation of dead-zone pattern matching algorithms
Thesis (MA)--Stellenbosch, 2016.
ENGLISH ABSTRACT: Pattern matching allows us to search some text for a word or for a sequence
of characters, a popular feature of computer programs such as text editors.
Traditionally, three distinct families of pattern matching algorithms exist: the
Boyer-Moore (BM) algorithm, the Knuth-Morris-Pratt (KMP) algorithm, and
the Rabin-Karp (RK) algorithm. The basic algorithm in all these algorithmic
families was developed in the 1970s and 1980s. However, a new family of pattern
matching algorithms, known as the Dead-Zone (DZ) family of algorithms, has
recently been developed. In a previous study, it was theoretically proven that
DZ is able to pattern match a text with fewer match attempts than the well-
known Horspool algorithm, a derivative of the BM algorithm.
The main aim of this study was to provide empirical evidence to determine
whether DZ is faster in practice. A benchmark platform was developed to com-
pare variants of the DZ algorithm to existing pattern matching algorithms.
Initial experiments were performed with four C implementations of the DZ
algorithm (two recursive and two iterative implementations). Subsequent to
this, DZ variants that make use of different shift functions as well as two
parallel variants of DZ (implemented with Pthreads and CUDA) were devel-
oped. Additionally, the underlying skeleton of the DZ algorithm was tweaked
to determine whether the DZ code was optimal.
The benchmark results showed that the C implementation of the iterative DZ
variants performed favourably. Both iterative algorithms beat traditional pat-
tern matching algorithms when searching natural language and genome texts,
particularly for short patterns. When different shift functions were used, the
only time a DZ implementation performed better than an implementation of
the traditional algorithm was for a pattern length of 65536 characters. Con-
trary to our expectations, the parallel implementation of DZ did not always
provide a speedup. In fact, the Pthreaded variants of DZ were slower than
the non-threaded DZ implementations, although the CUDA DZ variants were
consistently five times faster than a CPU implementation of Horspool. By using
a cache-friendly DZ algorithm, which reduces cache misses by about 20%,
the original DZ can be improved by approximately 5% for relatively short
patterns (up to 128 characters with a natural language text). Moreover, a cost
of recursion and the impact of information sharing were observed for all DZ
variants and have thus been identified as intrinsic DZ characteristics.
Further research is recommended to determine whether the cache-friendly DZ
algorithm should become the standard implementation of the DZ algorithm.
In addition, we hope that the development of our benchmark platform has
produced a technique that can be used by researchers in future studies to conduct benchmark tests.
AFRIKAANSE OPSOMMING (translated): Pattern matching is used to search for a sequence of consecutive characters in a block
of text. It is used extensively in computer programs, for example in text editors.
Traditionally, there are three distinct families of pattern matching algorithms: the Boyer-Moore (BM) family,
the Knuth-Morris-Pratt (KMP) family, and the Rabin-Karp (RK) family. The base
algorithms in these families were already developed in the 1970s and 1980s.
However, a new family of pattern matching algorithms, known as the Dead-Zone (DZ) family, has recently been developed. A previous study
proved that DZ algorithms are able to perform pattern matching with fewer match attempts than the well-known Horspool algorithm, which is a derivative of the BM algorithm.
The main aim of this study was to investigate the DZ family of algorithms empirically.
A benchmark platform was developed to compare variants of the DZ
algorithm with existing pattern matching algorithms. Initial experiments were performed with four C implementations of the DZ algorithm:
two of the implementations are recursive and the other two are iterative. Thereafter, DZ variants were developed that make use of different shift functions. Two parallel variants of DZ were also developed: one makes use of Pthreads and the other is implemented in CUDA.
Furthermore, the C version of the basic DZ algorithm was fine-tuned to determine whether
the code was optimal.
The benchmark results indicate that the C implementations of the iterative DZ variants performed favourably against the traditional pattern matching algorithms.
Both iterative algorithms beat the traditional pattern matching algorithms when tested with relatively short patterns. The performance of different shift functions was also investigated. The only case in which the DZ algorithms performed better than the traditional algorithm was for pattern lengths of 65536 characters. Contrary to our expectations, the parallel
implementations did not always provide a speedup. In fact, the Pthreads variants of DZ were slower than the non-threaded DZ implementations. The
CUDA DZ variants, however, were consistently five times faster than the conventional CPU implementation of Horspool.
The benchmarks also indicated that the original DZ code was close to optimal. However, by using a cache-friendly version, which reduces cache misses by about 20%, the
performance could be improved by approximately 5% for relatively short patterns (up to
128 characters with natural language text). Furthermore, for all the DZ variants a cost of recursion and an impact of information sharing were observed, which have been identified as intrinsic DZ
characteristics.
Further research is recommended to determine whether the cache-friendly DZ algorithm should become the standard implementation of the DZ algorithm.
In addition, we hope that the development of our benchmark platform has produced a technique that researchers can use in future studies
to conduct benchmark tests.
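The dead-zone idea described above can be illustrated with a short sketch. This is not the thesis code: it is a minimal recursive variant, assuming Horspool-style shift tables for the rightward shift (keyed on the last aligned character) and a mirrored first-occurrence table for the leftward shift.

```python
def dz_search(text, pattern):
    """Illustrative recursive dead-zone search (a sketch, not the thesis code).

    A live zone is a half-open interval [lo, hi) of candidate start positions.
    We probe the middle of the zone, then use two shift tables to mark a
    "dead zone" around the probe in which no match can start, and recurse
    on the two surviving live zones."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Rightward shift: Horspool-style table on the last aligned character.
    right = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    # Leftward shift: first occurrence (index >= 1) of the first aligned character.
    left = {}
    for i in range(m - 1, 0, -1):
        left[pattern[i]] = i
    matches = []

    def search(lo, hi):
        if lo >= hi:
            return
        j = (lo + hi) // 2                  # probe the middle of the live zone
        if text[j:j + m] == pattern:
            matches.append(j)
        sr = right.get(text[j + m - 1], m)  # no match starts in (j, j + sr)
        sl = left.get(text[j], m)           # no match starts in (j - sl, j)
        search(lo, j - sl + 1)              # left live zone
        search(j + sr, hi)                  # right live zone

    search(0, n - m + 1)
    return sorted(matches)
```

For example, `dz_search("abracadabra", "abra")` returns `[0, 7]` after probing only a handful of positions; the divide-and-conquer structure is also what makes the parallel (Pthreads/CUDA) variants natural, since the two live zones are independent.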
Using human computation in dead-zone based 2D pattern matching
Abstract. This paper examines the application of human computation (HC) to two-dimensional image pattern matching. The two main goals of our algorithm are to use human workers ("turkers") as the processing units to perform an efficient pattern match attempt on a subsection of an image, and to divide the work using a version of dead-zone based pattern matching. In this approach, human computation presents an alternative to machine learning by outsourcing computationally difficult work to humans, while the dead-zone search offers an efficient search paradigm open to parallelization, making the combination a powerful approach for searching for patterns in two-dimensional images.
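A baseline for dividing such work among independent workers, simpler than the paper's dead-zone division, is plain overlapping tiling: each worker owns a block of candidate top-left corners and receives just enough extra pixels to test them all locally. The function name and parameters below are illustrative, not from the paper.

```python
def tile_jobs(width, height, pw, ph, tw, th):
    """Split a width x height image into independent pattern-match jobs.

    Each job owns a tw x th block of candidate top-left corners for a
    pw x ph pattern, and is handed (pw - 1, ph - 1) extra pixels of
    overlap so it can test all of its corners without its neighbours."""
    jobs = []
    for y0 in range(0, height - ph + 1, th):
        for x0 in range(0, width - pw + 1, tw):
            x1 = min(x0 + tw - 1 + pw - 1, width - 1)
            y1 = min(y0 + th - 1 + ph - 1, height - 1)
            jobs.append((x0, y0, x1, y1))   # inclusive pixel bounds of the tile
    return jobs
```

Because the tiles overlap by exactly the pattern size minus one, every candidate corner is tested by exactly one worker, so results can be merged with no deduplication.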
Optimising Structured P2P Networks for Complex Queries
With network-enabled consumer devices becoming increasingly popular, the number of connected devices and available services is growing considerably, with the number of connected devices estimated to surpass 15 billion devices by 2015. In this increasingly large and dynamic environment it is important that users have a comprehensive, yet efficient, mechanism to discover services.
Many existing wide-area service discovery mechanisms are centralised and do not scale to large numbers of users. Additionally, centralised services suffer from issues such as a single point of failure, high maintenance costs, and difficulty of management. As such, this Thesis seeks a Peer to Peer (P2P) approach.
Distributed Hash Tables (DHTs) are well known for their high scalability, financially low barrier of entry, and ability to self manage. They can be used to provide not just a platform on which peers can offer and consume services, but also as a means for users to discover such services.
Traditionally DHTs provide a distributed key-value store, with no search functionality. In recent years many P2P systems have been proposed providing support for a sub-set of complex query types, such as keyword search, range queries, and semantic search.
This Thesis presents a novel algorithm for performing any type of complex query, from keyword search, to complex regular expressions, to full-text search, over any structured P2P overlay. This is achieved by efficiently broadcasting the search query, allowing each peer to process the query locally, and then efficiently routing responses back to the originating peer. Through experimentation, this technique is shown to be successful when the network is stable, however performance degrades under high levels of network churn.
To address the issue of network churn, this Thesis proposes a number of enhancements which can be made to existing P2P overlays in order to improve the performance of both the existing DHT and the proposed algorithm. Through two case studies these enhancements are shown to improve not only the performance of the proposed algorithm under churn, but also the performance of traditional lookup operations in these networks
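The efficient-broadcast step can be sketched on a Chord-like ring: each node forwards the query to its fingers, bounding each forward by the next finger's identifier so every node receives the query exactly once. The ring size, node identifiers, and function names below are illustrative assumptions, not the Thesis's implementation.

```python
M = 5                     # identifier bits; ring of size 2**M (illustrative)

def in_between(x, a, b):
    """True if x lies on the ring strictly between a and b, clockwise."""
    return a < x < b if a < b else x > a or x < b

def successor(nodes, k):
    """First live node at or clockwise after identifier k."""
    cands = sorted(nodes)
    return next((n for n in cands if n >= k % (1 << M)), cands[0])

def fingers(nodes, n):
    """Chord-style finger table: successor(n + 2**i), deduplicated,
    in order of increasing ring distance from n."""
    ft = []
    for i in range(M):
        f = successor(nodes, n + (1 << i))
        if f != n and f not in ft:
            ft.append(f)
    return ft

def broadcast(nodes, root):
    """Broadcast with limits: each node forwards to its in-range fingers,
    bounding each finger by the next one, so every node is reached
    exactly once (N - 1 messages in total)."""
    reached = []

    def relay(n, limit):
        reached.append(n)
        ft = [f for f in fingers(nodes, n) if in_between(f, n, limit)]
        for f, nxt in zip(ft, ft[1:] + [limit]):
            relay(f, nxt)

    relay(root, root)     # the interval (root, root) covers the rest of the ring
    return reached
```

Each peer then evaluates the complex query locally and routes its answer back to the originating peer, which is why the scheme works for any query type the peers themselves can evaluate.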
Searching for patterns in Conway's Game of Life
Conway’s Game of Life (Life) is a simple cellular automaton, discovered by John Conway in 1970, that exhibits complex emergent behavior. Life enthusiasts have been looking for building blocks with specific properties (patterns) to answer unsolved problems in Life for the past five decades. Finding patterns in Life is difficult due to the large search space. Current search algorithms use an explorative approach based on the rules of the game, but this can only sample a small fraction of the search space. More recently, people have used SAT solvers to search for patterns. These solvers are not specifically tuned to this problem and thus waste a lot of time processing Life’s rules in an engine that does not understand them. We propose a novel SAT-based approach that replaces the binary tree used by traditional SAT solvers with a grid-based approach, complemented by an injection of Game of Life specific knowledge. This leads to a significant speedup in searching. As a fortunate side effect, our solver can be generalized to solve general SAT problems. Because it is grid-based, all manipulations are embarrassingly parallel, allowing implementation on massively parallel hardware.
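Any such search, SAT-based or explorative, ultimately has to check candidates against Life's rules: a cell is alive in the next generation iff it has exactly three live neighbours, or two live neighbours and is currently alive. A minimal sparse-set step function makes the rule concrete:

```python
from collections import Counter

def life_step(cells):
    """One generation of Conway's Game of Life on a sparse set of live
    (x, y) cells: count live neighbours of every relevant cell, then
    apply the birth (3) and survival (2 or 3) rules."""
    neigh = Counter((x + dx, y + dy)
                    for x, y in cells
                    for dx in (-1, 0, 1)
                    for dy in (-1, 0, 1)
                    if (dx, dy) != (0, 0))
    return {c for c, k in neigh.items() if k == 3 or (k == 2 and c in cells)}
```

For example, the blinker `{(0, 0), (1, 0), (2, 0)}` flips to a vertical bar and back, returning to itself after two steps, which is exactly the kind of constraint (state at time t+1 determined by state at time t) that the SAT encodings mentioned above express as clauses.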
Design of an Offline Handwriting Recognition System Tested on the Bangla and Korean Scripts
This dissertation presents a flexible and robust offline handwriting recognition system which is tested on the Bangla and Korean scripts. Offline handwriting recognition is one of the most challenging and yet to be solved problems in machine learning. While a few popular scripts (like Latin) have received a lot of attention, many other widely used scripts (like Bangla) have seen very little progress. Features such as connectedness and vowels structured as diacritics make it a challenging script to recognize. A simple and robust design for offline recognition is presented which not only works reliably, but also can be used for almost any alphabetic writing system. The framework has been rigorously tested for Bangla and demonstrated how it can be transformed to apply to other scripts through experiments on the Korean script whose two-dimensional arrangement of characters makes it a challenge to recognize.
The base of this design is a character spotting network which detects the location of different script elements (such as characters, diacritics) from an unsegmented word image. A transcript is formed from the detected classes based on their corresponding location information. This is the first reported lexicon-free offline recognition system for Bangla and achieves a Character Recognition Accuracy (CRA) of 94.8%. This is also one of the most flexible architectures ever presented. Recognition of Korean was achieved with a 91.2% CRA. Also, a powerful technique of autonomous tagging was developed which can drastically reduce the effort of preparing a dataset for any script. The combination of the character spotting method and the autonomous tagging brings the entire offline recognition problem very close to a singular solution.
Additionally, a database named the Boise State Bangla Handwriting Dataset was developed. This is one of the richest offline datasets currently available for Bangla and this has been made publicly accessible to accelerate the research progress. Many other tools were developed and experiments were conducted to more rigorously validate this framework by evaluating the method against external datasets (CMATERdb 1.1.1, Indic Word Dataset and REID2019: Early Indian Printed Documents). Offline handwriting recognition is an extremely promising technology and the outcome of this research moves the field significantly ahead
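The transcript-forming step described above (ordering detected classes by their locations and attaching diacritics to base characters) can be sketched as follows. The tuple layout and the nearest-base attachment rule are illustrative simplifications, not the dissertation's actual method.

```python
def build_transcript(detections):
    """Form a transcript from character-spotting detections.

    detections: (label, x_center, is_diacritic) tuples for one word image.
    Base characters are ordered left to right; each diacritic is emitted
    after the base character whose centre is nearest to it (a hypothetical
    attachment rule used here for illustration only)."""
    bases = sorted((d for d in detections if not d[2]), key=lambda d: d[1])
    marks = [d for d in detections if d[2]]
    out = []
    for base in bases:
        out.append(base[0])
        for mark in marks:
            # attach the diacritic if this base is its nearest base
            if min(bases, key=lambda b: abs(b[1] - mark[1])) == base:
                out.append(mark[0])
    return "".join(out)
```

The point of the design is that the detector never needs segmented characters: location information alone is enough to recover reading order, which is what makes the system lexicon-free.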
The OCaml system release 5.0: Documentation and user's manual
This manual documents the release 5.0 of the OCaml system. It is organized as follows. Part I, "An introduction to OCaml", gives an overview of the language. Part II, "The OCaml language", is the reference description of the language. Part III, "The OCaml tools", documents the compilers, toplevel system, and programming utilities. Part IV, "The OCaml library", describes the modules provided in the standard library. Part V, "Indexes", contains an index of all identifiers defined in the standard library, and an index of keywords.