1,453 research outputs found
DALiuGE: A Graph Execution Framework for Harnessing the Astronomical Data Deluge
The Data Activated Liu Graph Engine - DALiuGE - is an execution framework for
processing large astronomical datasets at a scale required by the Square
Kilometre Array Phase 1 (SKA1). It includes an interface for expressing complex
data reduction pipelines consisting of both data sets and algorithmic
components and an implementation run-time to execute such pipelines on
distributed resources. By mapping the logical view of a pipeline to its
physical realisation, DALiuGE separates the concerns of multiple stakeholders,
allowing them to collectively optimise large-scale data processing solutions in
a coherent manner. The execution in DALiuGE is data-activated, where each
individual data item autonomously triggers the processing on itself. Such
decentralisation also makes the execution framework very scalable and flexible,
supporting pipeline sizes ranging from less than ten tasks running on a laptop
to tens of millions of concurrent tasks on the second fastest supercomputer in
the world. DALiuGE has been used in production for reducing interferometry data
sets from the Karl E. Jansky Very Large Array and the Mingantu Ultrawide
Spectral Radioheliograph; and is being developed as the execution framework
prototype for the Science Data Processor (SDP) consortium of the Square
Kilometre Array (SKA) telescope. This paper presents a technical overview of
DALiuGE and discusses case studies from the CHILES and MUSER projects that use
DALiuGE to execute production pipelines. In a companion paper, we provide
in-depth analysis of DALiuGE's scalability to very large numbers of tasks on
two supercomputing facilities.Comment: 31 pages, 12 figures, currently under review by Astronomy and
Computin
Flocking: A framework for declarative music-making on the Web
(Abstract to follow
Modern web-programming language concurrency
This Masters Thesis compares Elixir, Go and JavaScript (Node.js) as programming language candi- dates for writing concurrent RESTful webservice backends. First we describe each of the languages. Next we compare the functional concurrency characteristics of the languages to each other. Finally we do scalability testing for each of the languages. Scalability testing is done using the Locust.io framework. For testing purposes we introduce for simple REST-api implementations for each of the languages. Result from the tests was that JavaScript performed the worst of the languages and Go was the most verbose language to program with
An All-in-One Debugging Approach: Java Debugging, Execution Visualization and Verification
We devise a widely applicable debugging approach to deal with the prevailing issue that bugs cannot be precisely reproduced in nondeterministic complex concurrent programs. A distinct efficient record-and-playback mechanism is designed to record all the internal states of execution including intermediate results by injecting our own bytecode, which does not affect the source code, and, through a two-step data processing mechanism, these data will be aggregated, structured and parallel processed for the purpose of replay in high fidelity while keeping the overhead at a satisfactory level. Docker and Git are employed to create a clean environment such that the execution will be undertaken repeatedly with a maximum precision of reproducing bugs. In our development, several other forefront technologies, such as MongoDB, Spark and Node.js are utilized and smoothly integrated for easy implementation. Altogether, we develop a system for Java Debugging Execution Visualization and Verification (JDevv), a debugging tool for Java although our debugging approach can apply to other languages as well. JDevv also offers an aggregated and interactive visualization for the ease of users’ code verification
Joining and aggregating datasets using CouchDB
Data mining typically requires implementing operations that involve cross-cutting entity boundaries and are awkward to implement in document-oriented databases. CouchDB, for example, models entities as documents, with highly isolated entity boundaries, and on which joins cannot be directly performed. This project shows how join and aggregation can be achieved across entity boundaries in such systems, as encountered for example in the pre-processing and exploration stages of educational data mining. A software stack is presented as a means by which this can be achieved; first, datasets are processed via ETL operations, then MapReduce is used to create indices of ordered and aggregated data. Finally, a Couchdb list function is used to iterate through these indices and perform joins, and to compute aggregated values on joined datasets such as variance and correlations. In terms of the case study, it is shown that the proposed approach to implementing cross-document joins and aggregation is effective and scalable. In addition, it was discovered that for the 2014 - 2016 UCT cohorts, NBT scores correlate better with final grades for the CSC1015F course than do Grade 12 results for English, Science and Mathematics
Optimizing sequences traversal and extensibility
Dissertação para obtenção do Grau de Mestre em Engenharia Informática e de ComputadoresGeradores yield são uma característica de programação bem conhecida, disponível na maioria dos ambientes de programação usados, como JavaScript, Python e muitos outros. Permitem uma extensibilidade fácil e compacta em operações de streams, como em iteradores ou tipos enumeráveis. Ainda assim, surgem duas
questões sobre a sua utilização: 1) Os geradores são a melhor escolha para estender sequências com novas operações definidas pelo programador? 2) E se as linguagens de programação de desenvolvimento não fornecerem geradores yield, como em Java? O trabalho de pesquisa que descrevo nesta dissertação visa responder a essas duas questões. Para tal, analisei dois desenhos de tipo de sequência de linguagens de programação diferentes, nomeadamente, Java e Javascript. Além disso, estudei as alternativas mais utilizadas às sequências incluídas em cada linguagem, num conjunto de características, criando benchmarks para analisar o desempenho de cada uma em casos de utilização baseados no mundo real, disponíveis
para cada programador poder usar quando quiser escolher um tipo de sequência de acordo com suas necessidades. Para além disto, proponho a minha própria solução para um tipo de sequência, baseado num desenho minimalista que permite não só a extensão concisa da sua API como o encadeamento fluente de operações definidas pelo utilizador. A minha proposta tem como objectivo ser quão simples e transparente quanto possível, para que qualquer programador consiga perceber claramente aquilo que está a usar.Por fim, respondo à questão "Quando se deve usar paralelismo?"com um conjunto de benchmarks que comparam o processamento sequencial das Streams do Java com o seu processamento paralelo.Yield generators are a well-known programming feature available in most used programming environments such as JavaScript, Python and many others. They allow easy and compact extensibility on streams operations such as on iterators or enumerable types. Yet, two questions arise about their use: 1) are generators the most efficient choice to extend sequences with new user-defined operations? 2) What if the development programming languages does not provide the yield feature, such as in Java? The research work that I describe in this dissertation aims to answer these two questions. To that end, I analyzed two different programming languages designs for a sequence type, Java and Javascript. Also, I studied the state-of-the-art alternatives to the out-of-the-box sequences included in each language, in a set of features, devising benchmarks to analyze their performance with real world usecases, available for developers to use when choosing a sequence type according to their needs. Not only that but, I also propose my own solution of a sequence type, based on a minimalist design that both allows for verboseless extension as well as fluent chaining of new operations. My proposal aims to be as simple and transparent as possible so the developer may clearly understand what he is using. Finally, I answer the question "When should you use parallelism?" with a set of benchmarks that compare Java Streams sequential processing with its parallel counterpart.N/
Continuation-Passing C: compiling threads to events through continuations
In this paper, we introduce Continuation Passing C (CPC), a programming
language for concurrent systems in which native and cooperative threads are
unified and presented to the programmer as a single abstraction. The CPC
compiler uses a compilation technique, based on the CPS transform, that yields
efficient code and an extremely lightweight representation for contexts. We
provide a proof of the correctness of our compilation scheme. We show in
particular that lambda-lifting, a common compilation technique for functional
languages, is also correct in an imperative language like C, under some
conditions enforced by the CPC compiler. The current CPC compiler is mature
enough to write substantial programs such as Hekate, a highly concurrent
BitTorrent seeder. Our benchmark results show that CPC is as efficient, while
using significantly less space, as the most efficient thread libraries
available.Comment: Higher-Order and Symbolic Computation (2012). arXiv admin note:
substantial text overlap with arXiv:1202.324
ImageJ2: ImageJ for the next generation of scientific image data
ImageJ is an image analysis program extensively used in the biological
sciences and beyond. Due to its ease of use, recordable macro language, and
extensible plug-in architecture, ImageJ enjoys contributions from
non-programmers, amateur programmers, and professional developers alike.
Enabling such a diversity of contributors has resulted in a large community
that spans the biological and physical sciences. However, a rapidly growing
user base, diverging plugin suites, and technical limitations have revealed a
clear need for a concerted software engineering effort to support emerging
imaging paradigms, to ensure the software's ability to handle the requirements
of modern science. Due to these new and emerging challenges in scientific
imaging, ImageJ is at a critical development crossroads.
We present ImageJ2, a total redesign of ImageJ offering a host of new
functionality. It separates concerns, fully decoupling the data model from the
user interface. It emphasizes integration with external applications to
maximize interoperability. Its robust new plugin framework allows everything
from image formats, to scripting languages, to visualization to be extended by
the community. The redesigned data model supports arbitrarily large,
N-dimensional datasets, which are increasingly common in modern image
acquisition. Despite the scope of these changes, backwards compatibility is
maintained such that this new functionality can be seamlessly integrated with
the classic ImageJ interface, allowing users and developers to migrate to these
new methods at their own pace. ImageJ2 provides a framework engineered for
flexibility, intended to support these requirements as well as accommodate
future needs
- …