Distributed Web Service Coordination for Collaboration Applications and Biological Workflows
In this dissertation work, we have investigated the main research thrust of decentralized coordination of workflows over web services. To address distributed workflow coordination, we first developed “Web Coordination Bonds”, a capable set of dependency modeling primitives that enable each web service to manage its own dependencies. Web bond primitives are as powerful as extended Petri nets and have sufficient modeling and expressive capability to model workflow dependencies. We have designed and prototyped our “Web Service Coordination Management Middleware” (WSCMM) system, which enhances the current web services infrastructure to accommodate web-bond-enabled web services. Finally, based on the core concepts of web coordination bonds and WSCMM, we have developed the “BondFlow” system, which allows easy configuration and distributed coordination of workflows. The footprint of the BondFlow runtime is 24 KB, and the additional third-party software packages, a SOAP client and an XML parser, account for 115 KB.
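The abstract compares web bond primitives to extended Petri nets. Purely as an illustration of that style of dependency modeling (all names here are hypothetical, not the WSCMM or BondFlow API), a transition that fires only when every input dependency is satisfied can be sketched as:

```python
# Illustrative Petri-net-style dependency firing; names are hypothetical,
# not the actual web bond primitives.

class Transition:
    """A workflow step that fires only when every input place holds a token."""
    def __init__(self, name, inputs, outputs):
        self.name, self.inputs, self.outputs = name, inputs, outputs

    def enabled(self, marking):
        return all(marking.get(p, 0) > 0 for p in self.inputs)

    def fire(self, marking):
        if not self.enabled(marking):
            raise RuntimeError(f"{self.name}: dependencies not satisfied")
        for p in self.inputs:                       # consume input tokens
            marking[p] -= 1
        for p in self.outputs:                      # produce output tokens
            marking[p] = marking.get(p, 0) + 1
        return marking

# A two-step workflow: 'reply' may only fire after 'invoke' has fired.
marking = {"request": 1}
invoke = Transition("invoke", ["request"], ["invoked"])
reply = Transition("reply", ["invoked"], ["done"])
invoke.fire(marking)
reply.fire(marking)
print(marking)  # {'request': 0, 'invoked': 0, 'done': 1}
```

The enabling check is what lets each step manage its own dependencies locally, which is the property the dissertation attributes to web bonds.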
An evaluation of galaxy and ruffus-scripting workflows system for DNA-seq analysis
Magister Scientiae - MSc
Functional genomics determines the biological functions of genes on a global scale by
using large volumes of data obtained through techniques including next-generation
sequencing (NGS). The application of NGS in biomedical research is gaining in
momentum, and with its adoption becoming more widespread, there is an increasing
need for access to customizable computational workflows that can simplify, and offer
access to, computer-intensive analyses of genomic data. In this study, pipelines in the Galaxy and
Ruffus frameworks were designed and implemented with a view to addressing the
challenges faced in biomedical research. Galaxy, a graphical web-based framework,
allows researchers to build a graphical NGS data analysis pipeline for accessible,
reproducible, and collaborative data-sharing. Ruffus, a UNIX command-line framework
used by bioinformaticians as a Python library to write scripts in an object-oriented style,
allows for building a workflow in terms of task dependencies and execution logic. In
this study, a dual data analysis technique was explored which focuses on a comparative
evaluation of Galaxy and Ruffus frameworks that are used in composing analysis
pipelines. To this end, we developed an analysis pipeline in both Galaxy and Ruffus for the
analysis of Mycobacterium tuberculosis sequence data. Furthermore, this study aimed
to compare the Galaxy framework to Ruffus with preliminary analysis revealing that the
analysis pipeline in Galaxy displayed a higher percentage of load and store instructions.
In comparison, pipelines in Ruffus tended to be CPU bound and memory intensive. The
CPU usage, memory utilization, and runtime execution are graphically represented in
this study. Our evaluation suggests that workflow frameworks have distinctly different
features, from ease of use, flexibility, and portability to architectural design.
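The task-dependency style attributed to Ruffus above can be illustrated with a small sketch. This is not the real Ruffus API (which uses decorators such as @transform and pipeline_run); it is a hypothetical runner that executes callables in dependency order:

```python
# Hypothetical sketch of task-dependency pipeline execution; the task names
# are illustrative, not part of any real analysis pipeline.

def run_pipeline(tasks, deps):
    """Execute the callables in `tasks`, honoring the prerequisites in `deps`."""
    done, order = set(), []

    def visit(name):
        if name in done:
            return
        for prereq in deps.get(name, []):  # run prerequisites first
            visit(prereq)
        tasks[name]()
        done.add(name)
        order.append(name)

    for name in tasks:
        visit(name)
    return order

log = []
tasks = {
    "trim_reads":    lambda: log.append("trim"),
    "align_reads":   lambda: log.append("align"),
    "call_variants": lambda: log.append("call"),
}
deps = {"align_reads": ["trim_reads"], "call_variants": ["align_reads"]}
order = run_pipeline(tasks, deps)
print(order)  # ['trim_reads', 'align_reads', 'call_variants']
```

Declaring only the edges and letting the runner derive the execution order is the "task dependencies and execution logic" approach the abstract describes.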
A Semantic Framework for Declarative and Procedural Knowledge
In any scientific domain, the full set of data and programs has reached an "-ome" status, i.e. it has grown massively. The original article on the Semantic Web describes the evolution of a Web of actionable information, i.e. information derived from data through a semantic theory for interpreting the symbols. In a Semantic Web, methodologies are studied for describing, managing and analyzing both resources (domain knowledge) and applications (operational knowledge) - without any restriction on what and where they are respectively suitable and available in the Web - as well as for realizing automatic and semantic-driven workflows of Web applications elaborating Web resources.
This thesis attempts to provide a synthesis among Semantic Web technologies, Ontology Research, and Knowledge and Workflow Management. Such a synthesis is represented by Resourceome, a Web-based framework consisting of two components which strictly interact with each other: an ontology-based and domain-independent knowledge management system (Resourceome KMS) - relying on a knowledge model where resource and operational knowledge are contextualized in any domain - and a semantic-driven workflow editor, manager and agent-based execution system (Resourceome WMS).
The Resourceome KMS and the Resourceome WMS are exploited in order to realize semantic-driven formulations of workflows, where activities are semantically linked to any involved resource. On the whole, combining the use of domain ontologies and workflow techniques, Resourceome provides a flexible domain and operational knowledge organization, a powerful engine for semantic-driven workflow composition, and a distributed, automatic and transparent environment for workflow execution.
Graphical programming system for dataflow language
Dataflow languages are languages that support the notion of data flowing from one operation to another. The flow concept gives dataflow languages the advantage of representing dataflow programs in graphical forms. This thesis presents a graphical programming system that supports the editing and simulating of dataflow programs. The system is implemented on an AT&T Unix™ PC.
A high-level graphical dataflow language, the GDF language, is defined in this thesis. In the GDF language, all operators are represented in graphical form. A graphical dataflow program is formed by drawing the operators and connecting arcs between them in the Graphical Editor provided by the system. The system also supports a simulator for simulating the execution of a dataflow program, allowing a user to discover the power of concurrency and parallel processing. Several simulation control options are offered to facilitate the debugging of dataflow programs.
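Token-driven execution of the kind such a simulator supports can be sketched as follows; the node names and wiring are illustrative and are not part of the GDF language:

```python
# Minimal token-driven dataflow simulation: a node fires as soon as all of
# its input arcs hold tokens. Illustrative only; not the GDF simulator.

def simulate(nodes, arcs, tokens):
    """nodes: name -> function of input values
    arcs:  name -> (list of input arc names, output arc name)
    tokens: arc name -> value (initial tokens); returns final tokens."""
    fired = True
    while fired:
        fired = False
        for name, fn in nodes.items():
            ins, out = arcs[name]
            if out not in tokens and all(a in tokens for a in ins):
                tokens[out] = fn(*(tokens[a] for a in ins))  # fire the node
                fired = True
    return tokens

# (a + b) * 2, expressed as two operators that run on token arrival
nodes = {"add": lambda x, y: x + y, "double": lambda v: v * 2}
arcs = {"add": (["a", "b"], "sum"), "double": (["sum"], "result")}
result = simulate(nodes, arcs, {"a": 3, "b": 4})
print(result["result"])  # 14
```

Because firing depends only on token availability, independent nodes could fire concurrently, which is the parallelism the simulator is meant to expose.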
Enabling Security Analysis and Education of the Ethereum Platform: A Network Traffic Dissection Tool
Ethereum, the decentralized global software platform powered by blockchain technology and known for its native cryptocurrency, Ether (ETH), provides a technology stack for building apps, holding assets, transacting, and communicating without control by a central authority. At the core of Ethereum’s network is a suite of purpose-built protocols known as DEVP2P, which gives the nodes in an Ethereum network the ability to discover, authenticate, and communicate with each other confidentially. This document discusses the creation of a new Wireshark dissector for DEVP2P’s discovery protocols, DiscoveryV4 and DiscoveryV5, and a dissector for RLPx, an extensible TCP transport protocol for a range of Ethereum node capabilities. Network protocol analyzers like Wireshark are commonly used for education, development, and analysis of underlying network traffic. In support of creating the dissector, a custom private Ethereum Docker network was also created, facilitating communication amongst Go Ethereum execution clients and allowing the Wireshark dissector to capture live network data. Lastly, the dissector is used to understand the differences between DiscoveryV4 and DiscoveryV5, and to step through the network packets of RLPx to track a transaction executed on the network.
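A dissector for these protocols must decode RLP, Ethereum's serialization format for wire payloads. As background, a minimal decoder following the published RLP encoding rules (strings and lists only, with no validation of canonical encodings) might look like:

```python
# Minimal RLP decoder per the Ethereum wire-format rules; a sketch for
# illustration, not the dissector described in the abstract.

def rlp_decode(data, pos=0):
    """Decode one RLP item starting at `pos`; return (item, next_pos)."""
    prefix = data[pos]
    if prefix < 0x80:                       # a single byte encodes itself
        return bytes([prefix]), pos + 1
    if prefix < 0xb8:                       # short string, 0-55 bytes
        n = prefix - 0x80
        return data[pos + 1: pos + 1 + n], pos + 1 + n
    if prefix < 0xc0:                       # long string: length-of-length first
        ll = prefix - 0xb7
        n = int.from_bytes(data[pos + 1: pos + 1 + ll], "big")
        start = pos + 1 + ll
        return data[start: start + n], start + n
    if prefix < 0xf8:                       # short list, payload 0-55 bytes
        n, start = prefix - 0xc0, pos + 1
    else:                                   # long list
        ll = prefix - 0xf7
        n = int.from_bytes(data[pos + 1: pos + 1 + ll], "big")
        start = pos + 1 + ll
    items, end = [], start + n              # decode items until payload ends
    while start < end:
        item, start = rlp_decode(data, start)
        items.append(item)
    return items, end

# RLP for the list ["cat", "dog"]: 0xc8 list prefix, then two short strings
decoded, _ = rlp_decode(bytes.fromhex("c88363617483646f67"))
print(decoded)  # [b'cat', b'dog']
```

A real dissector layers protocol-specific field interpretation on top of this generic decoding step.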
GRASP/Ada (Graphical Representations of Algorithms, Structures, and Processes for Ada): The development of a program analysis environment for Ada. Reverse engineering tools for Ada, task 1, phase 2
The study, formulation, and generation of structures for Ada (GRASP/Ada) are discussed in this second phase report of a three-phase effort. Various graphical representations that can be extracted or generated from source code are described and categorized, with a focus on reverse engineering. The overall goal is to provide the foundation for a CASE (computer-aided software engineering) environment in which reverse engineering and forward engineering (development) are tightly coupled. Emphasis is on a subset of architectural diagrams that can be generated automatically from source code, with the control structure diagram (CSD) included for completeness.
Next-generation information systems for genomics
NIH Grant no. HG00739
The advent of next-generation sequencing technologies is transforming
biology by enabling individual researchers to sequence the
genomes of individual organisms or cells on a massive scale. In order
to realize the translational potential of this technology we will need
advanced information systems to integrate and interpret this deluge
of data. These systems must be capable of extracting the location and
function of genes and biological features from genomic data, requiring
the coordinated parallel execution of multiple bioinformatics analyses
and intelligent synthesis of the results. The resulting databases must
be structured to allow complex biological knowledge to be recorded
in a computable way, which requires the development of logic-based
knowledge structures called ontologies. To visualise and manipulate
the results, new graphical interfaces and knowledge acquisition tools
are required. Finally, to help understand complex disease processes,
these information systems must be equipped with the capability to
integrate and make inferences over multiple data sets derived from
numerous sources.
RESULTS:
Here I describe research, design and implementation of some of
the components of such a next-generation information system. I first
describe the automated pipeline system used for the annotation of
the Drosophila genome, and the application of this system in genomic
research. This was succeeded by the development of a flexible graph-oriented
database system called Chado, which relies on the use of
ontologies for structuring data and knowledge. I also describe research
to develop, restructure and enhance a number of biological
ontologies, adding a layer of logical semantics that increases the computability
of these key knowledge sources. The resulting database and
ontology collection can be accessed through a suite of tools. Finally
I describe how the combination of genome analysis, ontology-based
database representation and powerful tools can be combined in order
to make inferences about genotype-phenotype relationships within and
across species.
CONCLUSION:
The large volumes of complex data generated by high-throughput
genomic and systems biology technology threaten to overwhelm us
unless we can devise better computing tools to assist with their analysis.
Ontologies are key technologies, but many existing ontologies are
not interoperable or lack features that make them computable. Here
I have shown how concerted ontology, tool and database development
can be applied to make inferences of value to translational research.
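The kind of inference that computable ontologies enable, returning implied rather than merely asserted relationships, can be sketched as a transitive closure over is_a links (toy terms below, not Chado's actual schema):

```python
# Transitive closure over is_a edges: a toy sketch of ontology inference,
# not the Chado database or any real ontology's term set.

def ancestors(term, is_a):
    """All terms reachable from `term` by following is_a edges upward."""
    seen, stack = set(), [term]
    while stack:
        for parent in is_a.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# Hypothetical asserted hierarchy
is_a = {
    "tRNA gene":  ["ncRNA gene"],
    "ncRNA gene": ["gene"],
    "gene":       ["sequence feature"],
}
anc = sorted(ancestors("tRNA gene", is_a))
print(anc)  # ['gene', 'ncRNA gene', 'sequence feature']
```

A query for "all genes" that uses this closure would correctly return a tRNA gene even though only the direct ncRNA link was asserted, which is the computability property the abstract argues for.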
Dataflow development of medium-grained parallel software
PhD Thesis
In the 1980s, multiple-processor computers (multiprocessors) based on conventional
processing elements emerged as a popular solution to the continuing demand
for ever-greater computing power. These machines offer a general-purpose parallel
processing platform on which the size of program units which can be efficiently
executed in parallel - the "grain size" - is smaller than that offered by distributed
computing environments, though greater than that of some more specialised
architectures. However, programming to exploit this medium-grained parallelism
remains difficult. Concurrent execution is inherently complex, yet there is a lack of
programming tools to support parallel programming activities such as program
design, implementation, debugging, performance tuning and so on.
In helping to manage complexity in sequential programming, visual tools have
often been used to great effect, which suggests one approach towards the goal of
making parallel programming less difficult.
This thesis examines the possibilities which the dataflow paradigm has to offer
as the basis for a set of visual parallel programming tools, and presents a dataflow
notation designed as a framework for medium-grained parallel programming. The
implementation of this notation as a programming language is discussed, and its
suitability for the medium-grained level is examined.
Science and Engineering Research Council of Great Britain
EC ERASMUS scheme
Quality measures and assurance for AI (Artificial Intelligence) software
This report is concerned with the application of software quality and evaluation measures to AI software and, more broadly, with the question of quality assurance for AI software. Considered are not only the metrics that attempt to measure some aspect of software quality, but also the methodologies and techniques (such as systematic testing) that attempt to improve some dimension of quality without necessarily quantifying the extent of the improvement. The report is divided into three parts. Part 1 reviews existing software quality measures, i.e., those that have been developed for, and applied to, conventional software. Part 2 considers the characteristics of AI software, the applicability and potential utility of the measures and techniques identified in the first part, and reviews the few methods developed specifically for AI software. Part 3 presents an assessment and recommendations for the further exploration of this important area.
A visual object-oriented environment for LISP.
by Leong Hong Va. Thesis (M.Phil.)--Chinese University of Hong Kong, 1989. Bibliography: leaves 142-146.