14 research outputs found

    The Evolution of myExperiment

    The myExperiment social website for sharing scientific workflows, designed according to Web 2.0 principles, has grown to be the largest public repository of its kind. It is distinctive for its focus on sharing methods, its researcher-centric design and its facility to aggregate content into sharable 'research objects'. This evolution of myExperiment has occurred hand in hand with its users. myExperiment now supports Linked Data as a step toward our vision of the future research environment, which we categorise here as '3rd generation e-Research'

    Workflow-centric research objects: First class citizens in scholarly discourse.

    A workflow-centric research object bundles a workflow, the provenance of the results obtained by its enactment, other digital objects that are relevant for the experiment (papers, datasets, etc.), and annotations that semantically describe all these objects. In this paper, we propose a model to specify workflow-centric research objects, and show how the model can be grounded using semantic technologies and existing vocabularies, in particular the Object Reuse and Exchange (ORE) model and the Annotation Ontology (AO). We describe the life-cycle of a research object, which resembles the life-cycle of a scientific experiment

    Programming patterns and development guidelines for Semantic Sensor Grids (SemSorGrid4Env)

    The web of Linked Data holds great potential for the creation of semantic applications that can combine self-describing structured data from many sources including sensor networks. Such applications build upon the success of an earlier generation of 'rapidly developed' applications that utilised RESTful APIs. This deliverable details experience, best practice, and design patterns for developing high-level web-based APIs in support of semantic web applications and mashups for sensor grids. Its main contributions are a proposal for combining Linked Data with RESTful application development summarised through a set of design principles; and the application of these design principles to Semantic Sensor Grids through the development of a High-Level API for Observations. These are supported by implementations of the High-Level API for Observations in software, and example semantic mashups that utilise the API

    Scientific Workflows: Moving Across Paradigms

    Modern scientific collaborations have opened up the opportunity to solve complex problems that require both multidisciplinary expertise and large-scale computational experiments. These experiments typically consist of a sequence of processing steps that need to be executed on selected computing platforms. Execution poses a challenge, however, due to (1) the complexity and diversity of applications, (2) the diversity of analysis goals, (3) the heterogeneity of computing platforms, and (4) the volume and distribution of data. A common strategy to make these in silico experiments more manageable is to model them as workflows and to use a workflow management system to organize their execution. This article looks at the overall challenge posed by a new order of scientific experiments and the systems they need to be run on, and examines how this challenge can be addressed by workflows and workflow management systems. It proposes a taxonomy of workflow management system (WMS) characteristics, including aspects previously overlooked. This frames a review of prevalent WMSs used by the scientific community, elucidates their evolution to handle the challenges arising with the emergence of the “fourth paradigm,” and identifies research needed to maintain progress in this area

    The building and application of a semantic platform for an e-research society

    This thesis reviews the area of e-Research (the use of electronic infrastructure to support research) and considers how the insight gained from the development of social networking sites in the early 21st century might assist researchers in using this infrastructure. In particular it examines the myExperiment project, a website for e-Research that allows users to upload, share and annotate workflows and associated files, using a social networking framework. This Virtual Organisation (VO) supports many of the attributes required to allow a community of users to come together to build an e-Research society. The main focus of the thesis is how the emerging society that is developing out of myExperiment could use Semantic Web technologies to provide users with a significantly richer representation of their research and research processes to better support reproducible research. One of the initial major contributions was building an ontology for myExperiment. Through this it became possible to build an API for generating and delivering this richer representation and an interface for querying it. Having this richer representation, it has been possible to follow Linked Data principles to link up with other projects that have this type of representation. Doing this has allowed additional data to be provided to the user and has begun to set in context the data produced by myExperiment. The way the myExperiment project has gone about this task, and its consideration of how changes may affect existing users, is another major contribution of this thesis. Adding a semantic representation to an emergent e-Research society like myExperiment has given it the potential to provide additional applications, in particular the capability to support Research Objects, an encapsulation of a scientist's research or research process to support reproducibility. The insight gained by adding a semantic representation to myExperiment has allowed this thesis to contribute towards the design of the architecture for these Research Objects that use similar Semantic Web technologies. The myExperiment ontology has been designed such that it can be aligned with other ontologies. Scientific Discourse, the collaborative argumentation of different claims and hypotheses, with the support of evidence from experiments, to construct, confirm or disprove theories, requires the capability to represent experiments carried out in silico. This thesis discusses how, as part of the HCLS Scientific Discourse subtask group, the myExperiment ontology has begun to be aligned with other scientific discourse ontologies to provide this capability. It also compares this alignment of ontologies with the architecture for Research Objects. This thesis also examines how myExperiment's Linked Data and that of other projects can be used in the design of novel interfaces. As a theoretical exercise, it considers how this Linked Data might be used to support a Question-Answering system that would allow users to query myExperiment's data in a more efficient and user-friendly way. It concludes by reviewing all the steps undertaken to provide a semantic platform for an emergent e-Research society to facilitate the sharing of research and its processes to support reproducible research. It assesses their contribution to enhancing the features provided by myExperiment, as well as e-Research as a whole. It considers how the contributions provided by this thesis could be extended to produce additional tools that will allow researchers to make greater use of the rich data that is now available, in a way that enhances their research process rather than significantly changing it or adding extra workload

    A proposed model to analyse risk and return for a large computing system adoption

    This thesis presents Organisational Sustainability Modelling (OSM), a new method to model and analyse risk and return systematically for the adoption of large systems such as Cloud Computing. Return includes improvements in technical efficiency, profitability and service. Risk includes controlled risk (risk-control rate) and uncontrolled risk (beta), although uncontrolled risk cannot be evaluated directly. Three OSM metrics, actual return value, expected return value and risk-control rate, are used to calculate uncontrolled risk. The OSM data collection process, in which hundreds of datasets (rows of data containing three OSM metrics in each row) are used as inputs, is explained. Outputs including standard error, mean squared error, Durbin-Watson, p-value and R-squared value are calculated. Visualisation is used to illustrate quality and accuracy of data analysis. The metrics, process and interpretation of data analysis are presented and the rationale is explained in the review of the OSM method. Three case studies are used to illustrate the validity of OSM:
    • National Health Service (NHS) is a technical application concerned with backing up data files and focuses on improvement in efficiency.
    • Vodafone/Apple is a cost application and focuses on profitability.
    • The iSolutions Group, University of Southampton focuses on service improvement using user feedback.
    The NHS case study is explained in detail. The expected execution time calculated by OSM to complete all backup activity in Cloud-based systems matches actual execution time to within 0.01%. The Cloud system shows improved efficiency in both sets of comparisons. All three case studies confirm there are benefits for the adoption of a large computer system such as the Cloud. Together these demonstrations answer the two research questions for this thesis:
    1. How do you model and analyse risk and return on adoption of large computing systems systematically and coherently?
    2. Can the same method be used in risk mitigation of system adoption?
    Limitations of this study, a reproducibility case, comparisons with similar approaches, research contributions and future work are also presented

    Optimisation of the enactment of fine-grained distributed data-intensive work flows

    The emergence of data-intensive science as the fourth science paradigm has posed a data deluge challenge for enacting scientific workflows. The scientific community is facing an imminent flood of data from the next generation of experiments and simulations, besides dealing with the heterogeneity and complexity of data, applications and execution environments. New scientific workflows involve execution on distributed and heterogeneous computing resources across organisational and geographical boundaries, processing gigabytes of live data streams and petabytes of archived and simulation data, in various formats and from multiple sources. Managing the enactment of such workflows requires not only larger storage space and faster machines, but also the capability to support the scalability and diversity of the users, applications, data, computing resources and enactment technologies. We argue that the enactment process can be made efficient using optimisation techniques in an appropriate architecture. This architecture should support the creation of diversified applications and their enactment on diversified execution environments, with a standard interface, i.e. a workflow language. The workflow language should be both human readable and suitable for communication between the enactment environments. The data-streaming model central to this architecture provides a scalable approach to large-scale data exploitation. Data-flow between computational elements in the scientific workflow is implemented as streams. To cope with the exploratory nature of scientific workflows, the architecture should support fast workflow prototyping, and the re-use of workflows and workflow components. Above all, the enactment process should be easily repeated and automated. In this thesis, we present a candidate data-intensive architecture that includes an intermediate workflow language, named DISPEL. We create a new fine-grained measurement framework to capture performance-related data during enactments, and design a performance database to organise them systematically. We propose a new enactment strategy to demonstrate that optimisation of data-streaming workflows can be automated by exploiting performance data gathered during previous enactments

    Textprozessierung - Design und Applikation

    For a long time, scientific communication and the exchange of research results relied solely on the publication and reception of specialist books and articles. Only in the more recent past have solutions been devised for exchanging the data that underlie the research process as well as the data that result from it. The steadily advancing development of information technology plays a central role in this. In the course of this work, a software system was developed that makes it possible to exchange experiments. This enables a scientist to pass on the basis of their empirical research directly. This system is the Text Engineering Software Laboratory, Tesla for short. It provides a working environment for scientists who work on textual data. Within this environment, experiments can be assembled in a client using a graphical workflow editor and various configuration editors. They are executed on a server and can then be visualised in the client in different ways. The experiments are fully documented in the process (source data, methods applied, results). This documentation can be exported and distributed, so that the experiments can be reproduced at any time by other users of the system. The thesis first addresses which areas of science fall within the field of text processing. From this, requirements are derived that these disciplines place on a basis for research on their subject areas and for passing that research on. On this foundation, the Tesla system is presented, which meets the requirements formulated. The most important features that Tesla offers the user are covered. The system is demonstrated using an analysis of the so-called Voynich manuscript as an example. This document was discovered in Italy in 1912 and presumably dates from the 15th century. The manuscript contains a text by an unknown author whose content has not yet been deciphered. Until now, no encryption method had been found that produces a comparable text, which changes with this work