A Web portal to simplify the scientific communities in using Grid and Cloud resources
Modern scientific applications demand ever greater availability of computing and storage resources in order to collect and analyse volumes of data that single laboratories are often unable to provide. Distributed computing models have proved to be a valid and effective solution: proof of this is the Grid, widely used in high energy physics experiments, and Cloud solutions, which are gaining increasing acceptance.
These infrastructures require robust Authentication and Authorization mechanisms. The X.509 certificate is the standard used to authenticate Grid users and, although it represents a valid security mechanism, many communities complain about the difficulty of handling digital certificates and the complexity of the Grid middleware. These remain the main obstacles to the full exploitation of distributed computing and data infrastructures.
In order to simplify the use of these resources, a Web-based portal has been developed that provides users with several important functionalities, such as job and workflow submission, interactive services, and data management for both Grid and Cloud environments. The thesis describes the portal architecture, its features, the main benefits for users, and the custom views that have been defined and tested in collaboration with several communities to address relevant use cases.
Enabling Generic Distributed Computing Infrastructure Compatibility for Workflow Management Systems
Solving workflow management systems' Distributed Computing Infrastructure (DCI) incompatibility and their workflow interoperability issues is a challenging and complex task. Workflow management systems (and therefore their workflows, workflow developers, and end-users) are bound tightly to a limited number of supported DCIs, and significant effort is required to add support for further DCIs. In this paper we specify a concept for enabling generic DCI compatibility for grid workflow management systems (such as ASKALON, MOTEUR, gUSE/WS-PGRADE, etc.) at the job level and, indirectly, at the workflow level. To enable DCI compatibility among the different workflow management systems we have developed the DCI Bridge software solution. We describe its internal architecture and provide usage scenarios showing how the developed service resolves DCI interoperability issues between various middleware types. The generic DCI Bridge service enables the execution of jobs on the existing major DCI platforms, such as Service Grids (Globus Toolkit 2 and 4, gLite, ARC, UNICORE), Desktop Grids, Web services, and even cloud-based DCIs.
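The core idea of the DCI Bridge described above is a single, middleware-independent job-submission interface behind which DCI-specific adapters are plugged in. The following is a minimal, hypothetical sketch of that plugin-dispatch pattern; the class and plugin names are illustrative (the real DCI Bridge is a web service that accepts standards-based job descriptions, not this toy registry).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical, simplified job description. The real DCI Bridge consumes a
# middleware-independent description and translates it per target DCI.
@dataclass
class JobDescription:
    executable: str
    arguments: List[str] = field(default_factory=list)
    target_dci: str = "glite"  # e.g. "glite", "gt4", "boinc", "cloud"

class DCIBridge:
    """Routes middleware-independent job descriptions to DCI-specific plugins."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Callable[[JobDescription], str]] = {}

    def register(self, dci_name: str, submit_fn: Callable[[JobDescription], str]) -> None:
        # Each plugin knows how to submit to one concrete middleware.
        self._plugins[dci_name] = submit_fn

    def submit(self, job: JobDescription) -> str:
        try:
            plugin = self._plugins[job.target_dci]
        except KeyError:
            raise ValueError(f"No plugin registered for DCI '{job.target_dci}'")
        return plugin(job)

# Toy plugins standing in for real gLite / BOINC submitters.
bridge = DCIBridge()
bridge.register("glite", lambda j: f"glite-job-{j.executable}")
bridge.register("boinc", lambda j: f"boinc-wu-{j.executable}")

job = JobDescription("blast", ["-i", "in.fa"], target_dci="boinc")
print(bridge.submit(job))  # boinc-wu-blast
```

The point of the pattern is that a workflow system only ever talks to `submit()`; adding a new DCI means registering one more plugin, leaving workflows and their developers untouched.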
A methodology for developing scientific software applications in science gateways
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. Distributed Computing Infrastructures (DCIs) have emerged as a viable and affordable solution to the computing needs of communities of practice that may need to improve system performance or enhance the availability of their scientific applications. According to the literature, ease of access and several other issues relating to interoperability among different resources are the biggest challenges surrounding the use of these infrastructures. The traditional method of using a Command Line Interface (CLI) to access these resources is difficult and can make the learning curve quite steep. This can result in low uptake of DCIs, as it deters potential users from adopting the technology. Science Gateways have emerged as a viable option for realising high-level, scientific domain-specific user interfaces that hide the details of the underlying infrastructures and expose only the science-specific aspects of the applications to be executed on the various DCIs. A Science Gateway is a digital interface to advanced technologies that provides support for science and engineering research and education. The focus of this study is therefore to propose and implement a Methodology for dEveloping Scientific Software Applications in science GatEways (MESSAGE). This is achieved by testing an approach considered appropriate for developing applications in Science Gateways. In the course of this study, several Science Gateway functionalities obtained from the review of the literature, which may be utilised to provide services for different communities of practice, are highlighted. To implement the identified functionalities, this study utilises the methodology for developing scientific software applications in Science Gateways.
In order to achieve this purpose, this research adopts the Catania Science Gateway Framework (CSGF) and the Future Gateway approach to implement the methods and ideas described in the proposed methodology, as well as the essential services of Science Gateways discussed throughout the thesis. In addition, three different sets of scientific software applications are utilised for the implementation of the proposed methodology. While the first application primarily serves as the case study for implementing the methodology, a second application is used to evaluate the entire process. Furthermore, several other real-life scientific applications, developed using two distinctly different Science Gateway frameworks, are also utilised for the purpose of evaluation. Subsequently, a revised MESSAGE methodology for developing scientific software applications in Science Gateways is discussed in a later chapter of the thesis. Following the implementation of both scientific software applications, which use portlets to execute single experiments, a study was also conducted to investigate ways in which Science Gateways may be utilised for the execution of multiple experiments in a distributed environment. Finally, in the same way that different scientific software applications are made accessible and available (worldwide) to the communities that need them, the processes involved in making their associated research outputs (such as data, software, and results) easily accessible and readily available are also discussed. The main contribution of this thesis is the MESSAGE methodology for developing scientific software applications in Science Gateways. Further contributions made in different aspects of this research include a framework of the essential services required in generic Science Gateways and an approach to developing and executing multiple experiments (via Science Gateway interfaces) within a distributed environment.
To a lesser extent, this study also utilises the Open Access Document Repository (OADR) (and other related technologies) to demonstrate the accessibility and availability of research outputs associated with specific scientific software applications, thereby introducing the concept (and laying the foundation) of Open Science research.
The CloudSME Simulation Platform and its Applications: A Generic Multi-cloud Platform for Developing and Executing Commercial Cloud-based Simulations
Simulation is used in industry to study a large variety of problems, ranging from increasing the productivity of a manufacturing system to optimizing the design of a wind turbine. However, some simulation models can be computationally demanding, and some simulation projects require time-consuming experimentation. High performance computing infrastructures such as clusters can be used to speed up the execution of large models or multiple experiments, but at a cost that is often too high for Small and Medium-sized Enterprises (SMEs). Cloud computing presents an attractive, lower cost alternative. However, developing a cloud-based simulation application can again be costly for an SME due to training and development needs, especially if software vendors need to use resources of different heterogeneous clouds to avoid being locked in to one particular cloud provider. In an attempt to reduce the cost of development of commercial cloud-based simulations, the CloudSME Simulation Platform (CSSP) has been developed as a generic approach that combines an AppCenter with the workflow of the WS-PGRADE/gUSE science gateway framework and the multi-cloud capabilities of the CloudBroker Platform. The paper presents the CSSP and two representative case studies from distinctly different areas that illustrate how commercial multi-cloud-based simulations can be created.
Computational Methods for Interactive and Explorative Study Design and Integration of High-throughput Biological Data
The increase in the use of high-throughput methods to gain insights into biological systems has come with new challenges. Genomics, transcriptomics, proteomics, and metabolomics lead to a massive amount of data and metadata. While this wealth of information has resulted in many scientific discoveries, new strategies are needed to cope with the ever-growing variety and volume of metadata. Despite efforts to standardize the collection of study metadata, many experiments cannot be reproduced or replicated. One reason for this is the difficulty of providing the necessary metadata. The large sample sizes that modern omics experiments enable also make it increasingly complicated for scientists to keep track of every sample and the needed annotations. The many data transformations that are often needed to normalize and analyze omics data require a further collection of all parameters and tools involved. A second possible cause is a lack of knowledge about the statistical design of studies, relating both to study factors and to the sample size required to make significant discoveries.
In this thesis, we develop a multi-tier model for experimental design and a portlet for interactive web-based study design. Through the input of experimental factors and the number of replicates, users can easily create large, factorial experimental designs. Changes or additional metadata can be quickly uploaded via user-defined spreadsheets including sample identifiers. In order to comply with existing standards and provide users with a quick way to import existing studies, we provide full interoperability with the ISA-Tab format. We show that both data model and portlet are easily extensible to create additional tiers of samples annotated with technology-specific metadata.
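The paragraph above describes generating a full factorial design from experimental factors and a replicate count. A minimal sketch of that enumeration, with hypothetical factor names (the thesis's portlet and data model are far richer than this), could look like:

```python
from itertools import product

def factorial_design(factors: dict, replicates: int) -> list:
    """Enumerate every combination of factor levels, repeated per replicate.

    factors: mapping of factor name -> list of levels, e.g.
        {"genotype": ["WT", "KO"], "treatment": ["ctrl", "drug"]}
    Returns one dict per sample, carrying its factor levels and replicate number.
    """
    names = list(factors)
    runs = []
    for combo in product(*(factors[n] for n in names)):
        for rep in range(1, replicates + 1):
            sample = dict(zip(names, combo))
            sample["replicate"] = rep
            runs.append(sample)
    return runs

# 2 genotype levels x 2 treatment levels x 3 replicates = 12 samples
design = factorial_design({"genotype": ["WT", "KO"],
                           "treatment": ["ctrl", "drug"]}, replicates=3)
print(len(design))  # 12
```

This is exactly why such designs balloon quickly: the sample count is the product of all level counts times the replicate count, which is what makes interactive tooling for tracking them worthwhile.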
We tackle the problem of unwieldy experimental designs by creating an aggregation graph. Based on our multi-tier experimental design model, similar samples, their sources, and analytes are summarized, creating an interactive summary graph that focuses on study factors and replicates. Thus, we give researchers a quick overview of sample sizes and the aim of different studies. This graph can be included in our portlets or used as a stand-alone application and is compatible with the ISA-Tab format. We show that this approach can be used to explore the quality of publicly available experimental designs and metadata annotation.
The third part of this thesis contributes to a more statistically sound experiment planning for differential gene expression experiments. We integrate two tools for the prediction of statistical power and sample size estimation into our portal. This integration enables the use of existing data, in order to arrive at more accurate calculation for sample variability. Additionally, the statistical power of existing experimental designs of certain sample sizes can be analyzed. All results and parameters are stored and can be used for later comparison.
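To make the power-versus-sample-size trade-off above concrete, here is a minimal sketch of an approximate power calculation for a two-sided two-sample comparison, using only the normal approximation (the integrated tools the thesis refers to use proper t-based and variance-informed methods; this simplified formula and its fixed alpha are assumptions of the sketch):

```python
from math import erf, sqrt

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def two_sample_power(effect_size: float, n_per_group: int) -> float:
    """Approximate power of a two-sided two-sample test at alpha = 0.05.

    effect_size is Cohen's d. The normal approximation ignores the t
    distribution, so results are slightly optimistic for small n.
    """
    z_crit = 1.959963984540054  # Phi^-1(0.975), i.e. alpha = 0.05 two-sided
    noncentrality = effect_size * sqrt(n_per_group / 2.0)
    return norm_cdf(noncentrality - z_crit)

# A large effect (d = 1.0) with 20 samples per group:
print(round(two_sample_power(1.0, 20), 3))  # 0.885
```

Running the calculation across a grid of `n_per_group` values is how one answers the planning question the paragraph raises: the smallest sample size at which power crosses a target such as 0.8.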
Even perfectly planned and annotated experiments cannot eliminate human error. Based on our model, we develop an automated workflow for microarray quality control, enabling users to inspect the quality of normalization and to cluster samples by study factor levels. We import a publicly available microarray dataset to assess our contributions to reproducibility and explore alternative analysis methods based on statistical power analysis.
Computational analysis of CpG site DNA methylation
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Epigenetics is the study of factors that can alter gene activity and be passed to the next generation without any change to the DNA sequence. DNA methylation is one category of epigenetic change: the attachment of a methyl group (CH3) to DNA. Most of the time it occurs at sequences in which a cytosine is followed by a guanine, known as CpG sites, through the addition of a methyl group to the cytosine residue. As science and technology progress, new data become available about individuals' DNA methylation profiles under different conditions, and new features are discovered that may play a role in DNA methylation. The availability of new data on DNA methylation and other features of DNA presents a challenge to bioinformatics and an opportunity to discover new knowledge from existing data. In this research, multiple data series were used to identify classes of DNA methylation at CpG sites. These classes are: a) CpG sites that are never methylated; b) CpG sites that are always methylated; c) CpG sites methylated in cancer/disease samples and non-methylated in normal samples; and d) CpG sites methylated in normal samples and non-methylated in cancer/disease samples. After identification of these sites and their classes, an analysis was carried out to find the features that best classify them. A matrix of features was generated using four applications from the EMBOSS software suite. The feature matrix was also generated using the gUSE/WS-PGRADE portal workflow system. To do this, each of the four applications was grid-enabled and ported to the BOINC platform, and the gUSE portal was connected to the BOINC project via the 3G Bridge. Each node in the workflow created a portion of the matrix, and these portions were then combined to create the final matrix. This final feature matrix was used in a hill-climbing workflow, whose hill-climbing node was a Java program ported to the BOINC platform.
A hill-climbing search workflow was used to search for a subset of features that better classify the CpG sites, using five different measurements and three different classification methods: support vector machine, naïve Bayes, and the J48 decision tree. Using this approach, the hill-climbing search found models that contain fewer than half the number of features while achieving better classification results. It has also been demonstrated that the gUSE/WS-PGRADE workflow system provides a modular way of generating features, so that a new feature-generator application can be added without changing other parts. It is also shown that using grid-enabled applications can speed up both feature generation and feature subset selection. The approach used in this research for distributed, workflow-based feature generation is not restricted to this study and can be applied in other studies that involve feature generation; it only requires multiple binaries to generate portions of the features. The grid-enabled hill-climbing search application can also be used in other contexts, as it only requires input in the same feature-matrix format.
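The hill-climbing feature-subset search described above can be sketched as a greedy loop that toggles one feature at a time and keeps the change only when the classifier score improves. This is an illustrative toy (the scoring function stands in for cross-validated accuracy of an SVM, naïve Bayes, or J48 model; the thesis ran this search distributed on BOINC):

```python
import random

def hill_climb(features, score_fn, max_iters=1000, seed=0):
    """Greedy hill-climbing over feature subsets.

    Starting from the empty subset, repeatedly toggle one randomly chosen
    feature and keep the move only if score_fn(subset) improves.
    score_fn(subset) -> float stands in for cross-validated classifier accuracy.
    """
    rng = random.Random(seed)
    current, best = set(), score_fn(set())
    for _ in range(max_iters):
        f = rng.choice(features)
        candidate = current ^ {f}  # toggle one feature in or out
        s = score_fn(candidate)
        if s > best:
            current, best = candidate, s
    return current, best

# Toy scoring function: only features "a" and "c" are informative;
# every uninformative feature included costs a small penalty.
useful = {"a", "c"}
score = lambda subset: len(subset & useful) - 0.1 * len(subset - useful)

subset, best = hill_climb(["a", "b", "c", "d"], score, max_iters=200)
print(sorted(subset), best)  # ['a', 'c'] 2.0
```

The penalty term is what drives the search toward smaller models, mirroring the thesis's finding that subsets with fewer than half the features classified better; in the distributed version, each candidate evaluation is an independent job, which is why the search parallelizes well on a grid.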