Provenance-based validation of E-science experiments

Author: Klaus-Peter Zauner
Luc Moreau
Paul Groth
Simon Miles
Sylvia C. Wong
Weijian Fang
Publication venue: Springer
Publication date: 01/01/2005

E-Science experiments typically involve many distributed services maintained by different organisations. After an experiment has been executed, it is useful for a scientist to verify that the execution was performed correctly or is compatible with some existing experimental criteria or standards. Scientists may also want to review and verify experiments performed by their colleagues. There are no existing frameworks for validating such experiments in today's e-Science systems. Users therefore have to rely on error checking performed by the services, or adopt other ad hoc methods. This paper introduces a platform-independent framework for validating workflow executions. The validation relies on reasoning over the documented provenance of experiment results and semantic descriptions of services advertised in a registry. This validation process ensures experiments are performed correctly, and thus results generated are meaningful. The framework is tested in a bioinformatics application that performs protein compressibility analysis.

CiteSeerX

Southampton (e-Prints Soton)

King's Research Portal

Many-Task Computing and Blue Waters

Author: Armstrong Timothy G.
Katz Daniel S.
Wilde Michael
Wozniak Justin M.
Zhang Zhao
Publication date: 01/01/2012

This report discusses many-task computing (MTC) generically and in the context of the proposed Blue Waters system, which is planned to be the largest NSF-funded supercomputer when it begins production use in 2012. The aim of this report is to inform the BW project about MTC, including understanding aspects of MTC applications that can be used to characterize the domain and understanding the implications of these aspects for middleware and policies. Many MTC applications do not neatly fit the stereotypes of high-performance computing (HPC) or high-throughput computing (HTC) applications. Like HTC applications, by definition MTC applications are structured as graphs of discrete tasks, with explicit input and output dependencies forming the graph edges. However, MTC applications have significant features that distinguish them from typical HTC applications. In particular, different engineering constraints for hardware and software must be met in order to support these applications. HTC applications have traditionally run on platforms such as grids and clusters, through either workflow systems or parallel programming systems. MTC applications, in contrast, will often demand a short time to solution, may be communication intensive or data intensive, and may comprise very short tasks. Therefore, hardware and software for MTC must be engineered to support the additional communication and I/O and must minimize task dispatch overheads. The hardware of large-scale HPC systems, with its high degree of parallelism and support for intensive communication, is well suited for MTC applications. However, HPC systems often lack a dynamic resource-provisioning feature, are not ideal for task communication via the file system, and have an I/O system that is not optimized for MTC-style applications. Hence, additional software support is likely to be required to gain full benefit from the HPC hardware.

arXiv.org e-Print Archive

CiteSeerX

IPD - the Immuno Polymorphism Database

Author: Marsh S.G.E.
Robinson J.
Stoehr P.
Waller M.J.
Publication date: 01/01/2005

The Immuno Polymorphism Database (IPD) (http://www.ebi.ac.uk/ipd/) is a set of specialist databases related to the study of polymorphic genes in the immune system. IPD currently consists of four databases: IPD-KIR, which contains the allelic sequences of Killer-cell Immunoglobulin-like Receptors; IPD-MHC, a database of sequences of the Major Histocompatibility Complex of different species; IPD-HPA, which covers alloantigens expressed only on platelets; and IPD-ESTAB, which provides access to the European Searchable Tumour Cell-Line Database, a cell bank of immunologically characterized melanoma cell lines. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. Those sections with similar data, such as IPD-KIR and IPD-MHC, share the same database structure. The sharing of a common database structure makes it easier to implement common tools for data submission and retrieval. The data are currently available online from the website and ftp directory; files will also be made available in different formats to download from the website and ftp server. The data will also be included in SRS, BLAST and FASTA search engines at the European Bioinformatics Institute.

UCL Discovery

Semantic Description, Publication and Discovery of Workflows in myGrid

Author: Goble Carole
Lord Phillip
Miles Simon
Moreau Luc
Papay Juri
Wroe Chris
Publication venue: s.n.
Publication date: 01/01/2004

The bioinformatics scientific process relies on in silico experiments, which are experiments executed in full in a computational environment. Scientists wish to encode the designs of these experiments as workflows because they provide minimal, declarative descriptions of the designs, overcoming many barriers to the sharing and re-use of these designs between scientists and enable the use of the most appropriate services available at any one time. We anticipate that the number of workflows will increase quickly as more scientists begin to make use of existing workflow construction tools to express their experiment designs. Discovery then becomes an increasingly hard problem, as it becomes more difficult for a scientist to identify the workflows relevant to their particular research goals amongst all those on offer. While many approaches exist for the publishing and discovery of services, there have been few attempts to address where and how authors of experimental designs should advertise the availability of their work or how relevant workflows can be discovered with minimal effort from the user. As the users designing and adapting experiments will not necessarily have a computer science background, we also have to consider how publishing and discovery can be achieved in such a way that they are not required to have detailed technical knowledge of workflow scripting languages. Furthermore, we believe they should be able to make use of others' expert knowledge (the semantics) of the given scientific domain. In this paper, we define the issues related to the semantic description, publishing and discovery of workflows, and demonstrate how the architecture created by the myGrid project aids scientists in this process. 
We give a walk-through of how users can construct, publish, annotate, discover and enact workflows via the user interfaces of the myGrid architecture; we then describe novel middleware protocols, making use of the Semantic Web technologies RDF and OWL to support workflow publishing and discovery.

CiteSeerX

Southampton (e-Prints Soton)

XML in Motion from Genome to Drug

Author: C. Gopi Mohan
Publication date: 28/06/2007

Information technology (IT) has emerged as central to the solution of contemporary genomics and drug discovery problems. Researchers involved in genomics, proteomics, transcriptional profiling, high throughput structure determination, and other sub-disciplines of bioinformatics have a direct impact on this IT revolution. As the full genome sequences of many species and data from structural genomics, micro-arrays, and proteomics became available, integrating these data into a common platform requires sophisticated bioinformatics tools. Organizing these data into knowledge databases and developing appropriate software tools for analyzing them are going to be major challenges. XML (eXtensible Markup Language) forms the backbone of biological data representation and exchange over the internet, enabling researchers to aggregate data from various heterogeneous data resources. The present article gives a comprehensive overview of the integration of XML with particular types of biological databases, mainly those dealing with sequence-structure-function relationships, and its application to drug discovery. This e-medical science approach should be applied to other scientific domains, and the latest trend in semantic web applications is also highlighted.

Crossref

Nature Precedings

gcodeml: A Grid-enabled Tool for Detecting Positive Selection in Biological Evolution

Author: Castella Briséïs
Kuzniar Arnold
Maffioletti Sergio
Moretti Sébastien
Murri Riccardo
Robinson-Rechavi Marc
Salamin Nicolas
Stockinger Heinz
Publication date: 01/01/2012

One of the important questions in biological evolution is to know if certain changes along protein coding genes have contributed to the adaptation of species. This problem is known to be biologically complex and computationally very expensive. It, therefore, requires efficient Grid or cluster solutions to overcome the computational challenge. We have developed a Grid-enabled tool (gcodeml) that relies on the PAML (codeml) package to help analyse large phylogenetic datasets on both Grids and computational clusters. Although we report on results for gcodeml, our approach is applicable and customisable to related problems in biology or other scientific domains. Comment: 10 pages, 4 figures. To appear in the HealthGrid 2012 con

arXiv.org e-Print Archive

Serveur académique lausannois

Charge environments around phosphorylation sites in proteins

Author: Kitchen James
Saunders Rebecca E.
Warwicker Jim
Publication venue: BioMed Central Ltd.
Publication date: 01/01/2008

Background: Phosphorylation is a central feature in many biological processes. Structural analyses have identified the importance of charge-charge interactions, for example mediating phosphorylation-driven allosteric change and protein binding to phosphopeptides. Here, we examine computationally the prevalence of charge stabilisation around phosphorylated sites in the structural database, through comparison with locations that are not phosphorylated in the same structures. Results: A significant fraction of phosphorylated sites appear to be electrostatically stabilised, largely through interaction with sidechains. Some examples of stabilisation across a subunit interface are evident from calculations with biological units. When considering the immediately surrounding environment, in many cases favourable interactions are only apparent after conformational change that accompanies phosphorylation. A simple calculation of potential interactions at longer-range, applied to non-phosphorylated structures, recovers the separation exhibited by phosphorylated structures. In a study of sites in the Phospho.ELM dataset, for which structural annotation is provided by non-phosphorylated proteins, there is little separation of the known phospho-acceptor sites relative to background, even using the wider interaction radius. However, there are differences in the distributions of patch polarity for acceptor and background sites in the Phospho.ELM dataset. Conclusion: In this study, an easy to implement procedure is developed that could contribute to the identification of phospho-acceptor sites associated with charge-charge interactions and conformational change. Since the method gives information about potential anchoring interactions subsequent to phosphorylation, it could be combined with simulations that probe conformational change. 
Our analysis of the Phospho.ELM dataset also shows evidence for mediation of phosphorylation effects through (i) conformational change associated with making a solvent inaccessible phospho-acceptor site accessible, and (ii) modulation of protein-protein interactions

Springer - Publisher Connector

PubMed Central

Warwick Research Archives Portal Repository

The University of Manchester - Institutional Repository

ARMC5 Variants and Risk of Hypertension in Blacks: MH-GRID Study.

Author: Berthon Annabel
Davis Adam R
Faucz Fabio R
Gaye Amadou
Gibbons Gary H
Hannah-Shmouni Fady
Lodish Maya B
Stratakis Constantine A
Zilbermint Mihail
Publication venue: eScholarship, University of California
Publication date: 01/07/2019

Background: We recently found that ARMC5 variants may be associated with primary aldosteronism in blacks. We investigated a cohort from the MH-GRID (Minority Health Genomics and Translational Research Bio-Repository Database) and tested the association between ARMC5 variants and blood pressure in blacks. Methods and Results: Whole exome sequencing data of 1377 blacks were analyzed. Target single-variant and gene-based association analyses of hypertension were performed for ARMC5, and replicated in a subset of 3015 individuals of African descent from the UK Biobank cohort. Sixteen rare variants were significantly associated with hypertension (P=0.0402) in the gene-based (optimized sequenced kernel association test) analysis; the 16 and one other, rs116201073, together, showed a strong association (P=0.0003) with blood pressure in this data set. The presence of the rs116201073 variant was associated with lower blood pressure. We then used human embryonic kidney 293 and adrenocortical H295R cells transfected with an ARMC5 construct containing rs116201073 (c.*920T>C). The latter was common in both the discovery (MH-GRID) and replication (UK Biobank) data and reached statistical significance (P=0.044 [odds ratio, 0.7] and P=0.007 [odds ratio, 0.76], respectively). The allele carrying rs116201073 increased levels of ARMC5 mRNA, consistent with its protective effect in the epidemiological data. Conclusions: ARMC5 shows an association with hypertension in blacks when rare variants within the gene are considered. We also identified a protective variant of the ARMC5 gene with an effect on ARMC5 expression confirmed in vitro. These results extend our previous report of ARMC5's possible involvement in the determination of blood pressure in blacks.

eScholarship - University of California