
    Data access and integration in the ISPIDER proteomics grid

    Grid computing has great potential for supporting the integration of complex, fast-changing biological data repositories to enable distributed data analysis. One scenario where Grid computing has such potential is provided by proteomics resources, which are rapidly being developed with the emergence of affordable, reliable methods to study the proteome. The protein identifications arising from these methods derive from multiple repositories which need to be integrated to enable uniform access to them. A number of technologies exist which enable these resources to be accessed in a Grid environment, but the independent development of these resources means that significant data integration challenges, such as heterogeneity and schema evolution, have to be met. This paper presents an architecture which supports the combined use of Grid data access (OGSA-DAI), Grid distributed querying (OGSA-DQP) and data integration (AutoMed) software tools to support distributed data analysis. We discuss the application of this architecture for the integration of several autonomous proteomics data resources.

    VM-MAD: a cloud/cluster software for service-oriented academic environments

    The availability of powerful computing hardware in IaaS clouds makes cloud computing attractive also for computational workloads that were up to now almost exclusively run on HPC clusters. In this paper we present the VM-MAD Orchestrator software: an open source framework for cloudbursting Linux-based HPC clusters into IaaS clouds as well as computational grids. The Orchestrator is completely modular, allowing flexible configurations of cloudbursting policies. It can be used with any batch system or cloud infrastructure, dynamically extending the cluster when needed. A distinctive feature of our framework is that the policies can be tested and tuned in a simulation mode based on historical or synthetic cluster accounting data. In the paper we also describe how the VM-MAD Orchestrator was used in a production environment at the FGCZ to speed up the analysis of mass spectrometry-based protein data by cloudbursting to Amazon EC2. The advantages of this hybrid system are shown with a large evaluation run using about a hundred large EC2 nodes. Comment: 16 pages, 5 figures. Accepted at the International Supercomputing Conference ISC13, June 17-20, Leipzig, Germany.
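To make the idea of testing a cloudbursting policy against accounting data concrete, here is a minimal Python sketch. It is not the VM-MAD Orchestrator's code or API: the `Job` record, the `simulate` function, and the policy itself (fill local slots first, then send jobs to cloud nodes whenever more than `max_queue` jobs are waiting) are all hypothetical and chosen only for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    submit: int    # time step at which the job enters the queue
    duration: int  # number of time steps the job runs


def simulate(jobs, local_slots, max_queue):
    """Replay job records against a toy bursting policy (hypothetical).

    Returns (peak number of cloud nodes in use, time the last job finishes).
    """
    if local_slots < 1:
        raise ValueError("local_slots must be at least 1")
    pending = sorted(jobs, key=lambda j: j.submit)
    queue = deque()
    local_ends, cloud_ends = [], []  # end times of currently running jobs
    peak_cloud = makespan = t = i = 0
    while i < len(pending) or queue:
        # Enqueue every job submitted up to the current time step.
        while i < len(pending) and pending[i].submit <= t:
            queue.append(pending[i])
            i += 1
        # Release slots whose jobs have finished.
        local_ends = [e for e in local_ends if e > t]
        cloud_ends = [e for e in cloud_ends if e > t]
        # Local cluster slots are always filled first.
        while queue and len(local_ends) < local_slots:
            job = queue.popleft()
            local_ends.append(t + job.duration)
            makespan = max(makespan, t + job.duration)
        # Burst: run waiting jobs on cloud nodes until the queue is short enough.
        while len(queue) > max_queue:
            job = queue.popleft()
            cloud_ends.append(t + job.duration)
            makespan = max(makespan, t + job.duration)
        peak_cloud = max(peak_cloud, len(cloud_ends))
        t += 1
    return peak_cloud, makespan
```

Running the same synthetic job list with different `max_queue` values shows the trade-off such a simulation is meant to expose: a lower threshold shortens the total run time at the cost of more cloud nodes.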

    Automated analysis of quantitative image data using isomorphic functional mixed models, with application to proteomics data

    Image data are increasingly encountered and are of growing importance in many areas of science. Many of these data are quantitative image data, which are characterized by intensities that represent some measurement of interest in the scanned images. The data typically consist of multiple images on the same domain and the goal of the research is to combine the quantitative information across images to make inference about populations or interventions. In this paper we present a unified framework for the analysis of quantitative image data using a Bayesian functional mixed model approach. This framework is flexible enough to handle complex, irregular images with many local features, and can model the simultaneous effects of multiple factors on the image intensities and account for the correlation between images induced by the design. We introduce a general isomorphic modeling approach to fitting the functional mixed model, of which the wavelet-based functional mixed model is one special case. With suitable modeling choices, this approach leads to efficient calculations and can result in flexible modeling and adaptive smoothing of the salient features in the data. The proposed method has the following advantages: it can be run automatically, it produces inferential plots indicating which regions of the image are associated with each factor, it simultaneously considers the practical and statistical significance of findings, and it controls the false discovery rate. Comment: Published at http://dx.doi.org/10.1214/10-AOAS407 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    From access and integration to mining of secure genomic data sets across the grid

    The BRIDGES project (Biomedical Research Informatics Delivered by Grid Enabled Services), funded by the UK Department of Trade and Industry (DTI), has developed a Grid infrastructure to support cardiovascular research. This includes the provision of a compute Grid and a data Grid infrastructure with security at its heart. In this paper we focus on the BRIDGES data Grid. A primary aim of the BRIDGES data Grid is to help control the complexity in access to and integration of a myriad of genomic data sets through simple Grid-based tools. We outline these tools and how they are delivered to end-user scientists. We also describe how these tools are to be extended in the BBSRC-funded Grid Enabled Microarray Expression Profile Search (GEMEPS) project to support a richer vocabulary of search capabilities for mining microarray data sets. As with BRIDGES, fine-grained Grid security underpins GEMEPS.

    Educating the educators: Incorporating bioinformatics into biological science education in Malaysia

    Bioinformatics can be defined as a fusion of computational and biological sciences. The urgency to process and analyse the deluge of data created by proteomics and genomics studies has caused bioinformatics to gain prominence and importance. However, its multidisciplinary nature has created a unique demand for specialists trained in both biology and computing. In this review, we describe the components that constitute the bioinformatics field and the distinctive education criteria that are required to produce individuals with bioinformatics training. This paper also provides an introduction to and overview of bioinformatics in Malaysia. The existing bioinformatics scenario in Malaysia was surveyed to gauge its advancement and to plan for future bioinformatics education strategies. For comparison, we surveyed methods and strategies used in education by other countries so that lessons can be learnt to further improve the implementation of bioinformatics in Malaysia. It is believed that accurate and sufficient steerage from academia and industry will enable Malaysia to produce quality bioinformaticians in the future.

    The Parallelism Motifs of Genomic Data Analysis

    Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large-scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high-end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing.
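As a small serial illustration of the hashing and sorting patterns named in this abstract (not taken from the paper), the Python sketch below counts k-mers, the length-k substrings of a sequence, in a hash table and then sorts them by frequency. The function names `count_kmers` and `top_kmers` and the toy sequence are hypothetical.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every length-k substring (k-mer) using a hash table."""
    if k <= 0:
        raise ValueError("k must be positive")
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        counts[sequence[i:i + k]] += 1
    return counts


def top_kmers(counts: Counter, n: int):
    """Return the n most frequent k-mers, ties broken alphabetically."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


counts = count_kmers("ACGTACGT", 4)
print(top_kmers(counts, 2))  # [('ACGT', 2), ('CGTA', 1)]
```

In a distributed setting the hash table would be partitioned across nodes, which is where the irregular, asynchronous updates to shared data structures mentioned in the abstract arise; this sketch deliberately leaves that part out.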

    XML in Motion from Genome to Drug

    Information technology (IT) has emerged as central to the solution of contemporary genomics and drug discovery problems. Researchers involved in genomics, proteomics, transcriptional profiling, high-throughput structure determination, and other sub-disciplines of bioinformatics have a direct impact on this IT revolution. As the full genome sequences of many species and data from structural genomics, micro-arrays, and proteomics have become available, integration of these data into a common platform requires sophisticated bioinformatics tools. Organizing these data into knowledgeable databases and developing appropriate software tools for analyzing them are going to be major challenges. XML (eXtensible Markup Language) forms the backbone of biological data representation and exchange over the internet, enabling researchers to aggregate data from various heterogeneous data resources. The present article gives a comprehensive account of the integration of XML with particular types of biological databases, mainly those dealing with sequence-structure-function relationships, and its application to drug discovery. This e-medical science approach should be applied to other scientific domains, and the latest trend in semantic web applications is also highlighted.

    Lapatinib-binding protein kinases in the African trypanosome: identification of cellular targets for kinase-directed chemical scaffolds.

    Human African trypanosomiasis is caused by the eukaryotic microbe Trypanosoma brucei. To discover new drugs against the disease, one may use drugs in the clinic for other indications whose chemical scaffolds can be optimized via a medicinal chemistry campaign to achieve greater potency against the trypanosome. Towards this goal, we tested inhibitors of human EGFR and/or VEGFR as possible anti-trypanosome compounds. The 4-anilinoquinazolines canertinib and lapatinib, and the pyrrolopyrimidine AEE788 killed bloodstream T. brucei in vitro with GI50 in the low micromolar range. Curiously, the genome of T. brucei does not encode EGFR or VEGFR, indicating that the drugs recognize alternate proteins. To discover these novel targets, a trypanosome lysate was adsorbed to an ATP-sepharose matrix and washed with a high salt solution followed by nicotinamide adenine dinucleotide (NAD+). Proteins that remained bound to the column were eluted with drugs, and identified by mass spectrometry/bioinformatics. Lapatinib bound to Tb927.4.5180 (termed T. brucei lapatinib-binding protein kinase-1 (TbLBPK1)) while AEE788 bound Tb927.5.800 (TbLBPK2). When the NAD+ wash was omitted from the protocol, AEE788, canertinib and lapatinib eluted TbLBPK1, TbLBPK2, and Tb927.3.1570 (TbLBPK3). In addition, both canertinib and lapatinib eluted Tb10.60.3140 (TbLBPK4), whereas only canertinib desorbed Tb10.61.1880 (TbCBPK1). Lapatinib binds to a unique conformation of protein kinases. To gain insight into the structural basis for lapatinib interaction with TbLBPKs, we constructed three-dimensional models of lapatinib•TbLBPK complexes, which confirmed that TbLBPKs can adopt lapatinib-compatible conformations. Further, lapatinib, AEE788, and canertinib were docked to TbLBPKs with favorable scores. Our studies (a) present novel targets of kinase-directed drugs in the trypanosome, and (b) offer the 4-anilinoquinazoline and pyrrolopyrimidines as scaffolds worthy of medicinal chemistry and structural biology campaigns to develop them into anti-trypanosome drugs.