239 research outputs found

    SPRINT: Ultrafast protein-protein interaction prediction of the entire human interactome

    Proteins usually perform their functions by interacting with other proteins, and predicting which proteins interact is a fundamental problem. Experimental methods are slow, expensive, and error-prone. Many computational methods have been proposed, among which sequence-based ones are very promising. However, so far no such method has been able to predict the entire human interactome effectively: existing tools require too much time or memory. We present SPRINT (Scoring PRotein INTeractions), a new sequence-based algorithm and tool for predicting protein-protein interactions. We comprehensively compare SPRINT with state-of-the-art programs on the seven most reliable human PPI datasets and show that it is more accurate while running orders of magnitude faster and using very little memory. SPRINT is the only program that can predict the entire human interactome. Our goal is to transform the very challenging problem of predicting the entire human interactome into a routine task. The source code of SPRINT is freely available from github.com/lucian-ilie/SPRINT/ and the datasets and predicted PPIs from www.csd.uwo.ca/faculty/ilie/SPRINT/
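To illustrate the general idea behind sequence-based PPI prediction (this is a toy sketch, not SPRINT's actual algorithm: SPRINT uses its own efficient scoring over sequence similarities), a candidate pair can be scored by how closely its two sequences resemble the two sides of known interacting pairs, here using shared k-mers as a cheap similarity proxy:

```python
# Toy sketch of sequence-based PPI scoring (NOT SPRINT's algorithm):
# a candidate pair (a, b) scores highly when its sequences resemble
# the two sides of a known interacting pair.

def kmers(seq, k=3):
    """Return the set of k-mers occurring in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def similarity(s1, s2, k=3):
    """Jaccard similarity between the k-mer sets of two sequences."""
    a, b = kmers(s1, k), kmers(s2, k)
    return len(a & b) / len(a | b) if a | b else 0.0

def score_pair(a, b, known_pairs, k=3):
    """Score candidate pair (a, b) against known interacting pairs,
    trying both orientations of each known pair."""
    best = 0.0
    for p, q in known_pairs:
        s = max(similarity(a, p, k) * similarity(b, q, k),
                similarity(a, q, k) * similarity(b, p, k))
        best = max(best, s)
    return best

# Hypothetical toy sequences, for illustration only.
known = [("MKTAYIAKQR", "MVLSPADKTN")]
print(score_pair("MKTAYIAKQR", "MVLSPADKTN", known))  # identical pair -> 1.0
```

A real predictor must of course scale this to millions of candidate pairs, which is exactly the efficiency problem the abstract says SPRINT solves.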

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication, and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems, not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems so as to better understand their goals and methodology, which helps evaluate their applicability to similar problems. The taxonomy also provides a "gap analysis" of the area, through which researchers can identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping provide an easy way for new practitioners to understand this complex area of research. Comment: 46 pages, 16 figures, Technical Report

    Hyper-Spectral Image Processing Using High Performance Reconfigurable Computers

    The purpose of this thesis is to investigate methods of implementing a section of a Matlab hyper-spectral image processing application as a digital system that operates on a High Performance Reconfigurable Computer (HPRC). The work presented is concerned with the architecture, the design techniques, and the models of digital systems that are necessary to achieve the best overall performance on HPRC platforms. The application is an image-processing tool that detects tumors in chickens by analysing a hyper-spectral image. Analysis of the original Matlab code showed that it achieves low performance. The implementation follows a three-stage approach. In the first stage, the Matlab code is converted into C++ code in order to identify the bottlenecks that require the most resources. In the second stage, the digital system is designed to optimize performance on a single reconfigurable computer. In the final stage, this work explores the HPRC architecture by deploying and testing the digital design on multiple machines. The research shows that HPRC platforms grant a noticeable performance boost; furthermore, the more hyper-spectral bands the input image contains, the greater the speedup that can be expected from the HPRC design.

    HAPPI-2: a Comprehensive and High-quality Map of Human Annotated and Predicted Protein Interactions

    BACKGROUND: Human protein-protein interaction (PPI) data is essential to network and systems biology studies. PPI data can help biochemists hypothesize how proteins form complexes by binding to each other, how extracellular signals propagate through post-translational modification of de-activated signaling molecules, and how chemical reactions are coupled by enzymes involved in a complex biological process. Our capability to develop good public database resources for human PPI data has a direct impact on the quality of future research on genome biology and medicine. RESULTS: The database of Human Annotated and Predicted Protein Interactions (HAPPI) version 2.0 is a major update to the original HAPPI 1.0 database. It contains 2,922,202 unique protein-protein interactions (PPIs) linking 23,060 human proteins, making it the most comprehensive database covering human PPI data today. These PPIs include both physical/direct interactions and high-quality functional/indirect interactions. Compared with the HAPPI 1.0 release, HAPPI version 2.0 (HAPPI-2) represents a 485% increase in human PPI data coverage and a 73% increase in protein coverage. The revamped HAPPI web portal provides users with a friendly search, curation, and data retrieval interface, allowing them to retrieve human PPIs and available annotation information on the interaction type, interaction quality, interacting-partner drug targeting data, and disease information. The updated HAPPI-2 can be freely accessed by academic users at http://discovery.informatics.uab.edu/HAPPI . CONCLUSIONS: While the underlying data for HAPPI-2 are integrated from diverse data sources, the new HAPPI-2 release represents a good balance between data coverage and data quality of human PPIs, making it ideally suited for network biology.

    Grid Information Technology as a New Technological Tool for e-Science, Healthcare and Life Science

    Nowadays, scientific projects require collaborative environments and powerful computing resources capable of handling huge quantities of data, which gives rise to e-Science. These requirements are evident in the need to optimise time and efforts in activities to do with health. When e-Science focuses on the collaborative handling of all the information generated in clinical medicine and health, e-Health is the result. Scientists are taking increasing interest in an emerging technology – Grid Information Technology – that may offer a solution to their current needs. The current work aims to survey how e-Science is using this technology all around the world. We also argue that the technology may provide an ideal solution for the new challenges facing e-Health and Life Science.

    Disease re-classification via integration of biological networks

    Currently, human diseases are classified as they were in the late 19th century, by considering only symptoms of the affected organ. With a growing body of transcriptomic, proteomic, metabolomic and genomic data sets describing diseases, we ask whether the old classification still holds in the light of modern biological data. These large-scale and complex biological data can be viewed as networks of inter-connected elements. We propose to redefine human disease classification by considering diseases as systems-level disorders of the entire cellular system. To do this, we will integrate the different types of biological data mentioned above. A network-based mathematical model will be designed to represent these integrated data, and computational algorithms and tools will be developed and implemented for its analysis. In this report, a review of the research progress so far will be presented, including 1) a detailed statement of the research problem, 2) a literature survey on related research topics, 3) reports of on-going work, and 4) future research plans.

    DASMIweb: online integration, analysis and assessment of distributed protein interaction data

    In recent years, we have witnessed a substantial increase in the amount of available protein interaction data. However, most data are currently not readily accessible to the biologist at a single site, but scattered over multiple online repositories. Therefore, we have developed the DASMIweb server, which affords the integration, analysis and qualitative assessment of distributed sources of interaction data in a dynamic fashion. Since DASMIweb allows for querying many different resources of protein and domain interactions simultaneously, it serves as an important starting point for interactome studies and assists the user in finding publicly accessible interaction data with minimal effort. The pool of queried resources is fully configurable and supports the inclusion of the user's own interaction data or confidence scores. In particular, DASMIweb integrates confidence measures such as functional similarity scores to assess individual interactions. The retrieved results can be exported in different file formats such as MITAB or SIF. DASMIweb is freely available at http://www.dasmiweb.de
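The MITAB export format mentioned above is the tab-separated PSI-MI TAB layout, whose first two columns hold the identifiers of the two interactors. A minimal sketch of consuming such lines (real MITAB 2.5 files carry 15 columns and richer identifier syntax; the sample line here is invented for illustration):

```python
# Minimal sketch of reading PSI-MI TAB (MITAB) interaction lines.
# Only the first two columns (interactor A and B) are used here.

def parse_mitab(lines):
    """Yield (id_a, id_b) accession pairs from MITAB-formatted lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.rstrip("\n").split("\t")
        # Identifiers look like "uniprotkb:P04637"; keep the accession only.
        id_a = cols[0].split(":", 1)[-1]
        id_b = cols[1].split(":", 1)[-1]
        yield id_a, id_b

sample = ["uniprotkb:P04637\tuniprotkb:Q00987\t-"]
print(list(parse_mitab(sample)))  # [('P04637', 'Q00987')]
```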

    alfaNET: A Database of Alfalfa-Bacterial Stem Blight Protein–Protein Interactions Revealing the Molecular Features of the Disease-Causing Bacteria

    Alfalfa has emerged as one of the most important forage crops, owing to its wide adaptation and high biomass production worldwide. In the last decade, the emergence of bacterial stem blight (caused by Pseudomonas syringae pv. syringae ALF3) in alfalfa has caused around 50% yield losses in the United States. Studies are being conducted to decipher the roles of the key genes and pathways regulating the disease, but due to the sparse knowledge about the infection mechanisms of Pseudomonas, the development of resistant cultivars is hampered. The database alfaNET is an attempt to assist researchers by providing comprehensive Pseudomonas proteome annotations, as well as a host–pathogen interactome tool, which predicts the interactions between host and pathogen based on orthology. alfaNET is a user-friendly and efficient tool and includes other features such as subcellular localization annotations of pathogen proteins, gene ontology (GO) annotations, network visualization, and effector protein prediction. Users can also browse and search the database using particular keywords or proteins with a specific length. Additionally, the BLAST search tool enables the user to perform a homology sequence search against the alfalfa and Pseudomonas proteomes. With the successful implementation of these attributes, alfaNET will be a beneficial resource to the research community engaged in implementing molecular strategies to mitigate the disease. alfaNET is freely available for public use at http://bioinfo.usu.edu/alfanet/
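The orthology-based prediction described above follows the general "interolog" idea: a known interaction in a template species is transferred to a host-pathogen pair whose members are orthologous to the template interactors. A sketch of that transfer step (an illustrative re-implementation with invented identifiers, not alfaNET's actual code):

```python
# Sketch of interolog (orthology-based) host-pathogen PPI transfer.
# A known template interaction (A, B) is transferred to (a, b) when
# a is an ortholog of A in the host and b is an ortholog of B in the
# pathogen. All names below are hypothetical placeholders.

def predict_interologs(known_ppis, host_orthologs, pathogen_orthologs):
    """known_ppis: iterable of (template_host, template_pathogen) pairs.
    host_orthologs / pathogen_orthologs: dicts mapping template proteins
    to lists of orthologs in the host / pathogen proteome."""
    predicted = set()
    for tmpl_a, tmpl_b in known_ppis:
        for a in host_orthologs.get(tmpl_a, []):
            for b in pathogen_orthologs.get(tmpl_b, []):
                predicted.add((a, b))
    return predicted

known = [("AT1G01060", "avrPto")]                # template interaction
host = {"AT1G01060": ["Ms_gene1", "Ms_gene2"]}   # alfalfa orthologs
path = {"avrPto": ["Pss_eff7"]}                  # Pseudomonas orthologs
print(sorted(predict_interologs(known, host, path)))
```

One template interaction with two host orthologs and one pathogen ortholog yields two predicted host-pathogen pairs, which is why interolog databases typically attach confidence annotations to down-weight many-to-many transfers.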

    Studies on distributed approaches for large scale multi-criteria protein structure comparison and analysis

    Protein Structure Comparison (PSC) is at the core of many important structural biology problems. PSC is used to infer the evolutionary history of distantly related proteins; it can also help identify the biological function of a new protein by comparing it with proteins whose function has already been annotated; and PSC is a key step in protein structure prediction, because one needs to reliably and efficiently compare tens or hundreds of thousands of decoys (predicted structures) when evaluating 'native-like' candidates (e.g. in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiment). Each of these applications, as well as many others where molecular comparison plays an important role, requires a different notion of similarity, which naturally leads to the Multi-Criteria Protein Structure Comparison (MC-PSC) problem. ProCKSI (www.procksi.org) was the first publicly available server to provide algorithmic solutions for the MC-PSC problem, by means of an enhanced structural comparison that relies on the principled application of information fusion to similarity assessments derived from multiple comparison methods (e.g. USM, FAST, MaxCMO, DaliLite, CE and TMAlign). The current MC-PSC implementation works well for moderately sized data sets, but it is time-consuming because it provides a public service to multiple users. Many of the structural bioinformatics applications mentioned above would benefit from the ability to perform, for a dedicated user, thousands or tens of thousands of comparisons through multiple methods in real time, a capacity beyond our current technology. This research investigates Grid-style distributed computing strategies for tackling the enormous computational challenge inherent in MC-PSC.
    To this aim, a novel distributed algorithm has been designed, implemented and evaluated with different load-balancing strategies, together with the selection and configuration of a variety of software tools, services and technologies, on infrastructures ranging from local testbeds to production-level eScience infrastructures such as the National Grid Service (NGS). Empirical results of different experiments, reporting on the scalability, speedup and efficiency of the overall system, are presented and discussed, along with the software engineering aspects behind implementing a distributed solution to the MC-PSC problem on a local computer cluster as well as on a Grid. The results lead us to conclude that combining better and faster parallel and distributed algorithms with more similarity comparison methods provides an unprecedented advance in protein structure comparison and analysis technology. These advances might facilitate both directed and fortuitous discovery of protein similarities, families, super-families and domains, and also help pave the way to faster and better protein function inference, annotation, structure prediction and assessment, thus empowering structural biologists to do science they could not otherwise have done.
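The core of the computational challenge is that all-against-all comparison of n structures generates n*(n-1)/2 independent pairwise jobs, which makes the problem embarrassingly parallel. One simple static load-balancing strategy of the kind evaluated in such systems is to deal the pairs round-robin to workers (an illustrative sketch with placeholder structure names, not the thesis's actual implementation):

```python
# Sketch of statically load-balancing an all-vs-all structure comparison:
# every unordered pair of structures is a job; jobs are dealt round-robin
# to n_workers buckets, each of which a worker processes independently.

from itertools import combinations

def partition_pairs(structures, n_workers):
    """Assign each unordered pair of structures to a worker round-robin."""
    buckets = [[] for _ in range(n_workers)]
    for i, pair in enumerate(combinations(structures, 2)):
        buckets[i % n_workers].append(pair)
    return buckets

structs = ["1abc", "2def", "3ghi", "4jkl"]  # placeholder PDB-style IDs
buckets = partition_pairs(structs, 3)
sizes = [len(b) for b in buckets]
print(sizes)  # 4 structures -> 6 pairs over 3 workers -> [2, 2, 2]
```

Round-robin assumes all comparisons cost about the same; when per-pair cost varies (as it does across methods like DaliLite versus USM), dynamic strategies such as a shared work queue balance load better, which is why the thesis evaluates several strategies.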