239 research outputs found

    SPRINT: Ultrafast protein-protein interaction prediction of the entire human interactome

    Proteins usually perform their functions by interacting with other proteins, and predicting which proteins interact is a fundamental problem. Experimental methods are slow, expensive, and error-prone. Many computational methods have been proposed, among which sequence-based ones are very promising. However, so far no such method has been able to predict the entire human interactome effectively: existing tools require too much time or memory. We present SPRINT (Scoring PRotein INTeractions), a new sequence-based algorithm and tool for predicting protein-protein interactions. We comprehensively compare SPRINT with state-of-the-art programs on the seven most reliable human PPI datasets and show that it is more accurate while running orders of magnitude faster and using very little memory. SPRINT is the only program that can predict the entire human interactome. Our goal is to transform the very challenging problem of predicting the entire human interactome into a routine task. The source code of SPRINT is freely available from github.com/lucian-ilie/SPRINT/ and the datasets and predicted PPIs from www.csd.uwo.ca/faculty/ilie/SPRINT/
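To illustrate the general idea behind sequence-based PPI prediction (this is a toy sketch, not SPRINT's actual algorithm: SPRINT uses its own efficient scoring over sequence similarities), a candidate pair can be scored by how closely its two sequences resemble the two sides of known interacting pairs, here using shared k-mers as a cheap similarity proxy:

```python
# Toy sketch of sequence-based PPI scoring (NOT SPRINT's algorithm):
# a candidate pair (a, b) scores highly when its sequences resemble
# the two sides of a known interacting pair.

def kmers(seq, k=3):
    """Return the set of k-mers occurring in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def similarity(s1, s2, k=3):
    """Jaccard similarity between the k-mer sets of two sequences."""
    a, b = kmers(s1, k), kmers(s2, k)
    return len(a & b) / len(a | b) if a | b else 0.0

def score_pair(a, b, known_pairs, k=3):
    """Score candidate pair (a, b) against known interacting pairs,
    trying both orientations of each known pair."""
    best = 0.0
    for p, q in known_pairs:
        s = max(similarity(a, p, k) * similarity(b, q, k),
                similarity(a, q, k) * similarity(b, p, k))
        best = max(best, s)
    return best

# Hypothetical toy sequences, for illustration only.
known = [("MKTAYIAKQR", "MVLSPADKTN")]
print(score_pair("MKTAYIAKQR", "MVLSPADKTN", known))  # identical pair -> 1.0
```

A real predictor must of course scale this to millions of candidate pairs, which is exactly the efficiency problem the abstract says SPRINT solves.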

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication, and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems, not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems so as to better understand their goals and methodology, which helps evaluate their applicability to similar problems. The taxonomy also provides a "gap analysis" of the area, through which researchers can identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping provide an easy way for new practitioners to understand this complex area of research. Comment: 46 pages, 16 figures, Technical Report

    Hyper-Spectral Image Processing Using High Performance Reconfigurable Computers

    The purpose of this thesis is to investigate methods of implementing a section of a Matlab hyper-spectral image processing application as a digital system that operates on a High Performance Reconfigurable Computer (HPRC). The work presented is concerned with the architecture, the design techniques, and the models of digital systems that are necessary to achieve the best overall performance on HPRC platforms. The application is an image-processing tool that detects tumors in chickens by analysing a hyper-spectral image. Analysis of the original Matlab code showed that it achieves low performance. The implementation follows a three-stage approach. In the first stage, the Matlab code is converted into C++ code in order to identify the bottlenecks that require the most resources. In the second stage, the digital system is designed to optimize performance on a single reconfigurable computer. In the final stage, this work explores the HPRC architecture by deploying and testing the digital design on multiple machines. The research shows that HPRC platforms grant a noticeable performance boost; furthermore, the more hyper-spectral bands the input image contains, the greater the speedup that can be expected from the HPRC design.

    HAPPI-2: a Comprehensive and High-quality Map of Human Annotated and Predicted Protein Interactions

    BACKGROUND: Human protein-protein interaction (PPI) data is essential to network and systems biology studies. PPI data can help biochemists hypothesize how proteins form complexes by binding to each other, how extracellular signals propagate through post-translational modification of de-activated signaling molecules, and how chemical reactions are coupled by enzymes involved in a complex biological process. Our capability to develop good public database resources for human PPI data has a direct impact on the quality of future research on genome biology and medicine. RESULTS: The database of Human Annotated and Predicted Protein Interactions (HAPPI) version 2.0 is a major update to the original HAPPI 1.0 database. It contains 2,922,202 unique protein-protein interactions (PPIs) linking 23,060 human proteins, making it the most comprehensive database covering human PPI data today. These PPIs include both physical/direct interactions and high-quality functional/indirect interactions. Compared with the HAPPI 1.0 release, HAPPI version 2.0 (HAPPI-2) represents a 485% increase in human PPI data coverage and a 73% increase in protein coverage. The revamped HAPPI web portal provides users with a friendly search, curation, and data retrieval interface, allowing them to retrieve human PPIs and available annotation information on the interaction type, interaction quality, interacting-partner drug targeting data, and disease information. The updated HAPPI-2 can be freely accessed by academic users at http://discovery.informatics.uab.edu/HAPPI . CONCLUSIONS: While the underlying data for HAPPI-2 are integrated from diverse data sources, the new HAPPI-2 release represents a good balance between data coverage and data quality of human PPIs, making it ideally suited for network biology.

    Grid Information Technology as a New Technological Tool for e-Science, Healthcare and Life Science

    Nowadays, scientific projects require collaborative environments and powerful computing resources capable of handling huge quantities of data, which gives rise to e-Science. These requirements are evident in the need to optimise time and efforts in activities to do with health. When e-Science focuses on the collaborative handling of all the information generated in clinical medicine and health, e-Health is the result. Scientists are taking increasing interest in an emerging technology – Grid Information Technology – that may offer a solution to their current needs. The current work aims to survey how e-Science is using this technology all around the world. We also argue that the technology may provide an ideal solution for the new challenges facing e-Health and Life Science.

    Disease re-classification via integration of biological networks

    Currently, human diseases are classified as they were in the late 19th century, by considering only symptoms of the affected organ. With a growing body of transcriptomic, proteomic, metabolomic and genomic data sets describing diseases, we ask whether the old classification still holds in the light of modern biological data. These large-scale and complex biological data can be viewed as networks of inter-connected elements. We propose to redefine human disease classification by considering diseases as systems-level disorders of the entire cellular system. To do this, we will integrate the different types of biological data mentioned above. A network-based mathematical model will be designed to represent these integrated data, and computational algorithms and tools will be developed and implemented for its analysis. In this report, a review of the research progress so far will be presented, including 1) a detailed statement of the research problem, 2) a literature survey on related research topics, 3) reports of on-going work, and 4) future research plans.

    DASMIweb: online integration, analysis and assessment of distributed protein interaction data

    In recent years, we have witnessed a substantial increase in the amount of available protein interaction data. However, most data are currently not readily accessible to the biologist at a single site, but scattered over multiple online repositories. Therefore, we have developed the DASMIweb server, which affords the integration, analysis and qualitative assessment of distributed sources of interaction data in a dynamic fashion. Since DASMIweb allows for querying many different resources of protein and domain interactions simultaneously, it serves as an important starting point for interactome studies and assists the user in finding publicly accessible interaction data with minimal effort. The pool of queried resources is fully configurable and supports the inclusion of the user's own interaction data or confidence scores. In particular, DASMIweb integrates confidence measures such as functional similarity scores to assess individual interactions. The retrieved results can be exported in different file formats such as MITAB or SIF. DASMIweb is freely available at http://www.dasmiweb.de
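The MITAB export format mentioned above is the tab-separated PSI-MI TAB layout, whose first two columns hold the identifiers of the two interactors. A minimal sketch of consuming such lines (real MITAB 2.5 files carry 15 columns and richer identifier syntax; the sample line here is invented for illustration):

```python
# Minimal sketch of reading PSI-MI TAB (MITAB) interaction lines.
# Only the first two columns (interactor A and B) are used here.

def parse_mitab(lines):
    """Yield (id_a, id_b) accession pairs from MITAB-formatted lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.rstrip("\n").split("\t")
        # Identifiers look like "uniprotkb:P04637"; keep the accession only.
        id_a = cols[0].split(":", 1)[-1]
        id_b = cols[1].split(":", 1)[-1]
        yield id_a, id_b

sample = ["uniprotkb:P04637\tuniprotkb:Q00987\t-"]
print(list(parse_mitab(sample)))  # [('P04637', 'Q00987')]
```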

    alfaNET: A Database of Alfalfa-Bacterial Stem Blight Protein–Protein Interactions Revealing the Molecular Features of the Disease-Causing Bacteria

    Alfalfa has emerged as one of the most important forage crops, owing to its wide adaptation and high biomass production worldwide. In the last decade, the emergence of bacterial stem blight (caused by Pseudomonas syringae pv. syringae ALF3) in alfalfa has caused around 50% yield losses in the United States. Studies are being conducted to decipher the roles of the key genes and pathways regulating the disease, but due to the sparse knowledge about the infection mechanisms of Pseudomonas, the development of resistant cultivars is hampered. The database alfaNET is an attempt to assist researchers by providing comprehensive Pseudomonas proteome annotations, as well as a host–pathogen interactome tool, which predicts the interactions between host and pathogen based on orthology. alfaNET is a user-friendly and efficient tool and includes other features such as subcellular localization annotations of pathogen proteins, gene ontology (GO) annotations, network visualization, and effector protein prediction. Users can also browse and search the database using particular keywords or proteins with a specific length. Additionally, the BLAST search tool enables the user to perform a homology sequence search against the alfalfa and Pseudomonas proteomes. With the successful implementation of these attributes, alfaNET will be a beneficial resource to the research community engaged in implementing molecular strategies to mitigate the disease. alfaNET is freely available for public use at http://bioinfo.usu.edu/alfanet/
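The orthology-based prediction described above follows the general "interolog" idea: a known interaction in a template species is transferred to a host-pathogen pair whose members are orthologous to the template interactors. A sketch of that transfer step (an illustrative re-implementation with invented identifiers, not alfaNET's actual code):

```python
# Sketch of interolog (orthology-based) host-pathogen PPI transfer.
# A known template interaction (A, B) is transferred to (a, b) when
# a is an ortholog of A in the host and b is an ortholog of B in the
# pathogen. All names below are hypothetical placeholders.

def predict_interologs(known_ppis, host_orthologs, pathogen_orthologs):
    """known_ppis: iterable of (template_host, template_pathogen) pairs.
    host_orthologs / pathogen_orthologs: dicts mapping template proteins
    to lists of orthologs in the host / pathogen proteome."""
    predicted = set()
    for tmpl_a, tmpl_b in known_ppis:
        for a in host_orthologs.get(tmpl_a, []):
            for b in pathogen_orthologs.get(tmpl_b, []):
                predicted.add((a, b))
    return predicted

known = [("AT1G01060", "avrPto")]                # template interaction
host = {"AT1G01060": ["Ms_gene1", "Ms_gene2"]}   # alfalfa orthologs
path = {"avrPto": ["Pss_eff7"]}                  # Pseudomonas orthologs
print(sorted(predict_interologs(known, host, path)))
```

One template interaction with two host orthologs and one pathogen ortholog yields two predicted host-pathogen pairs, which is why interolog databases typically attach confidence annotations to down-weight many-to-many transfers.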

    Studies on distributed approaches for large scale multi-criteria protein structure comparison and analysis

    Protein Structure Comparison (PSC) is at the core of many important structural biology problems. PSC is used to infer the evolutionary history of distantly related proteins; it can also help identify the biological function of a new protein by comparing it with proteins whose function has already been annotated; and PSC is a key step in protein structure prediction, because one needs to reliably and efficiently compare tens or hundreds of thousands of decoys (predicted structures) when evaluating 'native-like' candidates (e.g. in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiment). Each of these applications, as well as many others where molecular comparison plays an important role, requires a different notion of similarity, which naturally leads to the Multi-Criteria Protein Structure Comparison (MC-PSC) problem. ProCKSI (www.procksi.org) was the first publicly available server to provide algorithmic solutions for the MC-PSC problem, by means of an enhanced structural comparison that relies on the principled application of information fusion to similarity assessments derived from multiple comparison methods (e.g. USM, FAST, MaxCMO, DaliLite, CE and TMAlign). The current MC-PSC implementation works well for moderately sized data sets, but it is time-consuming because it provides a public service to multiple users. Many of the structural bioinformatics applications mentioned above would benefit from the ability to perform, for a dedicated user, thousands or tens of thousands of comparisons through multiple methods in real time, a capacity beyond our current technology. This research investigates Grid-style distributed computing strategies for tackling the enormous computational challenge inherent in MC-PSC.
    To this aim, a novel distributed algorithm has been designed, implemented and evaluated with different load-balancing strategies, together with the selection and configuration of a variety of software tools, services and technologies, on infrastructures ranging from local testbeds to production-level eScience infrastructures such as the National Grid Service (NGS). Empirical results of different experiments, reporting on the scalability, speedup and efficiency of the overall system, are presented and discussed, along with the software engineering aspects behind implementing a distributed solution to the MC-PSC problem on a local computer cluster as well as on a Grid. The results lead us to conclude that combining better and faster parallel and distributed algorithms with more similarity comparison methods provides an unprecedented advance in protein structure comparison and analysis technology. These advances might facilitate both directed and fortuitous discovery of protein similarities, families, super-families and domains, and also help pave the way to faster and better protein function inference, annotation, structure prediction and assessment, thus empowering structural biologists to do science they could not otherwise have done.
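The core of the computational challenge is that all-against-all comparison of n structures generates n*(n-1)/2 independent pairwise jobs, which makes the problem embarrassingly parallel. One simple static load-balancing strategy of the kind evaluated in such systems is to deal the pairs round-robin to workers (an illustrative sketch with placeholder structure names, not the thesis's actual implementation):

```python
# Sketch of statically load-balancing an all-vs-all structure comparison:
# every unordered pair of structures is a job; jobs are dealt round-robin
# to n_workers buckets, each of which a worker processes independently.

from itertools import combinations

def partition_pairs(structures, n_workers):
    """Assign each unordered pair of structures to a worker round-robin."""
    buckets = [[] for _ in range(n_workers)]
    for i, pair in enumerate(combinations(structures, 2)):
        buckets[i % n_workers].append(pair)
    return buckets

structs = ["1abc", "2def", "3ghi", "4jkl"]  # placeholder PDB-style IDs
buckets = partition_pairs(structs, 3)
sizes = [len(b) for b in buckets]
print(sizes)  # 4 structures -> 6 pairs over 3 workers -> [2, 2, 2]
```

Round-robin assumes all comparisons cost about the same; when per-pair cost varies (as it does across methods like DaliLite versus USM), dynamic strategies such as a shared work queue balance load better, which is why the thesis evaluates several strategies.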