1,649 research outputs found

Domain fusion analysis by applying relational algebra to protein sequence and domain databases

Author: Ikura Mitsuhiko
Truong Kevin
Publication venue: BioMed Central
Publication date: 01/01/2003

BACKGROUND: Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages these efforts will become increasingly powerful. RESULTS: This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From the scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypotheses. Results can be viewed at . CONCLUSION: As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time.
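The abstract above describes finding domain fusions with relational algebra on any SQL database. The following is a minimal sketch of that idea, not the authors' actual queries: the table name `protein_domain`, its columns, the example rows, and the function `find_domain_fusions` are all hypothetical. It assumes a single table of (protein, organism, domain) assignments and reports two single-domain proteins from the same organism as putative partners when their domains co-occur in some third, composite protein.

```python
import sqlite3

# Hypothetical example rows: (protein, organism, domain).
EXAMPLE_ROWS = [
    ("FUSED1", "E. coli", "PF_A"),
    ("FUSED1", "E. coli", "PF_B"),
    ("P1", "H. sapiens", "PF_A"),
    ("P2", "H. sapiens", "PF_B"),
    ("P3", "S. cerevisiae", "PF_A"),
]

# Self-joins on one table: f1/f2 pick two different domains of a composite
# protein; a and b are other proteins carrying one of those domains each.
FUSION_QUERY = """
SELECT DISTINCT a.protein, b.protein, f1.protein
FROM protein_domain AS f1
JOIN protein_domain AS f2
  ON f1.protein = f2.protein AND f1.domain < f2.domain
JOIN protein_domain AS a
  ON a.domain = f1.domain AND a.protein <> f1.protein
JOIN protein_domain AS b
  ON b.domain = f2.domain AND b.protein <> f1.protein
WHERE a.organism = b.organism
  AND a.protein <> b.protein
  -- a must lack the second domain and b must lack the first,
  -- otherwise they are themselves composites rather than partners
  AND NOT EXISTS (SELECT 1 FROM protein_domain AS x
                  WHERE x.protein = a.protein AND x.domain = f2.domain)
  AND NOT EXISTS (SELECT 1 FROM protein_domain AS y
                  WHERE y.protein = b.protein AND y.domain = f1.domain)
ORDER BY 1, 2, 3
"""


def find_domain_fusions(rows):
    """Return (partner_a, partner_b, composite) tuples for the given rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE protein_domain (protein TEXT, organism TEXT, domain TEXT)"
        )
        con.executemany("INSERT INTO protein_domain VALUES (?, ?, ?)", rows)
        return con.execute(FUSION_QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for partner_a, partner_b, composite in find_domain_fusions(EXAMPLE_ROWS):
        print(partner_a, partner_b, "linked via", composite)
```

Because the whole analysis is one declarative query, rerunning it after the domain table is refreshed would update the predictions, which is the property the abstract's conclusion emphasizes.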

University of Toronto Research Repository

Springer

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A Data Transformation System for Biological Data Sources

Author: Buneman Peter
Davidson Susan
Hart Kyle
Overton Chris
Wong L.
Publication date: 01/01/1995

Scientific data of importance to biologists in the Human Genome Project resides not only in conventional databases, but also in structured files maintained in a number of different formats (e.g. ASN.1 and ACE) as well as sequence analysis packages (e.g. BLAST and FASTA). These formats and packages contain a number of data types not found in conventional databases, such as lists and variants, and may be deeply nested. We present in this paper techniques for querying and transforming such data, and illustrate their use in a prototype system developed in conjunction with the Human Genome Center for Chromosome 22. We also describe optimizations performed by the system, a crucial issue for bulk data.

CiteSeerX

Edinburgh Research Explorer

ScholarlyCommons@Penn

SAFE Software and FED Database to Uncover Protein-Protein Interactions using Gene Fusion Analysis

Author: Danos Vasilis
Karagouni Amalia D.
Kissa Maria
Kossida Sophia
Koumandou V. Lila
Trimpalis Philip
Tsagrasoulis Dimosthenis
Tsakalidis Athanasios
Publication venue: Libertas Academica
Publication date: 01/12/2011

Domain Fusion Analysis takes advantage of the fact that certain proteins in a given proteome A are found to have statistically significant similarity with two separate proteins in another proteome B. In other words, the result of a fusion event between two separate proteins in proteome B is a specific full-length protein in proteome A. In such a case, it can be safely concluded that the protein pair has a common biological function or even interacts physically. In this paper, we present the Fusion Events Database (FED), a database for the maintenance and retrieval of fusion data in both prokaryotic and eukaryotic organisms, and the Software for the Analysis of Fusion Events (SAFE), a computational platform implemented for the automated detection, filtering and visualization of fusion events (both available at: http://www.bioacademy.gr/bioinformatics/projects/ProteinFusion/index.htm). Finally, we analyze the proteomes of three microorganisms using these tools in order to demonstrate their functionality.
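The detection step described above can be illustrated with a small sketch. This is not the SAFE implementation: the function `candidate_fusions`, the hit tuple layout, and the `max_overlap` parameter are hypothetical, and the hits are assumed to have been filtered for statistical significance already. A protein from proteome A is flagged when two different proteins from proteome B align to regions of it that do not overlap by more than `max_overlap` residues.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical similarity hits: (protein_in_A, protein_in_B, start, end),
# where start/end are 1-based inclusive coordinates on the proteome A protein.
EXAMPLE_HITS = [
    ("A1", "B1", 1, 200),
    ("A1", "B2", 210, 400),
    ("A2", "B3", 1, 300),
    ("A2", "B4", 50, 280),
]


def candidate_fusions(hits, max_overlap=0):
    """Return sorted (composite, component_1, component_2) candidates."""
    regions_by_composite = defaultdict(list)
    for composite, component, start, end in hits:
        regions_by_composite[composite].append((component, start, end))

    found = set()
    for composite, regions in regions_by_composite.items():
        for (c1, s1, e1), (c2, s2, e2) in combinations(regions, 2):
            if c1 == c2:
                continue  # two hits from the same component are not a fusion
            # Number of shared residues; zero or negative means disjoint regions.
            overlap = min(e1, e2) - max(s1, s2) + 1
            if overlap <= max_overlap:
                found.add((composite, *sorted((c1, c2))))
    return sorted(found)


if __name__ == "__main__":
    print(candidate_fusions(EXAMPLE_HITS))
```

In the example data, A1 is reported because B1 and B2 cover separate regions, while A2 is not because B3 and B4 align to largely the same region and are more likely homologs of one another than fusion components.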

CiteSeerX

Directory of Open Access Journals

PubMed Central

Digital Repository of Hellenic Managing Authority of the Operational Programme "Education and Lifelong Learning" (EDULLL)

BISON: bio-interface for the semi-global analysis of network patterns

Author: Besemann Christopher
Carr Nathan J
Denton Anne
Prüß Birgit M
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The large amount of genomics data that has accumulated over the past decade requires extensive data mining. However, the global nature of data mining, which includes pattern mining, poses difficulties for users who want to study specific questions in a more local environment. This creates a need for techniques that allow a localized analysis of globally determined patterns. RESULTS: We developed a tool that determines and evaluates global patterns based on protein property and network information, while providing all the benefits of a perspective that is targeted at biologist users with specific goals and interests. Our tool uses our own data mining techniques, integrated into current visualization and navigation techniques. The functionality of the tool is discussed in the context of the transcriptional network of regulation in the enteric bacterium Escherichia coli. Two biological questions were asked: (i) Which functional categories of proteins (identified by hidden Markov models) are regulated by a regulator with a specific domain? (ii) Which regulators are involved in the regulation of proteins that contain a common hidden Markov model? Using these examples, we explain the gene-centered and pattern-centered analysis that the tool permits. CONCLUSION: In summary, we have a tool that can be used for a wide variety of applications in biology, medicine, or agriculture. The pattern mining engine is global in the way that patterns are determined across the entire network. The tool still permits a localized analysis for users who want to analyze a subportion of the total network. We have named the tool BISON (Bio-Interface for the Semi-global analysis Of Network patterns).

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

CODA: Accurate Detection of Functional Associations between Proteins in Eukaryotic Genomes Using Domain Fusion

Author: Adam J. Reid
AJ Enright
Andrew B. Clegg
B Snel
C von Mering
C Yeats
Christine A. Orengo
CJ Marcotte
DE Barnes
EM Marcotte
F Bellivier
G Apic
I Yanai
Juan A. G. Ranea
K Truong
M Huynen
Magnus Rattray
P Resnik
PM Bowers
PW Lord
RD Finn
S Hoffman
SF Altschul
SK Kummerfeld
TF Smith
Publication venue: Public Library of Science
Publication date: 01/01/2009

Background: In order to understand how biological systems function it is necessary to determine the interactions and associations between proteins. Gene fusion prediction is one approach to detection of such functional relationships. Its use is however known to be problematic in higher eukaryotic genomes due to the presence of large homologous domain families. Here we introduce CODA (Co-Occurrence of Domains Analysis), a method to predict functional associations based on the gene fusion idiom. Methodology/Principal Findings: We apply a novel scoring scheme which takes account of the genome-specific size of homologous domain families involved in fusion to improve accuracy in predicting functional associations. We show that CODA is able to accurately predict functional similarities in human with comparison to state-of-the-art methods and show that different methods can be complementary. CODA is used to produce evidence that a currently uncharacterised human protein may be involved in pathways related to depression and that another is involved in DNA replication. Conclusions/Significance: The relative performance of different gene fusion methodologies has not previously been explored. We find that they are largely complementary, with different methods being more or less appropriate in different genomes. Our method is the only one currently available for download and can be run on an arbitrary dataset by the user. The CODA software and datasets are freely available from ftp://ftp.biochem.ucl.ac.uk/pub/gene3d_data/v6.1.0/CODA/. Predictions are also available via web services from http://funcnet.eu/

CiteSeerX

Public Library of Science (PLOS)

Crossref

PubMed Central

UCL Discovery

Large Scale Data Analytics with Language Integrated Query

Author: Cho Chung Yik
Publication venue: Curtin University
Publication date: 01/01/2018

Databases can easily reach petabytes (1,048,576 gigabytes) in scale. A system to enable users to efficiently retrieve or query data from multiple databases simultaneously is needed. This research introduces a new, cloud-based query framework, designed and built using Language Integrated Query, to query existing data sources without the need to integrate or restructure existing databases. Protein data obtained through the query framework proves its feasibility and cost effectiveness

espace@Curtin

Identification of Potential Drug Targets Implicated in Parkinson's Disease from Human Genome: Insights of Using Fused Domains in Hypothetical Proteins as Probes

Author: Khanduja Varun
Nagendra H. G.
Nirmala K. A.
Rathankar N.
Publication venue: International Scholarly Research Network

High-throughput genome sequencing has led to a data explosion in sequence databanks, with an imbalance of sequence-structure-function relationships, resulting in a substantial fraction of proteins known as hypothetical proteins. Functions of such proteins can be assigned based on the analysis and characterization of the domains that they are made up of. Domains are basic evolutionary units of proteins and most proteins contain multiple domains. A subset of multidomain proteins contains fused (overlapping) domains, in which sequence overlaps occur between two or more domains. These fused domains are a result of gene fusion events and their implication in diseases is well established. Hence, an attempt has been made in this paper to identify hypothetical proteins from the human genome that contain fused domains and are homologous to parkinsonian targets present in the KEGG database. This research identified 18 hypothetical proteins with domains fused with ubiquitin domains and with homology to targets present in the parkinsonian pathway.

Crossref

PubMed Central

XML-based approaches for the integration of heterogeneous bio-molecular data

Author: Berlanga-Llavori Rafael
Jiménez-Ruiz Ernesto
Manset David
Mesiti Marco
Perlasca Paolo
Sanz Ismael
Valentini Giorgio
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Today's public database infrastructure spans a very large collection of heterogeneous biological data, opening new opportunities for molecular biology, biomedical and bioinformatics research, but also raising new problems for their integration and computational processing. Results: In this paper we survey the most interesting and novel approaches for the representation, integration and management of different kinds of biological data by exploiting XML and the related recommendations and approaches. Moreover, we present new and interesting cutting-edge approaches for the appropriate management of heterogeneous biological data represented through XML. Conclusion: XML has succeeded in the integration of heterogeneous biomolecular information, and has established itself as the syntactic glue for biological data sources. Nevertheless, a large variety of XML-based data formats have been proposed, making effective integration of bioinformatics data schemes difficult. The adoption of a few semantic-rich standard formats is urgent to achieve a seamless integration of the current biological resources.

CiteSeerX

City Research Online

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

AIR Universita degli studi di Milano

Springer - Publisher Connector

PubMed Central

Repositori Institucional de la Universitat Jaume I

Oxford University Research Archive

Molecular characterization and evolutionary plasticity of protein-protein interfaces

Author: Bickerton George Richard James
Publication venue: University of Cambridge
Publication date: 01/01/2010

The sequencing of the human genome provides the parts list for understanding cellular processes. However, as 70% of eukaryotic genes work through multi-protein systems, it is only through detailed study of the interactions of these components that a more complete, systems-level understanding can be gained. This thesis is centred on the establishment of PICCOLO - a comprehensive database of structurally characterized protein interactions. In generating the resource, issues of interface definition, quaternary structure, data redundancy, structural environment and interaction type are addressed. The resource enables a variety of analyses to be performed concerning interface properties including residue propensity, hydropathy, polarity, interface size, sequence entropy and residue contact preference. PICCOLO has been applied to probing the patterns of substitutions that are accepted in protein interfaces across evolution, and whether these patterns are distinguishable from those seen in other structural environments. The derivation of a high-quality set of multiple structural alignments in the form of the database TOCCATA, a prerequisite for such analysis, is described, as well as procedures to derive environment-specific substitution tables. The Blundell group has contributed a series of methods to predict the likely effect of non-synonymous Single Nucleotide Polymorphisms (nsSNPs) on protein stability, function and interactions in order to triage the large volumes of data created from high-throughput genetic screening studies, enabling prioritization of those nsSNPs most likely to be phenotypically detrimental. PICCOLO's contribution to these predictions is described. Historically there has been little focus on protein-protein interactions as drug targets for small-molecule therapeutics. However, alanine-scanning mutagenesis studies have revealed that only a subset of residues contribute the greater part of free energy to binding - so-called "hot-spots". Molecular characterization of hot-spots performed using PICCOLO probes the molecular basis underlying this important phenomenon, leading to the possibility of predictive methods to identify hot-spots 'in silico'.

Apollo (Cambridge)

OpenGrey Repository