Using indirect protein interactions for the prediction of Gene Ontology functions
10.1186/1471-2105-8-S4-S8; BMC Bioinformatics 8, Suppl. 4; BBMI
Microbial community pattern detection in human body habitats via ensemble clustering framework
The human habitat is a host where microbial species evolve, function, and continue to evolve. Elucidating how microbial communities respond to human habitats is a fundamental and critical task, as establishing baselines of the human microbiome is essential to understanding its role in human disease and health. However, current studies usually overlook the complex and interconnected landscape of the human microbiome and limit their scope to particular body habitats, using learning models built on specific criteria. As a result, these methods cannot effectively capture the real-world underlying microbial patterns. To obtain a comprehensive view, we propose a novel ensemble clustering framework to mine the structure of microbial community patterns from large-scale metagenomic data. Specifically, we first build a microbial similarity network by integrating 1920 metagenomic samples from three body habitats of healthy adults. Then a novel ensemble model based on symmetric Nonnegative Matrix Factorization (NMF) is proposed and applied to the network to detect clustering patterns. Extensive experiments are conducted to evaluate the effectiveness of our model in deriving microbial communities with respect to body habitat and host gender. From the clustering results, we observed that body habitats exhibit strongly bound but non-unique microbial structural patterns. Meanwhile, the human microbiome reveals different degrees of structural variation across body habitats and host gender. In summary, our ensemble clustering framework can efficiently explore integrated clustering results to accurately identify microbial communities and provides a comprehensive view of a set of microbial communities. Such trends depict an integrated biography of microbial communities, which offer new insight toward uncovering pathogenic models of the human microbiome.
Comment: BMC Systems Biology 201
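The symmetric NMF step can be sketched as follows. This is a minimal illustration, not the authors' model: it assumes the standard objective of minimizing ||A − HH^T||_F^2 over nonnegative H, solved with damped multiplicative updates, and it assumes the "ensemble" is built by averaging co-association matrices from several randomly initialized runs. The function names `symnmf` and `ensemble_symnmf` and all parameter defaults are hypothetical.

```python
import numpy as np


def symnmf(A, k, n_iter=500, beta=0.5, seed=0, eps=1e-10):
    """Approximate a symmetric nonnegative matrix A by H @ H.T with H >= 0."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Random nonnegative start, scaled to the average similarity (assumption)
    H = rng.random((n, k)) * np.sqrt(A.mean() / k)
    for _ in range(n_iter):
        AH = A @ H
        HHtH = H @ (H.T @ H)
        # Damped multiplicative update: H stays nonnegative at every step
        H *= (1.0 - beta) + beta * AH / (HHtH + eps)
    return H


def ensemble_symnmf(A, k, n_runs=10):
    """Hypothetical ensemble: combine several SymNMF runs via co-association."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for run in range(n_runs):
        # Hard cluster labels from one randomly initialized run
        labels = symnmf(A, k, seed=run).argmax(axis=1)
        # C[i, j] accumulates how often samples i and j share a cluster
        C += labels[:, None] == labels[None, :]
    C /= n_runs
    # Final partition: factor the consensus matrix itself
    return symnmf(C, k, seed=n_runs).argmax(axis=1)
```

For example, on a block-diagonal similarity matrix such as `np.kron(np.eye(2), np.ones((3, 3)))`, the sketch should place samples 0 to 2 and samples 3 to 5 in two separate clusters. Factoring the consensus matrix, rather than keeping a single run, is one common way to make the final partition less sensitive to random initialization.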
Decomposing PPI networks for complex discovery
Background: Protein complexes are important for understanding principles of cellular organization and functions. With the availability of large amounts of high-throughput protein-protein interactions (PPI), many algorithms have been proposed to discover protein complexes from PPI networks. However, existing algorithms generally do not take into consideration the fact that not all the interactions in a PPI network take place at the same time. As a result, predicted complexes often contain many spuriously included proteins, precluding them from matching true complexes.
Results: We propose two methods to tackle this problem: (1) The localization GO term decomposition method: We utilize cellular component Gene Ontology (GO) terms to decompose PPI networks into several smaller networks such that the proteins in each decomposed network are annotated with the same cellular component GO term. (2) The hub removal method: This method is based on the observation that hub proteins are more likely to fuse clusters that correspond to different complexes. To avoid this, we remove hub proteins from PPI networks, and then apply a complex discovery algorithm on the remaining PPI network. The removed hub proteins are added back to the generated clusters afterwards. We tested the two methods on the yeast PPI network downloaded from BioGRID. Our results show that these methods can improve the performance of several complex discovery algorithms significantly. Further improvement in performance is achieved when we apply them in tandem.
Conclusions: The performance of complex discovery algorithms is hindered by the fact that not all the interactions in a PPI network take place at the same time. We tackle this problem by using localization GO terms or hubs to decompose a PPI network before complex discovery, which achieves considerable improvement.
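Both decomposition ideas can be sketched on a plain edge list. This is a hypothetical illustration: the abstract does not state how hubs are chosen or how they are added back, so the sketch assumes a fixed degree cutoff and re-attaches a hub to any cluster in which it interacts with at least `min_frac` of the members. Connected components stand in for a real complex discovery algorithm, and every function name here is made up for illustration.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from (protein, protein) pairs; self-loops dropped."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def decompose_by_localization(edges, cc_terms):
    """One subnetwork per cellular-component GO term shared by both endpoints.

    cc_terms maps protein -> set of cellular-component GO term IDs.
    """
    subnets = defaultdict(list)
    for u, v in edges:
        for term in cc_terms.get(u, set()) & cc_terms.get(v, set()):
            subnets[term].append((u, v))
    return dict(subnets)


def connected_components(edges):
    """Stand-in for a real complex discovery algorithm (hypothetical choice)."""
    adj = build_adjacency(edges)
    seen, clusters = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.add(node)
            stack.extend(adj[node] - seen)
        clusters.append(comp)
    return clusters


def hub_removal(edges, degree_cutoff, cluster_fn=connected_components, min_frac=0.5):
    """Remove hubs, cluster the rest, then add hubs back to clusters they touch."""
    adj = build_adjacency(edges)
    hubs = {p for p, nbrs in adj.items() if len(nbrs) > degree_cutoff}
    core_edges = [(u, v) for u, v in edges if u not in hubs and v not in hubs]
    result = []
    for core in cluster_fn(core_edges):
        # Assumed rule: a hub rejoins a cluster if it interacts with >= min_frac of it
        attached = {h for h in hubs if len(adj[h] & core) >= min_frac * len(core)}
        result.append(set(core) | attached)
    return result
```

In a toy network of two triangles joined only through a hub `h`, connected components on the full network fuse everything into one cluster, whereas `hub_removal(edges, degree_cutoff=3)` returns the two triangles, each with `h` added back.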
Supervised maximum-likelihood weighting of composite protein networks for complex prediction
10.1186/1752-0509-6-S2-S13; BMC Systems Biology 6, Suppl. 2
A Common Class of Transcripts with 5'-Intron Depletion, Distinct Early Coding Sequence Features, and N1-Methyladenosine Modification [preprint]
Introns are found in 5' untranslated regions (5'UTRs) for 35% of all human transcripts. These 5'UTR introns are not randomly distributed: genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5'UTR introns tend to harbor specific RNA sequence elements in their early coding regions. To model and understand the connection between coding-region sequence and 5'UTR intron status, we developed a classifier that can predict 5'UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5' proximal-intron-minus-like-coding regions (5IM transcripts). Unexpectedly, we found that the early coding sequence features defining 5IM transcripts are widespread, appearing in 21% of all human RefSeq transcripts. The 5IM class of transcripts is enriched for non-AUG start codons, more extensive secondary structure both preceding the start codon and near the 5' cap, greater dependence on eIF4E for translation, and association with ER-proximal ribosomes. 5IM transcripts are bound by the Exon Junction Complex (EJC) at non-canonical 5' proximal positions. Finally, N1-methyladenosines are specifically enriched in the early coding regions of 5IM transcripts. Taken together, our analyses point to the existence of a distinct 5IM class comprising ~20% of human transcripts. This class is defined by depletion of 5' proximal introns, presence of specific RNA sequence features associated with low translation efficiency, N1-methyladenosines in the early coding region, and enrichment for non-canonical binding by the Exon Junction Complex.
A common class of transcripts with 5'-intron depletion, distinct early coding sequence features, and N1-methyladenosine modification
Introns are found in 5' untranslated regions (5'UTRs) for 35% of all human transcripts. These 5'UTR introns are not randomly distributed: Genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5'UTR introns tend to harbor specific RNA sequence elements in their early coding regions. To model and understand the connection between coding-region sequence and 5'UTR intron status, we developed a classifier that can predict 5'UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5' proximal-intron-minus-like-coding regions (5IM transcripts). Unexpectedly, we found that the early coding sequence features defining 5IM transcripts are widespread, appearing in 21% of all human RefSeq transcripts. The 5IM class of transcripts is enriched for non-AUG start codons, more extensive secondary structure both preceding the start codon and near the 5' cap, greater dependence on eIF4E for translation, and association with ER-proximal ribosomes. 5IM transcripts are bound by the exon junction complex (EJC) at noncanonical 5' proximal positions. Finally, N1-methyladenosines are specifically enriched in the early coding regions of 5IM transcripts. Taken together, our analyses point to the existence of a distinct 5IM class comprising approximately 20% of human transcripts. This class is defined by depletion of 5' proximal introns, presence of specific RNA sequence features associated with low translation efficiency, N1-methyladenosines in the early coding region, and enrichment for noncanonical binding by the EJC.
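The abstract does not list the sequence features the classifier uses. As a hypothetical illustration of the feature-extraction step only, the sketch below computes overlapping k-mer frequencies in a fixed window at the start of a coding sequence; the window length (99 nt, i.e. 33 codons), the choice of k-mers as features, and the name `early_cds_kmer_features` are all assumptions, not details from the study.

```python
from collections import Counter
from itertools import product


def early_cds_kmer_features(cds, window=99, k=3):
    """Relative frequencies of all 4**k k-mers in the first `window` nt of a CDS.

    Assumes `cds` starts at the start codon; U is treated as T.
    """
    region = cds[:window].upper().replace("U", "T")
    # Fixed feature order: lexicographic over the alphabet A, C, G, T
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(region[i:i + k] for i in range(len(region) - k + 1))
    # k-mers containing N or other ambiguity codes are not counted
    total = sum(counts[m] for m in kmers) or 1
    return [counts[m] / total for m in kmers]
```

The resulting fixed-length vector could then be passed to any standard binary classifier trained on transcripts with and without 5'UTR introns.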
A Resource of Quantitative Functional Annotation for Homo sapiens Genes
The body of human genomic and proteomic evidence continues to grow at ever-increasing rates, while annotation efforts struggle to keep pace. A surprisingly small fraction of human genes has clear, documented associations with specific functions, and new functions continue to be found for characterized genes. Here we assembled an integrated collection of diverse genomic and proteomic data for 21,341 human genes and made quantitative associations of each to 4333 Gene Ontology terms. We combined guilt-by-profiling and guilt-by-association approaches to exploit features unique to each data type. Performance was evaluated by cross-validation, by prospective validation, and by manual evaluation against the biological literature. Functional-linkage networks (FLNs) were also constructed, and their utility was demonstrated by identifying candidate genes related to a glioma FLN using a seed network from genome-wide association studies. Our annotations are presented, alongside existing validated annotations, in a publicly accessible and searchable web interface.
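A guilt-by-association score can be illustrated with simple weighted neighbor voting on a functional-linkage network. This sketch assumes that a gene's score for one GO term is the fraction of its total linkage weight that connects to genes already annotated with that term; the resource's actual scoring and its guilt-by-profiling component are not reproduced, and `gba_scores` is a hypothetical name.

```python
from collections import defaultdict


def gba_scores(weights, annotated):
    """Weighted neighbor-voting score of every gene for one GO term.

    weights: dict mapping an undirected gene pair (a, b) -> linkage weight.
    annotated: set of genes already annotated with the term.
    Returns gene -> fraction of its linkage weight that reaches annotated genes.
    """
    total = defaultdict(float)
    hits = defaultdict(float)
    for (a, b), w in weights.items():
        # Each edge votes in both directions
        for gene, neighbor in ((a, b), (b, a)):
            total[gene] += w
            if neighbor in annotated:
                hits[gene] += w
    # Only neighbors' labels count, so a gene's own annotation never scores itself
    return {g: hits[g] / total[g] for g in total if total[g] > 0}
```

Genes can then be ranked by this score as candidates for the term; with weights `{("g1", "g2"): 1.0, ("g1", "g3"): 1.0, ("g3", "g4"): 2.0}` and `{"g2", "g4"}` annotated, `g3` scores 2/3 and `g1` scores 0.5.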
Systematic Exploration of Synergistic Drug Pairs
Drug synergy allows a therapeutic effect to be achieved with lower doses of component drugs. Drug synergy can result when drugs target the products of genes that act in parallel pathways ('specific synergy'). Such cases of drug synergy should tend to correspond to synergistic genetic interaction between the corresponding target genes. Alternatively, 'promiscuous synergy' can arise when one drug non-specifically increases the effects of many other drugs, for example, by increased bioavailability. To assess the relative abundance of these drug synergy types, we examined 200 pairs of antifungal drugs in S. cerevisiae. We found 38 antifungal synergies, 37 of which were novel. While 14 cases of drug synergy corresponded to genetic interactions, 92% of the synergies we discovered involved only six frequently synergistic drugs. Although the promiscuity of four of these drugs can be explained by the bioavailability model, the promiscuity of Tacrolimus and Pentamidine was completely unexpected. While many drug synergies correspond to genetic interactions, the majority of drug synergies appear to result from non-specific promiscuous synergy.
- …