Composition Profiler: a tool for discovery and visualization of amino acid composition differences
<p>Abstract</p> <p>Background</p> <p>Composition Profiler is a web-based tool for semi-automatic discovery of enrichment or depletion of amino acids, either individually or grouped by their physico-chemical or structural properties.</p> <p>Results</p> <p>The program takes two samples of amino acids as input: a query sample and a reference sample. The latter provides a suitable background amino acid distribution, and should be chosen according to the nature of the query sample, for example, a standard protein database (e.g. SwissProt, PDB), a representative sample of proteins from the organism under study, or a group of proteins with a contrasting functional annotation. The results of the analysis of amino acid composition differences are summarized in textual and graphical form.</p> <p>Conclusion</p> <p>As an exploratory data mining tool, our software can be used to guide feature selection for protein function or structure predictors. For classes of proteins with significant differences in frequencies of amino acids having particular physico-chemical (e.g. hydrophobicity or charge) or structural (e.g. α helix propensity) properties, Composition Profiler can be used as a rough, light-weight visual classifier.</p>
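The query-versus-reference comparison described above can be illustrated with a minimal Python sketch. It assumes that enrichment or depletion is expressed as the relative difference in residue frequency, (query - reference) / reference. That formula and the function names `composition` and `relative_difference` are illustrative assumptions and are not taken from the Composition Profiler abstract.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequences):
    """Return the fraction of each standard amino acid in a sample of sequences."""
    counts = Counter()
    for seq in sequences:
        # Ignore gaps, ambiguity codes, and any other non-standard symbols.
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def relative_difference(query, reference):
    """Relative enrichment (> 0) or depletion (< 0) of each residue in the query.

    Residues absent from the reference sample are skipped, because the
    relative difference is undefined when the background frequency is zero.
    """
    q = composition(query)
    r = composition(reference)
    return {aa: (q[aa] - r[aa]) / r[aa] for aa in AMINO_ACIDS if r[aa] > 0}
```

A value of 1.0 means the residue is twice as frequent in the query as in the reference, and -1.0 means it is absent from the query. A real analysis would also need a significance estimate, which this sketch omits.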
The variance of identity-by-descent sharing in the Wright-Fisher model
Widespread sharing of long, identical-by-descent (IBD) genetic segments is a
hallmark of populations that have experienced recent genetic drift. Detection
of these IBD segments has recently become feasible, enabling a wide range of
applications from phasing and imputation to demographic inference. Here, we
study the distribution of IBD sharing in the Wright-Fisher model. Specifically,
using coalescent theory, we calculate the variance of the total sharing between
random pairs of individuals. We then investigate the cohort-averaged sharing:
the average total sharing between one individual and the rest of the cohort. We
find that for large cohorts, the cohort-averaged sharing is distributed
approximately normally. Surprisingly, the variance of this distribution does
not vanish even for large cohorts, implying the existence of "hyper-sharing"
individuals. The presence of such individuals has consequences for the design
of sequencing studies, since, if they are selected for whole-genome sequencing,
a larger fraction of the cohort can be subsequently imputed. We calculate the
expected gain in power of imputation by IBD, and subsequently, in power to
detect an association, when individuals are either randomly selected or
specifically chosen to be the hyper-sharing individuals. Using our framework,
we also compute the variance of an estimator of the population size that is
based on the mean IBD sharing and the variance in the sharing between inbred
siblings. Finally, we study IBD sharing in an admixture pulse model, and show
that in the Ashkenazi Jewish population the admixture fraction is correlated
with the cohort-averaged sharing. Comment: Includes Supplementary Materia
DisProt: the Database of Disordered Proteins
The Database of Protein Disorder (DisProt) links structure and function information for intrinsically disordered proteins (IDPs). Intrinsically disordered proteins do not form a fixed three-dimensional structure under physiological conditions, either in their entireties or in segments or regions. We define IDP as a protein that contains at least one experimentally determined disordered region. Although lacking fixed structure, IDPs and regions carry out important biological functions, being typically involved in regulation, signaling and control. Such functions can involve high-specificity low-affinity interactions, the multiple binding of one protein to many partners and the multiple binding of many proteins to one partner. These three features are all enabled and enhanced by protein intrinsic disorder. One of the major hindrances in the study of IDPs has been the lack of organized information. DisProt was developed to enable IDP research by collecting and organizing knowledge regarding the experimental characterization and the functional associations of IDPs. In addition to being a unique source of biological information, DisProt opens doors for a plethora of bioinformatics studies. DisProt is openly available at
The unfoldomics decade: an update on intrinsically disordered proteins
Background
Our first predictor of protein disorder was published just over a decade ago in the Proceedings of the IEEE International Conference on Neural Networks (Romero P, Obradovic Z, Kissinger C, Villafranca JE, Dunker AK (1997) Identifying disordered regions in proteins from amino acid sequence. Proceedings of the IEEE International Conference on Neural Networks, 1: 90–95). By now more than twenty other laboratory groups have joined the efforts to improve the prediction of protein disorder. While the various prediction methodologies used for protein intrinsic disorder resemble those methodologies used for secondary structure prediction, the two types of structures are entirely different. For example, the two structural classes have very different dynamic properties, with the irregular secondary structure class being much less mobile than the disorder class. The prediction of secondary structure has been useful. On the other hand, the prediction of intrinsic disorder has been revolutionary, leading to major modifications of the more than 100 year-old views relating protein structure and function. Experimentalists have been providing evidence over many decades that some proteins lack fixed structure or are disordered (or unfolded) under physiological conditions. In addition, experimentalists are also showing that, for many proteins, their functions depend on the unstructured rather than structured state; such results are in marked contrast to the greater than hundred year old views such as the lock and key hypothesis. Despite extensive data on many important examples, including disease-associated proteins, the importance of disorder for protein function has been largely ignored. Indeed, to our knowledge, current biochemistry books don't present even one acknowledged example of a disorder-dependent function, even though some reports of disorder-dependent functions are more than 50 years old.
The results from genome-wide predictions of intrinsic disorder and the results from other bioinformatics studies of intrinsic disorder are demanding attention for these proteins.
Results
Disorder prediction has been important for showing that the relatively few experimentally characterized examples are members of a very large collection of related disordered proteins that are wide-spread over all three domains of life. Many significant biological functions are now known to depend directly on, or are importantly associated with, the unfolded or partially folded state. Here our goal is to review the key discoveries and to weave these discoveries together to support novel approaches for understanding sequence-function relationships.
Conclusion
Intrinsically disordered protein is common across the three domains of life, but especially common among the eukaryotic proteomes. Signaling sequences and sites of posttranslational modifications are frequently, or very likely most often, located within regions of intrinsic disorder. Disorder-to-order transitions are coupled with the adoption of different structures with different partners. Also, the flexibility of intrinsic disorder helps different disordered regions to bind to a common binding site on a common partner. Such capacity for binding diversity plays important roles in both protein-protein interaction networks and likely also in gene regulation networks. Such disorder-based signaling is further modulated in multicellular eukaryotes by alternative splicing, for which such splicing events map to regions of disorder much more often than to regions of structure. Associating alternative splicing with disorder rather than structure alleviates theoretical and experimentally observed problems associated with the folding of different length, isomeric amino acid sequences. The combination of disorder and alternative splicing is proposed to provide a mechanism for easily "trying out" different signaling pathways, thereby providing the mechanism for generating signaling diversity and enabling the evolution of cell differentiation and multicellularity. Finally, several recent small molecules of interest as potential drugs have been shown to act by blocking protein-protein interactions based on intrinsic disorder of one of the partners. Study of these examples has led to a new approach for drug discovery, and bioinformatics analysis of the human proteome suggests that various disease-associated proteins are very rich in such disorder-based drug discovery targets
Protein interaction network of alternatively spliced isoforms from brain links genetic risk factors for autism
Increased risk for autism spectrum disorders (ASD) is attributed to hundreds of genetic loci. The convergence of ASD variants has been investigated using various approaches, including protein interactions extracted from the published literature. However, these datasets are frequently incomplete, carry biases and are limited to interactions of a single splicing isoform, which may not be expressed in the disease-relevant tissue. Here we introduce a new interactome mapping approach by experimentally identifying interactions between brain-expressed alternatively spliced variants of ASD risk factors. The Autism Spliceform Interaction Network reveals that almost half of the detected interactions and about 30% of the newly identified interacting partners represent contribution from splicing variants, emphasizing the importance of isoform networks. Isoform interactions greatly contribute to establishing direct physical connections between proteins from the de novo autism CNVs. Our findings demonstrate the critical role of spliceform networks for translating genetic knowledge into a better understanding of human diseases.
Incorporating Distant Sequence Features and Radial Basis Function Networks to Identify Ubiquitin Conjugation Sites
Ubiquitin (Ub) is a small protein of 76 amino acids with a mass of about 8.5 kDa. In ubiquitin conjugation, ubiquitin is mainly conjugated to lysine residues of the target protein by Ub-ligating (E3) enzymes. Three major enzymes participate in ubiquitin conjugation: E1, E2 and E3, which are responsible for activating, conjugating and ligating ubiquitin, respectively. Ubiquitin conjugation in eukaryotes is an important mechanism of the proteasome-mediated degradation of a protein and regulating the activity of transcription factors. Motivated by the importance of ubiquitin conjugation in biological processes, this investigation develops a method, UbSite, which uses an efficient radial basis function (RBF) network to identify protein ubiquitin conjugation (ubiquitylation) sites. This work not only investigates the amino acid composition but also the structural characteristics, physicochemical properties, and evolutionary information of amino acids around ubiquitylation (Ub) sites. With reference to the pathway of ubiquitin conjugation, the substrate sites for E3 recognition, which are distant from ubiquitylation sites, are investigated. The measurement of F-score in a large window size (−20∼+20) revealed a statistically significant amino acid composition and position-specific scoring matrix (evolutionary information), which are mainly located distant from Ub sites. The distant information can be used effectively to differentiate Ub sites from non-Ub sites. As determined by five-fold cross-validation, the model that was trained using the combination of amino acid composition and evolutionary information performs best in identifying ubiquitin conjugation sites. The prediction sensitivity, specificity, and accuracy are 65.5%, 74.8%, and 74.5%, respectively.
Although the amino acid sequences around the ubiquitin conjugation sites do not contain conserved motifs, the cross-validation result indicates that the integration of distant sequence features of Ub sites can improve predictive performance. Additionally, the independent test demonstrates that the proposed method can outperform other ubiquitylation prediction tools
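The (−20∼+20) window around a candidate lysine can be sketched as follows. This is an illustrative Python example, not UbSite's implementation: the function name `lysine_windows`, the padding symbol `X`, and the choice to pad windows that run off the sequence ends are all assumptions made here.

```python
def lysine_windows(sequence, flank=20, pad="X"):
    """Yield (position, window) for every lysine (K) in a protein sequence.

    position is 1-based. Each window holds `flank` residues on either side
    of the lysine, so it is 2 * flank + 1 characters long. Positions that
    fall outside the sequence are filled with the `pad` symbol.
    """
    seq = sequence.upper()
    padded = pad * flank + seq + pad * flank
    for i, residue in enumerate(seq):
        if residue == "K":
            # Residue i of the original sequence sits at index i + flank in
            # the padded string, so the window starts at index i.
            yield i + 1, padded[i : i + 2 * flank + 1]
```

Each window could then be turned into features such as amino acid composition before training a classifier. Which features to use, and how to encode them, is left open here.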
Contribution of proline to the pre-structuring tendency of transient helical secondary structure elements in intrinsically disordered proteins
Background: IDPs function without relying on three-dimensional structures. No clear rationale for such a behavior is available yet. PreSMos are transient secondary structures observed in the target-free IDPs and serve as the target-binding active motifs in IDPs. Prolines are frequently found in the flanking regions of PreSMos. Contribution of prolines to the conformational stability of the helical PreSMos in IDPs is investigated.
Methods: MD simulations are performed for several IDP segments containing a helical PreSMo and the flanking prolines. To measure the influence of flanking prolines on the structural content of a helical PreSMo, calculations were done for wild-type as well as for mutant segments with Pro→Asp, His, Lys, or Ala. The change in the helicity due to removal of a proline was measured both for the PreSMo region and for the flanking regions.
Results: The α-helical content in ~70% of the helical PreSMos at the early stage of simulation decreases due to replacement of an N-terminal flanking proline by other residues, whereas the helix content in nearly all PreSMos increases when the same replacements occur at the C-terminal flanking region. The helix destabilizing/terminating role of the C-terminal flanking prolines is more pronounced than the helix promoting effect of the N-terminal flanking prolines.
General significance: This work represents a novel example demonstrating that a proline is encoded in an IDP with a defined purpose. The helical PreSMos presage their target-bound conformations. As they most likely mediate IDP-target binding via conformational selection, their helical content can be an important feature for IDP function.
Keywords: Flanking proline; Intrinsically disordered protein (IDP); Molecular dynamics simulation; PreSMo (Pre-Structured Motif). Copyright © 2013 Elsevier B.V. All rights reserved.
Rosetta FlexPepDock ab-initio: Simultaneous Folding, Docking and Refinement of Peptides onto Their Receptors
Flexible peptides that fold upon binding to another protein molecule mediate a large number of regulatory interactions in the living cell and may provide highly specific recognition modules. We present Rosetta FlexPepDock ab-initio, a protocol for simultaneous docking and de-novo folding of peptides, starting from an approximate specification of the peptide binding site. Using the Rosetta fragments library and a coarse-grained structural representation of the peptide and the receptor, FlexPepDock ab-initio samples efficiently and simultaneously the space of possible peptide backbone conformations and rigid-body orientations over the receptor surface of a given binding site. The subsequent all-atom refinement of the coarse-grained models includes full side-chain modeling of both the receptor and the peptide, resulting in high-resolution models in which key side-chain interactions are recapitulated. The protocol was applied to a benchmark in which peptides were modeled over receptors in either their bound backbone conformations or in their free, unbound form. Near-native peptide conformations were identified in 18/26 of the bound cases and 7/14 of the unbound cases. The protocol performs well on peptides from various classes of secondary structures, including coiled peptides with unusual turns and kinks. The results presented here significantly extend the scope of state-of-the-art methods for high-resolution peptide modeling, which can now be applied to a wide variety of peptide-protein interactions where no prior information about the peptide backbone conformation is available, enabling detailed structure-based studies and manipulation of those interactions
Microduplications of 16p11.2 are associated with schizophrenia
Recurrent microdeletions and microduplications of a 600 kb genomic region of chromosome 16p11.2 have been implicated in childhood-onset developmental disorders [1-3]. Here we report the strong association of 16p11.2 microduplications with schizophrenia in two large cohorts. In the primary sample, the microduplication was detected in 12/1906 (0.63%) cases and 1/3971 (0.03%) controls (P = 1.2 × 10⁻⁵, OR = 25.8). In the replication sample, the microduplication was detected in 9/2645 (0.34%) cases and 1/2420 (0.04%) controls (P = 0.022, OR = 8.3). For the series combined, microduplication of 16p11.2 was associated with 14.5-fold increased risk of schizophrenia (95% C.I. [3.3, 62]). A meta-analysis of multiple psychiatric disorders showed a significant association of the microduplication with schizophrenia, bipolar disorder and autism. The reciprocal microdeletion was associated only with autism and developmental disorders. Analysis of patient clinical data showed that head circumference was significantly larger in patients with the microdeletion compared with patients with the microduplication (P = 0.0007). Our results suggest that the microduplication of 16p11.2 confers substantial risk for schizophrenia and other psychiatric disorders, whereas the reciprocal microdeletion is associated with contrasting clinical features.
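To show how an odds ratio relates to carrier counts like those quoted above, here is a minimal Python sketch of the crude (unadjusted) odds ratio from a 2×2 table. The function name `crude_odds_ratio` is hypothetical. The abstract does not say which estimator or adjustment the authors used, so values computed this way need not match the reported ones exactly.

```python
def crude_odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Unadjusted odds ratio for carrying a variant in cases versus controls.

    Odds are carriers divided by non-carriers within each group. The ratio
    is undefined if any of the four cells of the 2x2 table is zero.
    """
    case_noncarriers = case_total - case_carriers
    control_noncarriers = control_total - control_carriers
    cells = (case_carriers, case_noncarriers, control_carriers, control_noncarriers)
    if min(cells) <= 0:
        raise ValueError("all four cells of the 2x2 table must be positive")
    return (case_carriers / case_noncarriers) / (control_carriers / control_noncarriers)


# Replication-sample counts as quoted in the abstract: 9/2645 cases, 1/2420 controls.
replication_or = crude_odds_ratio(9, 2645, 1, 2420)
```

With only one carrier among controls, the estimate is very sensitive to that single count, which is consistent with the wide confidence interval quoted for the combined series.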
Small RNAs and the regulation of cis-natural antisense transcripts in Arabidopsis
<p>Abstract</p> <p>Background</p> <p>In spite of large intergenic spaces in plant and animal genomes, 7% to 30% of genes in the genomes encode overlapping cis-natural antisense transcripts (cis-NATs). The widespread occurrence of cis-NATs suggests an evolutionary advantage for this type of genomic arrangement. Experimental evidence for the regulation of two cis-NAT gene pairs by natural antisense transcripts-generated small interfering RNAs (nat-siRNAs) via the RNA interference (RNAi) pathway has been reported in Arabidopsis. However, the extent of siRNA-mediated regulation of cis-NAT genes is still unclear in any genome.</p> <p>Results</p> <p>The hallmarks of RNAi regulation of NATs are 1) inverse regulation of two genes in a cis-NAT pair by environmental and developmental cues and 2) generation of siRNAs by cis-NAT genes. We examined Arabidopsis transcript profiling data from public microarray databases to identify cis-NAT pairs whose sense and antisense transcripts show opposite expression changes. A subset of the cis-NAT genes displayed negatively correlated expression profiles as well as inverse differential expression changes under at least one of the examined developmental stages or treatment conditions. By searching the Arabidopsis Small RNA Project (ASRP) and Massively Parallel Signature Sequencing (MPSS) small RNA databases as well as our stress-treated small RNA dataset, we found small RNAs that matched at least one gene in 646 pairs out of 1008 (64%) protein-coding cis-NAT pairs, which suggests that siRNAs may regulate the expression of many cis-NAT genes. 209 putative siRNAs have the potential to target more than one gene and half of these small RNAs could target multiple members of a gene family. Furthermore, the majority of the putative siRNAs within the overlapping regions tend to target only one transcript of a given NAT pair, which is consistent with our previous finding on salt- and bacteria-induced nat-siRNAs.
In addition, we found that genes encoding plastid- or mitochondrion-targeted proteins are over-represented in the Arabidopsis cis-NATs and that 19% of sense and antisense partner genes of cis-NATs share at least one common Gene Ontology term, which suggests that they encode proteins with possible functional connection.</p> <p>Conclusion</p> <p>The negatively correlated expression patterns of sense and antisense genes as well as the presence of siRNAs in many of the cis-NATs suggest that siRNA regulation of cis-NATs via the RNAi pathway is an important gene regulatory mechanism for at least a subgroup of cis-NATs in Arabidopsis.</p>