34 research outputs found
Bifurcation-based parameter tuning in a model of the GnRH pulse and surge generator
We investigate a model of the GnRH pulse and surge generator, with the
definite aim of constraining the model GnRH output with respect to a
physiologically relevant list of specifications. The alternating pulse and
surge pattern of secretion results from the interaction between a
GnRH-secreting system and a regulating system exhibiting fast-slow dynamics. The
mechanisms underlying the behavior of the model are recalled from the study of
the Boundary-Layer System according to the "dissection method" principle. Using
singular perturbation theory, we describe the sequence of bifurcations
undergone by the regulating (FitzHugh-Nagumo) system, encompassing the rarely
investigated case of homoclinic connection. Based on purely dynamical
considerations, we restrict the parameter search space for the regulating
system and describe a foliation of this restricted space, whose leaves define
constant duration ratios between the surge and the pulsatility phase in the
whole system. We propose an algorithm to fix the parameter values so that they also meet
the other prescribed ratios dealing with amplitude and frequency features of
the secretion signal. We finally apply these results to illustrate the dynamics
of GnRH secretion in the ovine species and the rhesus monkey.
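To make the fast-slow behavior of a FitzHugh-Nagumo regulating system concrete, here is a minimal sketch that integrates the classical FitzHugh-Nagumo equations with a fourth-order Runge-Kutta scheme. The equations are the textbook form, and the parameter values (`a`, `b`, `eps`, `current`) are standard textbook choices that produce relaxation oscillations. They are assumptions for illustration, not the parameters or the coupled model of the paper, and the GnRH-secreting system is not modeled.

```python
def fhn_rhs(v, w, a=0.7, b=0.8, eps=0.08, current=0.5):
    """Classical FitzHugh-Nagumo right-hand side (textbook form, assumed here).

    v is the fast variable, w the slow one; eps sets the time-scale separation.
    """
    dv = v - v ** 3 / 3.0 - w + current
    dw = eps * (v + a - b * w)
    return dv, dw


def simulate(v0=-1.0, w0=1.0, dt=0.01, steps=20000):
    """Integrate the system with classical RK4 and return the (v, w) trajectory."""
    v, w = v0, w0
    trace = []
    for _ in range(steps):
        k1v, k1w = fhn_rhs(v, w)
        k2v, k2w = fhn_rhs(v + 0.5 * dt * k1v, w + 0.5 * dt * k1w)
        k3v, k3w = fhn_rhs(v + 0.5 * dt * k2v, w + 0.5 * dt * k2w)
        k4v, k4w = fhn_rhs(v + dt * k3v, w + dt * k3w)
        v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
        w += dt * (k1w + 2 * k2w + 2 * k3w + k4w) / 6.0
        trace.append((v, w))
    return trace


if __name__ == "__main__":
    trajectory = simulate()
    late_v = [v for v, _ in trajectory[len(trajectory) // 2:]]
    print("fast-variable range on the attractor:", min(late_v), max(late_v))
```

Varying `current` or `eps` in such a sketch is one simple way to watch the system move between a resting state and sustained oscillations, which is the kind of qualitative change a bifurcation analysis characterizes rigorously.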
Mining protein loops using a structural alphabet and statistical exceptionality
Background: Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein functions, e.g. as binding sites or catalytic pockets. However, describing protein loops with conventional tools is a difficult task. Regular secondary structures, helices and strands, have been widely studied, whereas loops, because they are highly variable in terms of sequence and structure, are difficult to analyze. Due to data sparsity, long loops have rarely been systematically studied.
Results: We developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs without restriction on loop length. This method is based on the structural alphabet HMM-SA. HMM-SA allows the simplification of a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment, called a structural letter. The difficult task of the structural grouping of huge data sets is thus easily accomplished by handling structural letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93000 protein loops and grouped them according to the structural-letter sequence, named a structural word. This approach permits a systematic analysis of loops of all sizes since we consider the structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words of loops (observed more than 30 times). Our study reveals that 73% of loop-lengths are covered by only 3310 highly recurrent structural words (out of 28274 observed words). These structural words have low structural variability (mean RMSd of 0.85 Å). As expected, half of these motifs display a flanking-region preference but, interestingly, two thirds are shared by short (less than 12 residues) and long loops. Moreover, half of the recurrent motifs exhibit a significant level of amino-acid conservation with at least four significant positions, and 87% of long loops contain at least one such word. We complement our analysis with the detection of statistically over-represented patterns of structural letters, as in conventional DNA sequence analysis. About 30% (930) of structural words are over-represented, and cover about 40% of loop lengths. Interestingly, these words exhibit lower structural variability and higher sequential specificity, suggesting structural or functional constraints.
Conclusions: We developed a method to systematically decompose and study protein loops using recurrent structural motifs. This method is based on the structural alphabet HMM-SA and not on structural alignment and geometrical parameters. We extracted meaningful structural motifs that are found in both short and long loops. To our knowledge, this is the first time that pattern mining has helped to increase the signal-to-noise ratio in protein loops. This finding helps to better describe protein loops and might help reduce the complexity of long-loop analysis. Detailed results are available at http://www.mti.univ-paris-diderot.fr/publication/supplementary/2009/ACCLoop/.
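The word-extraction step described above can be sketched as a sliding-window count over structural-letter strings. This is a minimal illustration, assuming that consecutive structural letters overlap by three residues, so that a seven-residue fragment corresponds to four letters. The function names, the toy loop strings and the threshold used in the example are hypothetical, not taken from the paper.

```python
from collections import Counter


def count_words(loops, word_len=4):
    """Count every overlapping word of `word_len` structural letters.

    `loops` is an iterable of structural-letter strings, one per loop.
    """
    counts = Counter()
    for loop in loops:
        for i in range(len(loop) - word_len + 1):
            counts[loop[i:i + word_len]] += 1
    return counts


def recurrent_words(counts, min_count):
    """Return the words observed strictly more than `min_count` times."""
    return {word for word, n in counts.items() if n > min_count}


if __name__ == "__main__":
    toy_loops = ["ABCDA", "ABCD", "XBCDA"]  # hypothetical encoded loops
    counts = count_words(toy_loops)
    print(sorted(recurrent_words(counts, min_count=1)))
```

Because loops are cut into fixed-size words, short and long loops contribute to the same counts, which is what allows motifs to be compared across loop lengths.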
Combined assessment of DYRK1A, BDNF and homocysteine levels as diagnostic marker for Alzheimer’s disease
Early identification of Alzheimer’s disease (AD) risk factors would aid development of interventions to delay the onset of dementia, but current biomarkers are invasive and/or costly to assess. Validated plasma biomarkers would circumvent these challenges. We previously identified the kinase DYRK1A in plasma. To validate DYRK1A as a biomarker for AD diagnosis, we assessed the levels of DYRK1A and the related markers brain-derived neurotrophic factor (BDNF) and homocysteine in two unrelated AD patient cohorts with age-matched controls. Receiver-operating characteristic curves and logistic regression analyses showed that combined assessment of DYRK1A, BDNF and homocysteine has a sensitivity of 0.952, a specificity of 0.889 and an accuracy of 0.933 in testing for AD. The blood levels of these markers provide a diagnostic assessment profile. Combined assessment of these three markers outperforms most of the previous markers and could become a useful substitute for the current panel of AD biomarkers. These results associate a decreased level of DYRK1A with AD and challenge the use of DYRK1A inhibitors in peripheral tissues as treatment. These measures will be useful for diagnostic purposes.
This work was supported by the FEANS. We acknowledge the platform accommodation and animal testing of the animal facility at the Institute Jacques-Monod (University Paris Diderot) and the FlexStation3 facility of the Functional and Adaptive Biology (BFA) Laboratory.
Peer reviewed
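Sensitivity, specificity and accuracy follow the usual confusion-matrix definitions. The sketch below shows how the three figures relate to true/false positive and negative counts. The counts in the example are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: patients correctly/incorrectly classified;
    tn/fp: controls correctly/incorrectly classified.
    """
    sensitivity = tp / (tp + fn)          # fraction of patients detected
    specificity = tn / (tn + fp)          # fraction of controls cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(diagnostic_metrics(tp=18, fn=2, tn=9, fp=1))
```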
Exact distribution of a pattern in a set of random sequences generated by a Markov source: applications to biological data
Background: In bioinformatics it is common to search for a pattern of interest in a potentially large set of rather short sequences (upstream gene regions, proteins, exons, etc.). Although many methodological approaches allow practitioners to compute the distribution of a pattern count in a random sequence generated by a Markov source, no specific developments have taken into account the counting of occurrences in a set of independent sequences. We aim to address this problem by deriving efficient approaches and algorithms to perform these computations both for low and high complexity patterns in the framework of homogeneous or heterogeneous Markov models.
Results: The latest advances in the field allowed us to use a technique of optimal Markov chain embedding based on deterministic finite automata to introduce three innovative algorithms. Algorithm 1 is the only one able to deal with heterogeneous models. It also avoids any convolution product of the pattern distributions in individual sequences. When working with homogeneous models, Algorithm 2 yields a dramatic reduction in the complexity by taking advantage of previous computations to obtain moment generating functions efficiently. In the particular case of low or moderate complexity patterns, Algorithm 3 exploits power computation and binary decomposition to further reduce the time complexity to a logarithmic scale. All these algorithms and their relative interest in comparison with existing ones were then tested and discussed on a toy example and three biological data sets: structural patterns in protein loop structures, PROSITE signatures in a bacterial proteome, and transcription factors in upstream gene regions. On these data sets, we also compared our exact approaches to the tempting approximation that consists in concatenating the sequences in the data set into a single sequence.
Conclusions: Our algorithms prove to be effective and able to handle real data sets with multiple sequences, as well as biological patterns of interest, even when the latter display a high complexity (PROSITE signatures for example). In addition, these exact algorithms allow us to avoid the edge effect observed under the single sequence approximation, which leads to erroneous results, especially when the marginal distribution of the model displays a slow convergence toward the stationary distribution. We conclude with a discussion of our method and its potential improvements.
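The idea of embedding a pattern count in a finite automaton can be illustrated on a deliberately simplified case. The sketch below is not one of the paper's three algorithms: it assumes independent letters (the simplest special case of a Markov source), a single word as the pattern, and overlapping occurrences. It combines sequences by direct convolution, which is the baseline step the abstract says Algorithm 1 avoids. All function names are hypothetical.

```python
def next_state(word, state, letter):
    """Automaton transition: longest prefix of `word` that is a suffix of
    the text read so far (word[:state] followed by `letter`)."""
    text = word[:state] + letter
    for k in range(min(len(word), len(text)), 0, -1):
        if text.endswith(word[:k]):
            return k
    return 0


def count_distribution(word, letter_probs, length):
    """Exact distribution of the number of (overlapping) occurrences of
    `word` in a random sequence of `length` independent letters."""
    m = len(word)
    dist = {(0, 0): 1.0}  # (automaton state, occurrences so far) -> probability
    for _ in range(length):
        new = {}
        for (state, n), p in dist.items():
            for letter, p_letter in letter_probs.items():
                t = next_state(word, state, letter)
                key = (t, n + 1 if t == m else n)
                new[key] = new.get(key, 0.0) + p * p_letter
        dist = new
    out = [0.0] * (max(n for _, n in dist) + 1)
    for (_, n), p in dist.items():
        out[n] += p
    return out


def convolve(p, q):
    """Distribution of the sum of two independent counts."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


if __name__ == "__main__":
    probs = {"a": 0.5, "b": 0.5}
    single = count_distribution("aa", probs, length=3)
    print("one sequence:", single)
    print("two independent sequences:", convolve(single, single))
```

Treating each sequence separately and then combining the counts keeps occurrences from straddling sequence boundaries, which is the edge effect that the concatenation approximation introduces.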
Improving model construction of profile HMMs for remote homology detection through structural alignment
Background: Remote homology detection is a challenging problem in Bioinformatics. Arguably, profile Hidden Markov Models (pHMMs) are one of the most successful approaches in addressing this important problem. pHMM packages have a relatively small computational cost, and perform particularly well at recognizing remote homologies. This raises the question of whether structural alignments could impact the performance of pHMMs trained from proteins in the Twilight Zone, as structural alignments are often more accurate than sequence alignments at identifying motifs and functional residues. We therefore assess the impact of using structural alignments on pHMM performance.
Results: We used the SCOP database to perform our experiments. Structural alignments were obtained using the 3DCOFFEE and MAMMOTH-mult tools; sequence alignments were obtained using CLUSTALW, TCOFFEE, MAFFT and PROBCONS. We performed leave-one-family-out cross-validation over super-families. Performance was evaluated through ROC curves and paired two-tailed t-tests.
Conclusion: We observed that pHMMs derived from structural alignments performed significantly better than pHMMs derived from sequence alignments in low-identity regions, mainly below 20%. We believe this is because structural alignment tools are better at focusing on the important patterns that are more often conserved through evolution, resulting in higher quality pHMMs. On the other hand, the sensitivity of these tools is still quite low for these low-identity regions. Our results suggest a number of possible directions for improvements in this area.
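The leave-one-family-out protocol can be sketched as a generator of train/test splits within one super-family: each family is held out in turn, a model is trained on the remaining families, and the held-out family is used for testing. The data layout (a dictionary mapping family names to sequence lists) and the identifiers are assumptions for illustration. Model training and scoring are left out.

```python
def leave_one_family_out(superfamily):
    """Yield (held_out_family, training_sequences, test_sequences).

    `superfamily` maps each family name to the list of its sequences.
    """
    for held_out in superfamily:
        train = [
            seq
            for family, seqs in superfamily.items()
            if family != held_out
            for seq in seqs
        ]
        test = list(superfamily[held_out])
        yield held_out, train, test


if __name__ == "__main__":
    # Hypothetical super-family with three families.
    toy = {"fam1": ["s1", "s2"], "fam2": ["s3"], "fam3": ["s4"]}
    for held_out, train, test in leave_one_family_out(toy):
        print(held_out, "train:", train, "test:", test)
```

Holding out a whole family, rather than individual sequences, ensures that the test sequences are only remotely related to the training set, which is the situation remote homology detection is meant to handle.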
Protein structure search and local structure characterization
Background: Structural similarities among proteins can provide valuable insight into their functional mechanisms and relationships. As the number of available three-dimensional (3D) protein structures increases, a greater variety of studies can be conducted with increasing efficiency, among which is the design of protein structural alphabets. Structural alphabets allow us to characterize local structures of proteins and describe the global folding structure of a protein using a one-dimensional (1D) sequence. Thus, 1D sequences can be used to identify structural similarities among proteins using standard sequence alignment tools such as BLAST or FASTA.
Results: We used self-organizing maps in combination with a minimum spanning tree algorithm to determine the optimum size of a structural alphabet and applied the k-means algorithm to group protein fragments into clusters. The centroids of these clusters defined the structural alphabet. We also developed a flexible matrix training system to build a substitution matrix (TRISUM-169) for our alphabet. Based on FASTA and using TRISUM-169 as the substitution matrix, we developed the SA-FAST alignment tool. We compared the performance of SA-FAST with that of various search tools in database-scale search tasks and found that SA-FAST was highly competitive in all tests conducted. Further, we evaluated the performance of our structural alphabet in recognizing specific structural domains of EGF and EGF-like proteins. Our method successfully recovered more EGF sub-domains using our structural alphabet than when using other structural alphabets. SA-FAST can be found at http://140.113.166.178/safast/.
Conclusion: The goal of this project was two-fold. First, we wanted to introduce a modular design pipeline to those who have been working with structural alphabets. Second, we wanted to open the door to researchers who have done substantial work in biological sequences but have yet to enter the field of protein structure research. Our experiments showed that by transforming the structural representations from 3D to 1D, several 1D-based tools can be applied to structural analysis, including similarity searches and structural motif finding.
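Once cluster centroids define a structural alphabet, the 3D-to-1D transformation amounts to replacing each fragment by the letter of its nearest centroid. The sketch below assumes fragments are already described by fixed-length numeric vectors and uses Euclidean distance. The descriptor, the distance and the two-letter toy alphabet are assumptions for illustration, not the representation used by the authors.

```python
import math


def encode(fragments, centroids):
    """Map each fragment vector to the letter of its nearest centroid.

    `centroids` maps a one-character letter to its centroid vector.
    Returns the resulting 1D string.
    """
    letters = []
    for fragment in fragments:
        nearest = min(
            centroids,
            key=lambda letter: math.dist(fragment, centroids[letter]),
        )
        letters.append(nearest)
    return "".join(letters)


if __name__ == "__main__":
    # Hypothetical two-letter alphabet over 2D descriptors.
    toy_centroids = {"A": (0.0, 0.0), "B": (10.0, 10.0)}
    toy_fragments = [(1.0, 1.0), (9.0, 8.0), (0.0, 2.0)]
    print(encode(toy_fragments, toy_centroids))
```

The resulting string can then be handed to ordinary sequence tools, provided a substitution matrix suited to the alphabet is available.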
Designing Focused Chemical Libraries Enriched in Protein-Protein Interaction Inhibitors using Machine-Learning Methods
Protein-protein interactions (PPIs) may represent one of the next major classes of therapeutic targets. So far, only a minute fraction of the estimated 650,000 PPIs that comprise the human interactome are known, and only a tiny number of complexes have been drugged. Such intricate biological systems cannot be cost-efficiently tackled using conventional high-throughput screening methods. Rather, the time has come to design new strategies that will maximize the chance of hit identification through a rationalization of the PPI inhibitor chemical space and the design of PPI-focused compound libraries (global or target-specific). Here, we train machine-learning-based models, mainly decision trees, using a dataset of known PPI inhibitors and of regular drugs in order to determine a global physico-chemical profile for putative PPI inhibitors. This statistical analysis reveals two important molecular descriptors for PPI inhibitors, characterizing specific molecular shapes and the presence of a privileged number of aromatic bonds. The best model has been transposed into a computer program, PPI-HitProfiler, that can output from any drug-like compound collection a focused chemical library enriched in putative PPI inhibitors. Our PPI inhibitor profiler is challenged on the experimental screening results of 11 different PPIs, including the p53/MDM2 interaction screened on our own CDithem platform, which, in addition to validating our concept, led to the identification of 4 novel p53/MDM2 inhibitors. Collectively, our tool shows a robust behavior on the 11 experimental datasets by correctly profiling 70% of the experimentally identified hits while removing 52% of the inactive compounds from the initial compound collections. We strongly believe that this new tool can be used as a global PPI inhibitor profiler prior to screening assays to reduce the size of the compound collections to be experimentally screened while keeping most of the true PPI inhibitors.
PPI-HitProfiler is freely available on request from our CDithem platform website, www.CDithem.com.
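The two reported rates (70% of hits retained, 52% of inactive compounds removed) can be turned into an enrichment factor, i.e. the ratio of the hit rate in the focused library to the hit rate in the initial collection. The sketch below is a worked arithmetic example. The library composition used in the usage line is hypothetical, and the function name is an illustration, not part of PPI-HitProfiler.

```python
def enrichment_factor(n_hits, n_inactives, hit_recall=0.70, inactive_removal=0.52):
    """Hit-rate ratio (after filtering / before filtering).

    hit_recall: fraction of true hits kept by the filter.
    inactive_removal: fraction of inactive compounds discarded.
    """
    kept_hits = n_hits * hit_recall
    kept_inactives = n_inactives * (1.0 - inactive_removal)
    rate_before = n_hits / (n_hits + n_inactives)
    rate_after = kept_hits / (kept_hits + kept_inactives)
    return rate_after / rate_before


if __name__ == "__main__":
    # Hypothetical collection: 100 hits among 10,000 compounds.
    print(round(enrichment_factor(n_hits=100, n_inactives=9900), 3))
```

When hits are rare, the factor approaches 0.70 / 0.48, so under these rates the focused library is a bit less than half the original size while its hit rate rises by roughly 45%.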
Detection of a Fourth Orbivirus Non-Structural Protein
The genus Orbivirus includes both insect and tick-borne viruses. The orbivirus genome, composed of 10 segments of dsRNA, encodes 7 structural proteins (VP1–VP7) and 3 non-structural proteins (NS1–NS3). An open reading frame (ORF) that spans almost the entire length of genome segment-9 (Seg-9) encodes VP6 (the viral helicase). However, bioinformatic analysis recently identified an overlapping ORF (ORFX) in Seg-9. We show that ORFX encodes a new non-structural protein, identified here as NS4. Western blotting and confocal fluorescence microscopy, using antibodies raised against recombinant NS4 from Bluetongue virus (BTV, which is insect-borne), or Great Island virus (GIV, which is tick-borne), demonstrate that these proteins are synthesised in BTV or GIV infected mammalian cells, respectively. BTV NS4 is also expressed in Culicoides insect cells. NS4 forms aggregates throughout the cytoplasm as well as in the nucleus, consistent with identification of nuclear localisation signals within the NS4 sequence. Bioinformatic analyses indicate that NS4 contains coiled-coils, is related to proteins that bind nucleic acids or are associated with membranes, and shows similarities to nucleolar protein UTP20 (a processome subunit). Recombinant NS4 of GIV protects dsRNA from degradation by endoribonucleases of the RNAse III family, indicating that it interacts with dsRNA. However, BTV NS4, which is only half the putative size of the GIV NS4, did not protect dsRNA from RNAse III cleavage. NS4 of both GIV and BTV protects DNA from degradation by DNAse. NS4 was found to associate with lipid droplets in cells infected with BTV or GIV or transfected with a plasmid expressing NS4.
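An overlapping ORF such as ORFX lies in a different reading frame from the main ORF of the same segment. The sketch below shows the basic idea of scanning the three forward frames of a coding-sense sequence (written as DNA) for ATG-to-stop ORFs. It is a toy illustration with a made-up sequence, not the bioinformatic analysis used in the study, and the function name and `min_codons` threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (frame, start, end) for ORFs in the three forward frames.

    An ORF runs from the first ATG after the previous stop to the next
    in-frame stop codon; `end` is exclusive and includes the stop codon.
    ORFs with fewer than `min_codons` codons before the stop are ignored.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    toy = "ATGGCATGCCCTAAATAA"  # made-up sequence with a nested ORF
    for frame, start, end in find_orfs(toy):
        print(frame, start, end, toy[start:end])
```

In the toy sequence, the shorter ORF sits entirely inside the longer one but is read in another frame, which mirrors how one genome segment can encode two unrelated proteins.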