MASPIC: Intensity-Based Tandem Mass Spectrometry Scoring Scheme That Improves Peptide Identification at High Confidence
Algorithmic search engines bridge the gap between large
tandem mass spectrometry data sets and the identification
of proteins associated with biological samples. Improvements in these tools can greatly enhance biological
discovery. We present a new scoring scheme for comparing tandem mass spectra with a protein sequence database. The MASPIC (Multinomial Algorithm for Spectral
Profile-based Intensity Comparison) scorer converts an
experimental tandem mass spectrum into an m/z profile
of probability and then scores peak lists from potential
candidate peptides using a multinomial distribution model.
The MASPIC scoring scheme incorporates intensity,
spectral peak density variations, and m/z error distribution associated with peak matches into a multinomial
distribution. The scoring scheme was validated on two
standard protein mixtures and an additional set of spectra
collected on a complex ribosomal protein mixture from
Rhodopseudomonas palustris. The results indicate a
5−15% improvement over Sequest for high-confidence
identifications. The performance gap grows as sequence
database size increases. Additional tests on spectra from
proteinase-K digest data showed similar performance
improvements, demonstrating the advantages of using
MASPIC for studying proteins digested with less specific
proteases. All these investigations show MASPIC to be a
versatile and reliable system for peptide tandem mass
spectral identification.
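The idea of turning a spectrum into an m/z probability profile and scoring candidate peak lists against it can be illustrated with a minimal Python sketch. This is not the published MASPIC scorer: the fixed bin width, the pseudocount, and the function names `mz_profile` and `multinomial_log_score` are assumptions made for illustration, and the sketch ignores the peak-density and m/z-error terms mentioned in the abstract.

```python
import math
from collections import Counter


def mz_profile(peaks, bin_width=1.0, mz_max=2000.0, pseudo=1e-3):
    """Bin (m/z, intensity) pairs and normalize them to a probability profile.

    The pseudocount keeps empty bins at a small non-zero probability so that
    unmatched predicted peaks are penalized rather than producing log(0).
    """
    n_bins = int(mz_max / bin_width) + 1
    weights = [pseudo] * n_bins
    for mz, intensity in peaks:
        idx = math.floor(mz / bin_width)
        if 0 <= idx < n_bins:
            weights[idx] += intensity
    total = sum(weights)
    return [w / total for w in weights]


def multinomial_log_score(profile, predicted_mz, bin_width=1.0):
    """Multinomial log-likelihood of a candidate's predicted peaks.

    Each predicted peak is treated as one draw from the profile; peaks that
    fall outside the profile range are ignored (an assumption of this sketch).
    """
    counts = Counter()
    for mz in predicted_mz:
        idx = math.floor(mz / bin_width)
        if 0 <= idx < len(profile):
            counts[idx] += 1
    n = sum(counts.values())
    score = math.lgamma(n + 1)  # log(n!)
    for idx, k in counts.items():
        score += k * math.log(profile[idx]) - math.lgamma(k + 1)
    return score


# Toy data, invented for illustration only.
spectrum = [(100.2, 50.0), (200.7, 30.0), (300.1, 20.0)]
profile = mz_profile(spectrum)
matching = multinomial_log_score(profile, [100.5, 200.1, 300.9])
non_matching = multinomial_log_score(profile, [150.0, 250.0, 350.0])
print(matching > non_matching)
```

A candidate whose predicted peaks land in high-intensity bins receives a higher log-likelihood than one whose peaks fall in empty bins, which is the ranking behavior an intensity-based score relies on.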
Triplex target sites are enriched in mammalian and non-mammalian genomes.
(A-P) Genome-wide analyses of potential microRNA binding sites in genomic DNA were performed across fifteen species. The heuristic score ("Score", x-axis) represents Hoogsteen or reverse Hoogsteen base-pair complementarity, and the thermodynamic energy ("Energy", y-axis) represents the binding energy of the triplex (see Methods). Binding sites were categorized based on the number of hits with better score and energy. Grade 1 hits represent the 99.999th percentile of triplex-forming interactions, the sequences most likely to participate in DNA-microRNA triplex formation. Subsequent grades are 10-fold lower in their percentile ranking (e.g., the 99.99th, 99.9th, and 99th percentiles). Additionally, randomly generated DNA sequences were analyzed against human microRNAs. The random DNA sequences (M, green surface) showed many orders of magnitude fewer binding sites than the human genome (M, blue surface), and the identified binding sites were of low quality (low score, high energy).
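The grading rule described in this caption can be sketched in a few lines of Python. This is an illustrative reading of the caption, not the Trident implementation: the strict-inequality definition of "better", the cutoff handling, and the names `dominated_fraction` and `grade` are assumptions.

```python
# Fraction-of-better-hits cutoffs corresponding to the 99.999th, 99.99th,
# 99.9th, and 99th percentiles (grades 1 to 4).
GRADE_CUTOFFS = [(1e-5, 1), (1e-4, 2), (1e-3, 3), (1e-2, 4)]


def dominated_fraction(hits):
    """For each (score, energy) hit, return the fraction of all hits that are
    better on both axes: a higher score and a lower (more favorable) energy.

    This is a simple O(n^2) scan, adequate for a sketch but not for
    genome-scale data.
    """
    n = len(hits)
    fractions = []
    for score, energy in hits:
        better = sum(1 for s, e in hits if s > score and e < energy)
        fractions.append(better / n)
    return fractions


def grade(fraction):
    """Map a fraction of better hits to a grade, or None if below grade 4."""
    for cutoff, g in GRADE_CUTOFFS:
        if fraction <= cutoff:
            return g
    return None


# Toy (score, energy) pairs, invented for illustration only.
toy_hits = [(10, -30.0), (5, -10.0), (8, -20.0)]
for hit, frac in zip(toy_hits, dominated_fraction(toy_hits)):
    print(hit, frac, grade(frac))
```

With only three toy hits, just the undominated hit can reach a grade; on a genome-wide hit list the same cutoffs would separate the top 0.001%, 0.01%, 0.1%, and 1% of interactions.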
Detection of DNA-DNA and RNA-DNA triplexes by EMSA and NMR, and molecular modeling of miRNA-duplex DNA triplex.
(A) EMSA; 5′ ROX-labeled hairpin duplex DNA (0.1 μM) was incubated for 3 h at 22°C in the presence (lanes 2–11) or absence (lane 1) of 2.5 μM 483-opti DNA oligo, and increasing concentrations (30, 60, 150 μM) of Hoogsteen bond-optimized hsa-miR-483-5p (483-opti, lanes 3–5), hsa-miR-483-5p (483, lanes 6–8), or a scrambled RNA oligo (Scramble, lanes 9–11). Duplexes and triplexes were resolved on a 20% non-denaturing acrylamide gel, and the ROX signal was visualized. The triplex of 483-opti DNA oligo and duplex DNA is readily detected (lane 2). The 483-opti RNA oligo competes with the 483-opti DNA oligo for binding to duplex DNA, as is evident from the increased amounts of duplex DNA and decreased amounts of triplex (compare lanes 3–5 with lane 2). Hsa-miR-483-5p (483) and scrambled RNA, which have fewer favorable Hoogsteen bonds, did not compete with the 483-opti DNA oligo for binding to duplex DNA (lanes 6–7 and 9–10, respectively). (B-C) NMR; two-dimensional (2D) [¹H, ¹H] TOCSY spectra of free single-stranded hairpin duplex DNA (blue contours), hairpin duplex DNA combined with hsa-miR-483-5p RNA oligo (green contours; 1:1.5 ratio), and hairpin duplex DNA with a single-stranded DNA oligo of the same sequence as hsa-miR-483-5p (red contours; 1:1 ratio). (B) Thymidine cross-peaks between H6 and H7 (methyl), and (C) cytosine cross-peaks between H5 and H6.
Single-stranded RNA (hsa-miR-483-5p) or single-stranded DNA combined with hairpin duplex DNA shows a similar improvement in peak intensities, and similar chemical shift perturbations and appearance of new peaks (highlighted in blue boxes), suggesting that single-stranded DNA and single-stranded RNA of the same sequence bind to duplex DNA in a similar manner. The major differences (peaks in red boxes) are one peak among the thymidine cross-peaks, which shows an intermediate change (peak disappearing) with single-stranded RNA while saturated with hairpin duplex DNA, and two new peaks among the cytosine cross-peaks, which show much higher intensities with single-stranded DNA. This indicates that the DNA oligo binds to duplex DNA with higher affinity than the RNA, consistent with the results obtained by EMSA. (D) Molecular model of the hsa-miR-483-5p-DNA triplex. (I): the model of the predicted miRNA and corresponding DNA duplex sequences (16 favorable Hoogsteen pairings). All predicted Hoogsteen base pairs are well maintained after removal of positional and distance restraints. (II): negative control (antisense hsa-miR-483-5p) model with 9 favorable Hoogsteen pairings. Both the RNA and the DNA duplex are largely twisted, and nearly all predicted Hoogsteen pairings cannot be stably maintained. Residues in favor of Hoogsteen hydrogen bond formation are shown in red, while the others are shown in blue.
MicroRNAs Form Triplexes with Double Stranded DNA at Sequence-Specific Binding Sites; a Eukaryotic Mechanism via which microRNAs Could Directly Alter Gene Expression
MicroRNAs are important regulators of gene expression, acting primarily by binding to sequence-specific locations on already transcribed messenger RNAs (mRNA) and typically down-regulating their stability or translation. Recent studies indicate that microRNAs may also play a role in up-regulating mRNA transcription levels, although a definitive mechanism has not been established. Double-helical DNA is capable of forming triple-helical structures through Hoogsteen and reverse Hoogsteen interactions in the major groove of the duplex, and we show physical evidence (i.e., NMR, FRET, SPR) that purine or pyrimidine-rich microRNAs of appropriate length and sequence form triple-helical structures with purine-rich sequences of duplex DNA, and identify microRNA sequences that favor triplex formation. We developed an algorithm (Trident) to search genome-wide for potential triplex-forming sites and show that several mammalian and non-mammalian genomes are enriched for strong microRNA triplex binding sites. We show that those genes containing sequences favoring microRNA triplex formation are markedly enriched (3.3 fold, p < 2.2 × 10⁻¹⁶) for genes whose expression is positively correlated with expression of microRNAs targeting triplex binding sequences. This work has thus revealed a new mechanism by which microRNAs could interact with gene promoter regions to modify gene transcription.
Higher expression of microRNAs forming triplex structures with duplex DNA is more frequently associated with increased gene expression.
MicroRNA and mRNA expression were measured in leukemia cells (ALL) obtained at the time of diagnosis from two cohorts of patients (St. Jude Protocols Total 15 and Total 16). Genome-wide linear correlations between the expression of microRNAs and of mRNAs calculated to form grade 1 triplex structures were assessed in each cohort separately, and then a meta-analysis was performed. The distributions of Spearman p-values for associations with (A) positive or (B) negative correlations are shown. Small p-values were significantly over-represented among positive associations compared with negative associations.
MicroRNAs form triplex structures with DNA.
(A) Duplex DNA identified by genome-wide screens of binding sites was incubated in the presence or absence of a synthesized hsa-miR-483-5p with a 3′ ROX label to perform a FRET assay to detect triplex formation (illustrated in 3B). In the absence of ROX-labeled hsa-miR-483-5p (3A, black line), a single emission peak at 520 nm is observed which, with the addition of ROX-labeled hsa-miR-483-5p (3A, red line), is diminished, and a second, FRET-induced emission peak at 610 nm is observed. (C) In a complementary surface plasmon resonance (SPR) based assay (illustrated in 3D), a 3′ biotin-labeled hsa-miR-483-5p was immobilized and duplex DNA was introduced in triplicate in a 2-fold dilution series starting at 20 nM.
Characteristics of triplex forming microRNA.
The top 1 percent of Homo sapiens triplex interactions (grades 1–4) were characterized by (A-C) microRNA dinucleotide frequency, (D) microRNA length, and (E-H) single nucleotide frequency, and compared to these same characteristics for all human microRNAs. The percentage of purine content was the largest discriminating factor in predicting triplex formation, with the majority of binding sites having greater than 75% purine or pyrimidine content (A). Higher GC content (B), length between 21 and 25 nucleotides (D), greater than or less than average G or C content (F and G), and lower than average U content (H) also predicted triplex formation.
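The purine-content criterion in this caption is straightforward to compute. The following Python sketch assumes the 75% figure is applied as a strict threshold on either purine (A, G) or pyrimidine (C, U/T) fraction; the function names and the toy sequences are hypothetical and are not taken from the study.

```python
def purine_fraction(seq):
    """Return the fraction of A/G bases among the recognized bases in seq."""
    bases = [b for b in seq.upper() if b in "ACGUT"]
    if not bases:
        raise ValueError("sequence contains no recognized nucleotides")
    return sum(b in "AG" for b in bases) / len(bases)


def is_strongly_biased(seq, threshold=0.75):
    """True if the sequence is more than `threshold` purine or pyrimidine."""
    f = purine_fraction(seq)
    return f > threshold or (1.0 - f) > threshold


# Toy sequences, invented for illustration only.
for name, seq in [("purine-rich", "AAGGAAGGC"),
                  ("pyrimidine-rich", "CCUUCCUUA"),
                  ("mixed", "AGCUAGCU")]:
    print(name, round(purine_fraction(seq), 3), is_strongly_biased(seq))
```

The other features listed in the caption (GC content, length, single-nucleotide frequencies) could be computed the same way by changing the set of bases counted.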
