14 research outputs found

    Test data and evaluation criteria.

    The table lists the data and evaluation criteria used in each benchmark.

    Summary of the comparison. Boldface indicates significantly better performance than the other methods (including equal top performance).


    Similarity between methods.

    (A) For each pair of methods, the Euclidean distance between the PWMs of the two methods is reported. Before the comparison, the column method's PWM is trimmed to its eight most informative contiguous positions. (B–D) Ranking-based comparisons. For each pair of methods, the probe ranking defined by the column's method is used as the reference, and the ranking of the row's method is evaluated using AUC (B) and sensitivity at a 1% false positive rate (C). In (D), for each pair of methods, the 4σ positive sets of the paired PBM are first ranked by each method, and the Spearman rank correlation coefficient of those rankings is computed. In all tables, the average over 230 PBM experiments is reported. Red colour corresponds to greater similarity.
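The PWM comparison in panel (A) can be illustrated with a minimal sketch. It assumes each PWM is a list of columns of four base probabilities (A, C, G, T), measures column information as 2 minus the Shannon entropy in bits, and assumes the two PWMs being compared already have the same width and alignment. The caption does not specify how alignment was handled, and all function names here are hypothetical.

```python
import math

def column_information(col):
    """Information content (bits) of one PWM column of A, C, G, T probabilities."""
    entropy = -sum(p * math.log2(p) for p in col if p > 0)
    return 2.0 - entropy

def trim_to_most_informative(pwm, width=8):
    """Return the contiguous window of `width` columns with the highest total information."""
    if len(pwm) <= width:
        return list(pwm)
    info = [column_information(col) for col in pwm]
    starts = range(len(pwm) - width + 1)
    best = max(starts, key=lambda s: sum(info[s:s + width]))
    return list(pwm[best:best + width])

def euclidean_distance(pwm_a, pwm_b):
    """Euclidean distance between two equal-width PWMs (alignment is assumed, not computed)."""
    if len(pwm_a) != len(pwm_b):
        raise ValueError("PWMs must have the same number of columns")
    return math.sqrt(sum((a - b) ** 2
                         for col_a, col_b in zip(pwm_a, pwm_b)
                         for a, b in zip(col_a, col_b)))

# Toy example: two uninformative flanks around two fully specific columns.
uniform = [0.25, 0.25, 0.25, 0.25]
only_a = [1.0, 0.0, 0.0, 0.0]
only_c = [0.0, 1.0, 0.0, 0.0]
core = trim_to_most_informative([uniform, only_a, only_c, uniform], width=2)
```

With the toy input, the trimmed window keeps the two specific columns and drops the uniform flanks.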

    Similarity to experimentally established PWMs.

    For 58 TFs, we compared the motifs produced from their PBM profiles by each method with the known motif from the JASPAR database. Distance was measured using Euclidean distance. Three distance cutoffs were used, and the success rate is the fraction of recovered motifs with distance below the cutoff. BE: BEEML-PBM, RM: RankMotif++, SW: Seed-and-Wobble, AM: Amadeus-PBM, JR: JASPAR.

    Examples of generated motifs.

    The figure shows examples of the motifs produced by each method and the corresponding JASPAR motif. For three proteins, the PWM logos produced by each method and the experimentally and independently established motif in the JASPAR database are shown. AM was trained on motif length 8, while for BE, RM and SW only the most informative contiguous positions were kept. We chose TFs whose motifs had information content most similar to the averages of the different methods.

    Additional file 2: of Finding RNA structure in the unstructured RBPome

    Figure S2. A) RNA structural binding preferences do not improve in vitro binding prediction when random structure probabilities are assigned. Correlation results over 488 paired experiments reveal that RNA structure does not improve binding prediction when structure probabilities are assigned randomly. B) RNA structural binding preferences do not improve in vivo binding prediction when random structure probabilities are assigned. AUC results of 96 paired eCLIP and RNAcompete experiments over 21 joint proteins demonstrate that RNA structural binding preferences learned from in vitro data do not correlate well with protein-RNA interactions measured in vivo when structure probabilities are assigned randomly. (PNG 85 kb)

    Properties of the tested methods.


    Additional file 3: of Finding RNA structure in the unstructured RBPome

    Figure S3. There is no improvement in binding prediction from amino acid sequence by utilizing RNA structure with random structure probabilities. A) When we add RNA structural features to the sequence k-mer space of AffinityRegression, but assign structure probabilities randomly, we do not predict binding any better than using sequence features alone. B) When we add RNA structural features to the sequence k-mer space of AffinityRegression, but assign structure probabilities randomly, we do not distinguish the top-bound probes from unbound probes any better than using sequence features alone. (PNG 68 kb)

    Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing

    With the rapidly increasing volume of deep sequencing data, more efficient algorithms and data structures are needed. Minimizers are a central recent paradigm that has improved various sequence analysis tasks, including hashing for faster read overlap detection, sparse suffix arrays for creating smaller indexes, and Bloom filters for speeding up sequence search. Here, we propose an alternative paradigm that can lead to substantial further improvement in these and other tasks. For integers k and L > k, we say that a set of k-mers is a universal hitting set (UHS) if every possible L-long sequence must contain a k-mer from the set. We develop a heuristic called DOCKS to find a compact UHS, which works in two phases: The first phase is solved optimally, and for the second we propose several efficient heuristics, trading set size for speed and memory. The use of heuristics is motivated by showing the NP-hardness of a closely related problem. We show that DOCKS works well in practice and produces UHSs that are very close to a theoretical lower bound. We present results for various values of k and L and by applying them to real genomes show that UHSs indeed improve over minimizers. In particular, DOCKS uses less than 30% of the 10-mers needed to span the human genome compared to minimizers. The software and computed UHSs are freely available at github.com/Shamir-Lab/DOCKS/ and acgt.cs.tau.ac.il/docks/, respectively.
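The UHS definition in this abstract can be checked directly for very small parameters. The sketch below is a brute-force test of the definition only, not the DOCKS heuristic: it enumerates every L-long sequence over the alphabet and looks for one that contains no k-mer from the set. The function names are hypothetical, and the enumeration is exponential in L, so it is only practical for toy sizes.

```python
from itertools import product

def find_unhit_sequence(kmers, k, L, alphabet="ACGT"):
    """Return the first L-long sequence containing no k-mer from `kmers`, or None."""
    kmers = set(kmers)
    for letters in product(alphabet, repeat=L):
        seq = "".join(letters)
        # A sequence is "hit" if any of its L - k + 1 windows is in the set.
        if not any(seq[i:i + k] in kmers for i in range(L - k + 1)):
            return seq
    return None

def is_universal_hitting_set(kmers, k, L, alphabet="ACGT"):
    """True if every L-long sequence over `alphabet` contains a k-mer from `kmers`."""
    return find_unhit_sequence(kmers, k, L, alphabet) is None

# Toy check over a two-letter alphabet with k = 2 and L = 3.
hits_all = is_universal_hitting_set({"AA", "AC", "CC"}, 2, 3, alphabet="AC")
missed = find_unhit_sequence({"AA", "CC"}, 2, 3, alphabet="AC")
```

In the toy example, {AA, AC, CC} hits every 3-long sequence because no sequence can consist of overlapping copies of CA alone, whereas {AA, CC} misses the alternating sequence ACA.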

    Performance of DOCKS.

    For different combinations of k and L we ran DOCKS over the DNA alphabet. (A) Set sizes. The results are shown as a fraction of the total number of k-mers, |Σ|^k. The broken lines show the decycling set size for each k. (B) Running time in seconds. Note that the y-axis is in log scale. (C) Maximum memory usage in megabytes. Note that the y-axis is in log scale.