14 research outputs found

Test data and evaluation criteria.

Author: Chaim Linhart (83639)
Ron Shamir (11621)
Yaron Orenstein (130959)
Publication venue
Publication date
Field of study

The table lists the data and evaluation criteria used in each benchmark.

FigShare

Summary of the comparison. Boldface indicates significantly better performance than the other methods (including equal top performance).

Author: Chaim Linhart (83639)
Ron Shamir (11621)
Yaron Orenstein (130959)

FigShare

Similarity between methods.

Author: Chaim Linhart (83639)
Ron Shamir (11621)
Yaron Orenstein (130959)

(A) For each pair of methods, the Euclidean distance between the PWMs of the two methods is reported. Before the comparison, the column method's PWM is trimmed to its eight most informative contiguous positions. (B–D) Ranking-based comparisons. For each pair of methods, the probe ranking defined by the column's method is used as the reference, and the ranking of the row's method is evaluated using AUC (B) and sensitivity at a 1% false-positive rate (C). In (D), for each pair of methods, the 4σ positive sets of the paired PBM are first ranked by each method, and the Spearman rank correlation coefficient of those rankings is computed. In all tables, the average over 230 PBM experiments is reported. Red colour corresponds to greater similarity.
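The trimming and distance steps in panel (A) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each PWM is a list of columns of four probabilities, measures information content against a uniform background, and compares two PWMs of equal length without any offset search. The function names are hypothetical.

```python
import math


def column_information(col):
    """Information content in bits of one PWM column (4-letter alphabet,
    uniform background assumed)."""
    return 2.0 + sum(p * math.log2(p) for p in col if p > 0)


def trim_most_informative(pwm, width=8):
    """Return the contiguous window of `width` columns with the highest
    total information content (first such window on ties)."""
    if len(pwm) <= width:
        return list(pwm)
    info = [column_information(c) for c in pwm]
    starts = range(len(pwm) - width + 1)
    best = max(starts, key=lambda s: sum(info[s:s + width]))
    return list(pwm[best:best + width])


def euclidean_distance(pwm_a, pwm_b):
    """Euclidean distance over all entries of two equal-length PWMs.
    Assumption: the PWMs are already aligned column by column."""
    if len(pwm_a) != len(pwm_b):
        raise ValueError("PWMs must have the same number of columns")
    return math.sqrt(
        sum((a - b) ** 2
            for col_a, col_b in zip(pwm_a, pwm_b)
            for a, b in zip(col_a, col_b))
    )
```

How the two PWMs are aligned before the distance is taken is not stated in the caption, so the equal-length requirement here is a simplification.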

FigShare

Similarity to experimentally established PWMs.

Author: Chaim Linhart (83639)
Ron Shamir (11621)
Yaron Orenstein (130959)

For 58 TFs, we compared the motifs produced from their PBM profiles by each method with the known motif from the JASPAR database. Distance was measured using Euclidean distance. Three distance cutoffs were used, and the success rate is the fraction of recovered motifs with distance below the cutoff. BE: BEEML-PBM, RM: RankMotif++, SW: Seed-and-Wobble, AM: Amadeus-PBM, JR: JASPAR.
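The success rate described above is a simple fraction, which can be sketched as below. The distances and cutoff values in the usage lines are made up for illustration; the caption does not give the actual cutoffs, and `success_rates` is a hypothetical name.

```python
def success_rates(distances, cutoffs):
    """For each cutoff, return the fraction of motifs whose distance to the
    reference motif is strictly below that cutoff."""
    if not distances:
        raise ValueError("at least one distance is required")
    total = len(distances)
    return {c: sum(d < c for d in distances) / total for c in cutoffs}


# Illustrative values only: four motif distances, three cutoffs.
example = success_rates([0.1, 0.5, 0.9, 1.5], [0.3, 1.0, 2.0])
```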

FigShare

Examples of generated motifs.

Author: Chaim Linhart (83639)
Ron Shamir (11621)
Yaron Orenstein (130959)

The figure shows examples of the motifs produced by each method and the corresponding JASPAR motif. For three proteins, the PWM logos produced by each method and the experimentally and independently established motif in the JASPAR database are shown. AM was trained on motif length 8, while for BE, RM and SW only the most informative contiguous positions were kept. We chose TFs whose motifs had information content most similar to the averages of the different methods.

FigShare

Additional file 2: of Finding RNA structure in the unstructured RBPome

Author: Bonnie Berger (19458)
Uwe Ohler (20245)
Yaron Orenstein (130959)

Figure S2 A) RNA structural binding preferences do not improve in vitro binding prediction when random structure probabilities are assigned. Correlation results over 488 paired experiments reveal that RNA structure does not improve binding prediction when structure probabilities are assigned randomly. B) RNA structural binding preferences do not improve in vivo binding prediction when random structure probabilities are assigned. AUC results of 96 paired eCLIP and RNAcompete experiments over 21 joint proteins demonstrate that RNA structural binding preferences learned from in vitro data do not correlate well with protein-RNA interactions measured in vivo when structure probabilities are assigned randomly. (PNG 85 kb)

FigShare

Properties of the tested methods.

Author: Chaim Linhart (83639)
Ron Shamir (11621)
Yaron Orenstein (130959)

FigShare

Additional file 3: of Finding RNA structure in the unstructured RBPome

Author: Bonnie Berger (19458)
Uwe Ohler (20245)
Yaron Orenstein (130959)

Figure S3 There is no improvement in binding prediction from amino acid sequence by utilizing RNA structure with random structure probabilities. A) When we add RNA structural features to the sequence k-mer space of AffinityRegression, but assign structure probabilities randomly, we do not predict binding any better than using sequence features alone. B) When we add RNA structural features to the sequence k-mer space of AffinityRegression, but assign structure probabilities randomly, we do not distinguish the top-bound probes from unbound probes any better than using sequence features alone. (PNG 68 kb)

FigShare

Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing

Author: Carl Kingsford (70642)
David Pellow (4484029)
Guillaume Marçais (4484032)
Ron Shamir (11621)
Yaron Orenstein (130959)
Publication date: 01/10/2017

With the rapidly increasing volume of deep sequencing data, more efficient algorithms and data structures are needed. Minimizers are a central recent paradigm that has improved various sequence analysis tasks, including hashing for faster read overlap detection, sparse suffix arrays for creating smaller indexes, and Bloom filters for speeding up sequence search. Here, we propose an alternative paradigm that can lead to substantial further improvement in these and other tasks. For integers k and L > k, we say that a set of k-mers is a universal hitting set (UHS) if every possible L-long sequence must contain a k-mer from the set. We develop a heuristic called DOCKS to find a compact UHS, which works in two phases: The first phase is solved optimally, and for the second we propose several efficient heuristics, trading set size for speed and memory. The use of heuristics is motivated by showing the NP-hardness of a closely related problem. We show that DOCKS works well in practice and produces UHSs that are very close to a theoretical lower bound. We present results for various values of k and L and by applying them to real genomes show that UHSs indeed improve over minimizers. In particular, DOCKS uses less than 30% of the 10-mers needed to span the human genome compared to minimizers. The software and computed UHSs are freely available at github.com/Shamir-Lab/DOCKS/ and acgt.cs.tau.ac.il/docks/, respectively.
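The UHS definition above can be checked directly for very small k and L. The sketch below is a brute-force verifier written only to illustrate the definition; it is not the DOCKS heuristic, its running time grows as |Σ|^L, and the function name is hypothetical.

```python
from itertools import product


def is_universal_hitting_set(kmers, k, L, alphabet="ACGT"):
    """Return True if every L-long sequence over `alphabet` contains at
    least one k-mer from `kmers`. Exhaustive check: feasible only for small L."""
    if not 0 < k < L:
        raise ValueError("requires 0 < k < L")
    kmer_set = set(kmers)
    for letters in product(alphabet, repeat=L):
        seq = "".join(letters)
        # Slide a window of length k over the sequence and look for a hit.
        if not any(seq[i:i + k] in kmer_set for i in range(L - k + 1)):
            return False
    return True
```

For example, over the two-letter alphabet "AB" with k = 2 and L = 3, the set {AA, AB, BB} is a UHS: a sequence avoiding it would need both of its 2-mers to equal BA, which is impossible. The set {AA, BB} is not, because ABA avoids it.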

Directory of Open Access Journals

FigShare

Performance of DOCKS.

Author: Carl Kingsford (70642)
David Pellow (4484029)
Guillaume Marçais (4484032)
Ron Shamir (11621)
Yaron Orenstein (130959)

For different combinations of k and L we ran DOCKS over the DNA alphabet. (A) Set sizes. The results are shown as a fraction of the total number of k-mers, |Σ|^k. The broken lines show the decycling set size for each k. (B) Running time in seconds. Note that the y-axis is in log scale. (C) Maximum memory usage in megabytes. Note that the y-axis is in log scale.

FigShare