
    F-measures for simulation results.

<p>The median value (black) and quantile ranges (in 5% steps) of the micro- (top) and macro-averaged (bottom) F-measures (<i>F</i><sub>mi</sub>, <i>F</i><sub>ma</sub>) for uncompressed (left) and compressed (right) FBG inference, on the same 129,600 simulated data sets, using automatic priors. The x-axis represents the number of iterations alone, and does not reflect the additional speedup obtained through compression. Notice that the compressed HMM converges no later than 50 iterations (inset figures, right).</p>
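The distinction between the micro- and macro-averaged F-measures plotted above can be made concrete with a short sketch (a hypothetical evaluation helper, not the code used for the figures):

```python
from collections import Counter

def f_measures(true_labels, pred_labels):
    """Compute micro- and macro-averaged F-measures over all classes.

    Micro-averaging pools per-class true/false positives and false
    negatives before computing F; macro-averaging computes F per class
    and takes the unweighted mean, so rare classes weigh equally."""
    classes = set(true_labels) | set(pred_labels)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(true_labels, pred_labels):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    # micro: pool the counts over all classes
    TP, FP, FN = sum(tp.values()), sum(fp.values()), sum(fn.values())
    f_micro = 2 * TP / (2 * TP + FP + FN) if TP else 0.0
    # macro: average the per-class F-measures
    per_class = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    f_macro = sum(per_class) / len(per_class)
    return f_micro, f_macro
```

A predictor that always outputs the majority class scores well micro-averaged but poorly macro-averaged, which is why both measures are reported.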

    F-measures of CBS (light) and HaMMLET (dark) for calling aberrant copy numbers on simulated aCGH data [66].

<p>Boxes represent the interquartile range (IQR = Q3−Q1), with a horizontal line showing the median (Q2), whiskers representing the range (beyond Q1 and Q3), and the bullet representing the mean. HaMMLET has the same or better F-measures in most cases, and on the SRS simulation converges to 1 for larger segments, whereas CBS plateaus for aberrations greater than 10.</p>

    Overview of HaMMLET.

<p>Instead of individual computations per observation (panel a), Forward-Backward Gibbs Sampling is performed on a compressed version of the data, using sufficient statistics for block-wise computations (panel b) to accelerate inference in Bayesian Hidden Markov Models. During the sampling (panel c), parameters and copy number sequences are sampled iteratively. During each iteration, the sampled emission variances determine which coefficients of the data’s Haar wavelet transform are dynamically set to zero. This controls potential break points at finer or coarser resolution or, equivalently, defines blocks of variable number and size (panel c, bottom). Our approach thus yields a dynamic, adaptive compression scheme which greatly improves convergence speed and accuracy and reduces running times.</p>
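The block-wise computation in panel b relies on sufficient statistics. A minimal sketch, assuming Gaussian emissions and using hypothetical helper names, shows why a whole block's likelihood can then be evaluated in constant time:

```python
import math

def block_stats(y, start, end):
    """Sufficient statistics (count, sum, sum of squares) of y[start:end]."""
    block = y[start:end]
    return len(block), sum(block), sum(v * v for v in block)

def block_gauss_loglik(stats, mu, var):
    """Gaussian log-likelihood of an entire block from its sufficient
    statistics alone: O(1) per block instead of O(block length),
    using sum((y - mu)^2) = ss - 2*mu*s + n*mu^2."""
    n, s, ss = stats
    return (-0.5 * n * math.log(2 * math.pi * var)
            - (ss - 2 * mu * s + n * mu * mu) / (2 * var))
```

Because the statistics are precomputed once per block, each Gibbs iteration only touches one entry per block rather than one per observation.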

    Example of dynamic block creation.

<p>The data is of size T = 256, so the wavelet tree contains 512 nodes. Here, only 37 entries had to be checked against the threshold (dark line), 19 of which (round markers) yielded a block (vertical lines on the bottom). Sampling is hence done on a short array of 19 blocks instead of 256 individual values, so the compression ratio is 13.5. The horizontal lines in the bottom subplot are the block means derived from the sufficient statistics in the nodes. Notice how the algorithm creates small blocks around the breakpoints, e.g. at t ≈ 125, which requires traversing to lower levels and thus induces some additional blocks in other parts of the tree (left subtree), since all block sizes are powers of 2. This somewhat reduces the compression ratio, which is unproblematic as it increases the degrees of freedom in the sampler.</p>
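The thresholding idea behind this figure can be sketched as a simplified recursion (a toy stand-in for the actual tree traversal; <code>haar_blocks</code> and its unnormalised detail coefficient are illustrative assumptions, not HaMMLET's implementation):

```python
def haar_blocks(y, threshold):
    """Partition dyadic data into blocks: if every Haar detail
    coefficient inside an interval falls below the threshold, the whole
    interval becomes one block; otherwise split it in half and recurse.
    len(y) must be a power of two. Returns a list of (start, length)."""
    def detail(seg):
        # unnormalised Haar detail coefficient of a segment:
        # absolute difference of the two half-means
        h = len(seg) // 2
        return abs(sum(seg[:h]) - sum(seg[h:])) / len(seg)

    def max_detail(seg):
        if len(seg) == 1:
            return 0.0
        h = len(seg) // 2
        return max(detail(seg), max_detail(seg[:h]), max_detail(seg[h:]))

    def rec(start, length):
        if length == 1 or max_detail(y[start:start + length]) < threshold:
            return [(start, length)]
        h = length // 2
        return rec(start, h) + rec(start + h, h)

    return rec(0, len(y))
```

A step in the data forces a large detail coefficient at the breakpoint, so small blocks appear there while flat regions collapse into a few large blocks, mirroring the figure.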

    HaMMLET’s inference of copy-number segments on T47D breast ductal carcinoma.

<p>Notice that the data is much more complex than the simple structure of a diploid majority class with some small aberrations typically observed for Coriell data.</p>

    Mapping of wavelets <i>ψ</i><sub><i>j</i>, <i>k</i></sub> and data points <i>y</i><sub><i>t</i></sub> to tree nodes <i>N</i><sub><i>ℓ</i>, <i>t</i></sub>.

<p>Each node is the root of a subtree with <i>n</i> = 2<sup><i>ℓ</i></sup> leaves; pruning that subtree yields a block of size <i>n</i>, starting at position <i>t</i>. For instance, the node <i>N</i><sub>1,6</sub> is located at position 13 of the DFS array (solid line), and corresponds to the wavelet <i>ψ</i><sub>3,3</sub>. A block of size <i>n</i> = 2 can be created by pruning the subtree, which amounts to advancing by 2<i>n</i> − 1 = 3 positions (dashed line), yielding <i>N</i><sub>3,8</sub> at position 16, which is the wavelet <i>ψ</i><sub>1,1</sub>. Thus the number of steps for creating blocks per iteration is at most the number of nodes in the tree, and hence strictly smaller than 2<i>T</i>.</p>
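The DFS arithmetic in the caption (pruning a subtree with <i>n</i> leaves skips its 2<i>n</i> − 1 nodes) can be sketched as follows; <code>create_blocks</code> and the <code>prune</code> predicate are hypothetical names for a simplified stand-in, not the actual implementation:

```python
def create_blocks(T, prune):
    """Walk a full binary tree over T leaves (T a power of two) in DFS
    order. At each node covering n positions starting at t, the predicate
    prune(n, t) either cuts the subtree -- emitting one block and skipping
    its 2n-1 nodes -- or descends into the two children. Each node is
    visited at most once, so the walk takes fewer than 2T steps."""
    blocks = []
    stack = [(T, 0)]                 # (leaves under node, start position)
    while stack:
        n, t = stack.pop()
        if n == 1 or prune(n, t):
            blocks.append((t, n))    # cut: one block covers the subtree
        else:
            h = n // 2
            stack.append((h, t + h)) # right child (visited second)
            stack.append((h, t))     # left child (visited first)
    return blocks
```

Pruning everywhere at size 2, for example, turns 8 data points into 4 blocks while only touching 7 of the 15 tree nodes.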

    Identifying protein complexes directly from high-throughput TAP data with Markov random fields-0

<p><b>Copyright information:</b></p><p>Taken from "Identifying protein complexes directly from high-throughput TAP data with Markov random fields"</p><p>http://www.biomedcentral.com/1471-2105/8/482</p><p>BMC Bioinformatics 2007;8:482.</p><p>Published online 19 Dec 2007</p><p>PMCID: PMC2222659.</p><p>The false positive rate is set to 0.005 and the false negative rate to 0.2 or 0.5. With a false negative rate of 0.2 (2(a), 2(b)), MRF can recover the true clustering with the minimum negative log-likelihood, attained at 11 clusters. Notice that adding more clusters does not reduce the cost any further; additional clusters simply remain empty. For a false negative rate of 0.5, the accuracy is worse and more empty clusters are needed to reach convergence. In 2(c) and 2(d) the convergence rate fluctuates more.</p>

    Identifying protein complexes directly from high-throughput TAP data with Markov random fields-3

<p><b>Copyright information:</b></p><p>Taken from "Identifying protein complexes directly from high-throughput TAP data with Markov random fields"</p><p>http://www.biomedcentral.com/1471-2105/8/482</p><p>BMC Bioinformatics 2007;8:482.</p><p>Published online 19 Dec 2007</p><p>PMCID: PMC2222659.</p><p>The false positive rate is set to 0.005 and the false negative rate to 0.2 or 0.5. With a false negative rate of 0.2 (2(a), 2(b)), MRF can recover the true clustering with the minimum negative log-likelihood, attained at 11 clusters. Notice that adding more clusters does not reduce the cost any further; additional clusters simply remain empty. For a false negative rate of 0.5, the accuracy is worse and more empty clusters are needed to reach convergence. In 2(c) and 2(d) the convergence rate fluctuates more.</p>

    Semi-supervised learning for the identification of syn-expressed genes from fused microarray and image data-6

<p><b>Copyright information:</b></p><p>Taken from "Semi-supervised learning for the identification of syn-expressed genes from fused microarray and image data"</p><p>http://www.biomedcentral.com/1471-2105/8/S10/S3</p><p>BMC Bioinformatics 2007;8(Suppl 10):S3.</p><p>Published online 21 Dec 2007</p><p>PMCID: PMC2230504.</p><p>The major phenomena are depletion of maternal mRNA (maternal genes) and the start of the embryonic transcriptional machinery during embryogenesis at time point 3 hours (zygotically expressed genes). In the clusters with zygotically expressed genes, we observe two main periods of activation: 3–4 hours for clusters U1 to U5, and 7–8 h for clusters U8 to U11. In the clusters with maternal genes, we observe under-expression of genes at several time periods: 3–4 h in clusters U21 to U28; 4–5 h for clusters U17 to U20; 6–7 h for cluster U16; 7–8 h for clusters U12 and U13; and 9–10 h for cluster U15.</p>

    Semi-supervised learning for the identification of syn-expressed genes from fused microarray and image data-3

<p><b>Copyright information:</b></p><p>Taken from "Semi-supervised learning for the identification of syn-expressed genes from fused microarray and image data"</p><p>http://www.biomedcentral.com/1471-2105/8/S10/S3</p><p>BMC Bioinformatics 2007;8(Suppl 10):S3.</p><p>Published online 21 Dec 2007</p><p>PMCID: PMC2230504.</p><p>…in cMoG than in MoG. The threshold discards ImaGO terms where the difference in the log of the p-value between cMoG and MoG is smaller than the threshold. As can be observed, the proportion is higher than 0.5 for all values, which indicates an advantage of cMoG. Furthermore, the proportion tends to increase for higher values.</p>