101 research outputs found

    Hot Hands, Streaks and Coin-flips: Numerical Nonsense in the New York Times

    The existence of "Hot Hands" and "Streaks" in sports and gambling is hotly debated, but there is no uncertainty about the recent batting average of the New York Times: it is now two-for-two in mangling and misunderstanding elementary concepts in probability and statistics, and in mixing up the key points of a recent paper that re-examines earlier work on the statistics of streaks. In so doing, its high-visibility articles have added to the general public's confusion about probability, making it seem mysterious and paradoxical when it needn't be. However, those articles make excellent case studies on how to get it wrong, and excellent material for discussions in high-school and college classes focusing on quantitative reasoning, data analysis, probability and statistics. What I have written here is intended for that audience.

    Gödel for Goldilocks: A Rigorous, Streamlined Proof of (a variant of) Gödel's First Incompleteness Theorem

    Most discussions of Gödel's theorems fall into one of two types: either they emphasize perceived philosophical, cultural "meanings" of the theorems, and perhaps sketch some of the ideas of the proofs, usually relating Gödel's proofs to riddles and paradoxes, but do not attempt to present rigorous, complete proofs; or they do present rigorous proofs, but in the traditional style of mathematical logic, with all of its heavy notation and difficult definitions, and technical issues which reflect Gödel's original approach and broader logical issues. Many non-specialists are frustrated by these two extreme types of expositions and want a complete, rigorous proof that they can understand. Such an exposition is possible, because many people have realized that variants of Gödel's first incompleteness theorem can be rigorously proved by a simpler middle approach, avoiding philosophical discussions and hand-waving at one extreme; and also avoiding the heavy machinery of traditional mathematical logic, and many of the harder details of Gödel's original proof, at the other extreme. This is the just-right Goldilocks approach. In this exposition we give a short, self-contained Goldilocks exposition of Gödel's first theorem, aimed at a broad, undergraduate audience. Comment: Version 2 corrects typos and one definition in the first version, and expands or contracts parts of the exposition, but the main content remains the same. Version 3 removes an unnecessary comment in Version

    19th International Workshop on Algorithms in Bioinformatics (WABI 2019)

    Front Matter, Table of Contents, Preface, Conference Organization

    Linear time algorithms for finding and representing all the tandem repeats in a string

    Gusfield D, Stoye J. Linear time algorithms for finding and representing all the tandem repeats in a string. Journal of Computer and System Sciences. 2004;69(4):525-546.
    A tandem repeat (or square) is a string αα, where α is a non-empty string. We present an O(|S|)-time algorithm that operates on the suffix tree T(S) for a string S, finding and marking the endpoint in T(S) of every tandem repeat that occurs in S. This decorated suffix tree implicitly represents all occurrences of tandem repeats in S, and can be used to efficiently solve many questions concerning tandem repeats and tandem arrays in S. This improves and generalizes several prior efforts to efficiently capture large subsets of tandem repeats.
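
    To make the definition of a tandem repeat concrete, the following is a minimal brute-force sketch in Python that lists every occurrence of a tandem repeat αα in a string. It is not the linear-time suffix-tree algorithm described in the abstract, and it takes cubic time in the worst case; the function name `tandem_repeats` and the (start, length of α) output format are illustrative assumptions.

```python
def tandem_repeats(s):
    """Return (start, half_len) for every occurrence of a tandem repeat in s.

    An occurrence is a substring alpha+alpha with alpha non-empty;
    half_len is len(alpha). Brute force, for illustration only.
    """
    n = len(s)
    found = []
    for start in range(n):
        # The repeat alpha+alpha must fit inside s[start:].
        for half_len in range(1, (n - start) // 2 + 1):
            mid = start + half_len
            if s[start:mid] == s[mid:mid + half_len]:
                found.append((start, half_len))
    return found
```

    For example, `tandem_repeats("abab")` reports the single repeat starting at position 0 with α = "ab".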

    An efficiently computed lower bound on the number of recombinations in phylogenetic networks: Theory and empirical study

    Phylogenetic networks are models of sequence evolution that go beyond trees, allowing biological operations that are not tree-like. One of the most important biological operations is recombination between two sequences. An established problem [J. Hein, Reconstructing evolution of sequences subject to recombination using parsimony, Math. Biosci. 98 (1990) 185–200; J. Hein, A heuristic method to reconstruct the history of sequences subject to recombination, J. Molecular Evolution 36 (1993) 396–405; Y. Song, J. Hein, Parsimonious reconstruction of sequence evolution and haplotype blocks: finding the minimum number of recombination events, in: Proceedings of 2003 Workshop on Algorithms in Bioinformatics, Berlin, Germany, 2003, Lecture Notes in Computer Science, Springer, Berlin; Y. Song, J. Hein, On the minimum number of recombination events in the evolutionary history of DNA sequences, J. Math. Biol. 48 (2003) 160–186; L. Wang, K. Zhang, L. Zhang, Perfect phylogenetic networks with recombination, J. Comput. Biol. 8 (2001) 69–78; S.R. Myers, R.C. Griffiths, Bounds on the minimum number of recombination events in a sample history, Genetics 163 (2003) 375–394; V. Bafna, V. Bansal, Improved recombination lower bounds for haplotype data, in: Proceedings of RECOMB, 2005; Y. Song, Y. Wu, D. Gusfield, Efficient computation of close lower and upper bounds on the minimum number of needed recombinations in the evolution of biological sequences, Bioinformatics 21 (2005) i413–i422, Bioinformatics (Suppl. 1), Proceedings of ISMB, 2005; D. Gusfield, S. Eddhu, C. Langley, Optimal, efficient reconstruction of phylogenetic networks with constrained recombination, J. Bioinform. Comput. Biol. 2(1) (2004) 173–213; D. Gusfield, Optimal, efficient reconstruction of root-unknown phylogenetic networks with constrained and structured recombination, J. Comput. Systems Sci. 70 (2005) 381–398] is to find a phylogenetic network that derives an input set of sequences, minimizing the number of recombinations used. No efficient, general algorithm is known for this problem. Several papers consider the problem of computing a lower bound on the number of recombinations needed. In this paper we establish a new, efficiently computed lower bound. This result is useful in methods to estimate the number of needed recombinations, and also to prove the optimality of algorithms for constructing phylogenetic networks under certain conditions [D. Gusfield, S. Eddhu, C. Langley, Optimal, efficient reconstruction of phylogenetic networks with constrained recombination, J. Bioinform. Comput. Biol. 2(1) (2004) 173–213; D. Gusfield, Optimal, efficient reconstruction of root-unknown phylogenetic networks with constrained and structured recombination, J. Comput. Systems Sci. 70 (2005) 381–398; D. Gusfield, Optimal, efficient reconstruction of root-unknown phylogenetic networks with constrained recombination, Technical Report, Department of Computer Science, University of California, Davis, CA, 2004]. The lower bound is based on a structural, combinatorial insight, using only the site conflicts and incompatibilities, and hence it is fundamental and applicable to many biological phenomena other than recombination, for example, when gene conversions or recurrent or back mutations or cross-species hybridizations cause the phylogenetic history to deviate from a tree structure. In addition to establishing the bound, we examine its use in more complex lower bound methods, and compare the bounds obtained to those obtained by other established lower bound methods.

    A simple, practical and complete O(n^3/log n)-time Algorithm for RNA folding using the Four-Russians Speedup

    Background: The problem of computationally predicting the secondary structure (or folding) of RNA molecules was first introduced more than thirty years ago and yet continues to be an area of active research and development. The basic RNA-folding problem of finding a maximum cardinality, non-crossing matching of complementary nucleotides in an RNA sequence of length n has an O(n^3)-time dynamic programming solution that is widely applied. It is known that an o(n^3) worst-case time solution is possible, but the published and suggested methods are complex and have not been established to be practical. Significant practical improvements to the original dynamic programming method have been introduced, but they retain the O(n^3) worst-case time bound when n is the only problem parameter used in the bound. Surprisingly, the most widely used general technique to achieve a worst-case (and often practical) speedup of dynamic programming, the Four-Russians technique, has not been previously applied to the RNA-folding problem. This is perhaps due to technical issues in adapting the technique to RNA folding.
    Results: In this paper, we give a simple, complete, and practical Four-Russians algorithm for the basic RNA-folding problem, achieving a worst-case time bound of O(n^3/log(n)).
    Conclusions: We show that this time bound can also be obtained for richer nucleotide-matching scoring schemes, and that the method achieves consistent speedups in practice. The contribution is both theoretical and practical, since the basic RNA-folding problem is often solved multiple times in the inner loop of more complex algorithms, and for long RNA molecules in the study of RNA virus genomes.
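
    As a point of reference for the basic RNA-folding problem stated above, the following is a minimal Python sketch of the standard cubic-time dynamic program for a maximum cardinality non-crossing matching. It is not the Four-Russians algorithm of the paper. The function name `max_pairs`, the restriction to A-U and C-G pairs, and the absence of any minimum loop length are simplifying assumptions made here for illustration.

```python
def max_pairs(rna):
    """Maximum number of non-crossing complementary pairs in rna (O(n^3) DP).

    Assumptions for this sketch: only A-U and C-G pairs count, and
    adjacent positions are allowed to pair (no minimum loop length).
    """
    pairs = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
    n = len(rna)
    if n == 0:
        return 0
    # best[i][j] = size of a maximum matching within rna[i..j], inclusive.
    best = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position j is left unpaired.
            score = best[i][j - 1]
            # Case 2: position j pairs with some k in i..j-1, which splits
            # the interval into an independent left part and an inner part.
            for k in range(i, j):
                if (rna[k], rna[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]
```

    The innermost loop over k is what makes the running time cubic; that loop is the part a speedup technique would need to accelerate.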

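    Escândalos, marolas e finanças: para uma sociologia da transformação do ambiente econômico [Scandals, Ripples and Finance: Toward a Sociology of the Transformation of the Economic Environment]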

    Simple and flexible detection of contiguous repeats using a suffix tree

    Stoye J, Gusfield D. Simple and flexible detection of contiguous repeats using a suffix tree. Theoretical Computer Science. 2002;270(1-2):843-856.
    We study the problem of detecting all occurrences of (primitive) tandem repeats and tandem arrays in a string. We first give a simple time- and space-optimal algorithm to find all tandem repeats, and then modify it to become a time- and space-optimal algorithm for finding only the primitive tandem repeats. Both of these algorithms are then extended to handle tandem arrays. The contribution of this paper is both pedagogical and practical, giving simple algorithms and implementations based on a suffix tree, using only standard tree traversal techniques.
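
    The distinction between tandem repeats and primitive tandem repeats can be illustrated with a short Python sketch. It assumes the usual definition, which the abstract does not spell out: a string is primitive if it is not a repetition of a strictly shorter string, and a tandem repeat αα is primitive if α is. The function names are hypothetical, and the check below is a simple string test, not the suffix-tree method of the paper.

```python
def is_primitive(alpha):
    """True if alpha is not a repetition of a strictly shorter string."""
    if not alpha:
        return False  # assumption: the empty string is not counted as primitive
    # alpha is a repetition of a shorter string exactly when it reappears
    # inside alpha+alpha at an offset strictly between 0 and len(alpha).
    return (alpha + alpha).find(alpha, 1) == len(alpha)


def is_primitive_tandem_repeat(w):
    """True if w has the form alpha+alpha with alpha primitive."""
    half, odd = divmod(len(w), 2)
    if odd or half == 0:
        return False
    alpha = w[:half]
    return w[half:] == alpha and is_primitive(alpha)
```

    Under this definition "abab" is a primitive tandem repeat, while "abababab" is a tandem repeat that is not primitive, because its half "abab" is itself a repetition of "ab".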