844 research outputs found

    A Coverage Criterion for Spaced Seeds and its Applications to Support Vector Machine String Kernels and k-Mer Distances

    Get PDF
    Spaced seeds have been recently shown to not only detect more alignments, but also to give a more accurate measure of phylogenetic distances (Boden et al., 2013, Horwege et al., 2014, Leimeister et al., 2014), and to provide a lower misclassification rate when used with Support Vector Machines (SVMs) (On-odera and Shibuya, 2013), We confirm by independent experiments these two results, and propose in this article to use a coverage criterion (Benson and Mak, 2008, Martin, 2013, Martin and No{\'e}, 2014), to measure the seed efficiency in both cases in order to design better seed patterns. We show first how this coverage criterion can be directly measured by a full automaton-based approach. We then illustrate how this criterion performs when compared with two other criteria frequently used, namely the single-hit and multiple-hit criteria, through correlation coefficients with the correct classification/the true distance. At the end, for alignment-free distances, we propose an extension by adopting the coverage criterion, show how it performs, and indicate how it can be efficiently computed.Comment: http://online.liebertpub.com/doi/abs/10.1089/cmb.2014.017

    A Coverage Criterion for Spaced Seeds and its Applications to Support Vector Machine String Kernels and k-Mer Distances

    Get PDF
    Spaced seeds have been recently shown to not only detect more alignments, but also to give a more accurate measure of phylogenetic distances (Boden et al., 2013, Horwege et al., 2014, Leimeister et al., 2014), and to provide a lower misclassification rate when used with Support Vector Machines (SVMs) (On-odera and Shibuya, 2013), We confirm by independent experiments these two results, and propose in this article to use a coverage criterion (Benson and Mak, 2008, Martin, 2013, Martin and No{\'e}, 2014), to measure the seed efficiency in both cases in order to design better seed patterns. We show first how this coverage criterion can be directly measured by a full automaton-based approach. We then illustrate how this criterion performs when compared with two other criteria frequently used, namely the single-hit and multiple-hit criteria, through correlation coefficients with the correct classification/the true distance. At the end, for alignment-free distances, we propose an extension by adopting the coverage criterion, show how it performs, and indicate how it can be efficiently computed.Comment: http://online.liebertpub.com/doi/abs/10.1089/cmb.2014.017

    Conflict and Computation on Wikipedia: a Finite-State Machine Analysis of Editor Interactions

    Full text link
    What is the boundary between a vigorous argument and a breakdown of relations? What drives a group of individuals across it? Taking Wikipedia as a test case, we use a hidden Markov model to approximate the computational structure and social grammar of more than a decade of cooperation and conflict among its editors. Across a wide range of pages, we discover a bursty war/peace structure where the systems can become trapped, sometimes for months, in a computational subspace associated with significantly higher levels of conflict-tracking "revert" actions. Distinct patterns of behavior characterize the lower-conflict subspace, including tit-for-tat reversion. While a fraction of the transitions between these subspaces are associated with top-down actions taken by administrators, the effects are weak. Surprisingly, we find no statistical signal that transitions are associated with the appearance of particularly anti-social users, and only weak association with significant news events outside the system. These findings are consistent with transitions being driven by decentralized processes with no clear locus of control. Models of belief revision in the presence of a common resource for information-sharing predict the existence of two distinct phases: a disordered high-conflict phase, and a frozen phase with spontaneously-broken symmetry. The bistability we observe empirically may be a consequence of editor turn-over, which drives the system to a critical point between them.Comment: 23 pages, 3 figures. Matches published version. Code for HMM fitting available at http://bit.ly/sfihmm ; time series and derived finite state machines at bit.ly/wiki_hm

    A compact statistical model of the song syntax in Bengalese finch

    Get PDF
    Songs of many songbird species consist of variable sequences of a finite number of syllables. A common approach for characterizing the syntax of these complex syllable sequences is to use transition probabilities between the syllables. This is equivalent to the Markov model, in which each syllable is associated with one state, and the transition probabilities between the states do not depend on the state transition history. Here we analyze the song syntax in a Bengalese finch. We show that the Markov model fails to capture the statistical properties of the syllable sequences. Instead, a state transition model that accurately describes the statistics of the syllable sequences includes adaptation of the self-transition probabilities when states are repeatedly revisited, and allows associations of more than one state to the same syllable. Such a model does not increase the model complexity significantly. Mathematically, the model is a partially observable Markov model with adaptation (POMMA). The success of the POMMA supports the branching chain network hypothesis of how syntax is controlled within the premotor song nucleus HVC, and suggests that adaptation and many-to-one mapping from neural substrates to syllables are important features of the neural control of complex song syntax
    • …
    corecore