    Neural CRF Parsing

    This paper describes a parsing model that combines the exact dynamic programming of CRF parsing with the rich nonlinear featurization of neural net approaches. Our model is structurally a CRF that factors over anchored rule productions, but instead of linear potential functions based on sparse features, we use nonlinear potentials computed via a feedforward neural network. Because potentials are still local to anchored rules, structured inference (CKY) is unchanged from the sparse case. Computing gradients during learning involves backpropagating an error signal formed from standard CRF sufficient statistics (expected rule counts). Using only dense features, our neural CRF already exceeds a strong baseline CRF model (Hall et al., 2014). In combination with sparse features, our system achieves 91.1 F1 on section 23 of the Penn Treebank, and more generally outperforms the best prior single parser results on a range of languages. Comment: Accepted for publication at ACL 201
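To illustrate why inference is unchanged when potentials stay local to anchored rules, here is a minimal sketch of the CKY inside recursion for a binary-branching grammar. It is not the authors' implementation: the function names, the grammar encoding, and the two scoring callables (`score_rule`, `score_leaf`) are hypothetical. Any callable can supply the potentials, whether a sparse linear model or a feedforward network, and the dynamic program stays the same.

```python
import math

NEG_INF = float("-inf")

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-scores."""
    if not values:
        return NEG_INF
    top = max(values)
    if top == NEG_INF:
        return NEG_INF
    return top + math.log(sum(math.exp(v - top) for v in values))

def inside_log_partition(n, labels, binary_rules, score_rule, score_leaf, root):
    """Log-partition function of a CRF over binary trees spanning n words.

    score_rule(A, B, C, i, k, j): log-potential of rule A -> B C anchored
        at span (i, j) with split point k (hypothetical signature).
    score_leaf(A, i): log-potential of label A over the single word i.
    """
    chart = {}
    # Width-1 spans: one entry per label.
    for i in range(n):
        chart[(i, i + 1)] = {A: score_leaf(A, i) for A in labels}
    # Wider spans: sum over rules and split points, in log space.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = {}
            for A in labels:
                terms = [
                    score_rule(A, B, C, i, k, j)
                    + chart[(i, k)][B]
                    + chart[(k, j)][C]
                    for (parent, B, C) in binary_rules
                    if parent == A
                    for k in range(i + 1, j)
                ]
                cell[A] = logsumexp(terms)
            chart[(i, j)] = cell
    return chart[(0, n)][root]

# Toy nonlinear potential standing in for a neural network output (assumption).
def toy_score(A, B, C, i, k, j):
    return math.tanh(0.1 * (k - i) - 0.1 * (j - k))

log_z = inside_log_partition(
    4, ["S"], [("S", "S", "S")], toy_score, lambda A, i: 0.0, "S"
)
```

With all potentials set to zero, the partition function simply counts the binary bracketings of the sentence, which is a convenient sanity check on the recursion.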

    Coexistence in stochastic spatial models

    In this paper I will review twenty years of work on the question: When is there coexistence in stochastic spatial models? The answer, announced in Durrett and Levin [Theor. Pop. Biol. 46 (1994) 363--394] and explained in this paper, is that coexistence can be determined by examining the mean-field ODE. There are a number of rigorous results in support of this picture, but we will state nine challenging and important open problems, most of which date from the 1990s. Comment: Published at http://dx.doi.org/10.1214/08-AAP590 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Waiting for regulatory sequences to appear

    One possible explanation for the substantial organismal differences between humans and chimpanzees is that there have been changes in gene regulation. Given what is known about transcription factor binding sites, this motivates the following probability question: given a 1000 nucleotide region in our genome, how long does it take for a specified six to nine letter word to appear in that region in some individual? Stone and Wray [Mol. Biol. Evol. 18 (2001) 1764--1770] computed 5,950 years as the answer for six letter words. Here, we will show that for words of length 6, the average waiting time is 100,000 years, while for words of length 8, the waiting time has mean 375,000 years when there is a 7 out of 8 letter match in the population consensus sequence (an event of probability roughly 5/16) and has mean 650 million years when there is not. Fortunately, in biological reality, the match to the target word does not have to be perfect for binding to occur. If we model this by saying that a 7 out of 8 letter match is good enough, the mean reduces to about 60,000 years. Comment: Published at http://dx.doi.org/10.1214/105051606000000619 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org
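As a rough plausibility check on the "roughly 5/16" figure quoted above, the sketch below estimates the chance that a 1000 nucleotide region contains a window matching an 8 letter word in exactly 7 positions. It is a back-of-envelope calculation, not the paper's derivation: it assumes independent, uniformly distributed nucleotides and a Poisson approximation for the number of matching windows, and the function name and parameters are hypothetical.

```python
import math

def prob_near_match(region_len=1000, word_len=8, mismatches=1):
    """Poisson estimate of P(at least one window has exactly `mismatches` mismatches).

    Assumes i.i.d. uniform nucleotides (A, C, G, T each with probability 1/4).
    """
    # Number of starting positions for a word_len-letter window.
    windows = region_len - word_len + 1
    # Choose which positions mismatch; each mismatch has 3 possible wrong letters.
    p_window = math.comb(word_len, mismatches) * 3 ** mismatches / 4 ** word_len
    expected_hits = windows * p_window
    # P(at least one hit) under the Poisson approximation.
    return 1.0 - math.exp(-expected_hits)

estimate = prob_near_match()
print(f"estimate = {estimate:.3f}, 5/16 = {5 / 16:.3f}")
```

Under these assumptions the estimate comes out near 0.30, which is in line with the abstract's 5/16 (about 0.31).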

    Asymptotic behavior of Aldous' gossip process

    Aldous [(2007) Preprint] defined a gossip process in which space is a discrete $N\times N$ torus, and the state of the process at time $t$ is the set of individuals who know the information. Information spreads from a site to its nearest neighbors at rate 1/4 each and at rate $N^{-\alpha}$ to a site chosen at random from the torus. We will be interested in the case in which $\alpha<3$, where the long range transmission significantly accelerates the time at which everyone knows the information. We prove three results that precisely describe the spread of information in a slightly simplified model on the real torus. The time until everyone knows the information is asymptotically $T=(2-2\alpha/3)N^{\alpha/3}\log N$. If $\rho_s$ is the fraction of the population who know the information at time $s$ and $\varepsilon$ is small then, for large $N$, the time until $\rho_s$ reaches $\varepsilon$ is $T(\varepsilon)\approx T+N^{\alpha/3}\log(3\varepsilon/M)$, where $M$ is a random variable determined by the early spread of the information. The value of $\rho_s$ at time $s=T(1/3)+tN^{\alpha/3}$ is almost a deterministic function $h(t)$ which satisfies an odd looking integro-differential equation. The last result confirms a heuristic calculation of Aldous. Comment: Published at http://dx.doi.org/10.1214/10-AAP750 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org
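To get a feel for the scale of the two asymptotic formulas in this abstract, the sketch below simply evaluates them for chosen parameters. The function names are hypothetical, and since M is a random variable in the paper, the value passed here is an arbitrary placeholder rather than a sample from its actual distribution.

```python
import math

def full_spread_time(n, alpha):
    """Asymptotic time until everyone knows: T = (2 - 2*alpha/3) * N^(alpha/3) * log N."""
    if not alpha < 3:
        raise ValueError("the abstract considers alpha < 3")
    return (2 - 2 * alpha / 3) * n ** (alpha / 3) * math.log(n)

def time_to_fraction(n, alpha, eps, m):
    """Approximate time for the informed fraction to reach eps:
    T(eps) ~ T + N^(alpha/3) * log(3*eps/M). Here m is a placeholder for M."""
    return full_spread_time(n, alpha) + n ** (alpha / 3) * math.log(3 * eps / m)

# Example with illustrative values (assumptions, not taken from the paper).
n, alpha = 1000, 1.5
t_full = full_spread_time(n, alpha)
t_small = time_to_fraction(n, alpha, eps=0.01, m=1.0)
print(t_full, t_small)
```

Note that the formula for T(ε) is stated for small ε and large N, so evaluating it at moderate values only indicates orders of magnitude.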