
    New Alignment Methods for Discriminative Book Summarization

    We consider the unsupervised alignment of the full text of a book with a human-written summary. This presents challenges not seen in other text alignment problems, including a disparity in length and, consequent to this, a violation of the expectation that individual words and phrases should align, since large passages and chapters can be distilled into a single summary phrase. We present two new methods, based on hidden Markov models, specifically targeted to this problem, and demonstrate gains on an extractive book summarization task. While there is still much room for improvement, unsupervised alignment holds intrinsic value in offering insight into what features of a book are deemed worthy of summarization. Comment: This paper reflects work in progress

    An Empirical Comparison of Parsing Methods for Stanford Dependencies

    Stanford typed dependencies are a widely desired representation of natural language sentences, but parsing is one of the major computational bottlenecks in text analysis systems. In light of the evolving definition of the Stanford dependencies and developments in statistical dependency parsing algorithms, this paper revisits the question of Cer et al. (2010): what is the tradeoff between accuracy and speed in obtaining Stanford dependencies in particular? We also explore the effects of input representations on this tradeoff: part-of-speech tags, the novel use of an alternative dependency representation as input, and distributional representations of words. We find that direct dependency parsing is a more viable solution than it was found to be in the past. An accompanying software release can be found at: http://www.ark.cs.cmu.edu/TBSD Comment: 13 pages, 2 figures

    Coalescent histories for lodgepole species trees

    Coalescent histories are combinatorial structures that describe for a given gene tree and species tree the possible lists of branches of the species tree on which the gene tree coalescences take place. Properties of the number of coalescent histories for gene trees and species trees affect a variety of probabilistic calculations in mathematical phylogenetics. Exact and asymptotic evaluations of the number of coalescent histories, however, are known only in a limited number of cases. Here we introduce a particular family of species trees, the lodgepole species trees $(\lambda_n)_{n\geq 0}$, in which tree $\lambda_n$ has $m=2n+1$ taxa. We determine the number of coalescent histories for the lodgepole species trees, in the case that the gene tree matches the species tree, showing that this number grows with $m!!$ in the number of taxa $m$. This computation demonstrates the existence of tree families in which the growth in the number of coalescent histories is faster than exponential. Further, it provides a substantial improvement on the lower bound for the ratio of the largest number of matching coalescent histories to the smallest number of matching coalescent histories for trees with $m$ taxa, increasing a previous bound of $(\sqrt{\pi}/32)[(5m-12)/(4m-6)]\, m\sqrt{m}$ to $[\sqrt{m-1}/(4\sqrt{e})]^{m}$. We discuss the implications of our enumerative results for phylogenetic computations.
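To get a feel for the quantities in this abstract, the following minimal sketch evaluates the double factorial $m!!$ and the two lower-bound expressions for a few lodgepole sizes $m = 2n+1$. The function names are hypothetical and chosen for illustration. The abstract states only that the number of coalescent histories grows with $m!!$, so the double factorial here is a growth reference, not the exact count.

```python
import math


def double_factorial(m: int) -> int:
    """Return m!! = m * (m - 2) * (m - 4) * ..., stopping at 1 or 2."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def previous_bound(m: int) -> float:
    """Earlier lower bound quoted in the abstract:
    (sqrt(pi) / 32) * ((5m - 12) / (4m - 6)) * m * sqrt(m)."""
    return (math.sqrt(math.pi) / 32) * ((5 * m - 12) / (4 * m - 6)) * m * math.sqrt(m)


def improved_bound(m: int) -> float:
    """Improved lower bound quoted in the abstract:
    (sqrt(m - 1) / (4 * sqrt(e))) ** m."""
    return (math.sqrt(m - 1) / (4 * math.sqrt(math.e))) ** m


# Tabulate the three quantities for a few values of n (illustrative choices).
for n in (2, 5, 10, 20, 40):
    m = 2 * n + 1  # number of taxa in the lodgepole tree lambda_n
    print(
        f"n={n:3d}  m={m:3d}  m!!={double_factorial(m):.3e}  "
        f"previous={previous_bound(m):.3e}  improved={improved_bound(m):.3e}"
    )
```

The improved expression has a base below 1 for small $m$, so it is an improvement in the asymptotic sense: it overtakes the polynomial previous bound only once $m$ is large enough, after which it grows much faster.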

    Detailed analysis of the predictions of loop quantum cosmology for the primordial power spectra

    We provide an exhaustive numerical exploration of the predictions of loop quantum cosmology (LQC) with a post-bounce phase of inflation for the primordial power spectrum of scalar and tensor perturbations. We extend previous analysis by characterizing the phenomenologically relevant parameter space and by constraining it using observations. Furthermore, we characterize the shape of LQC-corrections to observable quantities across this parameter space. Our analysis provides a framework to contrast more accurately the theory with forthcoming polarization data, and it also paves the road for the computation of other observables beyond the power spectra, such as non-Gaussianity. Comment: 24 pages, 5 figures

    On the number of ranked species trees producing anomalous ranked gene trees

    Analysis of probability distributions conditional on species trees has demonstrated the existence of anomalous ranked gene trees (ARGTs), ranked gene trees that are more probable than the ranked gene tree that accords with the ranked species tree. Here, to improve the characterization of ARGTs, we study enumerative and probabilistic properties of two classes of ranked labeled species trees, focusing on the presence or avoidance of certain subtree patterns associated with the production of ARGTs. We provide exact enumerations and asymptotic estimates for cardinalities of these sets of trees, showing that as the number of species increases without bound, the fraction of all ranked labeled species trees that are ARGT-producing approaches 1. This result extends beyond earlier existence results to provide a probabilistic claim about the frequency of ARGTs.