New Alignment Methods for Discriminative Book Summarization

Author: Bamman David
Smith Noah A.
Publication date: 06/05/2013

We consider the unsupervised alignment of the full text of a book with a human-written summary. This presents challenges not seen in other text alignment problems, including a disparity in length and, consequent to this, a violation of the expectation that individual words and phrases should align, since large passages and chapters can be distilled into a single summary phrase. We present two new methods, based on hidden Markov models, specifically targeted to this problem, and demonstrate gains on an extractive book summarization task. While there is still much room for improvement, unsupervised alignment holds intrinsic value in offering insight into what features of a book are deemed worthy of summarization. Comment: This paper reflects work in progress.

arXiv.org e-Print Archive

CiteSeerX

An Empirical Comparison of Parsing Methods for Stanford Dependencies

Author: Kong Lingpeng
Smith Noah A.
Publication date: 16/04/2014

Stanford typed dependencies are a widely desired representation of natural language sentences, but parsing is one of the major computational bottlenecks in text analysis systems. In light of the evolving definition of the Stanford dependencies and developments in statistical dependency parsing algorithms, this paper revisits the question of Cer et al. (2010): what is the tradeoff between accuracy and speed in obtaining Stanford dependencies in particular? We also explore the effects of input representations on this tradeoff: part-of-speech tags, the novel use of an alternative dependency representation as input, and distributional representations of words. We find that direct dependency parsing is a more viable solution than it was found to be in the past. An accompanying software release can be found at: http://www.ark.cs.cmu.edu/TBSD Comment: 13 pages, 2 figures.

arXiv.org e-Print Archive

CiteSeerX

Coalescent histories for lodgepole species trees

Author: Disanto Filippo
Rosenberg Noah A.
Publication date: 01/01/2015

Coalescent histories are combinatorial structures that describe for a given gene tree and species tree the possible lists of branches of the species tree on which the gene tree coalescences take place. Properties of the number of coalescent histories for gene trees and species trees affect a variety of probabilistic calculations in mathematical phylogenetics. Exact and asymptotic evaluations of the number of coalescent histories, however, are known only in a limited number of cases. Here we introduce a particular family of species trees, the \emph{lodgepole} species trees $(\lambda_n)_{n\geq 0}$, in which tree $\lambda_n$ has $m=2n+1$ taxa. We determine the number of coalescent histories for the lodgepole species trees, in the case that the gene tree matches the species tree, showing that this number grows with $m!!$ in the number of taxa $m$. This computation demonstrates the existence of tree families in which the growth in the number of coalescent histories is faster than exponential. Further, it provides a substantial improvement on the lower bound for the ratio of the largest number of matching coalescent histories to the smallest number of matching coalescent histories for trees with $m$ taxa, increasing a previous bound of $(\sqrt{\pi} / 32)[(5m-12)/(4m-6)] m \sqrt{m}$ to $[ \sqrt{m-1}/(4 \sqrt{e}) ]^{m}$. We discuss the implications of our enumerative results for phylogenetic computations.
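To give a feel for the growth rates quoted in this abstract, the following is a minimal Python sketch that evaluates the double factorial m!! and the two lower-bound expressions, (sqrt(pi)/32)[(5m-12)/(4m-6)] m sqrt(m) and [sqrt(m-1)/(4 sqrt(e))]^m, at a few values of m = 2n+1. The function names are illustrative assumptions, and the sketch does not count coalescent histories; it only plugs numbers into the formulas as stated.

```python
import math


def double_factorial(m: int) -> int:
    """Return m!! = m * (m - 2) * (m - 4) * ... down to 1 or 2."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def previous_bound(m: int) -> float:
    """Earlier lower bound quoted in the abstract (polynomial in m)."""
    return (math.sqrt(math.pi) / 32) * ((5 * m - 12) / (4 * m - 6)) * m * math.sqrt(m)


def improved_bound(m: int) -> float:
    """Newer lower bound quoted in the abstract (superexponential in m)."""
    return (math.sqrt(m - 1) / (4 * math.sqrt(math.e))) ** m


# Lodgepole tree lambda_n has m = 2n + 1 taxa.
for n in (5, 25, 50):
    m = 2 * n + 1
    print(
        f"m={m:4d}  m!!={float(double_factorial(m)):.3e}  2^m={float(2 ** m):.3e}  "
        f"previous={previous_bound(m):.3e}  improved={improved_bound(m):.3e}"
    )
```

Evaluated this way, the second expression is the smaller of the two at small m (its base is below 1 until m - 1 exceeds 16e, roughly 43.5) and overtakes the first only for larger m, which suggests the improvement is best read as a statement about large trees.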

arXiv.org e-Print Archive

Crossref

Archivio della Ricerca - Università di Pisa

Detailed analysis of the predictions of loop quantum cosmology for the primordial power spectra

Author: Agullo Ivan
Morris Noah A.
Publication venue: 'American Physical Society (APS)'
Publication date: 18/09/2015

We provide an exhaustive numerical exploration of the predictions of loop quantum cosmology (LQC) with a post-bounce phase of inflation for the primordial power spectrum of scalar and tensor perturbations. We extend previous analysis by characterizing the phenomenologically relevant parameter space and by constraining it using observations. Furthermore, we characterize the shape of LQC-corrections to observable quantities across this parameter space. Our analysis provides a framework to contrast more accurately the theory with forthcoming polarization data, and it also paves the road for the computation of other observables beyond the power spectra, such as non-Gaussianity. Comment: 24 pages, 5 figures.

arXiv.org e-Print Archive

Crossref

Louisiana State University

On the number of ranked species trees producing anomalous ranked gene trees

Author: Disanto Filippo
Rosenberg Noah A.
Publication date: 01/01/2014

Analysis of probability distributions conditional on species trees has demonstrated the existence of anomalous ranked gene trees (ARGTs), ranked gene trees that are more probable than the ranked gene tree that accords with the ranked species tree. Here, to improve the characterization of ARGTs, we study enumerative and probabilistic properties of two classes of ranked labeled species trees, focusing on the presence or avoidance of certain subtree patterns associated with the production of ARGTs. We provide exact enumerations and asymptotic estimates for cardinalities of these sets of trees, showing that as the number of species increases without bound, the fraction of all ranked labeled species trees that are ARGT-producing approaches 1. This result extends beyond earlier existence results to provide a probabilistic claim about the frequency of ARGTs.

arXiv.org e-Print Archive

Crossref

Archivio della Ricerca - Università di Pisa