Using Conservative Estimation for Conditional Probability instead of Ignoring Infrequent Case
There are several estimators of conditional probability from observed
frequencies of features. In this paper, we propose using the lower limit of
the confidence interval on the posterior distribution determined by the observed
frequencies to ascertain conditional probability. In our experiments, this
method outperformed other popular estimators.
Comment: The 2016 International Conference on Advanced Informatics: Concepts,
Theory and Application (ICAICTA2016)
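As a rough illustration of the idea in this abstract, the sketch below takes the lower limit of a one-sided interval on a Beta posterior as the estimate of a conditional probability. The uniform prior, the 95% level, and the names `posterior_cdf` and `conservative_estimate` are assumptions made for this example; the abstract does not specify them, so this is not necessarily the authors' exact procedure.

```python
from math import comb


def posterior_cdf(p, k, n):
    """CDF at p of Beta(k + 1, n - k + 1), the posterior under a uniform prior.

    For integer parameters this equals P(Binomial(n + 1, p) >= k + 1),
    which avoids any dependency beyond the standard library.
    Intended for small counts; very large n may overflow float conversion.
    """
    m = n + 1
    return sum(comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(k + 1, m + 1))


def conservative_estimate(k, n, alpha=0.05, tol=1e-10):
    """Lower limit of the one-sided (1 - alpha) posterior interval.

    k is the number of co-occurrences, n the number of observations of the
    conditioning feature (both hypothetical argument names).
    """
    lo, hi = 0.0, 1.0
    # The CDF is increasing in p, so bisect for the point where it equals alpha.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if posterior_cdf(mid, k, n) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Under these assumptions, a feature seen 1 time out of 1 gets an estimate of roughly 0.22, while one seen 100 times out of 100 gets roughly 0.97, although the raw frequency is 1.0 in both cases. Infrequent cases are thus discounted rather than ignored.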
Rhetorical relations for information retrieval
Typically, every part of most coherent texts has some plausible reason for its
presence, some function that it performs in the overall semantics of the text.
Rhetorical relations, e.g. contrast, cause, explanation, describe how the parts
of a text are linked to each other. Knowledge about this so-called discourse
structure has been applied successfully to several natural language processing
tasks. This work studies the use of rhetorical relations for Information
Retrieval (IR): Is there a correlation between certain rhetorical relations and
retrieval performance? Can knowledge about a document's rhetorical relations be
useful to IR? We present a language model modification that considers
rhetorical relations when estimating the relevance of a document to a query.
Empirical evaluation of different versions of our model on TREC settings shows
that certain rhetorical relations can benefit retrieval effectiveness notably
(> 10% in mean average precision over a state-of-the-art baseline).
General Type Token Distribution
We consider the problem of estimating the number of types in a corpus using
the number of types observed in a sample of tokens from that corpus. We derive
exact and asymptotic distributions for the number of observed types,
conditioned upon the number of tokens and the latent type distribution. We use
the asymptotic distributions to derive an estimator of the latent number of
types and we validate this estimator numerically.
Comment: This paper is accepted in Biometrika. 5 pages and no figures in the
main paper. 3 pages and 1 figure in the supplementary material
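To make the quantity in this abstract concrete, the sketch below computes the expected number of distinct types observed in a sample of tokens, given a latent type distribution. It assumes tokens are drawn independently from that distribution, which the abstract does not state; the function name `expected_observed_types` is hypothetical, and this is only the mean, not the exact or asymptotic distribution the paper derives.

```python
def expected_observed_types(type_probs, n_tokens):
    """Expected number of distinct types seen in n_tokens independent draws.

    A type with probability p is missed by all n draws with probability
    (1 - p) ** n, so it is observed with probability 1 - (1 - p) ** n.
    Summing over types gives the expectation by linearity.
    """
    return sum(1.0 - (1.0 - p) ** n_tokens for p in type_probs)
```

For two equally likely types, one token reveals exactly one type and two tokens reveal 1.5 types on average, which illustrates why the observed count understates the latent number of types in small samples.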