Using Conservative Estimation for Conditional Probability instead of Ignoring Infrequent Case

Authors: Kikuchi Masato, Okabe Masayuki, Umemura Kyoji, Yamamoto Eiko, Yoshida Mitsuo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 25/09/2017

There are several estimators of conditional probability from observed frequencies of features. In this paper, we propose using the lower limit of the confidence interval on the posterior distribution determined by the observed frequencies to estimate conditional probability. In our experiments, this method outperformed other popular estimators.

Comment: The 2016 International Conference on Advanced Informatics: Concepts, Theory and Application (ICAICTA2016)
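The idea in this abstract can be illustrated with a minimal sketch. The abstract does not state the prior, the interval level, or how the interval is constructed, so everything below is an assumption: a uniform prior (giving a Beta(k + 1, n - k + 1) posterior for k co-occurrences in n observations), an equal-tailed 95% interval, and the hypothetical function names `posterior_cdf` and `lower_limit`. It is not the authors' implementation.

```python
import math

def posterior_cdf(x, k, n):
    """CDF at x of Beta(k + 1, n - k + 1), the posterior under an assumed uniform prior.

    For integer parameters the Beta CDF equals a binomial tail sum,
    so no external library is needed.
    """
    m = n + 1
    return sum(math.comb(m, j) * x**j * (1.0 - x)**(m - j)
               for j in range(k + 1, m + 1))

def lower_limit(k, n, level=0.95):
    """Lower end of the equal-tailed interval, found by bisection on the CDF."""
    target = (1.0 - level) / 2.0
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if posterior_cdf(mid, k, n) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# The raw ratio k/n rates 1 out of 1 (1.0) above 50 out of 100 (0.5);
# the conservative lower limit reverses that ordering.
rare = lower_limit(1, 1)
frequent = lower_limit(50, 100)
```

Under these assumptions an infrequent case is neither discarded nor trusted at face value: a small n yields a wide posterior, so its lower limit stays low.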

arXiv.org e-Print Archive

Crossref

Rhetorical relations for information retrieval

Authors: Larsen Birger, Lioma Christina, Lu Wei
Publication date: 05/04/2017

Typically, every part of a coherent text has some plausible reason for its presence, some function that it performs in the overall semantics of the text. Rhetorical relations, e.g. contrast, cause, explanation, describe how the parts of a text are linked to each other. Knowledge about this so-called discourse structure has been applied successfully to several natural language processing tasks. This work studies the use of rhetorical relations for Information Retrieval (IR): Is there a correlation between certain rhetorical relations and retrieval performance? Can knowledge about a document's rhetorical relations be useful to IR? We present a language model modification that considers rhetorical relations when estimating the relevance of a document to a query. Empirical evaluation of different versions of our model on TREC settings shows that certain rhetorical relations can benefit retrieval effectiveness notably (> 10% in mean average precision over a state-of-the-art baseline).

arXiv.org e-Print Archive

CiteSeerX

General Type Token Distribution

Author: Hidaka Shohei
Publication date: 01/01/2014

We consider the problem of estimating the number of types in a corpus using the number of types observed in a sample of tokens from that corpus. We derive exact and asymptotic distributions for the number of observed types, conditioned upon the number of tokens and the latent type distribution. We use the asymptotic distributions to derive an estimator of the latent number of types and we validate this estimator numerically.

Comment: This paper is accepted in Biometrika. 5 pages and no figure in the main paper. 3 pages and 1 figure in the supplementary material.
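To make the setting of this abstract concrete, here is a small sketch of why the observed type count understates the latent one. It assumes tokens are drawn independently from a fixed type distribution, in which case the expected number of distinct types seen in n tokens is the sum over types of 1 - (1 - p_i)^n. The function name `expected_observed_types` is hypothetical, and this is only the standard expectation under that assumption, not the estimator or the distributions derived in the paper.

```python
def expected_observed_types(probs, n_tokens):
    """Expected number of distinct types among n_tokens independent draws.

    probs: the latent type distribution (non-negative, summing to 1).
    A type with probability p is missed by all draws with
    probability (1 - p) ** n_tokens.
    """
    return sum(1.0 - (1.0 - p) ** n_tokens for p in probs)

# Four equally likely types: two tokens reveal 1.75 types on average,
# and the expectation approaches 4 only as the sample grows.
uniform = [0.25] * 4
seen_after_two = expected_observed_types(uniform, 2)
seen_after_many = expected_observed_types(uniform, 100)
```

Inverting this relationship, from an observed count back to the unknown number of types, is the estimation problem the abstract describes.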

arXiv.org e-Print Archive

CiteSeerX