5,109 research outputs found
Asymptotic Analysis of Generative Semi-Supervised Learning
Semisupervised learning has emerged as a popular framework for improving
modeling accuracy while controlling labeling cost. Based on an extension of
stochastic composite likelihood we quantify the asymptotic accuracy of
generative semi-supervised learning. In doing so, we complement
distribution-free analysis by providing an alternative framework to measure the
value associated with different labeling policies and resolve the fundamental
question of how much data to label and in what manner. We demonstrate our
approach with both simulation studies and real world experiments using naive
Bayes for text classification and MRFs and CRFs for structured prediction in
NLP.Comment: 12 pages, 9 figure
Species abundance information improves sequence taxonomy classification accuracy.
Popular naive Bayes taxonomic classifiers for amplicon sequences assume that all species in the reference database are equally likely to be observed. We demonstrate that classification accuracy degrades linearly with the degree to which that assumption is violated, and in practice it is always violated. By incorporating environment-specific taxonomic abundance information, we demonstrate a significant increase in the species-level classification accuracy across common sample types. At the species level, overall average error rates decline from 25% to 14%, which is favourably comparable to the error rates that existing classifiers achieve at the genus level (16%). Our findings indicate that for most practical purposes, the assumption that reference species are equally likely to be observed is untenable. q2-clawback provides a straightforward alternative for samples from common environments
Real estate portfolio construction and estimation risk
The use of MPT in the construction real estate portfolios has two serious limitations when used in an ex-ante framework: (1) the intertemporal instability of the portfolio weights and (2) the sharp deterioration in performance of the optimal portfolios outside the sample period used to estimate asset mean returns. Both problems can be traced to wide fluctuations in sample means Jorion (1985). Thus the use of a procedure that ignores the estimation risk due to the uncertain in mean returns is likely to produce sub-optimal results in subsequent periods. This suggests that the consideration of the issue of estimation risk is crucial in the use of MPT in developing a successful real estate portfolio strategy. Therefore, following Eun & Resnick (1988), this study extends previous ex-ante based studies by evaluating optimal portfolio allocations in subsequent test periods by using methods that have been proposed to reduce the effect of measurement error on optimal portfolio allocations
- …