    Asymptotic Analysis of Generative Semi-Supervised Learning

    Semi-supervised learning has emerged as a popular framework for improving modeling accuracy while controlling labeling cost. Based on an extension of stochastic composite likelihood, we quantify the asymptotic accuracy of generative semi-supervised learning. In doing so, we complement distribution-free analysis by providing an alternative framework for measuring the value of different labeling policies, and we address the fundamental question of how much data to label and in what manner. We demonstrate our approach with both simulation studies and real-world experiments, using naive Bayes for text classification and MRFs and CRFs for structured prediction in NLP. Comment: 12 pages, 9 figures

    Species abundance information improves sequence taxonomy classification accuracy.

    Popular naive Bayes taxonomic classifiers for amplicon sequences assume that all species in the reference database are equally likely to be observed. We demonstrate that classification accuracy degrades linearly with the degree to which that assumption is violated, and in practice it is always violated. By incorporating environment-specific taxonomic abundance information, we demonstrate a significant increase in species-level classification accuracy across common sample types. At the species level, overall average error rates decline from 25% to 14%, which compares favourably with the error rates that existing classifiers achieve at the genus level (16%). Our findings indicate that, for most practical purposes, the assumption that reference species are equally likely to be observed is untenable. q2-clawback provides a straightforward alternative for samples from common environments.
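
    The effect of replacing a uniform class prior with abundance-based weights can be illustrated with a minimal sketch. This is not the q2-clawback implementation; the function `classify`, the species labels, and all numbers below are hypothetical and chosen only to show how a prior can change the predicted label when the likelihoods are close.

```python
import math

def classify(log_likelihoods, priors=None):
    """Return the label with the highest log-posterior score.

    log_likelihoods: dict mapping label -> log P(sequence | label)
    priors: dict mapping label -> P(label); uniform when None
    (hypothetical helper for illustration only)
    """
    labels = list(log_likelihoods)
    if priors is None:
        # Uniform prior: every reference species is assumed equally likely.
        priors = {label: 1.0 / len(labels) for label in labels}
    scores = {
        label: log_likelihoods[label] + math.log(priors[label])
        for label in labels
    }
    return max(scores, key=scores.get)

# Made-up likelihoods for one query sequence and two candidate species.
log_lik = {"species_a": math.log(0.30), "species_b": math.log(0.25)}

# Made-up environment-specific abundances used as class weights.
abundance = {"species_a": 0.05, "species_b": 0.95}

uniform_call = classify(log_lik)              # uniform prior
weighted_call = classify(log_lik, abundance)  # abundance-weighted prior
```

    With these made-up numbers the uniform prior favours `species_a`, while the abundance-weighted prior favours `species_b`, because the small likelihood gap is outweighed by the large difference in expected abundance.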

    Real estate portfolio construction and estimation risk

    The use of MPT in the construction of real estate portfolios has two serious limitations when used in an ex-ante framework: (1) the intertemporal instability of the portfolio weights and (2) the sharp deterioration in performance of the optimal portfolios outside the sample period used to estimate asset mean returns. Both problems can be traced to wide fluctuations in sample means (Jorion, 1985). Thus a procedure that ignores the estimation risk arising from uncertainty in mean returns is likely to produce sub-optimal results in subsequent periods. This suggests that accounting for estimation risk is crucial when using MPT to develop a successful real estate portfolio strategy. Therefore, following Eun & Resnick (1988), this study extends previous ex-ante studies by evaluating optimal portfolio allocations in subsequent test periods, using methods that have been proposed to reduce the effect of measurement error on optimal portfolio allocations.
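
    One common way to dampen fluctuations in sample means is to shrink each asset's sample mean toward a common value before optimisation. The abstract does not say which estimators the study evaluates, so the sketch below is only an assumption-laden illustration: the function `shrink_means`, the fixed `intensity` parameter, and the return figures are hypothetical, and a fixed intensity stands in for the data-driven weights that published shrinkage estimators use.

```python
def shrink_means(sample_means, intensity):
    """Shrink each sample mean toward the cross-asset grand mean.

    sample_means: list of per-asset sample mean returns
    intensity: weight in [0, 1] placed on the grand mean
               (hypothetical fixed value, not estimated from data)
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    grand_mean = sum(sample_means) / len(sample_means)
    return [
        (1.0 - intensity) * mean + intensity * grand_mean
        for mean in sample_means
    ]

# Made-up sample mean returns for three property sectors.
raw_means = [0.02, 0.08, 0.05]

# Halfway shrinkage pulls the extreme estimates toward the average.
shrunk_means = shrink_means(raw_means, 0.5)
```

    Here an intensity of 0 returns the raw sample means unchanged and an intensity of 1 replaces every estimate with the grand mean; intermediate values narrow the spread of the inputs fed to the optimiser.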