161 research outputs found

    Comment: Boosting Algorithms: Regularization, Prediction and Model Fitting

    The authors are doing the readers of Statistical Science a true service with a well-written and up-to-date overview of boosting that originated with the seminal algorithms of Freund and Schapire. Equally, we are grateful for high-level software that will permit a larger readership to experiment with, or simply apply, boosting-inspired model fitting. The authors show us a world of methodology that illustrates how a fundamental innovation can penetrate every nook and cranny of statistical thinking and practice. They introduce the reader to one particular interpretation of boosting and then give a display of its potential with extensions from classification (where it all started) to least squares, exponential family models, survival analysis, to base-learners other than trees such as smoothing splines, to degrees of freedom and regularization, and to fascinating recent work in model selection. The uninitiated reader will find that the authors did a nice job of presenting a certain coherent and useful interpretation of boosting. The other reader, though, who has watched the business of boosting for a while, may have quibbles with the authors over details of the historic record and, more importantly, over their optimism about the current state of theoretical knowledge. In fact, as much as "the statistical view" has proven fruitful, it has also resulted in some ideas about why boosting works that may be misconceived, and in some recommendations that may be misguided. [arXiv:0804.2752] Comment: Published at http://dx.doi.org/10.1214/07-STS242B in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Evidence Contrary to the Statistical View of Boosting

    The statistical perspective on boosting algorithms focuses on optimization, drawing parallels with maximum likelihood estimation for logistic regression. In this paper we present empirical evidence that raises questions about this view. Although the statistical perspective provides a theoretical framework within which it is possible to derive theorems and create new algorithms in general contexts, we show that many important questions remain unanswered. Furthermore, we provide examples that reveal crucial flaws in many of the practical suggestions and new methods derived from the statistical view. We perform carefully designed experiments using simple simulation models to illustrate some of these flaws and their practical consequences.

    Boosted Classification Trees and Class Probability/Quantile Estimation

    The standard by which binary classifiers are usually judged, misclassification error, assumes equal costs of misclassifying the two classes or, equivalently, classifying at the 1/2 quantile of the conditional class probability function P[y = 1|x]. Boosted classification trees are known to perform quite well for such problems. In this article we consider the use of standard, off-the-shelf boosting for two more general problems: 1) classification with unequal costs or, equivalently, classification at quantiles other than 1/2, and 2) estimation of the conditional class probability function P[y = 1|x]. We first examine whether the latter problem, estimation of P[y = 1|x], can be solved with LogitBoost, and with AdaBoost when combined with a natural link function. The answer is negative: both approaches are often ineffective because they overfit P[y = 1|x] even though they perform well as classifiers. A major negative point of the present article is the disconnect between class probability estimation and classification. Next we consider the practice of over/under-sampling of the two classes. We present an algorithm that uses AdaBoost in conjunction with Over/Under-Sampling and Jittering of the data (“JOUS-Boost”). This algorithm is simple, yet successful, and it preserves the advantage of relative protection against overfitting, but for arbitrary misclassification costs and, equivalently, arbitrary quantile boundaries. We then use collections of classifiers obtained from a grid of quantiles to form estimators of class probabilities. The estimates of the class probabilities compare favorably to those obtained by a variety of methods across both simulated and real data sets.
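
The equivalence the abstract draws between unequal misclassification costs and classification at a quantile other than 1/2 can be illustrated with a minimal sketch. This is not the JOUS-Boost algorithm; it only shows the standard expected-cost argument. The function names and the cost parameters `cost_fp` and `cost_fn` are hypothetical and do not come from the paper.

```python
def cost_threshold(cost_fp: float, cost_fn: float) -> float:
    """Quantile q at which to threshold p = P[y = 1 | x].

    Predicting 1 costs (1 - p) * cost_fp in expectation and predicting 0
    costs p * cost_fn, so predicting 1 is cheaper exactly when
    p > cost_fp / (cost_fp + cost_fn).
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def classify(p: float, cost_fp: float, cost_fn: float) -> int:
    """Return the label (0 or 1) with the lower expected cost."""
    return int(p > cost_threshold(cost_fp, cost_fn))


# Equal costs recover the usual 1/2 quantile.
print(cost_threshold(1.0, 1.0))
# A false negative three times as costly as a false positive lowers the cut-off.
print(cost_threshold(1.0, 3.0))
```

With equal costs the threshold is 1/2, which matches the misclassification-error setting described at the start of the abstract.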

    Enhancing the Communication Competency of Business Undergraduates: A Consumer Socialization Perspective

    Socialization Theory is often used to explain how individuals acquire the skills and knowledge needed to participate effectively in society. We investigate numerous socialization agents and their relationship with the communication competency of university business majors. Communication competency (reading, writing, and verbal) was measured via both a standardized skill test and self-report. Exploratory analysis was conducted on high and low communication competency groups that were identified via cluster analysis. Our findings generally indicate that the most important socialization agents operate through personal interaction, whereas the least important exert their influence primarily through electronic or media-based channels.

    The Power Spectrum of Mass Fluctuations Measured from the Lyman-alpha Forest at Redshift z=2.5

    We measure the linear power spectrum of mass density fluctuations at redshift z=2.5 from the Ly\alpha forest absorption in a sample of 19 QSO spectra, using the method introduced by Croft et al. (1998). The P(k) measurement covers the range 2\pi/k ~ 450-2350 km/s (2-12 comoving h^{-1} Mpc for \Omega=1). We examine a number of possible sources of systematic error and find none that are significant on these scales. In particular, we show that spatial variations in the UV background caused by the discreteness of the source population should have negligible effect on our P(k) measurement. We obtain consistent results from the high and low redshift halves of the data set and from an entirely independent sample of nine QSO spectra with mean redshift z=2.1. A power law fit to our measured P(k) yields a logarithmic slope n=-2.25 +/- 0.18 and an amplitude \Delta^2(k_p) = 0.57^{+0.26}_{-0.18}, where \Delta^2 is the contribution to the density variance from a unit interval of ln k and k_p=0.008 (km/s)^{-1}. Direct comparison of our mass P(k) to the measured clustering of Lyman Break Galaxies shows that they are a highly biased population, with a bias factor b~2-5. The slope of the linear P(k), never previously measured on these scales, is close to that predicted by models based on inflation and Cold Dark Matter (CDM). The P(k) amplitude is consistent with some scale-invariant, COBE-normalized CDM models (e.g., an open model with \Omega_0=0.4) and inconsistent with others (e.g., \Omega=1). Even with limited dynamic range and substantial statistical uncertainty, a measurement of P(k) that has no unknown "bias factors" offers many opportunities for testing theories of structure formation and constraining cosmological parameters. (Shortened) Comment: Submitted to ApJ, 27 emulateapj pages w/ 19 postscript fig

    Fatigue in fibromyalgia: a conceptual model informed by patient interviews

    Background: Fatigue is increasingly recognized as an important symptom in fibromyalgia (FM). However, it is unknown how individuals experience fatigue in the context of FM. We conducted qualitative research in order to better understand aspects of fatigue that might be unique to FM as well as the impact it has on patients' lives. The data obtained informed the development of a conceptual model of fatigue in FM. Methods: Open-ended interviews were conducted with 40 individuals with FM (US [n = 20], Germany [n = 10] and France [n = 10]). Transcripts were analyzed using qualitative methods based upon grounded theory to identify key themes and concepts. Results: Participants were mostly female (70%) with a mean age of 48.7 years (range: 25-79). Thirty-one individuals (i.e., 77.5%) spontaneously described experiencing tiredness/lack of energy/fatigue due to FM. Participants discussed FM fatigue as being more severe, constant/persistent and unpredictable than normal tiredness. The conceptual model depicts the key elements of fatigue in FM from a patient perspective. This includes: an overwhelming feeling of tiredness (n = 17, 42.5%), not relieved by resting/sleeping (n = 15, 37.5%), not proportional to effort exerted (n = 25, 62.5%), associated with a feeling of weakness/heaviness (n = 20, 50%), interferes with motivation (n = 22, 55%), interferes with desired activities (n = 27, 67.5%), prolongs tasks (n = 15, 37.5%), and makes it difficult to concentrate (n = 21, 52.5%), think clearly (n = 12, 30%) or remember things (n = 9, 22.5%). Conclusion: The majority of individuals with FM who participated in this study experience fatigue and describe it as more severe than normal tiredness. http://deepblue.lib.umich.edu/bitstream/2027.42/112483/1/12891_2010_Article_962.pd

    Participation in an innovative patient support program reduces prescription abandonment for adalimumab-treated patients in a commercial population.

    Purpose: Nonadherence to indicated therapy reduces treatment effectiveness and may increase cost of care. HUMIRA Complete, a Patient Support Program (PSP), aims to reduce nonadherence in patients prescribed adalimumab (ADA). The objective of this study was to assess the relationship between participation in the PSP and prescription abandonment rates among ADA-treated patients. Patients and methods: This longitudinal study using patient-level data from AbbVie's PSP linked with medical and pharmacy claims data included patients ≥18 years with an ADA-approved indication, ≥1 pharmacy claim for ADA, and available data ≥3 months before and ≥6 months after the index date (defined as the initial ADA claim [01/2015 to 02/2017]). Abandonment was defined as reversal of initial ADA prescription with no paid claim during 3-month follow-up. Abandonment rates were compared between PSP and non-PSP cohorts using multivariable logistic regression controlling for potentially confounding baseline characteristics. Results: In 17,371 patients (9,851 PSP; 7,520 non-PSP), the overall abandonment rate was 10.8-16.8% across indications. The odds of ADA abandonment were 70% less for PSP vs non-PSP patients (5.6% vs 20.4%; odds ratio [OR]=0.30, 95% confidence interval [CI]=0.27-0.33). Conclusion: Participation in the PSP, higher income, and using a specialty pharmacy were associated with lower odds of abandoning ADA therapy, whereas increased copayments were associated with greater abandonment. PSPs should be considered to improve initiation of ADA therapy.
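
The relation between the quoted abandonment rates and an odds ratio can be sketched as follows. This is an illustrative calculation only: it assumes the 5.6% and 20.4% figures are raw cohort rates and computes a crude (unadjusted) odds ratio from them, whereas the abstract's OR of 0.30 comes from a multivariable logistic regression, so the two values need not agree. The function names are hypothetical.

```python
def odds(rate: float) -> float:
    """Convert a probability in (0, 1) to odds."""
    if not 0.0 < rate < 1.0:
        raise ValueError("rate must be strictly between 0 and 1")
    return rate / (1.0 - rate)


def odds_ratio(rate_a: float, rate_b: float) -> float:
    """Crude odds ratio of group A relative to group B."""
    return odds(rate_a) / odds(rate_b)


# Rates quoted in the abstract (assumed here to be unadjusted).
psp_rate, non_psp_rate = 0.056, 0.204
crude_or = odds_ratio(psp_rate, non_psp_rate)
print(f"crude OR: {crude_or:.2f}")

# "70% less" odds corresponds to one minus the reported adjusted OR of 0.30.
reported_or = 0.30
print(f"reduction in odds: {1.0 - reported_or:.0%}")
```

The crude value differs from the reported 0.30 because the reported figure controls for baseline characteristics.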