
    The number of subjects per variable required in linear regression analyses

    Objectives: To determine the number of independent variables that can be included in a linear regression model. Study Design and Setting: We used a series of Monte Carlo simulations to examine the impact of the number of subjects per variable (SPV) on the accuracy of estimated regression coefficients and standard errors, on the empirical coverage of estimated confidence intervals, and on the accuracy of the estimated R2 of the fitted model. Results: A minimum of approximately two SPV tended to result in estimation of regression coefficients with relative bias of less than 10%. Furthermore, with this minimum number of SPV, the standard errors of the regression coefficients were accurately estimated and estimated confidence intervals had approximately the advertised coverage rates. A much higher number of SPV was necessary to minimize bias in estimating the model R2, although adjusted R2 estimates behaved well. The bias in estimating the model R2 statistic was inversely proportional to the proportion of variation explained by the population regression model. Conclusion: Linear regression models require only two SPV for adequate estimation of regression coefficients, standard errors, and confidence intervals.
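The core of such an SPV experiment is easy to reproduce. The sketch below is not the authors' code: it assumes standard-normal predictors, unit true coefficients, unit error variance, and 500 replicates, and measures the relative bias of OLS coefficients at two subjects per variable.

```python
import numpy as np

rng = np.random.default_rng(0)

def spv_relative_bias(n_vars=10, spv=2, n_sims=500, beta=1.0, noise_sd=1.0):
    """Monte Carlo estimate of the relative bias of OLS coefficients
    when the sample size is spv * n_vars subjects.

    All settings (predictors, coefficients, error variance) are
    illustrative assumptions, not the paper's exact design."""
    n = spv * n_vars
    estimates = np.empty((n_sims, n_vars))
    for i in range(n_sims):
        X = rng.standard_normal((n, n_vars))
        y = X @ np.full(n_vars, beta) + rng.normal(0.0, noise_sd, size=n)
        # Ordinary least squares fit for this replicate.
        estimates[i], *_ = np.linalg.lstsq(X, y, rcond=None)
    return (estimates.mean(axis=0) - beta) / beta

rel_bias = spv_relative_bias()
```

Since the OLS estimator is unbiased, even at two subjects per variable the simulated relative bias stays well below the 10% threshold the abstract reports; the cost of a small sample shows up in the variance of the estimates, not their bias.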

    Graphical assessment of internal and external calibration of logistic regression models by using loess smoothers

    Predicting the probability of the occurrence of a binary outcome or condition is important in biomedical research. While assessing discrimination is an essential issue in developing and validating binary prediction models, less attention has been paid to methods for assessing model calibration. Calibration refers to the degree of agreement between observed and predicted probabilities and is often assessed by testing for lack-of-fit. The objective of our study was to examine the ability of graphical methods to assess the calibration of logistic regression models. We examined lack of internal calibration, which was related to misspecification of the logistic regression model, and external calibration, which was related to an overfit model or to shrinkage of the linear predictor. We conducted an extensive set of Monte Carlo simulations with a locally weighted least squares regression smoother (i.e., the loess algorithm) to examine the ability of graphical methods to assess model calibration. We found that loess-based methods were able to provide evidence of moderate departures from linearity and indicate omission of a moderately strong interaction. Misspecification of the link function was harder to detect. Visual patterns were clearer with higher sample sizes, higher incidence of the outcome, or higher discrimination. Loess-based methods were also able to identify the lack of calibration in external validation samples when an overfit regression model had been used. In conclusion, loess-based smoothing methods are adequate tools to graphically assess calibration and merit wider application.
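A smoothed calibration curve of this kind can be sketched with any scatterplot smoother. The code below is an illustration, not the paper's method: it substitutes a Gaussian-kernel smoother for loess, and the data-generating model (a shrunken linear predictor mimicking an overfit model applied to external data) is assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Outcomes follow a logistic model in x; the "model" predicts with a
# shrunken linear predictor (0.5x instead of x), mimicking an overfit
# model applied to an external validation sample.
n = 5000
x = rng.standard_normal(n)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-x)))
p_hat = 1.0 / (1.0 + np.exp(-0.5 * x))

def smoothed_calibration(p_hat, y, bandwidth=0.1, grid_size=50):
    """Kernel-smoothed observed event rate as a function of predicted
    probability -- a crude stand-in for the loess smoother."""
    grid = np.linspace(p_hat.min(), p_hat.max(), grid_size)
    w = np.exp(-0.5 * ((grid[:, None] - p_hat[None, :]) / bandwidth) ** 2)
    return grid, (w @ y) / w.sum(axis=1)

grid, observed = smoothed_calibration(p_hat, y)
# A calibrated model puts the curve on the diagonal (observed == grid);
# shrinkage of the linear predictor pulls it away at both ends.
max_departure = float(np.abs(observed - grid).max())
```

Plotting `observed` against `grid` gives the calibration curve; the departure from the diagonal is what the loess-based graphical checks in the abstract are designed to reveal.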

    Predictive accuracy of novel risk factors and markers: A simulation study of the sensitivity of different performance measures for the Cox proportional hazards regression model

    Predicting outcomes that occur over time is important in clinical, population health, and health services research. We compared changes in different measures of performance when a novel risk factor or marker was added to an existing Cox proportional hazards regression model. We performed Monte Carlo simulations for common measures of performance: concordance indices (c, including various extensions to survival outcomes), Royston's D index, R2-type measures, and Chambless' adaptation of the integrated discrimination improvement to survival outcomes. We found that the increase in performance due to the inclusion of a risk factor tended to decrease as the performance of the reference model increased. Moreover, the increase in performance increased as the hazard ratio or the prevalence of a binary risk factor increased. Finally, for the concordance indices and R2-type measures, the absolute increase in predictive accuracy due to the inclusion of a risk factor was greater when the observed event rate was higher (low censoring). Amongst the different concordance indices, Chambless and Diao's c-statistic exhibited the greatest increase in predictive accuracy when a novel risk factor was added to an existing model. Amongst the different R2-type measures, O'Quigley et al.'s modification of Nagelkerke's R2 index and Kent and O'Quigley's ρ²_W,A displayed the greatest sensitivity to the addition of a novel risk factor or marker. These methods were then applied to a cohort of 8635 patients hospitalized with heart failure to examine the added benefit of a point-based scoring system for predicting mortality after initial adjustment with patient age alone.
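The behaviour of a concordance index when a marker is added can be demonstrated directly. This sketch is illustrative, not the paper's implementation: it uses Harrell's c on uncensored exponential survival times, with effect sizes chosen for the example.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)

# Exponential survival times whose log-hazard depends on age and a
# novel marker (coefficients are illustrative assumptions).
n = 400
age = rng.standard_normal(n)
marker = rng.standard_normal(n)
lin_pred = 0.5 * age + 1.0 * marker
time = rng.exponential(np.exp(-lin_pred))  # higher risk -> shorter time

def harrell_c(risk, time):
    """Harrell's c for uncensored data: the fraction of usable pairs in
    which the subject with the higher risk score fails earlier."""
    conc = usable = 0.0
    for i, j in combinations(range(len(time)), 2):
        if time[i] == time[j]:
            continue
        usable += 1
        if (risk[i] - risk[j]) * (time[j] - time[i]) > 0:
            conc += 1
        elif risk[i] == risk[j]:
            conc += 0.5
    return conc / usable

c_reference = harrell_c(0.5 * age, time)  # reference model: age alone
c_augmented = harrell_c(lin_pred, time)   # age plus the novel marker
```

With a sizeable hazard ratio per standard deviation of the marker, `c_augmented` clearly exceeds `c_reference`; shrinking the marker's coefficient shrinks the increment, which is the sensitivity pattern the abstract studies across measures.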

    Predictive performance of machine and statistical learning methods: impact of data-generating processes on external validity in the "large N, small p" setting

    Machine learning approaches are increasingly suggested as tools to improve prediction of clinical outcomes. We aimed to identify when machine learning methods perform better than classical statistical learning methods. To this end, we examined the impact of the data-generating process on the relative predictive accuracy of six machine and statistical learning methods: bagged classification trees, stochastic gradient boosting machines using trees as the base learners, random forests, the lasso, ridge regression, and unpenalized logistic regression. We performed simulations in two large cardiovascular datasets, each comprising an independent derivation and validation sample collected from temporally distinct periods: patients hospitalized with acute myocardial infarction (AMI, n = 9484 vs. n = 7000) and patients hospitalized with congestive heart failure (CHF, n = 8240 vs. n = 7608). We used six data-generating processes based on each of the six learning methods to simulate outcomes in the derivation and validation samples based on 33 and 28 predictors in the AMI and CHF data sets, respectively. We applied the six prediction methods in each of the simulated derivation samples and evaluated performance in the simulated validation samples according to the c-statistic, generalized R2, Brier score, and calibration. While no method had uniformly superior performance across all six data-generating processes and eight performance metrics, (un)penalized logistic regression and boosted trees tended to have superior performance to the other methods across a range of data-generating processes and performance metrics. This study confirms that classical statistical learning methods perform well in low-dimensional settings with large data sets.
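The flavour of such a comparison can be sketched in a toy simulation. The code below is an analogue, not the paper's setup: it compares only two of the six learners (unpenalized and ridge logistic regression, both fitted by plain gradient descent), with a made-up sparse data-generating process and sample sizes loosely echoing a derivation/validation split.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate(n, beta):
    """Binary outcomes from a logistic data-generating process."""
    X = rng.standard_normal((n, len(beta)))
    return X, rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta))))

def fit_logistic(X, y, l2=0.0, iters=500, lr=0.5):
    """Gradient-descent logistic regression with an optional ridge penalty."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
    return w

def brier(X, y, w):
    """Mean squared error of predicted probabilities (lower is better)."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return float(np.mean((p - y) ** 2))

beta = np.array([1.0, -0.5, 0.25, 0.0, 0.0])  # sparse true coefficients
X_der, y_der = simulate(8000, beta)           # derivation sample
X_val, y_val = simulate(7000, beta)           # validation sample

b_unpen = brier(X_val, y_val, fit_logistic(X_der, y_der))
b_ridge = brier(X_val, y_val, fit_logistic(X_der, y_der, l2=0.01))
```

In this large-N, small-p regime the two learners land on essentially the same validation Brier score, consistent with the abstract's conclusion that classical methods hold their own when n is large relative to p.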

    A two year old infected with Dicrocoelium dendriticum: a case report

    Dicrocoelium dendriticum infection is rare in humans and generally non-fatal unless liver infection is severe. The disease can lead to biliary colic and digestive disturbances including bloating and diarrhea. In heavy infections, the bile ducts and biliary epithelium may become enlarged, with fibrous tissue forming around the ducts and leading to hepatomegaly or cirrhosis. This is a case presentation of a two-year-old male child infected with human immunodeficiency virus (HIV). His BMI was 10.3 kg/m2, below the fifth percentile. The child had lost 1.5 kg on retrospective review of the case file and lost another 0.5 kg after presenting with loss of appetite, cough and fever. The patient was reportedly fed liver on several occasions. Stool examination revealed many Dicrocoelium dendriticum and Ascaris lumbricoides ova. Albendazole treatment was instituted, and after three months body weight improved to 10.5 kg. It is important to screen underweight children for helminthiasis, particularly HIV/AIDS patients, whose HIV treatment plan might otherwise take priority for the physician.
    Keywords: Dicrocoelium dendriticum, Child, Ascaris lumbricoides, HIV/AIDS, Albendazole

    Observation of a new chi_b state in radiative transitions to Upsilon(1S) and Upsilon(2S) at ATLAS

    The chi_b(nP) quarkonium states are produced in proton-proton collisions at the Large Hadron Collider (LHC) at sqrt(s) = 7 TeV and recorded by the ATLAS detector. Using a data sample corresponding to an integrated luminosity of 4.4 fb^-1, these states are reconstructed through their radiative decays to Upsilon(1S,2S) with Upsilon->mu+mu-. In addition to the mass peaks corresponding to the decay modes chi_b(1P,2P)->Upsilon(1S)gamma, a new structure centered at a mass of 10.530+/-0.005 (stat.)+/-0.009 (syst.) GeV is also observed, in both the Upsilon(1S)gamma and Upsilon(2S)gamma decay modes. This is interpreted as the chi_b(3P) system. Comment: 5 pages plus author list (18 pages total), 2 figures, 1 table, corrected author list, matches final version in Physical Review Letters.

    Search for displaced vertices arising from decays of new heavy particles in 7 TeV pp collisions at ATLAS

    We present the results of a search for new, heavy particles that decay at a significant distance from their production point into a final state containing charged hadrons in association with a high-momentum muon. The search is conducted in a pp-collision data sample with a center-of-mass energy of 7 TeV and an integrated luminosity of 33 pb^-1 collected in 2010 by the ATLAS detector operating at the Large Hadron Collider. Production of such particles is expected in various scenarios of physics beyond the standard model. We observe no signal and place limits on the production cross-section of supersymmetric particles in an R-parity-violating scenario as a function of the neutralino lifetime. Limits are presented for different squark and neutralino masses, enabling extension of the limits to a variety of other models. Comment: 8 pages plus author list (20 pages total), 8 figures, 1 table, final version to appear in Physics Letters.

    Reducing heterotic M-theory to five dimensional supergravity on a manifold with boundary

    This paper constructs the reduction of heterotic M-theory in eleven dimensions to a supergravity model on a manifold with boundary in five dimensions using a Calabi-Yau three-fold. New results are presented for the boundary terms in the action and for the boundary conditions on the bulk fields. Some general features of dualisation on a manifold with boundary are used to explain the origin of some topological terms in the action. The effect of gaugino condensation on the fermion boundary conditions leads to a `twist' in the chirality of the gravitino which can provide an uplifting mechanism in the vacuum energy to cancel the cosmological constant after moduli stabilisation. Comment: 16 pages, RevTeX.

    Measurement of the inclusive isolated prompt photon cross-section in pp collisions at sqrt(s)= 7 TeV using 35 pb-1 of ATLAS data

    A measurement of the differential cross-section for the inclusive production of isolated prompt photons in pp collisions at a center-of-mass energy sqrt(s) = 7 TeV is presented. The measurement covers the pseudorapidity ranges |eta|<1.37 and 1.52<=|eta|<2.37 in the transverse energy range 45 <= E_T < 400 GeV. The results are based on an integrated luminosity of 35 pb-1, collected with the ATLAS detector at the LHC. The yields of the signal photons are measured using a data-driven technique, based on the observed distribution of the hadronic energy in a narrow cone around the photon candidate and the photon selection criteria. The results are compared with next-to-leading order perturbative QCD calculations and found to be in good agreement over four orders of magnitude in cross-section. Comment: 7 pages plus author list (18 pages total), 2 figures, 4 tables, final version published in Physics Letters.

    Measurement of the production cross section of prompt j/psi mesons in association with a W (+/-) boson in pp collisions root s=7 TeV with the ATLAS detector

    The process pp → W± + J/ψ provides a powerful probe of the production mechanism of charmonium in hadronic collisions, and is also sensitive to multiple parton interactions in the colliding protons. Using the 2011 ATLAS dataset of 4.5 fb-1 of sqrt(s) = 7 TeV pp collisions at the LHC, the first observation is made of the production of W± + prompt J/ψ events in hadronic collisions, using W± → μν and J/ψ → μ+μ-. A yield of 27.4 +7.5/-6.5 W± + prompt J/ψ events is observed, with a statistical significance of 5.1 standard deviations. The production rate as a ratio to the inclusive W± boson production rate is measured, and the double parton scattering contribution to the cross section is estimated. Copyright CERN, for the benefit of the ATLAS Collaboration.