386 research outputs found

    A Recurrent Neural Network Survival Model: Predicting Web User Return Time

    The size of a website's active user base directly affects its value. Thus, it is important to monitor and influence a user's likelihood of returning to a site. Essential to this is predicting when a user will return. Current state-of-the-art approaches to this problem come in two flavors: (1) Recurrent Neural Network (RNN) based solutions and (2) survival analysis methods. We observe that both techniques are severely limited when applied to this problem. Survival models can only incorporate aggregate representations of users instead of automatically learning a representation directly from a raw time series of user actions. RNNs can automatically learn features, but cannot be directly trained with examples of non-returning users, who have no target value for their return time. We develop a novel RNN survival model that removes the limitations of the state-of-the-art methods. We demonstrate that this model can successfully be applied to return-time prediction on a large e-commerce dataset, discriminating between returning and non-returning users better than either method applied in isolation. Comment: Accepted into ECML PKDD 2018; 8 figures and 1 table
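The obstacle named above, non-returning users having no target return time, is the right-censoring problem of survival analysis. A minimal sketch of a censoring-aware loss that still uses such users, assuming an exponential return-time distribution with a per-user rate λ (hypothetically produced by an RNN per user); the exponential choice and the name `censored_exponential_nll` are illustrative assumptions, not the paper's model:

```python
import math

def censored_exponential_nll(rates, times, returned):
    """Negative log-likelihood of right-censored data under an exponential model.

    rates    -- predicted return rate lambda > 0 per user (hypothetically an RNN output)
    times    -- observed return time, or time until the end of observation if no return
    returned -- True if the user returned (event observed), False if censored
    """
    nll = 0.0
    for lam, t, event in zip(rates, times, returned):
        if event:
            # Returning user contributes the log density: log f(t) = log(lam) - lam * t
            nll -= math.log(lam) - lam * t
        else:
            # Non-returning user contributes the log survival: log S(t) = -lam * t
            nll -= -lam * t
    return nll

# One user returning at t = 2 and one still absent after 3 time units
print(censored_exponential_nll([0.5, 0.5], [2.0, 3.0], [True, False]))
```

The censored term still penalizes a high predicted return rate for a user who has not come back, which is exactly the information a plain regression loss on observed return times would discard.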

    A family history of breast cancer will not predict female early onset breast cancer in a population-based setting

    ABSTRACT: BACKGROUND: An increased risk of breast cancer for relatives of breast cancer patients has been demonstrated in many studies, and having a relative diagnosed with breast cancer at an early age is an indication for breast cancer screening. This indication has been derived from estimates based on data from cancer-prone families or from BRCA1/2 mutation families, and might be biased because BRCA1/2 mutations explain only a small proportion of the familial clustering of breast cancer. The aim of the current study was to determine the predictive value of a family history of cancer with regard to early onset of female breast cancer in a population-based setting. METHODS: An unselected sample of 1,987 women with and without breast cancer was studied with regard to the age at diagnosis of breast cancer. RESULTS: The risk of early-onset breast cancer was increased when there were: (1) at least 2 cases of female breast cancer in first-degree relatives (yes/no; HR at age 30: 3.09; 95% CI: 1.28-7.44), (2) at least 2 cases of female breast cancer in first- or second-degree relatives under the age of 50 (yes/no; HR at age 30: 3.36; 95% CI: 1.12-10.08), (3) at least 1 case of female breast cancer under the age of 40 in a first- or second-degree relative (yes/no; HR at age 30: 2.06; 95% CI: 0.83-5.12) and (4) any case of bilateral breast cancer (yes/no; HR at age 30: 3.47; 95% CI: 1.33-9.05). The positive predictive value of having 2 or more of these characteristics was 13% for breast cancer before the age of 70, 11% for breast cancer before the age of 50, and 1% for breast cancer before the age of 30. CONCLUSION: Applying family-history-related criteria in an unselected population could result in the screening of many women who will not develop breast cancer at an early age.
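The positive predictive value reported above is the fraction of women flagged by a criterion who actually develop breast cancer in the given age window: PPV = TP / (TP + FP). A small sketch with hypothetical counts (not the study's data):

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Fraction of flagged individuals who actually develop the outcome."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no individuals were flagged")
    return true_positives / flagged

# Hypothetical: 100 women meet the criteria and 13 develop breast cancer before 70
print(positive_predictive_value(13, 87))  # 0.13
```

A PPV of 1% before age 30 means that about 99 of every 100 flagged women would not develop breast cancer by that age, which is the basis for the conclusion about screening an unselected population.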

    Using ordinal logistic regression to evaluate the performance of laser-Doppler predictions of burn-healing time

    Background Laser-Doppler imaging (LDI) of cutaneous blood flow is beginning to be used by burn surgeons to predict the healing time of burn wounds; the predicted healing time determines whether a wound is treated with dressings or surgery. In this paper, we statistically analyze the performance of the technique. Methods We used data from a study carried out by five burn centers: LDI was done once between days 2 and 5 post burn, and healing was assessed at both 14 days and 21 days post burn. Random-effects ordinal logistic regression and other models, such as the continuation ratio model, were used to model healing time as a function of the LDI data and of demographic and wound-history variables. Statistical methods were also used to study the false-color palette, which enables the laser-Doppler imager to be used by clinicians as a decision-support tool. Results Overall, more than 90% of diagnoses are correct. Related questions addressed were which blood-flow summary statistic was best and whether, given the blood-flow measurements, demographic and observational variables (age, sex, race, % total body surface area burned (%TBSA), site and cause of burn, day of LDI scan, burn center) had any additional predictive power. Mean laser-Doppler flux over a wound area was found to be the best statistic, and, given the same mean flux, women recover slightly more slowly than men. Further, the likely degradation in predictive performance on moving to a patient group with larger %TBSA than those in the data sample was studied and shown to be small. Conclusion Modeling healing time is a complex statistical problem, with random effects due to multiple burn areas per individual, and censoring caused by patients missing hospital visits and undergoing surgery. This analysis applies state-of-the-art statistical methods such as the bootstrap and permutation tests to a medical problem of topical interest. New medical findings are that age and %TBSA are not important predictors of healing time when the LDI results are known, whereas gender does influence recovery time, even when blood flow is controlled for. The conclusion regarding the palette is that an optimum three-color palette can be chosen 'automatically', but the optimum choice of a 5-color palette cannot be made solely by optimizing the percentage of correct diagnoses.
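Ordinal logistic regression treats healing time as ordered categories; with assessments at 14 and 21 days, three categories are natural. A minimal sketch of the proportional-odds (cumulative logit) form, assuming hypothetical thresholds and a linear predictor that could combine mean laser-Doppler flux with covariates; the per-patient random effects used in the study are omitted here:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def ordinal_category_probs(linear_predictor, thresholds):
    """Category probabilities under a proportional-odds (cumulative logit) model.

    P(Y <= k) = sigmoid(thresholds[k] - linear_predictor); thresholds must increase.
    Returns len(thresholds) + 1 probabilities, one per ordered category.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [sigmoid(t - linear_predictor) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = k) = P(Y <= k) - P(Y <= k - 1)
        previous = c
    return probs

# Hypothetical categories: healed by day 14, healed by day 21, not healed by day 21
probs = ordinal_category_probs(linear_predictor=0.0, thresholds=[-1.0, 1.0])
print([round(p, 3) for p in probs])  # [0.269, 0.462, 0.269]
```

In this parameterization a larger linear predictor shifts probability toward later healing; a continuation ratio model would instead model each category conditional on not having healed earlier.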

    Adolescents with metabolic syndrome have a history of low aerobic fitness and physical activity levels

    Abstract: Purpose: Metabolic syndrome (MS) is a clustering of cardiovascular disease risk factors that identifies individuals with the highest risk for heart disease. Two factors that may influence the MS are physical activity and aerobic fitness. This study determined whether adolescents with the MS had low levels of aerobic fitness and physical activity as children. Methods: This longitudinal, exploratory study had 389 participants (51% girls; 84% Caucasian, 12% African American, 1% Hispanic, and 3% other races) from the State of North Carolina. Habitual physical activity (PA survey), aerobic fitness (VO2max), body mass index (BMI), blood pressure, and lipids obtained at 7–10 y of age were compared with results obtained 7 y later at ages 14–17 y. Results: Eighteen adolescents (4.6%) developed 3 or more characteristics of the MS. Logistic regression, adjusting for BMI percentile, blood pressure, and cholesterol levels, found that adolescents with the MS were 6.08 (95% CI = 1.18–60.08) times more likely to have had low aerobic fitness as children and 5.16 (95% CI = 1.06–49.66) times more likely to have had low PA levels. Conclusion: Low levels of childhood physical activity and aerobic fitness are associated with the presence of the metabolic syndrome in adolescents. Thus, efforts to increase exercise need to begin early in childhood.

    An assessment of existing models for individualized breast cancer risk estimation in a screening program in Spain

    Background: The aim of this study was to evaluate the calibration and discriminatory power of three predictive models of breast cancer risk. Methods: We included 13,760 women who were first-time participants in the Sabadell-Cerdanyola Breast Cancer Screening Program, in Catalonia, Spain. Projections of risk for invasive cancer were obtained at three and five years using the Gail, Chen and Barlow models. Incidence and mortality data were obtained from the Catalan registries. The calibration and discrimination of the models were assessed using the Hosmer-Lemeshow C statistic, the area under the receiver operating characteristic curve (AUC) and Harrell’s C statistic. Results: The Gail and Chen models showed good calibration, while the Barlow model overestimated the number of cases: the ratio between estimated and observed values at 5 years ranged from 0.86 to 1.55 for the first two models and from 1.82 to 3.44 for the Barlow model. The 5-year projections of the Chen and Barlow models had the highest discrimination, with an AUC of around 0.58. Harrell’s C statistic showed very similar values for all three models in the 5-year projection. Although they passed the calibration test, the Gail and Chen models overestimated the number of cases in some breast density categories. Conclusions: These models cannot be used as a measure of individual risk in early detection programs to customize screening strategies. The inclusion of longitudinal measures of breast density or other risk factors in joint models of survival and longitudinal data may be a step towards personalized early detection of breast cancer. This study was funded by grant PS09/01340 and The Spanish Network on Chronic Diseases REDISSEC (RD12/0001/0007) from the Health Research Fund (Fondo de Investigación Sanitaria) of the Spanish Ministry of Health.
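Harrell's C statistic extends discrimination to censored follow-up by comparing only pairs whose ordering is known: the subject with the shorter time must have had an event. A minimal pure-Python sketch, assuming higher scores mean higher predicted risk and, for simplicity, skipping pairs with tied times; the function name and data are illustrative:

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when times[i] < times[j] and subject i had an event.
    It is concordant when subject i also has the higher predicted risk; tied risk
    scores count as half-concordant. Pairs with tied times are skipped.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Four subjects; the third is censored at t = 3
print(harrell_c([1, 2, 3, 4], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1]))  # 0.8
```

A value of 0.5 corresponds to random ranking, so the values near 0.58 reported above indicate only modest discrimination.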

    Development and validation of the Measure of Indigenous Racism Experiences (MIRE)

    Abstract: Background: In recent decades there has been increasing evidence of a relationship between self-reported racism and health. Although a plethora of instruments to measure racism have been developed, very few have been described conceptually or psychometrically. Furthermore, this research field has been limited by a dearth of instruments that examine reactions/responses to racism and by a restricted focus on African American populations. Methods: In response to these limitations, the 31-item Measure of Indigenous Racism Experiences (MIRE) was developed to assess self-reported racism for Indigenous Australians. This paper describes the development of the MIRE together with an opportunistic examination of its content, construct and convergent validity in a population health study involving 312 Indigenous Australians. Results: Focus group research supported the content validity of the MIRE, and inter-item/scale correlations suggested good construct validity. A good fit with a priori conceptual dimensions was demonstrated in factor analysis, and convergence with a separate item on discrimination was satisfactory. Conclusion: The MIRE has considerable utility as an instrument that can assess multiple facets of racism together with responses/reactions to racism among indigenous populations and, potentially, among other ethnic/racial groups.

    Correlating changes in lung function with patient outcomes in chronic obstructive pulmonary disease: a pooled analysis

    Background Relationships between improvements in lung function and other clinical outcomes in chronic obstructive pulmonary disease (COPD) have not been documented extensively. We examined whether changes in trough forced expiratory volume in 1 second (FEV1) are correlated with changes in patient-reported outcomes. Methods Pooled data from three indacaterol studies (n = 3313) were analysed. Means and responder rates for outcomes, including change from baseline in Transition Dyspnoea Index (TDI), St. George's Respiratory Questionnaire (SGRQ) scores (at 12, 26 and 52 weeks), and COPD exacerbation frequency (rate/year), were tabulated across categories of ΔFEV1. Generalised linear modelling was also performed, adjusting for covariates such as baseline severity and inhaled corticosteroid use. Results With increasing positive ΔFEV1, TDI and ΔSGRQ improved at all timepoints, and the exacerbation rate over the study duration declined (P < 0.001). Individual-level correlations were 0.03-0.18, but cohort-level correlations were 0.79-0.95. At 26 weeks, a 100 ml increase in FEV1 was associated with improved TDI (0.46 units), ΔSGRQ (1.3-1.9 points) and exacerbation rate (12% decrease). Overall, adjustments for baseline covariates had little impact on the relationship between ΔFEV1 and outcomes. Conclusions These results suggest that larger improvements in FEV1 are likely to be associated with larger patient-reported benefits across a range of clinical outcomes.

    Elevated antiphospholipid antibody titers and adverse pregnancy outcomes: analysis of a population-based hospital dataset

    Abstract: Background: The primary objective of this study was to determine whether elevated antiphospholipid antibody titers were correlated with the presence of preeclampsia/eclampsia, systemic lupus erythematosus (SLE), placental insufficiency, and a prolonged length of stay (PLOS) in women who delivered throughout Florida, USA. Methods: Cross-sectional analyses were conducted using a statewide hospital database. Prevalence odds ratios (OR) were calculated to quantify the association between elevated antiphospholipid antibody titers and four outcomes in 141,286 women who delivered in Florida in 2001. The possibility that the relationship between elevated antiphospholipid antibody titers and the outcomes of preeclampsia/eclampsia, placental insufficiency, and PLOS may have been modified by the presence of SLE was evaluated in a multiple logistic regression model by creating a composite interaction term. Results: Women with elevated antiphospholipid antibody titers (n = 88) were older and more likely to be of white race and not on Medicaid than women who did not have elevated antiphospholipid antibody titers. Women who had elevated antiphospholipid antibody titers had an increased adjusted odds ratio for preeclampsia/eclampsia (OR = 2.93, p = 0.0015), SLE (OR = 61.24, p < 0.0001), placental insufficiency (OR = 4.58, p = 0.0003), and PLOS (OR = 3.93, p < 0.0001). Patients who had both an elevated antiphospholipid antibody titer and SLE were significantly more likely than the comparison group (women without an elevated titer who did not have SLE) to have the outcomes of preeclampsia, placental insufficiency and PLOS. Conclusion: This exploratory epidemiologic investigation found moderate to very strong associations between elevated antiphospholipid antibody titers and four important outcomes in a large sample of women.
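A prevalence odds ratio compares the odds of an outcome between exposed and unexposed groups in a cross-sectional 2×2 table. The ORs above are adjusted estimates from logistic regression; the sketch below shows only the simpler crude OR with a Woolf (log-scale) 95% confidence interval, using hypothetical counts rather than the study's data:

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Crude odds ratio from a 2x2 table with a Woolf (log-scale) confidence interval.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; a continuity correction would be needed")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts: exposure = elevated titer, outcome = placental insufficiency
print(odds_ratio_with_ci(10, 20, 5, 40))
```

With only 88 exposed women, the small exposed cells dominate the standard error, which is one reason the interval around an estimate like OR = 61.24 can be wide.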

    Boosting the discriminatory power of sparse survival models via optimization of the concordance index and stability selection

    Background When constructing new biomarker or gene signature scores for time-to-event outcomes, the underlying aims are to develop a discrimination model that helps to predict whether patients have a poor or good prognosis and to identify the most influential variables for this task. In practice, this is often done by fitting Cox models. These are, however, not necessarily optimal with respect to the resulting discriminatory power and are based on restrictive assumptions. We present a combined approach to automatically select and fit sparse discrimination models for potentially high-dimensional survival data based on boosting a smooth version of the concordance index (C-index). Because of this objective function, the resulting prediction models are optimal with respect to their ability to discriminate between patients with longer and shorter survival times. The gradient boosting algorithm is combined with the stability selection approach to enhance and control its variable selection properties. Results The resulting algorithm fits prediction models based on the rankings of the survival times and automatically selects only the most stable predictors. The performance of the approach, which works best for small numbers of informative predictors, is demonstrated in a large-scale simulation study: C-index boosting in combination with stability selection is able to identify a small subset of informative predictors from a much larger set of non-informative ones while controlling the per-family error rate. In an application to discover biomarkers for breast cancer patients based on gene expression data, stability selection yielded sparser models, and the resulting discriminatory power was higher than with lasso-penalized Cox regression models. Conclusion The combination of stability selection and C-index boosting can be used to select small numbers of informative biomarkers and to derive new prediction rules that are optimal with respect to their discriminatory power. Stability selection controls the per-family error rate, which makes the new approach also appealing from an inferential point of view, as it provides an alternative to classical hypothesis tests for single predictor effects. Because of the shrinkage and variable selection properties of statistical boosting algorithms, the latter tests are typically infeasible for prediction models fitted by boosting.
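Boosting needs a differentiable objective, so the C-index's step-function indicator is replaced by a sigmoid. A minimal sketch of such a smoothed C-index, assuming higher predictions mean higher risk (shorter survival) and omitting the inverse-probability-of-censoring weights a full implementation would use; `sigma` is an illustrative smoothing parameter, and this is not the authors' code:

```python
import math

def stable_sigmoid(z: float) -> float:
    """Logistic function written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def smoothed_c_index(times, events, predictions, sigma=0.1):
    """Differentiable approximation of the concordance index (unweighted).

    The indicator 1(eta_i > eta_j) for a comparable pair (times[i] < times[j],
    subject i had an event) is replaced by sigmoid((eta_i - eta_j) / sigma).
    Smaller sigma gives a closer approximation to the unsmoothed C-index.
    """
    concordance, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                concordance += stable_sigmoid((predictions[i] - predictions[j]) / sigma)
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordance / comparable

print(smoothed_c_index([1, 2, 3, 4], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1], sigma=0.01))
```

As `sigma` shrinks, the value approaches the unsmoothed concordance for the same data; a gradient boosting step would then follow the gradient of this quantity with respect to the predictions.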