1,273 research outputs found

    Variable ranking and selection with random forest for unbalanced data

    When one or several classes are much less prevalent than another class (unbalanced data), class error rates and variable importances of the machine learning algorithm random forest can be biased, particularly when sample sizes are smaller, imbalance levels higher, and effect sizes of important variables smaller. Using simulated data varying in size, imbalance level, number of true variables, their effect sizes, and the strength of multicollinearity between covariates, we evaluated how eight versions of random forest ranked and selected true variables out of a large number of covariates despite class imbalance. The version that calculated variable importance based on the area under the curve (AUC) was least adversely affected by class imbalance. For the same number of true variables, effect sizes, and multicollinearity between covariates, the AUC variable importance ranked true variables still highly at the lower sample sizes and higher imbalance levels at which the other seven versions no longer achieved high ranks for true variables. Conversely, using the Hellinger distance to split trees or downsampling the majority class already ranked true variables lower and more variably at the larger sample sizes and lower imbalance levels at which the other algorithms still ranked true variables highly. In variable selection, a higher proportion of true variables were identified when covariates were ranked by AUC importances and the proportion increased further when the AUC was used as the criterion in forward variable selection. In three case studies, known species–habitat relationships and their spatial scales were identified despite unbalanced data

    Comprehensive molecular characterisation of epilepsy-associated glioneuronal tumours

    Glioneuronal tumours are an important cause of treatment-resistant epilepsy. Subtypes of tumour are often poorly discriminated by histological features and may be difficult to diagnose due to a lack of robust diagnostic tools. This is illustrated by marked variability in the reported frequencies across different epilepsy surgical series. To address this, we used DNA methylation arrays and RNA sequencing to assay the methylation and expression profiles within a large cohort of glioneuronal tumours. By adopting a class discovery approach, we were able to identify two distinct groups of glioneuronal tumour, which only partially corresponded to the existing histological classification. Furthermore, by additional molecular analyses, we were able to identify pathogenic mutations in BRAF and FGFR1, specific to each group, in a high proportion of cases. Finally, by interrogating our expression data, we were able to show that each molecular group possessed expression phenotypes suggesting different cellular differentiation: astrocytic in one group and oligodendroglial in the second. Informed by this, we were able to identify CCND1, CSPG4, and PDGFRA as immunohistochemical targets which could distinguish between molecular groups. Our data suggest that the current histological classification of glioneuronal tumours does not adequately represent their underlying biology. Instead, we show that there are two molecular groups within glioneuronal tumours. The first of these displays astrocytic differentiation and is driven by BRAF mutations, while the second displays oligodendroglial differentiation and is driven by FGFR1 mutations

    Cost-effectiveness of Sertraline in primary care according to initial severity and duration of depressive symptoms: findings from the PANDA RCT

    BACKGROUND: Antidepressants are commonly prescribed for depression, but it is unclear whether treatment efficacy depends on severity and duration of symptoms and how prescribing might be targeted cost-effectively. OBJECTIVES: We investigated the cost-effectiveness of the antidepressant sertraline compared with placebo in subgroups defined by severity and duration of depressive symptoms. METHODS: We undertook a cost-effectiveness analysis from the perspective of the NHS and Personal and Social Services (PSS) in the UK alongside the PANDA (What are the indications for Prescribing ANtiDepressants that will leAd to a clinical benefit?) randomised controlled trial (RCT), which compared sertraline with placebo over a 12-week period. Quality of life data were collected at baseline and at 2, 6, and 12 weeks post-randomisation using EQ-5D-5L, from which we calculated quality-adjusted life years (QALYs). Costs (in 2017/18£) were collected using patient records and from resource use questionnaires administered at each follow-up interval. Differences in mean costs and mean QALYs and net monetary benefits were estimated. Our primary analysis used net monetary benefit regressions to identify any interaction between the cost-effectiveness of sertraline and subgroups defined by baseline symptom severity (0-11; 12-19; 20+ on the Clinical Interview Schedule-Revised) and, separately, duration of symptoms (greater or less than 2 years duration). A secondary analysis estimated the cost-effectiveness of sertraline versus placebo, irrespective of duration or severity. RESULTS: There was no evidence of an association between the baseline severity of depressive symptoms and the cost-effectiveness of sertraline. Compared to patients with low symptom severity, the expected net benefits in patients with moderate symptoms were £24 (95% CI -£280 to £328; p value 0.876) and the expected net benefits in patients with high symptom severity were £37 (95% CI -£221 to £296; p value 0.776). 
Patients who had a longer history of depressive symptoms at baseline had lower expected net benefits from sertraline than those with a shorter history; however, the difference was uncertain (-£27 [95% CI -£258 to £204]; p value 0.817). In the secondary analysis, patients treated with sertraline had higher expected net benefits (£122 [95% CI £18 to £226]; p value 0.101) than those in the placebo group. Sertraline had a high probability (> 95%) of being cost-effective if the health system was willing to pay at least £20,000 per QALY gained. CONCLUSIONS: We found insufficient evidence of a prespecified threshold based on severity or symptom duration that GPs could use to target prescribing to a subgroup of patients where sertraline is most cost-effective. Sertraline is probably a cost-effective treatment for depressive symptoms in UK primary care. TRIAL REGISTRATION: Controlled Trials ISRCTN Registry, ISRCTN84544741
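
    The net monetary benefit named in this abstract is conventionally defined as the willingness-to-pay threshold per QALY multiplied by the QALYs gained, minus the cost. The following is a minimal sketch of that standard formula only, not the trial's regression analysis; the function name and the input values are hypothetical and are not taken from the PANDA data.

```python
def incremental_net_monetary_benefit(delta_qalys: float,
                                     delta_cost: float,
                                     willingness_to_pay: float) -> float:
    """Incremental NMB = threshold * (difference in QALYs) - (difference in cost).

    A positive value means the intervention is cost-effective at the
    given willingness-to-pay threshold (same currency unit as the cost).
    """
    return willingness_to_pay * delta_qalys - delta_cost


# Hypothetical illustration: 0.01 extra QALYs at 50 extra cost,
# valued at a threshold of 20,000 per QALY.
example_nmb = incremental_net_monetary_benefit(
    delta_qalys=0.01, delta_cost=50.0, willingness_to_pay=20_000.0
)
print(f"Incremental NMB: {example_nmb:.2f}")
```

    Working on the net-benefit scale turns the cost-effectiveness question into a single outcome per patient, which is what allows it to be used as the dependent variable in a regression with subgroup interaction terms.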

    Repeat prenatal corticosteroid prior to preterm birth: a systematic review and individual participant data meta-analysis for the PRECISE study group (prenatal repeat corticosteroid international IPD study group: assessing the effects using the best level of evidence) - study protocol

    This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. BACKGROUND The aim of this individual participant data (IPD) meta-analysis is to assess whether the effects of repeat prenatal corticosteroid treatment given to women at risk of preterm birth to benefit their babies are modified in a clinically meaningful way by factors related to the women or the trial protocol. METHODS/DESIGN The Prenatal Repeat Corticosteroid International IPD Study Group: assessing the effects using the best level of Evidence (PRECISE) Group will conduct an IPD meta-analysis. The PRECISE International Collaborative Group was formed in 2010 and data collection commenced in 2011. Eleven trials with up to 5,000 women and 6,000 infants are eligible for the PRECISE IPD meta-analysis. The primary study outcomes for the infants will be serious neonatal outcome (defined by the PRECISE International IPD Study Group as one of death (foetal, neonatal or infant); severe respiratory disease; severe intraventricular haemorrhage (grade 3 and 4); chronic lung disease; necrotising enterocolitis; serious retinopathy of prematurity; and cystic periventricular leukomalacia); use of respiratory support (defined as mechanical ventilation or continuous positive airways pressure or other respiratory support); and birth weight (Z-scores). 
For the children, the primary study outcomes will be death or any neurological disability (however defined by trialists at childhood follow up and may include developmental delay or intellectual impairment (developmental quotient or intelligence quotient more than one standard deviation below the mean), cerebral palsy (abnormality of tone with motor dysfunction), blindness (for example, corrected visual acuity worse than 6/60 in the better eye) or deafness (for example, hearing loss requiring amplification or worse)). For the women, the primary outcome will be maternal sepsis (defined as chorioamnionitis; pyrexia after trial entry requiring the use of antibiotics; puerperal sepsis; intrapartum fever requiring the use of antibiotics; or postnatal pyrexia). DISCUSSION Data analyses are expected to commence in 2011 with results publicly available in 2012. Caroline A Crowther ... Tanya K Bubner ... Philippa F Middleton ... Lisa Yelland ... Sasha Zhang ... et al

    Regional variation in hospitalization for stroke among Asians/Pacific Islanders in the United States: a nationwide retrospective cohort study

    BACKGROUND: In Asia, stroke incidence varies dramatically from country to country. Little is known about stroke incidence in Asians/Pacific Islanders in the US, where regional heterogeneity in Asian/Pacific Islander sub-populations is great. We sought to characterize both the national and regional incidences of first and recurrent hospitalized acute ischemic stroke, subarachnoid hemorrhage, and intracerebral hemorrhage in Asians/Pacific Islanders compared to non-Hispanic whites. METHODS: We used the National Inpatient Sample of the 1997 Healthcare Cost and Utilization Project. It is a 20% stratified sample of hospitalizations to nonfederal hospitals in the US. National and regional projections were made using sampling weights specific for patients and hospitals. We identified stroke subtypes using previously validated ICD-9 codes. Age-adjusted incidence rates were calculated using the direct method with the US population in 2000 as the standard. RESULTS: There were 169,386 stroke hospitalizations in the database. Nationally, compared to whites, Asians/Pacific Islanders were more likely to have subarachnoid hemorrhage (incidence rate ratio {RR} female: 1.53, 95% CI 1.41–1.65; male RR: 1.13, 95% CI 1.00–1.27) and intracerebral hemorrhage (female RR 1.29, 95% CI 1.22–1.36; male RR: 1.58, 95% CI 1.50–1.67). However, when examined by geographic regions, Asians/Pacific Islanders had higher incidence rates of subarachnoid hemorrhage and intracerebral hemorrhage predominantly in the West, and lower rates of stroke elsewhere. CONCLUSION: Stroke incidence varies 3-fold among Asians/Pacific Islanders residing in different US regions. Geographic variation is less dramatic in whites. Whether genetic or cultural differences are responsible for dramatic heterogeneity among Asian/Pacific Islander populations is unclear and deserves further study
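
    The direct method of age adjustment mentioned in this abstract weights each age-specific rate by that age group's share of a standard population. The sketch below shows the general calculation only; the function name, age bands, and counts are hypothetical and are not drawn from the National Inpatient Sample or the 2000 US standard population.

```python
from typing import Sequence


def directly_standardized_rate(events: Sequence[float],
                               person_years: Sequence[float],
                               standard_population: Sequence[float]) -> float:
    """Age-adjusted rate by the direct method.

    Each age-specific rate (events / person-years) is weighted by the
    proportion of the standard population in that age group.
    """
    if not (len(events) == len(person_years) == len(standard_population)):
        raise ValueError("all inputs must have one entry per age group")
    total_standard = sum(standard_population)
    return sum(
        (e / py) * (std / total_standard)
        for e, py, std in zip(events, person_years, standard_population)
    )


# Hypothetical two-band example: a younger and an older age group.
adjusted = directly_standardized_rate(
    events=[10, 40],
    person_years=[10_000, 10_000],
    standard_population=[3_000, 1_000],
)
print(f"Age-adjusted rate per 100,000: {adjusted * 100_000:.1f}")
```

    Because both groups being compared are weighted by the same standard population, a ratio of two such rates is not distorted by differences in age structure between the groups.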

    Prospective risk of stillbirth and neonatal complications in twin pregnancies: systematic review and meta-analysis.

    OBJECTIVE: To determine the risks of stillbirth and neonatal complications by gestational age in uncomplicated monochorionic and dichorionic twin pregnancies. DESIGN: Systematic review and meta-analysis. DATA SOURCES: Medline, Embase, and Cochrane databases (until December 2015). REVIEW METHODS: Databases were searched without language restrictions for studies of women with uncomplicated twin pregnancies that reported rates of stillbirth and neonatal outcomes at various gestational ages. Pregnancies with unclear chorionicity, monoamnionicity, and twin to twin transfusion syndrome were excluded. Meta-analyses of observational studies and cohorts nested within randomised studies were undertaken. Prospective risk of stillbirth was computed for each study at a given week of gestation and compared with the risk of neonatal death among deliveries in the same week. Gestational age specific differences in risk were estimated for stillbirths and neonatal deaths in monochorionic and dichorionic twin pregnancies after 34 weeks' gestation. RESULTS: 32 studies (29 685 dichorionic, 5486 monochorionic pregnancies) were included. In dichorionic twin pregnancies beyond 34 weeks (15 studies, 17 830 pregnancies), the prospective weekly risk of stillbirths from expectant management and the risk of neonatal death from delivery were balanced at 37 weeks' gestation (risk difference 1.2/1000, 95% confidence interval -1.3 to 3.6; I²=0%). Delay in delivery by a week (to 38 weeks) led to an additional 8.8 perinatal deaths per 1000 pregnancies (95% confidence interval 3.6 to 14.0/1000; I²=0%) compared with the previous week. In monochorionic pregnancies beyond 34 weeks (13 studies, 2149 pregnancies), there was a trend towards an increase in stillbirths compared with neonatal deaths after 36 weeks, with an additional 2.5 per 1000 perinatal deaths, which was not significant (-12.4 to 17.4/1000; I²=0%). 
The rates of neonatal morbidity showed a consistent reduction with increasing gestational age in monochorionic and dichorionic pregnancies, and admission to the neonatal intensive care unit was the commonest neonatal complication. The actual risk of stillbirth near term might be higher than reported estimates because of the policy of planned delivery in twin pregnancies. CONCLUSIONS: To minimise perinatal deaths, in uncomplicated dichorionic twin pregnancies delivery should be considered at 37 weeks' gestation; in monochorionic pregnancies delivery should be considered at 36 weeks. SYSTEMATIC REVIEW REGISTRATION: PROSPERO CRD42014007538

    Search for Kaluza-Klein Graviton Emission in $p\bar{p}$ Collisions at $\sqrt{s}=1.8$ TeV using the Missing Energy Signature

    We report on a search for direct Kaluza-Klein graviton production in a data sample of 84 $\mathrm{pb}^{-1}$ of $p\bar{p}$ collisions at $\sqrt{s} = 1.8$ TeV, recorded by the Collider Detector at Fermilab. We investigate the final state of large missing transverse energy and one or two high energy jets. We compare the data with the predictions from a $3+1+n$-dimensional Kaluza-Klein scenario in which gravity becomes strong at the TeV scale. At 95% confidence level (C.L.) for $n$ = 2, 4, and 6 we exclude an effective Planck scale below 1.0, 0.77, and 0.71 TeV, respectively. Comment: Submitted to PRL, 7 pages 4 figures/Revision includes 5 figures

    TRY plant trait database - enhanced coverage and open access

    Plant traits-the morphological, anatomical, physiological, biochemical and phenological characteristics of plants-determine how plants respond to environmental factors, affect other trophic levels, and influence ecosystem properties and their benefits and detriments to people. Plant trait data thus represent the basis for a vast area of research spanning from evolutionary biology, community and functional ecology, to biodiversity conservation, ecosystem and landscape management, restoration, biogeography and earth system modelling. Since its foundation in 2007, the TRY database of plant traits has grown continuously. It now provides unprecedented data coverage under an open access data policy and is the main plant trait database used by the research community worldwide. Increasingly, the TRY database also supports new frontiers of trait-based plant research, including the identification of data gaps and the subsequent mobilization or measurement of new data. To support this development, in this article we evaluate the extent of the trait data compiled in TRY and analyse emerging patterns of data coverage and representativeness. Best species coverage is achieved for categorical traits-almost complete coverage for 'plant growth form'. However, most traits relevant for ecology and vegetation modelling are characterized by continuous intraspecific variation and trait-environmental relationships. These traits have to be measured on individual plants in their respective environment. Despite unprecedented data coverage, we observe a humbling lack of completeness and representativeness of these continuous traits in many aspects. We, therefore, conclude that reducing data gaps and biases in the TRY database remains a key challenge and requires a coordinated approach to data mobilization and trait measurements. This can only be achieved in collaboration with other initiatives