59 research outputs found

    Practical considerations for specifying a super learner

    Common tasks encountered in epidemiology, including disease incidence estimation and causal inference, rely on predictive modeling. Constructing a predictive model can be thought of as learning a prediction function, i.e., a function that takes covariate data as input and outputs a predicted value. Many strategies for learning these functions from data are available, from parametric regressions to machine learning algorithms. Choosing an approach can be challenging, because it is impossible to know in advance which one is most suitable for a particular dataset and prediction task. The super learner (SL) is an algorithm that alleviates concerns over selecting the one "right" strategy while providing the freedom to consider many of them, such as those recommended by collaborators, used in related research, or specified by subject-matter experts. It is an entirely pre-specified and data-adaptive strategy for predictive modeling. To ensure the SL is well-specified for learning the prediction function, the analyst does need to make a few important choices. In this Education Corner article, we provide step-by-step guidelines for making these choices, walking the reader through each of them and providing intuition along the way. In doing so, we aim to empower the analyst to tailor the SL specification to their prediction task, thereby ensuring their SL performs as well as possible. A flowchart provides a concise, easy-to-follow summary of key suggestions and heuristics, based on our accumulated experience and guided by theory. Comment: This article has been submitted for publication as an Education Corner article in the International Journal of Epidemiology, published by Oxford University Press.
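The core idea of letting the data choose among pre-specified candidate strategies can be sketched with V-fold cross-validation. The example below is a minimal illustration, not the article's method: it assumes squared-error loss, a single covariate, and three hypothetical candidate learners (`fit_mean`, `fit_linear`, `fit_knn`). It implements only the "discrete" super learner, which picks the single candidate with the lowest cross-validated risk; the full SL instead fits a weighted combination of the candidates, which this sketch does not attempt.

```python
import random
from statistics import mean

# Hypothetical candidate learners: each takes training data and returns a predict function.
def fit_mean(xs, ys):
    """Candidate 1: predict the training mean everywhere."""
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Candidate 2: least-squares line with a single covariate."""
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    if sxx == 0:
        return lambda x: ybar
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    intercept = ybar - slope * xbar
    return lambda x: intercept + slope * x

def fit_knn(xs, ys, k=3):
    """Candidate 3: average outcome of the k nearest training points."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict

def cv_risk(fit, xs, ys, v=5, seed=0):
    """V-fold cross-validated mean squared error of one candidate."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::v] for i in range(v)]
    sq_errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_errors += [(ys[i] - predict(xs[i])) ** 2 for i in fold]
    return mean(sq_errors)

def discrete_super_learner(learners, xs, ys, v=5):
    """Select the candidate with the lowest CV risk, then refit it on all data."""
    risks = {name: cv_risk(fit, xs, ys, v) for name, fit in learners.items()}
    best = min(risks, key=risks.get)
    return best, learners[best](xs, ys), risks

# Usage on toy data that are close to a straight line.
xs = list(range(40))
ys = [2 * x + 1 + (0.5 if x % 2 else -0.5) for x in xs]
best, model, risks = discrete_super_learner(
    {"mean": fit_mean, "linear": fit_linear, "knn": fit_knn}, xs, ys)
```

On these toy data the linear candidate has the lowest cross-validated risk and is selected. The point of the design is that the candidate list is fixed before looking at the outcome, and cross-validation, not the analyst, makes the final choice.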

    Application of targeted maximum likelihood estimation in public health and epidemiological studies: a systematic review

    BACKGROUND: The Targeted Maximum Likelihood Estimation (TMLE) statistical data analysis framework integrates machine learning, statistical theory, and statistical inference to provide a least biased, efficient and robust strategy for estimation and inference of a variety of statistical and causal parameters. We describe and evaluate the epidemiological applications that have benefited from recent methodological developments. METHODS: We conducted a systematic literature review in PubMed for articles that applied any form of TMLE in observational studies. We summarised the epidemiological discipline, geographical location, expertise of the authors, and TMLE methods over time. We used the Roadmap of Targeted Learning and Causal Inference to extract key methodological aspects of the publications. We showcase the contributions of these TMLE results to the literature. RESULTS: Of the 89 publications included, 33% originated from the University of California at Berkeley, where the framework was first developed by Professor Mark van der Laan. By 2022, 59% of the publications originated from outside the United States and explored up to 7 different epidemiological disciplines in 2021–22. Double-robustness, bias reduction and model misspecification were the main motivations that drew researchers towards the TMLE framework. Over time, a wide variety of methodological, tutorial and software-specific articles were cited, owing to the constant growth of methodological developments around TMLE. CONCLUSIONS: There is a clear dissemination trend of the TMLE framework to various epidemiological disciplines and to increasing numbers of geographical areas. The availability of R packages, publication of tutorial papers, and involvement of methodological experts in applied publications have contributed to an exponential increase in the number of studies that understood the benefits of TMLE and adopted it.

    Diabetes and risk of pancreatic cancer: a pooled analysis from the pancreatic cancer cohort consortium

    Diabetes is a suspected risk factor for pancreatic cancer, but questions remain about whether it is a risk factor or a result of the disease. This study prospectively examined the association between diabetes and the risk of pancreatic adenocarcinoma in pooled data from the NCI pancreatic cancer cohort consortium (PanScan). The pooled data included 1,621 pancreatic adenocarcinoma cases and 1,719 matched controls from twelve cohorts using a nested case-control study design. Subjects who were diagnosed with diabetes near the time (< 2 years) of pancreatic cancer diagnosis were excluded from all analyses. All analyses were adjusted for age, race, gender, study, alcohol use, smoking, BMI, and family history of pancreatic cancer. Self-reported diabetes was associated with a forty percent increased risk of pancreatic cancer (OR = 1.40, 95% CI: 1.07, 1.84). The association differed by duration of diabetes: risk was highest for those with a duration of 2-8 years (OR = 1.79, 95% CI: 1.25, 2.55), and there was no association for those with 9+ years of diabetes (OR = 1.02, 95% CI: 0.68, 1.52). These findings provide support for a relationship between diabetes and pancreatic cancer risk. The absence of association in those with the longest duration of diabetes may reflect hypoinsulinemia and warrants further investigation.

    Somatic mutational landscape of hereditary hematopoietic malignancies caused by germline variants in <i>RUNX1</i>, <i>GATA2</i>, and <i>DDX41</i>

    Individuals with germline variants associated with hereditary hematopoietic malignancies (HHMs) have a highly variable risk for leukemogenesis. Gaps in our understanding of premalignant states in HHMs have hampered efforts to design effective clinical surveillance programs, provide personalized preemptive treatments, and inform appropriate counseling for patients. We used the largest known comparative international cohort of germline RUNX1, GATA2, or DDX41 variant carriers without and with hematopoietic malignancies (HMs) to identify patterns of genetic drivers that are unique to each HHM syndrome before and after leukemogenesis. These patterns included striking heterogeneity in rates of early-onset clonal hematopoiesis (CH), with a high prevalence of CH in RUNX1 and GATA2 variant carriers who did not have malignancies (carriers-without HM). We observed a paucity of CH in DDX41 carriers-without HM. In RUNX1 carriers-without HM with CH, we detected variants in TET2, PHF6, and, most frequently, BCOR. These genes were recurrently mutated in RUNX1-driven malignancies, suggesting CH is a direct precursor to malignancy in RUNX1-driven HHMs. Leukemogenesis in RUNX1 and DDX41 carriers was often driven by second hits in RUNX1 and DDX41, respectively. This study may inform the development of HHM-specific clinical trials and gene-specific approaches to clinical monitoring. For example, trials investigating the potential benefits of monitoring DDX41 carriers-without HM for low-frequency second hits in DDX41 may now be beneficial. Similarly, trials monitoring carriers-without HM with RUNX1 germline variants for the acquisition of somatic variants in BCOR, PHF6, and TET2 and second hits in RUNX1 are warranted.

    AMI radio continuum observations of young stellar objects with known outflows

    We present 16 GHz (1.9 cm) deep radio continuum observations made with the Arcminute Microkelvin Imager (AMI) of a sample of low-mass young stars driving jets. We combine these new data with archival information from an extensive literature search to examine spectral energy distributions (SEDs) for each source and calculate both the radio and sub-mm spectral indices in two different scenarios: (1) fixing the dust temperature (Td) according to evolutionary class; and (2) allowing Td to vary. We use the results of this analysis to place constraints on the physical mechanisms responsible for the radio emission. From AMI data alone, as well as from model fitting to the full SED in both scenarios, we find that 80 per cent of the objects in this sample have spectral indices consistent with free-free emission. We find an average spectral index in both Td scenarios consistent with free-free emission. We examine correlations of the radio luminosity with bolometric luminosity, envelope mass and outflow force, and find that these data are consistent with the strong correlation with envelope mass seen in lower luminosity samples. We examine the errors associated with determining the radio luminosity and find that the dominant source of error is the uncertainty in the opacity index, beta. We examine the SEDs for variability in these young objects, and find evidence for possible radio flare events in the histories of L1551 IRS 5 and Serpens SMM 1.
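A radio spectral index is commonly defined through S_nu proportional to nu^alpha, so two flux densities S1 and S2 at frequencies nu1 and nu2 give alpha = ln(S2/S1) / ln(nu2/nu1). Free-free emission is usually quoted as spanning roughly alpha = -0.1 (optically thin) to alpha = +2 (optically thick). The sketch below assumes that sign convention; the function names and the second frequency and flux values in the usage line are hypothetical, not taken from the AMI data, and the simple range check ignores the measurement errors and full SED modelling the authors describe.

```python
import math

def spectral_index(s1, nu1, s2, nu2):
    """Two-point spectral index alpha, assuming S_nu is proportional to nu**alpha."""
    if min(s1, s2, nu1, nu2) <= 0 or nu1 == nu2:
        raise ValueError("flux densities and frequencies must be positive and distinct")
    return math.log(s2 / s1) / math.log(nu2 / nu1)

def consistent_with_free_free(alpha, lo=-0.1, hi=2.0):
    """Crude range check: optically thin (-0.1) to optically thick (+2) free-free."""
    return lo <= alpha <= hi

# Hypothetical source: 0.20 mJy at 5 GHz and 0.30 mJy at 16 GHz.
alpha = spectral_index(0.20, 5.0, 0.30, 16.0)  # about 0.35, inside the free-free range
```

A steeper rising index, such as alpha near 3 from thermal dust in the sub-mm, would fall outside this range, which is why separating the dust contribution matters when fitting the full SED.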

    Dolutegravir twice-daily dosing in children with HIV-associated tuberculosis: a pharmacokinetic and safety study within the open-label, multicentre, randomised, non-inferiority ODYSSEY trial

    Background: Children with HIV-associated tuberculosis (TB) have few antiretroviral therapy (ART) options. We aimed to evaluate the safety and pharmacokinetics of dolutegravir twice-daily dosing in children receiving rifampicin for HIV-associated TB. Methods: We nested a two-period, fixed-order pharmacokinetic substudy within the open-label, multicentre, randomised, controlled, non-inferiority ODYSSEY trial at research centres in South Africa, Uganda, and Zimbabwe. Children (aged 4 weeks to <18 years) with HIV-associated TB who were receiving rifampicin and twice-daily dolutegravir were eligible for inclusion. We did a 12-h pharmacokinetic profile on rifampicin and twice-daily dolutegravir and a 24-h profile on once-daily dolutegravir. Geometric mean ratios for trough plasma concentration (Ctrough), area under the plasma concentration time curve from 0 h to 24 h after dosing (AUC0–24 h), and maximum plasma concentration (Cmax) were used to compare dolutegravir concentrations between substudy days. We assessed rifampicin Cmax on the first substudy day. All children within ODYSSEY with HIV-associated TB who received rifampicin and twice-daily dolutegravir were included in the safety analysis. We described adverse events reported from starting twice-daily dolutegravir to 30 days after returning to once-daily dolutegravir. This trial is registered with ClinicalTrials.gov (NCT02259127), EudraCT (2014–002632-14), and the ISRCTN registry (ISRCTN91737921). Findings: Between Sept 20, 2016, and June 28, 2021, 37 children with HIV-associated TB (median age 11·9 years [range 0·4–17·6]; 19 [51%] female and 18 [49%] male; 36 [97%] in Africa and one [3%] in Thailand) received rifampicin with twice-daily dolutegravir and were included in the safety analysis. 20 (54%) of 37 children enrolled in the pharmacokinetic substudy, 14 of whom contributed at least one evaluable pharmacokinetic curve for dolutegravir, including 12 who had within-participant comparisons. Geometric mean ratios for rifampicin and twice-daily dolutegravir versus once-daily dolutegravir were 1·51 (90% CI 1·08–2·11) for Ctrough, 1·23 (0·99–1·53) for AUC0–24 h, and 0·94 (0·76–1·16) for Cmax. Individual dolutegravir Ctrough concentrations were higher than the 90% effective concentration (ie, 0·32 mg/L) in all children receiving rifampicin and twice-daily dolutegravir. Of 18 children with evaluable rifampicin concentrations, 15 (83%) had a Cmax of less than the optimal target concentration of 8 mg/L. Rifampicin geometric mean Cmax was 5·1 mg/L (coefficient of variation 71%). During a median follow-up of 31 weeks (IQR 30–40), 15 grade 3 or higher adverse events occurred among 11 (30%) of 37 children, and ten serious adverse events occurred among eight (22%) children, including two deaths (one tuberculosis-related death and one death due to traumatic injury). No adverse events, including deaths, were considered related to dolutegravir. Interpretation: Twice-daily dolutegravir was shown to be safe and sufficient to overcome the rifampicin enzyme-inducing effect in children, and could provide a practical ART option for children with HIV-associated TB.
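Each child in the substudy serves as their own comparison, so a geometric mean ratio can be computed from paired concentrations by averaging the log of each within-child ratio and exponentiating; a 90% CI is then formed on the log scale with a t critical value. The sketch below assumes that common approach and a user-supplied critical value (for example, about 1.796 for 11 degrees of freedom, i.e. 12 paired comparisons). The function name is hypothetical, and this is not necessarily the exact model the trial authors fitted.

```python
import math
from statistics import mean, stdev

def geometric_mean_ratio(test, reference, t_crit=None):
    """GMR of paired values test[i] / reference[i]; optional CI on the log scale.

    t_crit is a t-distribution critical value supplied by the caller
    (e.g. about 1.796 for a two-sided 90% CI with 11 degrees of freedom).
    """
    if len(test) != len(reference) or len(test) < 2:
        raise ValueError("need at least two paired observations")
    logs = [math.log(t / r) for t, r in zip(test, reference)]
    log_mean = mean(logs)
    gmr = math.exp(log_mean)
    if t_crit is None:
        return gmr
    half_width = t_crit * stdev(logs) / math.sqrt(len(logs))
    return gmr, math.exp(log_mean - half_width), math.exp(log_mean + half_width)
```

A GMR above 1 (as for Ctrough here) means concentrations on rifampicin with twice-daily dosing were, on average, higher than on once-daily dosing alone; working on the log scale keeps the ratio symmetric, so halving and doubling cancel out.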

    The FANCM:p.Arg658* truncating variant is associated with risk of triple-negative breast cancer

    Abstract: Breast cancer is a common disease partially caused by genetic risk factors. Germline pathogenic variants in DNA repair genes BRCA1, BRCA2, PALB2, ATM, and CHEK2 are associated with breast cancer risk. FANCM, which encodes a DNA translocase, has been proposed as a breast cancer predisposition gene, with greater effects for the ER-negative and triple-negative breast cancer (TNBC) subtypes. We tested the three recurrent protein-truncating variants FANCM:p.Arg658*, p.Gln1701*, and p.Arg1931* for association with breast cancer risk in 67,112 cases, 53,766 controls, and 26,662 carriers of pathogenic variants of BRCA1 or BRCA2. These three variants were also studied functionally by measuring survival and chromosome fragility in FANCM−/− patient-derived immortalized fibroblasts treated with diepoxybutane or olaparib. We observed that FANCM:p.Arg658* was associated with increased risk of ER-negative disease and TNBC (OR = 2.44, P = 0.034; and OR = 3.79, P = 0.009, respectively). In a country-restricted analysis, we confirmed the associations detected for FANCM:p.Arg658* and also found that FANCM:p.Arg1931* was associated with ER-negative breast cancer risk (OR = 1.96, P = 0.006). The functional results indicated that all three variants were deleterious, affecting cell survival and chromosome stability, with FANCM:p.Arg658* causing more severe phenotypes. In conclusion, we confirmed that the two rare FANCM deleterious variants p.Arg658* and p.Arg1931* are risk factors for ER-negative and TNBC subtypes. Overall, our data suggest that the effect of truncating variants on breast cancer risk may depend on their position in the gene. Cell sensitivity to olaparib exposure identifies a possible therapeutic option to treat FANCM-associated tumors.

    TRY plant trait database – enhanced coverage and open access

    Plant traits - the morphological, anatomical, physiological, biochemical and phenological characteristics of plants - determine how plants respond to environmental factors, affect other trophic levels, and influence ecosystem properties and their benefits and detriments to people. Plant trait data thus represent the basis for a vast area of research spanning from evolutionary biology, community and functional ecology, to biodiversity conservation, ecosystem and landscape management, restoration, biogeography and earth system modelling. Since its foundation in 2007, the TRY database of plant traits has grown continuously. It now provides unprecedented data coverage under an open access data policy and is the main plant trait database used by the research community worldwide. Increasingly, the TRY database also supports new frontiers of trait‐based plant research, including the identification of data gaps and the subsequent mobilization or measurement of new data. To support this development, in this article we evaluate the extent of the trait data compiled in TRY and analyse emerging patterns of data coverage and representativeness. The best species coverage is achieved for categorical traits, with almost complete coverage for ‘plant growth form’. However, most traits relevant for ecology and vegetation modelling are characterized by continuous intraspecific variation and trait–environmental relationships. These traits have to be measured on individual plants in their respective environment. Despite unprecedented data coverage, we observe a humbling lack of completeness and representativeness of these continuous traits in many aspects. We, therefore, conclude that reducing data gaps and biases in the TRY database remains a key challenge and requires a coordinated approach to data mobilization and trait measurements. This can only be achieved in collaboration with other initiatives.