121 research outputs found
Assessing the association between pre-course metrics of student preparation and student performance in introductory statistics: Results from early data on simulation-based inference vs. nonsimulation based inference
The recent simulation-based inference (SBI) movement in algebra-based
introductory statistics courses (Stat 101) has provided preliminary evidence of
improved student conceptual understanding and retention. However, little is
known about whether these positive effects are preferentially distributed
across types of students entering the course. We consider how two metrics of
Stat 101 student preparation (pre-course performance on concept inventory and
math ACT score) may or may not be associated with end of course student
performance on conceptual inventories. Students across all preparation levels
tended to show improvement in Stat 101, but more improvement was observed
across all student preparation levels in early versions of a SBI course.
Furthermore, students' gains tended to be similar regardless of whether
students entered the course with more preparation or less. Recent data on a
sample of students using a current version of an SBI course showed similar
results, though direct comparison with non-SBI students was not possible.
Overall, our analysis provides additional evidence that SBI curricula are
effective at improving students' conceptual understanding of statistical ideas
post-course regardless of student preparation. Further work is needed to better
understand nuances of student improvement based on other student demographics,
prior coursework, as well as instructor and institutional variables. Comment: 16 page
Combating anti-statistical thinking using simulation-based methods throughout the undergraduate curriculum
The use of simulation-based methods for introducing inference is growing in
popularity for the Stat 101 course, due in part to increasing evidence of the
methods' ability to improve students' statistical thinking. This impact comes
from simulation-based methods (a) clearly presenting the overarching logic of
inference, (b) strengthening ties between statistics and probability or
mathematical concepts, (c) encouraging a focus on the entire research process,
(d) facilitating student thinking about advanced statistical concepts, (e)
allowing more time to explore, do, and talk about real research and messy data,
and (f) acting as a firmer foundation on which to build statistical intuition.
Thus, we argue that simulation-based inference should be an entry point to an
undergraduate statistics program for all students, and that simulation-based
inference should be used throughout all undergraduate statistics courses. In
order to achieve this goal and fully recognize the benefits of simulation-based
inference on the undergraduate statistics program we will need to break free of
historical forces tying undergraduate statistics curricula to mathematics,
consider radical and innovative new pedagogical approaches in our courses,
fully implement assessment-driven content innovations, and embrace computation
throughout the curriculum. Comment: To be published in "The American Statistician
Challenging the State of the Art in Post-Introductory Statistics: Preparation, Concepts, and Pedagogy
The demands for a statistically literate society are increasing, and the introductory statistics course (Stat 101) remains the primary venue for learning statistics for the majority of high school and undergraduate students. After three decades of very fruitful activity in the areas of pedagogy and assessment, but with comparatively little pressure for rethinking the content of this course, the statistics education community has recently turned its attention to the use of randomization-based methods to illustrate core concepts of statistical inference. This new focus not only presents an opportunity to address documented shortcomings in the standard Stat 101 course (for example, improving students’ reasoning about inference), but also provides an impetus for rethinking the timing of the introduction of multivariable statistical methods (for example, multiple regression and general linear models). Multivariable methods dominate modern statistical practice but are rarely seen in the introductory course. Instead, these methods have traditionally been relegated to second courses in statistics for students with a background in calculus and linear algebra. Recently, curricula have been developed to bring multivariable content to students who have only taken a Stat 101 course. However, these courses tend to focus on models and model-building as an end in itself. We have developed a preliminary version of an integrated one- to two-semester curriculum which introduces students to the core logic of statistical inference through randomization methods, and then introduces students to approaches for protecting against confounding and variability through multivariable statistical design and analysis techniques.
The course has been developed by putting primary emphasis on the development of students’ conceptual understanding in an intuitive, cyclical, active-learning pedagogy, while continuing to emphasize the overall process of statistical investigations, from asking questions and collecting data through making inferences and drawing conclusions. The curriculum successfully introduces introductory statistics students to multivariable techniques in their first or second course
Broadening the Impact and Effectiveness of Simulation-Based Curricula for Introductory Statistics
The demands for a statistically literate society are increasing, and the introductory statistics course “Stat 101” remains the primary venue for learning statistics for the majority of high school and undergraduate students. After three decades of very fruitful activity in the areas of pedagogy and assessment, but with comparatively little pressure for rethinking the content of this course, the statistics education community has recently turned its attention to focusing on simulation-based methods, including bootstrapping and permutation tests, to illustrate core concepts of statistical inference within the context of the overall statistical investigative process. This new focus presents an opportunity to address documented shortcomings in the standard Stat 101 course (e.g., seeing the big picture; improving statistical thinking over mere knowledge of procedures).
Our group has developed and implemented one of the first cohesive curricula that (a) emphasizes the core logic of inference using simulation-based methods in an intuitive, cyclical, active-learning pedagogy, and (b) emphasizes the overall process of statistical investigations, from asking questions and collecting data through making inferences and drawing conclusions. Improved conceptual understanding and retention of inference and study design, which had been observed when using early versions of the curriculum at a single institution, are now being evaluated at dozens of institutions across the country with thousands of students using the fully integrated, stand-alone version of the curriculum. Encouraging preliminary results continue to be observed.
We are now leveraging the tremendous national momentum and excitement about the approach to greatly expand implementations of simulation-based curricula by offering workshops around the country to diverse sets of faculty and by offering numerous online support structures, including a blog, freely available applets, free instructor materials, learning objective-based instructional videos, free instructor-focused training videos, a listserv, and peer-reviewed publications covering both rationale and assessment results. Many hundreds of instructors have been directly impacted by our workshops and hundreds more through access to the free online materials. We are also in the midst of evaluating widespread transferability of the approach across diverse institutions, students, and learning environments and deepening our understanding of how students’ attitudes and conceptual understanding develop using this approach through an assessment project involving concept and attitude inventories with over 10,000 students across 200 different instructors
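The abstract above names permutation tests as one of the simulation-based methods used to convey the logic of inference. As a rough illustration only, and not the curriculum's own applets or materials, a two-group permutation test might be sketched as follows. The function name, the choice of a difference in means as the statistic, and the example scores are all assumptions made for this sketch.

```python
import random

def permutation_test(group_a, group_b, n_shuffles=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Hypothetical helper for illustration; not taken from the source.
    """
    rng = random.Random(seed)
    n_a = len(group_a)

    def mean_diff(values):
        first, second = values[:n_a], values[n_a:]
        return sum(first) / len(first) - sum(second) / len(second)

    pooled = list(group_a) + list(group_b)
    observed = mean_diff(pooled)

    extreme = 0
    for _ in range(n_shuffles):
        # Shuffling the pooled values mimics a world where group labels are arbitrary.
        rng.shuffle(pooled)
        if abs(mean_diff(pooled)) >= abs(observed) - 1e-12:
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_shuffles + 1)
    return observed, p_value

# Made-up scores for two small groups (not data from any study listed here).
scores_a = [72, 85, 90, 66, 78]
scores_b = [60, 71, 64, 69, 58]
observed_diff, p = permutation_test(scores_a, scores_b)
```

The p-value is simply the share of shuffles that produce a difference at least as large as the observed one, which is the "could chance alone explain this?" question such courses put at the center.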
Identification of novel genetic susceptibility loci for Behçet's disease using a genome-wide association study
Introduction: Behçet's disease is a chronic systemic inflammatory disease that remains incompletely understood. Herein, we perform the first genome-wide association study in Behçet's disease
Quantitative Evidence for the Use of Simulation and Randomization in the Introductory Statistics Course
The use of simulation and randomization in the introductory statistics course is gaining popularity, but what evidence is there that these approaches are improving students’ conceptual understanding and attitudes as we hope? In this talk I will discuss evidence from early full-length versions of such a curriculum, covering issues such as (a) items and scales showing improved conceptual performance compared to traditional curriculum, (b) transferability of findings to different institutions, (c) retention of conceptual understanding post-course and (d) student attitudes. Along the way I will discuss a few areas in which students in both simulation/randomization courses and the traditional course still perform poorly on standardized assessments
QUANTITATIVE EVIDENCE FOR THE USE SIMULATION AND RANDOMIZATION IN THE INTRODUCTORY STATISTICS COURSE
EMR-linked GWAS study: investigation of variation landscape of loci for body mass index in children
Common variations at the loci harboring the fat mass and obesity gene (FTO), MC4R, and TMEM18 are consistently reported as being associated with obesity and body mass index (BMI), especially in adult populations. To confirm this effect in the pediatric population, five European ancestry cohorts from the pediatric eMERGE-II network (CCHMC-BCH) were evaluated. Method: Data on 5049 samples of European ancestry were obtained from the Electronic Medical Records (EMRs) of two large academic centers in five different genotyped cohorts. For all available samples, gender, age, height, and weight were collected and BMI was calculated. To account for age and sex differences in BMI, BMI z-scores were generated using the 2000 Centers for Disease Control and Prevention (CDC) growth charts. A genome-wide association study (GWAS) was performed with BMI z-score. After removing missing data and outliers based on principal components (PC) analyses, 2860 samples were used for the GWAS. The association between each single nucleotide polymorphism (SNP) and BMI was tested using linear regression adjusting for age, gender, and PC by cohort. The effects of SNPs were modeled assuming additive, recessive, and dominant effects of the minor allele. Meta-analysis was conducted using a weighted z-score approach. Results: The mean age of subjects was 9.8 years (range 2–19). The proportion of male subjects was 56%. In these cohorts, 14% of samples had a BMI ≥ 95th percentile and 28% ≥ 85th percentile. Meta-analyses produced a signal at the 16q12 genomic region with the best result of p = 1.43 × 10^-7 [p(rec) = 7.34 × 10^-8] for the SNP rs8050136 at the first intron of the FTO gene (z = 5.26) and with no heterogeneity between cohorts (p = 0.77). Under a recessive model, another published SNP at this locus, rs1421085, generates the best result [z = 5.782, p(rec) = 8.21 × 10^-9]. Imputation in this region using dense 1000-Genome and Hapmap CEU samples revealed 71 SNPs with p < 10^-6, all at the first intron of the FTO locus.
When heterogeneity was permitted between cohorts, signals were also obtained in other previously identified loci, including MC4R (rs12964056, p = 6.87 × 10^-7, z = -4.98), cholecystokinin CCK (rs8192472, p = 1.33 × 10^-6, z = -4.85), Interleukin 15 (rs2099884, p = 1.27 × 10^-5, z = 4.34), low density lipoprotein receptor-related protein 1B [LRP1B (rs7583748, p = 0.00013, z = -3.81)] and near transmembrane protein 18 (TMEM18) (rs7561317, p = 0.001, z = -3.17). We also detected a novel locus at chromosome 3 at COL6A5 [best SNP = rs1542829, minor allele frequency (MAF) of 5%, p = 4.35 × 10^-9, z = 5.89]. Conclusion: An EMR-linked cohort study demonstrates that the BMI-Z measurements can be successfully extracted and linked to genomic data with meaningful confirmatory results. We verified the high prevalence of childhood overweight and obesity in our cohort (28%). In addition, our data indicate that genetic variants in the first intron of FTO, a known adult genetic risk factor for BMI, are also robustly associated with BMI in the pediatric population
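The abstract states that per-cohort results were combined with a weighted z-score approach but does not give the weights. A minimal sketch is shown below, assuming the common convention of weighting each cohort by the square root of its sample size; the function name and that weighting choice are assumptions, not details confirmed by the source.

```python
import math

def weighted_z_meta(z_scores, sample_sizes):
    """Combine per-cohort z-scores into one z-score and a two-sided p-value.

    Assumes weights proportional to sqrt(n); the source does not state its weights.
    """
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    z_combined = numerator / denominator
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z_combined) / math.sqrt(2))
    return z_combined, p_value

# Converting a single z-score to a p-value uses the same tail formula.
_, p_single = weighted_z_meta([5.26], [2860])
```

Under this normal-tail conversion, z = 5.26 maps to a two-sided p-value of roughly 1.4 × 10^-7, which is in line with the value reported for rs8050136.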
Phenome-wide association study (PheWAS) in EMR-linked pediatric cohorts, genetically links PLCL1 to speech language development and IL5-IL13 to Eosinophilic Esophagitis
Objective: We report the first pediatric specific Phenome-Wide Association Study (PheWAS) using electronic medical records (EMRs). Given the early success of PheWAS in adult populations, we investigated the feasibility of this approach in pediatric cohorts in which associations between a previously known genetic variant and a wide range of clinical or physiological traits were evaluated. Although computationally intensive, this approach has potential to reveal disease mechanistic relationships between a variant and a network of phenotypes. Method: Data on 5049 samples of European ancestry were obtained from the EMRs of two large academic centers in five different genotyped cohorts. Recently, these samples have undergone whole genome imputation. After standard quality controls, removing missing data and outliers based on principal components analyses (PCA), 4268 samples were used for the PheWAS study. We scanned for associations between 2476 single-nucleotide polymorphisms (SNP) with available genotyping data from previously published GWAS studies and 539 EMR-derived phenotypes. The false discovery rate was calculated and, for any new PheWAS findings, a permutation approach (with up to 1,000,000 trials) was implemented. Results: This PheWAS found a variety of common variants (MAF > 10%) with prior GWAS associations in our pediatric cohorts including Juvenile Rheumatoid Arthritis (JRA), Asthma, Autism and Pervasive Developmental Disorder (PDD) and Type 1 Diabetes with a false discovery rate < 0.05 and power of study above 80%. 
In addition, several new PheWAS findings were identified including a cluster of association near the NDFIP1 gene for mental retardation (best SNP rs10057309, p = 4.33 × 10^-7, OR = 1.70, 95% CI = 1.38–2.09); association near PLCL1 gene for developmental delays and speech disorder (best SNP rs1595825, p = 1.13 × 10^-8, OR = 0.65, 95% CI = 0.57–0.76); a cluster of associations in the IL5-IL13 region with Eosinophilic Esophagitis (EoE) (best at rs12653750, p = 3.03 × 10^-9, OR = 1.73, 95% CI = 1.44–2.07), previously implicated in asthma, allergy, and eosinophilia; and association of variants in GCKR and JAZF1 with allergic rhinitis in our pediatric cohorts (best SNP rs780093, p = 2.18 × 10^-5, OR = 1.39, 95% CI = 1.19–1.61), previously demonstrated in metabolic disease and diabetes in adults. Conclusion: The PheWAS approach with re-mapping ICD-9 structured codes for our European-origin pediatric cohorts, as with the previous adult studies, finds many previously reported associations as well as presents the discovery of associations with potentially important clinical implications
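The PheWAS abstract reports that a false discovery rate was calculated across many SNP-phenotype tests without naming the procedure. One widely used option is the Benjamini-Hochberg adjustment; the sketch below assumes that procedure purely for illustration, and the function name and example p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Illustrative only; the source does not say which FDR method was used.
    """
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Made-up p-values for four hypothetical SNP-phenotype tests.
example_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A test would then be called significant at an FDR of 0.05 when its adjusted value falls below 0.05.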
Evaluation of the feasibility, diagnostic yield, and clinical utility of rapid genome sequencing in infantile epilepsy (Gene-STEPS): an international, multicentre, pilot cohort study
BACKGROUND: Most neonatal and infantile-onset epilepsies have presumed genetic aetiologies, and early genetic diagnoses have the potential to inform clinical management and improve outcomes. We therefore aimed to determine the feasibility, diagnostic yield, and clinical utility of rapid genome sequencing in this population. METHODS: We conducted an international, multicentre, cohort study (Gene-STEPS), which is a pilot study of the International Precision Child Health Partnership (IPCHiP). IPCHiP is a consortium of four paediatric centres with tertiary-level subspecialty services in Australia, Canada, the UK, and the USA. We recruited infants with new-onset epilepsy or complex febrile seizures from IPCHiP centres, who were younger than 12 months at seizure onset. We excluded infants with simple febrile seizures, acute provoked seizures, known acquired cause, or known genetic cause. Blood samples were collected from probands and available biological parents. Clinical data were collected from medical records, treating clinicians, and parents. Trio genome sequencing was done when both parents were available, and duo or singleton genome sequencing was done when one or neither parent was available. Site-specific protocols were used for DNA extraction and library preparation. Rapid genome sequencing and analysis was done at clinically accredited laboratories, and results were returned to families. We analysed summary statistics for cohort demographic and clinical characteristics and the timing, diagnostic yield, and clinical impact of rapid genome sequencing. FINDINGS: Between Sept 1, 2021, and Aug 31, 2022, we enrolled 100 infants with new-onset epilepsy, of whom 41 (41%) were girls and 59 (59%) were boys. Median age of seizure onset was 128 days (IQR 46-192). For 43 (43% [binomial distribution 95% CI 33-53]) of 100 infants, we identified genetic diagnoses, with a median time from seizure onset to rapid genome sequencing result of 37 days (IQR 25-59). 
Genetic diagnosis was associated with neonatal seizure onset versus infantile seizure onset (14 [74%] of 19 vs 29 [36%] of 81; p=0·0027), referral setting (12 [71%] of 17 for intensive care, 19 [44%] of 43 non-intensive care inpatient, and 12 [28%] of 40 outpatient; p=0·0178), and epilepsy syndrome (13 [87%] of 15 for self-limited epilepsies, 18 [35%] of 51 for developmental and epileptic encephalopathies, 12 [35%] of 34 for other syndromes; p=0·001). Rapid genome sequencing revealed genetic heterogeneity, with 34 unique genes or genomic regions implicated. Genetic diagnoses had immediate clinical utility, informing treatment (24 [56%] of 43), additional evaluation (28 [65%]), prognosis (37 [86%]), and recurrence risk counselling (all cases). INTERPRETATION: Our findings support the feasibility of implementation of rapid genome sequencing in the clinical care of infants with new-onset epilepsy. Longitudinal follow-up is needed to further assess the role of rapid genetic diagnosis in improving clinical, quality-of-life, and economic outcomes. FUNDING: American Academy of Pediatrics, Boston Children's Hospital Children's Rare Disease Cohorts Initiative, Canadian Institutes of Health Research, Epilepsy Canada, Feiga Bresver Academic Foundation, Great Ormond Street Hospital Charity, Medical Research Council, Murdoch Children's Research Institute, National Institute of Child Health and Human Development, National Institute for Health and Care Research Great Ormond Street Hospital Biomedical Research Centre, One8 Foundation, Ontario Brain Institute, Robinson Family Initiative for Transformational Research, The Royal Children's Hospital Foundation, University of Toronto McLaughlin Centre
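The diagnostic yield above is reported as 43 of 100 with a "binomial distribution 95% CI". The abstract does not say which binomial interval was used; the sketch below assumes the exact (Clopper-Pearson) interval and finds its bounds by bisection using only the standard library. Function names are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for k successes in n trials.

    Assumed method; the source only says "binomial distribution 95% CI".
    """
    def solve(func, target):
        # func is decreasing in p, so move right while it is still above target.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if func(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper

ci_low, ci_high = clopper_pearson(43, 100)
```

The resulting bounds can be compared against the interval reported in the abstract; other binomial intervals (for example Wilson or normal-approximation) would give slightly different endpoints.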