379 research outputs found

    Dropout Model Evaluation in MOOCs

    The field of learning analytics needs to adopt a more rigorous approach for predictive model evaluation that matches the complex practice of model-building. In this work, we present a procedure to statistically test hypotheses about model performance which goes beyond the state-of-the-practice in the community to analyze both algorithms and feature extraction methods from raw data. We apply this method to a series of algorithms and feature sets derived from a large sample of Massive Open Online Courses (MOOCs). While a complete comparison of all potential modeling approaches is beyond the scope of this paper, we show that this approach reveals a large gap in dropout prediction performance between forum-, assignment-, and clickstream-based feature extraction methods, where the latter is significantly better than the former two, which are in turn indistinguishable from one another. This work has methodological implications for evaluating predictive or AI-based models of student success, and practical implications for the design and targeting of at-risk student models and interventions.

    Subgroup Robustness Grows On Trees: An Empirical Baseline Investigation

    Researchers have proposed many methods for fair and robust machine learning, but comprehensive empirical evaluation of their subgroup robustness is lacking. In this work, we address this gap in the context of tabular data, where sensitive subgroups are clearly-defined, real-world fairness problems abound, and prior works often do not compare to state-of-the-art tree-based models as baselines. We conduct an empirical comparison of several previously-proposed methods for fair and robust learning alongside state-of-the-art tree-based methods and other baselines. Via experiments with more than 340,000 model configurations on eight datasets, we show that tree-based methods have strong subgroup robustness, even when compared to robustness- and fairness-enhancing methods. Moreover, the best tree-based models tend to show good performance over a range of metrics, while robust or group-fair models can show brittleness, with significant performance differences across different metrics for a fixed model. We also demonstrate that tree-based models show less sensitivity to hyperparameter configurations, and are less costly to train. Our work suggests that tree-based ensemble models make an effective baseline for tabular data, and are a sensible default when subgroup robustness is desired. For associated code and detailed results, see https://github.com/jpgard/subgroup-robustness-grows-on-trees. Comment: To appear at Neural Information Processing Systems (NeurIPS) 2022. Code at https://github.com/jpgard/subgroup-robustness-grows-on-tree

    VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

    We introduce VisIT-Bench (Visual InsTruction Benchmark), a benchmark for evaluation of instruction-following vision-language models for real-world use. Our starting point is curating 70 'instruction families' that we envision instruction tuned vision-language models should be able to address. Extending beyond evaluations like VQAv2 and COCO, tasks range from basic recognition to game playing and creative generation. Following curation, our dataset comprises 592 test queries, each with a human-authored instruction-conditioned caption. These descriptions surface instruction-specific factors, e.g., for an instruction asking about the accessibility of a storefront for wheelchair users, the instruction-conditioned caption describes ramps/potential obstacles. These descriptions enable 1) collecting human-verified reference outputs for each instance; and 2) automatic evaluation of candidate multimodal generations using a text-only LLM, aligning with human judgment. We quantify quality gaps between models and references using both human and automatic evaluations; e.g., the top-performing instruction-following model wins against the GPT-4 reference in just 27% of the comparisons. VisIT-Bench is dynamic: to participate, practitioners simply submit their model's response on the project website. Data, code and leaderboard are available at visit-bench.github.io

    Influence of diabetes on ambulation and inflammation in men and women with symptomatic peripheral artery disease

    Objective: To determine whether diabetes and sex were factors associated with ambulatory function, endothelial cell inflammation, oxidative stress, and apoptosis, and with circulating biomarkers of inflammation and antioxidant capacity in patients with peripheral artery disease (PAD) and claudication. Materials/Methods: Ambulatory function of 180 symptomatic men and women with PAD was assessed during a graded maximal treadmill test, 6-minute walk test, and 4-meter walk test. Patients were further characterized on endothelial effects of circulating factors present in the sera using a cell culture-based bioassay on primary human arterial endothelial cells, and on circulating inflammatory and vascular biomarkers. Results: Men and women with diabetes had greater prevalence (p = 0.007 and p = 0.015, respectively) of coronary artery disease (CAD) than patients without diabetes. To assure that this difference did not influence planned comparisons, the data set was stratified on CAD. Diabetic men with CAD had a lower peak walking time (PWT) during the treadmill test and a slower 4-meter gait speed compared to non-diabetic men with CAD (p < 0.05). Diabetic women with CAD had a lower PWT compared to their non-diabetic counterparts (p < 0.01). Additionally, diabetic men with CAD had higher pigment epithelium-derived factor (PEDF) (p < 0.05) than their non-diabetic counterparts, and diabetic women with CAD had higher leptin (p < 0.01) and interleukin-8 levels (p < 0.05). Conclusions: In patients with PAD, diabetic men and women with CAD had more severe claudication than their non-diabetic counterparts, as measured by shorter PWT, and the men had further ambulatory impairment manifested by slower 4-meter gait speed. Furthermore, the diabetic patients with CAD had elevations in interleukin-8, leptin, and PEDF.

    Oral abstracts 3: RA Treatment and outcomes. O13. Validation of JADAS in all subtypes of juvenile idiopathic arthritis in a clinical setting

    Background: Juvenile Arthritis Disease Activity Score (JADAS) is a 4 variable composite disease activity (DA) score for JIA (including active 10, 27 or 71 joint count (AJC), physician global (PGA), parent/child global (PGE) and ESR). The validity of JADAS for all ILAR subtypes in the routine clinical setting is unknown. We investigated the construct validity of JADAS in the clinical setting in all subtypes of JIA through application to a prospective inception cohort of UK children presenting with new onset inflammatory arthritis. Methods: JADAS 10, 27 and 71 were determined for all children in the Childhood Arthritis Prospective Study (CAPS) with complete data available at baseline. Correlation of JADAS 10, 27 and 71 with single DA markers was determined for all subtypes. All correlations were calculated using Spearman's rank statistic. Results: 262/1238 visits had sufficient data for calculation of JADAS (1028 (83%) AJC, 744 (60%) PGA, 843 (68%) PGE and 459 (37%) ESR). Median age at disease onset was 6.0 years (IQR 2.6-10.4) and 64% were female. Correlation between JADAS 10, 27 and 71 approached 1 for all subtypes. Median JADAS 71 was 5.3 (IQR 2.2-10.1) with a significant difference between median JADAS scores between subtypes (p < 0.01). Correlation of JADAS 71 with each single marker of DA was moderate to high in the total cohort (see Table 1). Overall, correlation with AJC, PGA and PGE was moderate to high and correlation with ESR, limited JC, parental pain and CHAQ was low to moderate in the individual subtypes. Correlation coefficients in the extended oligoarticular, rheumatoid factor negative and enthesitis related subtypes were interpreted with caution in view of low numbers. Conclusions: This study adds to the body of evidence supporting the construct validity of JADAS. JADAS correlates with other measures of DA in all ILAR subtypes in the routine clinical setting. Given the high frequency of missing ESR data, it would be useful to assess the validity of JADAS without inclusion of the ESR. Disclosure statement: All authors have declared no conflicts of interest.

    Table 1. Spearman's correlation between JADAS 71 and single markers of DA by ILAR subtype (N = number of children; CHAQ = Childhood health assessment questionnaire)

    ILAR subtype                 N     AJC    PGA    PGE    ESR    Limited 71 JC   Parental pain   CHAQ
    Systemic onset JIA           23    0.54   0.63   0.51   0.28   0.29            0.23            0.25
    Persistent oligo JIA         111   0.67   0.69   0.68   0.31   0.51            0.62            0.57
    Extended oligo JIA           12    0.53   0.25   0.83   0.35   0.23            0.03            -0.07
    Rheumatoid factor neg JIA    57    0.75   0.73   0.61   0.4    0.37            0.57            0.36
    Rheumatoid factor pos JIA    7     0.53   0.14   0.41   0.6    0.14            0.41            -0.47
    Enthesitis related JIA       9     0.34   0.05   0.69   0.85   -0.12           0.69            0.84
    Psoriatic JIA                19    0.59   0.50   0.71   0.43   0.4             0.7             0.37
    Undifferentiated JIA         7     0.81   0.83   0.9    0.7    0.81            0.79            0.8
    Unknown subtype              17    0.37   0.56   0.48   0.5    0.45            0.42            0.66
    Total cohort                 262   0.59   0.64   0.61   0.53   0.41            0.53            0.4