Dropout Model Evaluation in MOOCs
The field of learning analytics needs to adopt a more rigorous approach for
predictive model evaluation that matches the complex practice of
model-building. In this work, we present a procedure to statistically test
hypotheses about model performance which goes beyond the state-of-the-practice
in the community to analyze both algorithms and feature extraction methods from
raw data. We apply this method to a series of algorithms and feature sets
derived from a large sample of Massive Open Online Courses (MOOCs). While a
complete comparison of all potential modeling approaches is beyond the scope of
this paper, we show that this approach reveals a large gap in dropout
prediction performance between forum-, assignment-, and clickstream-based
feature extraction methods, where the latter is significantly better than the
former two, which are in turn indistinguishable from one another. This work has
methodological implications for evaluating predictive or AI-based models of
student success, and practical implications for the design and targeting of
at-risk student models and interventions.
Subgroup Robustness Grows On Trees: An Empirical Baseline Investigation
Researchers have proposed many methods for fair and robust machine learning,
but comprehensive empirical evaluation of their subgroup robustness is lacking.
In this work, we address this gap in the context of tabular data, where
sensitive subgroups are clearly-defined, real-world fairness problems abound,
and prior works often do not compare to state-of-the-art tree-based models as
baselines. We conduct an empirical comparison of several previously-proposed
methods for fair and robust learning alongside state-of-the-art tree-based
methods and other baselines. Via experiments with more than model
configurations on eight datasets, we show that tree-based methods have strong
subgroup robustness, even when compared to robustness- and fairness-enhancing
methods. Moreover, the best tree-based models tend to show good performance
over a range of metrics, while robust or group-fair models can show
brittleness, with significant performance differences across different metrics
for a fixed model. We also demonstrate that tree-based models show less
sensitivity to hyperparameter configurations, and are less costly to train. Our
work suggests that tree-based ensemble models make an effective baseline for
tabular data, and are a sensible default when subgroup robustness is desired.
For associated code and detailed results, see
https://github.com/jpgard/subgroup-robustness-grows-on-trees.
Comment: To appear at Neural Information Processing Systems (NeurIPS) 2022.
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
We introduce VisIT-Bench (Visual InsTruction Benchmark), a benchmark for
evaluation of instruction-following vision-language models for real-world use.
Our starting point is curating 70 'instruction families' that we envision
instruction-tuned vision-language models should be able to address. Extending
beyond evaluations like VQAv2 and COCO, tasks range from basic recognition to
game playing and creative generation. Following curation, our dataset comprises
592 test queries, each with a human-authored instruction-conditioned caption.
These descriptions surface instruction-specific factors, e.g., for an
instruction asking about the accessibility of a storefront for wheelchair
users, the instruction-conditioned caption describes ramps/potential obstacles.
These descriptions enable 1) collecting human-verified reference outputs for
each instance; and 2) automatic evaluation of candidate multimodal generations
using a text-only LLM, aligning with human judgment. We quantify quality gaps
between models and references using both human and automatic evaluations; e.g.,
the top-performing instruction-following model wins against the GPT-4 reference
in just 27% of comparisons. VisIT-Bench is dynamic: to participate,
practitioners simply submit their model's response on the project website.
Data, code, and leaderboard are available at visit-bench.github.io
Influence of diabetes on ambulation and inflammation in men and women with symptomatic peripheral artery disease
Objective: To determine whether diabetes and sex were factors associated with ambulatory function, endothelial cell inflammation, oxidative stress, and apoptosis, and with circulating biomarkers of inflammation and antioxidant capacity in patients with peripheral artery disease (PAD) and claudication. Materials/Methods: Ambulatory function of 180 symptomatic men and women with PAD was assessed during a graded maximal treadmill test, 6-minute walk test, and 4-meter walk test. Patients were further characterized on endothelial effects of circulating factors present in the sera using a cell culture-based bioassay on primary human arterial endothelial cells, and on circulating inflammatory and vascular biomarkers. Results: Men and women with diabetes had greater prevalence (p = 0.007 and p = 0.015, respectively) of coronary artery disease (CAD) than patients without diabetes. To assure that this difference did not influence planned comparisons, the data set was stratified on CAD. Diabetic men with CAD had a lower peak walking time (PWT) during the treadmill test and a slower 4-meter gait speed compared to non-diabetic men with CAD (p < 0.05). Diabetic women with CAD had a lower PWT compared to their non-diabetic counterparts (p < 0.01). Additionally, diabetic men with CAD had higher pigment epithelium-derived factor (p < 0.05) than their non-diabetic counterparts, and diabetic women with CAD had higher leptin (p < 0.01) and interleukin-8 levels (p < 0.05). Conclusions: In patients with PAD, diabetic men and women with CAD had more severe claudication than their non-diabetic counterparts, as measured by shorter PWT, and the men had further ambulatory impairment manifested by slower 4-meter gait speed. Furthermore, the diabetic patients with CAD had elevations in interleukin-8, leptin, and PEDF.
Oral abstracts 3: RA Treatment and outcomes. O13. Validation of JADAS in all subtypes of juvenile idiopathic arthritis in a clinical setting
Background: Juvenile Arthritis Disease Activity Score (JADAS) is a 4 variable composite disease activity (DA) score for JIA (including active 10, 27 or 71 joint count (AJC), physician global (PGA), parent/child global (PGE) and ESR). The validity of JADAS for all ILAR subtypes in the routine clinical setting is unknown. We investigated the construct validity of JADAS in the clinical setting in all subtypes of JIA through application to a prospective inception cohort of UK children presenting with new onset inflammatory arthritis. Methods: JADAS 10, 27 and 71 were determined for all children in the Childhood Arthritis Prospective Study (CAPS) with complete data available at baseline. Correlation of JADAS 10, 27 and 71 with single DA markers was determined for all subtypes. All correlations were calculated using Spearman's rank statistic. Results: 262/1238 visits had sufficient data for calculation of JADAS (1028 (83%) AJC, 744 (60%) PGA, 843 (68%) PGE and 459 (37%) ESR). Median age at disease onset was 6.0 years (IQR 2.6-10.4) and 64% were female. Correlation between JADAS 10, 27 and 71 approached 1 for all subtypes. Median JADAS 71 was 5.3 (IQR 2.2-10.1) with a significant difference between median JADAS scores between subtypes (p < 0.01). Correlation of JADAS 71 with each single marker of DA was moderate to high in the total cohort (see Table 1). Overall, correlation with AJC, PGA and PGE was moderate to high and correlation with ESR, limited JC, parental pain and CHAQ was low to moderate in the individual subtypes. Correlation coefficients in the extended oligoarticular, rheumatoid factor negative and enthesitis related subtypes were interpreted with caution in view of low numbers. Conclusions: This study adds to the body of evidence supporting the construct validity of JADAS. JADAS correlates with other measures of DA in all ILAR subtypes in the routine clinical setting. 
Given the high frequency of missing ESR data, it would be useful to assess the validity of JADAS without inclusion of the ESR. Disclosure statement: All authors have declared no conflicts of interest.
Table 1. Spearman's correlation between JADAS 71 and single markers of DA by ILAR subtype
ILAR subtype               N     AJC    PGA    PGE    ESR    Limited 71 JC   Parental pain   CHAQ
Systemic onset JIA         23    0.54   0.63   0.51   0.28   0.29            0.23            0.25
Persistent oligo JIA       111   0.67   0.69   0.68   0.31   0.51            0.62            0.57
Extended oligo JIA         12    0.53   0.25   0.83   0.35   0.23            0.03            -0.07
Rheumatoid factor neg JIA  57    0.75   0.73   0.61   0.4    0.37            0.57            0.36
Rheumatoid factor pos JIA  7     0.53   0.14   0.41   0.6    0.14            0.41            -0.47
Enthesitis related JIA     9     0.34   0.05   0.69   0.85   -0.12           0.69            0.84
Psoriatic JIA              19    0.59   0.50   0.71   0.43   0.4             0.7             0.37
Undifferentiated JIA       7     0.81   0.83   0.9    0.7    0.81            0.79            0.8
Unknown subtype            17    0.37   0.56   0.48   0.5    0.45            0.42            0.66
Total cohort               262   0.59   0.64   0.61   0.53   0.41            0.53            0.4
N = number of children; CHAQ = Childhood health assessment questionnaire.
Comprehensive molecular characterization of gastric adenocarcinoma
Gastric cancer is a leading cause of cancer deaths, but analysis of its molecular and clinical characteristics has been complicated by histological and aetiological heterogeneity. Here we describe a comprehensive molecular evaluation of 295 primary gastric adenocarcinomas as part of The Cancer Genome Atlas (TCGA) project. We propose a molecular classification dividing gastric cancer into four subtypes: tumours positive for Epstein–Barr virus, which display recurrent PIK3CA mutations, extreme DNA hypermethylation, and amplification of JAK2, CD274 (also known as PD-L1) and PDCD1LG2 (also known as PD-L2); microsatellite unstable tumours, which show elevated mutation rates, including mutations of genes encoding targetable oncogenic signalling proteins; genomically stable tumours, which are enriched for the diffuse histological variant and mutations of RHOA or fusions involving RHO-family GTPase-activating proteins; and tumours with chromosomal instability, which show marked aneuploidy and focal amplification of receptor tyrosine kinases. Identification of these subtypes provides a roadmap for patient stratification and trials of targeted therapies.