Machine Learning Models for Educational Platforms
Scaling up education online and onlife presents numerous key challenges, such as hardly manageable class sizes, overwhelming content alternatives, and academic dishonesty during remote interaction. However, thanks to the wider availability of learning-related data and increasingly high-performance computing, Artificial Intelligence has the potential to turn such challenges into an unparalleled opportunity. One of its sub-fields, Machine Learning, enables machines to receive data and learn for themselves, without being programmed with explicit rules. Bringing this intelligent support to education at large scale has a number of advantages, such as avoiding manual error-prone tasks and reducing the chance of learner misconduct. Planning, collecting, developing, and predicting become essential steps to bring it concretely into real-world education.
This thesis deals with the design, implementation, and evaluation of Machine Learning models in the context of online educational platforms deployed at large scale. Constructing and assessing the performance of intelligent models is a crucial step towards increasing reliability and convenience of such an educational medium. The contributions result in large data sets and high-performing models that capitalize on Natural Language Processing, Human Behavior Mining, and Machine Perception. The model decisions aim to support stakeholders over the instructional pipeline, specifically on content categorization, content recommendation, learners’ identity verification, and learners’ sentiment analysis. Past research in this field often relied on statistical processes hardly applicable at large scale. Through our studies, we explore opportunities and challenges introduced by Machine Learning for the above goals, a relevant and timely topic in literature.
Supported by extensive experiments, our work reveals a clear opportunity in combining human and machine sensing for researchers interested in online education. Our findings illustrate the feasibility of designing and assessing Machine Learning models for categorization, recommendation, authentication, and sentiment prediction in this research area. Our results provide guidelines on model motivation, data collection, model design, and analysis techniques concerning the above applicative scenarios. Researchers can use our findings to improve data collection on educational platforms, reduce bias in data and models, and increase the effectiveness and reliability of their models, among other uses. We expect that this thesis can further support the adoption of Machine Learning models in educational platforms, strengthening the role of data as a precious asset. The thesis outputs are publicly available at https://www.mirkomarras.com.
Evaluation framework for context-aware speaker recognition in noisy smart living environments
The integration of voice control into connected devices is expected to improve the efficiency and comfort of our daily lives. However, the underlying biometric systems often impose constraints on the individual or the environment during interaction (e.g., quiet surroundings). Such constraints have to be surmounted in order to seamlessly recognize individuals. In this paper, we propose an evaluation framework for speaker recognition in noisy smart living environments. To this end, we designed a taxonomy of sounds (e.g., home-related, mechanical) that characterize representative indoor and outdoor environments where speaker recognition is adopted. Then, we devised an approach for off-line simulation of challenging noisy conditions in vocal audio originally collected under controlled environments, by leveraging our taxonomy. Our approach adds a (combination of) sound(s) belonging to the target environment into the current vocal example. Experiments on a large-scale public dataset and two state-of-the-art speaker recognition models show that adding certain background sounds to clean vocal audio leads to a substantial deterioration of recognition performance. In several noisy settings, our findings reveal that a speaker recognition model might end up making unreliable decisions. Our framework is intended to help system designers evaluate performance deterioration and develop speaker recognition models that are more robust in smart living environments.
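The abstract describes injecting environment sounds into clean recordings without giving implementation details; a minimal sketch of the core operation, mixing a background sound into clean audio at a chosen signal-to-noise ratio, might look like the following (the function name and SNR parameterization are illustrative assumptions, not the paper's actual code):

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Mix a background noise into a clean speech waveform at a target SNR (dB).

    Both inputs are 1-D float arrays at the same sampling rate; the noise is
    tiled or truncated to match the speech length before scaling.
    """
    # Match lengths by tiling/truncating the noise
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    # Scale the noise so the speech-to-noise power ratio equals snr_db
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    noise = noise * np.sqrt(target_noise_power / noise_power)
    return speech + noise
```

Lower `snr_db` values simulate harsher environments (e.g., a running appliance close to the microphone), which is how progressively challenging noisy conditions could be generated off-line from a single clean corpus.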
The More Secure, The Less Equally Usable: Gender and Ethnicity (Un)fairness of Deep Face Recognition along Security Thresholds
Face biometrics are playing a key role in making modern smart city applications more secure and usable. Commonly, the recognition threshold of a face recognition system is adjusted based on the degree of security required by the considered use case. The likelihood of a match can, for instance, be decreased by setting a high threshold in the case of a payment transaction verification. Prior work in face recognition has unfortunately shown that error rates are usually higher for certain demographic groups. These disparities have hence brought into question the fairness of systems empowered with face biometrics. In this paper, we investigate the extent to which disparities among demographic groups change under different security levels. Our analysis includes ten face recognition models, three security thresholds, and six demographic groups based on gender and ethnicity. Experiments show that the higher the security of the system is, the higher the disparities in usability among demographic groups are. Compelling unfairness issues hence exist and urge countermeasures in real-world high-stakes environments requiring severe security levels.
Comment: Accepted as a full paper at the 2nd International Workshop on Artificial Intelligence Methods for Smart Cities (AISC 2022).
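The usability disparity the paper measures can be made concrete with a small sketch: the false rejection rate (share of genuine users rejected) per demographic group at increasingly strict thresholds. The score distributions below are synthetic illustrations, not data from the study:

```python
import numpy as np

def frr_at_threshold(genuine_scores, threshold):
    """False rejection rate: share of genuine comparisons scored below the threshold."""
    genuine_scores = np.asarray(genuine_scores)
    return float(np.mean(genuine_scores < threshold))

# Illustrative (synthetic) genuine-match similarity scores for two groups
rng = np.random.default_rng(0)
group_a = rng.normal(0.70, 0.10, 5000)
group_b = rng.normal(0.62, 0.12, 5000)

for thr in (0.5, 0.6, 0.7):  # increasingly strict security thresholds
    print(thr, frr_at_threshold(group_a, thr), frr_at_threshold(group_b, thr))
```

With score distributions like these, raising the threshold increases rejections for both groups but widens the gap between them, which mirrors the paper's finding that higher security amplifies usability disparities.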
Combining mitigation treatments against biases in personalized rankings: Use case on item popularity
Historical interactions leveraged by recommender systems are often non-uniformly distributed across items. Certain items therefore end up being under-recommended, even though they are of interest to consumers. Existing treatments for mitigating these biases act at a single step of the pipeline (either pre-, in-, or post-processing), and it remains unanswered whether simultaneously introducing treatments throughout the pipeline leads to better mitigation. In this paper, we analyze the impact of bias treatments along the steps of the pipeline under a use case on popularity bias. Experiments show that, with small losses in accuracy, the combination of treatments leads to better trade-offs than treatments applied separately. Our findings call for treatments rooting out bias at different steps simultaneously.
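Of the three pipeline stages mentioned, the post-processing step is the simplest to illustrate: re-rank a candidate list by penalizing each item's predicted relevance with its popularity. This is a generic sketch of that idea, not the paper's specific treatment; the function name and the linear penalty are assumptions:

```python
def rerank_with_popularity_penalty(scores, popularity, alpha=0.5):
    """Post-processing step: demote popular items by penalizing relevance scores.

    `scores` and `popularity` map item ids to predicted relevance and
    normalized popularity (0..1); `alpha` trades recommendation accuracy
    for exposure of under-recommended items.
    """
    adjusted = {item: scores[item] - alpha * popularity[item] for item in scores}
    return sorted(adjusted, key=adjusted.get, reverse=True)
```

For example, an item with high raw relevance but very high popularity can drop below a slightly less relevant niche item. A pre-processing treatment (rebalancing the training interactions) or an in-processing one (a regularized loss) could be combined with this step, which is the combination question the paper studies.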
Interplay between upsampling and regularization for provider fairness in recommender systems
Considering the impact of recommendations on item providers is one of the duties of multi-sided recommender systems. Item providers are key stakeholders in online platforms, and their earnings and plans are influenced by the exposure their items receive in recommended lists. Prior work showed that certain minority groups of providers, characterized by a common sensitive attribute (e.g., gender or race), are disproportionately affected by indirect and unintentional discrimination. Our study in this paper handles a situation where (i) the same provider is associated with multiple items of a list suggested to a user, (ii) an item is created by more than one provider jointly, and (iii) predicted user–item relevance scores are estimated in a biased way for items of provider groups. Under this scenario, we assess disparities in relevance, visibility, and exposure, by simulating diverse representations of the minority group in the catalog and the interactions. Based on the emerged unfair outcomes, we devise a treatment that combines observation upsampling and loss regularization while learning user–item relevance scores. Experiments on real-world data demonstrate that our treatment leads to lower disparate relevance. The resulting recommended lists show fairer visibility and exposure, higher minority item coverage, and negligible loss in recommendation utility.
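The loss-regularization half of such a treatment can be sketched as a standard pointwise recommendation loss plus a penalty on the gap between provider groups' mean predicted relevance. This is a simplified illustration under assumed conventions (binary interactions, a squared-gap regularizer), not the paper's exact objective:

```python
import numpy as np

def combined_loss(scores, labels, provider_group, lam=1.0):
    """Pointwise loss sketch: recommendation error plus a provider-fairness term.

    `scores` are predicted user-item relevances in (0, 1), `labels` the observed
    interactions, and `provider_group` a 0/1 array marking items by minority (1)
    vs. majority (0) providers. The regularizer penalizes the squared gap
    between the groups' mean predicted relevance.
    """
    scores = np.clip(scores, 1e-7, 1 - 1e-7)
    # Binary cross-entropy on observed interactions
    bce = -np.mean(labels * np.log(scores) + (1 - labels) * np.log(1 - scores))
    # Disparity between mean predicted relevance of minority and majority items
    gap = scores[provider_group == 1].mean() - scores[provider_group == 0].mean()
    return bce + lam * gap ** 2
```

The upsampling half of the treatment would act on the training data instead, duplicating observations involving minority-provider items so both components push in the same direction during learning.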
Early Prediction of Conceptual Understanding in Interactive Simulations
Interactive simulations allow students to independently explore scientific phenomena and ideally infer the underlying principles through their exploration. Effectively using such environments is challenging for many students, and adaptive guidance therefore has the potential to improve student learning. Providing effective support is, however, also a challenge because it is not clear what effective inquiry in such environments looks like. Previous research in this area has mostly focused on grouping students with similar strategies or identifying learning strategies through sequence mining. In this paper, we investigate features and models for an early prediction of conceptual understanding based on clickstream data of students using an interactive Physics simulation. To this end, we measure students’ conceptual understanding through a task they need to solve through their exploration. Then, we propose a novel pipeline to transform clickstream data into predictive features, using latent feature representations and interaction frequency vectors for different components of the environment. Our results on interaction data from 192 undergraduate students show that the proposed approach is able to detect struggling students early on.
Fair Voice Biometrics: Impact of Demographic Imbalance on Group Fairness in Speaker Recognition
Speaker recognition systems are playing a key role in modern online applications. Though the susceptibility of these systems to discrimination according to group fairness metrics has recently been studied, their assessment has mainly focused on the difference in equal error rate across groups, not accounting for other fairness criteria important in anti-discrimination policies, defined for demographic groups characterized by sensitive attributes. In this paper, we therefore study how existing group fairness metrics relate to the balancing settings of the training data set in speaker recognition. We conduct this analysis by operationalizing several definitions of fairness and monitoring them under varied data balancing settings. Experiments performed on three deep neural architectures, evaluated on a data set including gender/age-based groups, show that balancing group representation positively impacts fairness and that the friction across security, usability, and fairness depends on the fairness metric and the recognition threshold.
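The equal error rate mentioned above, computed per demographic group, is the baseline fairness comparison the paper extends. A minimal sketch of an empirical EER computation from genuine and impostor scores (one of several possible estimators; the threshold sweep here is illustrative) could be:

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """EER: operating point where false acceptance and false rejection rates meet.

    `genuine` holds similarity scores for same-speaker trials, `impostor` for
    different-speaker trials; the threshold is swept over all observed scores.
    """
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([np.mean(impostor >= t) for t in thresholds])  # false acceptances
    frr = np.array([np.mean(genuine < t) for t in thresholds])    # false rejections
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2)
```

Comparing this value across gender- or age-based groups gives the "difference in equal error rate" criterion; the paper's point is that other group fairness definitions, evaluated at a shared threshold, can tell a different story.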
Trusting the Explainers: Teacher Validation of Explainable Artificial Intelligence for Course Design
Deep learning models for learning analytics have become increasingly popular over the last few years; however, these approaches are still not widely adopted in real-world settings, likely due to a lack of trust and transparency. In this paper, we tackle this issue by implementing explainable AI methods for black-box neural networks. This work focuses on the context of online and blended learning and the use case of student success prediction models. We use a pairwise study design, enabling us to investigate controlled differences between pairs of courses. Our analyses cover five course pairs that differ in one educationally relevant aspect and two popular instance-based explainable AI methods (LIME and SHAP). We quantitatively compare the distances between the explanations across courses and methods. We then validate the explanations of LIME and SHAP with 26 semi-structured interviews of university-level educators regarding which features they believe contribute most to student success, which explanations they trust most, and how they could transform these insights into actionable course design decisions. Our results show that, quantitatively, explainers significantly disagree with each other about what is important, and, qualitatively, experts themselves do not agree on which explanations are most trustworthy. All code, extended results, and the interview protocol are provided at https://github.com/epfl-ml4ed/trusting-explainers.
Comment: Accepted as a full paper at LAK 2023: The 13th International Learning Analytics and Knowledge Conference, March 13-17, 2023, Arlington, Texas, USA.
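Measuring how much two explainers disagree requires a distance between their feature-importance vectors. One simple choice, a Euclidean distance after normalizing absolute importances (the paper's actual metric may differ; this normalization scheme is an assumption), can be sketched as:

```python
import numpy as np

def explanation_distance(imp_a, imp_b):
    """Distance between two feature-importance vectors (e.g., LIME vs. SHAP).

    Each vector is normalized to unit L1 norm of absolute importances, so only
    the relative attribution pattern is compared, not the raw magnitudes,
    which differ by construction between methods.
    """
    a = np.abs(np.asarray(imp_a, dtype=float))
    b = np.abs(np.asarray(imp_b, dtype=float))
    a, b = a / a.sum(), b / b.sum()
    return float(np.linalg.norm(a - b))
```

Two explainers that attribute importance to entirely disjoint feature sets reach the maximum distance, which is the quantitative disagreement pattern the paper reports between LIME and SHAP.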
Can Feature Predictive Power Generalize? Benchmarking Early Predictors of Student Success across Flipped and Online Courses
Early predictors of student success are becoming a key tool in flipped and online courses to ensure that no student is left behind along course activities. However, with an increased interest in this area, it has become hard to keep track of what the state of the art in early success prediction is. Moreover, prior work on early success prediction based on clickstreams has mostly focused on implementing features and models for a specific online course (e.g., a MOOC). It therefore remains under-explored how different features and models enable early predictions, depending on the domain, structure, and educational setting of a given course. In this paper, we report the results of a systematic analysis of early success predictors for both flipped and online courses. In the first part, we focus on a specific flipped course. Specifically, we investigate eight feature sets, presented at top-level educational venues over the last few years, and a novel feature set proposed in this paper and tailored to this setting. We benchmark the performance of these feature sets using a Random Forest (RF) classifier, and we provide and discuss an ensemble feature set optimized for the target flipped course. In the second part, we extend our analysis to courses with different educational settings (i.e., MOOCs), domains, and structure. Our results show that (i) the ensemble of optimal features varies depending on the course setting and structure, and (ii) the predictive performance of the optimal e
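The clickstream feature sets benchmarked here typically start from something like interaction-frequency vectors over the course's components. A minimal sketch of that preprocessing step (the event format and normalization are illustrative assumptions, not any specific feature set from the paper) might be:

```python
from collections import Counter

def frequency_features(clickstream, components):
    """Turn a student's clickstream into an interaction-frequency vector.

    `clickstream` is a list of (component, action) events; the returned vector
    counts events per component of the course environment, in a fixed order,
    normalized by the student's total number of events.
    """
    counts = Counter(component for component, _ in clickstream)
    total = max(sum(counts.values()), 1)  # avoid division by zero for inactive students
    return [counts.get(c, 0) / total for c in components]
```

Vectors like these, computed over only the first weeks of a course, are what an early-prediction classifier such as the Random Forest mentioned above would consume; richer feature sets add timing, regularity, and sequence information on top.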