Building Robust AI Systems: Addressing Uncertainty, Data Noise and Scarcity in Modern Machine Learning

Li, Changbin 1990-


oai:utd-ir.tdl.org:10735.1/10298


Authors: Changbin Li
Publication date: 10 March 2025

Abstract

The increasing complexity of real-world machine learning applications, such as autonomous systems, medical diagnostics, and natural language processing, demands models that operate reliably under uncertainty, data scarcity, and noisy or ambiguous inputs. Traditional machine learning approaches, which rely on large, clean, well-labeled datasets, often fail when faced with ambiguous inputs, limited labeled data, or abundant but unlabeled data, leading to unreliable predictions and poor generalization. This thesis addresses these challenges by developing robust learning frameworks that improve the uncertainty handling, adaptability, reliability, and performance of models in such environments. First, the Hyper-Evidential Neural Network (HENN) is introduced to model vagueness uncertainty in classification tasks with composite class labels. By leveraging Subjective Logic and Dirichlet distributions, HENN quantifies uncertainty and improves decision-making in ambiguous data scenarios, such as medical diagnostics. Second, NestedMAML, a nested bi-level optimization framework, is proposed to improve robustness in corrupted few-shot learning, where tasks or instances may be noisy or out-of-distribution. By weighting tasks and instances during meta-training, NestedMAML reduces the influence of noisy or irrelevant ones, which improves robustness to distributional shift and label noise and yields better generalization. Third, Platinum, a semi-supervised meta-learning framework, uses Submodular Mutual Information (SMI) functions to select the most informative unlabeled data in the inner and outer loops of meta-training. This lets the model exploit large amounts of unlabeled data while minimizing the impact of noise, leading to better generalization across diverse tasks even when only a few labeled examples are available.
These contributions provide a comprehensive approach to tackling vagueness uncertainty, data scarcity, and noisy inputs in machine learning, advancing methods for robust and adaptive learning in real-world applications.
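
To make the Subjective Logic quantities mentioned in the abstract concrete, the following is a minimal sketch of how non-negative evidence over singleton and composite class sets can be turned into belief masses, a vacuity term, and a vagueness term. It is not the HENN implementation from the thesis: the function name `hyper_opinion`, the example evidence values, and the choice of a prior weight equal to the number of singleton classes are assumptions made purely for illustration.

```python
def hyper_opinion(evidence, num_singletons, prior_weight=None):
    """Turn evidence over class sets into a Subjective Logic style opinion.

    evidence: dict mapping a frozenset of class names to evidence >= 0.
              Sets with more than one class are composite (vague) labels.
    num_singletons: number of singleton classes in the domain.
    prior_weight: non-informative prior weight W. Defaulting it to
                  num_singletons is an assumption of this sketch.
    """
    if any(e < 0 for e in evidence.values()):
        raise ValueError("evidence must be non-negative")
    w = float(num_singletons if prior_weight is None else prior_weight)
    total = sum(evidence.values()) + w
    # Belief mass on each set is its share of evidence plus prior weight.
    beliefs = {labels: e / total for labels, e in evidence.items()}
    # Vacuity: mass left unassigned because evidence is scarce.
    vacuity = w / total
    # Vagueness: mass committed to composite sets rather than single classes.
    vagueness = sum(b for labels, b in beliefs.items() if len(labels) > 1)
    return beliefs, vacuity, vagueness


# Hypothetical evidence, e.g. produced by a non-negative network output.
example_evidence = {
    frozenset({"cat"}): 6.0,
    frozenset({"dog"}): 1.0,
    frozenset({"cat", "dog"}): 3.0,
}

if __name__ == "__main__":
    beliefs, vacuity, vagueness = hyper_opinion(example_evidence, num_singletons=2)
    print("beliefs:", {tuple(sorted(k)): round(v, 4) for k, v in beliefs.items()})
    print("vacuity:", round(vacuity, 4))
    print("vagueness:", round(vagueness, 4))
```

With the example evidence the normalizer is 6 + 1 + 3 + 2 = 12, so the vacuity is 2/12 and the vagueness is 3/12; the belief masses and the vacuity sum to one by construction.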


Last time updated on 26/04/2025

This paper was published in Treasures @ UT Dallas.
