3 research outputs found

    Detection of data drift and outliers affecting machine learning model performance over time

    A trained ML model is deployed on a 'test' dataset for which the target feature values (labels) are unknown. Drift is a change in distribution between the training and deployment data, and it is a concern when it affects model performance. For a cat/dog image classifier, for instance, drift during deployment could take the form of rabbit images (a new class) or of cat/dog images with altered characteristics (a change in distribution). We wish to detect such changes, but accuracy cannot be measured without labels for the deployment data. Instead, we detect drift indirectly by nonparametrically testing the distribution of model prediction confidence for changes. This generalizes our method and sidesteps domain-specific feature representations. We address important statistical issues, particularly Type-1 error control in sequential testing, using Change Point Models (CPMs; see Adams and Ross 2012). Because the confidence distributions before and after a change overlap significantly, we also use nonparametric outlier methods to show the user suspicious observations for model diagnosis. In experiments demonstrating robustness, we train on a subset of MNIST digit classes and then insert drift (e.g., an unseen digit class) into the deployment data under various settings (gradual or sudden changes in the drift proportion). A novel loss function is introduced to compare the performance (detection delay, Type-1 and Type-2 errors) of a drift detector under different levels of drift-class contamination.
    Comment: In: JSM Proceedings, Nonparametric Statistics Section, 2020. Philadelphia, PA: American Statistical Association. 144--16
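    As a rough illustration of the confidence-distribution idea (a simplified stand-in, not the paper's sequential CPM procedure, which is what provides Type-1 error control over repeated tests), the sketch below compares prediction confidences from a reference set and a deployment window with a single two-sample Kolmogorov-Smirnov test; the function names and data are hypothetical.

        # Hedged sketch: single two-sample KS test on prediction confidences.
        # NOT the paper's method; a one-shot test does not control Type-1
        # error when applied repeatedly over a stream (CPMs address that).
        import numpy as np
        from scipy.stats import ks_2samp

        def max_confidence(probs):
            # Confidence score: top predicted class probability per sample.
            return probs.max(axis=1)

        def detect_drift(ref_probs, deploy_probs, alpha=0.05):
            # Flag drift when the reference and deployment confidence
            # distributions differ significantly under a KS test.
            res = ks_2samp(max_confidence(ref_probs),
                           max_confidence(deploy_probs))
            return res.pvalue < alpha, res.pvalue

        # Hypothetical 3-class model: deployment-time predictions are
        # noticeably less confident than on the reference set.
        rng = np.random.default_rng(0)
        ref = rng.dirichlet([8.0, 1.0, 1.0], size=1000)  # sharp outputs
        dep = rng.dirichlet([2.0, 2.0, 2.0], size=1000)  # diffuse outputs
        drifted, p = detect_drift(ref, dep)
        print(f"drift detected: {drifted} (p = {p:.3g})")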

    Classifier Data Quality: A Geometric Complexity Based Method for Automated Baseline And Insights Generation

    Testing Machine Learning (ML) models and AI-Infused Applications (AIIAs), i.e., systems that contain ML models, is highly challenging. Beyond the challenges of testing classical software, it is acceptable and expected that statistical ML models sometimes output incorrect results. A major challenge is to determine when the level of incorrectness, e.g., model accuracy or F1 score for classifiers, is acceptable and when it is not. In addition to business requirements that should provide a threshold, it is a best practice to require any proposed ML solution to outperform simple baseline models, such as a decision tree. We have developed complexity measures that quantify how difficult given observations are to assign to their true class label; these measures can then be used to automatically determine a baseline performance threshold. They are superior to the best-practice baseline in that, at a linear computation cost, they also quantify each observation's classification complexity in an explainable form, regardless of the classifier model used. Our experiments with both numeric synthetic data and real natural-language chatbot data demonstrate that the complexity measures effectively highlight data regions and observations that are likely to be misclassified.
    Comment: Accepted to the EDSMLS workshop at the AAAI conference
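    To make the per-observation idea concrete, here is a hedged stand-in (not the paper's geometric measures): score each observation by the fraction of its k nearest neighbours that carry a different label, a common instance-hardness heuristic. Note the paper claims a linear computation cost, whereas this naive neighbour search is costlier; the sketch only illustrates the shape of the output.

        # Hedged sketch: k-disagreeing-neighbours as an instance-level
        # classification-complexity score. Stand-in illustration only;
        # not the paper's geometric complexity measures.
        import numpy as np
        from sklearn.neighbors import NearestNeighbors

        def kdn_complexity(X, y, k=5):
            # Per-observation score in [0, 1]; higher = harder to classify.
            nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
            _, idx = nn.kneighbors(X)          # column 0 is the point itself
            neighbour_labels = y[idx[:, 1:]]   # drop the self-match
            return (neighbour_labels != y[:, None]).mean(axis=1)

        # Hypothetical two-Gaussian data: high scores cluster near the
        # class boundary, suggesting a floor on achievable error and
        # hence a data-driven baseline threshold.
        rng = np.random.default_rng(1)
        X = np.vstack([rng.normal(0.0, 1.0, (100, 2)),
                       rng.normal(1.5, 1.0, (100, 2))])
        y = np.repeat([0, 1], 100)
        scores = kdn_complexity(X, y)
        print("mean complexity:", round(scores.mean(), 3))
        print("5 hardest observations:", np.argsort(scores)[-5:])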

    Exact Solution of the Elastica with Transverse Shear Effects

    A new derivation of Euler's Elastica with transverse shear effects included is presented. The elastic potential energy of bending and transverse shear is set up, and the work of the axial compression force is determined. The equation of equilibrium is derived by taking the variation of the total potential. Using a substitution of variables, an exact solution is derived; the resulting equation is transcendental and has no closed-form solution, so it is evaluated in dimensionless form by a numerical procedure. Finally, numerical examples of laminates made of fiber-reinforced composite material and of sandwich panels are provided.
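    As a minimal sketch of the variational setup the abstract describes (assuming standard Timoshenko-type notation, not necessarily the paper's exact formulation: bending stiffness EI, shear stiffness kGA, cross-section rotation theta, shear angle gamma, axial load P, end shortening Delta), the total potential and its stationarity condition read:

        % Hedged sketch; notation assumed, not taken from the paper.
        \Pi \;=\; \int_0^L \frac{EI}{2}\left(\frac{d\theta}{ds}\right)^{2} ds
              \;+\; \int_0^L \frac{kGA}{2}\,\gamma^{2}\, ds
              \;-\; P\,\Delta ,
        \qquad \delta\Pi = 0 .

    Setting the first variation to zero with respect to the kinematic unknowns yields the equilibrium equation referred to in the abstract.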