Detection of data drift and outliers affecting machine learning model performance over time
A trained ML model is deployed on another 'test' dataset where target feature
values (labels) are unknown. Drift is a distribution change between the training
and deployment data, which is concerning if model performance changes. For a
cat/dog image classifier, for instance, drift during deployment could be rabbit
images (a new class) or cat/dog images with changed characteristics (a change in
distribution). We wish to detect these changes but cannot measure accuracy
without deployment data labels. We instead detect drift indirectly by
nonparametrically testing the distribution of model prediction confidence for
changes. This generalizes our method and sidesteps domain-specific feature
representation.
We address important statistical issues, particularly Type-1 error control in
sequential testing, using Change Point Models (CPMs; see Adams and Ross 2012).
We also use nonparametric outlier methods to show the user suspicious
observations for model diagnosis, since the before/after change confidence
distributions overlap significantly. In experiments to demonstrate robustness,
we train on a subset of MNIST digit classes, then insert drift (e.g., unseen
digit class) in deployment data in various settings (gradual/sudden changes in
the drift proportion). A novel loss function is introduced to compare the
performance (detection delay, Type-1 and Type-2 errors) of a drift detector
under different levels of drift class contamination.
Comment: In: JSM Proceedings, Nonparametric Statistics Section, 2020.
Philadelphia, PA: American Statistical Association. 144--16
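The core idea of the abstract, testing the distribution of prediction confidences rather than the raw features, can be sketched as follows. This is a minimal illustrative example using a two-sample Kolmogorov-Smirnov test on a fixed window; the paper's sequential Change Point Model machinery and Type-1 error control are not reproduced here, and the Beta-distributed confidences and 0.01 threshold are assumptions for illustration only.

```python
# Sketch: indirect drift detection via the distribution of model prediction
# confidences, compared nonparametrically (two-sample KS test).
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Reference confidences from training-time predictions: mostly high confidence.
reference = rng.beta(8, 2, size=1000)

# Deployment-window confidences after drift (e.g., an unseen class appears):
# the model becomes less confident, shifting the distribution.
drifted = rng.beta(4, 4, size=500)

# Nonparametric two-sample test: no assumption on the confidence distribution.
stat, p_value = ks_2samp(reference, drifted)
drift_detected = p_value < 0.01  # illustrative threshold, not from the paper
```

Because the test operates on scalar confidence scores, the same detector applies unchanged to images, text, or tabular inputs, which is the domain-independence the abstract claims.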
Classifier Data Quality: A Geometric Complexity Based Method for Automated Baseline And Insights Generation
Testing Machine Learning (ML) models and AI-Infused Applications (AIIAs), or
systems that contain ML models, is highly challenging. In addition to the
challenges of testing classical software, it is acceptable and expected that
statistical ML models sometimes output incorrect results. A major challenge is
to determine when the level of incorrectness, e.g., model accuracy or F1 score
for classifiers, is acceptable and when it is not. In addition to business
requirements that should provide a threshold, it is a best practice to require
any proposed ML solution to outperform simple baseline models, such as a
decision tree.
We have developed complexity measures, which quantify how difficult given
observations are to assign to their true class label; these measures can then
be used to automatically determine a baseline performance threshold. These
measures are superior to the best practice baseline in that, for a linear
computation cost, they also quantify each observation' classification
complexity in an explainable form, regardless of the classifier model used. Our
experiments with both numeric synthetic data and real natural language chatbot
data demonstrate that the complexity measures effectively highlight data
regions and observations that are likely to be misclassified.
Comment: Accepted to the EDSMLS workshop at the AAAI conference
Exact Solution of the Elastica with Transverse Shear Effects
A new derivation of Euler's Elastica including transverse shear effects is presented. The elastic potential energy of bending and transverse shear is set up, and the work of the axial compression force is determined. The equation of equilibrium is derived by taking the variation of the total potential. Using a substitution of variables, an exact solution is derived; the resulting equation is transcendental and has no closed-form solution, so it is evaluated in dimensionless form by a numerical procedure. Finally, numerical examples of laminates made of fiber-reinforced composite material and of sandwich panels are provided.
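The total potential described above can be sketched, as an illustrative Timoshenko-type ansatz rather than the paper's actual derivation (the symbols EI, kGA, P, theta, gamma, and the shortening Delta are assumptions, not taken from the abstract):

```latex
% Total potential = bending energy + transverse shear energy - work of axial load.
% theta(s): cross-section rotation, gamma(s): shear angle, s: arc length, L: length.
\Pi \;=\;
\underbrace{\frac{1}{2}\int_0^L EI\left(\frac{d\theta}{ds}\right)^{\!2} ds}_{\text{bending}}
\;+\;
\underbrace{\frac{1}{2}\int_0^L kGA\,\gamma^2\, ds}_{\text{transverse shear}}
\;-\;
\underbrace{P\,\Delta}_{\text{work of axial force}},
\qquad
\Delta = \int_0^L \bigl(1-\cos\theta\bigr)\, ds .
```

Setting the first variation \(\delta\Pi = 0\) yields the equilibrium equation; for the classical elastica (no shear, \(kGA \to \infty\)) this reduces to Euler's pendulum-type equation, whose solution is transcendental (elliptic integrals), consistent with the abstract's remark that no closed-form solution exists.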