Towards Lightweight AI: Leveraging Stochasticity, Quantization, and Tensorization for Forecasting
The deep neural network is an intriguing prognostic model capable of learning meaningful patterns that generalize to new data. The deep learning paradigm has been widely adopted across many domains, including natural language processing, genomics, and automatic music transcription. However, deep neural networks rely on a plethora of underlying computational units and vast amounts of data, together demanding substantial compute and memory resources for practical tasks. This model complexity prohibits the use of larger deep neural networks in resource-critical applications, such as edge computing. To reduce model complexity, several research groups are actively studying compression methods, hardware accelerators, and alternative computing paradigms. These orthogonal research explorations often leave a gap in understanding the interplay between the optimization mechanisms and their overall feasibility for a given task.
In this thesis, we address this gap by developing a holistic solution to assess the model complexity reduction theoretically and quantitatively at both high-level and low-level abstractions for training and inference. At the algorithmic level, a novel deep, yet lightweight, recurrent architecture is proposed that extends the conventional echo state network. The architecture employs random dynamics, brain-inspired plasticity mechanisms, tensor decomposition, and hierarchy as the key features to enrich learning. Furthermore, the hyperparameters are optimized over their landscape via a particle swarm optimization algorithm. To deploy these networks efficiently onto low-end edge devices, both ultra-low and mixed-precision numerical formats are studied within our feedforward deep neural network hardware accelerator. More importantly, the tapered-precision posit format with a novel exact-dot-product algorithm is employed in the low-level digital architectures to study its efficacy in resource utilization.
The dynamics of the architecture are characterized through neuronal partitioning and Lyapunov stability, and we show that superlative networks emerge beyond the edge of chaos with an agglomeration of weak learners. We also demonstrate that tensorization improves model performance by preserving the correlations present in multi-way structures. Low-precision posits are found to consistently outperform other formats on various image classification tasks, and, in conjunction with compression, we achieve orders-of-magnitude speedups and memory savings in both training and inference when forecasting chaotic time series and polyphonic music. This culmination of methods greatly improves the feasibility of deploying rich predictive models on edge devices.
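To make the edge-of-chaos characterization concrete, the following minimal sketch (in Python; the names and hyperparameters are illustrative assumptions, not the thesis's architecture) builds a leaky echo state network whose recurrent weights are rescaled to a target spectral radius, and estimates the largest Lyapunov exponent from the divergence of two nearby reservoir trajectories:

```python
# Minimal sketch, assuming a standard leaky ESN; not the thesis's architecture.
import numpy as np

rng = np.random.default_rng(0)

def make_reservoir(n, spectral_radius=0.95, density=0.1):
    """Random sparse reservoir rescaled to the desired spectral radius."""
    W = rng.normal(size=(n, n)) * (rng.random((n, n)) < density)
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    return W

def step(W, W_in, x, u, leak=0.3):
    """One leaky-integrator reservoir update."""
    return (1 - leak) * x + leak * np.tanh(W @ x + W_in @ u)

def lyapunov_estimate(W, W_in, inputs, eps=1e-8):
    """Average log growth rate of a small state perturbation
    (a crude proxy for the largest Lyapunov exponent)."""
    x = np.zeros(W.shape[0])
    x_pert = x.copy()
    x_pert[0] += eps
    growth = 0.0
    for u in inputs:
        x = step(W, W_in, x, u)
        x_pert = step(W, W_in, x_pert, u)
        d = np.linalg.norm(x_pert - x)
        growth += np.log(d / eps)
        x_pert = x + (eps / d) * (x_pert - x)  # renormalize the separation
    return growth / len(inputs)

n, T = 200, 2000
W_in = rng.uniform(-0.5, 0.5, size=(n, 1))
inputs = rng.uniform(-1, 1, size=(T, 1))
for rho in (0.5, 1.0, 1.5):
    W = make_reservoir(n, spectral_radius=rho)
    print(f"rho={rho}: LE estimate {lyapunov_estimate(W, W_in, inputs):+.3f}")
```

Sweeping the spectral radius past unity typically drives the estimated exponent from negative toward positive, which is the ordered-to-chaotic transition the thesis analyzes.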
On the Objective Evaluation of Post Hoc Explainers
Many applications of data-driven models demand transparency of decisions,
especially in health care, criminal justice, and other high-stakes
environments. Modern trends in machine learning research have led to algorithms
that are increasingly intricate to the degree that they are considered to be
black boxes. In an effort to reduce the opacity of decisions, methods have been
proposed to convey the inner workings of such models in a
human-comprehensible manner. These post hoc techniques are described as being
universal explainers - capable of faithfully augmenting decisions with
algorithmic insight. Unfortunately, there is little agreement about what
constitutes a "good" explanation. Moreover, current methods of explanation
evaluation are derived from either subjective or proxy means. In this work, we
propose a framework for the evaluation of post hoc explainers on ground truth
that is directly derived from the additive structure of a model. We demonstrate
the efficacy of the framework in understanding explainers by evaluating popular
explainers on thousands of synthetic and several real-world tasks. The
framework unveils that explanations may be accurate but misattribute the importance of individual features.

Comment: 14 pages, 4 figures. Under review
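The core evaluation idea can be sketched in a few lines. In this hypothetical example (the additive terms and toy_explainer are illustrative stand-ins, not the paper's framework), the model's additive structure yields each feature's exact contribution, against which an explainer's attributions can be scored:

```python
# Minimal sketch, assuming a toy additive model and a gradient-times-input
# stand-in explainer; not the paper's evaluation framework.
import numpy as np

rng = np.random.default_rng(0)

# Additive model f(x) = f1(x1) + f2(x2) + f3(x3); every term is known exactly.
terms = [lambda x: 3.0 * x, lambda x: x ** 2, lambda x: np.sin(x)]

def f(X):
    return sum(term(X[:, i]) for i, term in enumerate(terms))

def ground_truth(X):
    """Exact per-feature contributions, available by construction."""
    return np.stack([term(X[:, i]) for i, term in enumerate(terms)], axis=1)

def toy_explainer(f, X, eps=1e-4):
    """Hypothetical explainer: gradient-times-input via central differences."""
    attrs = np.zeros_like(X)
    for i in range(X.shape[1]):
        Xp, Xm = X.copy(), X.copy()
        Xp[:, i] += eps
        Xm[:, i] -= eps
        attrs[:, i] = X[:, i] * (f(Xp) - f(Xm)) / (2 * eps)
    return attrs

X = rng.normal(size=(1000, 3))
gt, ex = ground_truth(X), toy_explainer(f, X)

# Per-sample agreement between ground-truth and explainer feature rankings.
rank_agreement = np.mean(
    np.all(np.argsort(np.abs(gt), axis=1) == np.argsort(np.abs(ex), axis=1),
           axis=1)
)
print(f"exact ranking agreement: {rank_agreement:.2f}")
```

Even on this friendly model the stand-in explainer's attributions track f numerically yet rank features differently from the ground truth on some inputs, illustrating the accurate-but-misattributed failure mode.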
Unfooling Perturbation-Based Post Hoc Explainers
Monumental advancements in artificial intelligence (AI) have lured the
interest of doctors, lenders, judges, and other professionals. While these
high-stakes decision-makers are optimistic about the technology, those familiar
with AI systems are wary about the lack of transparency of its decision-making
processes. Perturbation-based post hoc explainers offer a model agnostic means
of interpreting these systems while only requiring query-level access. However,
recent work demonstrates that these explainers can be fooled adversarially.
This discovery has adverse implications for auditors, regulators, and other
sentinels. With this in mind, several natural questions arise - how can we
audit these black box systems? And how can we ascertain that the auditee is
complying with the audit in good faith? In this work, we rigorously formalize
this problem and devise a defense against adversarial attacks on
perturbation-based explainers. We propose algorithms for the detection
(CAD-Detect) and defense (CAD-Defend) of these attacks, which are aided by our
novel conditional anomaly detection approach, KNN-CAD. We demonstrate that our
approach successfully detects whether a black box system adversarially conceals
its decision-making process and mitigates the adversarial attack on real-world
data for the prevalent explainers, LIME and SHAP.

Comment: Accepted to AAAI-23. 9 pages (not including references and supplemental material)
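A minimal sketch of the detection side of such a defense follows, assuming a simple unconditional kNN distance test rather than the paper's full KNN-CAD formulation (all data and thresholds are illustrative). Queries generated by perturbation-based explainers often break the correlation structure of the data, so their distance to the training manifold betrays them:

```python
# Minimal sketch, assuming a plain kNN off-manifold test; KNN-CAD's
# conditional anomaly detection in the paper is more involved.
import numpy as np

rng = np.random.default_rng(0)

def knn_distance(queries, reference, k=5):
    """Mean distance from each query to its k nearest reference points."""
    d = np.linalg.norm(queries[:, None, :] - reference[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, :k].mean(axis=1)

def correlated(n):
    """Toy data manifold: the last five features track the first five."""
    z = rng.normal(size=(n, 5))
    return np.concatenate([z, z + 0.1 * rng.normal(size=(n, 5))], axis=1)

X_train, X_calib = correlated(500), correlated(200)
threshold = np.quantile(knn_distance(X_calib, X_train), 0.95)

# Perturbation-based explainers often resample features independently, which
# breaks the correlation structure and pushes queries off the data manifold.
perturbed = np.stack(
    [rng.permutation(X_calib[:, j]) for j in range(X_calib.shape[1])], axis=1
)

flagged = knn_distance(perturbed, X_train) > threshold
print(f"fraction of perturbation queries flagged: {flagged.mean():.2f}")
# A black box that answers flagged queries with an innocuous surrogate model
# is exactly the adversarial concealment such a defense aims to expose.
```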
How Well Do Feature-Additive Explainers Explain Feature-Additive Predictors?
Surging interest in deep learning from high-stakes domains has precipitated
concern over the inscrutable nature of black box neural networks. Explainable
AI (XAI) research has led to an abundance of explanation algorithms for these
black boxes. Such post hoc explainers produce human-comprehensible explanations; however, their fidelity with respect to the model is not well understood: explanation evaluation remains one of the most challenging issues
in XAI. In this paper, we ask a targeted but important question: can popular
feature-additive explainers (e.g., LIME, SHAP, SHAPR, MAPLE, and PDP) explain
feature-additive predictors? Herein, we evaluate such explainers on ground
truth that is analytically derived from the additive structure of a model. We
demonstrate the efficacy of our approach in understanding these explainers
applied to symbolic expressions, neural networks, and generalized additive
models on thousands of synthetic and several real-world tasks. Our results
suggest that all explainers eventually fail to correctly attribute the
importance of features, especially when a decision-making process involves
feature interactions.

Comment: Accepted to NeurIPS Workshop XAI in Action: Past, Present, and Future Applications. arXiv admin note: text overlap with arXiv:2106.0837
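For intuition, the ground truth has a closed form in one standard special case; the following sketch states the identity under the assumption of independent features (baseline conventions may differ from the paper's exact construction). For a feature-additive predictor, the Shapley value of each feature collapses to its centered additive term:

```latex
% For a feature-additive predictor f(x) = \beta_0 + \sum_i f_i(x_i) with
% independent features X_i, the Shapley value of feature i at x is exactly
% its centered additive term, and the attributions sum to the centered output:
\[
  \phi_i(x) \;=\; f_i(x_i) - \mathbb{E}\left[f_i(X_i)\right],
  \qquad
  \sum_i \phi_i(x) \;=\; f(x) - \mathbb{E}\left[f(X)\right].
\]
```

Any attribution an explainer assigns can then be compared against this quantity directly, with no human judgment or proxy metric in the loop.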
Analysis of Wide and Deep Echo State Networks for Multiscale Spatiotemporal Time Series Forecasting
Echo state networks are computationally lightweight reservoir models inspired
by the random projections observed in cortical circuitry. As interest in
reservoir computing has grown, networks have become deeper and more intricate.
While these networks are increasingly applied to nontrivial forecasting tasks,
there is a need for comprehensive performance analysis of deep reservoirs. In
this work, we study the influence of partitioning a fixed budget of neurons and the effect of parallel reservoir pathways across datasets exhibiting multi-scale and nonlinear dynamics.

Comment: 10 pages, 10 figures, Proceedings of the Neuro-inspired Computational Elements Workshop (NICE '19), March 26-28, 2019, Albany, NY, USA
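The budget-partitioning question can be made concrete with a small experiment. In this illustrative sketch (the toy signal, hyperparameters, and readout are assumptions, not the paper's benchmarks), a fixed budget of N reservoir neurons is split into k parallel reservoirs whose concatenated states feed one linear readout:

```python
# Minimal sketch, assuming leaky ESNs and a least-squares readout; not the
# paper's experimental setup.
import numpy as np

rng = np.random.default_rng(0)

def reservoir(n, rho=0.9):
    """Random reservoir rescaled to spectral radius rho."""
    W = rng.normal(size=(n, n))
    return W * (rho / np.max(np.abs(np.linalg.eigvals(W))))

def run(W, W_in, inputs, leak=0.3):
    """Collect the reservoir state trajectory for an input sequence."""
    x, states = np.zeros(W.shape[0]), []
    for u in inputs:
        x = (1 - leak) * x + leak * np.tanh(W @ x + W_in @ u)
        states.append(x.copy())
    return np.array(states)

def parallel_states(N, k, inputs):
    """Partition a budget of N neurons into k parallel reservoirs and
    concatenate their state trajectories."""
    sizes = [N // k] * k
    parts = [run(reservoir(n), rng.uniform(-0.5, 0.5, (n, 1)), inputs)
             for n in sizes]
    return np.concatenate(parts, axis=1)

# One-step-ahead forecasting of a toy multiscale signal.
t = np.linspace(0, 60, 3000)
u = np.sin(t) * np.sin(0.31 * t)
inputs, targets = u[:-1, None], u[1:]
for k in (1, 2, 4, 8):  # same neuron budget, different partitionings
    S = parallel_states(256, k, inputs)
    W_out = np.linalg.lstsq(S, targets, rcond=1e-8)[0]  # regularized readout
    print(f"k={k}: train MSE = {np.mean((S @ W_out - targets) ** 2):.2e}")
```

Holding the budget fixed while varying k isolates the effect of width versus a single monolithic reservoir, which is the comparison the paper analyzes across multiscale datasets.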
Learning Interpretable Models Through Multi-Objective Neural Architecture Search
Monumental advances in deep learning have led to unprecedented achievements
across a multitude of domains. While the performance of deep neural networks is
indubitable, the architectural design and interpretability of such models are
nontrivial. Methods have been introduced to automate the design of neural
network architectures through neural architecture search (NAS). Recent progress
has made these methods more pragmatic by exploiting distributed computation and
novel optimization algorithms. However, there is little work in optimizing
architectures for interpretability. To this end, we propose a multi-objective
distributed NAS framework that optimizes for both task performance and
introspection. We leverage the non-dominated sorting genetic algorithm
(NSGA-II) and explainable AI (XAI) techniques to reward architectures that can
be better comprehended by humans. The framework is evaluated on several image
classification datasets. We demonstrate that jointly optimizing for
introspection ability and task error leads to more disentangled architectures
that perform within tolerable error.

Comment: 14 pages main text, 5 pages references, 17 pages supplemental
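The selection step at the heart of such a framework is non-dominated sorting. Here is a minimal sketch (the candidate scores are hypothetical; the paper uses full NSGA-II with crowding distance, distributed evaluation, and XAI-derived introspection objectives):

```python
# Minimal sketch, assuming two objectives to minimize; not the paper's full
# NSGA-II pipeline.
import numpy as np

rng = np.random.default_rng(0)

def pareto_front(objectives):
    """Indices of non-dominated points; all objectives are minimized."""
    front = []
    for i, p in enumerate(objectives):
        dominated = any(
            np.all(q <= p) and np.any(q < p)
            for j, q in enumerate(objectives) if j != i
        )
        if not dominated:
            front.append(i)
    return front

# Hypothetical scores for 20 candidate architectures:
# column 0 = task error, column 1 = introspection penalty (lower is clearer).
scores = rng.random((20, 2))
print("Pareto-optimal candidates:", pareto_front(scores))
# NSGA-II iterates this non-dominated sorting (plus a crowding-distance
# tiebreak) to select parents and breed the next generation of architectures.
```

The Pareto front exposes the trade-off directly: rather than a single best network, the search returns a family of architectures spanning low task error at one end and high introspection ability at the other.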