129 research outputs found
AdvCat: Domain-Agnostic Robustness Assessment for Cybersecurity-Critical Applications with Categorical Inputs
Machine Learning-as-a-Service systems (MLaaS) have been largely developed for
cybersecurity-critical applications, such as detecting network intrusions and
fake news campaigns. Despite their effectiveness, robustness against
adversarial attacks is one of the key trust concerns for MLaaS deployment. We
are thus motivated to assess the adversarial robustness of the Machine Learning
models residing at the core of these security-critical applications with
categorical inputs. Previous research efforts on assessing model robustness
against manipulation of categorical inputs are either specific to use cases and
heavily dependent on domain knowledge, or require white-box access to the target
ML model. Such limitations prevent robustness assessment from being offered as a
domain-agnostic service to various real-world applications. We propose
a provably optimal yet computationally highly efficient adversarial robustness
assessment protocol for a wide band of ML-driven cybersecurity-critical
applications. We demonstrate the use of the domain-agnostic robustness
assessment method with substantial experimental study on fake news detection
and intrusion detection problems. Comment: IEEE BigData 202
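The abstract does not spell out the assessment protocol, but the general shape of a black-box robustness probe over categorical inputs can be sketched as a greedy substitution search. This is only a simple baseline, not the AdvCat method; `predict_proba`, `vocab`, and the budget are illustrative names of our own:

```python
def greedy_categorical_attack(predict_proba, x, vocab, target_class, budget=3):
    """Greedily substitute categorical feature values to drive down the
    probability of `target_class` until the decision flips (binary case).

    predict_proba: black-box function, feature tuple -> class probabilities
    vocab: dict mapping feature index -> list of admissible categorical values
    """
    x = list(x)
    for _ in range(budget):
        base = predict_proba(tuple(x))[target_class]
        best_drop, best_move = 0.0, None
        for i, values in vocab.items():
            for v in values:
                if v == x[i]:
                    continue
                cand = x[:]
                cand[i] = v
                drop = base - predict_proba(tuple(cand))[target_class]
                if drop > best_drop:
                    best_drop, best_move = drop, (i, v)
        if best_move is None:
            break                       # no substitution lowers the score
        x[best_move[0]] = best_move[1]
        if predict_proba(tuple(x))[target_class] < 0.5:
            return tuple(x)             # adversarial example found
    return None
```

Such a probe needs only query access to the model, which is exactly the setting a domain-agnostic MLaaS robustness service would face.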
Towards understanding the robustness against evasion attack on categorical inputs
Characterizing and assessing the adversarial risk of a classifier with categorical inputs has been a practically important yet rarely explored research problem. Conventional wisdom attributes the difficulty of solving the problem to its combinatorial nature. Previous research efforts tackling this problem are specific to use cases and heavily depend on domain knowledge. Such limitations prevent their general applicability in real-world applications with categorical data. Our study is the first to show that provably optimal adversarial robustness assessment is computationally feasible for any classifier satisfying a mild smoothness constraint. We theoretically analyze the factors driving the adversarial vulnerability of a classifier with categorical inputs via an information-theoretic adversarial risk analysis. Corroborating these theoretical findings with a substantial experimental study over various real-world categorical datasets, we empirically assess the impact of the key adversarial risk factors on a targeted learning system with categorical inputs.
Greedy PIG: Adaptive Integrated Gradients
Deep learning has become the standard approach for most machine learning
tasks. While its impact is undeniable, interpreting the predictions of deep
learning models from a human perspective remains a challenge. In contrast to
model training, model interpretability is harder to quantify and pose as an
explicit optimization problem. Inspired by the AUC softmax information curve
(AUC SIC) metric for evaluating feature attribution methods, we propose a
unified discrete optimization framework for feature attribution and feature
selection based on subset selection. This leads to a natural adaptive
generalization of the path integrated gradients (PIG) method for feature
attribution, which we call Greedy PIG. We demonstrate the success of Greedy PIG
on a wide variety of tasks, including image feature attribution, graph
compression/explanation, and post-hoc feature selection on tabular data. Our
results show that introducing adaptivity is a powerful and versatile way to
make attribution methods more effective.
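Greedy PIG itself adapts path integrated gradients; as a hedged illustration of the underlying adaptive subset-selection view, the sketch below performs plain greedy marginal-gain feature selection against a black-box score. The function and variable names are ours, not the paper's:

```python
import numpy as np

def greedy_feature_attribution(score, x, baseline, k):
    """Greedily pick k features whose un-masking most increases `score`.

    score: function mapping an input vector to a scalar (e.g. a class logit)
    x: the input to explain; baseline: reference vector (features "off")
    Returns the selected feature indices in the order they were chosen.
    """
    selected = []
    current = baseline.copy()
    for _ in range(k):
        gains = []
        for i in range(len(x)):
            if i in selected:
                gains.append(-np.inf)
                continue
            trial = current.copy()
            trial[i] = x[i]                    # reveal feature i
            gains.append(score(trial) - score(current))
        best = int(np.argmax(gains))           # largest marginal gain
        selected.append(best)
        current[best] = x[best]
    return selected
```

The adaptivity is in re-evaluating marginal gains after each selection, rather than scoring all features once against a fixed path as non-adaptive integrated gradients would.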
Scalable Projection-Free Optimization
As a projection-free algorithm, the Frank-Wolfe (FW) method, also known as conditional gradient, has recently received considerable attention in the machine learning community. In this dissertation, we study several topics on FW variants for scalable projection-free optimization. We first propose 1-SFW, the first projection-free method that requires only one sample per iteration to update the optimization variable and yet achieves the best known complexity bounds for convex, non-convex, and monotone DR-submodular settings. We then move to the distributed setting and develop Quantized Frank-Wolfe (QFW), a general communication-efficient distributed FW framework for both convex and non-convex objective functions. We study the performance of QFW in two widely recognized settings: 1) stochastic optimization and 2) finite-sum optimization. Finally, we propose Black-Box Continuous Greedy, a derivative-free and projection-free algorithm that maximizes a monotone continuous DR-submodular function over a bounded convex body in Euclidean space.
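The projection-free idea at the heart of all these variants — replace projection with a linear subproblem over the constraint set — can be seen in a minimal textbook FW sketch for the l1 ball, where the linear minimizer is a signed coordinate vertex. This is the classic deterministic step with the standard 2/(t+2) schedule, not 1-SFW, QFW, or Black-Box Continuous Greedy:

```python
import numpy as np

def frank_wolfe_l1(grad, x0, radius, steps=200):
    """Minimize a smooth convex function over the l1 ball of given radius
    without any projection: the linear subproblem
        argmin_{||s||_1 <= radius} <grad, s>
    is solved in closed form by a signed coordinate vertex."""
    x = x0.copy()
    for t in range(steps):
        g = grad(x)
        i = int(np.argmax(np.abs(g)))          # coordinate with largest gradient
        s = np.zeros_like(x)
        s[i] = -radius * np.sign(g[i])         # vertex minimizing <g, s>
        gamma = 2.0 / (t + 2.0)                # classic step-size schedule
        x = (1 - gamma) * x + gamma * s        # convex combination stays feasible
    return x
```

Because each iterate is a convex combination of vertices, feasibility is maintained for free; this is what makes FW attractive when projection onto the constraint set is expensive.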
Robust Counterfactual Explanations on Graph Neural Networks
Massive deployment of Graph Neural Networks (GNNs) in high-stake applications
generates a strong demand for explanations that are robust to noise and align
well with human intuition. Most existing methods generate explanations by
identifying a subgraph of an input graph that has a strong correlation with the
prediction. These explanations are not robust to noise because independently
optimizing the correlation for a single input can easily overfit noise.
Moreover, they do not align well with human intuition because removing an
identified subgraph from an input graph does not necessarily change the
prediction result. In this paper, we propose a novel method to generate robust
counterfactual explanations on GNNs by explicitly modelling the common decision
logic of GNNs on similar input graphs. Our explanations are naturally robust to
noise because they are produced from the common decision boundaries of a GNN
that govern the predictions of many similar input graphs. The explanations also
align well with human intuition because removing the set of edges identified by
an explanation from the input graph changes the prediction significantly.
Exhaustive experiments on many public datasets demonstrate the superior
performance of our method.
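As a toy illustration of the counterfactual criterion alone — an explanation's edges must, when removed, change the prediction — one can brute-force the smallest such edge set. This sketch captures only that criterion, not the paper's common-decision-logic method, and the classifier below is a hypothetical stand-in for a GNN:

```python
from itertools import combinations

def minimal_counterfactual_edges(predict, edges, max_size=2):
    """Smallest edge subset whose removal changes predict(edges).

    predict: black-box graph classifier taking a frozenset of edges.
    Brute-force over subsets of increasing size, so only suitable for
    tiny graphs; returns None if no flip is found within max_size.
    """
    original = predict(frozenset(edges))
    for k in range(1, max_size + 1):
        for subset in combinations(edges, k):
            remaining = frozenset(edges) - frozenset(subset)
            if predict(remaining) != original:   # removal flips the label
                return set(subset)
    return None
```

The contrast with correlation-based explainers is visible here: an edge set qualifies only if deleting it actually changes the output, which is the alignment-with-intuition property the abstract emphasizes.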
Learning in the Real World: Constraints on Cost, Space, and Privacy
The sheer demand for machine learning in fields as varied as healthcare, web-search ranking, factory automation, collision prediction, spam filtering, and many others frequently outpaces the intended use cases of machine learning models. In fact, a growing number of companies hire machine learning researchers to rectify this very problem: to tailor and/or design new state-of-the-art models for the setting at hand.
However, we can generalize a large set of the machine learning problems encountered in practical settings into three categories: cost, space, and privacy. The first category (cost) considers problems that need to balance the accuracy of a machine learning model with the cost required to evaluate it. These include problems in web-search, where results need to be delivered to a user in under a second and be as accurate as possible. The second category (space) collects problems that require running machine learning algorithms on low-memory computing devices. For instance, in search-and-rescue operations we may opt to use many small unmanned aerial vehicles (UAVs) equipped with machine learning algorithms for object detection to find a desired search target. These algorithms should be small to fit within the physical memory limits of the UAV (and be energy efficient) while reliably detecting objects. The third category (privacy) considers problems where one wishes to run machine learning algorithms on sensitive data. It has been shown that seemingly innocuous analyses on such data can be exploited to reveal data individuals would prefer to keep private. Thus, nearly any algorithm that runs on patient or economic data falls under this set of problems.
We devise solutions for each of these problem categories, including (i) a fast tree-based model for explicitly trading off accuracy and model evaluation time, (ii) a compression method for the k-nearest neighbor classifier, and (iii) a private causal inference algorithm that protects sensitive data.
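For (ii), the flavor of kNN compression can be conveyed by a classic baseline, Hart's condensed nearest-neighbor rule, shown here as a generic illustration; the dissertation's actual compression method is not specified in this abstract:

```python
def condensed_1nn(X, y):
    """Hart's condensed nearest-neighbor rule: keep only the prototypes
    needed for 1-NN to classify every training point correctly.

    X: list of feature tuples, y: list of labels.
    Returns the indices of the retained prototypes.
    """
    keep = [0]                                  # seed with the first point
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            # nearest kept prototype under squared Euclidean distance
            j = min(keep, key=lambda k: sum((a - b) ** 2
                                            for a, b in zip(X[i], X[k])))
            if y[j] != y[i]:                    # misclassified -> must keep it
                keep.append(i)
                changed = True
    return keep
```

On well-separated classes this discards most points while preserving the 1-NN decision on the training set, which is exactly the space-for-accuracy trade the category describes.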
Information overload in structured data
Information overload refers to the difficulty of making decisions caused by too much information. In this dissertation, we address the information overload problem in two separate structured domains, namely, graphs and text.
Graph kernels have been proposed as an efficient and theoretically sound approach to compute graph similarity. They decompose graphs into certain sub-structures, such as subtrees or subgraphs. However, existing graph kernels suffer from a few drawbacks. First, the dimension of the feature space associated with the kernel often grows exponentially as the complexity of sub-structures increases. One immediate consequence of this behavior is that small, non-informative sub-structures occur more frequently and cause information overload. Second, as the number of features increases, we encounter sparsity: only a few informative sub-structures will co-occur in multiple graphs. In the first part of this dissertation, we propose to tackle the above problems by exploiting the dependency relationships among sub-structures. First, we propose a novel framework that learns latent representations of sub-structures by leveraging recent advancements in deep learning. Second, we propose a general smoothing framework that takes structural similarity into account, inspired by state-of-the-art smoothing techniques used in natural language processing. Both proposed frameworks are applicable to popular graph kernel families, and achieve significant performance improvements over state-of-the-art graph kernels.
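As a minimal illustration of the substructure-counting view of graph kernels, the sketch below uses labeled edges as the substructure and compares two graphs by a dot product of their count vectors. Real kernels such as Weisfeiler-Lehman use richer substructures like subtrees, and the smoothing frameworks described above are not implemented here:

```python
from collections import Counter

def edge_label_kernel(g1, g2):
    """Toy substructure kernel: count labeled edges and take the dot
    product of the two count vectors.

    A graph here is (labels, edges): labels maps node -> label,
    edges is a list of (u, v) pairs.
    """
    def features(g):
        labels, edges = g
        # sort each label pair so the edge direction does not matter
        return Counter(tuple(sorted((labels[u], labels[v])))
                       for u, v in edges)
    f1, f2 = features(g1), features(g2)
    return sum(f1[key] * f2[key] for key in f1)   # missing keys count as 0
```

The sparsity problem the text describes is visible even here: as substructures grow from edges to subtrees, two graphs share ever fewer feature keys, and most products in the final sum vanish.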
In the second part of this dissertation, we tackle information overload in text. We first focus on a popular social news aggregation website, Reddit, and design a submodular recommender system that tailors a personalized frontpage for individual users. Second, we propose a novel submodular framework to summarize videos, where both transcript and comments are available. Third, we demonstrate how to apply filtering techniques to select a small subset of informative features from virtual machine logs in order to predict resource usage
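Submodular recommenders and summarizers like those above typically build on the classic greedy algorithm, which carries a (1 - 1/e) approximation guarantee for monotone submodular objectives. A generic sketch, with a hypothetical coverage objective standing in for the actual frontpage or summary utilities:

```python
def greedy_submodular(ground, f, k):
    """Classic greedy for monotone submodular maximization: repeatedly add
    the element with the largest marginal gain.

    ground: candidate elements; f: set function with f(empty set) = 0;
    k: cardinality budget. Achieves a (1 - 1/e) approximation when f is
    monotone and submodular.
    """
    S = set()
    for _ in range(k):
        gains = [(f(S | {e}) - f(S), e) for e in ground if e not in S]
        if not gains:
            break
        gain, best = max(gains)
        if gain <= 0:
            break                      # no remaining element adds value
        S.add(best)
    return S
```

Diminishing returns is what makes greedy effective here: once an item's content is covered, near-duplicates stop contributing, which is precisely the redundancy-avoidance a frontpage or video summary needs.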