Saturated locally optimal designs under differentiable optimality criteria
We develop general theory for finding locally optimal designs in a class of
single-covariate models under any differentiable optimality criterion. Yang and
Stufken [Ann. Statist. 40 (2012) 1665-1681] and Dette and Schorning [Ann.
Statist. 41 (2013) 1260-1267] gave complete class results for optimal designs
under such models. Based on their results, saturated optimal designs exist;
however, how to find such designs has not been addressed. We develop tools to
find saturated optimal designs, and also prove their uniqueness under mild
conditions. Comment: Published at http://dx.doi.org/10.1214/14-AOS1263 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
Computing SHAP Efficiently Using Model Structure Information
SHAP (SHapley Additive exPlanations) has become a popular method to attribute
the prediction of a machine learning model on an input to its features. One
main challenge of SHAP is the computation time. An exact computation of Shapley
values requires exponential time complexity. Therefore, many approximation
methods are proposed in the literature. In this paper, we propose methods that
can compute SHAP exactly in polynomial time or even faster for SHAP definitions
that satisfy our additivity and dummy assumptions (e.g., kernel SHAP and baseline
SHAP). We develop different strategies for models with different levels of
model structure information: known functional decomposition, known order of
model (defined as highest order of interaction in the model), or unknown order.
For the first case, we demonstrate an additive property and a way to compute
SHAP from the lower-order functional components. For the second case, we derive
formulas that can compute SHAP in polynomial time. Both methods yield exact
SHAP results. Finally, if even the order of model is unknown, we propose an
iterative way to approximate Shapley values. The three methods we propose are
computationally efficient when the order of model is not high, which is
typically the case in practice. We compare with the sampling approach proposed in
Castor & Gomez (2008) using simulation studies to demonstrate the efficacy of
our proposed methods. Comment: 15 pages
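The exponential cost and the additivity and dummy properties mentioned in the abstract above can be illustrated with a minimal sketch. This is not the paper's algorithm: it is a brute-force baseline SHAP that enumerates every feature subset, and the model `f`, its components `g1` and `g12`, and the function name `baseline_shap` are hypothetical names chosen for illustration.

```python
from itertools import combinations
from math import factorial


def baseline_shap(f, x, baseline):
    """Exact baseline Shapley values by enumerating all feature subsets.

    The value of a subset S is f evaluated with the features in S taken
    from x and the remaining features taken from the baseline point.
    The cost grows as 2^p, so this is only practical for small p.
    """
    p = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(p)]
        return f(z)

    phi = [0.0] * p
    for j in range(p):
        others = [i for i in range(p) if i != j]
        for size in range(p):
            # Shapley weight for subsets of this size that exclude feature j
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for s in combinations(others, size):
                s = set(s)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi


# Hypothetical model with a known functional decomposition f = g1 + g12.
def g1(z):
    return 2.0 * z[0]          # main effect of feature 0


def g12(z):
    return z[0] * z[1]         # second-order interaction of features 0 and 1


def f(z):
    return g1(z) + g12(z)      # feature 2 never enters the model (a dummy)


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]

phi_full = baseline_shap(f, x, baseline)
phi_sum = [a + b for a, b in zip(baseline_shap(g1, x, baseline),
                                 baseline_shap(g12, x, baseline))]
```

In this toy case the values computed on the full model agree with the sum of the values computed on each component, and the unused feature receives zero attribution. Each component only involves one or two features, which is why working from low-order components can avoid the full subset enumeration.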
Determination of hydraulic parameters in unconsolidated sediments: a comparison of different field methods
Abstract: This field study compares travel-time-based tomographic inversion of data from short-term pumping tests with their analytical evaluation, and discusses and assesses the resulting hydraulic parameters with respect to their spatial resolution. The data come from short-term pumping tests conducted in a tomographic measurement configuration in a two-meter-thick, well-characterized sand and gravel aquifer, using a 2" well and a multi-chamber well, both installed with direct-push technology. The analytical evaluation of the short-term pumping tests showed that zones with different hydraulic properties cannot be distinguished from one another. A comparison with results from multilevel slug tests indicates that, despite a short pumping duration of 200 seconds and hydraulically isolated pumping and observation intervals, the estimated hydraulic parameters are dominated by a zone of higher hydraulic conductivity at the base of the aquifer. Travel-time-based tomographic inversion, by contrast, makes it possible to reconstruct vertical and lateral changes in the diffusivity distribution between the pumping and observation wells at high resolution.
Shapley Computations Using Surrogate Model-Based Trees
Shapley-related techniques have gained attention as both global and local
interpretation tools because of their desirable properties. However, their
computation using conditional expectations is computationally expensive.
Approximation methods suggested in the literature have limitations. This paper
proposes the use of a surrogate model-based tree to compute Shapley and SHAP
values based on conditional expectation. Simulation studies show that the
proposed algorithm improves accuracy and unifies global Shapley and SHAP
interpretation, and that the thresholding method provides a way to trade off
running time against accuracy.
Using Model-Based Trees with Boosting to Fit Low-Order Functional ANOVA Models
Low-order functional ANOVA (fANOVA) models have been rediscovered in the
machine learning (ML) community under the guise of inherently interpretable
machine learning. Explainable Boosting Machines or EBM (Lou et al. 2013) and
GAMI-Net (Yang et al. 2021) are two recently proposed ML algorithms for fitting
functional main effects and second-order interactions. We propose a new
algorithm, called GAMI-Tree, that is similar to EBM, but has a number of
features that lead to better performance. It uses model-based trees as base
learners and incorporates a new interaction filtering method that is better at
capturing the underlying interactions. In addition, our iterative training
method converges to a model with better predictive performance, and the
embedded purification ensures that interactions are hierarchically orthogonal
to main effects. The algorithm does not need extensive tuning, and our
implementation is fast and efficient. We use simulated and real datasets to
compare the performance and interpretability of GAMI-Tree with EBM and
GAMI-Net. Comment: 25 pages plus appendix
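The purification idea mentioned in the abstract above can be sketched for a single pairwise interaction stored as a table over a grid. This is a simplified illustration and not the GAMI-Tree implementation: it assumes uniform weights over the grid cells, in which case purification reduces to double-centering the table and moving the removed row and column means into the main effects. The function name `purify` and the example numbers are hypothetical.

```python
def purify(main_1, main_2, inter):
    """Double-center an interaction table under uniform cell weights.

    main_1[i] and main_2[k] are main-effect values on a grid, and
    inter[i][k] is the interaction value at grid cell (i, k).
    Returns (intercept, new_main_1, new_main_2, pure_inter) such that
    intercept + new_main_1[i] + new_main_2[k] + pure_inter[i][k]
    equals main_1[i] + main_2[k] + inter[i][k] for every cell.
    """
    n_rows, n_cols = len(inter), len(inter[0])
    grand = sum(sum(row) for row in inter) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in inter]
    col_means = [sum(inter[i][k] for i in range(n_rows)) / n_rows
                 for k in range(n_cols)]

    # Remove everything a main effect could explain from the interaction.
    pure = [[inter[i][k] - row_means[i] - col_means[k] + grand
             for k in range(n_cols)] for i in range(n_rows)]

    # Hand the removed pieces to the main effects and the intercept.
    new_main_1 = [main_1[i] + row_means[i] - grand for i in range(n_rows)]
    new_main_2 = [main_2[k] + col_means[k] - grand for k in range(n_cols)]
    return grand, new_main_1, new_main_2, pure


# Hypothetical fitted effects on a 2 x 2 grid.
main_1 = [0.0, 1.0]
main_2 = [0.0, -1.0]
inter = [[1.0, 2.0], [3.0, 6.0]]

intercept, m1, m2, pure = purify(main_1, main_2, inter)
```

After purification every row and every column of the interaction table averages to zero, so under the uniform-weight assumption the interaction carries no main-effect signal while the total prediction at each cell is unchanged. With non-uniform data weights the centering would need to be weighted and generally iterated.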
Monotone Tree-Based GAMI Models by Adapting XGBoost
Recent papers have used machine learning architecture to fit low-order
functional ANOVA models with main effects and second-order interactions. These
GAMI (GAM + Interaction) models are directly interpretable as the functional
main effects and interactions can be easily plotted and visualized.
Unfortunately, it is not easy to incorporate the monotonicity requirement into
the existing GAMI models based on boosted trees, such as EBM (Lou et al. 2013)
and GAMI-Lin-T (Hu et al. 2022). This paper considers models of the form
$f(x)=\sum_{j,k}f_{j,k}(x_j, x_k)$ and develops monotone tree-based GAMI
models, called monotone GAMI-Tree, by adapting the XGBoost algorithm. It is
straightforward to fit a monotone model using the options in XGBoost.
However, the fitted model is still a black box. We take a different approach:
i) use a filtering technique to determine the important interactions, ii) fit a
monotone XGBoost algorithm with the selected interactions, and finally iii)
parse and purify the results to get a monotone GAMI model. Simulated datasets
are used to demonstrate the behaviors of mono-GAMI-Tree and EBM, both of which
use piecewise constant fits. Note that the monotonicity requirement is for the
full model. Under certain situations, the main effects will also be monotone.
But, as seen in the examples, the interactions will not be monotone. Comment: 12 pages
Characterization of immune microenvironment identifies prognostic and immunotherapy benefit for trastuzumab-based therapy
Background and Purpose: The tumor immune microenvironment (TIME) of human epidermal growth factor receptor 2 (HER2)-positive breast cancer is significantly related to the efficacy of trastuzumab, indicating the clinical potential of immune checkpoint therapy combined with trastuzumab. This study aimed to explore predictors of response to combination therapy in HER2-positive breast cancer and to screen for its potential beneficiaries. Methods: Transcriptome and genome data were collected for 509 HER2-positive breast cancer samples from patients receiving trastuzumab treatment in the Gene Expression Omnibus (GEO) database and for 67 HER2-positive breast cancer samples from The Cancer Genome Atlas (TCGA) database. Differentially expressed genes in the trastuzumab-resistant group were identified and analyzed for functional enrichment and protein-protein interaction. The log-rank test and multivariate Cox proportional hazards regression were used with clinical data to create the prediction model. The TIME landscape was characterized using CIBERSORT. The immunotherapy benefit was evaluated by the tumor immune dysfunction and exclusion (TIDE) score. Results: The trastuzumab-related genetic prognostic index (TRGPI), consisting of four hub genes (GATA6, TRPV6, AMACR, ZHX2), was constructed by analyzing the immune microenvironment and gene expression characteristics of the trastuzumab-remission and trastuzumab-resistance groups. Importantly, the results revealed that patients with lower TRGPI were trastuzumab-sensitive and more likely to benefit from immunotherapy because of increased percentages of CD8+ T cells, active natural killer cells, and programmed death-1 (PD-1) expression. Conclusion: This study redefined the benefit population through TIME and provided a selectable strategy of trastuzumab plus immunotherapy for HER2-positive breast cancer.