
    Saturated locally optimal designs under differentiable optimality criteria

    We develop general theory for finding locally optimal designs in a class of single-covariate models under any differentiable optimality criterion. Yang and Stufken [Ann. Statist. 40 (2012) 1665-1681] and Dette and Schorning [Ann. Statist. 41 (2013) 1260-1267] gave complete class results for optimal designs under such models. Their results imply that saturated optimal designs exist; however, how to find such designs has not been addressed. We develop tools to find saturated optimal designs and also prove their uniqueness under mild conditions. Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/14-AOS1263 by the Institute of Mathematical Statistics (http://www.imstat.org)
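    To make "saturated locally optimal design" concrete, here is a small sketch for one model that is usually placed in this class: two-parameter logistic regression with linear predictor eta = alpha + beta * x. A saturated design has as many support points as parameters (two here). The sketch uses D-optimality as the differentiable criterion and, as an assumption rather than something derived here, restricts the search to designs with weight 1/2 at eta = -c and eta = +c. It then maximizes the determinant of the Fisher information over c. The helper names are hypothetical and this is not the authors' method. The design is only "locally" optimal because mapping eta back to x, via x = (eta - alpha) / beta, requires guessed values of alpha and beta.

    ```python
    import math

    def info_det(c):
        """Determinant of the Fisher information for logistic regression with
        eta = alpha + beta * x, for the design with weight 1/2 at eta = -c and
        weight 1/2 at eta = +c (eta-coordinates, i.e. alpha = 0, beta = 1)."""
        p = 1.0 / (1.0 + math.exp(-c))
        v = p * (1.0 - p)  # same Bernoulli variance weight at +c and -c
        # By symmetry the off-diagonal terms cancel, so M = v * [[1, 0], [0, c**2]]
        return v * (v * c * c)

    def optimal_c(lo=0.5, hi=4.0, tol=1e-12):
        """Maximize info_det over c > 0. Setting d/dc (c * v) = 0 simplifies to
        c * tanh(c / 2) = 1, which is solved here by bisection."""
        g = lambda c: c * math.tanh(c / 2.0) - 1.0  # increasing in c for c > 0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if g(mid) > 0:
                hi = mid  # root lies below mid
            else:
                lo = mid  # root lies above mid
        return 0.5 * (lo + hi)

    c = optimal_c()
    p = 1.0 / (1.0 + math.exp(-c))
    print(f"support at eta = +/-{c:.4f}, response probabilities {1 - p:.3f} and {p:.3f}")
    ```

    This should report support at eta = ±1.5434 with response probabilities of about 0.176 and 0.824. These values match the locally D-optimal design commonly reported for this model. The derivative condition is used instead of a generic optimizer because, within the assumed symmetric family, det M = (c·v(c))², so a one-dimensional root find is enough.
    
    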

    Computing SHAP Efficiently Using Model Structure Information

    SHAP (SHapley Additive exPlanations) has become a popular method for attributing a machine learning model's prediction on an input to its features. One main challenge of SHAP is computation time: exact computation of Shapley values requires exponential time. Many approximation methods have therefore been proposed in the literature. In this paper, we propose methods that can compute SHAP exactly in polynomial time or even faster for SHAP definitions that satisfy our additivity and dummy assumptions (e.g., kernel SHAP and baseline SHAP). We develop different strategies for models with different levels of model structure information: known functional decomposition, known model order (defined as the highest order of interaction in the model), or unknown order. For the first case, we demonstrate an additive property and a way to compute SHAP from the lower-order functional components. For the second case, we derive formulas that can compute SHAP in polynomial time. Both methods yield exact SHAP results. Finally, if even the model order is unknown, we propose an iterative way to approximate Shapley values. The three methods we propose are computationally efficient when the model order is not high, which is typically the case in practice. Using simulation studies, we compare them with the sampling approach proposed in Castor & Gomez (2008) to demonstrate the efficacy of our proposed methods. Comment: 15 pages
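    The additive property for a known functional decomposition can be illustrated with a minimal sketch under baseline SHAP. It is not the authors' polynomial-time formula. Shapley values are linear in the game, and features outside a component are dummies for that component's game. As a result, full-model SHAP equals the sum of per-component Shapley values, and each component only needs an enumeration over its own features. The model, its components, and all helper names below are hypothetical.

    ```python
    from itertools import combinations
    from math import factorial

    def shapley_exact(value, n):
        """Exact Shapley values by enumerating all coalitions (exponential in n)."""
        phi = [0.0] * n
        for j in range(n):
            others = [i for i in range(n) if i != j]
            for size in range(n):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                for S in combinations(others, size):
                    S = set(S)
                    phi[j] += weight * (value(S | {j}) - value(S))
        return phi

    def baseline_shap(f, x, b):
        """Baseline SHAP on the full model: features in S take x, the rest take b."""
        n = len(x)
        v = lambda S: f([x[i] if i in S else b[i] for i in range(n)])
        return shapley_exact(v, n)

    def baseline_shap_from_components(components, x, b):
        """Sum per-component Shapley values; each game involves only that component's features."""
        phi = [0.0] * len(x)
        for feats, g in components:
            # S holds positions within `feats`; features outside the component are dummies
            def v_local(S):
                return g(*[x[i] if k in S else b[i] for k, i in enumerate(feats)])
            for k, contrib in enumerate(shapley_exact(v_local, len(feats))):
                phi[feats[k]] += contrib
        return phi

    # Hypothetical order-2 model: main effects plus pairwise interactions
    components = [
        ((0,), lambda a: 2 * a),
        ((1,), lambda a: a ** 2),
        ((0, 2), lambda a, c: a * c),
        ((1, 2), lambda a, c: 3 * a * c),
    ]
    f = lambda z: sum(g(*[z[i] for i in feats]) for feats, g in components)
    x, b = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]

    print(baseline_shap(f, x, b))                           # approximately [3.5, 13.0, 10.5]
    print(baseline_shap_from_components(components, x, b))  # same values, at most 2^2 coalitions per component
    ```

    Both calls agree up to floating-point rounding, and the values sum to f(x) - f(b) = 27. The saving grows with the number of features, because the per-component cost depends on the interaction order and not on the total feature count.
    
    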

    Determination of Hydraulic Parameters in Unconsolidated Sediments: A Comparison of Different Field Methods

    Abstract: In this field study, travel-time-based tomographic inversion of data from short-term pumping tests is compared with analytical evaluation, and the resulting hydraulic parameters are discussed and assessed with respect to their spatial resolution. The data basis consists of measurements from short-term pumping tests carried out in a tomographic measurement configuration in a two-meter-thick, well-characterized sand and gravel aquifer, using a 2-inch well and a multichamber well, both installed with direct-push technology. The analytical evaluation of the short-term pumping tests showed that it is not possible to distinguish zones with different hydraulic properties. A comparison with the results of multilevel slug tests shows that, despite a short pumping duration of 200 seconds and hydraulically isolated pumping and observation intervals, the determined hydraulic parameters are dominated by a zone of higher hydraulic conductivity at the bottom of the aquifer. Travel-time-based tomographic inversion, in contrast, makes it possible to reconstruct vertical and lateral changes in the diffusivity distribution between the pumping and observation wells at high resolution.
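    Travel-time-based inversion rests on a relation between the arrival time of a diffusive pressure signal and hydraulic diffusivity D. The following is a textbook sketch, not necessarily the study's exact formulation, which the abstract does not give. It assumes a homogeneous 3-D medium and an instantaneous point source; for continuous pumping this corresponds to the time derivative of the drawdown. Here s is the response, r the source-receiver distance, and t time. The response peaks where its logarithmic time derivative vanishes:

    ```latex
    s(r,t) \propto t^{-3/2}\exp\!\left(-\frac{r^{2}}{4Dt}\right)
    \;\Longrightarrow\;
    \frac{\partial \ln s}{\partial t} = -\frac{3}{2t} + \frac{r^{2}}{4Dt^{2}} = 0
    \;\Longrightarrow\;
    t_{\mathrm{peak}} = \frac{r^{2}}{6D}.
    ```

    In travel-time tomography this is commonly generalized to a line integral along the path between the wells, with arc length \ell, so that a spatially varying D(\ell) can be reconstructed from many source-receiver pairs:

    ```latex
    \sqrt{t_{\mathrm{peak}}} = \frac{1}{\sqrt{6}} \int_{\text{path}} \frac{d\ell}{\sqrt{D(\ell)}},
    ```

    which reduces to t_peak = r^2/(6D) when D is constant.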

    Shapley Computations Using Surrogate Model-Based Trees

    Shapley-related techniques have gained attention as both global and local interpretation tools because of their desirable properties. However, computing them using conditional expectations is computationally expensive, and the approximation methods suggested in the literature have limitations. This paper proposes using a surrogate model-based tree to compute Shapley and SHAP values based on conditional expectation. Simulation studies show that the proposed algorithm improves accuracy and unifies global Shapley and SHAP interpretation, and that the thresholding method provides a way to trade off running time and accuracy.
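    The following sketch shows why conditional-expectation Shapley values are costly and how they differ from baseline-style attributions. It uses a deliberately naive estimator, not the surrogate-tree method proposed in the paper: v(S) is the average prediction over data rows that match x on every feature in S. It assumes discrete features so that exact matching is possible. The data, the model, and the helper names are hypothetical.

    ```python
    from itertools import combinations
    from math import factorial

    def shapley_exact(value, n):
        """Exact Shapley values by enumerating all coalitions (exponential in n)."""
        phi = [0.0] * n
        for j in range(n):
            others = [i for i in range(n) if i != j]
            for size in range(n):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                for S in combinations(others, size):
                    S = set(S)
                    phi[j] += weight * (value(S | {j}) - value(S))
        return phi

    def conditional_value(f, data, x):
        """v(S) = mean prediction over rows of `data` that equal x on all features in S.
        Naive empirical estimator for discrete features (hypothetical helper)."""
        preds = [f(row) for row in data]
        def v(S):
            matched = [p for row, p in zip(data, preds) if all(row[i] == x[i] for i in S)]
            return sum(matched) / len(matched)  # non-empty as long as x itself occurs in data
        return v

    # Hypothetical correlated binary data; the model only uses feature 0
    data = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0)]
    f = lambda z: z[0]
    x = (1, 1)

    phi = shapley_exact(conditional_value(f, data, x), 2)
    print(phi)  # approximately [5/12, 1/12]
    ```

    Feature 1 receives credit 1/12 even though f ignores it, because conditioning on x1 = 1 shifts the distribution of x0. Every v(S) needs a pass over the data, and there are 2^n coalitions. This scan is the expense that tree-based estimators aim to reduce.
    
    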

    Using Model-Based Trees with Boosting to Fit Low-Order Functional ANOVA Models

    Low-order functional ANOVA (fANOVA) models have been rediscovered in the machine learning (ML) community under the guise of inherently interpretable machine learning. Explainable Boosting Machines or EBM (Lou et al. 2013) and GAMI-Net (Yang et al. 2021) are two recently proposed ML algorithms for fitting functional main effects and second-order interactions. We propose a new algorithm, called GAMI-Tree, that is similar to EBM but has a number of features that lead to better performance. It uses model-based trees as base learners and incorporates a new interaction filtering method that is better at capturing the underlying interactions. In addition, our iterative training method converges to a model with better predictive performance, and the embedded purification ensures that interactions are hierarchically orthogonal to main effects. The algorithm does not need extensive tuning, and our implementation is fast and efficient. We use simulated and real datasets to compare the performance and interpretability of GAMI-Tree with EBM and GAMI-Net. Comment: 25 pages plus appendix
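    "Hierarchically orthogonal to main effects" can be illustrated by purifying a single pairwise interaction stored as a table on a grid. The row and column means of the interaction are moved into the main effects, leaving a pure interaction with zero row and column means. This is a minimal sketch, not GAMI-Tree's implementation. It assumes uniform weights over the grid cells, which is a simplification of whatever data-based weighting the paper uses. The function and the table values are hypothetical.

    ```python
    def purify_pair(table):
        """Split a 2-D interaction table into intercept, row and column main effects,
        and a pure interaction with zero row and column means (uniform grid weights assumed).

        Returns (intercept, main_row, main_col, pure) such that
        table[i][j] == intercept + main_row[i] + main_col[j] + pure[i][j].
        """
        n_r, n_c = len(table), len(table[0])
        row_mean = [sum(r) / n_c for r in table]
        col_mean = [sum(table[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
        grand = sum(row_mean) / n_r  # overall mean of the table
        main_row = [m - grand for m in row_mean]
        main_col = [m - grand for m in col_mean]
        pure = [[table[i][j] - row_mean[i] - col_mean[j] + grand for j in range(n_c)]
                for i in range(n_r)]
        return grand, main_row, main_col, pure

    # Hypothetical interaction g(x1, x2) evaluated on a 3 x 3 grid
    table = [[1.0, 2.0, 4.0],
             [0.0, 3.0, 3.0],
             [2.0, 2.0, 8.0]]
    grand, main_row, main_col, pure = purify_pair(table)
    print(main_row, main_col)
    print(pure)
    ```

    Because every row and column of `pure` averages to zero, the purified interaction is orthogonal to any function of x1 alone or x2 alone under these weights. The fitted function itself is unchanged, since the four pieces add back up to the original table.
    
    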

    Monotone Tree-Based GAMI Models by Adapting XGBoost

    Recent papers have used machine learning architectures to fit low-order functional ANOVA models with main effects and second-order interactions. These GAMI (GAM + Interaction) models are directly interpretable because the functional main effects and interactions can easily be plotted and visualized. Unfortunately, it is not easy to incorporate a monotonicity requirement into existing GAMI models based on boosted trees, such as EBM (Lou et al. 2013) and GAMI-Lin-T (Hu et al. 2022). This paper considers models of the form $f(x)=\sum_{j,k} f_{j,k}(x_j, x_k)$ and develops monotone tree-based GAMI models, called monotone GAMI-Tree, by adapting the XGBoost algorithm. It is straightforward to fit a monotone model to $f(x)$ using the options in XGBoost; however, the fitted model is still a black box. We take a different approach: (i) use a filtering technique to determine the important interactions, (ii) fit a monotone XGBoost model with the selected interactions, and finally (iii) parse and purify the results to obtain a monotone GAMI model. Simulated datasets are used to demonstrate the behavior of mono-GAMI-Tree and EBM, both of which use piecewise constant fits. Note that the monotonicity requirement applies to the full model. Under certain conditions the main effects will also be monotone, but, as the examples show, the interactions need not be monotone. Comment: 12 pages
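    A small worked example, constructed here for illustration and not taken from the paper, shows why a monotone full model can still have a non-monotone purified interaction. Take f(x_1, x_2) = x_1 x_2 on the grid {0, 1, 2} × {0, 1, 2} with uniform weights. f is nondecreasing in each variable there. Its row means are x_1, its column means are x_2, and its grand mean is 1, so purification gives:

    ```latex
    x_1 x_2 \;=\; \underbrace{1}_{\text{intercept}}
    \;+\; \underbrace{(x_1 - 1)}_{\text{main effect of } x_1}
    \;+\; \underbrace{(x_2 - 1)}_{\text{main effect of } x_2}
    \;+\; \underbrace{(x_1 - 1)(x_2 - 1)}_{\text{pure interaction}} .
    ```

    Both main effects are increasing. The pure interaction equals -(x_1 - 1) at x_2 = 0, however, so it decreases in x_1 there. Monotonicity holds for the full model and the main effects but not for the interaction term.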

    Characterization of immune microenvironment identifies prognostic and immunotherapy benefit for trastuzumab-based therapy

    Background and Purpose: The tumor immune microenvironment (TIME) of human epidermal growth factor receptor 2 (HER2)-positive breast cancer is significantly associated with the efficacy of trastuzumab, which indicates the clinical potential of combining immune checkpoint therapy with trastuzumab. This study aimed to explore predictors for combination therapy in HER2-positive breast cancer and to identify patients who may benefit from it. Methods: Transcriptome and genome data were collected for 509 HER2-positive breast cancer samples from patients receiving trastuzumab treatment in the Gene Expression Omnibus (GEO) database and for 67 HER2-positive breast cancer samples from The Cancer Genome Atlas (TCGA) database. Differentially expressed genes in the trastuzumab-resistant group were identified and analyzed for functional enrichment and protein-protein interactions. The log-rank test and multivariate Cox proportional hazards regression were applied to the clinical data to build the prediction model. The TIME landscape was characterized using CIBERSORT, and immunotherapy benefit was evaluated with the tumor immune dysfunction and exclusion (TIDE) score. Results: A trastuzumab-related genetic prognostic index (TRGPI) consisting of four hub genes (GATA6, TRPV6, AMACR, ZHX2) was constructed by analyzing the immune microenvironment and gene expression characteristics of the trastuzumab-remission and trastuzumab-resistance groups. Importantly, patients with a lower TRGPI were trastuzumab-sensitive and more likely to benefit from immunotherapy because of increased proportions of CD8+ T cells and activated natural killer cells and higher programmed death-1 (PD-1) expression. Conclusion: This study redefined the population likely to benefit on the basis of the TIME and provides an alternative strategy of trastuzumab plus immunotherapy for HER2-positive breast cancer.