A Flexible and Adaptive Framework for Abstention Under Class Imbalance
In practical applications of machine learning, it is often desirable to
identify and abstain on examples where the model's predictions are likely to be
incorrect. Much of the prior work on this topic has focused on
out-of-distribution detection or performance metrics such as top-k accuracy.
Comparatively little attention has been given to metrics such as
area-under-the-curve or Cohen's Kappa, which are extremely relevant for
imbalanced datasets. Abstention strategies
aimed at top-k accuracy can produce poor results on these metrics when applied
to imbalanced datasets, even when all examples are in-distribution. We propose
a framework to address this gap. Our framework leverages the insight that
calibrated probability estimates can be used as a proxy for the true class
labels, thereby allowing us to estimate the change in an arbitrary metric if an
example were abstained on. Using this framework, we derive computationally
efficient metric-specific abstention algorithms for optimizing the sensitivity
at a target specificity level, the area under the ROC, and the weighted Cohen's
Kappa. Because our method relies only on calibrated probability estimates, we
further show that by leveraging recent work on domain adaptation under label
shift, we can generalize to test-set distributions that may have a different
class imbalance compared to the training-set distribution. In experiments
involving medical imaging, natural language processing, computer
vision, and genomics, we demonstrate the effectiveness of our approach. Source
code available at https://github.com/blindauth/abstention. Colab notebooks
reproducing results available at
https://github.com/blindauth/abstention_experiments
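The key insight, treating calibrated probabilities as soft stand-ins for the unknown true labels, can be illustrated with a minimal sketch. The leave-one-out loop, the `soft_accuracy` metric, and all numbers below are illustrative assumptions, not the paper's more efficient metric-specific algorithms:

```python
import numpy as np

def estimated_metric_deltas(probs, soft_metric):
    """Estimate, for each example, the change in a metric if that example
    were abstained on, using calibrated probabilities as soft labels."""
    preds = (probs >= 0.5).astype(int)
    base = soft_metric(probs, preds)     # soft-label estimate on the full set
    deltas = np.empty(len(probs))
    for i in range(len(probs)):
        mask = np.ones(len(probs), dtype=bool)
        mask[i] = False                  # pretend we abstain on example i
        deltas[i] = soft_metric(probs[mask], preds[mask]) - base
    return deltas

def soft_accuracy(soft_labels, preds):
    # Expected accuracy when soft_labels are calibrated P(y = 1)
    return np.mean(preds * soft_labels + (1 - preds) * (1 - soft_labels))

probs = np.array([0.9, 0.55, 0.2, 0.48])    # calibrated P(y = 1)
deltas = estimated_metric_deltas(probs, soft_accuracy)
abstain_idx = int(np.argmax(deltas))        # abstaining here helps the metric most
```

Plugging a soft estimate of sensitivity-at-specificity, AUROC, or weighted Cohen's Kappa in place of `soft_accuracy` gives the metric-specific variants the abstract describes.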
Making Neural Networks Interpretable with Attribution: Application to Implicit Signals Prediction
Explaining recommendations enables users to understand whether recommended
items are relevant to their needs and has been shown to increase their trust in
the system. More generally, while designing explainable machine learning models
is key to checking the sanity and robustness of a decision process and improving
its efficiency, it remains a challenge for complex architectures,
especially deep neural networks, which are often deemed "black-box". In this
paper, we propose a novel formulation of interpretable deep neural networks for
the attribution task. Unlike popular post-hoc methods, our approach is
interpretable by design. Using masked weights, hidden features can be deeply
attributed, split into several input-restricted sub-networks, and trained as a
boosted mixture of experts. Experimental results on synthetic data and
real-world recommendation tasks demonstrate that our method enables building
models whose predictive performance is close to that of their non-interpretable
counterparts, while providing informative attribution interpretations. Comment: 14th ACM Conference on Recommender Systems (RecSys '20)
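The masked-weights idea, splitting the network into input-restricted sub-networks whose summed outputs serve as attributions, can be sketched as follows. The feature grouping, layer sizes, and random weights are hypothetical stand-ins (in the actual method the experts are trained as a boosted mixture, not drawn at random):

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_experts, hidden = 6, 3, 4

# Each expert sees only its own feature group, enforced by a binary
# mask on the first-layer weights (hypothetical grouping).
feature_groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
masks = np.zeros((n_experts, n_features))
for e, group in enumerate(feature_groups):
    masks[e, group] = 1.0

W1 = rng.normal(size=(n_experts, hidden, n_features)) * masks[:, None, :]
W2 = rng.normal(size=(n_experts, hidden))

def predict(x):
    # Each expert's scalar output is an attribution for its feature
    # group; the model prediction is the sum of expert outputs.
    h = np.tanh(np.einsum('ehf,f->eh', W1, x))
    expert_outputs = np.einsum('eh,eh->e', W2, h)
    return expert_outputs.sum(), expert_outputs

x = rng.normal(size=n_features)
y_pred, attributions = predict(x)
```

By construction, perturbing features outside an expert's group cannot change that expert's attribution, which is what makes the decomposition interpretable by design rather than post hoc.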
Complaint-driven Training Data Debugging for Query 2.0
As the need for machine learning (ML) increases rapidly across all industry
sectors, there is a significant interest among commercial database providers to
support "Query 2.0", which integrates model inference into SQL queries.
Debugging Query 2.0 is very challenging, since an unexpected query result may be
caused by bugs in the training data (e.g., wrong labels, corrupted features).
In response, we propose Rain, a complaint-driven training data debugging
system. Rain allows users to specify complaints over the query's intermediate
or final output, and aims to return a minimum set of training examples so that
if they were removed, the complaints would be resolved. To the best of our
knowledge, we are the first to study this problem. A naive solution requires
retraining an exponential number of ML models. We propose two novel heuristic
approaches based on influence functions, both of which require only a linear
number of retraining steps. We provide an in-depth analytical and empirical
analysis of the two approaches and conduct extensive experiments to evaluate
their effectiveness on four real-world datasets. Results show that Rain
achieves the highest recall@k among all the baselines while still returning
results interactively. Comment: Proceedings of the 2020 ACM SIGMOD International Conference on
Management of Data
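The influence-function heuristic can be sketched for a plain logistic-regression model. The setting, function names, and synthetic data below are illustrative assumptions; Rain's actual algorithms additionally reason over the provenance of SQL query results:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rank_training_points(X, y, theta, x_c, y_c, damp=1e-3):
    """Rank training points by how much their removal is estimated to
    reduce the loss on a 'complaint' point (x_c, y_c), where y_c is the
    label the user believes is correct."""
    n, d = X.shape
    p = sigmoid(X @ theta)
    grads = (p - y)[:, None] * X                          # per-example gradients
    H = (X.T * (p * (1 - p))) @ X / n + damp * np.eye(d)  # damped Hessian
    g_c = (sigmoid(x_c @ theta) - y_c) * x_c              # complaint gradient
    v = np.linalg.solve(H, g_c)                           # H^{-1} g_c
    delta_loss = (grads @ v) / n   # approx change in complaint loss if i removed
    return np.argsort(delta_loss)  # most helpful removals first

# Hypothetical usage: complain that example 0 should have the opposite label
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (X[:, 0] > 0).astype(float)
theta = np.zeros(3)
for _ in range(200):                                      # rough model fit
    theta -= 0.1 * X.T @ (sigmoid(X @ theta) - y) / len(y)
ranking = rank_training_points(X, y, theta, X[0], 1.0 - y[0])
candidates = ranking[:3]   # training points to inspect or remove first
```

Because the single linear solve is shared across all training points, scoring every candidate costs roughly one pass over the data rather than one retraining per candidate.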
Deciphering regulatory DNA sequences and noncoding genetic variants using neural network models of massively parallel reporter assays.
The relationship between noncoding DNA sequence and gene expression is not well understood. Massively parallel reporter assays (MPRAs), which quantify the regulatory activity of large libraries of DNA sequences in parallel, are a powerful approach to characterize this relationship. We present MPRA-DragoNN, a convolutional neural network (CNN)-based framework to predict and interpret the regulatory activity of DNA sequences as measured by MPRAs. While our method is generally applicable to a variety of MPRA designs, here we trained our model on the Sharpr-MPRA dataset, which measures the activity of ∼500,000 constructs tiling 15,720 regulatory regions in human K562 and HepG2 cell lines. MPRA-DragoNN predictions were moderately correlated (Spearman ρ = 0.28) with measured activity and were within the range of replicate concordance of the assay. State-of-the-art model interpretation methods revealed high-resolution predictive regulatory sequence features that overlapped transcription factor (TF) binding motifs. We used the model to investigate the cell type and chromatin state preferences of predictive TF motifs. We explored the ability of our model to predict the allelic effects of regulatory variants in an independent MPRA experiment and fine-map putative functional SNPs in loci associated with lipid traits. Our results suggest that interpretable deep learning models trained on MPRA data have the potential to reveal meaningful patterns in regulatory DNA sequences and prioritize regulatory genetic variants, especially as larger, higher-quality datasets are produced.
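The first stage of any such CNN, one-hot encoding the sequence and scanning learned filters that act as motif detectors, can be sketched minimally. The 'TATA' filter and toy sequence are illustrative assumptions, not the trained MPRA-DragoNN model:

```python
import numpy as np

BASES = 'ACGT'

def one_hot(seq):
    """One-hot encode a DNA sequence into a (length, 4) array."""
    out = np.zeros((len(seq), 4))
    out[np.arange(len(seq)), [BASES.index(b) for b in seq]] = 1.0
    return out

def conv_scan(encoded, filt):
    """Slide a filter along the sequence, the core operation of the
    first convolutional layer; high activations mark motif matches."""
    w = filt.shape[0]
    return np.array([np.sum(encoded[i:i + w] * filt)
                     for i in range(encoded.shape[0] - w + 1)])

# A hypothetical filter that responds to the motif 'TATA'
filt = one_hot('TATA')
activations = conv_scan(one_hot('GGTATACC'), filt)
best_pos = int(np.argmax(activations))  # position of the strongest match
```

In the trained model the filters are learned from MPRA activity measurements, and interpretation methods recover which learned filters align with known TF binding motifs.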
Bias factorized, base-resolution deep learning models of chromatin accessibility reveal cis-regulatory sequence syntax, transcription factor footprints and regulatory variants.
<ul>
<li>(MAJOR) Bug in the chrombpnet modisco_motifs command: the number of seqlets was capped at 50,000, so user requests to raise it (e.g., to 1 million) had no effect.</li>
<li>Peaks at chromosome edges are now filtered for the pred_bw command and the bias pipeline, so bias evaluation is done on these filtered peaks.</li>
<li>Preprocessing now defaults to Unix sort, with an option to switch to bedtools sort.</li>
<li>Added an option to filter chromosomes during preprocessing.</li>
</ul>
<p><strong>Full Changelog</strong>: https://github.com/kundajelab/chrombpnet/compare/v0.1.3...v0.1.4</p>