Adversarial Infidelity Learning for Model Interpretation
Model interpretation is essential in data mining and knowledge discovery. It
can help understand the intrinsic model working mechanism and check if the
model has undesired characteristics. A popular approach to model
interpretation is Instance-wise Feature Selection (IFS), which assigns an
importance score to each feature of a data sample to explain how the model
generates its output for that sample. In this paper, we propose a
Model-agnostic Effective Efficient Direct (MEED) IFS framework for model
interpretation, mitigating concerns about sanity, combinatorial shortcuts,
model identifiability, and information transmission. We also focus on the
following setting: using the selected features to directly predict the output
of the given model, which serves as a primary evaluation metric for
model-interpretation methods. In addition to the features, we feed the output
of the given model to the explainer as an extra input, so that the explainer
learns from more accurate information. To learn the explainer, besides a
fidelity objective, we propose an Adversarial Infidelity Learning (AIL)
mechanism that boosts explanation learning by screening out relatively
unimportant features. Through theoretical and
experimental analysis, we show that our AIL mechanism can help learn the
desired conditional distribution between selected features and targets.
Moreover, we extend our framework by integrating efficient interpretation
methods as proper priors to provide a warm start. Comprehensive empirical
evaluation results are provided by quantitative metrics and human evaluation to
demonstrate the effectiveness and superiority of our proposed method. Our code
is publicly available online at https://github.com/langlrsw/MEED.
Comment: 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '20), August 23--27, 2020, Virtual Event, US
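
Below is a minimal, self-contained sketch of the IFS setting described in this abstract: an explainer scores features per instance, the top-k features are kept, and an approximator predicts the given model's output from the masked input (the fidelity objective). This is not the authors' MEED implementation; the network sizes, the straight-through top-k mask, and the toy data are illustrative assumptions.

import torch
import torch.nn as nn

d, k, n_classes = 20, 5, 3          # feature dim, features to keep, classes

# Stand-in for the given black-box model to be interpreted.
black_box = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, n_classes))
# Explainer: per-instance feature importance scores.
explainer = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, d))
# Approximator: predicts the black-box output from the selected features.
approximator = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, n_classes))

opt = torch.optim.Adam(list(explainer.parameters()) + list(approximator.parameters()), lr=1e-3)

x = torch.randn(256, d)                       # toy data
with torch.no_grad():
    y_model = black_box(x).argmax(dim=1)      # outputs produced by the given model

for _ in range(100):
    scores = explainer(x)                     # importance score for each feature
    topk = scores.topk(k, dim=1).indices
    mask = torch.zeros_like(x).scatter(1, topk, 1.0)
    # Straight-through trick: hard top-k mask in the forward pass,
    # gradients flow through the soft scores in the backward pass.
    mask = mask + torch.sigmoid(scores) - torch.sigmoid(scores).detach()
    logits = approximator(x * mask)           # predict model output from selected features
    loss = nn.functional.cross_entropy(logits, y_model)   # fidelity objective
    opt.zero_grad(); loss.backward(); opt.step()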
Representation Engineering: A Top-Down Approach to AI Transparency
In this paper, we identify and characterize the emerging area of
representation engineering (RepE), an approach to enhancing the transparency of
AI systems that draws on insights from cognitive neuroscience. RepE places
population-level representations, rather than neurons or circuits, at the
center of analysis, equipping us with novel methods for monitoring and
manipulating high-level cognitive phenomena in deep neural networks (DNNs). We
provide baselines and an initial analysis of RepE techniques, showing that they
offer simple yet effective solutions for improving our understanding and
control of large language models. We showcase how these methods can provide
traction on a wide range of safety-relevant problems, including honesty,
harmlessness, power-seeking, and more, demonstrating the promise of top-down
transparency research. We hope that this work catalyzes further exploration of
RepE and fosters advancements in the transparency and safety of AI systems.
Comment: Code is available at https://github.com/andyzoujm/representation-engineering
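
Below is a minimal numpy sketch of the population-level representation reading and steering idea described in this abstract, not the released RepE code: the placeholder activation matrices (standing in for hidden states extracted from a DNN at some layer), the difference-of-means reading vector, and the steering coefficient are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
d_hidden = 768

# Placeholder hidden states for paired stimuli (e.g., honest vs. dishonest
# completions of the same prompts); in practice these come from the model.
acts_pos = rng.normal(size=(100, d_hidden))
acts_neg = rng.normal(size=(100, d_hidden))

# Difference-of-means reading vector for the concept; PCA over the pairwise
# differences is a common alternative.
direction = (acts_pos - acts_neg).mean(axis=0)
direction /= np.linalg.norm(direction)

# Monitoring: project new activations onto the direction; larger scores
# indicate stronger expression of the concept at this layer.
new_acts = rng.normal(size=(10, d_hidden))
scores = new_acts @ direction

# Control: shift activations along the direction before the next layer.
alpha = 4.0
steered_acts = new_acts + alpha * direction
print(scores.round(3))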