9 research outputs found

Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance

Author: Crabbé Jonathan
van der Schaar Mihaela
Publication date: 13/04/2023

Interpretability methods are valuable only if their explanations faithfully describe the explained model. In this work, we consider neural networks whose predictions are invariant under a specific symmetry group. This includes popular architectures, ranging from convolutional to graph neural networks. Any explanation that faithfully explains this type of model needs to agree with this invariance property. We formalize this intuition through the notions of explanation invariance and equivariance by leveraging the formalism of geometric deep learning. Through this rigorous formalism, we derive (1) two metrics to measure the robustness of any interpretability method with respect to the model symmetry group; (2) theoretical robustness guarantees for some popular interpretability methods; and (3) a systematic approach to increase the invariance of any interpretability method with respect to a symmetry group. By empirically measuring our metrics for explanations of models associated with various modalities and symmetry groups, we derive a set of five guidelines to help users and developers of interpretability methods produce robust explanations.

Comment: 26 pages, 7 figures

arXiv.org e-Print Archive
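
The invariance and equivariance metrics described in the abstract above can be sketched for the simplest case of a finite symmetry group. This is a minimal sketch under stated assumptions, not the authors' code: it assumes cosine similarity as the comparison between explanations, averages it over every element of the cyclic-shift group acting on 1-D signals, and uses a hypothetical toy model f(x) = sum of x_i squared (which is shift-invariant) with its gradient 2x as the explanation.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Explainer = Callable[[Sequence[float]], Vector]


def cyclic_shift(x: Sequence[float], k: int) -> Vector:
    """Action of the cyclic group Z_n on a length-n signal: roll right by k."""
    n = len(x)
    return [x[(i - k) % n] for i in range(n)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; two zero vectors count as identical."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    if na == 0.0 or nb == 0.0:
        return 1.0 if na == nb else 0.0
    return dot / (na * nb)


def invariance_score(explain: Explainer, x: Sequence[float]) -> float:
    """Average similarity between e(g.x) and e(x) over every shift g in Z_n."""
    n = len(x)
    base = explain(x)
    return sum(cosine(explain(cyclic_shift(x, k)), base) for k in range(n)) / n


def equivariance_score(explain: Explainer, x: Sequence[float]) -> float:
    """Average similarity between e(g.x) and g.e(x) over every shift g in Z_n."""
    n = len(x)
    base = explain(x)
    return sum(
        cosine(explain(cyclic_shift(x, k)), cyclic_shift(base, k)) for k in range(n)
    ) / n


# Hypothetical shift-invariant model f(x) = sum(x_i^2); its gradient is 2x.
def gradient_explanation(x: Sequence[float]) -> Vector:
    return [2.0 * v for v in x]


x = [1.0, 0.0, 0.0, 0.0]
print(invariance_score(gradient_explanation, x))    # 0.25
print(equivariance_score(gradient_explanation, x))  # 1.0
```

In this toy case the gradient explanation is perfectly equivariant but far from invariant, which illustrates why it helps to measure both properties rather than only one.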

TRIAGE: Characterizing and auditing training data for improved regression

Author: Crabbé Jonathan
Qian Zhaozhi
Seedat Nabeel
van der Schaar Mihaela
Publication date: 29/10/2023

Data quality is crucial for robust machine learning algorithms, and the recent interest in data-centric AI emphasizes the importance of characterizing training data. However, current data characterization methods focus largely on classification, leaving regression settings understudied. To address this, we introduce TRIAGE, a novel data characterization framework tailored to regression tasks and compatible with a broad class of regressors. TRIAGE uses conformal predictive distributions to provide a model-agnostic scoring method, the TRIAGE score. We operationalize the score to analyze individual samples' training dynamics and characterize samples as under-, over-, or well-estimated by the model. We show that TRIAGE's characterization is consistent and highlight its utility for improving performance via data sculpting/filtering in multiple regression settings. Additionally, beyond the sample level, we show that TRIAGE enables new approaches to dataset selection and feature acquisition. Overall, TRIAGE highlights the value unlocked by data characterization in real-world regression applications.

Comment: Presented at NeurIPS 202

arXiv.org e-Print Archive
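
The TRIAGE abstract above scores each sample with a conformal predictive distribution (CPD). A minimal sketch, assuming the standard split-conformal predictive distribution built from signed calibration residuals y - y_hat, evaluated at the sample's true target: scores near 0 mean the model predicted too high, scores near 1 mean it predicted too low. The thresholds in `characterize` and the use of the mean score across checkpoints are hypothetical simplifications; the paper's exact characterization rule may differ.

```python
from statistics import mean
from typing import Sequence


def cpd_score(y_true: float, y_pred: float,
              calib_residuals: Sequence[float], tau: float = 0.5) -> float:
    """Split-conformal predictive distribution evaluated at the true target.

    Residuals are signed as y - y_hat. tau is the tie-breaking weight, fixed
    here instead of being drawn uniformly at random.
    """
    alpha = y_true - y_pred
    n = len(calib_residuals)
    below = sum(1 for r in calib_residuals if r < alpha)
    ties = sum(1 for r in calib_residuals if r == alpha)
    return (below + tau * (ties + 1)) / (n + 1)


def characterize(scores_over_checkpoints: Sequence[float],
                 low: float = 0.2, high: float = 0.8) -> str:
    """Hypothetical rule: threshold the mean score across training checkpoints."""
    m = mean(scores_over_checkpoints)
    if m < low:
        return "over-estimated"   # prediction consistently above the target
    if m > high:
        return "under-estimated"  # prediction consistently below the target
    return "well-estimated"


calib = [-2.0, -1.0, 0.0, 1.0, 2.0]  # signed residuals on a held-out calibration set
print(cpd_score(10.0, 10.0, calib))  # 0.5
print(characterize([cpd_score(5.0, 10.0, calib), cpd_score(5.0, 9.0, calib)]))  # over-estimated
```

Because the score depends only on residuals, any regressor that outputs point predictions can be plugged in, which matches the model-agnostic claim in the abstract.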

Joint Training of Deep Ensembles Fails Due to Learner Collusion

Author: Crabbé Jonathan
Jeffares Alan
Liu Tennison
van der Schaar Mihaela
Publication date: 31/10/2023

Ensembles of machine learning models are well established as a powerful way to improve performance over a single model. Traditionally, ensembling algorithms train their base learners independently or sequentially with the goal of optimizing their joint performance. In the case of deep ensembles of neural networks, we have the opportunity to directly optimize the true objective: the joint performance of the ensemble as a whole. Surprisingly, however, directly minimizing the loss of the ensemble appears to be rarely applied in practice. Instead, most previous research trains individual models independently and performs ensembling post hoc. In this work, we show that this is for good reason: joint optimization of the ensemble loss results in degenerate behavior. We approach this problem by decomposing the ensemble objective into the strength of the base learners and the diversity between them. We discover that joint optimization results in a phenomenon in which base learners collude to artificially inflate their apparent diversity. This pseudo-diversity fails to generalize beyond the training data, causing a larger generalization gap. We proceed to comprehensively demonstrate the practical implications of this effect on a range of standard machine learning tasks and architectures by smoothly interpolating between independent training and joint optimization.

Comment: To appear in the Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv.org e-Print Archive
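
For squared-error regression, the split of the ensemble objective into base-learner strength and diversity described above can be illustrated with the classic ambiguity decomposition: the loss of the averaged prediction equals the mean individual loss minus the spread of the members around their average. The weight `beta` below is one natural way to interpolate between independent training (beta = 0) and joint training (beta = 1); it is an illustrative assumption, and the paper may use a more general form for other losses.

```python
from typing import Sequence, Tuple


def ambiguity_decomposition(preds: Sequence[float], y: float) -> Tuple[float, float, float]:
    """Squared-error ambiguity decomposition for an equally weighted ensemble.

    Returns (ensemble_loss, mean_individual_loss, diversity), which satisfy
    ensemble_loss == mean_individual_loss - diversity.
    """
    m = len(preds)
    f_bar = sum(preds) / m
    ensemble_loss = (f_bar - y) ** 2
    mean_individual = sum((f - y) ** 2 for f in preds) / m
    diversity = sum((f - f_bar) ** 2 for f in preds) / m
    return ensemble_loss, mean_individual, diversity


def interpolated_loss(preds: Sequence[float], y: float, beta: float) -> float:
    """beta = 0: independent training (mean individual loss only);
    beta = 1: joint training (loss of the ensemble average)."""
    _, mean_individual, diversity = ambiguity_decomposition(preds, y)
    return mean_individual - beta * diversity


preds, target = [1.0, 3.0], 2.0
print(ambiguity_decomposition(preds, target))  # (0.0, 1.0, 1.0)
print(interpolated_loss(preds, target, 0.0))   # 1.0
print(interpolated_loss(preds, target, 1.0))   # 0.0
```

The example shows the mechanism behind the collusion effect in miniature: two members that are each off by 1 reach zero joint loss because their errors cancel, so at beta = 1 the diversity term rewards members for spreading apart on the training data.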

DAGnosis: Localized Identification of Data Inconsistencies using Structures

Author: Berrevoets Jeroen
Crabbé Jonathan
Huynh Nicolas
Qian Zhaozhi
Seedat Nabeel
van der Schaar Mihaela
Publication date: 28/02/2024

Identifying and appropriately handling inconsistencies in data at deployment time is crucial for the reliable use of machine learning models. While recent data-centric methods can identify such inconsistencies with respect to the training set, they suffer from two key limitations: (1) suboptimality in settings where features exhibit statistical independencies, due to their use of compressive representations; and (2) a lack of localization to pinpoint why a sample might be flagged as inconsistent, which is important to guide future data collection. We solve these two fundamental limitations by using directed acyclic graphs (DAGs) to encode the probability distribution of the training set's features and their independencies as a structure. Our method, called DAGnosis, leverages these structural interactions to draw valuable and insightful data-centric conclusions. DAGnosis unlocks the localization of the causes of inconsistencies on a DAG, an aspect overlooked by previous approaches. Moreover, we show empirically that leveraging these interactions (1) leads to more accurate conclusions in detecting inconsistencies and (2) provides more detailed insights into why some samples are flagged.

Comment: AISTATS 2024; added correspondence email

arXiv.org e-Print Archive
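
The localization idea in the DAGnosis abstract above can be sketched as one check per DAG node: predict each feature from its parents, calibrate an interval on held-out rows, and report the nodes whose value falls outside it. This is a minimal sketch under stated assumptions: the DAG X1 -> X2, the per-node predictors, and the calibration data are all hypothetical and hand-specified (DAGnosis learns the structure from data), and split-conformal thresholds stand in for whatever interval construction the paper uses.

```python
import math
from typing import Callable, Dict, List, Mapping, Sequence

Row = Mapping[str, float]
Predictor = Callable[[Row], float]


def conformal_quantile(residuals: Sequence[float], alpha: float) -> float:
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest residual."""
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(residuals)[k - 1]


def calibrate(predictors: Mapping[str, Predictor],
              calib_rows: Sequence[Row], alpha: float) -> Dict[str, float]:
    """One threshold per DAG node, from absolute residuals of its parent-based predictor."""
    return {
        node: conformal_quantile([abs(row[node] - f(row)) for row in calib_rows], alpha)
        for node, f in predictors.items()
    }


def flag_inconsistencies(predictors: Mapping[str, Predictor],
                         thresholds: Mapping[str, float], row: Row) -> List[str]:
    """Nodes whose value falls outside their conformal interval (the localization)."""
    return sorted(
        node for node, f in predictors.items() if abs(row[node] - f(row)) > thresholds[node]
    )


# Hypothetical DAG X1 -> X2: the root X1 is predicted by a constant,
# and X2 is predicted from its single parent as 2 * X1.
predictors = {
    "X1": lambda r: 4.0,
    "X2": lambda r: 2.0 * r["X1"],
}
noise = [0.1, -0.1, 0.2, -0.2, 0.0, 0.1, -0.1, 0.2, -0.2]
calib_rows = [{"X1": float(i), "X2": 2.0 * i + e} for i, e in enumerate(noise)]
thresholds = calibrate(predictors, calib_rows, alpha=0.2)

print(flag_inconsistencies(predictors, thresholds, {"X1": 3.0, "X2": 10.0}))  # ['X2']
print(flag_inconsistencies(predictors, thresholds, {"X1": 20.0, "X2": 40.0}))  # ['X1']
```

In the second test row both values are unusual in absolute terms, but only X1 is flagged: given its parent, X2 = 40 is exactly what the structure predicts, so the inconsistency is localized to the root.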

MatterGen: a generative model for inorganic materials design

Author: Crabbé Jonathan
Fowler Andrew
Fu Xiang
Hao Hongxia
Horton Matthew
Huang Chin-Wei
Lewis Sarah
Li Jielan
Lu Ziheng
Nguyen Bichlien
Pinsler Robert
Schulz Hannes
Shysheya Sasha
Smith Jake
Sun Lixin
Tomioka Ryota
Xie Tian
Yang Han
Zeni Claudio
Zhou Yichi
Zügner Daniel
Publication date: 29/01/2024

The design of functional materials with desired properties is essential in driving technological advances in areas like energy storage, catalysis, and carbon capture. Generative models provide a new paradigm for materials design by directly generating entirely novel materials given desired property constraints. Despite recent progress, current generative models have a low success rate in proposing stable crystals or can satisfy only a very limited set of property constraints. Here, we present MatterGen, a model that generates stable, diverse inorganic materials across the periodic table and can further be fine-tuned to steer the generation towards a broad range of property constraints. To enable this, we introduce a new diffusion-based generative process that produces crystalline structures by gradually refining atom types, coordinates, and the periodic lattice. We further introduce adapter modules to enable fine-tuning towards any given property constraint with a labeled dataset. Compared to prior generative models, structures produced by MatterGen are more than twice as likely to be novel and stable, and more than 15 times closer to the local energy minimum. After fine-tuning, MatterGen successfully generates stable, novel materials with the desired chemistry and symmetry, as well as mechanical, electronic and magnetic properties. Finally, we demonstrate multi-property materials design capabilities by proposing structures that have both high magnetic density and a chemical composition with low supply-chain risk. We believe that the quality of generated materials and the breadth of MatterGen's capabilities represent a major advancement towards creating a universal generative model for materials design.

Comment: 13 pages main text, 35 pages supplementary information

arXiv.org e-Print Archive

Data-SUITE: Data-centric identification of in-distribution incongruous examples

Author: Crabbé Jonathan
Seedat Nabeel
van der Schaar Mihaela
Publication date: 13/06/2022

Systematic quantification of data quality is critical for consistent model performance. Prior works have focused on out-of-distribution data. Instead, we tackle an understudied yet equally important problem: characterizing incongruous regions of in-distribution (ID) data, which may arise from feature space heterogeneity. To this end, we propose a paradigm shift with Data-SUITE: a data-centric AI framework to identify these regions, independent of a task-specific model. Data-SUITE leverages copula modeling, representation learning, and conformal prediction to build feature-wise confidence interval estimators based on a set of training instances. These estimators can be used to evaluate the congruence of test instances with respect to the training set, to answer two practically useful questions: (1) which test instances will be reliably predicted by a model trained with the training instances? and (2) can we identify incongruous regions of the feature space so that data owners understand the data's limitations or guide future data collection? We empirically validate Data-SUITE's performance and coverage guarantees and demonstrate on cross-site medical data, biased data, and data with concept drift that Data-SUITE best identifies ID regions where a downstream model may be reliable (independently of that model). We also illustrate how these identified regions can provide insights into datasets and highlight their limitations.

Comment: Presented at the International Conference on Machine Learning (ICML) 202

arXiv.org e-Print Archive

Spaceflight Promotes Biofilm Formation by Pseudomonas aeruginosa

Author: A Crabbé
A Heydorn
BE Crucian
Christophe Beloin
CM Toutain
Cynthia H. Collins
D Klaus
ES Nelson
Farah K. Tengra
FP Baqai
G Horneck
GA O'Toole
HA Videla
Hon Kit Chan
Jasmine Shong
JD Gu
JD Shrout
Joel L. Plawsky
Jonathan S. Dordick
JW Wilson
KB Barken
L Hall-Stoodley
LS England
M Harmsen
M Hogardt
M Klausen
Macarena Parra
MR Benoit
Nicholas Marchand
O Marcu
P Stoodley
Ravindra C. Pangule
RJ Castaneda
RJ McLean
S Guadarrama
SV Lynch
T Brooks
Wooseong Kim
Zachary Young
Publication venue: Public Library of Science (PLoS)

Crossref

Universal Dependencies 2.3

Author: Abrams Mitchell
Ackermann Elia
Aepli Noëmi
Aghaei Hamid
Agić Željko
Ahmadi Amir
Ahrenberg Lars
Ajede Chika Kennedy
Akkurt Salih Furkan
Aleksandravičiūtė Gabrielė
Alfina Ika
Algom Avner
Alnajjar Khalid
Alzetta Chiara
Andersen Erik
Antonsen Lene
Aoyama Tatsuya
Aplonova Katya
Aquino Angelina
Aragon Carolina
Aranes Glyd
Aranzabe Maria Jesus
Arıcan Bilge Nas
Arnardóttir Þórunn
Arutie Gashaw
Arwidarasti Jessica Naraiswari
Asahara Masayuki
Aslan Deniz Baran
Asmazoğlu Cengiz
Ateyah Luma
Atmaca Furkan
Attia Mohammed
Atutxa Aitziber
Augustinus Liesbeth
Avelãs Mariana
Badmaeva Elena
Balasubramani Keerthana
Ballesteros Miguel
Banerjee Esha
Bank Sebastian
Barbu Mititelu Verginica
Barkarson Starkaður
Basile Rodolfo
Basmov Victoria
Batchelor Colin
Bauer John
Bedir Seyyit Talha
Behzad Shabnam
Belieni Juan
Ben Moshe Yifat
Bengoetxea Kepa
Benli İbrahim
Berk Gözde
Bhat Riyaz Ahmad
Biagetti Erica
Bick Eckhard
Bielinskienė Agnė
Bjarnadóttir Kristín
Blokland Rogier
Bobicev Victoria
Boizou Loïc
Borges Völker Emanuel
Bosco Cristina
Bouma Gosse
Bowman Sam
Boyd Adriane
Braggaar Anouck
Branco António
Brokaitė Kristina
Burchardt Aljoscha
Börstell Carl
Campos Marisa
Candito Marie
Caron Bernard
Caron Gauthier
Carvalheiro Catarina
Carvalho Rita
Cassidy Lauren
Castro Maria Clara
Castro Sérgio
Cavalcanti Tatiana
Cebiroğlu Eryiğit Gülşen
Cecchini Flavio Massimiliano
Celano Giuseppe G. A.
Cesur Neslihan
Cetin Savas
Chalub Fabricio
Chamila Liyanage
Chauhan Shweta
Chi Ethan
Chika Taishi
Cho Yongseok
Choi Jinho
Chun Jayeol
Chung Juyeon
Cignarella Alessandra T.
Cinková Silvie
Collomb Aurélie
Connor Miriam
Corbetta Claudia
Corbetta Daniela
Costa Francisco
Courtin Marine
Crabbé Benoît
Cristescu Mihaela
Cvetkoski Vladimir
Dale Ingerid Løyning
Daniel Philemon
Davidson Elizabeth
de Alencar Leonel Figueiredo
de Laurentiis Martina
de Marneffe Marie-Catherine
de Paiva Valeria
de Souza Elvis
Dehouck Mathieu
Derin Mehmet Oguz
Di Nuovo Elisa
Diaz de Ilarraza Arantza
Dickerson Carly
Dinakaramani Arawinda
Dione Bamba
Dirix Peter
Dobrovoljc Kaja
Doyle Adrian
Dozat Timothy
Droganova Kira
Duran Magali Sanches
Dwivedi Puneet
Ebert Christian
Eckhoff Hanne
Eguchi Masaki
Eiche Sandra
Eli Marhaba
Elkahky Ali
Ephrem Binyam
Erina Olga
Erjavec Tomaž
Essaidi Farah
Etienne Aline
Evelyn Wograine
Facundes Sidney
Farkas Richárd
Favero Federica
Ferdaousi Jannatul
Fernanda Marília
Fernandez Alcalde Hector
Fethi Amal
Foster Jennifer
Fransen Theodorus
Freitas Cláudia
Fujita Kazunori
Gajdošová Katarína
Galbraith Daniel
Gamba Federica
Garcia Marcos
Gerardi Fabrício Ferraz
Gerdes Kim
Gessler Luke
Ginter Filip
Godoy Gustavo
Goenaga Iakes
Gojenola Koldo
Goldberg Yoav
González Saavedra Berta
Griciūtė Bernadeta
Grioni Matias
Grobol Loïc
Grūzītis Normunds
Guillaume Bruno
Guiller Kirian
Guillot-Barbance Céline
Gärdenfors Moa
Gómez Guinovart Xavier
Gökırmak Memduh
Güngör Tunga
Habash Nizar
Hafsteinsson Hinrik
Hajič jr. Jan
Hajič Jan
Han Na-Rae
Hanifmuti Muhammad Yudistira
Harada Takahiro
Hardwick Sam
Harris Kim
Haug Dag
Heinecke Johannes
Hellwig Oliver
Hennig Felix
Hladká Barbora
Hlaváčová Jaroslava
Hociung Florinel
Hohle Petter
Huang Yidi
Huerta Mendez Marivel
Hwang Jena
Hà Mỹ Linh
Hämäläinen Mika
Ikeda Takumi
Ingason Anton Karl
Ion Radu
Irimia Elena
Ishola Ọlájídé
Islamaj Artan
Ito Kaoru
Jagodzińska Sandra
Jannat Siratun
Jelínek Tomáš
Jha Apoorva
Jiang Katharine
Johannsen Anders
Juutinen Markus
Jónsdóttir Hildur
Jørgensen Fredrik
Kabaeva Nadezhda
Kahane Sylvain
Kanayama Hiroshi
Kanerva Jenna
Kara Neslihan
Karahóǧa Ritván
Kayadelen Tolga
Kaşıkara Hüner
Kengatharaiyer Sarveswaran
Kettnerová Václava
Kharatyan Lilit
Kirchner Jesse
Klementieva Elena
Klyachko Elena
Kocharov Petr
Kopacewicz Kamil
Korkiakangas Timo
Koshevoy Alexey
Kotsyba Natalia
Kovalevskaitė Jolanta
Krek Simon
Krishnamurthy Parameswari
Kuqi Adrian
Kuyrukçu Oğuzhan
Kuzgun Aslı
Kwak Sookyoung
Kyle Kris
Kåsen Andre
Köhn Arne
Köksal Abdullatif
Köse Mehmet
Kübler Sandra
Laan Käbi
Laippala Veronika
Lambertino Lorenzo
Lando Tatiana
Larasati Septina Dian
Lavrentiev Alexei
Lee John
Lenci Alessandro
Lertpradit Saran
Leung Herman
Levina Maria
Levine Lauren
Li Cheuk Ying
Li Josie
Li Keying
Li Yixuan
Li Yuan
Lim KyungTae
Lima Padovani Bruna
Lin Yi-Ju Jessica
Lindén Krister
Liu Yang Janet
Ljubešić Nikola
Lobzhanidze Irina
Loginova Olga
Lopes Lucelene
Lusito Stefano
Luthfi Andry
Luukko Mikko
Lyashevskaya Olga
Lynn Teresa
Lê Hồng Phương
Macketanz Vivien
Mahamdi Menel
Maillard Jean
Makarchuk Ilya
Makazhanov Aibek
Mandl Michael
Manning Christopher
Manurung Ruli
Mareček David
Marheinecke Katrin
Markantonatou Stella
Martins André
Martins Cláudia
Martín Rodríguez Lorena
Martínez Alonso Héctor
Marşan Büşra
Matsuda Hiroshi
Matsumoto Yuji
Mazzei Alessandro
Mašek Jan
McDonald Ryan
McGuinness Sarah
Mendonça Gustavo
Merzhevich Tatiana
Miekka Niko
Miller Aaron
Mischenkova Karina
Missilä Anna
Mititelu Cătălin
Mitrofan Maria
Miyao Yusuke
Mojiri Foroushani AmirHossein
Molnár Judit
Moloodi Amirsaeid
Montemagni Simonetta
More Amir
Moreno Romero Laura
Moretti Giovanni
Mori Shinsuke
Morioka Tomohiko
Moro Shigeki
Mortensen Bjartur
Moskalevskyi Bohdan
Muischnek Kadri
Munro Robert
Murawaki Yugo
Müürisep Kaili
Mărănduc Cătălina
Nainwani Pinkey
Nakhlé Mariam
Navarro Horñiacek Juan Ignacio
Nedoluzhko Anna
Nevaci Manuela
Nešpore-Bērzkalne Gunta
Nguyễn Thị Minh Huyền
Nguyễn Thị Lương
Nikaido Yoshihiro
Nikolaev Vitaly
Nitisaroj Rattima
Nivre Joakim
Nourian Alireza
Nunes Maria das Graças Volpe
Nurmi Hanna
Ojala Stina
Ojha Atul Kr.
Olúòkun Adédayọ̀
Omura Mai
Onwuegbuzia Emeka
Ordan Noam
Osenova Petya
Paccosi Teresa
Palmero Aprosio Alessio
Panova Anastasia
Pardo Thiago Alexandre Salgueiro
Park Hyunji Hayley
Partanen Niko
Pascual Elena
Passarotti Marco
Patejuk Agnieszka
Paulino-Passos Guilherme
Pedonese Giulia
Peljak-Łapińska Angelika
Peng Siyao
Peng Siyao Logan
Pereira Rita
Pereira Sílvia
Perez Cenel-Augusto
Perkova Natalia
Perrier Guy
Petrov Slav
Petrova Daria
Peverelli Andrea
Phelan Jason
Pierre-Louis Claudel
Piitulainen Jussi
Pinter Yuval
Pinto Clara
Pintucci Rodrigo
Pirinen Tommi A
Pitler Emily
Plamada Magdalena
Plank Barbara
Poibeau Thierry
Ponomareva Larisa
Popel Martin
Pretkalniņa Lauma
Prokopidis Prokopis
Przepiórkowski Adam
Prévost Sophie
Pugh Robert
Puolakainen Tiina
Pyysalo Sampo
Qi Peng
Querido Andreia
Rademaker Alexandre
Rahoman Mizanur
Rama Taraka
Ramasamy Loganathan
Ramisch Carlos
Ramos Joana
Rashel Fam
Rasooli Mohammad Sadegh
Ravishankar Vinit
Real Livy
Rebeja Petru
Reddy Siva
Regnault Mathilde
Rehm Georg
Riabi Arij
Riabov Ivan
Rießler Michael
Rimkutė Erika
Rinaldi Larissa
Rituma Laura
Rizqiyah Putri
Rocha Luisa
Roksandic Ivan
Romanenko Mykhailo
Rosa Rudolf
Rovati Davide
Rozonoyer Ben
Roșca Valentin
Rudina Olga
Rueter Jack
Rääbis Andriela
Rögnvaldsson Eiríkur
Rúnarsson Kristján
Sadde Shoval
Safari Pegah
Sahala Aleksi
Saleh Shadi
Salomoni Alessio
Samardžić Tanja
Samson Stephanie
Sanguinetti Manuela
Sanıyar Ezgi
Sartor Marta
Sasaki Mitsuya
Saulīte Baiba
Savary Agata
Sawanakunanon Yanin
Saxena Shefali
Scannell Kevin
Scarlata Salvatore
Schang Emmanuel
Schneider Nathan
Schuster Sebastian
Schwartz Lane
Seddah Djamé
Seeker Wolfgang
Seraji Mojgan
Shahzadi Syeda
Shen Mo
Shimada Atsuko
Shirasu Hiroyuki
Shishkina Yana
Shohibussirri Muh
Shvedova Maria
Siewert Janine
Sigurðsson Einar Freyr
Silva João
Silveira Aline
Silveira Natalia
Silveira Sara
Simi Maria
Simionescu Radu
Simkó Katalin
Simov Kiril
Sitchinava Dmitri
Sither Ted
Skachedubova Maria
Smith Aaron
Soares-Bastos Isabela
Solberg Per Erik
Sonnenhauser Barbara
Sourov Shafi
Sprugnoli Rachele
Stamou Vivian
Steingrímsson Steinþór
Stella Antonio
Stephen Abishek
Straka Milan
Strickland Emmett
Strnadová Jana
Suhr Alane
Sulestio Yogi Lesmana
Sulubacak Umut
Suzuki Shingo
Swanson Daniel
Szántó Zsolt
Särg Dage
Símonarson Haukur Barri
Taguchi Chihiro
Taji Dima
Tamburini Fabio
Tan Mary Ann C.
Tanaka Takaaki
Tanaya Dipta
Tavoni Mirko
Tella Samson
Tellier Isabelle
Testori Marinella
Thomas Guillaume
Tonelli Sara
Torga Liisi
Toska Marsida
Trosterud Trond
Trukhina Anna
Tsarfaty Reut
Tyers Francis
Türk Utku
Uematsu Sumire
Untilov Roman
Urešová Zdeňka
Uria Larraitz
Uszkoreit Hans
Utka Andrius
Vagnoni Elena
Vajjala Sowmya
Vak Socrates
van der Goot Rob
van Niekerk Daniel
van Noord Gertjan
Vanhove Martine
Varga Viktor
Vedenina Uliana
Venturi Giulia
Villemonte de la Clergerie Eric
Vincze Veronika
Vlasova Natalia
Wakasa Aya
Wallenberg Joel C.
Wallin Lars
Walsh Abigail
Washington Jonathan North
Wendt Maximilan
Widmer Paul
Wigderson Shira
Wijono Sri Hartati
Wille Vanessa Berwanger
Williams Seyi
Wirén Mats
Wittern Christian
Woldemariam Tsegay
Wong Tak-sum
Wróblewska Alina
Wu Qishen
Yako Mary
Yamashita Kayo
Yamazaki Naoki
Yan Chunxiao
Yasuoka Koichi
Yavrumyan Marat M.
Yenice Arife Betül
Yıldız Olcay Taner
Yu Zhuoran
Yuliawati Arlisa
Zahra Shorouq
Zeldes Amir
Zeman Daniel
Zhou He
Zhu Hanzhi
Zhu Yilun
Zhuravleva Anna
Ziane Rayan
Ásgeirsdóttir Katla
Çetinoğlu Özlem
Çöltekin Çağrı
Óladóttir Hulda
Östling Robert
Özateş Şaziye Betül
Özgür Arzucan
Öztürk Başaran Balkız
Özçelik Merve
Øvrelid Lilja
Þorsteinsson Vilhjálmur
Þórðarson Sveinbjörn
Čéplö Slavomír
Šimková Mária
Žabokrtský Zdeněk
Publication venue: Universal Dependencies Consortium
Publication date: 01/01/2018

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).

HAL-ENS-LYON

HAL-UJM

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

HAL AMU

HAL Clermont Université

INRIA a CCSD electronic archive server
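
Universal Dependencies treebanks are distributed in the CoNLL-U format, where each token line has ten tab-separated fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC): UPOS carries the universal part-of-speech tag, FEATS the morphological features, and HEAD with DEPREL the dependency relation described in the abstract above. The sketch below is a minimal reader for one sentence; the `Token` type and function names are hypothetical, the example sentence is made up, and multiword-token ranges and empty nodes are simply skipped rather than handled as a full reader would.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    id: int
    form: str
    lemma: str
    upos: str
    feats: Dict[str, str]
    head: int
    deprel: str


def parse_feats(field: str) -> Dict[str, str]:
    """FEATS is '_' or a '|'-separated list of Name=Value pairs."""
    if field == "_":
        return {}
    return dict(item.split("=", 1) for item in field.split("|"))


def parse_sentence(block: str) -> List[Token]:
    tokens: List[Token] = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # blank lines and sentence-level comments such as '# text = ...'
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated fields, got {len(cols)}")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1)
        tokens.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                            parse_feats(cols[5]), int(cols[6]), cols[7]))
    return tokens


rows = [
    ["1", "The", "the", "DET", "DT", "Definite=Def|PronType=Art", "2", "det", "_", "_"],
    ["2", "cat", "cat", "NOUN", "NN", "Number=Sing", "3", "nsubj", "_", "_"],
    ["3", "sleeps", "sleep", "VERB", "VBZ",
     "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin", "0", "root", "_", "_"],
    ["4", ".", ".", "PUNCT", ".", "_", "3", "punct", "_", "_"],
]
sample = "# text = The cat sleeps.\n" + "\n".join("\t".join(r) for r in rows)

tokens = parse_sentence(sample)
print([t.upos for t in tokens])  # ['DET', 'NOUN', 'VERB', 'PUNCT']
print([(t.form, t.head, t.deprel) for t in tokens if t.head == 0])  # [('sleeps', 0, 'root')]
```

HEAD = 0 marks the root of the dependency tree, so filtering on it recovers the main predicate of the sentence.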