
6 research outputs found

The Split Matters: Flat Minima Methods for Improving the Performance of GNNs

Author: Lell Nicolas
Scherp Ansgar
Publication date: 15/06/2023

When training a neural network, it is optimized using the available training data with the hope that it generalizes well to new or unseen testing data. At the same absolute value, a flat minimum in the loss landscape is presumed to generalize better than a sharp minimum. Methods for determining flat minima have been mostly researched for independent and identically distributed (i.i.d.) data such as images. Graphs are inherently non-i.i.d. since the vertices are edge-connected. We investigate flat minima methods and combinations of those methods for training graph neural networks (GNNs). We use GCN and GAT as well as extend Graph-MLP to work with more layers and larger graphs. We conduct experiments on small and large citation, co-purchase, and protein datasets with different train-test splits in both the transductive and inductive training procedures. Results show that flat minima methods can improve the performance of GNN models by over 2 points if the train-test split is randomized. Following Shchur et al., randomized splits are essential for a fair evaluation of GNNs, as other (fixed) splits like 'Planetoid' are biased. Overall, we provide important insights for improving and fairly evaluating flat minima methods on GNNs. We recommend that practitioners always use weight averaging techniques, in particular EWA when using early stopping. While weight averaging techniques are only sometimes the best performing method, they are less sensitive to hyperparameters, need no additional training, and keep the original model unchanged. All source code is available at https://github.com/Foisunt/FMMs-in-GNNs
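
To illustrate the weight averaging idea recommended in the abstract above, here is a minimal sketch. It assumes that EWA refers to an exponential moving average of the model weights, which the abstract does not spell out. The function name `ewa_update`, the `decay` parameter, and the dictionary-of-lists weight layout are hypothetical and are not taken from the authors' repository.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def ewa_update(avg: Weights, weights: Weights, decay: float = 0.9) -> Weights:
    """Return the running exponential average after seeing `weights`.

    Hypothetical helper: `avg` is the running average so far (empty on the
    first call) and `weights` are the current model parameters.
    """
    if not avg:
        # First step: the average starts as a copy of the current weights.
        return {name: list(vals) for name, vals in weights.items()}
    return {
        name: [decay * a + (1.0 - decay) * w for a, w in zip(avg[name], vals)]
        for name, vals in weights.items()
    }


# Toy usage: two "training steps" with made-up parameter values.
averaged: Weights = {}
for step_weights in ({"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}):
    averaged = ewa_update(averaged, step_weights, decay=0.5)
```

The averaged copy is kept separately from the weights being trained, which matches the abstract's remark that weight averaging leaves the original model unchanged.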

arXiv.org e-Print Archive

Memorization of Named Entities in Fine-tuned BERT Models

Author: Diera Andor
Garifullina Aygul
Lell Nicolas
Scherp Ansgar
Publication date: 10/10/2023

Privacy-preserving deep learning is an emerging field in machine learning that aims to mitigate the privacy risks in the use of deep neural networks. One such risk is training data extraction from language models that have been trained on datasets that contain personal and privacy-sensitive information. In our study, we investigate the extent of named entity memorization in fine-tuned BERT models. We use single-label text classification as a representative downstream task and employ three different fine-tuning setups in our experiments, including one with Differential Privacy (DP). We create a large number of text samples from the fine-tuned BERT models utilizing a custom sequential sampling strategy with two prompting strategies. We search in these samples for named entities and check if they are also present in the fine-tuning datasets. We experiment with two benchmark datasets in the domains of emails and blogs. We show that the application of DP has a detrimental effect on the text generation capabilities of BERT. Furthermore, we show that a fine-tuned BERT does not generate more named entities specific to the fine-tuning dataset than a BERT model that is pre-trained only. This suggests that BERT is unlikely to emit personal or privacy-sensitive named entities. Overall, our results are important to understand to what extent BERT-based services are prone to training data extraction attacks. Comment: accepted at CD-MAKE 202

arXiv.org e-Print Archive

Reducing a Set of Regular Expressions and Analyzing Differences of Domain-specific Statistic Reporting

Author: Hoffmann Marcel
Kalmbach Tobias
Lell Nicolas
Scherp Ansgar
Publication date: 24/11/2022

Due to the large amount of daily scientific publications, it is impossible to manually review each one. Therefore, an automatic extraction of key information is desirable. In this paper, we examine STEREO, a tool for extracting statistics from scientific papers using regular expressions. By adapting an existing regular expression inclusion algorithm for our use case, we decrease the number of regular expressions used in STEREO by about 33.8%. We reveal common patterns from the condensed rule set that can be used for the creation of new rules. We also apply STEREO, which was previously trained in the life-sciences and medical domain, to a new scientific domain, namely Human-Computer-Interaction (HCI), and re-evaluate it. According to our research, statistics in the HCI domain are similar to those in the medical domain, although a higher percentage of APA-conform statistics were found in the HCI domain. Additionally, we compare extraction on PDF and LaTeX source files, finding LaTeX to be more reliable for extraction.

arXiv.org e-Print Archive

Reconstructing Native American Population History

Author: A Kitchen
AL Price
Alejandra V. Contreras
Alkes L. Price
Amanda Maestre
Andrés Ruiz-Linares
Anna Di Rienzo
Antonio Salas
Arti Tandon
Carla Gallo
Carlos Aguilar-Salinas
Claudia Moreau
Claudio M. Bravi
Constanza Duque
D Reich
Damian Labuda
Daniel Corach
David B. Witonsky
David Pauls
David Reich
Desmond Campbell
DH Alexander
DH O’Rourke
DJ Meltzer
E Tamm
Francisco M. Salzano
Francisco Rothhammer
Gabriel Bedoya
Georges Larrouy
Gerardo Jimenez-Sanchez
Giovanni Poletti
Gorka Alkorta-Aranburu
Graciela Bailliet
Irma Silva-Zolezzi
JD Wall
Jean-Michel Dugoujon
JH Greenberg
JT Lell
Juan C. Dib
Juan Carlos Fernandez-Lopez
Judith Kidd
Julio Molina
K Bryc
K Liu
KB Schroeder
Kenneth Kidd
L Campbell
Laura Riba
Laurent Excoffier
LL Cavalli-Sforza
Ludmila Osipova
Luis F. García
M Balter
M Rasmussen
M Ruhlen
Mardia Lopez-Alarcón
Maria Cátira Bortolini
Maria José Gómez-Vázquez
Maria V. Parra
Maricela Rodríguez-Cruz
María Luiza Petzl-Erler
MC Bortolini
MD Brown
Mercedes Villena
N Patterson
N Ray
N Saitou
Natalia Mesa
Nelson B. Freimer
Nick Patterson
Nicolas Ray
NJ Fagundes
NN Yang
NV Volodko
Omar Triana
R Cooke
Ramiro Barrantes
Ramón Coral-Vazquez
Rem I. Sukernik
René Vasquez
S de Azevedo
S Wang
Samuel Canizales-Quinteros
Sardana A. Fedorova
Silvia Blair
SR Browning
Stéphane Mazieres
T Goebel
TD Dillehay
Teresa Tusié-Luna
Thelma Canto-Cetina
TM Karafet
Tábita Hünemeier
UA Perego
Victor Acuña-Alonzo
William Klitz
Winston Rojas
Ángel Carracedo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/08/2012

The peopling of the Americas has been the subject of extensive genetic, archaeological and linguistic research; however, central questions remain unresolved [1–5]. One contentious issue is whether the settlement occurred via a single [6–8] or multiple streams of migration from Siberia [9–15]. The pattern of dispersals within the Americas is also poorly understood. To address these questions at higher resolution than was previously possible, we assembled data from 52 Native American and 17 Siberian groups genotyped at 364,470 single nucleotide polymorphisms. We show that Native Americans descend from at least three streams of Asian gene flow. Most descend entirely from a single ancestral population that we call “First American”. However, speakers of Eskimo-Aleut languages from the Arctic inherit almost half their ancestry from a second stream of Asian gene flow, and the Na-Dene-speaking Chipewyan from Canada inherit roughly one-tenth of their ancestry from a third stream. We show that the initial peopling followed a southward expansion facilitated by the coast, with sequential population splits and little gene flow after divergence, especially in South America. A major exception is in Chibchan-speakers on both sides of the Panama Isthmus, who have ancestry from both North and South America.

Crossref

Harvard University - DASH

HAL AMU

HAL Descartes

PubMed Central

eScholarship - University of California

Enlighten

Hal-Diderot

Reconstructing Native American population history

Author: A Kitchen
AL Price
Alejandra V. Contreras
Alkes L. Price
Amanda Maestre
Andrés Ruiz-Linares
Anna Di Rienzo
Antonio Salas
Arti Tandon
Carla Gallo
Carlos Aguilar-Salinas
Claudia Moreau
Claudio M. Bravi
Constanza Duque
D Reich
Damian Labuda
Daniel Corach
David B. Witonsky
David Pauls
David Reich
Desmond Campbell
DH Alexander
DH O’Rourke
DJ Meltzer
E Tamm
Francisco M. Salzano
Francisco Rothhammer
Gabriel Bedoya
Georges Larrouy
Gerardo Jimenez-Sanchez
Giovanni Poletti
Gorka Alkorta-Aranburu
Graciela Bailliet
Irma Silva-Zolezzi
JD Wall
Jean-Michel Dugoujon
JH Greenberg
JT Lell
Juan C. Dib
Juan Carlos Fernandez-Lopez
Judith Kidd
Julio Molina
K Bryc
K Liu
KB Schroeder
Kenneth Kidd
L Campbell
Laura Riba
Laurent Excoffier
LL Cavalli-Sforza
Ludmila Osipova
Luis F. García
M Balter
M Rasmussen
M Ruhlen
Mardia Lopez-Alarcón
Maria Cátira Bortolini
Maria José Gómez-Vázquez
Maria V. Parra
Maricela Rodríguez-Cruz
María Luiza Petzl-Erler
MC Bortolini
MD Brown
Mercedes Villena
N Patterson
N Ray
N Saitou
Natalia Mesa
Nelson B. Freimer
Nick Patterson
Nicolas Ray
NJ Fagundes
NN Yang
NV Volodko
Omar Triana
R Cooke
Ramiro Barrantes
Ramón Coral-Vazquez
Rem I. Sukernik
René Vasquez
S de Azevedo
S Wang
Samuel Canizales-Quinteros
Sardana A. Fedorova
Silvia Blair
SR Browning
Stéphane Mazieres
T Goebel
TD Dillehay
Teresa Tusié-Luna
Thelma Canto-Cetina
TM Karafet
Tábita Hünemeier
UA Perego
Victor Acuña-Alonzo
William Klitz
Winston Rojas
Ángel Carracedo
Publication venue: 'Springer Science and Business Media LLC'

Crossref

A Framework for the Initial Occupation of the Americas

Author: Aceituno Francisco J.
Achilli Alessandro
Adovasio James M.
Anderson David G.
Bar-Yosef Ofer
Barton C. Michael
Beck Charlotte
Bement Leland C.
Bettinger Robert L.
Bever Michael R.
Bodner Martin
Bonatto Sandro L.
Borrero Luis A.
Boëda Eric
Bradley Bruce A.
Brantingham P. Jeffrey
Brigham-Grette Julie
Brown Michael D.
Bryan Alan L.
Bueno Lucas
Cannon Michael D.
Chatters James C.
Clague John J.
Clark Peter U.
Collins Michael B.
Cooke Richard
Coutinho Alexandra
Davis Loren G.
Derenko Miroslava
Derevianko Anatoly P.
Deschamps Pierre
Dillehay Thomas D.
Dillehay Tom D.
Dincauze Dena F.
Dixon E. James
Dolukhanov Pavel M.
Drummond Alexei J.
Dulik Matthew C.
Dyke A. S.
Dyke Arthur S.
Erlandson Jon M.
Fagundes Nelson J. R.
Faith J. Tyler
Fariña Richard A.
Fedje Daryl W.
Fenner Jack N.
Fernandes Verónica
Ferring C. Reid
Fiedel Stuart J.
Fladmark Knut R.
Flegenheimer Nora
Frison George C.
Fu Qiaomei
Gibbons Ann
Gilbert M. Thomas P.
Gladyshev S.A.
Goebel Ted
Goldberg Paul
Gowan Evan
Graf Kelly E.
Grayson Donald K.
Gregoire Lauren J.
Gruhn Ruth
Gutiérrez María A.
Hamilton Marcus J.
Haury Emil W.
Haynes C. Vance
Haynes Gary
Haynes Gary A.
Hazelwood Lee
Hemmer Helmut
Hey Jody
Hipsley Christy A.
Hoffecker John F.
Holen Steven R.
Ikawa-Smith Fumiko
Ives John W.
Jablonski Nina G.
Jackson Donald
Jackson Lionel E.
Jenkins Dennis L.
Jertberg Patricia M.
Jones George T.
Joyce Daniel J.
Kaufman Darrell S.
Keates Susan G.
Keefer David K.
Kelly Robert L.
Kemp Brian M.
Kong Augustine
Kornfeld Marcel
Krause Johannes
Lahaye C.
Lahaye Christelle
Lavallée Danièle
Lazaridis Iosif
Lell Jeffrey T.
Lisiecki Lorraine E.
Llagostera Agustin
Lourdeau Antoine
Löfverström M.
MacDonald Stephen O.
Macphail Richard I.
Madsen David B.
Mandryk Carole A.
Martin Paul S.
Meirav Meiri
Meltzer David J.
Miotti Laura
Misarti Nicole
Morgan Craig
Morrow Juliet E.
Nikolskiy Pavel
O'Brien Michael J.
O'Rourke Dennis H.
Oppenheimer Stephen
Overstreet David F.
Pausata F. S. R.
Pearson Georges A.
Perego Ugo A.
Peros Matthew C.
Pickrell Joseph
Pitblado Bonnie L.
Pitulko Vladimir Victorovich
Poinar Hendrik
Prasciunas Mary M.
Prescott Graham W.
Rabinow Paul
Rademaker Kurt
Raff Jennifer A.
Raghavan Maanasa
Rasmussen Morten
Ray Nicolas
Reeder Leslie A.
Reich David
Reidla Maere
Rieux Adrien
Rothhammer Francisco
Sain Douglas A.
Sanchez Guadalupe
Sandweiss Daniel H.
Scally Aylwyn
Schurr Theodore G.
Shafer Aaron
Smith David Glenn
Soares Pedro
Southerton Simon G.
Speth John D.
Stanford Dennis
Stanford Dennis J.
Starikovskaya Elena B.
Steele James
Stothert Karen E.
Sun James X.
Surovell Todd A.
Tamm Erika
Tarasov Lev
Thomas Gregg W. C.
Torroni Antonio
Toth Nicholas
Vasil'ev Sergey A.
Vettoretti Guido
Volodko Natalia V.
Waguespack Nicole M.
Wang Y. J.
Waters Michael R.
Webb Sawney David
Westley Kieran
Publication venue: 'Maney Publishing'

Crossref