Automatic identification of variables in epidemiological datasets using logic regression

Authors: Abdi N.A. (Negin Ashtiani)
Agewall Stefan
Amato M. (Mauro)
Bae J.-H. (Jang-Ho)
Baldassarre Damiano
Beloqui O. (Oscar)
Berenson G. (Gerald)
Bergstrom Goran
Bevc S. (Sebastjan)
Bickel Horst
Bokemark Lena
Bots Michiel
Bulbul Alpaslan
Castelnuovo S. (Samuela)
Catapano A. (Alberico)
Catapano Alberico
Chien K.-L. (Kuo-Liong)
Dekker Jacqueline
Desvarieux Moise
Dimitriadis C. (Chrystosomos)
Ducimetiere P.
Dörr Marcus
Ekart R. (Robert)
Empana Jean Philippe
Engström G.
Ezhov Marat
Franco Oscar
Frauchiger B. (Beat)
Friera Alfonsa
Gabriel R. (Rafael)
Grigore Liliana
Hedblad Bo
Hofman Albert
Hojs R. (Radovan)
Iglseder Bernhard
Ikram Arfan
Jovanovic A. (Aleksandar)
Kablak-Ziembicka A. (Anna)
Kato A. (Akihiko)
Kauhanen Jussi
Kavousi Maryam
Kiechl Stefan
Kitagawa K. (Kazuo)
Landecho M.F. (Manuel F.)
Lazarevic T. (Tatjana)
Lee M.-S. (Moo-Sik)
Lin H.-J. (Hung-Ju)
Lind Lars
Liu J. (Jing)
Lorenz Matthias W.
McLachlan Stela
Nijpels Giel
Norata Giuseppe
Okazaki S. (Shuhei)
Orth A. (Andreas)
Papagianni Aikaterini
Park H.W. (Hyun Woong)
Pflug Anja
Plichart Matthieu
Polak Joseph F.
Poppert Holger
Price J.F. (Jackie F.)
Przewlocki T. (Tadeusz)
Robertson Christine M
Ronkainen Kimmo
Rosvall Maria
Rundek Tatjana
Sacco R.L. (Ralph L.)
Sander D. (Dirk)
Scheckenbach Frank
Schmidt Caroline
Schminke Ulf
Sirtori C.R. (Cesare R.)
Sitzer Matthias
Srinivasan S.R. (Sathanur R.)
Staub D. (Daniel)
Stehouwer Coen
Steinmetz Helmuth
Stolic R. (Radojica)
Su T.-C. (Ta-Chen)
Suarez C. (Carmen)
Tremoli Elena
Tripepi Giovanni
Tuomainen T.-P. (Tomi-Pekka)
Uthoff H. (Heiko)
Veglia Fabrizio
Völzke Henry
Willeit Johann
Xie Wuxiang
Yanez D.N. (David N.)
Zhao D. (Dong)
Zoccali Carmine
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

Background: For an individual participant data (IPD) meta-analysis, multiple datasets must be transformed into a consistent format, e.g. using uniform variable names. When large numbers of datasets have to be processed, this can be a time-consuming and error-prone task. Automated or semi-automated identification of variables can help to reduce the workload and improve data quality. For semi-automation, high sensitivity in recognizing matching variables is particularly important: it allows software to present, for each target variable, a selection of candidate source variables from which a user chooses the matching one, with only a low risk that the correct source variable has been missed.

Methods: For each variable in a set of target variables, a number of simple rules were created manually. Using logic regression, an optimal Boolean combination of these rules was searched for every target variable in a random subset of a large database of epidemiological and clinical cohort data (construction subset). These optimal rule combinations were then validated in a second subset of this database (validation subset).

Results: In the construction sample, 41 target variables were allocated with an average positive predictive value (PPV) of 34% and an average negative predictive value (NPV) of 95%. In the validation sample, PPV was 33%, whereas NPV remained high at 94%. PPV was 50% or less for 63% of all variables in the construction sample and for 71% of all variables in the validation sample.

Conclusions: We demonstrated that applying logic regression to a complex data management task in large epidemiological IPD meta-analyses is feasible. However, the performance of the algorithm is poor, which may require backup strategies.
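The idea of scoring Boolean combinations of simple matching rules can be illustrated with a minimal sketch. Everything in it is hypothetical: the target variable ("systolic blood pressure"), the rule names (`name_has_sbp`, `label_has_systolic`, ...), the `SourceVariable` type and the toy data are invented for illustration and are not taken from the study. It is also deliberately simpler than logic regression, which searches a much larger space of Boolean expressions (including negations), typically with simulated annealing. Here, single rules and two-rule AND/OR combinations are enumerated exhaustively and ranked by sensitivity first, then PPV, reflecting the abstract's emphasis on sensitivity for semi-automation.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class SourceVariable:
    """Hypothetical source-dataset variable: short name plus descriptive label."""
    name: str
    label: str


Rule = Callable[[SourceVariable], bool]

# Hypothetical hand-written rules for one hypothetical target variable
# ("systolic blood pressure"); the study's actual rules are not given here.
RULES: Dict[str, Rule] = {
    "name_has_sbp": lambda v: "sbp" in v.name.lower(),
    "label_has_systolic": lambda v: "systolic" in v.label.lower(),
    "name_has_bp": lambda v: "bp" in v.name.lower(),
    "label_has_pressure": lambda v: "pressure" in v.label.lower(),
}


def candidate_combinations(rules: Dict[str, Rule]) -> Iterator[Tuple[str, Rule]]:
    """Yield every single rule and every two-rule AND/OR combination."""
    for name, rule in rules.items():
        yield name, rule
    for a, b in combinations(rules, 2):
        ra, rb = rules[a], rules[b]
        # Default arguments bind ra/rb now, avoiding late-binding closures.
        yield f"{a} AND {b}", lambda v, ra=ra, rb=rb: ra(v) and rb(v)
        yield f"{a} OR {b}", lambda v, ra=ra, rb=rb: ra(v) or rb(v)


def _ratio(num: int, den: int) -> float:
    # Convention for this sketch: an undefined ratio (0/0) is reported as 0.0.
    return num / den if den else 0.0


def evaluate(rule: Rule, variables: List[SourceVariable],
             is_match: List[bool]) -> Dict[str, float]:
    """Sensitivity, PPV and NPV of a rule against the known true matches."""
    pairs = list(zip((rule(v) for v in variables), is_match))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def best_combination(rules: Dict[str, Rule], variables: List[SourceVariable],
                     is_match: List[bool]) -> Tuple[str, Dict[str, float]]:
    """Pick the candidate with the highest sensitivity, breaking ties by PPV."""
    scored = [(name, evaluate(rule, variables, is_match))
              for name, rule in candidate_combinations(rules)]
    return max(scored, key=lambda item: (item[1]["sensitivity"], item[1]["ppv"]))


# Hypothetical "construction" data: source variables and whether each one
# truly corresponds to the target variable.
CONSTRUCTION = [
    SourceVariable("sbp", "Blood pressure, systolic"),
    SourceVariable("bp_sys", "Systolic reading (mmHg)"),
    SourceVariable("sbp2", "Second reading"),
    SourceVariable("dbp", "Blood pressure, diastolic"),
    SourceVariable("bp_med", "Antihypertensive medication"),
    SourceVariable("age", "Age at baseline"),
]
IS_MATCH = [True, True, True, False, False, False]

if __name__ == "__main__":
    name, metrics = best_combination(RULES, CONSTRUCTION, IS_MATCH)
    print(name, metrics)
```

On this toy data no single rule finds all three matches without false positives, but `name_has_sbp OR label_has_systolic` does. Mirroring the study design, a combination chosen this way would then be re-scored with `evaluate` on a separate validation subset, where PPV and NPV may drop.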

Directory of Open Access Journals

Edinburgh Research Explorer

University of Miami: Scholarship Miami

Erasmus University Digital Repository

NORA - Norwegian Open Research Archives

espace@Curtin

Crossref

AIR Universita degli studi di Milano

PubMed Central

EUR Research Repository

Utrecht University Repository

Leicester Research Archive