Search CORE


Analyses of single marker and pairwise effects of candidate loci for rheumatoid arthritis using logistic regression and random forests

Authors: Chubb Daniel; Glaser Beate; Hamshere Marian L; Holmans Peter; Moskvina Valentina; Nikolov Ivan; Segurado Ricardo
Publication venue: BioMed Central
Publication date: 01/01/2007

Using parametric and nonparametric techniques, our study investigated the presence of single-locus and pairwise effects among 20 markers of the Genetic Analysis Workshop 15 (GAW15) North American Rheumatoid Arthritis Consortium (NARAC) candidate gene data set (Problem 2), analyzing 463 independent patients and 855 controls. Specifically, our work examined the correspondence between logistic regression (LR) analysis of single-locus and pairwise interaction effects and random forest (RF) single and joint importance measures. For this comparison, we selected small but stable RFs (500 trees), whose importance measures correlated strongly (r ≈ 0.98) with those from RFs grown with 5000 trees. Both RF importance measures captured most of the LR single-locus and pairwise interaction effects, while joint importance measures also corresponded to full LR models containing main and interaction effects. We furthermore showed that RF measures were particularly sensitive to data imputation. The most consistent pairwise effect on rheumatoid arthritis was found between two markers within MAP3K7IP2/SUMO4 on 6q25.1, although LR and RFs assigned it different significance levels.
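The stability check described above compares per-marker importance scores from forests of two sizes. A minimal sketch, assuming each forest yields one importance score per marker, is a plain Pearson correlation between the two score vectors. The scores below are hypothetical placeholders, not GAW15 results, and `pearson_r` is an illustrative helper, not code from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-marker importance scores from a 500-tree and a 5000-tree forest.
imp_500 = [0.012, 0.004, 0.031, 0.002, 0.018]
imp_5000 = [0.011, 0.005, 0.029, 0.002, 0.019]

r = pearson_r(imp_500, imp_5000)
print(f"r = {r:.3f}")
```

A value of r close to 1 means the smaller forest orders and scales the markers almost exactly as the larger one does, which is the sense in which a 500-tree forest can be called stable.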

Pairwise accelerated failure time models for infectious disease transmission with external sources of infection

Authors: Kenah Eben; Sharker Yushuf
Publication date: 24/04/2019

Pairwise survival analysis handles dependent happenings in infectious disease transmission data by analyzing failure times in ordered pairs of individuals. The contact interval in the pair $ij$ is the time from the onset of infectiousness in $i$ to infectious contact from $i$ to $j$, where an infectious contact is sufficient to infect $j$ if he or she is susceptible. The contact interval distribution determines transmission probabilities and the infectiousness profile of infected individuals. Many important questions in infectious disease epidemiology involve the effects of covariates (e.g., age or vaccination status) on transmission. Here, we generalize earlier pairwise methods in two ways. First, we introduce an accelerated failure time model that allows the contact interval rate parameter to depend on infectiousness covariates for $i$, susceptibility covariates for $j$, and pairwise covariates. Second, we show how internal infections (caused by individuals under observation) and external infections (caused by environmental or community sources) can be handled simultaneously. In simulations, we show that these methods produce valid point and interval estimates and that accounting for external infections is critical to consistent estimation. Finally, we use these methods to analyze household surveillance data from Los Angeles County during the 2009 influenza A(H1N1) pandemic. Comment: 24 pages, 4 figures

arXiv.org e-Print Archive
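To make the contact-interval idea above concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes a Weibull baseline contact-interval distribution with shape `shape`, a log-linear rate `lam0 * exp(beta_inf*x_i + beta_sus*x_j + beta_pair*x_ij)` so that covariates rescale the time axis (the accelerated failure time idea), and a constant external hazard `ext_hazard` standing in for community sources. All names (`Infector`, `contact_rate`, `escape_probability`) and parameter values are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Infector:
    onset: float              # time at which i becomes infectious
    infectious_period: float  # how long i stays infectious (contact only possible then)
    x_inf: float              # hypothetical infectiousness covariate for i

def contact_rate(lam0, beta_inf, beta_sus, beta_pair, x_inf, x_sus, x_pair):
    """Log-linear rate: covariates multiply the contact-interval time scale (AFT)."""
    return lam0 * math.exp(beta_inf * x_inf + beta_sus * x_sus + beta_pair * x_pair)

def weibull_survival(t, rate, shape):
    """Assumed baseline S(t) = exp(-(rate * t)^shape): no infectious contact by time t."""
    return math.exp(-((rate * t) ** shape))

def transmission_probability(rate, shape, infectious_period):
    """Probability that i makes infectious contact with j before i stops being infectious."""
    return 1.0 - weibull_survival(infectious_period, rate, shape)

def escape_probability(t, infectors, x_sus, x_pair, params, ext_hazard):
    """Probability that susceptible j is still uninfected at time t, combining
    internal sources (infectors) with a constant external hazard."""
    lam0, shape, beta_inf, beta_sus, beta_pair = params
    surv = math.exp(-ext_hazard * t)  # escape from the external (community) source
    for inf, xp in zip(infectors, x_pair):
        # Time j has been exposed to i, truncated at the end of i's infectious period.
        exposure = min(max(0.0, t - inf.onset), inf.infectious_period)
        rate = contact_rate(lam0, beta_inf, beta_sus, beta_pair, inf.x_inf, x_sus, xp)
        surv *= weibull_survival(exposure, rate, shape)
    return surv

# Hypothetical values: (lam0, shape, beta_inf, beta_sus, beta_pair).
params = (0.2, 1.5, 0.5, -0.3, 0.0)
household = [Infector(onset=0.0, infectious_period=5.0, x_inf=1.0)]
p_inf = 1.0 - escape_probability(10.0, household, x_sus=1.0, x_pair=[0.0],
                                 params=params, ext_hazard=0.01)
print(f"P(j infected by day 10) = {p_inf:.3f}")
```

Multiplying the external escape probability by the internal ones reflects the assumption that sources act independently; setting `ext_hazard=0` recovers a purely internal pairwise model, which is the case the abstract warns can bias estimation when external infections are present.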

A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics

Authors: Pandey Gaurav; Whalen Sean
Publication date: 19/09/2013

The combination of multiple classifiers using ensemble methods is increasingly important for making progress in a variety of difficult prediction problems. We present a comparative analysis of several ensemble methods through two case studies in genomics, namely the prediction of genetic interactions and protein functions, to demonstrate their efficacy on real-world datasets and draw useful conclusions about their behavior. These methods include simple aggregation, meta-learning, cluster-based meta-learning, and ensemble selection using heterogeneous classifiers trained on resampled data to improve the diversity of their predictions. We present a detailed analysis of these methods across 4 genomics datasets and find that the best of these methods offer statistically significant improvements over the state of the art in their respective domains. In addition, we establish a novel connection between ensemble selection and meta-learning, demonstrating how both of these disparate methods strike a balance between ensemble diversity and performance. Comment: 10 pages, 3 figures, 8 tables, to appear in Proceedings of the 2013 International Conference on Data Mining

arXiv.org e-Print Archive
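Two of the methods named above, simple aggregation and ensemble selection, can be sketched in a few lines. This is a minimal illustration, not the paper's code. It assumes each base classifier outputs class-1 probabilities on a shared validation set, uses plain accuracy at a 0.5 threshold as a stand-in performance metric, and implements the common greedy, with-replacement form of ensemble selection. The predictions and labels are hypothetical.

```python
from typing import List

def mean_aggregate(preds: List[List[float]]) -> List[float]:
    """Simple aggregation: average the predicted probabilities of all base classifiers."""
    n_models = len(preds)
    return [sum(col) / n_models for col in zip(*preds)]

def accuracy(probs: List[float], labels: List[int], threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded probability matches the 0/1 label."""
    correct = sum(int((p >= threshold) == bool(y)) for p, y in zip(probs, labels))
    return correct / len(labels)

def ensemble_selection(preds: List[List[float]], labels: List[int], n_iter: int) -> List[int]:
    """Greedy forward selection with replacement: at each step, add the base
    classifier whose inclusion gives the best validation accuracy of the
    averaged ensemble (ties go to the lowest index)."""
    chosen: List[int] = []
    for _ in range(n_iter):
        best_idx, best_score = None, -1.0
        for idx in range(len(preds)):
            candidate = [preds[i] for i in chosen + [idx]]
            score = accuracy(mean_aggregate(candidate), labels)
            if score > best_score:
                best_idx, best_score = idx, score
        chosen.append(best_idx)
    return chosen

# Hypothetical validation labels and probabilities from three base classifiers.
labels = [1, 0, 1, 0]
preds = [[0.9, 0.1, 0.8, 0.2],
         [0.4, 0.6, 0.4, 0.6],
         [0.6, 0.4, 0.3, 0.7]]
print(ensemble_selection(preds, labels, n_iter=2))
```

Selecting with replacement lets a strong model be weighted more heavily by being picked repeatedly. Repeated selection like this is one way the trade-off between diversity and individual performance, mentioned in the abstract, shows up in practice.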

Ranking relations using analogies in biological and information networks

Authors: Airoldi EM; Ghahramani Z; Heller K; Silva R
Publication venue: Institute of Mathematical Statistics
Publication date: 01/06/2010

Analogical reasoning depends fundamentally on the ability to learn and generalize about relations between objects. We develop an approach to relational learning which, given a set of pairs of objects $\mathbf{S}=\{A^{(1)}:B^{(1)},A^{(2)}:B^{(2)},\ldots,A^{(N)}:B^{(N)}\}$, measures how well other pairs $A:B$ fit in with the set $\mathbf{S}$. Our work addresses the following question: is the relation between objects $A$ and $B$ analogous to those relations found in $\mathbf{S}$? Such questions are particularly relevant in information retrieval, where an investigator might want to search for analogous pairs of objects that match the query set of interest. There are many ways in which objects can be related, making the task of measuring analogies very challenging. Our approach combines a similarity measure on function spaces with Bayesian analysis to produce a ranking. It requires data containing features of the objects of interest and a link matrix specifying which relationships exist; no further attributes of such relationships are necessary. We illustrate the potential of our method on text analysis and information networks. An application on discovering functional interactions between pairs of proteins is discussed in detail, where we show that our approach can work in practice even if only a small set of protein pairs is provided. Comment: Published at http://dx.doi.org/10.1214/09-AOAS321 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

CiteSeerX
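The question "how well does $A:B$ fit in with $\mathbf{S}$?" can be illustrated with a simpler relative of the approach above. The sketch below does not reproduce the paper's similarity measure on function spaces. Instead it swaps in the Beta–Bernoulli Bayesian Sets score, log p(x | S) − log p(x). It assumes each pair has already been encoded as a binary feature vector and uses a uniform Beta(1, 1) prior per feature. The pair encodings and helper names are hypothetical.

```python
import math

def bayesian_sets_scores(query, candidates, alpha=1.0, beta=1.0):
    """Log score log p(x | query) - log p(x) under independent Beta-Bernoulli
    features (Bayesian Sets); higher means x fits the query set better."""
    n = len(query)
    n_feat = len(query[0])
    counts = [sum(row[k] for row in query) for k in range(n_feat)]
    scores = []
    for x in candidates:
        s = 0.0
        for k in range(n_feat):
            a_post = alpha + counts[k]          # posterior pseudo-count of ones
            b_post = beta + n - counts[k]       # posterior pseudo-count of zeros
            # Ratio of posterior predictive to prior predictive for feature k.
            s += math.log(alpha + beta) - math.log(alpha + beta + n)
            if x[k]:
                s += math.log(a_post) - math.log(alpha)
            else:
                s += math.log(b_post) - math.log(beta)
        scores.append(s)
    return scores

def rank_pairs(query, candidates):
    """Indices of candidate pairs, best-fitting first."""
    scores = bayesian_sets_scores(query, candidates)
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)

# Hypothetical binary feature vectors, one per pair A:B.
query_pairs = [[1, 1, 0], [1, 1, 0]]
candidate_pairs = [[0, 0, 1], [1, 1, 0], [1, 0, 0]]
print(rank_pairs(query_pairs, candidate_pairs))
```

The score only rewards candidates whose features look like those shared across the query pairs. This mirrors, in a much cruder form, the goal of ranking pairs by analogy to $\mathbf{S}$ without needing attributes of the relationships themselves.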

Network Psychometrics

Authors: Aggen; Agresti; Anderson; Anderson; Anderson; Barabási; Besag; Birnbaum; Bollen; Borkulo; Borsboom; Borsboom; Borsboom; Borsboom; Bühlmann; Bühlmann; Chalmers; Chandrasekaran; Chen; Costantini; Cox; Cox; Cramer; Cramer; Cressie; Csardi; Dryden; Edwards; Ellis; Epskamp; Fischer; Fitzmaurice; Foygel; Fried; Friedman; Friedman; Green; Haberman; Holland; Howell; Ising; Jensen; Kac; Kindermann; Kolaczyk; Lauritzen; Lee; Leemput; Lin; Liu; Liu; Maas; Markus; Marsman; McCrae; McDonald; Meinshausen; Meinshausen; Mellenbergh; Meredith; Mulaik; Murphy; Murray; Møller; Olkin; Pearl; R Core Team; Rasch; Ravikumar; Reckase; Reichenbach; Reise; Scheffer; Sebastiani; Spearman; Tibshirani; Wainwright; Whittaker; Wickens; Zhao; Zou
Publication date: 01/01/2018

This chapter provides a general introduction to network modeling in psychometrics. The chapter starts with an introduction to the statistical model formulation of pairwise Markov random fields (PMRF), followed by an introduction to the PMRF suitable for binary data: the Ising model. The Ising model is a model used in ferromagnetism to explain phase transitions in a field of particles. Following the description of the Ising model in statistical physics, the chapter shows that the Ising model is closely related to models used in psychometrics: it can be shown to be equivalent to certain kinds of logistic regression models, loglinear models, and multidimensional item response theory (MIRT) models. The equivalence between the Ising model and the MIRT model puts standard psychometrics in a new light and leads to a strikingly different interpretation of well-known latent variable models. The chapter gives an overview of methods that can be used to estimate the Ising model, and concludes with a discussion of the interpretation of latent variables given the equivalence between the Ising model and MIRT. Comment: In Irwing, P., Hughes, D., and Booth, T. (2018). The Wiley Handbook of Psychometric Testing, 2 Volume Set: A Multidisciplinary Reference on Survey, Scale and Test Development. New York: Wiley.

arXiv.org e-Print Archive
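The equivalence between the Ising model and logistic regression mentioned above can be checked numerically. This minimal sketch assumes spins coded as −1/+1, a joint distribution proportional to exp(Σ_i τ_i x_i + Σ_{i<j} ω_ij x_i x_j), and hypothetical threshold (τ) and symmetric edge-weight (ω) values for a three-node network. It enumerates all states to get the exact joint distribution. It then compares each node's conditional probability with the logistic form P(x_i = 1 | x_{−i}) = 1 / (1 + exp(−2(τ_i + Σ_{j≠i} ω_ij x_j))). With 0/1 coding instead of −1/+1, the factor 2 and the intercept change.

```python
import itertools
import math

def ising_weight(x, tau, omega):
    """Unnormalized Ising weight exp(sum_i tau_i x_i + sum_{i<j} omega_ij x_i x_j)."""
    n = len(x)
    energy = sum(tau[i] * x[i] for i in range(n))
    energy += sum(omega[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))
    return math.exp(energy)

def joint_distribution(tau, omega):
    """Exact joint distribution over all 2^n spin configurations (small n only)."""
    states = list(itertools.product((-1, 1), repeat=len(tau)))
    weights = [ising_weight(s, tau, omega) for s in states]
    z = sum(weights)  # normalizing constant (partition function)
    return {s: w / z for s, w in zip(states, weights)}

def conditional_from_joint(joint, i, x):
    """P(x_i = +1 | all other nodes as in x), read off the joint distribution."""
    up = x[:i] + (1,) + x[i + 1:]
    down = x[:i] + (-1,) + x[i + 1:]
    return joint[up] / (joint[up] + joint[down])

def conditional_logistic(i, x, tau, omega):
    """The same conditional written as a logistic regression of x_i on the other nodes."""
    m = tau[i] + sum(omega[i][j] * x[j] for j in range(len(tau)) if j != i)
    return 1.0 / (1.0 + math.exp(-2.0 * m))

# Hypothetical parameters for a three-node network; omega must be symmetric.
tau = [0.1, -0.2, 0.3]
omega = [[0.0, 0.5, -0.4],
         [0.5, 0.0, 0.8],
         [-0.4, 0.8, 0.0]]

joint = joint_distribution(tau, omega)
max_gap = max(abs(conditional_from_joint(joint, i, x) - conditional_logistic(i, x, tau, omega))
              for x in joint for i in range(len(tau)))
print(f"largest difference: {max_gap:.2e}")
```

The partition function cancels in each conditional, which is why node-wise logistic regressions are a practical route to estimating Ising parameters even when full enumeration is infeasible.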