Analyses of single marker and pairwise effects of candidate loci for rheumatoid arthritis using logistic regression and random forests
Using parametric and nonparametric techniques, our study investigated the presence of single-locus and pairwise effects between 20 markers of the Genetic Analysis Workshop 15 (GAW15) North American Rheumatoid Arthritis Consortium (NARAC) candidate gene data set (Problem 2), analyzing 463 independent patients and 855 controls. Specifically, our work examined the correspondence between logistic regression (LR) analysis of single-locus and pairwise interaction effects, and random forest (RF) single and joint importance measures. For this comparison, we selected small but stable RFs (500 trees), which showed strong correlations (r ≈ 0.98) between their importance measures and those of RFs grown on 5000 trees. Both RF importance measures captured most of the LR single-locus and pairwise interaction effects, while joint importance measures also corresponded to full LR models containing main and interaction effects. We furthermore showed that RF measures were particularly sensitive to data imputation. The most consistent pairwise effect on rheumatoid arthritis was found between two markers within MAP3K7IP2/SUMO4 on 6q25.1, although LR and RFs assigned different significance levels.
Pairwise accelerated failure time models for infectious disease transmission with external sources of infection
Pairwise survival analysis handles dependent happenings in infectious disease
transmission data by analyzing failure times in ordered pairs of individuals.
The contact interval in the pair $ij$ is the time from the onset of
infectiousness in $i$ to infectious contact from $i$ to $j$, where an
infectious contact is sufficient to infect $j$ if he or she is susceptible. The
contact interval distribution determines transmission probabilities and the
infectiousness profile of infected individuals. Many important questions in
infectious disease epidemiology involve the effects of covariates (e.g., age or
vaccination status) on transmission. Here, we generalize earlier pairwise
methods in two ways: First, we introduce an accelerated failure time model that
allows the contact interval rate parameter to depend on infectiousness
covariates for $i$, susceptibility covariates for $j$, and pairwise covariates.
Second, we show how internal infections (caused by individuals under
observation) and external infections (caused by environmental or community
sources) can be handled simultaneously. In simulations, we show that these
methods produce valid point and interval estimates and that accounting for
external infections is critical to consistent estimation. Finally, we use these
methods to analyze household surveillance data from Los Angeles County during
the 2009 influenza A(H1N1) pandemic.
Comment: 24 pages, 4 figures
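A minimal sketch of how covariates might enter the contact interval rate in a model of this kind. It assumes exponentially distributed contact intervals and a log-linear rate, in which case scaling the rate by exp(linear predictor) is the same as accelerating time. The function names, coefficient values, and the constant external rate are hypothetical illustrations, not taken from the paper.

```python
import math

def pair_rate(baseline, beta_inf, x_inf, beta_sus, x_sus, beta_pair, x_pair):
    """Contact interval rate for the ordered pair (i, j).

    Assumed log-linear form: baseline * exp(infectiousness terms for i
    + susceptibility terms for j + pairwise terms for the pair).
    """
    linear = (sum(b * x for b, x in zip(beta_inf, x_inf))
              + sum(b * x for b, x in zip(beta_sus, x_sus))
              + sum(b * x for b, x in zip(beta_pair, x_pair)))
    return baseline * math.exp(linear)

def escape_probability(internal, external_rate, external_time):
    """Probability that j escapes infection from all sources.

    `internal` is a list of (rate, exposure_time) tuples, one per infectious
    contact i. With exponential contact intervals the cumulative hazards add,
    so internal and external sources are handled in one expression.
    """
    cumulative = sum(rate * t for rate, t in internal)
    cumulative += external_rate * external_time
    return math.exp(-cumulative)

# Hypothetical values: one infectiousness covariate for i (coefficient 0.5),
# one susceptibility covariate for j (e.g. vaccinated, coefficient -0.7).
rate_ij = pair_rate(0.1, [0.5], [1.0], [-0.7], [1.0], [], [])
p_escape = escape_probability([(rate_ij, 5.0)], external_rate=0.01, external_time=14.0)
```

Setting `external_rate` to zero in this sketch attributes every infection to an observed contact, which is the misspecification the abstract warns about.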
A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics
The combination of multiple classifiers using ensemble methods is
increasingly important for making progress in a variety of difficult prediction
problems. We present a comparative analysis of several ensemble methods through
two case studies in genomics, namely the prediction of genetic interactions and
protein functions, to demonstrate their efficacy on real-world datasets and
draw useful conclusions about their behavior. These methods include simple
aggregation, meta-learning, cluster-based meta-learning, and ensemble selection
using heterogeneous classifiers trained on resampled data to improve the
diversity of their predictions. We present a detailed analysis of these methods
across 4 genomics datasets and find that the best of these methods offer
statistically significant improvements over the state of the art in their
respective domains. In addition, we establish a novel connection between
ensemble selection and meta-learning, demonstrating how both of these disparate
methods establish a balance between ensemble diversity and performance.
Comment: 10 pages, 3 figures, 8 tables, to appear in Proceedings of the 2013
International Conference on Data Mining
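As a rough illustration of two of the ensemble methods named in the abstract, the sketch below shows simple aggregation (averaging base-classifier probabilities) and a greedy forward ensemble selection with replacement. The selection variant, the use of accuracy as the validation metric, and the toy predictions are all assumptions made for illustration; the paper's actual procedures and metrics may differ.

```python
def mean_aggregate(prob_columns):
    """Simple aggregation: average per-example probabilities across classifiers."""
    n = len(prob_columns)
    return [sum(col[k] for col in prob_columns) / n
            for k in range(len(prob_columns[0]))]

def accuracy(probs, labels, threshold=0.5):
    """Fraction of examples whose thresholded probability matches the label."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, labels))
    return hits / len(labels)

def greedy_ensemble_selection(preds, labels, rounds=5):
    """Repeatedly add (with replacement) the classifier whose inclusion gives
    the best validation accuracy; stop when no candidate improves the score."""
    chosen, best_score = [], -1.0
    for _ in range(rounds):
        candidates = []
        for name in sorted(preds):
            trial = mean_aggregate([preds[m] for m in chosen + [name]])
            candidates.append((accuracy(trial, labels), name))
        score, name = max(candidates, key=lambda c: c[0])
        if score <= best_score:
            break
        chosen.append(name)
        best_score = score
    return chosen, best_score

# Hypothetical validation-set probabilities from three base classifiers.
val_preds = {
    "a": [0.9, 0.3, 0.2, 0.6, 0.8, 0.1],
    "b": [0.6, 0.9, 0.7, 0.1, 0.4, 0.2],
    "c": [0.4, 0.6, 0.3, 0.4, 0.3, 0.7],
}
val_labels = [1, 1, 0, 0, 1, 0]
chosen, score = greedy_ensemble_selection(val_preds, val_labels)
```

In this toy data, classifiers "a" and "b" make different errors, so their average outperforms either alone; that is the diversity-versus-performance balance the abstract refers to.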
Ranking relations using analogies in biological and information networks
Analogical reasoning depends fundamentally on the ability to learn and
generalize about relations between objects. We develop an approach to
relational learning which, given a set of pairs of objects
$\mathbf{S}=\{A^{(1)}:B^{(1)},A^{(2)}:B^{(2)},\ldots,A^{(N)}:B^{(N)}\}$,
measures how well other pairs A:B fit in with the set $\mathbf{S}$. Our work
addresses the following question: is the relation between objects A and B
analogous to those relations found in $\mathbf{S}$? Such questions are
particularly relevant in information retrieval, where an investigator might
want to search for analogous pairs of objects that match the query set of
interest. There are many ways in which objects can be related, making the task
of measuring analogies very challenging. Our approach combines a similarity
measure on function spaces with Bayesian analysis to produce a ranking. It
requires data containing features of the objects of interest and a link matrix
specifying which relationships exist; no further attributes of such
relationships are necessary. We illustrate the potential of our method on text
analysis and information networks. An application on discovering functional
interactions between pairs of proteins is discussed in detail, where we show
that our approach can work in practice even if a small set of protein pairs is
provided.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS321 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
Network Psychometrics
This chapter provides a general introduction to network modeling in
psychometrics. The chapter starts with an introduction to the statistical model
formulation of pairwise Markov random fields (PMRF), followed by an
introduction to the PMRF suitable for binary data: the Ising model. The Ising
model is a model used in ferromagnetism to explain phase transitions in a field
of particles. Following the description of the Ising model in statistical
physics, the chapter continues to show that the Ising model is closely related
to models used in psychometrics. The Ising model can be shown to be equivalent
to certain kinds of logistic regression models, loglinear models and
multi-dimensional item response theory (MIRT) models. The equivalence between
the Ising model and the MIRT model puts standard psychometrics in a new light
and leads to a strikingly different interpretation of well-known latent
variable models. The chapter gives an overview of methods that can be used to
estimate the Ising model, and concludes with a discussion on the interpretation
of latent variables given the equivalence between the Ising model and MIRT.
Comment: In Irwing, P., Hughes, D., and Booth, T. (2018). The Wiley Handbook
of Psychometric Testing, 2 Volume Set: A Multidisciplinary Reference on
Survey, Scale and Test Development. New York: Wiley
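The link between the Ising model and logistic regression can be checked numerically on a small example. The sketch below assumes a {0, 1} coding of the binary variables and the parameterization P(x) ∝ exp(Σ τ_i x_i + Σ_{i<j} ω_ij x_i x_j); the chapter may use a different coding (such as ±1), and the parameter values here are arbitrary. It compares the conditional probability of one node given the others, computed from the joint distribution, with a logistic function of the neighbouring nodes.

```python
import itertools
import math

def ising_unnormalized(x, tau, omega):
    """exp(sum_i tau_i x_i + sum_{i<j} omega_ij x_i x_j) for x in {0,1}^n."""
    n = len(x)
    energy = sum(tau[i] * x[i] for i in range(n))
    energy += sum(omega[i][j] * x[i] * x[j]
                  for i in range(n) for j in range(i + 1, n))
    return math.exp(energy)

def conditional_from_joint(i, rest, tau, omega):
    """P(x_i = 1 | other nodes) from the joint; the normalizing constant cancels."""
    rest = list(rest)
    p1 = ising_unnormalized(rest[:i] + [1] + rest[i:], tau, omega)
    p0 = ising_unnormalized(rest[:i] + [0] + rest[i:], tau, omega)
    return p1 / (p0 + p1)

def conditional_from_logistic(i, rest, tau, omega):
    """Same conditional written as a logistic regression on the other nodes."""
    rest = list(rest)
    x = rest[:i] + [0] + rest[i:]  # placeholder at position i, skipped below
    z = tau[i] + sum(omega[min(i, j)][max(i, j)] * x[j]
                     for j in range(len(x)) if j != i)
    return 1.0 / (1.0 + math.exp(-z))

# Arbitrary 3-node example; only the upper triangle of omega is used.
tau = [-0.5, 0.2, 0.1]
omega = [[0.0, 1.2, -0.4],
         [0.0, 0.0, 0.8],
         [0.0, 0.0, 0.0]]

# Largest disagreement between the two computations over all nodes and states.
max_gap = max(
    abs(conditional_from_joint(i, rest, tau, omega)
        - conditional_from_logistic(i, rest, tau, omega))
    for i in range(3)
    for rest in itertools.product([0, 1], repeat=2)
)
```

Under these assumptions the two computations agree up to floating-point error, so each node's threshold τ_i acts as an intercept and its interaction weights act as regression coefficients.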