Kernel Machine SNP-Set Testing Under Multiple Candidate Kernels

Author: Armistead Paul M.
Engel Stephanie M.
Harmon Quaker E.
Lee Seunggeun
Lin Xinyi
Maity Arnab
Molldrem Jeffrey J.
Simmons Elizabeth M.
Wu Michael C.
Publication date: 01/01/2013

Joint testing for the cumulative effect of multiple single nucleotide polymorphisms grouped on the basis of prior biological knowledge has become a popular and powerful strategy for the analysis of large scale genetic association studies. The kernel machine (KM) testing framework is a useful approach that has been proposed for testing associations between multiple genetic variants and many different types of complex traits by comparing pairwise similarity in phenotype between subjects to pairwise similarity in genotype, with similarity in genotype defined via a kernel function. An advantage of the KM framework is its flexibility: choosing different kernel functions allows for different assumptions concerning the underlying model and can allow for improved power. In practice, it is difficult to know which kernel to use a priori, since this depends on the unknown underlying trait architecture, and selecting the kernel that gives the lowest p-value can lead to inflated type I error. Therefore, we propose practical strategies for KM testing when multiple candidate kernels are present, based on constructing composite kernels and on efficient perturbation procedures. We demonstrate through simulations and real data applications that the procedures protect the type I error rate and can lead to substantially improved power over poor choices of kernels, with only modest differences in power versus using the best candidate kernel.
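The composite-kernel idea in this abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: genotypes are coded 0/1/2, the two candidate kernels are a linear kernel and an identity-by-state (IBS) kernel, the composite is a plain average of trace-normalized kernels, the null model contains only an intercept, and the p-value comes from a simple permutation of the phenotype rather than the perturbation procedure the authors propose. All function names and the toy data are hypothetical.

```python
import random

def linear_kernel(G):
    # K[i][j] = inner product of the genotype vectors of subjects i and j
    return [[sum(a * b for a, b in zip(gi, gj)) for gj in G] for gi in G]

def ibs_kernel(G):
    # Identity-by-state similarity for genotypes coded 0/1/2
    p = len(G[0])
    return [[sum(2 - abs(a - b) for a, b in zip(gi, gj)) / (2 * p) for gj in G]
            for gi in G]

def composite_kernel(kernels):
    # Assumed weighting: scale each kernel to unit trace, then average
    n = len(kernels[0])
    traces = [sum(K[i][i] for i in range(n)) for K in kernels]
    return [[sum(K[i][j] / t for K, t in zip(kernels, traces)) / len(kernels)
             for j in range(n)] for i in range(n)]

def km_statistic(y, K):
    # Q = r' K r, with r the residuals from an intercept-only null model
    mean = sum(y) / len(y)
    r = [v - mean for v in y]
    n = len(y)
    return sum(r[i] * K[i][j] * r[j] for i in range(n) for j in range(n))

def permutation_pvalue(y, K, n_perm=999, seed=0):
    # Permutation stand-in for the paper's perturbation procedure
    rng = random.Random(seed)
    observed = km_statistic(y, K)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if km_statistic(y_perm, K) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy data (made up for illustration): 6 subjects, 3 SNPs, continuous trait
G = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 0, 2], [1, 2, 1], [2, 2, 0]]
y = [1.2, 0.4, 2.1, 0.9, 1.7, 2.5]
K = composite_kernel([linear_kernel(G), ibs_kernel(G)])
p_value = permutation_pvalue(y, K)
```

Because a single composite kernel is fixed before testing, only one test is performed, which is one way to avoid the type I error inflation that comes from picking the kernel with the smallest p-value.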

PubMed Central

Carolina Digital Repository

Human-Level Performance on Word Analogy Questions by Latent Relational Analysis

Author: Turney Peter D.
Publication date: 01/01/2004

This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, machine translation, and information retrieval. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason/stone is analogous to the pair carpenter/wood; the relations between mason and stone are highly similar to the relations between carpenter and wood. Past work on semantic similarity measures has mainly been concerned with attributional similarity. For instance, Latent Semantic Analysis (LSA) can measure the degree of similarity between two words, but not between two relations. Recently, the Vector Space Model (VSM) of information retrieval has been adapted to the task of measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus (they are not predefined), (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data (it is also used this way in LSA), and (3) automatically generated synonyms are used to explore reformulations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying noun-modifier relations, LRA achieves similar gains over the VSM, while using a smaller corpus.
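The VSM baseline described in this abstract can be sketched as follows. This is only an illustration of the baseline, not of LRA itself: the SVD smoothing and the synonym-based reformulations are omitted. The three patterns and all frequency counts are invented for the example, and cosine similarity is assumed as the measure for comparing pattern-frequency vectors.

```python
import math

def cosine(u, v):
    # Cosine of the angle between two pattern-frequency vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def answer_analogy(stem_vector, choice_vectors):
    # Pick the choice pair whose relation vector is most similar to the stem's
    return max(choice_vectors,
               key=lambda pair: cosine(stem_vector, choice_vectors[pair]))

# Hypothetical patterns: "X cuts Y", "X works with Y", "X eats Y"
# Hypothetical corpus frequencies of each pattern for each word pair
stem = [8, 12, 0]  # mason:stone
choices = {
    ("carpenter", "wood"): [9, 10, 0],
    ("teacher", "chalk"): [0, 6, 0],
    ("bird", "seed"): [0, 0, 11],
}
best = answer_analogy(stem, choices)
```

With these toy counts, the pair carpenter:wood has the relation vector closest to that of mason:stone, matching the example given in the abstract.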

arXiv.org e-Print Archive

NRC Publications Archive

Adaptive Mantel Test for Association Testing in Imaging Genetics Data

Author: Chen Chuansheng
Moyzis Robert
Ombao Hernando
Pluta Dustin
Xue Gui
Yu Zhaoxia
Publication date: 01/01/2018

Mantel's test (MT) for association is conducted by testing the linear relationship of similarity of all pairs of subjects between two observational domains. Motivated by applications to neuroimaging and genetics data, and following the success of shrinkage and kernel methods for prediction with high-dimensional data, we here introduce the adaptive Mantel test as an extension of the MT. By utilizing kernels and penalized similarity measures, the adaptive Mantel test is able to achieve higher statistical power relative to the classical MT in many settings. Furthermore, the adaptive Mantel test is designed to simultaneously test over multiple similarity measures such that the correct type I error rate under the null hypothesis is maintained without the need to directly adjust the significance threshold for multiple testing. The performance of the adaptive Mantel test is evaluated on simulated data, and is used to investigate associations between genetic markers related to Alzheimer's Disease and healthy brain physiology with data from a working memory study of 350 college students from Beijing Normal University.
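For orientation, the classical Mantel test that this abstract builds on can be sketched as below. This is the plain MT, not the adaptive extension: no kernels or penalized similarity measures are involved. The sketch assumes Euclidean distance as the pairwise dissimilarity in both domains and a one-sided permutation p-value; the function names and toy data are hypothetical.

```python
import math
import random

def distance_matrix(X):
    # Pairwise Euclidean distances between subjects (rows of X)
    return [[math.dist(a, b) for b in X] for a in X]

def upper_triangle(D, order):
    # Off-diagonal entries of D after relabelling subjects by `order`
    n = len(order)
    return [D[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def mantel_test(DX, DY, n_perm=999, seed=0):
    # Correlate the two sets of pairwise distances, then permute the
    # subject labels of one domain to build the null distribution
    n = len(DX)
    identity = list(range(n))
    x = upper_triangle(DX, identity)
    observed = pearson(x, upper_triangle(DY, identity))
    rng = random.Random(seed)
    order = identity[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        if pearson(x, upper_triangle(DY, order)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy data (made up): one feature per domain, Y is a linear function of X
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
Y = [[2.0 * row[0] + 1.0] for row in X]
r, p_value = mantel_test(distance_matrix(X), distance_matrix(Y))
```

Permuting subject labels, rather than individual matrix entries, keeps the dependence among distances that share a subject, which is why the test permutes rows and columns of one matrix together.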

arXiv.org e-Print Archive

Frontiers - Publisher Connector

Institutional Repository Universiteit Antwerpen

FigShare

Rao's Quadratic Entropy and Some New Applications

Author: Zhao Yueqin
Publication venue: ODU Digital Commons
Publication date: 01/04/2010

Many problems in statistical inference are formulated as testing the diversity of populations. Entropy functions measure the similarity of a distribution function to the uniform distribution and hence can be used as a measure of diversity. Rao (1982a) proposed the concept of quadratic entropy. Its concavity property makes a decomposition similar to ANOVA feasible for categorical data. In this thesis, after reviewing the properties of quadratic entropy and providing a modification to it, various applications of quadratic entropy are explored. First, analysis of quadratic entropy with the suggested modification is used to analyze contingency table data. Then its application to ecological biodiversity is established by constructing practically equivalent confidence intervals. The methods are applied to a real dinosaur diversity data set, and simulation experiments are performed to study the validity of the intervals. Quadratic entropy is also used for clustering multinomial data. Another application of quadratic entropy provided here is testing the association of two categorical variables with multiple responses. Finally, gene expression data inspires another application of quadratic entropy in analyzing large scale data, where a hill-climbing type iterative algorithm is developed based on a new minimum quadratic entropy criterion. The algorithm is illustrated on both simulated and real data.
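As a reminder of the quantity involved, Rao's quadratic entropy for category proportions p and pairwise dissimilarities d is Q = sum over i, j of d_ij * p_i * p_j. The sketch below computes only this standard definition; it does not include the modification proposed in the thesis. The proportions and the 0/1 dissimilarity matrix are made up for illustration.

```python
def rao_quadratic_entropy(p, d):
    # Q = sum_i sum_j d[i][j] * p[i] * p[j]
    n = len(p)
    return sum(p[i] * d[i][j] * p[j] for i in range(n) for j in range(n))

# Toy example: three categories, dissimilarity 1 between distinct categories
p = [0.5, 0.3, 0.2]
d = [[0 if i == j else 1 for j in range(3)] for i in range(3)]
q = rao_quadratic_entropy(p, d)
```

With this 0/1 dissimilarity, Q reduces to the Gini-Simpson index, 1 minus the sum of squared proportions, so the toy value is 1 - (0.25 + 0.09 + 0.04) = 0.62.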

Old Dominion University

Similarity of Semantic Relations

Author: Morris Jane
Turney Peter D.
Publication date: 01/01/2006

There are at least two kinds of similarity. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason:stone is analogous to the pair carpenter:wood. This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, and information retrieval. Recently, the Vector Space Model (VSM) of information retrieval has been adapted to measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus, (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data, and (3) automatically generated synonyms are used to explore variations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying semantic relations, LRA achieves similar gains over the VSM.

arXiv.org e-Print Archive

CiteSeerX

NRC Publications Archive

Crossref

CogPrints Cognitive Sciences Eprint Archive