284,705 research outputs found
Kernel Machine SNP-Set Testing Under Multiple Candidate Kernels
Joint testing for the cumulative effect of multiple single nucleotide polymorphisms grouped on the basis of prior biological knowledge has become a popular and powerful strategy for the analysis of large scale genetic association studies. The kernel machine (KM) testing framework is a useful approach that has been proposed for testing associations between multiple genetic variants and many different types of complex traits by comparing pairwise similarity in phenotype between subjects to pairwise similarity in genotype, with similarity in genotype defined via a kernel function. An advantage of the KM framework is its flexibility: choosing different kernel functions allows for different assumptions concerning the underlying model and can allow for improved power. In practice, it is difficult to know which kernel to use a priori since this depends on the unknown underlying trait architecture and selecting the kernel which gives the lowest p-value can lead to inflated type I error. Therefore, we propose practical strategies for KM testing when multiple candidate kernels are present based on constructing composite kernels and based on efficient perturbation procedures. We demonstrate through simulations and real data applications that the procedures protect the type I error rate and can lead to substantially improved power over poor choices of kernels and only modest differences in power versus using the best candidate kernel
Human-Level Performance on Word Analogy Questions by Latent Relational Analysis
This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, machine translation, and information retrieval. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason/stone is analogous to the pair carpenter/wood; the relations between mason and stone are highly similar to the relations between carpenter and wood. Past work on semantic similarity measures has mainly been concerned with attributional similarity. For instance, Latent Semantic Analysis (LSA) can measure the degree of similarity between two words, but not between two relations. Recently the Vector Space Model (VSM) of information retrieval has been adapted to the task of measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus (they are not predefined), (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data (it is also used this way in LSA), and (3) automatically generated synonyms are used to explore reformulations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying noun-modifier relations, LRA achieves similar gains over the VSM, while using a smaller corpus
Adaptive Mantel Test for AssociationTesting in Imaging Genetics Data
Mantel's test (MT) for association is conducted by testing the linear
relationship of similarity of all pairs of subjects between two observational
domains. Motivated by applications to neuroimaging and genetics data, and
following the succes of shrinkage and kernel methods for prediction with
high-dimensional data, we here introduce the adaptive Mantel test as an
extension of the MT. By utilizing kernels and penalized similarity measures,
the adaptive Mantel test is able to achieve higher statistical power relative
to the classical MT in many settings. Furthermore, the adaptive Mantel test is
designed to simultaneously test over multiple similarity measures such that the
correct type I error rate under the null hypothesis is maintained without the
need to directly adjust the significance threshold for multiple testing. The
performance of the adaptive Mantel test is evaluated on simulated data, and is
used to investigate associations between genetics markers related to
Alzheimer's Disease and heatlhy brain physiology with data from a working
memory study of 350 college students from Beijing Normal University
Rao\u27s Quadratic Entropy and Some New Applications
Many problems in statistical inference are formulated as testing the diversity of populations. The entropy functions measure the similarity of a distribution function to the uniform distribution and hence can be used as a measure of diversity. Rao (1982a) proposed the concept of quadratic entropy. Its concavity property makes the decomposition similar to ANOVA for categorical data feasible. In this thesis, after reviewing the properties and providing a modification to quadratic entropy, various applications of quadratic entropy are explored. First, analysis of quadratic entropy with the suggested modification to analyze the contingency table data is explored. Then its application to ecological biodiversity is established by constructing practically equivalent confidence intervals. The methods are applied on a real dinosaur diversity data set and simulation experiments are performed to study the validity of the intervals. Quadratic entropy is also used for clustering multinomial data. Another application of quadratic entropy that is provided here is to test the association of two categorical variables with multiple responses. Finally, the gene expression data inspires another application of quadratic entropy in analyzing large scale data, where a hill-climbing type iterative algorithm is developed based on a new minimum quadratic entropy criterion. The algorithm is illustrated on both simulated and real data
Similarity of Semantic Relations
There are at least two kinds of similarity. Relational similarity is
correspondence between relations, in contrast with attributional similarity,
which is correspondence between attributes. When two words have a high
degree of attributional similarity, we call them synonyms. When two pairs
of words have a high degree of relational similarity, we say that their
relations are analogous. For example, the word pair mason:stone is analogous
to the pair carpenter:wood. This paper introduces Latent Relational Analysis (LRA),
a method for measuring relational similarity. LRA has potential applications in many
areas, including information extraction, word sense disambiguation,
and information retrieval. Recently the Vector Space Model (VSM) of information
retrieval has been adapted to measuring relational similarity,
achieving a score of 47% on a collection of 374 college-level multiple-choice
word analogy questions. In the VSM approach, the relation between a pair of words is
characterized by a vector of frequencies of predefined patterns in a large corpus.
LRA extends the VSM approach in three ways: (1) the patterns are derived automatically
from the corpus, (2) the Singular Value Decomposition (SVD) is used to smooth the frequency
data, and (3) automatically generated synonyms are used to explore variations of the
word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the
average human score of 57%. On the related problem of classifying semantic relations, LRA
achieves similar gains over the VSM
- …