284,705 research outputs found

    Kernel Machine SNP-Set Testing Under Multiple Candidate Kernels

    Get PDF
    Joint testing for the cumulative effect of multiple single nucleotide polymorphisms grouped on the basis of prior biological knowledge has become a popular and powerful strategy for the analysis of large scale genetic association studies. The kernel machine (KM) testing framework is a useful approach that has been proposed for testing associations between multiple genetic variants and many different types of complex traits by comparing pairwise similarity in phenotype between subjects to pairwise similarity in genotype, with similarity in genotype defined via a kernel function. An advantage of the KM framework is its flexibility: choosing different kernel functions allows for different assumptions concerning the underlying model and can allow for improved power. In practice, it is difficult to know which kernel to use a priori since this depends on the unknown underlying trait architecture and selecting the kernel which gives the lowest p-value can lead to inflated type I error. Therefore, we propose practical strategies for KM testing when multiple candidate kernels are present based on constructing composite kernels and based on efficient perturbation procedures. We demonstrate through simulations and real data applications that the procedures protect the type I error rate and can lead to substantially improved power over poor choices of kernels and only modest differences in power versus using the best candidate kernel

    Human-Level Performance on Word Analogy Questions by Latent Relational Analysis

    Get PDF
    This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, machine translation, and information retrieval. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason/stone is analogous to the pair carpenter/wood; the relations between mason and stone are highly similar to the relations between carpenter and wood. Past work on semantic similarity measures has mainly been concerned with attributional similarity. For instance, Latent Semantic Analysis (LSA) can measure the degree of similarity between two words, but not between two relations. Recently the Vector Space Model (VSM) of information retrieval has been adapted to the task of measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus (they are not predefined), (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data (it is also used this way in LSA), and (3) automatically generated synonyms are used to explore reformulations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying noun-modifier relations, LRA achieves similar gains over the VSM, while using a smaller corpus

    Adaptive Mantel Test for AssociationTesting in Imaging Genetics Data

    Full text link
    Mantel's test (MT) for association is conducted by testing the linear relationship of similarity of all pairs of subjects between two observational domains. Motivated by applications to neuroimaging and genetics data, and following the succes of shrinkage and kernel methods for prediction with high-dimensional data, we here introduce the adaptive Mantel test as an extension of the MT. By utilizing kernels and penalized similarity measures, the adaptive Mantel test is able to achieve higher statistical power relative to the classical MT in many settings. Furthermore, the adaptive Mantel test is designed to simultaneously test over multiple similarity measures such that the correct type I error rate under the null hypothesis is maintained without the need to directly adjust the significance threshold for multiple testing. The performance of the adaptive Mantel test is evaluated on simulated data, and is used to investigate associations between genetics markers related to Alzheimer's Disease and heatlhy brain physiology with data from a working memory study of 350 college students from Beijing Normal University

    Rao\u27s Quadratic Entropy and Some New Applications

    Get PDF
    Many problems in statistical inference are formulated as testing the diversity of populations. The entropy functions measure the similarity of a distribution function to the uniform distribution and hence can be used as a measure of diversity. Rao (1982a) proposed the concept of quadratic entropy. Its concavity property makes the decomposition similar to ANOVA for categorical data feasible. In this thesis, after reviewing the properties and providing a modification to quadratic entropy, various applications of quadratic entropy are explored. First, analysis of quadratic entropy with the suggested modification to analyze the contingency table data is explored. Then its application to ecological biodiversity is established by constructing practically equivalent confidence intervals. The methods are applied on a real dinosaur diversity data set and simulation experiments are performed to study the validity of the intervals. Quadratic entropy is also used for clustering multinomial data. Another application of quadratic entropy that is provided here is to test the association of two categorical variables with multiple responses. Finally, the gene expression data inspires another application of quadratic entropy in analyzing large scale data, where a hill-climbing type iterative algorithm is developed based on a new minimum quadratic entropy criterion. The algorithm is illustrated on both simulated and real data

    Similarity of Semantic Relations

    Get PDF
    There are at least two kinds of similarity. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason:stone is analogous to the pair carpenter:wood. This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, and information retrieval. Recently the Vector Space Model (VSM) of information retrieval has been adapted to measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus, (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data, and (3) automatically generated synonyms are used to explore variations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying semantic relations, LRA achieves similar gains over the VSM
    • …
    corecore