
    LD on chromosome 1 for the subpopulations, Northern Flint (red), stiff stalk (blue), non-stiff stalk (green), tropical (yellow), of the 282 association panel.

    <p>White dots indicate the median R<sup>2</sup> for each bin. LD extends further in Northern Flint than in the other subpopulations.</p>

    Genetic variance explained by respective model used for the association study.


    Genome-wide association results for flowering time.

    <p>A. GWAS results for flowering time (days to silking) in the 282 association panel using genotyping by sequencing (GBS) and 55K SNPs. The Q+K mixed linear model was fitted at each SNP to account for population structure (Q) and kinship (K). Genome-wide association results using the naïve model and the Q model are shown in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003246#pgen.1003246.s001" target="_blank">Figures S1</a> and <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003246#pgen.1003246.s002" target="_blank">S2</a>. B. GWAS results for flowering time (days to silking) using three models in the chromosomal region surrounding <i>tb1</i> (Chr. 1; 265,745,979–265,747,712 bp) and <i>d8</i> (Chr. 1; 266,094,769–266,097,836 bp). All GBS and 55K SNPs between 255 Mb and 270 Mb on Chr. 1 are included in the figure. Brown lines indicate results from the naïve model, red lines indicate results from the Q model, and blue lines indicate results from the Q+K model. C. GWAS results for flowering time (days to silking) using three models in the chromosomal region surrounding <i>tb1</i> (Chr. 1; 265,745,979–265,747,712 bp) and <i>d8</i> (Chr. 1; 266,094,769–266,097,836 bp). All GBS and 55K SNPs between 265 Mb and 267 Mb on Chr. 1 are included in the figure. Black markers on the right are significant SNPs located within <i>d8</i>. Black markers on the left are significant SNPs located within <i>tb1</i>. Triangles indicate results from the naïve model, squares indicate results from the Q model, and diamonds indicate results from the Q+K model.</p>

    Association between polymorphisms at the <i>d8</i> locus and variation in flowering time in the 92-line and 282-line association panels, and association between polymorphisms in the region between <i>d8</i> and <i>tb1</i> (<i>d8</i>/<i>tb1</i>) and variation in flowering time in the 282-line association panel.

    a<p>A general linear model not controlling for population structure.</p>
    b<p>A mixed linear model controlling for kinship but not population structure.</p>
    c<p>A general linear model controlling for population structure (<i>k</i> = 5).</p>
    d<p>A mixed linear model controlling for both population structure (<i>k</i> = 5) and kinship.</p>

    Results from association study between polymorphisms within <i>d8</i> and a range of traits using MLM (Q+K).


    R<sup>2</sup> between the 6 bp indel in <i>d8</i> and all the other sites on chromosome 1.

    <p>Blue dots indicate results from 13,815 GBS SNPs present in 200 or more of the 282 lines. Red dots indicate results from 7,695 55K SNPs present in 200 or more of the 282 lines.</p>
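The R<sup>2</sup> values in this figure are pairwise LD measures between the 6 bp indel and each other site. The sketch below shows one common way to compute such a value, as the squared Pearson correlation between two genotype vectors, together with the "present in 200 or more lines" filter. The numeric genotype coding, the use of `None` for missing calls, and the function names are assumptions made for illustration, not the authors' pipeline.

```python
def has_enough_calls(site, min_calls=200):
    """True if the site has a non-missing genotype in at least min_calls lines."""
    return sum(g is not None for g in site) >= min_calls


def r_squared(site_a, site_b):
    """Squared Pearson correlation between two genotype vectors.

    Genotypes are coded numerically (e.g. 0/1 for inbred lines). None marks a
    missing call; lines missing at either site are skipped.
    """
    pairs = [(a, b) for a, b in zip(site_a, site_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return float("nan")  # monomorphic site: r^2 is undefined
    return cov * cov / (var_a * var_b)


# Toy data for six hypothetical lines (not from the 282 panel).
indel = [0, 0, 1, 1, None, 1]
snp = [0, 0, 1, 1, 1, 0]
print(r_squared(indel, snp))
```

In a real analysis the filter would be applied first, so that R<sup>2</sup> is only reported for sites genotyped in enough lines.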

    Association studies of flowering time in <i>Arabidopsis thaliana</i>.

    <p>Flowering time at 16°C was measured on 199 <i>Arabidopsis thaliana</i> individuals genotyped with 250,000 SNPs. Seven statistical methods were used to conduct the association studies: <b>(a)</b> t-test (naïve method), which tests the additive genetic effect of markers one at a time, with the marker as the only explanatory variable; <b>(b)</b> GLM; <b>(c)</b> MLM; <b>(d)</b> CMLM; <b>(e)</b> FaST-LMM-Select; <b>(f)</b> MLMM; and <b>(g)</b> FarmCPU. All methods except the t-test, MLMM, and FarmCPU included the first three PCs derived from the genetic markers as covariates. FarmCPU identified five associated SNPs after Bonferroni multiple test correction, including three within 50,000 base pairs of known genes such as FLC. MLMM identified two associated SNPs after Bonferroni multiple test correction, and these overlapped with the five associated SNPs from FarmCPU. With all other methods, these genes are indistinguishable from the background noise.</p>
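For readers unfamiliar with the Bonferroni correction mentioned in this caption, the sketch below shows how a per-SNP significance cutoff follows from the number of tests. The family-wise level of 0.05 and the example P values are assumptions for illustration; the caption does not state the level the authors used.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def significant_markers(p_values, n_tests, alpha=0.05):
    """Indices of markers whose P value falls below the Bonferroni cutoff."""
    cutoff = bonferroni_threshold(n_tests, alpha)
    return [i for i, p in enumerate(p_values) if p < cutoff]


# With 250,000 SNPs and an assumed alpha of 0.05, the cutoff is about 2e-7.
cutoff = bonferroni_threshold(250_000)
# Hypothetical P values for three markers; only the first passes the cutoff.
hits = significant_markers([1e-9, 3e-7, 0.2], n_tests=250_000)
print(cutoff, hits)
```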

    Conception and performances of different methods.

    <p><b>A</b>) Distribution of statistical power when kinship is derived from a randomly selected set of SNPs. The dataset contained ∼3,000 SNPs genotyped on 282 maize inbred lines. The number of selected SNPs was the same as the number of individuals used to derive kinship. Power was examined on a trait simulated from 27 causative mutations, i.e., Quantitative Trait Nucleotides (QTNs), sampled from the ∼3,000 SNPs, excluding those on the last chromosome. The SNPs on the last chromosome were used to derive the null distribution for Type I error. The heritability of the trait was set to 0.75. A total of 100 replications were conducted. The average and median power are 0.476 and 0.444, respectively. The power obtained with kinship derived from all SNPs is 0.511 (red line). <b>B</b>) Concept of kinship for association studies. Pedigree is the first available information used to calculate kinship. Pedigree kinship is the expected probability that a pair of individuals is identical by descent at any locus (e.g., full siblings have a kinship of 50% in the absence of inbreeding). Pedigree kinship can be used across traits. A realized kinship derived from genetic markers covering the entire genome is more precise than pedigree-based kinship (e.g., full siblings could have a kinship of 60% or 40% instead of 50%). However, it is still general and can be used for all traits. A complete trait-specific realized kinship uses all the QTNs underlying the trait. This complete trait-specific kinship is ideal for genomic prediction, but not for GWAS. The ideal kinship for GWAS is its complement (using all QTNs except the one being tested), which removes the confounding between the kinship and the tested SNP. <b>C</b>) and <b>D</b>) display statistical power and the effectiveness of genomic control (inflation factor) with different kinships. The statistical power is about 50% when using all the SNPs. Inclusion or exclusion of the 27 QTNs did not have a significant impact.
When only the 27 QTNs were used to derive a complete trait-specific kinship, the statistical power was dramatically reduced to 30%. When each of the 27 QTNs was tested by using the complementary trait-specific kinship derived from the other 26 QTNs (SUPER with known QTNs), the statistical power was boosted to 66%. A statistical power of 61% was retained by using SUPER with masked QTNs. The genomic control inflation factor of SUPER was similar with known QTNs and with masked QTNs, and closer to expectation (1.00) than that of the other methods.</p>
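The inflation factor compared against its expectation of 1.00 above is usually computed as the median association test statistic divided by its expected median under the null. The sketch below follows that standard definition for 1-df chi-square statistics recovered from P values; whether the figure used exactly this estimator is an assumption, and the function name is hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(p_values):
    """Genomic control inflation factor (lambda) from two-sided P values.

    Each P value is mapped back to a 1-df chi-square statistic via the
    standard normal quantile. P values must lie in (0, 1].
    """
    nd = NormalDist()
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced P values mimic a null (uniform) distribution, so lambda
# should be close to 1.
null_like = [i / 1000 for i in range(1, 1000)]
print(inflation_factor(null_like))
```

Values well above 1 would suggest inflated test statistics, for example from uncorrected population structure, while values well below 1 suggest an overly conservative test.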

    Computing time and memory usage of five software packages.

    <p>Three statistical models were run with the five packages: 1) GLM by PLINK; 2) MLMs by EMMAX, GenABEL, and MLMM; and 3) FarmCPU by FarmCPU. Computing time <b>(a)</b> and memory usage <b>(b)</b> are shown as functions of sample size. The analyses were performed on a laptop (Asus A53S) running Linux (Ubuntu 12.10, 64-bit) with 4.0 GB of random-access memory (RAM) and an Intel Core i3-2310M dual-core processor at 2.1 GHz. One core was used for this test. All datasets had 60,000 markers; only the sample size varied. The last data point indicates the maximum sample size each software package could process without freezing the computer, except for PLINK and FarmCPU, whose limits were not reached at the maximum sample size examined.</p>

    Conceptual development and procedure of FarmCPU.

    <p>The proposed method, FarmCPU, was inspired by the progression of methods shown in the left panel <b>(a)</b>. These methods start with a naïve model (e.g., t-test) that tests marker effects one at a time, i.e., the i<sup>th</sup> marker (s<sub>i</sub>), on the phenotype (<b>y</b>) with a residual effect (<b>e</b>). Next, GLM controls false positives by fitting population structure (<b>Q</b>) as covariates to adjust the test on genetic markers, as indicated by the blue arrows. MLM fits both <b>Q</b> and kinship (<b>K</b>) as covariates. However, both <b>Q</b> and <b>K</b> remain constant for testing all the markers; neither <b>Q</b> nor <b>K</b> receives adjustment from association tests on markers. MLMM adds pseudo QTNs as additional covariates (<b>S</b>). These pseudo QTNs are estimated through a stepwise regression procedure. Consequently, these pseudo QTNs receive adjustment from association tests on markers, as indicated by the red arrow. However, both <b>Q</b> and <b>K</b> remain constant for testing all the markers. Like MLM, FaST-LMM-Select controls false positives by fitting <b>Q</b> and <b>K</b> as covariates; unlike MLM, its <b>K</b> is informed by association tests on markers, as indicated by the red arrow. However, <b>Q</b> remains constant. FarmCPU completely removes the confounding between the testing marker and both <b>K</b> and <b>Q</b> by combining MLMM and FaST-LMM-Select, while allowing a fixed effect model and a random effect model to run separately. The fixed effect model contains the testing marker and pseudo QTNs to control false positives. The pseudo QTNs are selected from associated markers and evaluated by the random effect model, with <b>K</b> defined by the pseudo QTNs. The fixed effect model and random effect model are used iteratively until convergence is reached, that is, when no new pseudo QTNs are added.
The right panel <b>(b)</b> displays the fixed effect model above the dashed line and the random effect models below the dashed line. The t pseudo QTNs (<b>S</b><sub><b>1</b></sub> to <b>S</b><sub><b>t</b></sub>) are fitted as covariates to test markers one at a time, e.g., the i<sup>th</sup> marker (<b>s</b><sub><b>i</b></sub>), in the fixed effect model. Because the pseudo QTNs are fitted as covariates for each marker, Not Available (NA) is assigned as the test statistic for every marker that is also a pseudo QTN, since that marker is completely collinear with the pseudo QTN. However, each pseudo QTN has a test statistic corresponding to every marker, creating a matrix (lightly shaded) with elements P<sub>ij</sub>, i = 1 to t and j = 1 to m. The most significant P value of each pseudo QTN (the vector on the right of the shaded area) is substituted for the NA of the corresponding marker. The pseudo QTNs are optimized by using the SUPER method in the random effect model, which incorporates both test statistics from the fixed effect model and genetic map information in the genotype data. The random effects are the individuals’ genetic effects (<b>u</b>) with variance and covariance matrix, Var(<b>u</b>), defined by the Singular Value Decomposition (SVD) on the pseudo QTNs by using the FaST-LMM algorithm. The updated set of pseudo QTNs goes back into the fixed effect model. The process repeats until no more pseudo QTNs are added.</p>
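The NA substitution step described for the fixed effect model can be pictured with a small sketch. It takes the t × m matrix of pseudo QTN P values (P<sub>ij</sub>) and, for each pseudo QTN, writes its smallest P value into the slot of the marker it corresponds to. The function name, the use of `None` for NA, and the toy numbers are all hypothetical; this illustrates only the bookkeeping described in the caption, not the FarmCPU software itself.

```python
def fill_pseudo_qtn_pvalues(marker_p, pseudo_qtn_markers, qtn_p_matrix):
    """Replace NA entries for markers that are also pseudo QTNs.

    marker_p:           P value per marker (length m); None where the marker
                        is a pseudo QTN and so cannot be tested against itself.
    pseudo_qtn_markers: marker index of each of the t pseudo QTNs.
    qtn_p_matrix:       t x m matrix; entry [i][j] is the P value of pseudo
                        QTN i when marker j is tested (None if unavailable).
    """
    filled = list(marker_p)
    for i, marker_idx in enumerate(pseudo_qtn_markers):
        observed = [p for p in qtn_p_matrix[i] if p is not None]
        # "Most significant" means the smallest P value in the row.
        filled[marker_idx] = min(observed)
    return filled


# Toy example: m = 4 markers, t = 2 pseudo QTNs located at markers 1 and 3.
marker_p = [0.2, None, 0.03, None]
qtn_p_matrix = [
    [1e-6, None, 2e-6, None],  # pseudo QTN at marker 1
    [1e-4, None, 2e-4, None],  # pseudo QTN at marker 3
]
print(fill_pseudo_qtn_pvalues(marker_p, [1, 3], qtn_p_matrix))
```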