Methods of genotype imputation for genome-wide association studies
In genetic epidemiological studies, missing data problems arise when genotypes of particular markers are unavailable for reasons of data quality, cost efficiency or technical design. Genotype imputation is a well-established statistical technique for estimating unobserved genotypes in association studies. Imputation methods work by copying haplotype segments from a densely genotyped reference panel into individuals typed at a subset of the reference variants. In this way, genotypes can be estimated and tested for association at variants that were not assayed in a study. This report first summarizes the missing data mechanisms. It then gives an overview of the different methods that have been proposed for genotype imputation and offers some thoughts on future directions.
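The copying idea in the abstract above can be illustrated with a deliberately simplified sketch. It assumes alleles coded as 0/1, untyped sites marked `None`, and a single best-matching reference haplotype. The function name `impute_by_best_match` and the toy panel are hypothetical. Published imputation methods typically model each individual as a mosaic of several reference haplotypes, which this sketch does not attempt.

```python
def impute_by_best_match(target, reference_panel):
    """Fill untyped sites (None) in `target` by copying alleles from the
    reference haplotype that agrees best at the typed sites."""
    def mismatches(ref):
        # Count disagreements at typed sites only.
        return sum(1 for t, r in zip(target, ref) if t is not None and t != r)

    best = min(reference_panel, key=mismatches)
    # Keep observed alleles; copy the reference allele where the target is untyped.
    return [r if t is None else t for t, r in zip(target, best)]


# Toy reference panel: three haplotypes over five variant sites (assumed data).
reference_panel = [
    [0, 0, 1, 1, 0],
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
]
# Study haplotype typed at sites 0, 2, 3 and 4; site 1 was not assayed.
target = [1, None, 0, 1, 1]
imputed = impute_by_best_match(target, reference_panel)
print(imputed)
```

Here the second reference haplotype matches every typed site, so its allele is copied into the untyped position.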
Fast k-NNG construction with GPU-based quick multi-select
In this paper we describe a new brute force algorithm for building the
k-Nearest Neighbor Graph (k-NNG). The k-NNG algorithm has many
applications in areas such as machine learning, bio-informatics, and clustering
analysis. While there are very efficient algorithms for data of low dimensions,
for high dimensional data the brute force search is the best algorithm. There
are two main parts to the algorithm: the first part is finding the distances
between the input vectors, which may be formulated as a matrix multiplication
problem. The second is the selection of the k-NNs for each of the query
vectors. For the second part, we describe a novel graphics processing unit
(GPU)-based multi-select algorithm based on quick sort. Our optimization makes
clever use of warp voting functions available on the latest GPUs along with
user-controlled cache. Benchmarks show significant improvement over
state-of-the-art implementations of the k-NN search on GPUs.
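The two parts described in the abstract above (distances via a matrix product, then selection of the k nearest) can be sketched on the CPU as follows. This is not the authors' GPU implementation: it uses plain Python lists, a sequential quickselect-style partition in place of the warp-voting multi-select, and hypothetical names (`squared_distances`, `quick_multiselect`, `knn_graph`).

```python
import random


def squared_distances(vectors):
    """All-pairs squared Euclidean distances using the identity
    ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, where the x.y terms are the
    entries of a matrix product (the matrix-multiplication formulation)."""
    norms = [sum(c * c for c in v) for v in vectors]
    n = len(vectors)
    return [
        [norms[i] + norms[j] - 2 * sum(a * b for a, b in zip(vectors[i], vectors[j]))
         for j in range(n)]
        for i in range(n)
    ]


def quick_multiselect(items, k, rng):
    """Return the k smallest items (in no particular order) by partitioning
    around a random pivot, as in quickselect."""
    if k <= 0:
        return []
    if len(items) <= k:
        return list(items)
    pivot = items[rng.randrange(len(items))]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    if k <= len(smaller):
        return quick_multiselect(smaller, k, rng)
    if k <= len(smaller) + len(equal):
        return smaller + equal[:k - len(smaller)]
    rest = k - len(smaller) - len(equal)
    return smaller + equal + quick_multiselect(larger, rest, rng)


def knn_graph(vectors, k, seed=0):
    """Brute-force k-NNG: for each vector, the indices of its k nearest others."""
    rng = random.Random(seed)
    graph = []
    for i, row in enumerate(squared_distances(vectors)):
        candidates = [(d, j) for j, d in enumerate(row) if j != i]
        nearest = sorted(quick_multiselect(candidates, k, rng))
        graph.append([j for _, j in nearest])
    return graph


points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
graph = knn_graph(points, k=2)
print(graph)
```

Selection only partitions the candidates until the k smallest are isolated, so it avoids fully sorting each row of distances; the final `sorted` call orders just the k survivors.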
Utilizing Genotype Imputation for the Augmentation of Sequence Data
In recent years, capabilities for genotyping large sets of single nucleotide polymorphisms (SNPs) have increased considerably, with the ability to genotype over 1 million SNP markers across the genome. This advancement in technology has led to an increase in the number of genome-wide association studies (GWAS) for various complex traits. These GWAS have implicated over 1500 SNPs associated with disease traits. However, the SNPs identified from these GWAS are not necessarily the functional variants. Therefore, the next phase in GWAS will involve the refining of these putative loci. A next step for GWAS would be to catalog all variants, especially rarer variants, within the detected loci, followed by association analysis of the detected variants with the disease trait. However, sequencing a locus in a large number of subjects is still relatively expensive. A more cost-effective approach would be to sequence a portion of the individuals and then apply genotype imputation methods to impute markers in the remaining individuals. A potentially attractive alternative would be to impute based on the 1000 Genomes Project; however, this has the drawback of using a reference population that does not necessarily match the disease status and LD pattern of the study population. We explored a variety of approaches for carrying out the imputation using a reference panel consisting of sequence data for a fraction of the study participants, using data from both a candidate gene sequencing study and the 1000 Genomes Project. Imputation of genetic variation based on a proportion of sequenced samples is feasible. Our results indicate the following sequencing study design guidelines, which take advantage of the recent advances in genotype imputation methodology: select the largest and most diverse reference panel for sequencing, and genotype as many "anchor" markers as possible.
Genomic scan for drought tolerance in sorghum.
Sorghum is adapted to extreme environments where abiotic stresses such as drought limit grain and biomass production, as in the vast regions of the Brazilian Cerrado. Because it is a staple food in regions of the world where food production is still a challenge, increasing drought tolerance in sorghum is important for global food security, particularly in a scenario of climate change. In addition, because the sorghum genome is smaller and less duplicated than those of grasses such as maize and sugarcane, sorghum can be used to elucidate the genetic determinants of drought tolerance in other species. In this work, genome-wide association mapping was used to identify genomic regions associated with traits related to drought tolerance in two environments, Janaúba (MG) and Teresina (PI). A total of 265,587 SNP markers were tested for association with different traits in a sorghum panel of 243 accessions. Heritability estimates were moderate to high, and the maximum reduction in grain yield caused by drought stress was 57% in Teresina. Association tests with a model that simultaneously incorporated population structure and the relationship matrix revealed several SNPs associated with different traits, some of which were stable across environments.
FastMap: Fast eQTL mapping in homozygous populations
Motivation: Gene expression Quantitative Trait Locus (eQTL) mapping measures the association between transcript expression and genotype in order to find genomic locations likely to regulate transcript expression. The availability of both gene expression and high-density genotype data has improved our ability to perform eQTL mapping in inbred mouse and other homozygous populations. However, existing eQTL mapping software does not scale well when the numbers of transcripts and markers are on the order of 10^5 and 10^5–10^6, respectively.
TEAM: efficient two-locus epistasis tests in human genome-wide association study
As a promising tool for identifying genetic markers underlying phenotypic differences, genome-wide association study (GWAS) has been extensively investigated in recent years. In GWAS, detecting epistasis (or gene–gene interaction) is preferable over single-locus study since many diseases are known to be complex traits. A brute force search is infeasible for epistasis detection at the genome-wide scale because of the intensive computational burden. Existing epistasis detection algorithms are designed for datasets consisting of homozygous markers and small sample sizes. In human studies, however, the genotype may be heterozygous, and the number of individuals can be up to thousands. Thus, existing methods are not readily applicable to human datasets. In this article, we propose an efficient algorithm, TEAM, which significantly speeds up epistasis detection for human GWAS. Our algorithm is exhaustive, i.e. it does not ignore any epistatic interaction. Utilizing the minimum spanning tree structure, the algorithm incrementally updates the contingency tables for epistatic tests without scanning all individuals. Our algorithm has broader applicability and is more efficient than existing methods for large-sample studies. It supports any statistical test that is based on contingency tables, and enables control of both the family-wise error rate and the false discovery rate. Extensive experiments show that our algorithm only needs to examine a small portion of the individuals to update the contingency tables, and it achieves at least an order of magnitude speed-up over the brute force approach.
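The incremental update mentioned in the abstract above can be illustrated with a small sketch. It is not the TEAM implementation: it assumes genotypes coded 0/1/2 and a binary phenotype, it omits the minimum spanning tree that decides the order in which SNPs are visited, and the function names are hypothetical. It only shows that the table for the pair (A, C) can be derived from the table for (A, B) by touching the individuals whose genotypes differ between B and C.

```python
from collections import Counter


def build_table(snp_a, snp_b, phenotype):
    """Contingency table: count of individuals per (genotype A, genotype B, phenotype)."""
    return Counter(zip(snp_a, snp_b, phenotype))


def differing_individuals(old_b, new_b):
    """Indices where two SNPs disagree; computable once and reusable for every SNP A."""
    return [i for i, (o, n) in enumerate(zip(old_b, new_b)) if o != n]


def update_table(table, snp_a, old_b, new_b, phenotype, diff):
    """Turn the table for (snp_a, old_b) into the table for (snp_a, new_b),
    visiting only the individuals listed in `diff`."""
    updated = Counter(table)
    for i in diff:
        updated[(snp_a[i], old_b[i], phenotype[i])] -= 1
        updated[(snp_a[i], new_b[i], phenotype[i])] += 1
    return +updated  # unary plus drops cells whose count fell to zero


# Toy data for six individuals (assumed values).
snp_a = [0, 1, 2, 1, 0, 2]
snp_b = [0, 0, 1, 2, 1, 0]
snp_c = [0, 0, 1, 2, 2, 0]  # differs from snp_b in one individual only
phenotype = [1, 0, 1, 0, 1, 0]

diff = differing_individuals(snp_b, snp_c)
table_ab = build_table(snp_a, snp_b, phenotype)
table_ac = update_table(table_ab, snp_a, snp_b, snp_c, phenotype, diff)
print(diff, table_ac == build_table(snp_a, snp_c, phenotype))
```

When neighboring SNPs differ in only a few individuals, each update costs time proportional to the size of `diff` rather than to the full sample size.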
Using Population Mixtures to Optimize the Utility of Genomic Databases: Linkage Disequilibrium and Association Study Design in India
When performing association studies in populations that have not been the focus of large-scale investigations of haplotype variation, it is often helpful to rely on genomic databases in other populations for study design and analysis, such as in the selection of tag SNPs and in the imputation of missing genotypes. One way of improving the use of these databases is to rely on a mixture of database samples that is similar to the population of interest, rather than using the single most similar database sample. We demonstrate the effectiveness of the mixture approach in the application of African, European, and East Asian HapMap samples for tag SNP selection in populations from India, a genetically intermediate region underrepresented in genomic studies of haplotype variation. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/65949/1/j.1469-1809.2008.00457.x.pd