Search CORE

17 research outputs found

Recommended from our members

Methods of genotype imputation for genome-wide association studies

Author: Qiu Lin, M.S. in Statistics
Publication venue
Publication date: 24/10/2016
Field of study

In genetic epidemiological studies, missing data problems arise when genotypes of particular markers are unavailable for reasons of data quality, cost efficiency or technical design. Genotype imputation is a well-established statistical technique for estimating unobserved genotypes in association studies. Imputation methods work by copying haplotype segments from a densely genotyped reference panel into individuals typed at a subset of the reference variants. In this way, genotypes can be estimated and tested for association at variants that were not assayed in a study. This report first summarizes the missing data mechanisms. It then gives an overview of the methods that have been proposed for genotype imputation and offers some thoughts on future directions.
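The idea of copying haplotype segments from a reference panel can be illustrated with a toy sketch. This is not any of the methods reviewed in the report: it assumes phased haplotypes coded 0/1, marks untyped sites with -1, and copies from the single closest reference haplotype instead of modelling a mosaic of segments. The function name `impute_from_reference` and the example data are hypothetical.

```python
import numpy as np

def impute_from_reference(study, reference):
    """Fill untyped alleles (coded -1) in each study haplotype by copying
    from the reference haplotype that agrees best at the typed positions.
    Toy nearest-haplotype rule, assumed here for illustration only."""
    study = np.asarray(study)
    reference = np.asarray(reference)
    imputed = study.copy()
    for i, hap in enumerate(study):
        typed = hap != -1  # positions assayed in the study
        # Hamming mismatches against every reference haplotype at typed sites
        mismatches = (reference[:, typed] != hap[typed]).sum(axis=1)
        best = int(np.argmin(mismatches))  # closest reference haplotype
        imputed[i, ~typed] = reference[best, ~typed]
    return imputed

# Hypothetical reference panel: 3 haplotypes typed at 5 variants
reference_panel = np.array([
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
])
# Hypothetical study haplotypes typed at variants 0, 2 and 4 only
study_haplotypes = np.array([
    [0, -1, 1, -1, 0],
    [1, -1, 0, -1, 1],
])
imputed = impute_from_reference(study_haplotypes, reference_panel)
print(imputed)
```

Practical imputation tools let the copied reference haplotype change along the chromosome and report genotype probabilities; the sketch only shows the copying step.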

Texas ScholarWorks

Fast $k$-NNG construction with GPU-based quick multi-select

Author: D'Souza Roshan
Dashti Ali
Komarov Ivan
Publication venue: Public Library of Science (PLoS)
Publication date: 21/09/2013
Field of study

In this paper we describe a new brute force algorithm for building the $k$-Nearest Neighbor Graph ($k$-NNG). The $k$-NNG algorithm has many applications in areas such as machine learning, bio-informatics, and clustering analysis. While there are very efficient algorithms for data of low dimensions, for high dimensional data the brute force search is the best algorithm. There are two main parts to the algorithm: the first part is finding the distances between the input vectors, which may be formulated as a matrix multiplication problem. The second is the selection of the $k$-NNs for each of the query vectors. For the second part, we describe a novel graphics processing unit (GPU)-based multi-select algorithm based on quick sort. Our optimization makes clever use of warp voting functions available on the latest GPUs along with user-controlled cache. Benchmarks show significant improvement over state-of-the-art implementations of the $k$-NN search on GPUs.
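The two parts named in the abstract, distances via a matrix product followed by selection of the $k$ nearest, can be sketched on the CPU with NumPy. This is a minimal illustration and not the paper's GPU multi-select: `np.argpartition` stands in for the selection step, and the function name `knn_graph` and the sample points are assumptions.

```python
import numpy as np

def knn_graph(points, k):
    """Return an (n, k) array holding, for each row of points, the indices
    of its k nearest other rows, ordered by increasing distance."""
    x = np.asarray(points, dtype=float)
    sq_norms = (x ** 2).sum(axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b ; the a.b term is one matrix product
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    np.fill_diagonal(dist2, np.inf)  # a point is not its own neighbor
    # Partial selection: the k smallest entries per row, in arbitrary order
    nearest = np.argpartition(dist2, k - 1, axis=1)[:, :k]
    # Sort only the k selected neighbors by distance
    order = np.argsort(np.take_along_axis(dist2, nearest, axis=1), axis=1)
    return np.take_along_axis(nearest, order, axis=1)

# Hypothetical 2-D input vectors
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]])
graph = knn_graph(points, k=2)
print(graph)
```

Partial selection avoids a full sort of each row of distances, which is the same motivation the abstract gives for a dedicated multi-select step.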

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

Utilizing Genotype Imputation for the Augmentation of Sequence Data

Author: Deyo-Svendsen Matthew E.
Freimuth Robert
Fridley Brooke L.
Hebbring Scott
Jenkins Gregory
Publication venue: Public Library of Science
Publication date: 01/06/2010
Field of study

In recent years, the capability to genotype large sets of single nucleotide polymorphisms (SNPs) has increased considerably, and over 1 million SNP markers across the genome can now be genotyped. This advance in technology has led to an increase in the number of genome-wide association studies (GWAS) for various complex traits. These GWAS have implicated over 1500 SNPs in disease traits. However, the SNPs identified by these GWAS are not necessarily the functional variants. Therefore, the next phase in GWAS will involve refining these putative loci. A next step for GWAS would be to catalog all variants, especially rarer variants, within the detected loci, followed by association analysis of the detected variants with the disease trait. However, sequencing a locus in a large number of subjects is still relatively expensive. A more cost-effective approach would be to sequence a portion of the individuals and then apply genotype imputation methods to impute markers in the remaining individuals. A potentially attractive alternative would be to impute based on the 1000 Genomes Project; however, this has the drawback of using a reference population that does not necessarily match the disease status and LD pattern of the study population. We explored a variety of approaches for carrying out the imputation using a reference panel consisting of sequence data for a fraction of the study participants, using data from both a candidate gene sequencing study and the 1000 Genomes Project. Imputation of genetic variation based on a proportion of sequenced samples is feasible. Our results indicate the following sequencing study design guidelines, which take advantage of the recent advances in genotype imputation methodology: select the largest and most diverse reference panel for sequencing, and genotype as many "anchor" markers as possible.

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

The polymorphism architecture of mouse genetic resources elucidated using genome-wide resequencing data: implications for QTL discovery and systems genetics

Author
Publication venue: Springer
Publication date: 01/07/2007
Field of study

Springer - Publisher Connector

Genome scan for drought tolerance in sorghum (original title: Escaneamento genômico para tolerância à seca em sorgo)

Author: ALBUQUERQUE P. E. P. de
ANDRADE C. de L. T. de
BASTOS E. A.
CARDOSO M. J.
GAZAFFI R.
GOMIDE R. L.
GUIMARAES C. T.
MAGALHAES J. V. de
MENEZES C. B. de
PASTINA M. M.
ROSA J. R. B. F.
SCHAFFERT R. E.
TARDIN F. D.
Publication venue: Sete Lagoas: Embrapa Milho e Sorgo, 2016.
Publication date: 20/06/2017
Field of study

Sorghum is adapted to extreme environments where abiotic stresses such as drought limit grain and biomass production, as in the vast regions of the Brazilian Cerrado. Because it is used as a staple food in regions of the world where food production is still a challenge, increasing drought tolerance in sorghum is important for global food security, particularly in a climate change scenario. In addition, because the sorghum genome is smaller and less duplicated than those of grasses such as maize and sugarcane, sorghum can be used to elucidate the genetic determinants of drought tolerance in other species. In this work, genome-wide association mapping was used to identify genomic regions associated with traits related to drought tolerance in two environments, Janaúba (MG) and Teresina (PI). A total of 265,587 SNP markers were tested for association with different traits in a sorghum panel of 243 accessions. Heritability estimates were moderate to high, and the maximum reduction in grain yield caused by drought stress was 57% in Teresina. Association tests with a model simultaneously incorporating population structure and the relationship matrix revealed several SNPs associated with different traits, some of which were stable across environments. bitstream/item/160901/1/bol-152.pd

Infoteca-e

FastMap: Fast eQTL mapping in homozygous populations

Author: Andrew B. Nobel
Andrey A. Shabalin
Daniel M. Gatti
Fred A. Wright
Ivan Rusyn
Tieu-Chong Lam
Publication venue: Oxford University Press
Publication date: 01/01/2009
Field of study

Motivation: Gene expression Quantitative Trait Locus (eQTL) mapping measures the association between transcript expression and genotype in order to find genomic locations likely to regulate transcript expression. The availability of both gene expression and high-density genotype data has improved our ability to perform eQTL mapping in inbred mouse and other homozygous populations. However, existing eQTL mapping software does not scale well when the numbers of transcripts and markers are on the order of $10^5$ and $10^5$–$10^6$, respectively.

Crossref

PubMed Central

Carolina Digital Repository

TEAM: efficient two-locus epistasis tests in human genome-wide association study

Author: F. Zou
S. Huang
W. Wang
X. Zhang
Publication venue: Oxford University Press
Publication date: 01/01/2010
Field of study

As a promising tool for identifying genetic markers underlying phenotypic differences, the genome-wide association study (GWAS) has been extensively investigated in recent years. In GWAS, detecting epistasis (or gene–gene interaction) is preferable to single-locus study, since many diseases are known to be complex traits. A brute force search is infeasible for epistasis detection at the genome-wide scale because of the intensive computational burden. Existing epistasis detection algorithms are designed for datasets consisting of homozygous markers and small sample sizes. In human studies, however, the genotype may be heterozygous, and the number of individuals can be up to thousands. Thus, existing methods are not readily applicable to human datasets. In this article, we propose an efficient algorithm, TEAM, which significantly speeds up epistasis detection for human GWAS. Our algorithm is exhaustive, i.e. it does not ignore any epistatic interaction. Utilizing the minimum spanning tree structure, the algorithm incrementally updates the contingency tables for epistatic tests without scanning all individuals. Our algorithm has broader applicability and is more efficient than existing methods for large-sample studies. It supports any statistical test that is based on contingency tables, and enables control of both the family-wise error rate and the false discovery rate. Extensive experiments show that our algorithm only needs to examine a small portion of the individuals to update the contingency tables, and it achieves at least an order of magnitude speedup over the brute force approach.
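For orientation, the contingency table behind a two-locus test can be built for a single SNP pair as in the sketch below. This is the brute-force baseline that scans every individual, not TEAM's incremental update over a minimum spanning tree. The 0/1/2 genotype coding, the function names and the data are assumptions made for illustration.

```python
import numpy as np

def two_locus_table(snp_a, snp_b, phenotype):
    """Count individuals per (genotype pair, phenotype) cell.
    Genotypes are coded 0/1/2; phenotype is 0 (control) or 1 (case)."""
    table = np.zeros((9, 2), dtype=int)
    combo = 3 * np.asarray(snp_a) + np.asarray(snp_b)  # index of the genotype pair
    np.add.at(table, (combo, np.asarray(phenotype)), 1)
    return table

def chi_square(table):
    """Pearson chi-square statistic over the non-empty genotype pairs.
    Assumes the data contain at least one case and one control."""
    table = table[table.sum(axis=1) > 0].astype(float)
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())

# Hypothetical genotypes and case/control labels for 8 individuals
snp_a = [0, 0, 1, 1, 2, 2, 0, 1]
snp_b = [0, 1, 0, 1, 0, 1, 0, 1]
phenotype = [0, 0, 0, 1, 1, 1, 0, 1]

table = two_locus_table(snp_a, snp_b, phenotype)
stat = chi_square(table)
print(table)
print(stat)
```

Repeating this scan for every SNP pair is the cost the abstract describes as infeasible at genome-wide scale.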

CiteSeerX

Crossref

PubMed Central

Carolina Digital Repository

Using Population Mixtures to Optimize the Utility of Genomic Databases: Linkage Disequilibrium and Association Study Design in India

Author: Conrad D. F.
Coop G.
Jakobsson Mattias
Patel P. I.
Pemberton Trevor J.
Pritchard Jonathan K.
Rosenberg Noah A.
Wall J. D.
Publication venue: Wiley
Publication date: 01/07/2008
Field of study

When performing association studies in populations that have not been the focus of large-scale investigations of haplotype variation, it is often helpful to rely on genomic databases from other populations for study design and analysis, for example in the selection of tag SNPs and in the imputation of missing genotypes. One way of improving the use of these databases is to rely on a mixture of database samples that is similar to the population of interest, rather than using the single most similar database sample. We demonstrate the effectiveness of the mixture approach in the application of African, European, and East Asian HapMap samples for tag SNP selection in populations from India, a genetically intermediate region underrepresented in genomic studies of haplotype variation. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/65949/1/j.1469-1809.2008.00457.x.pd

Crossref

PubMed Central

Deep Blue Documents at the University of Michigan

Two in one sweep: aluminum tolerance and grain yield in P-limited soils are associated to the same genomic region in West African Sorghum

Crossref