Design Considerations for Massively Parallel Sequencing Studies of Complex Human Disease
Massively parallel sequencing (MPS) now allows entire exomes and genomes to be sequenced at reasonable cost, and its utility for identifying genes responsible for rare Mendelian disorders has been demonstrated. However, for a complex disease, study designs need to accommodate substantial degrees of locus, allelic, and phenotypic heterogeneity, as well as complex relationships between genotype and phenotype. Such considerations include careful selection of samples for sequencing and a well-developed strategy for identifying the few “true” disease susceptibility genes from among the many irrelevant genes that will be found to harbor rare variants. To examine these issues we have performed simulation-based analyses in order to compare several strategies for MPS in complex disease. Factors examined include genetic architecture, sample size, number and relationship of individuals selected for sequencing, and a variety of filters based on variant type, multiple observations of genes and concordance of genetic variants within pedigrees. A two-stage design was assumed where genes from the MPS analysis of high-risk families are evaluated in a secondary screening phase of a larger set of probands with more modest family histories. Designs were evaluated using a cost function that assumes the cost of sequencing the whole exome is 400 times that of sequencing a single candidate gene. Results indicate that while requiring variants to be identified in multiple pedigrees and/or in multiple individuals in the same pedigree are effective strategies for reducing false positives, there is a danger of over-filtering so that most true susceptibility genes are missed. In most cases, sequencing more than two individuals per pedigree results in reduced power without any benefit in terms of reduced overall cost. Further, our results suggest that although no single strategy is optimal, simulations can provide important guidelines for study design.
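The 400:1 cost ratio stated in the abstract can be turned into a rough two-stage cost calculation. The sketch below is only an assumption about the form such a cost function might take; the abstract does not spell it out. The function name `two_stage_cost`, its parameters, and all example numbers are hypothetical.

```python
# From the abstract: one whole exome costs 400x one candidate gene.
EXOME_COST_IN_GENE_UNITS = 400


def two_stage_cost(pedigrees: int, sequenced_per_pedigree: int,
                   candidate_genes: int, probands: int) -> int:
    """Total cost in units of 'one candidate gene sequenced in one person'.

    Assumed (hypothetical) form: stage 1 exome-sequences a fixed number of
    members of each high-risk pedigree; stage 2 sequences every gene that
    survived filtering in every proband of the follow-up set.
    """
    stage1 = EXOME_COST_IN_GENE_UNITS * pedigrees * sequenced_per_pedigree
    stage2 = candidate_genes * probands
    return stage1 + stage2


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the study.
    loose = two_stage_cost(pedigrees=20, sequenced_per_pedigree=2,
                           candidate_genes=300, probands=500)
    strict = two_stage_cost(pedigrees=20, sequenced_per_pedigree=3,
                            candidate_genes=40, probands=500)
    print(loose, strict)
```

Under this assumed form, stricter filtering shrinks the second stage, but the calculation says nothing about power: as the abstract warns, a cheaper design may simply have filtered out the true susceptibility genes.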
Parameter Estimation and Quantitative Parametric Linkage Analysis with GENEHUNTER-QMOD
Objective: We present a parametric method for linkage analysis of quantitative phenotypes. The method provides a test for linkage as well as an estimate of different phenotype parameters. We have implemented our new method in the program GENEHUNTER-QMOD and evaluated its properties by performing simulations. Methods: The phenotype is modeled as a normally distributed variable, with a separate distribution for each genotype. Parameter estimates are obtained by maximizing the LOD score over the normal distribution parameters with a gradient-based optimization method called PGRAD. Results: The PGRAD method has lower power to detect linkage than the variance components analysis (VCA) in the case of a normal distribution and small pedigrees. However, it outperforms the VCA and Haseman-Elston regression for extended pedigrees, nonrandomly ascertained data and non-normally distributed phenotypes. Here, the higher power is even accompanied by conservativeness, while the VCA has an inflated type I error. Parameter estimation tends to underestimate residual variances but performs better for expectation values of the phenotype distributions. Conclusion: With GENEHUNTER-QMOD, a powerful new tool is provided to explicitly model quantitative phenotypes in the context of linkage analysis. It is freely available at http://www.helmholtz-muenchen.de/genepi/downloads. Copyright (C) 2012 S. Karger AG, Base
Candidate high myopia loci on chromosomes 18p and 12q do not play a major role in susceptibility to common myopia
BACKGROUND: To determine whether previously reported loci predisposing to nonsyndromic high myopia show linkage to common myopia in pedigrees from two ethnic groups: Ashkenazi Jewish and Amish. We hypothesized that these high myopia loci might exhibit allelic heterogeneity and be responsible for moderate/mild or common myopia. METHODS: Cycloplegic and manifest refraction were performed on 38 Jewish and 40 Amish families. Individuals with at least -1.00 D in each meridian of both eyes were classified as myopic. Genomic DNA was genotyped with 12 markers on chromosomes 12q21-23 and 18p11.3. Parametric and nonparametric linkage analyses were conducted to determine whether susceptibility alleles at these loci are important in families with less severe, clinical forms of myopia. RESULTS: There was no strong evidence of linkage of common myopia to these candidate regions: all two-point and multipoint heterogeneity LOD scores were < 1.0 and non-parametric linkage p-values were > 0.01. However, one Amish family showed slight evidence of linkage (LOD > 1.0) on 12q; another 3 Amish families each gave LOD > 1.0 on 18p; and 3 Jewish families each gave LOD > 1.0 on 12q. CONCLUSIONS: Significant evidence of linkage (LOD > 3) of myopia was not found on chromosome 18p or 12q loci in these families. These results suggest that these loci do not play a major role in the causation of common myopia in the families studied.
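For readers unfamiliar with the LOD thresholds quoted above (1.0 and 3), a LOD score is the base-10 logarithm of the likelihood of the data at a recombination fraction θ divided by the likelihood under free recombination (θ = 0.5). The sketch below covers only the simplest textbook case, phase-known and fully informative meioses; it is not the parametric, multipoint, or heterogeneity analysis used in the study, and the function names and example counts are hypothetical.

```python
import math


def _k_log10(count: int, prob: float) -> float:
    """count * log10(prob), with the convention 0 * log10(0) = 0."""
    if count == 0:
        return 0.0
    if prob == 0.0:
        return float("-inf")
    return count * math.log10(prob)


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score at recombination fraction theta for phase-known meioses.

    LOD(theta) = log10( theta^k * (1 - theta)^(n - k) / 0.5^n ),
    where k is the number of recombinants among n informative meioses.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    non_recombinants = meioses - recombinants
    log_l_linked = _k_log10(recombinants, theta) + _k_log10(non_recombinants, 1.0 - theta)
    log_l_unlinked = meioses * math.log10(0.5)
    return log_l_linked - log_l_unlinked


def max_lod(recombinants: int, meioses: int, steps: int = 50) -> float:
    """Maximum LOD over an evenly spaced grid of theta values in [0, 0.5]."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(two_point_lod(recombinants, meioses, t) for t in grid)


if __name__ == "__main__":
    # Hypothetical counts: 1 recombinant among 12 informative meioses.
    print(round(max_lod(1, 12), 2))
```

In this simplified setting, ten meioses with no recombinants give a LOD of 10·log10(2), just above the conventional threshold of 3, which gives a sense of how much data a "significant" result requires.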
Polymorphisms in the WNK1 gene are associated with blood pressure variation and urinary potassium excretion
WNK1 - a serine/threonine kinase involved in electrolyte homeostasis and blood pressure (BP) control - is an excellent candidate gene for essential hypertension (EH). We and others have previously reported association between WNK1 and BP variation. Using tag SNPs (tSNPs) that capture 100% of common WNK1 variation in HapMap, we aimed to replicate our findings with BP and to test for association with phenotypes relating to WNK1 function in the British Genetics of Hypertension (BRIGHT) study case-control resource (1700 hypertensive cases and 1700 normotensive controls). We found multiple variants to be associated with systolic blood pressure, SBP (7/28 tSNPs, min-p = 0.0005), diastolic blood pressure, DBP (7/28 tSNPs, min-p = 0.002) and 24-hour urinary potassium excretion (10/28 tSNPs, min-p = 0.0004). Associations with SBP and urine potassium remained significant after correction for multiple testing (p = 0.02 and p = 0.01 respectively). The major allele (A) of rs765250, located in intron 1, demonstrated the strongest evidence for association with SBP, effect size 3.14 mmHg (95%CI: 1.23–4.9), DBP 1.9 mmHg (95%CI: 0.7–3.2) and hypertension, odds ratio (OR: 1.3 [95%CI: 1.0–1.7]). We genotyped this variant in six independent populations (n = 14,451) and replicated the association between rs765250 and SBP in a meta-analysis (p = 7×10^-3, combined with BRIGHT data-set p = 2×10^-4, n = 17,851). The associations of WNK1 with DBP and EH were not confirmed. Haplotype analysis revealed striking associations with hypertension and BP variation (global permutation p10 mmHg reduction) and risk for hypertension (OR<0.60). Our data indicate that multiple rare and common WNK1 variants contribute to BP variation and hypertension, and provide compelling evidence to initiate further genetic and functional studies to explore the role of WNK1 in BP regulation and EH.
Strong evidence that the common variant S384F in BRCA2 has no pathogenic relevance in hereditary breast cancer
INTRODUCTION: Unclassified variants (UVs) of unknown clinical significance are frequently detected in the BRCA2 gene. In this study, we have investigated the potential pathogenic relevance of the recurrent UV S384F (BRCA2, exon 10). METHODS: For co-segregation, four women from a large kindred (BN326) suffering from breast cancer were analysed. Moreover, paraffin-embedded tumours from two patients were analysed for loss of heterozygosity. Co-occurrence of the variant with a deleterious mutation was further determined in a large data set of 43,029 index cases. Nature and position of the UV and conservation among species were evaluated. RESULTS: We identified the unclassified variant S384F in three of the four breast cancer patients (the three were diagnosed at 41, 43 and 57 years of age). One woman with bilateral breast cancer (diagnosed at ages 32 and 50) did not carry the variant. Both tumours were heterozygous for the S384F variant, so loss of the wild-type allele could be excluded. Ser384 is not located in a region of functional importance and cross-species sequence comparison revealed incomplete conservation in the human, dog, rodent and chicken BRCA2 homologues. Overall, the variant was detected in 116 patients, five of which co-occurred with different deleterious mutations. The combined likelihood ratio of co-occurrence, co-segregation and loss of heterozygosity revealed a value of 1.4 × 10^-8 in favour of neutrality of the variant. CONCLUSION: Our data provide conclusive evidence that the S384F variant is not a disease-causing mutation.
Linkage analysis of HLA and candidate genes for celiac disease in a North American family-based study
BACKGROUND: Celiac disease has a strong genetic association with HLA. However, this association only explains approximately half of the sibling risk for celiac disease. Therefore, other genes must be involved in susceptibility to celiac disease. We tested for linkage to genes or loci that could play a role in pathogenesis of celiac disease. METHODS: DNA samples, from members of 62 families with a minimum of two cases of celiac disease, were genotyped at HLA and at 13 candidate gene regions, including CD4, CTLA4, four T-cell receptor regions, and 7 insulin-dependent diabetes regions. Two-point and multipoint heterogeneity LOD (HLOD) scores were examined. RESULTS: The highest two-point and multipoint HLOD scores were obtained in the HLA region, with a two-point HLOD of 3.1 and a multipoint HLOD of 5.0. For the candidate genes, we found no evidence for linkage. CONCLUSIONS: Our significant evidence of linkage to HLA replicates the known linkage and association of HLA with CD. In our families, likely candidate genes did not explain the susceptibility to celiac disease
Inference of population splits and mixtures from genome-wide allele frequency data
Many aspects of the historical relationships between populations in a species
are reflected in genetic data. Inferring these relationships from genetic data,
however, remains a challenging task. In this paper, we present a statistical
model for inferring the patterns of population splits and mixtures in multiple
populations. In this model, the sampled populations in a species are related to
their common ancestor through a graph of ancestral populations. Using
genome-wide allele frequency data and a Gaussian approximation to genetic
drift, we infer the structure of this graph. We applied this method to a set of
55 human populations and a set of 82 dog breeds and wild canids. In both
species, we show that a simple bifurcating tree does not fully describe the
data; in contrast, we infer many migration events. While some of the migration
events that we find have been detected previously, many have not. For example,
in the human data we infer that Cambodians trace approximately 16% of their
ancestry to a population ancestral to other extant East Asian populations. In
the dog data, we infer that both the boxer and basenji trace a considerable
fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to
domestication, and that East Asian toy breeds (the Shih Tzu and the Pekingese)
result from admixture between modern toy breeds and "ancient" Asian breeds.
Software implementing the model described here, called TreeMix, is available at
http://treemix.googlecode.com
Comment: 28 pages, 6 figures in main text. Attached supplement is 22 pages, 15
figures. This is an updated version of the preprint available at
http://precedings.nature.com/documents/6956/version/
Single nucleotide polymorphism discovery in rainbow trout by deep sequencing of a reduced representation library
BACKGROUND: To enhance capabilities for genomic analyses in rainbow trout, such as genomic selection, a large suite of polymorphic markers that are amenable to high-throughput genotyping protocols must be identified. Expressed Sequence Tags (ESTs) have been used for single nucleotide polymorphism (SNP) discovery in salmonids. In those strategies, the salmonid semi-tetraploid genomes often led to assemblies of paralogous sequences and therefore resulted in a high rate of false positive SNP identification. Sequencing genomic DNA using primers identified from ESTs proved to be an effective but time-consuming method of SNP identification in rainbow trout, and is therefore not suitable for high-throughput SNP discovery. In this study, we employed a high-throughput strategy that used pyrosequencing technology to generate data from a reduced representation library constructed with genomic DNA pooled from 96 unrelated rainbow trout that represent the National Center for Cool and Cold Water Aquaculture (NCCCWA) broodstock population. RESULTS: The reduced representation library consisted of 440 bp fragments resulting from complete digestion with the restriction enzyme HaeIII; sequencing produced 2,000,000 reads providing an average 6-fold coverage of the estimated 150,000 unique genomic restriction fragments (300,000 fragment ends). Three independent data analyses identified 22,022 to 47,128 putative SNPs on 13,140 to 24,627 independent contigs. A set of 384 putative SNPs, randomly selected from the sets produced by the three analyses, were genotyped on individual fish to determine the validation rate of putative SNPs among analyses, distinguish apparent SNPs that actually represent paralogous loci in the tetraploid genome, examine Mendelian segregation, and place the validated SNPs on the rainbow trout linkage map. Approximately 48% (183) of the putative SNPs were validated; 167 markers were successfully incorporated into the rainbow trout linkage map. In addition, 2% of the sequences from the validated markers were associated with rainbow trout transcripts. CONCLUSION: The use of reduced representation libraries and pyrosequencing technology proved to be an effective strategy for the discovery of a high number of putative SNPs in rainbow trout; however, modifications to the technique to decrease the false discovery rate resulting from the evolutionarily recent genome duplication would be desirable.
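The figures reported in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the abstract; the one assumption is that coverage is counted as raw reads per fragment end, which the abstract does not state explicitly, and all variable names are illustrative.

```python
# Figures taken from the abstract.
reads = 2_000_000
fragment_ends = 300_000      # two ends for each of ~150,000 unique fragments
snps_genotyped = 384
snps_validated = 183
snps_mapped = 167

# Assumption: coverage is counted as raw reads per fragment end.
reads_per_end = reads / fragment_ends

# Share of genotyped putative SNPs that validated.
validation_rate = snps_validated / snps_genotyped

# Share of validated SNPs that were placed on the linkage map.
mapped_share = snps_mapped / snps_validated

print(f"reads per fragment end: {reads_per_end:.2f}")
print(f"validation rate: {validation_rate:.1%}")
print(f"validated SNPs placed on the map: {mapped_share:.1%}")
```

The raw ratio comes out slightly above the reported 6-fold average, which would be consistent with some reads being discarded before assembly, and the validation rate rounds to the reported 48%.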
TOPS++FATCAT: Fast flexible structural alignment using constraints derived from TOPS+ Strings Model
BACKGROUND: Protein structure analysis and comparison are major challenges in structural bioinformatics. Despite the existence of many tools and algorithms, very few of them have managed to capture the intuitive understanding of protein structures developed in structural biology, especially in the context of rapid database searches. Such intuitions could help speed up similarity searches and make it easier to understand the results of such analyses. RESULTS: We developed a TOPS++FATCAT algorithm that uses an intuitive description of the proteins' structures as captured in the popular TOPS diagrams to limit the search space of the aligned fragment pairs (AFPs) in the flexible alignment of protein structures performed by the FATCAT algorithm. The TOPS++FATCAT algorithm is faster than FATCAT by more than an order of magnitude with a minimal cost in classification and alignment accuracy. For beta-rich proteins its accuracy is better than that of FATCAT, because the TOPS+ strings model contains important information on the parallel and anti-parallel hydrogen-bond patterns between the beta-strand SSEs (Secondary Structural Elements). We show that the TOPS++FATCAT errors, rare as they are, can be clearly linked to oversimplifications of the TOPS diagrams and can be corrected by the development of more precise secondary structure element definitions. SOFTWARE AVAILABILITY: The benchmark analysis results and the compressed archive of the TOPS++FATCAT program for the Linux platform can be downloaded from the following web site: http://fatcat.burnham.org/TOPS/ CONCLUSION: TOPS++FATCAT provides FATCAT accuracy and insights into protein structural changes at a speed comparable to sequence alignments, opening up a possibility of interactive protein structure similarity searches.
- …