1,176 research outputs found

    The effect of genetic variation on promoter usage and enhancer activity.

    The identification of genetic variants affecting gene expression, namely expression quantitative trait loci (eQTLs), has contributed to the understanding of mechanisms underlying human traits and diseases. The majority of these variants map in non-coding regulatory regions of the genome and their identification remains challenging. Here, we use natural genetic variation and CAGE transcriptomes from 154 EBV-transformed lymphoblastoid cell lines, derived from unrelated individuals, to map 5376 and 110 regulatory variants associated with promoter usage (puQTLs) and enhancer activity (eaQTLs), respectively. We characterize five categories of genes associated with puQTLs, distinguishing single from multi-promoter genes. Among multi-promoter genes, we find puQTL effects either specific to a single promoter or to multiple promoters with variable effect orientations. Regulatory variants associated with opposite effects on different mRNA isoforms suggest compensatory mechanisms occurring between alternative promoters. Our analyses identify differential promoter usage and modulation of enhancer activity as molecular mechanisms underlying eQTLs related to regulatory elements.

    Evidence After Imputation for a Role of MICA Variants in Nonprogression and Elite Control of HIV Type 1 Infection

    Past genome-wide association studies (GWAS) involving individuals with AIDS have mainly identified associations in the HLA region. Using the latest software, we imputed 7 million single-nucleotide polymorphisms (SNPs)/indels of the 1000 Genomes Project from the GWAS-determined genotypes of individuals in the Genomics of Resistance to Immunodeficiency Virus AIDS nonprogression cohort and compared them with those of control cohorts. The strongest signals were in MICA, the gene encoding major histocompatibility class I polypeptide-related sequence A (P = 3.31 × 10⁻¹²), with a particular exonic deletion (P = 1.59 × 10⁻⁸) in full linkage disequilibrium with the reference HCP5 rs2395029 SNP. Haplotype analysis also revealed an additive effect between HLA-C, HLA-B, and MICA variants. These data suggest a role for MICA in progression and elite control of human immunodeficiency virus type 1 infection.

    Genome-wide association study identifies loci associated with liability to alcohol and drug dependence that is associated with variability in reward-related ventral striatum activity in African- and European-Americans.

    Genetic influences on alcohol and drug dependence partially overlap; however, the specific loci underlying this overlap remain unclear. We conducted a genome-wide association study (GWAS) of a phenotype representing alcohol or illicit drug dependence (ANYDEP) among 7291 European-Americans (EA; 2927 cases) and 3132 African-Americans (AA; 1315 cases) participating in the family-based Collaborative Study on the Genetics of Alcoholism. ANYDEP was heritable (h² in EA = 0.60, AA = 0.37). The AA GWAS identified three regions with genome-wide significant (GWS; P < 5E-08) single nucleotide polymorphisms (SNPs) on chromosomes 3 (rs34066662, rs58801820) and 13 (rs75168521, rs78886294), and an insertion-deletion on chromosome 5 (chr5:141988181). No polymorphisms reached GWS in the EA. One GWS region (chromosome 1: rs1890881) emerged from a trans-ancestral meta-analysis (EA + AA) of ANYDEP, and was attributable to alcohol dependence in both samples. Four genes (AA: CRKL, DZIP3, SBK3; EA: P2RX6) and four sets of genes were significantly enriched within biological pathways for hemostasis and signal transduction. GWS signals did not replicate in two independent samples but there was weak evidence for association between rs1890881 and alcohol intake in the UK Biobank. Among 118 AA and 481 EA individuals from the Duke Neurogenetics Study, rs75168521 and rs1890881 genotypes were associated with variability in reward-related ventral striatum activation. This study identified novel loci for substance dependence and provides preliminary evidence that these variants are also associated with individual differences in neural reward reactivity. Gene discovery efforts in non-European samples with distinct patterns of substance use may lead to the identification of novel ancestry-specific genetic markers of risk.

    Scans for signatures of selection in Russian cattle breed genomes reveal new candidate genes for environmental adaptation and acclimation

    Domestication and selective breeding have resulted in over 1000 extant cattle breeds. Many of these breeds do not excel in important traits but are adapted to local environments. These adaptations are a valuable source of genetic material for efforts to improve commercial breeds. As a step toward this goal we identified candidate regions under selection in the genomes of nine Russian native cattle breeds adapted to survive in harsh climates. After comparing our data to other breeds of European and Asian origins we found known and novel candidate genes that could potentially be related to domestication, economically important traits and environmental adaptations in cattle. The Russian cattle breed genomes contained regions under putative selection with genes that may be related to adaptations to harsh environments (e.g., AQP5, RAD50, and RETREG1). We found genomic signatures of selective sweeps near key genes related to economically important traits, such as milk production (e.g., DGAT1, ABCG2), growth (e.g., XKR4), and reproduction (e.g., CSF2). Our data point to candidate genes which should be included in future studies attempting to identify genes to improve the extant breeds and facilitate generation of commercial breeds that fit better into the environments of Russia and other countries with similar climates.

    HapTree: A Novel Bayesian Framework for Single Individual Polyplotyping Using NGS Data

    As the more recent next-generation sequencing (NGS) technologies provide longer read sequences, the use of sequencing datasets for complete haplotype phasing is fast becoming a reality, allowing haplotype reconstruction of a single sequenced genome. Nearly all previous haplotype reconstruction studies have focused on diploid genomes and are rarely scalable to genomes with higher ploidy. Yet computational investigations into polyploid genomes carry great importance, impacting plant, yeast and fish genomics, as well as the studies of the evolution of modern-day eukaryotes and (epi)genetic interactions between copies of genes. In this paper, we describe a novel maximum-likelihood estimation framework, HapTree, for polyploid haplotype assembly of an individual genome using NGS read datasets. We evaluate the performance of HapTree on simulated polyploid sequencing read data modeled after Illumina sequencing technologies. For triploid and higher ploidy genomes, we demonstrate that HapTree substantially improves haplotype assembly accuracy and efficiency over the state-of-the-art; moreover, HapTree is the first scalable polyplotyping method for higher ploidy. As a proof of concept, we also test our method on real sequencing data from NA12878 (1000 Genomes Project) and evaluate the quality of assembled haplotypes with respect to trio-based diplotype annotation as the ground truth. The results indicate that HapTree significantly improves the switch accuracy within phased haplotype blocks as compared to existing haplotype assembly methods, while producing comparable minimum error correction (MEC) values. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2–5. National Science Foundation (U.S.) (NSF/NIH BIGDATA Grant R01GM108348-01); National Science Foundation (U.S.) (Graduate Research Fellowship); Simons Foundation
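
The HapTree abstract reports minimum error correction (MEC) values as one of its quality measures. The sketch below shows how an MEC score is commonly computed for a fixed set of candidate haplotypes: each read is assigned to the haplotype it disagrees with least, and the disagreements are summed. It is an illustration only, not HapTree's code; the function name `mec_score` and the read representation (a dict mapping site index to observed allele) are assumptions made for this example.

```python
def mec_score(reads, haplotypes):
    """Sum, over reads, of the fewest allele mismatches against any haplotype.

    reads: list of dicts mapping site index -> observed allele (hypothetical format).
    haplotypes: list of allele sequences, one per chromosome copy (any ploidy).
    """
    total = 0
    for read in reads:
        # Count mismatches against each haplotype at the sites the read covers.
        mismatches = [
            sum(1 for site, allele in read.items() if hap[site] != allele)
            for hap in haplotypes
        ]
        # The read is attributed to its best-matching haplotype.
        total += min(mismatches)
    return total


if __name__ == "__main__":
    haps = [[0, 1, 0, 1], [1, 0, 1, 0]]
    reads = [{0: 0, 1: 1}, {1: 0, 2: 1, 3: 1}, {2: 0, 3: 1}]
    print(mec_score(reads, haps))
```

Because the function takes a list of haplotypes rather than a pair, the same scoring applies to triploid and higher-ploidy assemblies.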

    Breakpoint mapping and haplotype analysis of translocation t(1;12)(q43;q21.1) in two apparently independent families with vascular phenotypes

    Background: The risk of serious congenital anomaly for de novo balanced translocations is estimated to be at least 6%. We identified two apparently independent families with a balanced t(1;12)(q43;q21.1) as an outcome of a "Systematic Survey of Balanced Chromosomal Rearrangements in Finns." In the first family, carriers (n=6) manifest with learning problems in childhood, and later with unexplained neurological symptoms (chronic headache, balance problems, tremor, fatigue) and cerebral infarctions in their 50s. In the second family, two carriers suffer from tetralogy of Fallot, one from transient ischemic attack and one from migraine. The translocation cosegregates with these vascular phenotypes and neurological symptoms. Methods and Results: We narrowed down the breakpoint regions using mate pair sequencing. We observed conserved haplotypes around the breakpoints, indicating that this translocation has arisen only once. The chromosome 1 breakpoint truncates a CHRM3 processed transcript, and is flanked by the 5′ end of CHRM3 and the 3′ end of RYR2. TRHDE, KCNC2, and ATXN7L3B flank the chromosome 12 breakpoint. Conclusions: This study demonstrates a balanced t(1;12)(q43;q21.1) with conserved haplotypes on the derived chromosomes. The translocation seems to result in a vascular phenotype, with or without neurological symptoms, in at least two families. We suggest that the translocation influences the positional expression of CHRM3, RYR2, TRHDE, KCNC2, and/or ATXN7L3B. Peer reviewed

    Genotype imputation using the Positional Burrows Wheeler Transform.

    Genotype imputation is the process of predicting unobserved genotypes in a sample of individuals using a reference panel of haplotypes. In the last 10 years reference panels have increased in size by more than 100-fold. Increasing reference panel size improves accuracy of markers with low minor allele frequencies but poses ever increasing computational challenges for imputation methods. Here we present IMPUTE5, a genotype imputation method that can scale to reference panels with millions of samples. This method continues to refine the observation made in the IMPUTE2 method, that accuracy is optimized via use of a custom subset of haplotypes when imputing each individual. It achieves fast, accurate, and memory-efficient imputation by selecting haplotypes using the Positional Burrows Wheeler Transform (PBWT). By using the PBWT data structure at genotyped markers, IMPUTE5 identifies locally best matching haplotypes and long identical-by-state segments. The method then uses the selected haplotypes as conditioning states within the IMPUTE model. Using the HRC reference panel, which has ∼65,000 haplotypes, we show that IMPUTE5 is up to 30x faster than MINIMAC4 and up to 3x faster than BEAGLE5.1, and uses less memory than both these methods. Using simulated reference panels we show that IMPUTE5 scales sub-linearly with reference panel size. For example, keeping the number of imputed markers constant, increasing the reference panel size from 10,000 to 1 million haplotypes requires less than twice the computation time. As the reference panel increases in size IMPUTE5 is able to utilize a smaller number of reference haplotypes, thus reducing computational cost.
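
To make the PBWT idea in the IMPUTE5 abstract more concrete, here is a minimal sketch of the standard positional prefix array and divergence array construction for biallelic haplotypes. It is not IMPUTE5's implementation, and the abstract gives no details of how IMPUTE5 selects haplotypes from these arrays; the function name `pbwt_arrays` and the list-of-lists input are assumptions made for this example.

```python
def pbwt_arrays(haplotypes):
    """Build PBWT prefix and divergence arrays over 0/1 haplotypes.

    haplotypes: list of equal-length lists of 0/1 alleles (hypothetical format).
    Returns (ppa, div) after the last site:
      ppa[i] is the index of the i-th haplotype when sorted by reversed prefix;
      div[i] is the site where the match between ppa[i] and ppa[i-1] begins
      (equal to the number of sites when the two differ at the last site).
    """
    n_haps = len(haplotypes)
    n_sites = len(haplotypes[0]) if n_haps else 0
    ppa = list(range(n_haps))
    div = [0] * n_haps
    for k in range(n_sites):
        zeros, ones = [], []
        div_zeros, div_ones = [], []
        p = q = k + 1  # match starts for the next 0-allele / 1-allele entry
        for idx, start in zip(ppa, div):
            p = max(p, start)
            q = max(q, start)
            if haplotypes[idx][k] == 0:
                zeros.append(idx)
                div_zeros.append(p)
                p = 0
            else:
                ones.append(idx)
                div_ones.append(q)
                q = 0
        # Stable partition: 0-allele haplotypes first, order preserved.
        ppa = zeros + ones
        div = div_zeros + div_ones
    return ppa, div


if __name__ == "__main__":
    haps = [[0, 1, 0], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
    print(pbwt_arrays(haps))
```

In the sorted order, haplotypes that share long matches ending at the current site sit next to each other, which is why neighbours in `ppa` with small `div` values are natural candidates when looking for locally best-matching haplotypes.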

    Shape-IT: new rapid and accurate algorithm for haplotype inference

    Background: We have developed a new computational algorithm, Shape-IT, to infer haplotypes under the genetic model of coalescence with recombination developed by Stephens et al. in Phase v2.1. It runs much faster than Phase v2.1 while exhibiting the same accuracy. The major algorithmic improvements rely on the use of binary trees to represent the sets of candidate haplotypes for each individual. These binary tree representations: (1) speed up the computations of posterior probabilities of the haplotypes by avoiding the redundant operations made in Phase v2.1, and (2) overcome the exponential aspect of the haplotype inference problem by the smart exploration of the most plausible pathways (i.e., haplotypes) in the binary trees. Results: Our results show that Shape-IT is several orders of magnitude faster than Phase v2.1 while being as accurate. For instance, Shape-IT runs 50 times faster than Phase v2.1 to compute the haplotypes of 200 subjects on 6,000 segments of 50 SNPs extracted from a standard Illumina 300 K chip (13 days instead of 630 days). We also compared Shape-IT with other widely used software, Gerbil, PL-EM, Fastphase, 2SNP, and Ishape in various tests: Shape-IT and Phase v2.1 were the most accurate in all cases, followed by Ishape and Fastphase. In terms of speed, Shape-IT was faster than Ishape and Fastphase for datasets smaller than 100 SNPs, but Fastphase became faster, though still less accurate, at inferring haplotypes on larger SNP datasets. Conclusion: Shape-IT deserves to be extensively used for regular haplotype inference but also in the context of the new high-throughput genotyping chips, since it allows the genetic model of Phase v2.1 to be fitted to large datasets. This new algorithm based on tree representations could be used in other HMM-based haplotype inference software and may apply more broadly to other fields using HMM.

    Assessing the role of insulin-like growth factors and binding proteins in prostate cancer using Mendelian randomization: genetic variants as instruments for circulating levels

    Circulating insulin-like growth factors (IGFs) and their binding proteins (IGFBPs) are associated with prostate cancer. Using genetic variants as instruments for IGF peptides, we investigated whether these associations are likely to be causal. We identified from the literature 56 single nucleotide polymorphisms (SNPs) in the IGF axis previously associated with biomarker levels (8 from a genome-wide association study [GWAS] and 48 in reported candidate genes). In ∼700 men without prostate cancer and two replication cohorts (N∼900 and ∼9,000), we examined the properties of these SNPs as instrumental variables (IVs) for IGF-I, IGF-II, IGFBP-2 and IGFBP-3. Those confirmed as strong IVs were tested for association with prostate cancer risk, low (< 7) vs high (≥ 7) Gleason grade, localised vs advanced stage, and mortality, in 22,936 controls and 22,992 cases. IV analysis was used in an attempt to estimate the causal effect of circulating IGF peptides on prostate cancer. Published SNPs in the IGFBP1/IGFBP3 gene region, particularly rs11977526, were strong instruments for IGF-II and IGFBP-3, less so for IGF-I. Rs11977526 was associated with high (vs low) Gleason grade (OR per IGF-II/IGFBP-3 level-raising allele 1.05; 95% CI 1.00, 1.10). Using rs11977526 as an IV we estimated the causal effect of a one SD increase in IGF-II (∼265 ng/ml) on risk of high vs low grade disease as 1.14 (95% CI 1.00, 1.31). Because of the potential for pleiotropy of the genetic instruments, these findings can only causally implicate the IGF pathway in general, not any one specific biomarker.
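
For readers unfamiliar with single-variant IV analysis, the sketch below shows the Wald ratio, one common Mendelian randomization estimator. The abstract does not state which estimator the authors used, so this is an illustration of the general technique rather than their method. The function names and all numeric inputs in the usage example are hypothetical and are not taken from the study.

```python
import math


def wald_ratio(beta_outcome, se_outcome, beta_exposure):
    """Wald ratio estimate of the causal effect of an exposure on an outcome.

    beta_outcome: per-allele log odds ratio of the variant on the outcome.
    se_outcome: standard error of beta_outcome.
    beta_exposure: per-allele effect of the variant on the exposure (SD units).
    The standard error uses the first-order approximation that ignores
    uncertainty in beta_exposure.
    """
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def as_odds_ratio(log_or, se, z=1.96):
    """Convert a log odds ratio and its SE to (OR, lower, upper) at ~95%."""
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the calculation.
    estimate, se = wald_ratio(beta_outcome=0.10, se_outcome=0.02, beta_exposure=0.5)
    print(as_odds_ratio(estimate, se))
```

The ratio rescales the variant-outcome association by the variant-exposure association, so a weak instrument (small `beta_exposure`) inflates both the estimate and its uncertainty.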