25 research outputs found

    Huntingtin exists as multiple splice forms in human brain

    Background: A CAG repeat expansion in HTT has been known to cause Huntington’s disease for over 20 years. The genomic sequence of the 67 exon HTT is clear but few reports have detailed alternative splicing or alternative transcripts. Most eukaryotic genes with multiple exons show alternative splicing that increases the diversity of the transcriptome and proteome: it would be surprising if a gene with 67 known exons in its two major transcripts did not present some alternative transcripts. Objective: To investigate the presence of alternative transcripts directly in human HTT. Methods: An overlapping RT-PCR based approach was used to determine novel HTT splice variants in human brain from HD patients and controls and 3D protein homology modelling employed to investigate their significance on the function of the HTT protein. Results: Here we show multiple previously unreported novel transcripts of HTT. Of the 22 splice variants found, eight were in-frame with the potential to encode novel HTT protein isoforms. Two splice variants were selected for further study; HTT Δex4,5,6 which results in the skipping of exons 4, 5 and 6 and HTTex41b which includes a novel exon created via partial retention of intron 41. 3D protein homology modelling showed that both splice variants are of potential functional significance leading to the loss of a karyopherin nuclear localisation signal and alterations to sites of posttranslational modification. Conclusions: The identification of novel HTT transcripts has implications for HTT protein isoform expression and function. Understanding the functional significance of HTT alternative splicing would be critical to guide the design of potential therapeutics in HD that aim to reduce the toxic HTT transcript or protein product including RNA silencing and correction of mis-splicing in disease

    Multiple novel prostate cancer susceptibility signals identified by fine-mapping of known risk loci among Europeans

    Genome-wide association studies (GWAS) have identified numerous common prostate cancer (PrCa) susceptibility loci. We have fine-mapped 64 GWAS regions known at the conclusion of the iCOGS study using large-scale genotyping and imputation in 25 723 PrCa cases and 26 274 controls of European ancestry. We detected evidence for multiple independent signals at 16 regions, 12 of which contained additional newly identified significant associations. A single signal comprising a spectrum of correlated variation was observed at 39 regions; 35 of which are now described by a novel more significantly associated lead SNP, while the originally reported variant remained as the lead SNP only in 4 regions. We also confirmed two association signals in Europeans that had been previously reported only in East-Asian GWAS. Based on statistical evidence and linkage disequilibrium (LD) structure, we have curated and narrowed down the list of the most likely candidate causal variants for each region. Functional annotation using data from ENCODE filtered for PrCa cell lines and eQTL analysis demonstrated significant enrichment for overlap with bio-features within this set. By incorporating the novel risk variants identified here alongside the refined data for existing association signals, we estimate that these loci now explain ∼38.9% of the familial relative risk of PrCa, an 8.9% improvement over the previously reported GWAS tag SNPs. This suggests that a significant fraction of the heritability of PrCa may have been hidden during the discovery phase of GWAS, in particular due to the presence of multiple independent signals within the same region

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.

    Chromosomal distribution of disease genes in the human genome

    Genes are nonrandomly distributed in the human genome, both within and between chromosomes. Thus, genes of similar function and common evolutionary origin are often clustered, as are genes with similar expression profiles. We now report that the >2400 genes known to underlie human monogenic inherited disease are non-randomly distributed in the genome over and above the general nonrandomness evident in the distribution of human genes. Further, a subset of 315 inherited disease genes subject to gross deletion was found to exhibit a degree of clustering that was twice that manifested by disease genes in general. The clustering of human disease genes is likely to have important implications for understanding the genotype-phenotype relationship in contiguous gene syndromes as well as those conditions characterized by multigene deletions or complex chromosomal rearrangements

    A meta-analysis of nonsense mutations causing human genetic disease

    Nonsense mutations account for ∼11% of all described gene lesions causing human inherited disease and ∼20% of disease-associated single-basepair substitutions affecting gene coding regions. Pathological nonsense mutations resulting in TGA (38.5%), TAG (40.4%), and TAA (21.1%) occur in different proportions to naturally occurring stop codons. Of the 23 different nucleotide substitutions giving rise to nonsense mutations, the most frequent are CGA → TGA (21%; resulting from methylation-mediated deamination) and CAG → TAG (19%). The differing nonsense mutation frequencies are largely explicable in terms of variable nucleotide substitution rates such that it is unnecessary to invoke differential translational termination efficiency or differential codon usage. Some genes are characterized by numerous nonsense mutations but relatively few if any missense mutations (e.g., CHM) whereas other genes exhibit many missense mutations but few if any nonsense mutations (e.g., PSEN1). Genes in the latter category have a tendency to encode proteins characterized by multimer formation. Consistent with the operation of a clinical selection bias, genes exhibiting an excess of nonsense mutations are also likely to display an excess of frameshift mutations. Tumor suppressor (TS) genes exhibit a disproportionate number of nonsense mutations while most mutations in oncogenes are missense. A total of 12% of somatic nonsense mutations in TS genes were found to occur recurrently in the hypermutable CpG dinucleotide. In a comparison of somatic and germline mutational spectra for 17 TS genes, ∼43% of somatic nonsense mutations had counterparts in the germline (rising to 98% for CpG mutations). Finally, the proportion of disease-causing nonsense mutations predicted to elicit nonsense-mediated mRNA decay (NMD) is significantly higher (P=1.56 × 10⁻⁹) than among nonobserved (potential) nonsense mutations, implying that nonsense mutations that elicit NMD are more likely to come to clinical attention
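
    The figure of 23 nucleotide substitutions quoted in the abstract above can be reproduced by enumeration. The following is a minimal sketch, not the authors' method: it assumes the standard genetic code (stop codons TAA, TAG and TGA, written as DNA) and counts every single-base change that turns a sense codon into a stop codon. The function name nonsense_substitutions is hypothetical and introduced only for illustration.

```python
from itertools import product

BASES = "ACGT"
# Assumption: standard genetic code, stop codons written as DNA triplets.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def nonsense_substitutions():
    """Return (sense_codon, stop_codon) pairs that differ by exactly one base."""
    pairs = []
    for codon in map("".join, product(BASES, repeat=3)):
        if codon in STOP_CODONS:
            continue  # stop-to-stop changes are not nonsense mutations
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue  # not a substitution
                mutant = codon[:pos] + base + codon[pos + 1:]
                if mutant in STOP_CODONS:
                    pairs.append((codon, mutant))
    return pairs


if __name__ == "__main__":
    pairs = nonsense_substitutions()
    print(len(pairs))  # 23
    for sense, stop in pairs:
        print(f"{sense} -> {stop}")
```

    Under this assumption the enumeration yields 23 substitutions (7 producing TAA, 8 producing TAG and 8 producing TGA), and the list includes the two most frequent changes named in the abstract, CGA → TGA and CAG → TAG.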

    Cross-comparison of the genome sequences from human, chimpanzee, Neanderthal and a Denisovan hominin identifies novel potentially compensated mutations

    The recent publication of the draft genome sequences of the Neanderthal and a ∼50,000-year-old archaic hominin from Denisova Cave in southern Siberia has ushered in a new age in molecular archaeology. We previously cross-compared the human, chimpanzee and Neanderthal genome sequences with respect to a set of disease-causing/disease-associated missense and regulatory mutations (Human Gene Mutation Database) and succeeded in identifying genetic variants which, although apparently pathogenic in humans, may represent a ‘compensated’ wild-type state in at least one of the other two species. Here, in an attempt to identify further ‘potentially compensated mutations’ (PCMs) of interest, we have compared our dataset of disease-causing/disease-associated mutations with their corresponding nucleotide positions in the Denisovan hominin, Neanderthal and chimpanzee genomes. Of the 15 human putatively disease-causing mutations that were found to be compensated in chimpanzee, Denisovan or Neanderthal, only a solitary F5 variant (Val1736Met) was specific to the Denisovan. In humans, this missense mutation is associated with activated protein C resistance and an increased risk of thromboembolism and recurrent miscarriage. It is unclear at this juncture whether this variant was indeed a PCM in the Denisovan or whether it could instead have been associated with disease in this ancient hominin

    Prediction of functional regulatory SNPs in monogenic and complex disease

    Next-generation sequencing (NGS) technologies are yielding ever higher volumes of human genome sequence data. Given this large amount of data, it has become both a possibility and a priority to determine how disease-causing single nucleotide polymorphisms (SNPs) detected within gene regulatory regions (rSNPs) exert their effects on gene expression. Recently, several studies have explored whether disease-causing polymorphisms have attributes that can distinguish them from those that are neutral, attaining moderate success at discriminating between functional and putatively neutral regulatory SNPs. Here, we have extended this work by assessing the utility of both SNP-based features (those associated only with the polymorphism site and the surrounding DNA) and gene-based features (those derived from the associated gene in whose regulatory region the SNP lies) in the identification of functional regulatory polymorphisms involved in either monogenic or complex disease. Gene-based features were found to be capable of both augmenting and enhancing the utility of SNP-based features in the prediction of known regulatory mutations. Adopting this approach, we achieved an AUC of 0.903 for predicting regulatory SNPs. Finally, our tool predicted 225 new regulatory SNPs with a high degree of confidence, with 105 of the 225 falling into linkage disequilibrium blocks of reported disease-associated genome-wide association studies SNPs

    The human gene mutation database (HGMD) and its exploitation in the fields of personalized genomics and molecular evolution

    The Human Gene Mutation Database (HGMD) constitutes a comprehensive core collection of data on germ-line mutations in nuclear genes underlying or associated with human inherited disease (http://www.hgmd.org). Data cataloged include single-base-pair substitutions in coding, regulatory, and splicing-relevant regions, micro-deletions and micro-insertions, indels, and triplet repeat expansions, as well as gross gene deletions, insertions, duplications, and complex rearrangements. Each mutation is entered into HGMD only once, in order to avoid confusion between recurrent and identical-by-descent lesions. By March 2012, the database contained in excess of 123,600 different lesions (HGMD Professional release 2012.1) detected in 4,514 different nuclear genes, with new entries currently accumulating at a rate in excess of 10,000 per annum. ~6,000 of these entries constitute disease-associated and functional polymorphisms. HGMD also includes cDNA reference sequences for more than 98% of the listed genes