A novel application of pattern recognition for accurate SNP and indel discovery from high-throughput data: Targeted resequencing of the glucocorticoid receptor co-chaperone FKBP5 in a Caucasian population
Precise detection of single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) from high-throughput data remains a significant bioinformatics challenge. Accurate detection is necessary before next-generation sequencing can routinely be used in the clinic. In research, scientific advances are hindered by gaps in the data, exemplified by the under-discovery of rare variants, variants in non-coding regions, and indels. The continued presence of false positives and false negatives prevents full automation and requires additional manual verification steps. Our methodology applies both pattern recognition and sensitivity analysis to eliminate false positives and to aid detection of SNP/indel loci and genotypes from high-throughput data. We chose FK506-binding protein 51 (FKBP5) (6p21.31) as our clinical target because of its role in modulating pharmacological responses to physiological and synthetic glucocorticoids and because of the complexity of the genomic region. We detected genetic variation across a 160 kb region encompassing FKBP5, discovering 613 SNPs and 57 indels, including a 3.3 kb deletion. We validated our method using three independent data sets and achieved 99% concordance with Sanger sequencing and with Affymetrix and Illumina microarrays. Furthermore, we were able to detect 267 novel rare variants and assess linkage disequilibrium. Our results showed both a sensitivity and a specificity of 98%, indicating near-perfect classification of true and false variants. The process is scalable and amenable to automation, with the downstream filters taking only 1.5 hours to analyze 96 individuals simultaneously. We provide examples of how this level of precision uncovered interactions among multiple loci, their predicted influences on mRNA stability, perturbations of the hsp90 binding site, and individual variation in FKBP5 expression. Finally, we show how our discovery of rare variants may change current conceptions of evolution at this locus.
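The abstract reports sensitivity, specificity, and concordance without spelling out how they are computed. The sketch below uses the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)) and a simple per-genotype match rate for concordance. The class name, function names, and example counts are hypothetical illustrations, not the study's data or code, and the authors may have defined these metrics differently.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts of variant calls checked against a validation set (hypothetical inputs)."""
    tp: int  # true variants that were called
    fp: int  # non-variant sites that were wrongly called
    tn: int  # non-variant sites that were correctly not called
    fn: int  # true variants that were missed


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true variants that were detected: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of non-variant sites correctly rejected: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


def concordance(calls: list[str], reference: list[str]) -> float:
    """Fraction of genotypes matching an independent platform, position by position.

    Assumes both lists encode genotypes with the same allele order (e.g. "AG", never "GA").
    """
    if len(calls) != len(reference):
        raise ValueError("genotype lists must be the same length")
    if not calls:
        raise ValueError("genotype lists must not be empty")
    matches = sum(a == b for a, b in zip(calls, reference))
    return matches / len(calls)


if __name__ == "__main__":
    # Illustrative counts only; these are NOT the study's data.
    counts = ConfusionCounts(tp=98, fp=2, tn=98, fn=2)
    print(f"sensitivity = {sensitivity(counts):.2f}")
    print(f"specificity = {specificity(counts):.2f}")
    print(f"concordance = {concordance(['AA', 'AG', 'GG', 'AG'], ['AA', 'AG', 'GG', 'AA']):.2f}")
```

Keeping the counts in a small dataclass makes it explicit which of the four outcomes each metric uses, which is where sensitivity and specificity are most often confused.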
Promoter-proximal transcription factor binding is transcriptionally active when coupled with nucleosome repositioning in immediate vicinity
Previous studies have analyzed patterns of transcription, Transcription Factor (TF) binding, or nucleosome occupancy across the genome. These studies suggest that the three aspects are genetically connected, but the cause-and-effect relationships are still unknown. For example, physiologic TF binding studies involve many TFs; consequently, it is difficult to assign nucleosome reorganization to the binding site occupancy of any particular TF. Several questions therefore remain unclear: does TF binding influence nucleosome (re)organization locally, or does it affect the chromatin landscape at a more global level; are all TF binding events, or only a fraction of them, accompanied by reorganization of nucleosome occupancy; and do all TF binding events and associated changes in nucleosome occupancy result in altered gene expression? With these questions in mind, we characterized two states (before and after induction of a single TF of choice) and determined (i) the genomic binding sites of the TF, (ii) promoter nucleosome occupancy and (iii) transcriptome profiles. The results demonstrated that, in most cases, promoter-proximal TF binding influenced expression of the target gene when it was coupled to nucleosome repositioning at or close to its binding site. In contrast, TF binding without local nucleosome reorganization changed target gene expression in only a few cases.
Whole genome sequencing of cyanobacterium Nostoc sp. CCCryo 231-06 using microfluidic single cell technology
The Nostoc sp. strain CCCryo 231-06 is a cyanobacterial strain capable of surviving under extreme conditions and is therefore of great interest to the astrobiology community. Its complete genome sequence would serve as a guide for further studies. However, without a reference genome, contamination is a major concern for the quality of the sequencing data. Here, we report the use of microfluidic technology combined with single cell sequencing and de novo assembly to minimize contamination and recover a high-quality complete genome of the Nostoc strain CCCryo 231-06. The whole genome (100%) was recovered, all contaminants were removed, and a strongly supported phylogenetic tree was obtained. The data reported here can be useful for comparative genomics in phylogenetic and taxonomic studies. The method used in this work can be applied to studies that require high-quality genome assemblies of unknown microorganisms.
Functional Genetic Polymorphisms in the Aromatase Gene CYP19 Vary the Response of Breast Cancer Patients to Neoadjuvant Therapy with Aromatase Inhibitors
Aromatase (CYP19) is a critical enzyme for estrogen biosynthesis, and aromatase inhibitors (AIs) are an established endocrine therapy for postmenopausal women with breast cancer. DNA samples were obtained from 52 women pre- and post-AI treatment in the neoadjuvant setting. In total, 82 breast cancer and 19 normal breast samples were resequenced to test the hypothesis that single nucleotide polymorphisms (SNPs) in the CYP19 gene might contribute to response to neoadjuvant AI therapy. There were no differences in CYP19 sequence between tumor and germline DNA from the same patient. Forty-eight CYP19 SNPs were identified, four of which were novel compared with previous resequencing data. Genotype-phenotype association studies performed with levels of aromatase activity, estrone, estradiol and tumor size pre- and post-AI treatment indicated that two tightly linked SNPs, rs6493497 and rs7176005, in the 5’-flanking region of CYP19 exon 1.1 were significantly associated with a greater change in aromatase activity after AI treatment. A follow-up study in 200 women with early breast cancer treated with adjuvant anastrozole showed that the same two SNPs were also associated with higher plasma estradiol levels pre- and post-AI treatment. Electrophoretic mobility shift and reporter gene assays confirmed the potential functional effects of these two SNPs on transcriptional regulation. These studies provide insight into how common genetic polymorphisms in CYP19 contribute to variation in the response of breast cancer patients to AIs.
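"Tightly linked" SNPs are in strong linkage disequilibrium (LD). The abstract does not say which LD measure was used, so the sketch below shows the textbook pairwise statistics D, D' and r² for two biallelic loci, computed from phased haplotype counts. The function name `ld_stats` and the example counts are hypothetical; real genotype data would usually need haplotype phasing or an EM estimate first.

```python
def ld_stats(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic loci from phased haplotype counts.

    Alleles are A/a at locus 1 and B/b at locus 2; the arguments are counts of the
    four haplotypes AB, Ab, aB and ab.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n              # frequency of the AB haplotype
    p_A = (n_AB + n_Ab) / n      # allele frequency of A at locus 1
    p_B = (n_AB + n_aB) / n      # allele frequency of B at locus 2
    d = p_AB - p_A * p_B         # coefficient of linkage disequilibrium
    # D' scales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between alleles at the two loci.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    # Hypothetical counts: only AB and ab haplotypes occur, i.e. complete LD.
    d, d_prime, r2 = ld_stats(n_AB=30, n_Ab=0, n_aB=0, n_ab=70)
    print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```

D' reaches 1 whenever one of the four haplotypes is absent, whereas r² reaches 1 only when the two SNPs are perfect proxies for each other, which is why r² is the usual measure when one SNP is used to tag another.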
Human catechol O-methyltransferase (COMT): Gene resequencing and functional characterisation of polymorphisms.
Catechol O-methyltransferase (COMT) plays an important role in the metabolism of catecholamines, catecholestrogens and catechol drugs. A common, previously identified COMT G472A genetic polymorphism (Val108/158Met) is associated with decreased levels of enzyme activity and has been implicated as a possible risk factor for neuropsychiatric disease. We set out to 'resequence' the human COMT gene using DNA samples from 60 African-American and 60 Caucasian-American subjects. A total of 23 single nucleotide polymorphisms (SNPs), including a novel nonsynonymous cSNP present only in DNA from African-American subjects, and one insertion/deletion were observed. The wild type (WT) and two variant allozymes, Thr52 and Met108, were transiently expressed in COS-1 and HEK293 cells. There was no significant change in the level of COMT activity for the Thr52 variant allozyme, but there was a 40% decrease in the level of activity in cells transfected with the Met108 construct. Apparent Km values of the WT and variant allozymes for the two reaction cosubstrates differed slightly but significantly for 3,4-dihydroxybenzoic acid, but not for S-adenosyl-L-methionine. The Met108 allozyme displayed a 70-90% decrease in immunoreactive protein compared with WT, but there was no significant change in the level of immunoreactive protein for Thr52. A significant decrease in the level of immunoreactive protein was also observed in hepatic biopsy samples from patients homozygous for the allele encoding Met108. These observations represent steps toward an understanding of the molecular genetic mechanisms responsible for variation in COMT level and/or properties, variation that may contribute to the pathophysiology of neuropsychiatric disease.
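As background for the "apparent Km" comparison, the standard Michaelis–Menten relationship is shown below. For a two-substrate enzyme such as COMT, an apparent Km is typically measured for one cosubstrate while the other is held at a fixed concentration. This is the general textbook form, not necessarily the fitting procedure the authors used.

```latex
v = \frac{V_{\max}\,[S]}{K_m^{\mathrm{app}} + [S]},
\qquad
v\big|_{[S] = K_m^{\mathrm{app}}} = \tfrac{1}{2}\,V_{\max}
```

A higher apparent Km therefore means that more substrate is needed to reach half-maximal velocity under the assay conditions.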
- …