474 research outputs found

    What Is a Microsatellite: A Computational and Experimental Definition Based upon Repeat Mutational Behavior at A/T and GT/AC Repeats

    Microsatellites are abundant in eukaryotic genomes and have high rates of strand slippage-induced repeat number alterations. They are popular genetic markers, and their mutations are associated with numerous neurological diseases. However, the minimal number of repeats required to constitute a microsatellite has been debated, and a definition of a microsatellite that considers its mutational behavior has been lacking. To define a microsatellite, we investigated slippage dynamics for a range of repeat sizes, utilizing two approaches. Computationally, we assessed length polymorphism at repeat loci in ten ENCODE regions resequenced in four human populations, assuming that the occurrence of polymorphism reflects strand slippage rates. Experimentally, we determined the in vitro DNA polymerase-mediated strand slippage error rates as a function of repeat number. In both approaches, we compared strand slippage rates at tandem repeats with the background slippage rates. We observed two distinct modes of mutational behavior. At small repeat numbers, slippage rates were low and indistinguishable from background measurements. A marked transition in mutability was observed as the repeat array lengthened, such that slippage rates at large repeat numbers were significantly higher than the background rates. For both mononucleotide and dinucleotide microsatellites studied, the transition length corresponded to a similar number of nucleotides (approximately 10). Thus, microsatellite threshold is determined not by the presence/absence of strand slippage at repeats but by an abrupt alteration in slippage rates relative to background. These findings have implications for understanding microsatellite mutagenesis, standardization of genome-wide microsatellite analyses, and predicting polymorphism levels of individual microsatellite loci
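
As a rough illustration of the length threshold described in this abstract, the sketch below measures tandem repeat tracts in nucleotides and flags those at or above about 10 nt. The function names and the exact cut-off constant are assumptions made for this example; the abstract itself reports only an approximate transition length and does not describe the authors' software.

```python
import re

# Assumed cut-off in nucleotides, taken from the "approximately 10" in the abstract.
THRESHOLD_NT = 10

def repeat_tracts(seq, motif):
    """Yield (start, length_in_nt) for each maximal tandem run of `motif` in `seq`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    for match in pattern.finditer(seq.upper()):
        yield match.start(), match.end() - match.start()

def classify(seq, motif, threshold_nt=THRESHOLD_NT):
    """Return (start, length_in_nt, at_or_above_threshold) for each tract."""
    return [(start, n, n >= threshold_nt) for start, n in repeat_tracts(seq, motif)]

# Hypothetical test sequence: a 12-nt A tract and a 6-nt GT tract.
example = "CC" + "A" * 12 + "GG" + "GT" * 3 + "CC"
a_tracts = classify(example, "A")
gt_tracts = classify(example, "GT")
```

Measuring tract length in nucleotides rather than in repeat units lets the same threshold apply to both mononucleotide and dinucleotide repeats, which matches how the abstract states its result.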

    Development and validation of a next generation sequencing based microsatellite instability assay for routine clinical use

    PhD Thesis. Colorectal cancer (CRC) is the second most common cancer in both men and women. Approximately 3-5% of CRCs show microsatellite instability (MSI) caused by germline defects in mismatch repair genes. In addition, 12% of sporadic CRCs show MSI. Currently, MSI is tested using a fragment analysis based assay not suitable for high throughput testing. Knowledge of microsatellite instability affects prognosis, surveillance and treatment of CRCs, and MSI testing is now recommended for all newly diagnosed CRCs. As a result, development of high throughput approaches is desirable. The focus of my work was to develop and validate a high throughput sequence based MSI assay. Initially, I tested 25 (7-9bp) mononucleotide markers, previously identified from in silico analyses, using a cohort of 55 CRCs, and selected 8 markers which collectively could discriminate between MSI-high (MSI-H) and microsatellite stable (MSS) cases. To define the optimal parameters to discriminate between MSI-H and MSS samples, I tested these 8 markers and 9 long (8-12bp) mononucleotide markers identified in a parallel study, across a panel of 141 CRC samples. This allowed development of a scoring scheme for the 17 markers, which achieved 96% sensitivity and 100% specificity. I validated this scheme using an independent cohort of 70 CRCs without knowing their MSI status. The assay achieved 100% sensitivity and specificity. Finally, I assessed the ability of short repeats to allow inference of the clonal variation within both FFPE (7) and fresh (4) MSI-H CRCs by analysing multiple samples from each cancer. I was able to infer the lineage relationship between primary tumour and lymph node metastasis in three cases and to construct phylogenetic trees for all cancers for which multiple samples were available, illustrating the utility of these markers for understanding CRC clonal variation. Higher Committee for Education Development in Iraq (HCED Iraq)

    An Investigation of Links Between Simple Sequences and Meiotic Recombination Hotspots

    Previous evidence has shown that the simple sequences microsatellites and poly-purine/poly-pyrimidine tracts (PPTs) could be both a cause, and an effect, of meiotic recombination. The causal link between simple sequences and recombination has not been much explored, however, probably because other evidence has cast doubt on its generality, though this evidence has never been conclusive. Several questions have remained unanswered in the literature, and I have addressed aspects of three of them in my thesis. First, what is the scale and magnitude of the association between simple sequences and recombination? I found that microsatellites and PPTs are strongly associated with meiotic double-strand break (DSB) hotspots in yeast, and that PPTs are generally more common in human recombination hotspots, particularly in close proximity to hotspot central regions, in which recombination events are markedly more frequent. I also showed that these associations can't be explained by coincidental mutual associations between simple sequences, recombination and other factors previously shown to correlate with both. A second question not conclusively answered in the literature is whether simple sequences, or their high levels of polymorphism, are an effect of recombination. I used three methods to address this question. Firstly, I investigated the distributions of two-copy tandem repeats and short PPTs in relation to yeast DSB hotspots in order to look for evidence of an involvement of recombination in simple sequence formation. I found no significant associations. Secondly, I compared the fraction of simple sequences containing polymorphic sites between human recombination hotspots and coldspots. The third method I used was generalized linear model analysis, with which I investigated the correlation between simple sequence variation and recombination rate, and the influence on the correlation of additional factors with potential relevance including GC-content and gene density. 
Both the direct comparison and correlation methods showed a very weak and inconsistent effect of recombination on simple sequence polymorphism in the human genome. Whether simple sequences are an important cause of recombination events is a third question that has received relatively little previous attention, and I have explored one aspect of it. Simple sequences of the types I studied have previously been shown to form non-B-DNA structures, which can be recombinagenic in model systems. Using a previously described sodium bisulphite modification assay, I tested for the presence of these structures in sequences amplified from the central regions of hotspots and cloned into supercoiled plasmids. I found significantly higher sensitivity to sodium bisulphite in humans than in chimpanzees in three out of six genomic regions in which there is a hotspot in humans but none in chimpanzees. In the DNA2 hotspot, this correlated with a clear difference in numbers of molecules showing long contiguous strings of converted cytosines, which are present in previously described intramolecular quadruplex and triplex structures. Two out of the five other hotspots tested show evidence for secondary structure comparable to a known intramolecular triplex, though with similar patterns in humans and chimpanzees. In conclusion, my results clearly motivate further investigation of a functional link between simple sequences and meiotic recombination, including the putative role of non-B-DNA structures

    DNA Sequences Shaped by Selection for Stability

    The sequence of a stretch of nucleotides affects its propensity for errors during replication and expression. Are proteins encoded by stable or unstable nucleotide sequences? If selection for variability is prevalent, one could expect an excess of unstable sequences. Alternatively, if selection against targets for errors were substantial, an excess of stable sequences would be expected. We screened the genome sequences of different organisms for an important determinant of stability, the presence of mononucleotide repeats. We find that codons are used to encode proteins in a way that avoids the emergence of mononucleotide repeats, and we can attribute this bias to selection rather than a neutral process. This indicates that selection for stability, rather than for the generation of variation, substantially influences how information is encoded in the genome

    Evidence for Widespread Convergent Evolution around Human Microsatellites

    Microsatellites are a major component of the human genome, and their evolution has been much studied. However, the evolution of microsatellite flanking sequences has received less attention, with reports of both high and low mutation rates and of a tendency for microsatellites to cluster. From the human genome we generated a database of many thousands of (AC)n flanking sequences within which we searched for common characteristics. Sequences flanking microsatellites of similar length show remarkable levels of convergent evolution, indicating shared mutational biases. These biases extend 25–50 bases either side of the microsatellite and may therefore affect more than 30% of the entire genome. To explore the extent and absolute strength of these effects, we quantified the observed convergence. We also compared homologous human and chimpanzee loci to look for evidence of changes in mutation rate around microsatellites. Most models of DNA sequence evolution assume that mutations are independent and occur randomly. Allowances may be made for sites mutating at different rates and for general mutation biases such as the faster rate of transitions over transversions. Our analysis suggests that these models may be inadequate, in that proximity to even very short microsatellites may alter the rate and distribution of mutations that occur. The elevated local mutation rate combined with sequence convergence, both of which we find evidence for, also provide a possible resolution for the apparently contradictory inferences of mutation rates in microsatellite flanking sequences

    Computational Mining and Survey of Simple Sequence Repeats (SSRs) in Expressed Sequence Tags (ESTs) of Dicotyledonous Plants

    Submitted to the faculty of the School of Informatics in partial fulfillment of the requirements for the degree Master of Science in Bioinformatics in the School of Informatics, Indiana University, July 2004. DNA markers have revolutionized the field of genetics by increasing the pace of genetic analysis. Simple sequence repeats (SSRs) are repetitions of nucleotide motifs of 1 to 5 bases and are currently the markers of choice in many plant and animal genomes due to their abundant distribution in the genomes, hypervariable nature and suitability for high-throughput analysis. While SSRs, once developed, are extremely valuable, their development is time consuming, laborious and expensive. Sequences from many genomes are continuously made freely available in the public databases, and mining of these sources using computational approaches permits rapid and economical marker development. Expressed sequence tags (ESTs) are ideal candidates for mining SSRs not only because of their availability in large numbers but also because they represent expressed genes. Large scale SSR mining efforts in plants to date have focused on monocotyledonous plants. In this project, an efficient SSR identification tool was developed and used to mine SSRs from more than 53 dicotyledonous species. A total of 92,648 non-redundant ESTs, or 6.0% of the 1.54 million dicotyledonous ESTs investigated in this study, were found to contain SSRs. The frequency of non-redundant ESTs containing SSRs among the species investigated ranged from 2.65% to 16.82%. More than 80% of the non-redundant ESTs having SSRs contained a single SSR while the others contained 2 or more SSRs. An extensive analysis of the occurrence and frequencies of various SSR types revealed that the A/T mononucleotide, AG/GA/CT/TC dinucleotide, AAG/AGA/GAA/CTT/TTC/TCT trinucleotide and TTTA and TTAA tetranucleotide repeats are the most abundant in dicotyledonous species. 
In addition, an analysis of the number of repeats across species revealed that the majority of the mononucleotide SSRs contained 15-25 repeats while the majority of the di- and tri-nucleotide SSRs contained 5-10 repeats. By providing valuable information on the abundance of SSRs in ESTs of a large number of dicotyledonous species, this study demonstrates the potential of the computational mining approach for rapid discovery of SSRs towards the development of markers for genetic analysis and related applications
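
The kind of SSR mining described in this abstract can be sketched with a regular-expression scan for motifs of 1 to 5 bases repeated in tandem. This is a minimal illustration and not the tool developed in the thesis: the function name `find_ssrs`, the single `min_repeats` cut-off, and the example sequence are all assumptions, and real pipelines usually apply different minimum repeat counts to each motif length.

```python
import re

def find_ssrs(seq, max_motif=5, min_repeats=5):
    """Return (start, motif, repeat_count) for tandem repeats of 1..max_motif bases.

    The lazy quantifier makes the regex try the shortest motif first at each
    position, so "AAAAAA" is reported as motif "A", not "AA" or "AAA".
    """
    seq = seq.upper()
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits

# Hypothetical EST fragment: (AG)6 followed by an A8 tract.
example_est = "TT" + "AG" * 6 + "C" + "A" * 8 + "G"
ssrs = find_ssrs(example_est)
```

Tallying the `motif` field of the hits across many sequences would give the kind of motif-frequency survey the abstract reports, although equivalent motifs such as AG and GA would still need to be grouped.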

    Local DNA dynamics shape mutational patterns of mononucleotide repeats in human genomes

    Single base substitutions (SBSs) and insertions/deletions are critical for generating population diversity and can lead both to inherited disease and cancer. Whereas on a genome-wide scale SBSs are influenced by cellular factors, on a fine scale SBSs are influenced by the local DNA sequence-context, although the role of flanking sequence is often unclear. Herein, we used bioinformatics, molecular dynamics and hybrid quantum mechanics/molecular mechanics to analyze sequence context-dependent mutagenesis at mononucleotide repeats (A-tracts and G-tracts) in human population variation and in cancer genomes. SBSs and insertions/deletions occur predominantly at the first and last base-pairs of A-tracts, whereas they are concentrated at the second and third base-pairs in G-tracts. These positions correspond to the most flexible sites along A-tracts, and to sites where a ‘hole’, generated by the loss of an electron through oxidation, is most likely to be localized in G-tracts. For A-tracts, most SBSs occur in the direction of the base-pair flanking the tracts. We conclude that intrinsic features of local DNA structure, i.e. base-pair flexibility and charge transfer, render specific nucleotides along mononucleotide runs susceptible to base modification, which then yields mutations. Thus, local DNA dynamics contributes to phenotypic variation and disease in the human population

    Genetic Polymorphisms and Molecular Pathogenesis of Endometriosis

    Susceptibility to late onset hearing loss: an investigation into genetic variation at the Brn-3c locus.

    Brn-3c (Brn-3.1, POU4F3), encoding a POU domain transcription factor, is a candidate gene for late onset sensorineural hearing loss, which is exhibited by a large proportion of the ageing population. To identify common sequence variants at the Brn-3c locus, mutation scanning of the Brn-3c cDNA, intron and 5'-flanking region was performed by PCR-SSCP analysis in 45 members of the general population. Seven polymorphic sites were identified, of which five within the Brn-3c 5'-flanking region appear common. A functional screening approach utilising in vitro assays suggests that at least three common sequence variants in the Brn-3c 5'-flanking region could have a functional effect: -566(GT)17-23, -1391A>C and a complex multi-allelic poly-G polymorphism at -3432 that exhibits multiple variations in length together with single base substitutions within the guanine repeat. The -3432poly-G polymorphism modifies the binding affinity of an OC-2 derived nuclear protein and there is convincing evidence that this is the transcription factor SP1. Use of purified human recombinant SP1 protein, in vitro translated SP1 and in vitro translated SP3 confirms that the -3432poly-G polymorphism modulates a high affinity SP family binding site, and evidence suggests that this alters the regulation of the Brn-3c promoter when SP1 levels are limiting, p<0.05. Moreover, the data suggest a functional interaction between the -3432poly-G polymorphism and the -566(GT)17-23 repeat, which associate to determine the response of the Brn-3c gene to SP1. Similarly, evidence suggests that the variant allele, -1391C, has a reduced affinity for an OC-2 derived nuclear protein and this is consistent with a significant decrease in basal activity of the Brn-3c promoter, pC were genotyped for a pilot association study but allelic frequencies were not found to significantly differ between the patient and control populations examined (by χ² analysis). 
Further large-scale population studies are required to establish whether these common sequence variants are associated with late onset hearing loss

    Assembly and Compositional Analysis of Human Genomic DNA - Doctoral Dissertation, August 2002

    In 1990, the United States Human Genome Project was initiated as a fifteen-year endeavor to sequence the approximately three billion bases making up the human genome (Vaughan, 1996). As of December 31, 2001, the public sequencing efforts have sequenced a total of 2.01 billion finished bases representing 63.0% of the human genome (http://www.ncbi.nlm.nih.gov/genome/seq/page.cgi?F=HsProgress.shtml&&ORG=Hs) to a Bermuda quality error rate of 1/10000 (Smith and Carrano, 1996). In addition, 1.11 billion bases representing 34.8% of the human genome have been sequenced to a rough-draft level. Efforts such as UCSC's GoldenPath (Kent and Haussler, 2001) and NCBI's contig assembly (Jang et al., 1999) attempt to assemble the human genome by incorporating both finished and rough-draft sequence. The availability of the human genome data allows us to ask questions concerning the maintenance of specific regions of the human genome. We consider two hypotheses for maintenance of high G+C regions: the presence of specific repetitive elements and compositional mutation biases. Our results rule out the possibility of the G+C content of repetitive elements determining regions of high and low G+C content in the human genome. We determine that there is a compositional bias for mutation rates. However, these biases are not responsible for the maintenance of high G+C regions. In addition, we show that regions of the human genome under less selective pressure will mutate towards a higher A+T composition, regardless of the surrounding G+C composition. We also analyze sequence organization and show that previous studies of isochore regions (Bernardi, 1993) cannot be generalized within the human genome. In addition, we propose a method to assemble only those parts of the human genome that are finished into larger contigs. Analysis of the contigs can lead to the mining of meaningful biological data that can give insights into genetic variation and evolution. 
I suggest a method to aid in single nucleotide polymorphism (SNP) detection, which can help to determine differences within a population. I also discuss a dynamic-programming based approach to sequence assembly validation and detection of large-scale polymorphisms within a population that is made possible through the availability of large human sequence contigs
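
The G+C composition discussed in this abstract is typically measured over windows along a sequence. The sketch below shows one simple way to do that; the function name `gc_windows`, the window and step sizes, and the decision to ignore non-ACGT symbols such as N are assumptions made for this example, not details taken from the dissertation.

```python
def gc_windows(seq, window=1000, step=1000):
    """Return (start, gc_fraction) for each full window along `seq`.

    Only A, C, G and T are counted, so ambiguous bases such as N do not
    dilute the fraction. A window with no countable bases yields None.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        counted = sum(chunk.count(base) for base in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, gc / counted if counted else None))
    return results

# Hypothetical toy sequence: one GC-only window followed by one AT-only window.
profile = gc_windows("GGCCAATT", window=4, step=4)
```

Setting `step` smaller than `window` gives overlapping windows and a smoother profile, at the cost of correlated neighbouring values.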