Search CORE

474 research outputs found

What Is a Microsatellite: A Computational and Experimental Definition Based upon Repeat Mutational Behavior at A/T and GT/AC Repeats

Author: Francesca Chiaromonte
Kateryna D. Makova
Kristin A. Eckert
Noelle Strubczewski
Suzanne E. Hile
Yogeshwar D. Kelkar
Publication venue: Oxford University Press
Publication date: 01/01/2010

Microsatellites are abundant in eukaryotic genomes and have high rates of strand slippage-induced repeat number alterations. They are popular genetic markers, and their mutations are associated with numerous neurological diseases. However, the minimal number of repeats required to constitute a microsatellite has been debated, and a definition of a microsatellite that considers its mutational behavior has been lacking. To define a microsatellite, we investigated slippage dynamics for a range of repeat sizes, using two approaches. Computationally, we assessed length polymorphism at repeat loci in ten ENCODE regions resequenced in four human populations, assuming that the occurrence of polymorphism reflects strand slippage rates. Experimentally, we determined in vitro DNA polymerase-mediated strand slippage error rates as a function of repeat number. In both approaches, we compared strand slippage rates at tandem repeats with background slippage rates. We observed two distinct modes of mutational behavior. At small repeat numbers, slippage rates were low and indistinguishable from background measurements. A marked transition in mutability was observed as the repeat array lengthened, such that slippage rates at large repeat numbers were significantly higher than background rates. For both the mononucleotide and dinucleotide microsatellites studied, the transition length corresponded to a similar number of nucleotides (approximately 10). Thus, the microsatellite threshold is determined not by the presence or absence of strand slippage at repeats but by an abrupt change in slippage rates relative to background. These findings have implications for understanding microsatellite mutagenesis, standardizing genome-wide microsatellite analyses, and predicting polymorphism levels at individual microsatellite loci.
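The abstract locates the microsatellite threshold at roughly 10 nucleotides for both A/T and GT/AC repeats. A minimal sketch of applying such a length-based cutoff when scanning a sequence follows; it assumes perfect (uninterrupted) repeats on a single strand, and `scan_repeats`, `Tract` and `THRESHOLD_NT` are hypothetical names, not the authors' pipeline.

```python
import re
from typing import NamedTuple

# Hypothetical cutoff (nt): the ~10-nt transition reported in the abstract,
# used here as a tunable assumption rather than a published parameter.
THRESHOLD_NT = 10

# Perfect (uninterrupted) repeats on one strand: A/T mononucleotide runs and
# GT/AC dinucleotide runs in either phase (GT, TG, AC, CA).
PATTERNS = {
    1: re.compile(r"A{2,}|T{2,}"),
    2: re.compile(r"(?:GT){2,}|(?:TG){2,}|(?:AC){2,}|(?:CA){2,}"),
}


class Tract(NamedTuple):
    start: int               # 0-based start on the scanned strand
    seq: str                 # the repeat tract itself
    period: int              # motif length (1 or 2)
    length_nt: int           # tract length in nucleotides
    units: int               # number of complete motif copies
    is_microsatellite: bool  # length_nt >= threshold


def scan_repeats(seq: str, threshold_nt: int = THRESHOLD_NT) -> list[Tract]:
    """Report perfect A/T and GT/AC tracts, flagged by nucleotide length."""
    seq = seq.upper()
    tracts = []
    for period, pattern in PATTERNS.items():
        for m in pattern.finditer(seq):
            n = len(m.group())
            tracts.append(Tract(m.start(), m.group(), period, n, n // period, n >= threshold_nt))
    return sorted(tracts, key=lambda t: t.start)


for t in scan_repeats("CC" + "AAAAA" + "C" + "GT" * 6 + "CC"):
    print(t.start, t.seq, t.length_nt, t.is_microsatellite)
# 2 AAAAA 5 False
# 8 GTGTGTGTGTGT 12 True
```

Measuring the cutoff in nucleotides rather than repeat units mirrors the abstract's observation that mono- and dinucleotide transitions coincide at a similar nucleotide length.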

Crossref

PubMed Central

Archivio della ricerca della Scuola Superiore Sant'Anna

Development and validation of a next generation sequencing based microsatellite instability assay for routine clinical use

Author: Alhilal Mohammed Ghanim Mehdi
Publication venue: Newcastle University
Publication date: 01/01/2016

PhD Thesis. Colorectal cancer (CRC) is the second most common cancer in both men and women. Approximately 3-5% of CRCs show microsatellite instability (MSI) caused by germline defects in mismatch repair genes. In addition, 12% of sporadic CRCs show MSI. Currently, MSI is tested using a fragment-analysis-based assay that is not suitable for high-throughput testing. Knowledge of microsatellite instability status affects the prognosis, surveillance and treatment of CRCs, and MSI testing is now recommended for all newly diagnosed CRCs. As a result, the development of high-throughput approaches is desirable. The focus of my work was to develop and validate a high-throughput, sequence-based MSI assay. Initially, I tested 25 (7-9 bp) mononucleotide markers, previously identified from in silico analyses, in a cohort of 55 CRCs, and selected 8 markers that collectively could discriminate between MSI-high (MSI-H) and microsatellite stable (MSS) cases. To define the optimal parameters for discriminating MSI-H from MSS samples, I tested these 8 markers and 9 long (8-12 bp) mononucleotide markers identified in a parallel study across a panel of 141 CRC samples. This allowed development of a scoring scheme for the 17 markers, which achieved 96% sensitivity and 100% specificity. I validated this scheme in an independent cohort of 70 CRCs while blinded to their MSI status; the assay achieved 100% sensitivity and specificity. Finally, I assessed whether short repeats allow inference of the clonal variation within both FFPE (7) and fresh (4) MSI-H CRCs by analysing multiple samples from each cancer. I was able to infer the lineage relationship between primary tumour and lymph node metastasis in three cases and to construct phylogenetic trees for all cancers for which multiple samples were available, illustrating the utility of these markers for understanding CRC clonal variation. Higher Committee for Education Development in Iraq (HCED Iraq)
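The abstract reports sensitivity and specificity for a 17-marker scoring scheme but does not spell out the rule. A minimal sketch, assuming a simple rule in which a sample is called MSI-H when the fraction of unstable markers reaches a cutoff; the 0.3 cutoff and the names `classify_msi` and `sensitivity_specificity` are hypothetical, not the thesis's published scheme.

```python
from typing import Sequence


def classify_msi(marker_calls: Sequence[bool], cutoff: float = 0.3) -> str:
    """Call a sample MSI-H if the fraction of unstable markers reaches `cutoff`.

    `marker_calls` holds one boolean per marker (True = unstable). The cutoff is
    a hypothetical placeholder, not the thesis's threshold.
    """
    if not marker_calls:
        raise ValueError("no marker calls")
    unstable_fraction = sum(marker_calls) / len(marker_calls)
    return "MSI-H" if unstable_fraction >= cutoff else "MSS"


def sensitivity_specificity(predicted: Sequence[str], truth: Sequence[str]) -> tuple[float, float]:
    """Sensitivity and specificity with MSI-H as the positive class.

    Assumes the reference set contains at least one MSI-H and one MSS case.
    """
    pairs = list(zip(predicted, truth))
    tp = sum(p == "MSI-H" and t == "MSI-H" for p, t in pairs)
    fn = sum(p != "MSI-H" and t == "MSI-H" for p, t in pairs)
    tn = sum(p != "MSI-H" and t != "MSI-H" for p, t in pairs)
    fp = sum(p == "MSI-H" and t != "MSI-H" for p, t in pairs)
    return tp / (tp + fn), tn / (tn + fp)


calls = [True] * 6 + [False] * 11  # 17 markers, 6 called unstable
print(classify_msi(calls))         # MSI-H (6/17 is about 0.35, above 0.3)
```

A fraction-based rule keeps the call comparable when a few markers fail in a given sample, which is one reason such schemes are common; the actual weighting used in the thesis may differ.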

Newcastle University eTheses

An Investigation of Links Between Simple Sequences and Meiotic Recombination Hotspots

Author: Bagshaw Andrew Tobias Matthew
Publication venue: University of Canterbury. Biological Sciences
Publication date: 01/01/2008

Previous evidence has shown that two kinds of simple sequence, microsatellites and poly-purine/poly-pyrimidine tracts (PPTs), could be both a cause and an effect of meiotic recombination. The causal link between simple sequences and recombination has not been much explored, however, probably because other evidence has cast doubt on its generality, though this evidence has never been conclusive. Several questions have remained unanswered in the literature, and I have addressed aspects of three of them in my thesis. First, what is the scale and magnitude of the association between simple sequences and recombination? I found that microsatellites and PPTs are strongly associated with meiotic double-strand break (DSB) hotspots in yeast, and that PPTs are generally more common in human recombination hotspots, particularly close to hotspot central regions, where recombination events are markedly more frequent. I also showed that these associations cannot be explained by coincidental mutual associations between simple sequences, recombination and other factors previously shown to correlate with both. A second question not conclusively answered in the literature is whether simple sequences, or their high levels of polymorphism, are an effect of recombination. I used three methods to address this question. First, I investigated the distributions of two-copy tandem repeats and short PPTs in relation to yeast DSB hotspots, looking for evidence that recombination is involved in simple sequence formation. I found no significant associations. Second, I compared the fraction of simple sequences containing polymorphic sites between human recombination hotspots and coldspots. Third, I used generalized linear model analysis to investigate the correlation between simple sequence variation and recombination rate, and the influence on this correlation of additional potentially relevant factors, including GC-content and gene density. 
Both the direct comparison and correlation methods showed a very weak and inconsistent effect of recombination on simple sequence polymorphism in the human genome. Whether simple sequences are an important cause of recombination events is a third question that has received relatively little previous attention, and I have explored one aspect of it. Simple sequences of the types I studied have previously been shown to form non-B-DNA structures, which can be recombinagenic in model systems. Using a previously described sodium bisulphite modification assay, I tested for the presence of these structures in sequences amplified from the central regions of hotspots and cloned into supercoiled plasmids. I found significantly higher sensitivity to sodium bisulphite in humans than in chimpanzees in three of six genomic regions in which there is a hotspot in humans but none in chimpanzees. In the DNA2 hotspot, this correlated with a clear difference in the numbers of molecules showing long contiguous strings of converted cytosines, which are present in previously described intramolecular quadruplex and triplex structures. Two of the five other hotspots tested show evidence of secondary structure comparable to a known intramolecular triplex, though with similar patterns in humans and chimpanzees. In conclusion, my results clearly motivate further investigation of a functional link between simple sequences and meiotic recombination, including the putative role of non-B-DNA structures.
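PPTs are defined purely by base composition, and the bisulphite assay scores molecules by long contiguous strings of converted cytosines. A minimal sketch of both calculations follows; the 10-nt PPT cutoff, the rule that an unconverted C ends a run, and the names `find_ppts` and `longest_converted_c_run` are illustrative assumptions, not the thesis's definitions. It also assumes each read is already aligned to its reference on the same strand.

```python
import re


def find_ppts(seq: str, min_len: int = 10) -> list[tuple[int, int, str]]:
    """Maximal all-purine (A/G) or all-pyrimidine (C/T) runs of at least min_len.

    A purine run on one strand is a pyrimidine run on the other, so both are
    reported. Returns (start, end, kind) with 0-based half-open coordinates.
    The default min_len is a hypothetical cutoff.
    """
    hits = []
    for kind, pattern in (("purine", r"[AG]+"), ("pyrimidine", r"[CT]+")):
        for m in re.finditer(pattern, seq.upper()):
            if m.end() - m.start() >= min_len:
                hits.append((m.start(), m.end(), kind))
    return sorted(hits)


def longest_converted_c_run(reference: str, read: str) -> int:
    """Longest run of consecutive reference cytosines read as T after bisulphite.

    Non-C reference positions are skipped; a reference C still read as C
    (unconverted) resets the run. Sequences must be aligned and equal length.
    """
    if len(reference) != len(read):
        raise ValueError("sequences must be aligned to equal length")
    best = current = 0
    for ref_base, read_base in zip(reference.upper(), read.upper()):
        if ref_base != "C":
            continue
        if read_base == "T":
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


print(find_ppts("CGAGGAAGAGGAGCT"))                    # [(1, 13, 'purine')]
print(longest_converted_c_run("ACCGCCAC", "ATTGTCAT"))  # 3
```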

UC Research Repository

DNA Sequences Shaped by Selection for Stability

Author: Ivan Matic
Lin Chao
Martin Ackermann
Publication venue: Public Library of Science
Publication date: 01/02/2006

The sequence of a stretch of nucleotides affects its propensity for errors during replication and expression. Are proteins encoded by stable or unstable nucleotide sequences? If selection for variability is prevalent, one could expect an excess of unstable sequences. Alternatively, if selection against targets for errors were substantial, an excess of stable sequences would be expected. We screened the genome sequences of different organisms for an important determinant of stability, the presence of mononucleotide repeats. We find that codons are used to encode proteins in a way that avoids the emergence of mononucleotide repeats, and we can attribute this bias to selection rather than a neutral process. This indicates that selection for stability, rather than for the generation of variation, substantially influences how information is encoded in the genome.
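One way to ask whether codon choice avoids mononucleotide repeats is to compare a gene with versions in which synonymous codons are reshuffled among positions encoding the same amino acid, so the protein and the gene's codon usage stay fixed. The sketch below assumes that null model and a hypothetical 5-nt run cutoff; the function names are illustrative and the authors' exact procedure may differ.

```python
import random
from collections import defaultdict
from itertools import groupby

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_homopolymers(seq: str, min_len: int = 5) -> int:
    """Number of maximal single-base runs of at least min_len (hypothetical cutoff)."""
    return sum(1 for _, run in groupby(seq) if len(list(run)) >= min_len)


def synonymous_shuffle(cds: str, rng: random.Random) -> str:
    """Permute codons among positions that encode the same amino acid.

    The protein and codon usage are unchanged. Assumes an in-frame, uppercase
    ACGT sequence; any trailing partial codon is dropped.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    by_amino = defaultdict(list)
    for idx, codon in enumerate(codons):
        by_amino[CODON_TABLE[codon]].append(idx)
    shuffled = list(codons)
    for idxs in by_amino.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)


def repeat_excess(cds: str, n_shuffles: int = 200, min_len: int = 5, seed: int = 0) -> float:
    """Observed homopolymer count minus the mean over synonymous shuffles.

    A negative value means the real coding sequence carries fewer
    mononucleotide repeats than this null model expects.
    """
    cds = cds.upper()
    rng = random.Random(seed)
    observed = count_homopolymers(cds, min_len)
    expected = sum(
        count_homopolymers(synonymous_shuffle(cds, rng), min_len) for _ in range(n_shuffles)
    ) / n_shuffles
    return observed - expected
```

Shuffling within a gene controls for both amino acid sequence and codon bias, so any remaining deficit of repeats points at the placement of codons rather than their overall frequencies.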

Repository for Publications and Research Data

Crossref

Directory of Open Access Journals

PubMed Central

Evidence for Widespread Convergent Evolution around Human Microsatellites

Author: Amos William
Vowles Edward J
Publication venue: Public Library of Science
Publication date: 01/08/2004

Microsatellites are a major component of the human genome, and their evolution has been much studied. However, the evolution of microsatellite flanking sequences has received less attention, with reports of both high and low mutation rates and of a tendency for microsatellites to cluster. From the human genome we generated a database of many thousands of (AC)n flanking sequences, within which we searched for common characteristics. Sequences flanking microsatellites of similar length show remarkable levels of convergent evolution, indicating shared mutational biases. These biases extend 25–50 bases on either side of the microsatellite and may therefore affect more than 30% of the entire genome. To explore the extent and absolute strength of these effects, we quantified the observed convergence. We also compared homologous human and chimpanzee loci to look for evidence of changes in mutation rate around microsatellites. Most models of DNA sequence evolution assume that mutations are independent and occur randomly. Allowances may be made for sites mutating at different rates and for general mutation biases such as the faster rate of transitions relative to transversions. Our analysis suggests that these models may be inadequate, in that proximity to even very short microsatellites may alter the rate and distribution of mutations. The elevated local mutation rate combined with sequence convergence, both of which we find evidence for, also provides a possible resolution of the apparently contradictory inferences of mutation rates in microsatellite flanking sequences.
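Convergence among flanks can be quantified position by position as the probability that two different loci share the same base, compared with the same probability for bases drawn from the pooled composition. A minimal sketch, assuming the flanks have already been extracted, trimmed to equal length and oriented so that position 0 is adjacent to the repeat; the statistic and the names `identity_profile` and `background_identity` are illustrative choices, not the authors' method.

```python
from collections import Counter


def identity_profile(flanks: list[str]) -> list[float]:
    """Per-position probability that two distinct flanks share the same base.

    Uses the unbiased estimator sum_b n_b (n_b - 1) / (n (n - 1)), so it needs
    at least two flanks of equal length.
    """
    length = len(flanks[0])
    if len(flanks) < 2 or any(len(f) != length for f in flanks):
        raise ValueError("need at least two flanks of equal length")
    n = len(flanks)
    profile = []
    for pos in range(length):
        counts = Counter(f[pos] for f in flanks)
        profile.append(sum(c * (c - 1) for c in counts.values()) / (n * (n - 1)))
    return profile


def background_identity(flanks: list[str]) -> float:
    """Same statistic for two bases drawn from the pooled base composition."""
    counts = Counter("".join(flanks))
    total = sum(counts.values())
    return sum(c * (c - 1) for c in counts.values()) / (total * (total - 1))


flanks = ["ACGT", "ACGA", "ACTT"]  # toy flanks; position 0 adjacent to the repeat
bg = background_identity(flanks)
print([round(p / bg, 2) for p in identity_profile(flanks)])  # [5.08, 5.08, 1.69, 1.69]
```

Ratios well above 1 near the repeat, repeated across groups of loci with similar repeat length, would be the kind of signal the abstract describes as convergence.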

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Computational Mining and Survey of Simple Sequence Repeats (SSRs) in Expressed Sequence Tags (ESTs) of Dicotyledonous Plants

Author: Kumpatla Siva Prasad
Publication date: 01/07/2004

Submitted to the faculty of the School of Informatics in partial fulfillment of the requirements for the degree Master of Science in Bioinformatics in the School of Informatics, Indiana University, July 2004. DNA markers have revolutionized the field of genetics by increasing the pace of genetic analysis. Simple sequence repeats (SSRs) are tandem repetitions of nucleotide motifs of 1 to 5 bases and are currently the markers of choice in many plant and animal genomes because of their abundance, hypervariable nature and suitability for high-throughput analysis. While SSRs, once developed, are extremely valuable, their development is time-consuming, laborious and expensive. Sequences from many genomes are continuously made freely available in public databases, and mining these sources computationally permits rapid and economical marker development. Expressed sequence tags (ESTs) are ideal candidates for mining SSRs, not only because they are available in large numbers but also because they represent expressed genes. Large-scale SSR mining efforts in plants have so far focused on monocotyledonous plants. In this project, an efficient SSR identification tool was developed and used to mine SSRs from more than 53 dicotyledonous species. A total of 92,648 non-redundant ESTs, or 6.0% of the 1.54 million dicotyledonous ESTs investigated in this study, were found to contain SSRs. The frequency of non-redundant ESTs containing SSRs ranged from 2.65% to 16.82% among the species investigated. More than 80% of the non-redundant ESTs with SSRs contained a single SSR, while the others contained two or more. An extensive analysis of the occurrence and frequencies of the various SSR types revealed that the A/T mononucleotide, AG/GA/CT/TC dinucleotide, AAG/AGA/GAA/CTT/TTC/TCT trinucleotide and TTTA and TTAA tetranucleotide repeats are the most abundant in dicotyledonous species. 
In addition, an analysis of the number of repeats across species revealed that the majority of mononucleotide SSRs contained 15-25 repeats, while the majority of di- and trinucleotide SSRs contained 5-10 repeats. By providing valuable information on the abundance of SSRs in ESTs of a large number of dicotyledonous species, this study demonstrates the potential of computational mining for the rapid discovery of SSRs and the development of markers for genetic analysis and related applications.
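The thesis describes an SSR identification tool and groups motifs into classes such as AG/GA/CT/TC. A minimal sketch of both steps follows; it is not the tool from the thesis. It assumes perfect repeats of 1-5 nt motifs, hypothetical minimum copy numbers in `MIN_REPEATS`, and illustrative names `find_ssrs`, `is_primitive` and `motif_class`. The class of a motif is taken as the smallest string among its rotations and those of its reverse complement.

```python
# Hypothetical minimum copy numbers per motif length (1-5 nt); thresholds vary
# between studies and are not taken from the thesis.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_primitive(motif: str) -> bool:
    """False if the motif is a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def motif_class(motif: str) -> str:
    """Canonical name shared by a motif, its rotations and its reverse complement.

    For example AG, GA, CT and TC all map to 'AG'.
    """
    rc = motif.translate(COMPLEMENT)[::-1]
    return min(m[i:] + m[:i] for m in (motif, rc) for i in range(len(m)))


def find_ssrs(seq: str, min_repeats: dict[int, int] = MIN_REPEATS) -> list[tuple[int, str, int]]:
    """Perfect SSRs as (start, motif, copies), scanning each motif length left to right."""
    seq = seq.upper()
    hits = []
    for k, min_rep in min_repeats.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            if set(motif) <= set("ACGT") and is_primitive(motif):
                copies = 1
                while seq[i + copies * k:i + (copies + 1) * k] == motif:
                    copies += 1
                if copies >= min_rep:
                    hits.append((i, motif, copies))
                    i += copies * k  # skip past the tract
                    continue
            i += 1
    return sorted(hits)


for start, motif, copies in find_ssrs("GG" + "A" * 12 + "C" + "AG" * 7 + "T"):
    print(start, motif, copies, motif_class(motif))
# 2 A 12 A
# 15 AG 7 AG
```

Skipping non-primitive motifs keeps a run such as AAAAAA from being reported again as an (AA)n dinucleotide repeat.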

IUPUIScholarWorks

Local DNA dynamics shape mutational patterns of mononucleotide repeats in human genomes

Author: Bacolla A.
Chen H.
Cooper David Neil
Howells Katy
Vasquez K. M.
Zhu X.
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2015

Single base substitutions (SBSs) and insertions/deletions are critical for generating population diversity and can lead to both inherited disease and cancer. Whereas on a genome-wide scale SBSs are influenced by cellular factors, on a fine scale they are influenced by the local DNA sequence context, although the role of the flanking sequence is often unclear. Herein, we used bioinformatics, molecular dynamics and hybrid quantum mechanics/molecular mechanics to analyze sequence context-dependent mutagenesis at mononucleotide repeats (A-tracts and G-tracts) in human population variation and in cancer genomes. SBSs and insertions/deletions occur predominantly at the first and last base pairs of A-tracts, whereas they are concentrated at the second and third base pairs of G-tracts. These positions correspond to the most flexible sites along A-tracts and, in G-tracts, to the sites where a ‘hole’ generated by the loss of an electron through oxidation is most likely to be localized. For A-tracts, most SBSs occur in the direction of the base pair flanking the tracts. We conclude that intrinsic features of local DNA structure, i.e. base-pair flexibility and charge transfer, render specific nucleotides along mononucleotide runs susceptible to base modification, which then yields mutations. Thus, local DNA dynamics contribute to phenotypic variation and disease in the human population.
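The positional pattern described above (first and last base pairs of A-tracts, second and third of G-tracts) comes down to mapping each variant to its offset inside a tract. A minimal sketch, assuming 0-based variant coordinates on the scanned strand and a hypothetical minimum tract length; `tract_positions` is an illustrative name, not the authors' code. A-tracts on the opposite strand appear as T runs and could be scanned with `base="T"`.

```python
import re
from collections import Counter


def tract_positions(seq: str, variant_positions: list[int], base: str = "A", min_len: int = 6) -> Counter:
    """Tally where variants fall inside mononucleotide tracts of `base`.

    Each key is (offset from the 5' end, offset from the 3' end), both 1-based,
    so the first base of a 6-nt tract is (1, 6) and the last is (6, 1).
    Variants outside qualifying tracts are ignored; min_len is hypothetical.
    """
    tracts = [(m.start(), m.end()) for m in re.finditer(f"{base}{{{min_len},}}", seq.upper())]
    tally = Counter()
    for pos in variant_positions:
        for start, end in tracts:
            if start <= pos < end:
                tally[(pos - start + 1, end - pos)] += 1
                break
    return tally


seq = "CC" + "A" * 6 + "GGGGGT"
# Variants at the first, third and last base of the A-tract, plus one outside it.
print(tract_positions(seq, [2, 4, 7, 0]))
```

Recording offsets from both ends lets tracts of different lengths be pooled while still separating "first" from "last" positions.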

CiteSeerX

Online Research @ Cardiff

PubMed Central

Genetic Polymorphisms and Molecular Pathogenesis of Endometriosis

Author: Q. Hasan
Vijayalakshmi Kodati
Publication venue: IntechOpen
Publication date: 09/05/2012

IntechOpen

Susceptibility to late onset hearing loss: an investigation into genetic variation at the Brn-3c locus.

Author: Nolan L.S.
Publication venue: Queen Mary University of London
Publication date: 01/01/2006

Brn-3c (Brn3.1, POU4F3), encoding a POU domain transcription factor, is a candidate gene for late onset sensorineural hearing loss, which is exhibited by a large proportion of the ageing population. To identify common sequence variants at the Brn-3c locus, mutation scanning of the Brn-3c cDNA, intron and 5'-flanking region was performed by PCR-SSCP analysis in 45 members of the general population. Seven polymorphic sites were identified, of which five, within the Brn-3c 5'-flanking region, appear common. A functional screening approach using in vitro assays suggests that at least three common sequence variants in the Brn-3c 5'-flanking region could have a functional effect: -566(GT)17-23, -1391A>C and a complex multi-allelic poly-G polymorphism at -3432 that exhibits multiple variations in length together with single base substitutions within the guanine repeat. The -3432 poly-G polymorphism modifies the binding affinity of an OC-2 derived nuclear protein, and there is convincing evidence that this is the transcription factor SP1. Use of purified human recombinant SP1 protein, in vitro translated SP1 and in vitro translated SP3 confirms that the -3432 poly-G polymorphism modulates a high-affinity SP family binding site, and evidence suggests that this alters the regulation of the Brn-3c promoter when SP1 levels are limiting (p<0.05). Moreover, the data suggest a functional interaction between the -3432 poly-G polymorphism and the -566(GT)17-23 repeat, which together determine the response of the Brn-3c gene to SP1. Similarly, evidence suggests that the variant allele -1391C has a reduced affinity for an OC-2 derived nuclear protein, consistent with a significant decrease in basal activity of the Brn-3c promoter, p […] C were genotyped for a pilot association study, but allelic frequencies did not differ significantly between the patient and control populations examined (by χ² analysis). 
Further large-scale population studies are required to establish whether these common sequence variants are associated with late onset hearing loss.
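For a biallelic site such as -1391A>C, a pilot comparison of allele frequencies can be sketched as a Pearson χ² test on a 2×2 table of allele counts (patients versus controls). This is a minimal standard-library sketch: the function name is hypothetical, no continuity correction is applied, and the abstract does not state which form of the test was used. The multi-allelic (GT) and poly-G loci would need a 2×k table instead.

```python
import math


def allele_chi_square(case_counts: tuple[int, int], control_counts: tuple[int, int]) -> tuple[float, float]:
    """Pearson chi-square test (1 df, no continuity correction) on allele counts.

    Rows are cases and controls; columns are the two alleles.
    Returns (chi-square statistic, p-value).
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = allele_chi_square((30, 70), (50, 50))  # toy counts, not study data
print(round(chi2, 2))  # 8.33
```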

UCL Discovery

Assembly and Compositional Analysis of Human Genomic DNA - Doctoral Dissertation, August 2002

Author: Rouchka Eric C.
Publication venue: Washington University Open Scholarship
Publication date: 18/11/2002

In 1990, the United States Human Genome Project was initiated as a fifteen-year endeavor to sequence the approximately three billion bases making up the human genome (Vaughan, 1996). As of December 31, 2001, the public sequencing efforts had sequenced a total of 2.01 billion finished bases, representing 63.0% of the human genome (http://www.ncbi.nlm.nih.gov/genome/seq/page.cgi?F=HsProgress.shtml&&ORG=Hs), to a Bermuda quality error rate of 1/10000 (Smith and Carrano, 1996). In addition, 1.11 billion bases, representing 34.8% of the human genome, had been sequenced to rough-draft level. Efforts such as UCSC's GoldenPath (Kent and Haussler, 2001) and NCBI's contig assembly (Jang et al., 1999) attempt to assemble the human genome by incorporating both finished and rough-draft sequence. The availability of human genome data allows us to ask questions concerning the maintenance of specific regions of the human genome. We consider two hypotheses for the maintenance of high G+C regions: the presence of specific repetitive elements and compositional mutation biases. Our results rule out the possibility that the G+C content of repetitive elements determines the high and low G+C regions of the human genome. We find that there is a compositional bias in mutation rates; however, these biases are not responsible for the maintenance of high G+C regions. In addition, we show that regions of the human genome under less selective pressure mutate towards a higher A+T composition, regardless of the surrounding G+C composition. We also analyze sequence organization and show that previous studies of isochore regions (Bernardi, 1993) cannot be generalized within the human genome. In addition, we propose a method to assemble only the finished parts of the human genome into larger contigs. Analysis of these contigs can lead to the mining of meaningful biological data that can give insights into genetic variation and evolution. 
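Questions about high and low G+C regions start from a composition profile along the sequence. A minimal sketch of windowed G+C measurement follows; the window and step sizes and the name `gc_windows` are assumptions, and ambiguous bases (such as N in gaps) are excluded from each window's denominator. It is not the dissertation's method.

```python
def gc_windows(seq: str, window: int = 1000, step: int = 1000) -> list[tuple[int, float]]:
    """G+C fraction per window as (start, fraction), ignoring non-ACGT characters.

    Windows containing no A/C/G/T at all (e.g. assembly gaps of N) are skipped.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        acgt = sum(chunk.count(b) for b in "ACGT")
        if acgt == 0:
            continue  # window is entirely ambiguous
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, gc / acgt))
    return results


print(gc_windows("GGCCAATTNNNN", window=4, step=4))  # [(0, 1.0), (4, 0.0)]
```

Excluding N from the denominator keeps partially sequenced windows from appearing artificially A+T- or G+C-poor.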
I suggest a method to aid single nucleotide polymorphism (SNP) detection, which can help to determine differences within a population. I also discuss a dynamic-programming-based approach to sequence assembly validation and to the detection of large-scale polymorphisms within a population, made possible by the availability of large human sequence contigs.
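The abstract does not describe the dynamic-programming scheme used for assembly validation. As a generic illustration only, the sketch below compares a contig with a reference segment using the Levenshtein edit distance (unit costs, two-row table). `edit_distance` and `discordance_rate` are hypothetical names, and a real validation would likely use affine gaps and local alignment.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost substitutions, insertions and deletions.

    Keeps only two rows of the DP table, so memory is O(len(b)).
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitution
            )
        prev = curr
    return prev[-1]


def discordance_rate(contig: str, reference: str) -> float:
    """Edit operations per base of the longer sequence; high values may flag a
    misassembly or a large-scale polymorphism."""
    longest = max(len(contig), len(reference))
    return edit_distance(contig.upper(), reference.upper()) / longest if longest else 0.0


print(edit_distance("ACGTACGT", "ACGTTCGT"))     # 1
print(discordance_rate("ACGTACGT", "ACGTTCGT"))  # 0.125
```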

Washington University St. Louis: Open Scholarship