242 research outputs found

    Analysis Of DNA Motifs In The Human Genome

    DNA motifs include repeat elements, promoter elements and gene regulatory elements, and play a critical role in the human genome. This thesis describes a genome-wide computational study of two groups of motifs: tandem repeats and core promoter elements. Tandem repeats in DNA sequences are highly relevant to biological phenomena and diagnostic tools. Computational programs that discover tandem repeats generate a huge volume of data, which can be difficult to decipher without further organization. A new method is presented here to organize and rank detected tandem repeats through clustering and classification. Our work presents multiple ways of representing tandem repeats using the n-gram model with different clustering distance measures. Analysis of the clusters for the tandem repeats in the human genome shows that the method yields a well-defined grouping in which similarity among repeats is apparent. Our new, alignment-free method facilitates the analysis of the myriad tandem repeats in the human genome. We believe that this work will lead to new discoveries on the roles, origins, and significance of tandem repeats. As with tandem repeats, promoter sequences of genes contain binding sites for proteins that play critical roles in mediating expression levels. Promoter region binding proteins and their co-factors influence the timing and context of transcription. Despite the critical regulatory role of these non-coding sequences, computational methods to identify and predict DNA binding sites are extremely limited. The work reported here analyzes the relative occurrence of core promoter elements (CPEs) in and around transcription start sites. We found that, across all data sets, 49%-63% of upstream regions have either TATA box or DPE elements. Our results suggest the possibility of predicting transcription start sites by combining CPE signals with other promoter signals such as CpG islands and clusters of specific transcription factor binding sites.
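The alignment-free idea in the abstract above (representing each tandem repeat by its n-gram content and comparing repeats with a distance measure) can be illustrated with a minimal sketch. The function names, the choice of n = 3, and the use of cosine and Euclidean distances are assumptions made for illustration; the abstract does not state which n-gram sizes or distance measures the thesis actually uses.

```python
from collections import Counter
from math import sqrt


def ngram_profile(seq, n=3):
    """Relative frequency of each overlapping n-gram in seq (hypothetical helper)."""
    seq = seq.upper()
    total = len(seq) - n + 1
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + n] for i in range(total))
    return {gram: c / total for gram, c in counts.items()}


def cosine_distance(p, q):
    """1 - cosine similarity between two n-gram profiles."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0
    return 1.0 - dot / (norm_p * norm_q)


def euclidean_distance(p, q):
    """Euclidean distance between two n-gram profiles."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


if __name__ == "__main__":
    cag = ngram_profile("CAGCAGCAGCAG")
    shifted = ngram_profile("AGCAGCAGCAGC")  # same repeat, different frame
    at = ngram_profile("ATATATATATAT")
    print(cosine_distance(cag, shifted))  # small: nearly identical 3-gram content
    print(cosine_distance(cag, at))       # large: no shared 3-grams
```

Because the profiles ignore position, two copies of the same repeat read in different frames land close together without any alignment step, which is the property a clustering procedure would rely on.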

    Discovery of EST-SSRs in Lung Cancer: Tagged ESTs with SSRs Lead to Differential Amino Acid and Protein Expression Patterns in Cancerous Tissues

    Tandem repeats are found in both coding and non-coding sequences of higher organisms. These sequences can be used in cancer genetics and diagnosis to unravel the genetic basis of tumor formation and progression. In this study, a possible relationship between SSR distributions and lung cancer was studied by comparative analysis of EST-SSRs in normal and lung cancerous tissues. While the EST-SSR distribution was similar between tumorous tissues, this distribution was different between normal and tumorous tissues. Trinucleotide tandem repeats differed most strongly; the number of trinucleotides in ESTs of lung cancer was 3 times higher than in normal tissue. A significant negative correlation between normal and cancerous tissue showed that cancerous tissue generates different types of trinucleotides. GGC and CGC were the most frequently expressed trinucleotides in cancerous tissue, but these SSRs were not expressed in normal tissue. Similar to the EST level, the expression pattern of EST-SSR-derived amino acids was significantly different between normal and cancerous tissues. Arg, Pro, Ser, Gly, and Lys were the most abundant amino acids in cancerous tissues, and Leu, Cys, Phe, and His were significantly more abundant in normal tissues than in cancerous tissues. Next, the putative functions of triplet SSR-containing genes were analyzed. In cancerous tissue, EST-SSRs produce different types of proteins. Chromodomain helicase DNA binding proteins were one of the major protein products of EST-SSRs in the cancerous library, while these proteins were not produced from EST-SSRs in normal tissue. The findings of this study show, for the first time, that EST-SSRs in normal lung tissues differ from those in unhealthy tissues, and that tagged ESTs with SSRs cause remarkable differences in amino acid and protein expression patterns in cancerous tissue. We suggest that EST-SSRs, especially those differentially expressed in cancerous tissue, may be suitable candidate markers for lung cancer diagnosis and prediction.
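The kind of trinucleotide SSR tally described in the abstract above can be sketched with a regular expression that finds perfect tandem repeats. This is only an illustration: `find_ssrs`, `motif_counts`, and the threshold of five repeats are hypothetical choices, not the tool or parameters used in the study.

```python
import re
from collections import Counter


def find_ssrs(seq, unit_len=3, min_repeats=5):
    """Yield (start, motif, repeat_count) for perfect tandem repeats in seq.

    unit_len and min_repeats are illustrative defaults, not values from the study.
    """
    seq = seq.upper()
    # A unit of unit_len bases followed by at least (min_repeats - 1) exact copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAA...
        yield match.start(), motif, len(match.group(0)) // unit_len


def motif_counts(seqs, unit_len=3, min_repeats=5):
    """Count how many SSRs of each motif occur across a collection of sequences."""
    return Counter(
        motif
        for s in seqs
        for _, motif, _ in find_ssrs(s, unit_len, min_repeats)
    )


if __name__ == "__main__":
    est = "TT" + "GGC" * 5 + "AT"
    print(list(find_ssrs(est)))
    print(motif_counts([est, "CAG" * 6]))
```

Note that the motif is reported in whatever frame the scan meets first, so GGC, GCG and CGG are counted separately here; comparing libraries in practice would need a rule for merging rotations and reverse complements.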

    Microsatellite abundance across the Anthozoa and Hydrozoa in the phylum Cnidaria

    Background: Microsatellite loci have high mutation rates and thus are indicative of mutational processes within the genome. By concentrating on the symbiotic and aposymbiotic cnidarians, we investigated whether microsatellite abundances follow a phylogenetic or ecological pattern. Individuals from eight species were shotgun sequenced using 454 GS-FLX Titanium technology. Sequences from the three available cnidarian genomes (Nematostella vectensis, Hydra magnipapillata and Acropora digitifera) were added to the analysis for a total of eleven species representing two classes, three subclasses and eight orders within the phylum Cnidaria. Results: Trinucleotide and tetranucleotide repeats were the most abundant motifs, followed by hexa- and dinucleotides. Pentanucleotides were the least abundant motif in the data set. Hierarchical clustering and log likelihood ratio tests revealed a weak relationship between phylogeny and microsatellite content. Further, comparisons between cnidarians harboring intracellular dinoflagellates and those that do not show that microsatellite coverage is higher in the latter group. Conclusions: Our results support previous studies that found tri- and tetranucleotides to be the most abundant motifs in invertebrates. Differences in microsatellite coverage and composition between symbiotic and non-symbiotic cnidarians suggest that the presence or absence of dinoflagellates might place restrictions on the host genome.

    Mutational processes molding the genomes of 21 breast cancers

    All cancers carry somatic mutations. The patterns of mutation in cancer genomes reflect the DNA damage and repair processes to which cancer cells and their precursors have been exposed. To explore these mechanisms further, we generated catalogs of somatic mutation from 21 breast cancers and applied mathematical methods to extract mutational signatures of the underlying processes. Multiple distinct single- and double-nucleotide substitution signatures were discernible. Cancers with BRCA1 or BRCA2 mutations exhibited a characteristic combination of substitution mutation signatures and a distinctive profile of deletions. Complex relationships between somatic mutation prevalence and transcription were detected. A remarkable phenomenon of localized hypermutation, termed "kataegis," was observed. Regions of kataegis differed between cancers but usually colocalized with somatic rearrangements. Base substitutions in these regions were almost exclusively of cytosine at TpC dinucleotides. The mechanisms underlying most of these mutational signatures are unknown. However, a role for the APOBEC family of cytidine deaminases is proposed
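Localized hypermutation of the kind the abstract above calls kataegis is often pictured as a run of mutations separated by unusually short distances. The sketch below flags such runs from a list of mutation coordinates on one chromosome. The function name and both thresholds (`max_gap`, `min_mutations`) are assumptions chosen for illustration; the abstract does not give the criteria used by the authors.

```python
def find_dense_clusters(positions, max_gap=1000, min_mutations=6):
    """Return (first_pos, last_pos, n_mutations) for runs of closely spaced mutations.

    positions: mutation coordinates on a single chromosome.
    max_gap, min_mutations: illustrative thresholds, not taken from the paper.
    """
    clusters = []
    run = []
    for pos in sorted(positions):
        # A gap larger than max_gap closes the current run.
        if run and pos - run[-1] > max_gap:
            if len(run) >= min_mutations:
                clusters.append((run[0], run[-1], len(run)))
            run = []
        run.append(pos)
    # Close the final run.
    if len(run) >= min_mutations:
        clusters.append((run[0], run[-1], len(run)))
    return clusters


if __name__ == "__main__":
    muts = [100, 50_000, 50_200, 50_350, 50_900, 51_100, 51_400, 900_000]
    print(find_dense_clusters(muts))
```

A fuller analysis would also record the sequence context of each substitution, since the abstract reports that substitutions in these regions were almost exclusively of cytosine at TpC dinucleotides.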

    A Study of Selection on Microsatellites in the Helianthus Annuus Transcriptome

    The ability of populations to continually respond to directional selection even after many generations, instead of reaching response plateaus, suggests the presence of mechanisms for rapidly generating novel adaptive variation within organismal genomes. The contributions of cis regulation are now being widely studied. This study details the contributions of one such mechanism capable of generating adaptive genetic variation: transcribed microsatellite mutation. Microsatellites are abundant in eukaryotic genomes and exhibit one of the highest known mutation rates, and their mutations involve indels that are reversible. These features make them excellent candidates for generating variation in populations. This study explores the functional roles of transcribed microsatellites in Helianthus annuus (common sunflower). More specifically, I explored the role of microsatellites as agents of rapid change that act as “tuning knobs” of phenotypic variation by influencing gene expression in a stepwise manner through expansions and contractions of the microsatellite tract. A bioinformatic study suggests that selection has favored expansion and maintenance of transcriptomic microsatellites. This inference is based on the non-random distribution of microsatellites, the prevalence of motifs associated with gene regulation in untranslated regions, and the enrichment of microsatellites in Gene Ontologies representing plant response to stress and stimulus. A population genetics study provides support for selection on these transcribed microsatellites when compared to anonymous microsatellites that were assumed to evolve neutrally. The natural populations utilized in this study show greater similarity in allele frequencies, mean length, and variance in lengths at the transcribed microsatellites relative to that observed at anonymous microsatellite loci. This finding is indicative of balancing selection, and provides evidence that allele lengths are under selection. This finding provides support for the tuning knob hypothesis. The findings of a functional genomic study with regard to the tuning knob hypothesis are ambiguous. No correlation between allele lengths and gene expression was detected at any of three loci investigated. However, the loci utilized exhibited narrow ranges in length. The tuning knob hypothesis implies that similar allele lengths are likely to exhibit similar gene expression levels. Hence, variation in the populations studied may be tracking the optimal gene expression levels.

    The functional role of methylated short tandem repeats in early mouse development

    Short tandem repeats, or microsatellites, are ubiquitous throughout all genomes that have been explored. In common with other sequences, the DNA in microsatellites has DNA marks in the form of chromatin methylation. Regulation of DNA methylation and changes in its pattern are critical for the establishment of unique cell states throughout development in mammals. DNA methylation is extensively reprogrammed during the early phases of mammalian development to establish unique developmental patterning. Whether microsatellites are also reprogrammed with developmental patterns is unknown. In this thesis, we assessed the characteristics of di- and trinucleotide microsatellites in the NCBIM37 Mus musculus assembly and observed a marked difference in quantity and length of microsatellites of differing motif, not explained by any known mechanism. Secondly, we assessed the quantities of di-, tri- and tetranucleotide microsatellites in experimentally determined methylomes of Mus musculus at various stages in development. Our results indicate that at least one tetranucleotide microsatellite motif and, more tentatively, a second trinucleotide microsatellite follow a pattern of methylation consistent with reprogramming. Finally, we show that the genes containing these specific microsatellites in the NCBIM37 genome have strong links to known developmental processes. Biotechnology and Biological (BBSRC); Applied Bioinformatic

    Characterization of the repetitive DNA landscape in wheat homeologous group 4 chromosomes

    Background: The number and complexity of repetitive elements varies between species, being in general most represented in those with larger genomes. Combining the flow-sorted chromosome arms approach to genome analysis with second generation DNA sequencing technologies provides a unique opportunity to study the repetitive portion of each chromosome, enabling comparisons among them. Additionally, different sequencing approaches may produce different depth of insight to repeatome content and structure. In this work we analyze and characterize the repetitive sequences of Triticum aestivum cv. Chinese Spring homeologous group 4 chromosome arms, obtained through Roche 454 and Illumina sequencing technologies, hereinafter marked by subscripts 454 and I, respectively. Repetitive sequences were identified with the RepeatMasker software using the interspersed repeat database mips-REdat_v9.0p. The input sequences consisted of our 4DS454 and 4DL454 scaffolds and 4ASI, 4ALI, 4BSI, 4BLI, 4DSI and 4DLI contigs, downloaded from the International Wheat Genome Sequencing Consortium (IWGSC). Results: Repetitive sequences content varied from 55% to 63% for all chromosome arm assemblies except for 4DLI, in which the repeat content was 38%. Transposable elements, small RNA, satellites, simple repeats and low complexity sequences were analyzed. SSR frequency was found one per 24 to 27 kb for all chromosome assemblies except 4DLI, where it was three times higher. Dinucleotides and trinucleotides were the most abundant SSR repeat units. (GA)n/(TC)n was the most abundant SSR except for 4DLI where the most frequently identified SSR was (CCG/CGG)n. Retrotransposons followed by DNA transposons were the most highly represented sequence repeats, mainly composed of CACTA/En-Spm and Gypsy superfamilies, respectively. 
This whole chromosome sequence analysis allowed identification of three new LTR retrotransposon families belonging to the Copia superfamily, one belonging to the Gypsy superfamily and two TRIM retrotransposon families. Their physical distribution in the wheat genome was analyzed by fluorescent in situ hybridization (FISH) and one of them, the Carmen retrotransposon, was found to be specific to centromeric regions of all wheat chromosomes. Conclusion: The presented work is the first deep report of wheat repetitive sequences analyzed at the chromosome arm level, revealing the first insight into the repeatome of T. aestivum chromosomes of homeologous group 4.
    Affiliation: Garbus, Ingrid. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico - Conicet - Bahia Blanca. Centro Recursos Naturales Renovables de Zona Semiarida(i); Argentina
    Affiliation: Romero, José Rodolfo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico - Conicet - Bahia Blanca. Centro Recursos Naturales Renovables de Zona Semiarida(i); Argentina
    Affiliation: Miroslav, Valarik. Centre of the Region Haná for Biotechnological and Agricultural Research. Institute of Experimental Botany; Czech Republic
    Affiliation: Vanzurova, Hana. Centre of the Region Haná for Biotechnological and Agricultural Research. Institute of Experimental Botany; Czech Republic
    Affiliation: Karafiatova, Miroslava. Centre of the Region Haná for Biotechnological and Agricultural Research. Institute of Experimental Botany; Czech Republic
    Affiliation: Caccamo, Mario. Norwich Research Park. Genome Analysis Centre; United Kingdom
    Affiliation: Dolezel, Jaroslav. Centre of the Region Haná for Biotechnological and Agricultural Research. Institute of Experimental Botany; Czech Republic
    Affiliation: Tranquilli, Gabriela. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto Recursos Biológicos; Argentina
    Affiliation: Helguera, Marcelo. Instituto Nacional de Tecnología Agropecuaria (INTA). Estación Experimental Agropecuaria Marcos Juárez; Argentina
    Affiliation: Echenique, Carmen Viviana. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico - Conicet - Bahia Blanca. Centro Recursos Naturales Renovables de Zona Semiarida(i); Argentin

    A fast and cost-effective approach to develop and map EST-SSR markers: oak as a case study

    Background: Expressed Sequence Tags (ESTs) are a source of simple sequence repeats (SSRs) that can be used to develop molecular markers for genetic studies. The availability of ESTs for Quercus robur and Quercus petraea provided a unique opportunity to develop microsatellite markers to accelerate research aimed at studying adaptation of these long-lived species to their environment. As a first step toward the construction of an SSR-based linkage map of oak for quantitative trait locus (QTL) mapping, we describe the mining and survey of EST-SSRs as well as a fast and cost-effective approach (bin mapping) to assign these markers to an approximate map position. We also compared the level of polymorphism between genomic and EST-derived SSRs and addressed the transferability of EST-SSRs in Castanea sativa (chestnut). Results: A catalogue of 103,000 Sanger ESTs was assembled into 28,024 unigenes, of which 18.6% presented one or more SSR motifs. More than 42% of these SSRs corresponded to trinucleotides. Primer pairs were designed for 748 putative unigenes. Overall, 37.7% (283) were found to amplify a single polymorphic locus in a reference full-sib pedigree of Quercus robur. The usefulness of these loci for establishing a genetic map was assessed using a bin mapping approach. Bin maps were constructed for the male and female parental trees, for which framework linkage maps based on AFLP markers were available. The bin set consisted of 14 highly informative offspring selected based on the number and position of crossover sites. The female and male maps comprised 44 and 37 bins, with an average bin length of 16.5 cM and 20.99 cM, respectively. A total of 256 EST-SSRs were assigned to bins and their map position was further validated by linkage mapping. EST-SSRs were found to be less polymorphic than genomic SSRs, but their transferability rate to chestnut, a species phylogenetically related to oak, was higher. Conclusion: We have generated a bin map for oak comprising 256 EST-SSRs. This resource constitutes a first step toward the establishment of a gene-based map for this genus that will facilitate the dissection of QTLs affecting complex traits of ecological importance.

    Genomic Diversity in Two Related Plant Species with and without Sex Chromosomes - Silene latifolia and S. vulgaris

    Genome size evolution is a complex process influenced by polyploidization, satellite DNA accumulation, and expansion of retroelements. How this process could be affected by different reproductive strategies is still poorly understood. We analyzed differences in the number and distribution of major repetitive DNA elements in two closely related species, Silene latifolia and S. vulgaris. Both species are diploid and possess the same chromosome number (2n = 24), but differ in their genome size and mode of reproduction. The dioecious S. latifolia (1C = 2.70 pg DNA) possesses sex chromosomes and its genome is 2.5× larger than that of the gynodioecious S. vulgaris (1C = 1.13 pg DNA), which does not possess sex chromosomes. We discovered that the genome of S. latifolia is larger mainly due to the expansion of Ogre retrotransposons. Surprisingly, the centromeric STAR-C and TR1 tandem repeats were found to be more abundant in S. vulgaris, the species with the smaller genome. We further examined the distribution of major repetitive sequences in related species in the Caryophyllaceae family. The results of FISH (fluorescence in situ hybridization) on mitotic chromosomes with the Retand element indicate that large rearrangements occurred during the evolution of the Caryophyllaceae family. Our data demonstrate that the evolution of genome size in the genus Silene is accompanied by the expansion of different repetitive elements with specific patterns in the dioecious species possessing the sex chromosomes.