3,478 research outputs found

    Database of repetitive elements in complete genomes and data mining using transcription factor binding sites

    Approximately 43% of the human genome is occupied by repetitive elements, and around 51% of the rice genome. The analysis presented here indicates that repetitive elements in complete genomes may have been important in evolutionary genomics. In this study, a database called the Repeat Sequence Database is designed and implemented to store complete and comprehensive repetitive sequences (see http://rsdb.csie.ncu.edu.tw for more information). The database contains direct, inverted and palindromic repetitive sequences, each ranging in length from seven to many hundreds of nucleotides. The repetitive sequences in the database are explored using a mathematical algorithm to mine rules on how combinations of individual binding sites are distributed among them. Combinations of transcription factor binding sites in the repetitive sequences are obtained, and data mining techniques are then applied to mine association rules from these combinations. The discovered associations are further pruned to remove insignificant ones. The mined association rules facilitate efforts to identify gene classes regulated by similar mechanisms and to predict regulatory elements accurately. Experiments are performed on several genomes, including C. elegans, human chromosome 22, and yeast.
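The three repeat classes named in the abstract above can be illustrated with a minimal Python sketch. It is not the Repeat Sequence Database's method: the function names, the fixed word length `k`, and the working definitions (direct = the same word at two or more positions, inverted = a word whose reverse complement also occurs, palindromic = a word equal to its own reverse complement) are assumptions made for illustration.

```python
from collections import defaultdict

# Complement table for unambiguous DNA bases (assumes only A, C, G, T).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def classify_repeats(genome, k):
    """Classify the k-mers of `genome` into three (possibly overlapping) sets.

    direct      : k-mers that occur at two or more positions
    inverted    : non-palindromic k-mers whose reverse complement also occurs
    palindromic : k-mers equal to their own reverse complement
    """
    genome = genome.upper()
    positions = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)

    direct = {s for s, p in positions.items() if len(p) > 1}
    palindromic = {s for s in positions if s == reverse_complement(s)}
    inverted = {
        s for s in positions
        if s not in palindromic and reverse_complement(s) in positions
    }
    return direct, inverted, palindromic


# Toy sequence, chosen only to exercise all three classes.
direct, inverted, palindromic = classify_repeats("AACGTTTCGTT", 4)
```

A real pipeline would scan a range of lengths (the abstract mentions seven nucleotides and up) and record positions rather than just membership; the fixed `k` keeps the sketch short.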

    Predicting Combinatorial Binding of Transcription Factors to Regulatory Elements in the Human Genome by Association Rule Mining

    Cis-acting transcriptional regulatory elements in mammalian genomes typically contain specific combinations of binding sites for various transcription factors. Although some cis-regulatory elements have been well studied, the combinations of transcription factors that regulate normal expression levels for the vast majority of the 20,000 genes in the human genome are unknown. We hypothesized that it should be possible to discover transcription factor combinations that regulate gene expression in concert by identifying over-represented combinations of sequence motifs that occur together in the genome. To detect combinations of transcription factor binding motifs, we developed a data mining approach based on association rules, which are typically used in market basket analysis. We scored each segment of the genome for the presence or absence of each of 83 transcription factor binding motifs, then used association rule mining algorithms to mine this dataset, thus identifying frequently occurring pairs of distinct motifs within a segment. Results: Support for most pairs of transcription factor binding motifs was highly correlated across different chromosomes, although pair significance varied. Known true positive motif pairs showed higher association rule support, confidence, and significance than background. Our subsets of high-confidence, high-significance mined pairs of transcription factors showed enrichment for co-citation in PubMed abstracts relative to all pairs, and the predicted associations were often readily verifiable in the literature.
    Conclusion: Functional elements in the genome where transcription factors bind to regulate expression in a combinatorial manner are more likely to be predicted by identifying statistically and biologically significant combinations of transcription factor binding motifs than by simply scanning the genome for the occurrence of binding sites for a single transcription factor.
    NIAAA Alcohol Training Grant; National Science Foundation; Cellular and Molecular Biolog
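The presence/absence scoring and pairwise mining described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `mine_motif_pairs`, the `min_support` parameter, the toy motif labels, and the use of lift as a stand-in for "significance" are all hypothetical choices.

```python
from collections import Counter
from itertools import combinations


def mine_motif_pairs(segments, min_support=0.0):
    """Compute support, confidence and lift for every co-occurring motif pair.

    segments: list of sets, each holding the motifs present in one segment.
    Pairs are keyed as (a, b) in alphabetical order; confidence is for a -> b.
    """
    n = len(segments)
    single = Counter()  # number of segments containing each motif
    pair = Counter()    # number of segments containing both motifs
    for motifs in segments:
        single.update(motifs)
        pair.update(combinations(sorted(motifs), 2))

    rules = {}
    for (a, b), count in pair.items():
        support = count / n
        if support < min_support:
            continue
        rules[(a, b)] = {
            "support": support,
            "confidence": count / single[a],
            # Lift > 1 means the pair co-occurs more often than independence predicts.
            "lift": support / ((single[a] / n) * (single[b] / n)),
        }
    return rules


# Toy data: four segments with hypothetical motif labels.
segments = [{"SP1", "NFKB"}, {"SP1", "NFKB", "AP1"}, {"SP1"}, {"AP1"}]
rules = mine_motif_pairs(segments)
```

For genome-scale input with 83 motifs, an Apriori-style algorithm with a support threshold would prune candidates before counting; the exhaustive pair count here is only practical for small examples.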

    i-Genome: A database to summarize oligonucleotide data in genomes

    BACKGROUND: Information on the occurrence of sequence features in genomes is crucial to comparative genomics, evolutionary analysis, the analysis of regulatory sequences and the quantitative evaluation of sequences. Computing the frequencies and occurrences of a pattern in complete genomes is time-consuming. RESULTS: The proposed database provides information about sequence features generated by exhaustively computing the sequences of the complete genome. The repetitive elements in the eukaryotic genomes, such as LINEs, SINEs, Alu and LTR, are obtained from Repbase. The database supports various complete genomes, including human, yeast, worm, and 128 microbial genomes. CONCLUSIONS: This investigation presents and implements a computationally efficient approach to accumulating the occurrences of oligonucleotides or patterns in complete genomes. A database is established to maintain the information on sequence features, including the distribution of oligonucleotides, the gene distribution, the distribution of repetitive elements in genomes and the occurrences of the oligonucleotides. The database provides a more effective and efficient way to access the repetitive features in genomes.
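The two quantities this abstract refers to, oligonucleotide frequencies and pattern occurrences, can be computed naively as below. This is a sketch for illustration, not the i-Genome implementation; the function names and the toy sequence are assumptions.

```python
from collections import Counter


def count_oligos(sequence, k):
    """Count every overlapping k-mer (oligonucleotide of length k) in a sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def oligo_positions(sequence, pattern):
    """Return the 0-based start positions of all (overlapping) matches of pattern."""
    positions = []
    start = sequence.find(pattern)
    while start != -1:
        positions.append(start)
        start = sequence.find(pattern, start + 1)
    return positions


# Toy sequence, for illustration only.
counts = count_oligos("ACGTACGTAC", 3)
hits = oligo_positions("ACGTACGTAC", "ACG")
```

Scanning a whole genome this way for each query is the time-consuming step the abstract mentions, which is the motivation for precomputing the counts and storing them in a database.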

    The Echinococcus canadensis (G7) genome: A key knowledge of parasitic platyhelminth human diseases

    Background: The parasite Echinococcus canadensis (G7) (phylum Platyhelminthes, class Cestoda) is one of the causative agents of echinococcosis. Echinococcosis is a worldwide chronic zoonosis affecting humans as well as domestic and wild mammals, which has been reported as a prioritized neglected disease by the World Health Organisation. No genomic data, comparative genomic analyses or efficient therapeutic and diagnostic tools are available for this severe disease. The information presented in this study will help to understand the peculiar biological characters and to design species-specific control tools. Results: We sequenced, assembled and annotated the 115-Mb genome of E. canadensis (G7). Comparative genomic analyses using whole-genome data of three Echinococcus species not only confirmed the status of E. canadensis (G7) as a separate species but also demonstrated high nucleotide sequence divergence relative to E. granulosus (G1). The E. canadensis (G7) genome contains 11,449 genes, with a core set of 881 orthologs shared among five cestode species. Comparative genomics revealed that there are more single nucleotide polymorphisms (SNPs) between E. canadensis (G7) and E. granulosus (G1) than between E. canadensis (G7) and E. multilocularis. This result was unexpected, since E. canadensis (G7) and E. granulosus (G1) were considered to belong to the species complex E. granulosus sensu lato. We described SNPs in known drug targets and metabolism genes in the E. canadensis (G7) genome. Regarding gene regulation, we analysed three particular features: CpG island distribution along the three Echinococcus genomes, the DNA methylation system and the small RNA pathway. The results suggest the occurrence of yet unknown gene regulation mechanisms in Echinococcus. Conclusions: This is the first work that addresses Echinococcus comparative genomics.
    The resources presented here will promote the study of mechanisms of parasite development as well as new tools for drug discovery. The availability of a high-quality genome assembly is critical for fully exploring the biology of a pathogenic organism. The E. canadensis (G7) genome presented in this study provides a unique opportunity to address the genetic diversity among the genus Echinococcus and its particular developmental features. At present, there is no unequivocal taxonomic classification of Echinococcus species; however, the genome-wide SNPs analysis performed here revealed the phylogenetic distance among these three Echinococcus species. Additional cestode genomes need to be sequenced to be able to resolve their phylogeny.
    Fil: Maldonado, Lucas Luciano. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
    Fil: Assis, Juliana. Fundación Oswaldo Cruz; Brasil
    Fil: Gomes Araújo, Flávio M.. Fundación Oswaldo Cruz; Brasil
    Fil: Salim, Anna C. M.. Fundación Oswaldo Cruz; Brasil
    Fil: Macchiaroli, Natalia. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
    Fil: Cucher, Marcela Alejandra. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
    Fil: Camicia, Federico. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
    Fil: Fox, Adolfo. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
    Fil: Rosenzvit, Mara Cecilia. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
    Fil: Oliveira, Guilherme. Instituto Tecnológico Vale; Brasil. Fundación Oswaldo Cruz; Brasil
    Fil: Kamenetzky, Laura. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentin

    The genetics and genomics of Trypanosoma cruzi

    Trypanosoma cruzi is a kinetoplastid parasite that causes Chagas disease. Trypanosomes are unusual organisms in many aspects of their genetics and molecular and cellular biology, and are considered a paradigm of the exception to the rule in the eukaryotic kingdom. The complete genome sequence of T. cruzi was published in 2005, providing a major tool for understanding several of its unusual aspects. However, despite the many mechanisms that differ between the parasite and its mammalian host, effective antiparasitic drugs or disease treatments are still lacking, especially in the chronic phase. This review highlights the fundamentals of the fascinating genetics and genomics of T. cruzi, with emphasis on the differential mechanisms that could provide interesting therapeutic targets.
    Fil: Vazquez, Martin Pablo. Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Investigaciones en Ingeniería Genética y Biología Molecular "Dr. Héctor N. Torres"; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentin

    EpiGRAPH: user-friendly software for statistical analysis and prediction of (epi)genomic data

    EpiGRAPH is a genome-scale data-mining software tool that enables users to identify epigenetic and gene regulatory features in large datasets of genomic regions

    Genome analysis of the necrotrophic fungal pathogens Sclerotinia sclerotiorum and Botrytis cinerea

    Sclerotinia sclerotiorum and Botrytis cinerea are closely related necrotrophic plant pathogenic fungi notable for their wide host ranges and environmental persistence. These attributes have made these species models for understanding the complexity of necrotrophic, broad host-range pathogenicity. Despite their similarities, the two species differ in mating behaviour and the ability to produce asexual spores. We have sequenced the genomes of one strain of S. sclerotiorum and two strains of B. cinerea. The comparative analysis of these genomes relative to one another and to other sequenced fungal genomes is provided here. Their 38–39 Mb genomes include 11,860–14,270 predicted genes, which share 83% amino acid identity on average between the two species. We have mapped the S. sclerotiorum assembly to 16 chromosomes and found large-scale co-linearity with the B. cinerea genomes. Seven percent of the S. sclerotiorum genome comprises transposable elements compared t