121 research outputs found

    Study of large and highly stratified population datasets by combining iterative pruning principal component analysis and structure

    Background: The ever increasing sizes of population genetic datasets pose great challenges for population structure analysis. The Tracy-Widom (TW) statistical test is widely used for detecting structure. However, it has not been adequately investigated whether the TW statistic is susceptible to type I error, especially in large, complex datasets. Non-parametric, Principal Component Analysis (PCA) based methods for resolving structure have been developed which rely on the TW test. Although PCA-based methods can resolve structure, they cannot infer ancestry. Model-based methods are still needed for ancestry analysis, but they are not suitable for large datasets. We propose a new structure analysis framework for large datasets. This includes a new heuristic for detecting structure and incorporation of the structure patterns inferred by a PCA method to complement STRUCTURE analysis. Results: A new heuristic called EigenDev for detecting population structure is presented. When tested on simulated data, this heuristic is robust to sample size. In contrast, the TW statistic was found to be susceptible to type I error, especially for large population samples. EigenDev is thus better-suited for analysis of large datasets containing many individuals, in which spurious patterns are likely to exist and could be incorrectly interpreted as population stratification. EigenDev was applied to the iterative pruning PCA (ipPCA) method, which resolves the underlying subpopulations. This subpopulation information was used to supervise STRUCTURE analysis to infer patterns of ancestry at an unprecedented level of resolution. To validate the new approach, a bovine and a large human genetic dataset (3945 individuals) were analyzed. We found new ancestry patterns consistent with the subpopulations resolved by ipPCA. Conclusions: The EigenDev heuristic is robust to sampling and is thus superior for detecting structure in large datasets. The application of EigenDev to the ipPCA algorithm improves the estimation of the number of subpopulations and the individual assignment accuracy, especially for very large and complex datasets. Furthermore, we have demonstrated that the structure resolved by this approach complements parametric analysis, allowing a much more comprehensive account of population structure. The new version of the ipPCA software with EigenDev incorporated can be downloaded from http://www4a.biotec.or.th/GI/tools/ippca.

    Iterative pruning PCA improves resolution of highly structured populations

    BACKGROUND: Non-random patterns of genetic variation exist among individuals in a population owing to a variety of evolutionary factors. Therefore, populations are structured into genetically distinct subpopulations. As genotypic datasets become ever larger, it is increasingly difficult to correctly estimate the number of subpopulations and assign individuals to them. The computationally efficient non-parametric, chiefly Principal Components Analysis (PCA)-based methods are thus becoming increasingly relied upon for population structure analysis. Current PCA-based methods can accurately detect structure; however, the accuracy in resolving subpopulations and assigning individuals to them is wanting. When subpopulations are closely related to one another, they overlap in PCA space and appear as a conglomerate. This problem is exacerbated when some subpopulations in the dataset are genetically far removed from others. We propose a novel PCA-based framework which addresses this shortcoming. RESULTS: A novel population structure analysis algorithm called iterative pruning PCA (ipPCA) was developed which assigns individuals to subpopulations and infers the total number of subpopulations present. Genotypic data from simulated and real population datasets with different degrees of structure were analyzed. For datasets with simple structures, the subpopulation assignments of individuals made by ipPCA were largely consistent with the STRUCTURE, BAPS and AWclust algorithms. On the other hand, highly structured populations containing many closely related subpopulations could be accurately resolved only by ipPCA, and not by other methods. CONCLUSION: The algorithm is computationally efficient and not constrained by the dataset complexity. 
This systematic subpopulation assignment approach removes the need for prior population labels, which could be advantageous when cryptic stratification is encountered in datasets containing individuals otherwise assumed to belong to a homogeneous population.
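The iterative pruning idea described in this abstract (run PCA, decide whether structure remains, split the individuals in two, and repeat on each part) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the published ipPCA implementation: the stopping rule is a simple eigenvalue-ratio threshold standing in for the Tracy-Widom test or EigenDev, the split is a 1-D 2-means on the first principal component, and all function names and parameters (`ip_pca`, `ratio`, `min_size`) are hypothetical.

```python
import numpy as np

def top_eigen(X):
    """PC1 scores and eigenvalues of the column-centred genotype matrix X."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, 0] * S[0], S**2 / X.shape[1]

def two_means_1d(scores, n_iter=50):
    """Deterministic 2-means on a 1-D array, started from its min and max."""
    c = np.array([scores.min(), scores.max()])
    labels = np.zeros(len(scores), dtype=bool)
    for _ in range(n_iter):
        labels = np.abs(scores - c[0]) > np.abs(scores - c[1])
        if labels.all() or not labels.any():
            break  # degenerate split, e.g. all scores identical
        new_c = np.array([scores[~labels].mean(), scores[labels].mean()])
        if np.allclose(new_c, c):
            break
        c = new_c
    return labels

def ip_pca(X, idx=None, min_size=5, ratio=3.0):
    """Recursively split individuals (rows of X); returns a list of index arrays."""
    if idx is None:
        idx = np.arange(X.shape[0])
    if len(idx) < 2 * min_size:
        return [idx]
    scores, eigvals = top_eigen(X[idx])
    # ASSUMPTION: placeholder stopping rule, not the TW test or EigenDev.
    if eigvals[0] <= ratio * eigvals[1:].mean():
        return [idx]
    labels = two_means_1d(scores)
    left, right = idx[~labels], idx[labels]
    if min(len(left), len(right)) < min_size:
        return [idx]
    return ip_pca(X, left, min_size, ratio) + ip_pca(X, right, min_size, ratio)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two simulated subpopulations with different allele frequencies.
    pops = [rng.binomial(2, f, size=(20, 200)) for f in (0.2, 0.8)]
    X = np.vstack(pops).astype(float)
    print([len(c) for c in ip_pca(X)])
```

Because each split is tested again on its own, no prior population labels are needed; the number of terminal groups is the inferred number of subpopulations under the chosen stopping rule.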

    Unique and conserved MicroRNAs in wheat chromosome 5D revealed by next-generation sequencing

    MicroRNAs are a class of short, non-coding, single-stranded RNAs that act as post-transcriptional regulators in gene expression. miRNA analysis of Triticum aestivum chromosome 5D was performed on 454 GS FLX Titanium sequences of flow-sorted chromosome 5D with a total of 3,208,630 good quality reads representing 1.34x and 1.61x coverage of the short (5DS) and long (5DL) arms of the chromosome, respectively. In silico and structural analyses revealed a total of 55 miRNAs; 48 and 42 miRNAs were found to be present on 5DL and 5DS, respectively, of which 35 were common to both chromosome arms, while 13 miRNAs were specific to 5DL and 7 miRNAs were specific to 5DS. In total, 14 of the predicted miRNAs were identified in wheat for the first time. Representation (the copy number of each miRNA) was also found to be higher in 5DL (1,949) compared to 5DS (1,191). Targets were predicted for each miRNA, while expression analysis gave evidence of expression for 6 out of 55 miRNAs. Occurrences of the same miRNAs were also found in Brachypodium distachyon and Oryza sativa genome sequences to identify syntenic miRNA coding sequences. Based on this analysis, two other miRNAs, miR1133 and miR167, were detected in the B. distachyon syntenic region of wheat 5DS. Five of the predicted miRNA coding regions (miR6220, miR5070, miR169, miR5085, miR2118) were experimentally verified to be located on the 5D chromosome, and three of them (miR2118, miR169 and miR5085) were shown to be 5D specific. Furthermore, miR2118 was shown to be expressed in Chinese Spring adult leaves. miRNA genes identified in this study will expand our understanding of gene regulation in bread wheat.

    Plasmodium parasites mount an arrest response to dihydroartemisinin, as revealed by whole transcriptome shotgun sequencing (RNA-seq) and microarray study

    RNA-seq data analysis from DHA treatment of P. falciparum: Limma results from 1 h treatments with 500 nM DHA in P. falciparum K1 rings, trophozoites and schizonts. (XLS 2040 kb)

    Comparison of gene expression profiles between human erythroid cells derived from fetal liver and adult peripheral blood

    Background: A key event in human development is the establishment of erythropoietic progenitors in the bone marrow, which is accompanied by a fetal-to-adult switch in hemoglobin expression. Understanding of this event could lead to medical application, notably treatment of sickle cell disease and β-thalassemia. The changes in gene expression of erythropoietic progenitor cells as they migrate from the fetal liver and colonize the bone marrow are still rather poorly understood, as primary fetal liver (FL) tissues are difficult to obtain. Methods: We obtained human FL tissue and adult peripheral blood (AB) samples from Thai subjects. Primary CD34+ cells were cultured in vitro in a fetal bovine serum-based culture medium. After 8 days of culture, erythroid cell populations were isolated by flow cytometry. Gene expression in the FL- and AB-derived cells was studied by Affymetrix microarray and reverse-transcription quantitative PCR. The microarray data were combined with those from a previous study of human FL and AB erythroid development, and meta-analysis was performed on the combined dataset. Results: FL erythroid cells showed enhanced proliferation and elevated fetal hemoglobin relative to AB cells. A total of 1,391 fetal up-regulated and 329 adult up-regulated genes were identified from microarray data generated in this study. Five hundred ninety-nine fetal up-regulated and 284 adult up-regulated genes with reproducible patterns between this and a previous study were identified by meta-analysis of the combined dataset, which constitute a core set of genes differentially expressed between FL and AB erythroid cells. In addition to these core genes, 826 and 48 novel genes were identified only from data generated in this study to be FL up- and AB up-regulated, respectively. The in vivo relevance for some of these novel genes was demonstrated by pathway analysis, which showed novel genes functioning in pathways known to be important in proliferation and erythropoiesis, including the mitogen-activated protein kinase (MAPK) and the phosphatidyl inositol 3 kinase (PI3K)-Akt pathways. Discussion: The genes with upregulated expression in FL cells, which include many novel genes identified from data generated in this study, suggest that cellular proliferation pathways are more active in the fetal stage. Erythroid progenitor cells may thus undergo a reprogramming during ontogenesis in which proliferation is modulated by changes in expression of key regulators, primarily MYC, and others including insulin-like growth factor 2 mRNA-binding protein 3 (IGF2BP3), neuropilin and tolloid-like 2 (NETO2), branched chain amino acid transaminase 1 (BCAT1), tenascin XB (TNXB) and proto-oncogene, AP-1 transcription factor subunit (JUND). This reprogramming may thus be necessary for acquisition of the adult identity and switching of hemoglobin expression.

    Characterization and Evolution of microRNA Genes Derived from Repetitive Elements and Duplication Events in Plants

    MicroRNAs (miRNAs) are a major class of small non-coding RNAs that act as negative regulators at the post-transcriptional level in animals and plants. In this study, all known miRNAs in four plant species (Arabidopsis thaliana, Populus trichocarpa, Oryza sativa and Sorghum bicolor) have been analyzed, using a combination of computational and comparative genomic approaches, to systematically identify and characterize the miRNAs that were derived from repetitive elements and duplication events. The study provides a complete mapping, at the genome scale, of all the miRNAs found on repetitive elements in the four test plant species. Significant differences between repetitive element-related miRNAs and non-repeat-derived miRNAs were observed for many characteristics, including their location in protein-coding and intergenic regions in genomes, their conservation in plant species, sequence length of their hairpin precursors, base composition of their hairpin precursors and the minimum free energy of their hairpin structures. Further analysis showed that a considerable number of miRNA families in the four test plant species arose from either tandem duplication events, segmental duplication events or a combination of the two. However, comparative analysis suggested that the contribution made by these two duplication events differed greatly between the perennial tree species tested and the other three annual species. The expansion of miRNA families in A. thaliana, O. sativa and S. bicolor is more likely to occur as a result of tandem duplication events than from segmental duplications. In contrast, genomic segmental duplications contributed significantly more to the expansion of miRNA families in P. trichocarpa than did tandem duplication events. Taken together, this study has successfully characterized miRNAs derived from repetitive elements and duplication events at the genome scale and provides comprehensive knowledge and deeper insight into the origins and evolution of miRNAs in plants.

    Modeling the asymmetric evolution of a mouse and rat-specific microRNA gene cluster intron 10 of the Sfmbt2 gene

    Background: The total number of miRNA genes in a genome, expression of which is responsible for the miRNA repertoire of an organism, is not precisely known. Moreover, the question of how new miRNA genes arise during evolution is incompletely understood. Recent data in humans and opossum indicate that retrotransposons of the class of short interspersed nuclear elements have contributed to the growth of microRNA gene clusters. Method: We studied a large miRNA gene cluster in intron 10 of the mouse Sfmbt2 gene using bioinformatic tools. Results: Mice and rats are unique in harboring a 55-65 Kb DNA sequence in intron 10 of the Sfmbt2 gene. This intronic region is rich in regularly repeated B1 retrotransposons together with inverted self-complementary CA/TG microsatellites. The smallest repeat unit, called MSHORT1 in the mouse, was duplicated 9 times in a tandem head-to-tail array to form 2.5 Kb MLONG1 units. The center of the mouse miRNA gene cluster consists of 13 copies of MLONG1. BLAST analysis of MSHORT1 in the mouse shows that the repeat unit is unique to intron 10 of the Sfmbt2 gene and suggests a dual-phase model for growth of the miRNA gene cluster: arrangement of 10 MSHORT1 units into MLONG1 and further duplication of 13 head-to-tail MLONG1 units in the center of the miRNA gene cluster. Rats have a similar arrangement of repeat units in intron 10 of the Sfmbt2 gene. The discrepancy between 65 miRNA genes in the mouse cluster as compared to only 1 miRNA gene in the corresponding rat repeat cluster is ascribed to sequence differences between MSHORT1 and RSHORT1 that result in lateral-shifted, less-stable miRNA precursor hairpins for RSHORT1. Conclusion: Our data provide new evidence for the emerging concept that lineage-specific retroposons have played an important role in the birth of new miRNA genes during evolution. The large difference in the number of miRNA genes in two closely related species (65 versus 1, mice versus rats) indicates that this species-specific evolution can be a rapid process.

    MicroRNA Genes Derived from Repetitive Elements and Expanded by Segmental Duplication Events in Mammalian Genomes

    MicroRNAs (miRNAs) are a class of small noncoding RNAs that regulate gene expression by targeting mRNAs for translation repression or mRNA degradation. Many miRNAs are being discovered and studied, but in most cases their origin, evolution and function remain unclear. Here, we characterized miRNAs derived from repetitive elements and miRNA families expanded by segmental duplication events in the human, rhesus and mouse genomes. We applied a comparative genomics approach combined with identifying miRNA paralogs in segmental duplication pair data in a genome-wide study to identify new homologs of human miRNAs in the rhesus and mouse genomes. Interestingly, using segmental duplication pair data, we provided credible computational evidence that two miRNA genes are located in the pseudoautosomal region of the human Y chromosome. We characterized all the miRNAs according to whether they were derived from repetitive elements or not and identified significant differences between the repeat-related miRNAs (RrmiRs) and non-repeat-derived miRNAs in (1) their location in protein-coding and intergenic regions in genomes, (2) the minimum free energy of their hairpin structures, and (3) their conservation in vertebrate genomes. We found some lineage-specific RrmiR families and three lineage-specific expansion families, and provided evidence indicating that some RrmiR families formed and expanded during evolutionary segmental duplication events. We also provided computational and experimental evidence for the functions of the conserved RrmiR families in the three species. Together, our results indicate that repetitive elements contribute to the origin of miRNAs, and large segmental duplication events could prompt the expansion of some miRNA families, including RrmiR families. Our study is a valuable contribution to the knowledge of the evolution and function of non-coding regions in genomes.

    microPIR: An Integrated Database of MicroRNA Target Sites within Human Promoter Sequences

    Background: microRNAs are generally understood to regulate gene expression through binding to target sequences within 3'-UTRs of mRNAs. Therefore, computational prediction of target sites is usually restricted to these gene regions. Recent experimental studies, though, have suggested that microRNAs may alternatively modulate gene expression by interacting with promoters. A database of potential microRNA target sites in promoters would stimulate research in this field, leading to more understanding of complex microRNA regulatory mechanisms. Methodology: We developed a database hosting predicted microRNA target sites located within human promoter sequences and their associated genomic features, called microPIR (microRNA-Promoter Interaction Resource). microRNA seed sequences were used to identify perfect complementary matching sequences in the human promoters and the potential target sites were predicted using the RNAhybrid program. More than 15 million target sites were identified which are located within 5000 bp upstream of all human genes, on both sense and antisense strands. The experimentally confirmed argonaute (AGO) binding sites and EST expression data including the sequence conservation across vertebrate species of each predicted target are presented for researchers to appraise the quality of predicted target sites. The microPIR database integrates various annotated genomic sequence databases, e.g. repetitive elements, transcription factor binding sites, CpG islands, and SNPs, offering users the facility to extensively explore relationships among target sites and other genomi
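The seed-matching step mentioned in this abstract (finding promoter sequences perfectly complementary to a microRNA seed, on both strands) can be illustrated with a minimal sketch. It covers only the exact seed-complement search, not the RNAhybrid prediction step, and it is not the microPIR pipeline itself. The seed is assumed here to be nucleotides 2-8 of the mature miRNA, and the function names (`seed_site`, `find_sites`) and the toy promoter string are hypothetical.

```python
# Complement table for DNA; reverse-complementing gives the opposite strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]

def seed_site(mirna, start=1, end=8):
    """DNA motif perfectly complementary to the miRNA seed.

    ASSUMPTION: the seed is taken as nucleotides 2-8 of the mature miRNA
    (0-based slice 1:8); other seed definitions exist.
    """
    seed = mirna.upper().replace("U", "T")[start:end]
    return revcomp(seed)

def find_sites(promoter, mirna):
    """List (strand, position) of exact seed matches on both strands.

    Positions on the '-' strand are 0-based offsets within the
    reverse-complemented promoter, not within the original sequence.
    """
    site = seed_site(mirna)
    promoter = promoter.upper()
    hits = []
    for strand, seq in (("+", promoter), ("-", revcomp(promoter))):
        pos = seq.find(site)
        while pos != -1:
            hits.append((strand, pos))
            pos = seq.find(site, pos + 1)  # allow overlapping matches
    return hits

if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"   # mature hsa-let-7a-5p sequence
    toy_promoter = "AAACTACCTCAAA"      # made-up sequence for illustration
    print(seed_site(let7a), find_sites(toy_promoter, let7a))
```

In a fuller workflow, each exact seed hit would be a candidate passed on to a hybridization-energy tool for scoring; that step is omitted here.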

    Evolution of MicroRNA Genes in Oryza sativa and Arabidopsis thaliana: An Update of the Inverted Duplication Model

    The origin and evolution of microRNA (miRNA) genes, which are of significance in tuning and buffering gene expression in a number of critical cellular processes, have long attracted evolutionary biologists. However, genome-wide perspectives on their origins, potential mechanisms of their de novo generation and subsequent evolution remain largely unresolved in flowering plants. Here, genome-wide analyses of Oryza sativa and Arabidopsis thaliana revealed apparently divergent patterns of miRNA gene origins. A large proportion of miRNA genes in O. sativa were TE-related, and MITE-related in particular, whereas the fraction of these miRNA genes was much lower in A. thaliana. Our results show that the majority of TE-related and pseudogene-related miRNA genes have originated through inverted duplication instead of segmental or tandem duplication events. Based on the presented findings, we hypothesize and illustrate four likely molecular mechanisms for de novo generation of novel miRNA genes from TEs and pseudogenes. Our rice genome analysis demonstrates that non-MITE- and MITE-mediated inverted duplications have played different roles in de novo generation of miRNA genes. It is confirmed that the previously proposed inverted duplication model may explain non-MITE-mediated duplication events. However, many other miRNA genes, known from the earlier proposed model, instead arose from MITE transpositions into target genes to yield binding sites. We further investigated the evolutionary processes leading from de novo generated to maturely formed miRNA genes and their regulatory systems. We found that miRNAs increase the tunability of some gene regulatory systems with low gene copy numbers. The results also suggest that gene balance effects may have largely contributed to the evolution of miRNA regulatory systems.