253 research outputs found

    Identification and expression of equine MER-derived miRNAs

    MicroRNAs (miRNAs) are single-stranded, small RNAs (21-23 nucleotides) that function in gene silencing and translational inhibition via the RNA interference mechanism. Most miRNAs originate from host genomic regions, such as intergenic regions, introns, exons, and transposable elements (TEs). Here, we focused on the palindromic structure of medium reiteration frequencies (MERs), which are similar to precursor miRNAs. Five MER consensus sequences (MER5A1, MER53, MER81, MER91C, and MER117) were matched with paralogous transcripts predicted to be precursor miRNAs in the horse genome (equCab2) and located in either intergenic regions or introns. The MER5A1, MER53, and MER91C sequences obtained from RepeatMasker were matched with the eca-miR-544b, eca-miR-1302, and eca-miR-652 precursor sequences derived from the Ensembl transcript database, respectively. Each precursor form was anticipated to yield two mature forms, and we confirmed miRNA expression in six different tissues (cerebrum, cerebellum, lung, spleen, adrenal gland, and duodenum) of one thoroughbred horse. MER5A1-derived miRNAs generally showed significantly higher expression in the lung than in other tissues. MER91C-derived miRNA-5p also showed significantly higher expression in the duodenum than in other tissues (cerebellum, lung, spleen, and adrenal gland). The MER117-overlapped expressed sequence tag generated polycistronic miRNAs, which showed higher expression in the duodenum than in other tissues. These data indicate that horse MER transposons encode miRNAs that are expressed in several tissues and are thought to have biological functions.
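The comparison between palindromic MER sequences and precursor miRNAs rests on both being able to fold into a stem-loop, in which the 5' arm pairs with the reverse complement of the 3' arm. A minimal Python sketch of that idea follows. It is not the authors' pipeline: the function names, the fixed `loop` size, the Watson-Crick-only pairing rule, and the toy sequence are all assumptions made for illustration.

```python
# Hypothetical helper names; DNA alphabet, Watson-Crick pairs only (no G-U wobble).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def arm_pairing_fraction(seq: str, loop: int = 4) -> float:
    """Fraction of 5'-arm bases that can pair with the 3' arm.

    The sequence is treated as arm + loop + arm, with `loop` unpaired
    bases assumed in the middle. A perfect palindromic hairpin scores 1.0.
    """
    seq = seq.upper()
    arm = (len(seq) - loop) // 2
    if arm <= 0:
        raise ValueError("sequence too short for the requested loop size")
    five_prime = seq[:arm]
    # If the arms pair perfectly, the reverse complement of the 3' arm
    # reads the same as the 5' arm.
    three_prime_rc = reverse_complement(seq[-arm:])
    matches = sum(a == b for a, b in zip(five_prime, three_prime_rc))
    return matches / arm


# Toy sequence (not a real MER consensus): 7-base arm, 4-base loop, 7-base arm.
toy = "GATTACA" + "TTTT" + "TGTAATC"
print(arm_pairing_fraction(toy))  # 1.0
```

A real screen would use a folding tool and allow bulges and wobble pairs; this score only captures the simplest case of an ungapped, perfectly aligned stem.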

    Study of large and highly stratified population datasets by combining iterative pruning principal component analysis and structure

    Background: The ever increasing sizes of population genetic datasets pose great challenges for population structure analysis. The Tracy-Widom (TW) statistical test is widely used for detecting structure. However, it has not been adequately investigated whether the TW statistic is susceptible to type I error, especially in large, complex datasets. Non-parametric, Principal Component Analysis (PCA) based methods for resolving structure have been developed which rely on the TW test. Although PCA-based methods can resolve structure, they cannot infer ancestry. Model-based methods are still needed for ancestry analysis, but they are not suitable for large datasets. We propose a new structure analysis framework for large datasets. This includes a new heuristic for detecting structure and incorporation of the structure patterns inferred by a PCA method to complement STRUCTURE analysis. Results: A new heuristic called EigenDev for detecting population structure is presented. When tested on simulated data, this heuristic is robust to sample size. In contrast, the TW statistic was found to be susceptible to type I error, especially for large population samples. EigenDev is thus better-suited for analysis of large datasets containing many individuals, in which spurious patterns are likely to exist and could be incorrectly interpreted as population stratification. EigenDev was applied to the iterative pruning PCA (ipPCA) method, which resolves the underlying subpopulations. This subpopulation information was used to supervise STRUCTURE analysis to infer patterns of ancestry at an unprecedented level of resolution. To validate the new approach, a bovine and a large human genetic dataset (3945 individuals) were analyzed. We found new ancestry patterns consistent with the subpopulations resolved by ipPCA. Conclusions: The EigenDev heuristic is robust to sampling and is thus superior for detecting structure in large datasets. The application of EigenDev to the ipPCA algorithm improves the estimation of the number of subpopulations and the individual assignment accuracy, especially for very large and complex datasets. Furthermore, we have demonstrated that the structure resolved by this approach complements parametric analysis, allowing a much more comprehensive account of population structure. The new version of the ipPCA software with EigenDev incorporated can be downloaded from http://www4a.biotec.or.th/GI/tools/ippca.

    Iterative pruning PCA improves resolution of highly structured populations

    BACKGROUND: Non-random patterns of genetic variation exist among individuals in a population owing to a variety of evolutionary factors. Therefore, populations are structured into genetically distinct subpopulations. As genotypic datasets become ever larger, it is increasingly difficult to correctly estimate the number of subpopulations and assign individuals to them. The computationally efficient non-parametric, chiefly Principal Components Analysis (PCA)-based methods are thus becoming increasingly relied upon for population structure analysis. Current PCA-based methods can accurately detect structure; however, the accuracy in resolving subpopulations and assigning individuals to them is wanting. When subpopulations are closely related to one another, they overlap in PCA space and appear as a conglomerate. This problem is exacerbated when some subpopulations in the dataset are genetically far removed from others. We propose a novel PCA-based framework which addresses this shortcoming. RESULTS: A novel population structure analysis algorithm called iterative pruning PCA (ipPCA) was developed which assigns individuals to subpopulations and infers the total number of subpopulations present. Genotypic data from simulated and real population datasets with different degrees of structure were analyzed. For datasets with simple structures, the subpopulation assignments of individuals made by ipPCA were largely consistent with the STRUCTURE, BAPS and AWclust algorithms. On the other hand, highly structured populations containing many closely related subpopulations could be accurately resolved only by ipPCA, and not by other methods. CONCLUSION: The algorithm is computationally efficient and not constrained by the dataset complexity. 
This systematic subpopulation assignment approach removes the need for prior population labels, which could be advantageous when cryptic stratification is encountered in datasets containing individuals otherwise assumed to belong to a homogenous population
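The general shape of an iterative pruning approach, as described in the abstract above, is: run PCA on a group, decide whether it still shows structure, split it in two if so, and repeat on each half until no group shows further structure. The Python sketch below illustrates only that control flow. It is not the published ipPCA implementation: the stopping rule (fraction of variance on the first component above a hypothetical `threshold`) is a crude stand-in for the statistical tests the abstracts mention, the 2-means split on the first component is an assumption, and the data are simulated Gaussian features rather than genotypes.

```python
import numpy as np


def top_pc(X):
    """Return (scores on PC1, fraction of total variance explained by PC1)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2
    return U[:, 0] * s[0], var[0] / var.sum()


def two_means_1d(x, n_iter=50):
    """Plain 2-means on a 1-D array, seeded at its minimum and maximum."""
    centers = np.array([x.min(), x.max()], dtype=float)
    for _ in range(n_iter):
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        if labels.min() == labels.max():  # everything fell into one cluster
            break
        new = np.array([x[labels == k].mean() for k in (0, 1)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def iterative_pruning(X, threshold=0.3, min_size=5):
    """Recursively bisect the rows of X; return one integer label per row.

    `threshold` and `min_size` are illustrative parameters, not values
    taken from the ipPCA papers.
    """
    labels = np.zeros(len(X), dtype=int)
    stack = [np.arange(len(X))]
    next_label = 0
    while stack:
        idx = stack.pop()
        split = None
        if len(idx) >= 2 * min_size:
            scores, frac = top_pc(X[idx])
            if frac > threshold:  # stand-in test for "structure present"
                halves = two_means_1d(scores)
                left, right = idx[halves == 0], idx[halves == 1]
                if min(len(left), len(right)) >= min_size:
                    split = (left, right)
        if split is None:  # terminal group: assign a final label
            labels[idx] = next_label
            next_label += 1
        else:
            stack.extend(split)
    return labels


# Simulated example: group A is far from B and C, which are close to each other.
rng = np.random.default_rng(0)
n, m = 20, 50
shift_bc = np.r_[np.full(25, 8.0), np.zeros(25)]
shift_c = np.r_[np.zeros(25), np.full(25, 4.0)]
X = np.vstack([
    rng.normal(size=(n, m)),
    rng.normal(size=(n, m)) + shift_bc,
    rng.normal(size=(n, m)) + shift_bc + shift_c,
])
labels = iterative_pruning(X)
print(len(set(labels.tolist())), "groups found")
```

The simulated layout mirrors the situation the abstract describes: the first component is dominated by the distant group, and the two close groups only separate once that distant group has been pruned away and PCA is rerun on the remainder.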

    TranspoGene and microTranspoGene: transposed elements influence on the transcriptome of seven vertebrates and invertebrates

    Transposed elements (TEs) are mobile genetic sequences. During the evolution of eukaryotes TEs were inserted into active protein-coding genes, affecting gene structure, expression and splicing patterns, and protein sequences. Genomic insertions of TEs also led to creation and expression of new functional non-coding RNAs such as microRNAs. We have constructed the TranspoGene database, which covers TEs located inside protein-coding genes of seven species: human, mouse, chicken, zebrafish, fruit fly, nematode and sea squirt. TEs were classified according to location within the gene: proximal promoter TEs, exonized TEs (insertion within an intron that led to exon creation), exonic TEs (insertion into an existing exon) or intronic TEs. TranspoGene contains information regarding specific type and family of the TEs, genomic and mRNA location, sequence, supporting transcript accession and alignment to the TE consensus sequence. The database also contains host gene specific data: gene name, genomic location, Swiss-Prot and RefSeq accessions, diseases associated with the gene and splicing pattern. In addition, we created microTranspoGene: a database of human, mouse, zebrafish and nematode TE-derived microRNAs. The TranspoGene and microTranspoGene databases can be used by researchers interested in the effect of TE insertion on the eukaryotic transcriptome.

    Plasmodium parasites mount an arrest response to dihydroartemisinin, as revealed by whole transcriptome shotgun sequencing (RNA-seq) and microarray study

    RNA-seq data analysis from DHA treatment of P. falciparum. Limma results from 1 h treatments with 500 nM DHA in P. falciparum K1 rings, trophozoites and schizonts. (XLS 2040 kb)

    Unique and conserved MicroRNAs in wheat chromosome 5D revealed by next-generation sequencing

    MicroRNAs are a class of short, non-coding, single-stranded RNAs that act as post-transcriptional regulators in gene expression. miRNA analysis of Triticum aestivum chromosome 5D was performed on 454 GS FLX Titanium sequences of flow sorted chromosome 5D with a total of 3,208,630 good quality reads representing 1.34x and 1.61x coverage of the short (5DS) and long (5DL) arms of the chromosome respectively. In silico and structural analyses revealed a total of 55 miRNAs; 48 and 42 miRNAs were found to be present on 5DL and 5DS respectively, of which 35 were common to both chromosome arms, while 13 miRNAs were specific to 5DL and 7 miRNAs were specific to 5DS. In total, 14 of the predicted miRNAs were identified in wheat for the first time. Representation (the copy number of each miRNA) was also found to be higher in 5DL (1,949) compared to 5DS (1,191). Targets were predicted for each miRNA, while expression analysis gave evidence of expression for 6 out of 55 miRNAs. Occurrences of the same miRNAs were also found in Brachypodium distachyon and Oryza sativa genome sequences to identify syntenic miRNA coding sequences. Based on this analysis, two other miRNAs, miR1133 and miR167, were detected in the B. distachyon syntenic region of wheat 5DS. Five of the predicted miRNA coding regions (miR6220, miR5070, miR169, miR5085, miR2118) were experimentally verified to be located on the 5D chromosome, and three of them, miR2118, miR169 and miR5085, were shown to be 5D specific. Furthermore, miR2118 was shown to be expressed in Chinese Spring adult leaves. miRNA genes identified in this study will expand our understanding of gene regulation in bread wheat.
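The arm-specific counts in the abstract above follow from the per-arm and shared counts by simple set arithmetic, which can be checked in a few lines of Python. The variable names are illustrative; the numbers are taken directly from the abstract.

```python
# Counts reported in the abstract for wheat chromosome 5D.
on_5dl = 48   # miRNAs found on the long arm
on_5ds = 42   # miRNAs found on the short arm
common = 35   # miRNAs found on both arms

specific_5dl = on_5dl - common                # present only on 5DL
specific_5ds = on_5ds - common                # present only on 5DS
total = common + specific_5dl + specific_5ds  # inclusion-exclusion

print(specific_5dl, specific_5ds, total)  # 13 7 55
```

The result matches the reported 13 arm-specific miRNAs on 5DL, 7 on 5DS, and 55 in total.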

    Origins and Evolution of MicroRNA Genes in Drosophila Species

    MicroRNAs (miRs) regulate gene expression at the posttranscriptional level. To obtain some insights into the origins and evolutionary patterns of miR genes, we have identified miR genes in the genomes of 12 Drosophila species by bioinformatics approaches and examined their evolutionary changes. The results showed that the extant and ancestral Drosophila species had more than 100 miR genes and frequent gains and losses of miR genes have occurred during evolution. Although many miR genes appear to have originated from random hairpin structures in intronic or intergenic regions, duplication of miR genes has also contributed to the generation of new miR genes. Estimating the rate of nucleotide substitution of miR genes, we have found that newly arisen miR genes have a substitution rate similar to that of synonymous nucleotide sites in protein-coding genes and evolve almost neutrally. This suggests that most new miR genes have not acquired any important function and would become inactive. By contrast, old miR genes show a substitution rate much lower than the synonymous rate. Moreover, paired and unpaired nucleotide sites of miR genes tend to remain unchanged during evolution. Therefore, once miR genes acquired their functions, they appear to have evolved very slowly, maintaining essentially the same structures for a long time

    Large-scale discovery of insertion hotspots and preferential integration sites of human transposed elements

    Throughout evolution, eukaryotic genomes have been invaded by transposable elements (TEs). Little is known about the factors leading to genomic proliferation of TEs, their preferred integration sites and the molecular mechanisms underlying their insertion. We analyzed hundreds of thousands of nested TEs in the human genome, i.e. insertions of TEs into existing ones. We first discovered that most TEs insert within specific ‘hotspots’ along the targeted TE. In particular, retrotransposed Alu elements contain a non-canonical single nucleotide hotspot for insertion of other Alu sequences. We next devised a method for identification of integration sequence motifs of inserted TEs that are conserved within the targeted TEs. This method revealed novel sequence motifs characterizing insertions of various important TE families: Alu, hAT, ERV1 and MaLR. Finally, we performed a global assessment to determine the extent to which young TEs tend to nest within older transposed elements and identified a 4-fold higher tendency of TEs to insert into existing TEs than to insert within non-TE intergenic regions. Our analysis demonstrates that TEs are highly biased to insert within certain TEs, in specific orientations and within specific targeted TE positions. TE nesting events also reveal new characteristics of the molecular mechanisms underlying transposition.