Automated paleontology of repetitive DNA with REANNOTATE
Background: Dispersed repeats are a major component of eukaryotic genomes and drivers of genome evolution. Annotation of DNA sequences homologous to known repetitive elements has mainly been performed with the program RepeatMasker. Sequences annotated by RepeatMasker often correspond to fragments of repetitive elements, resulting from the insertion of younger elements or from other rearrangements. Although RepeatMasker annotation is indispensable for studying genome biology, it contains little information on the common origin of fossil fragments that share an insertion event, especially where clusters of nested insertions of repetitive elements have occurred.
Results: Here I present REANNOTATE, a computational tool that processes RepeatMasker annotation for automated (i) defragmentation of dispersed repetitive elements, (ii) resolution of the temporal order of insertions in clusters of nested elements, and (iii) estimation of the age of elements that have long terminal repeats. I have re-annotated the repetitive content of human chromosomes, providing evidence for a recent expansion of satellite repeats on the Y chromosome and, from the age distribution of retroviral elements, for a higher rate of evolution on the Y relative to the autosomes.
Conclusion: REANNOTATE is ready to process existing annotation for automated evolutionary analysis of all types of complex repeats in any genome. The tool is freely available under the GPL at http://www.bioinformatics.org/reannotate.
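The abstract does not spell out how LTR ages are computed. A common approach for LTR retrotransposons, assumed here and not necessarily the exact procedure REANNOTATE implements, is to measure the divergence K between the two LTRs of an element (which were identical at insertion) and convert it to an age with T = K / (2r), where r is a per-site, per-year substitution rate. The sketch below uses the Kimura two-parameter distance for K; the function names and the example rate are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def kimura_2p_distance(ltr5: str, ltr3: str) -> float:
    """Kimura two-parameter distance between two aligned LTRs.

    Gaps ('-') and ambiguity codes are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to equal length")
    sites = transitions = transversions = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in VALID or b not in VALID:
            continue  # gap or ambiguous base: not a comparable site
        sites += 1
        if a == b:
            continue
        # Transitions stay within purines (A<->G) or pyrimidines (C<->T).
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    # K2P: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def ltr_insertion_age(ltr5: str, ltr3: str, rate: float) -> float:
    """Insertion age in years, T = K / (2r); `rate` is substitutions/site/year."""
    return kimura_2p_distance(ltr5, ltr3) / (2 * rate)


if __name__ == "__main__":
    five_prime = "TGTTGGAATCCAAGTCCAACA"
    three_prime = "TGTTGGAACCCAAGTCCAACA"  # one transition (T -> C)
    # 1.3e-8 is a placeholder rate; use one appropriate to the genome studied.
    print(f"{ltr_insertion_age(five_prime, three_prime, rate=1.3e-8):,.0f} years")
```

Skipping gapped columns keeps indels from inflating the substitution count; the K2P correction distinguishes transitions from transversions, which accumulate at different rates.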
Insertion bias and purifying selection of retrotransposons in the Arabidopsis thaliana genome
BACKGROUND: Genome evolution and size variation in multicellular organisms are profoundly influenced by the activity of retrotransposons. In higher eukaryotes with compact genomes, retrotransposons are found in lower copy numbers than in larger genomes, which could be due either to suppression of transposition or to elimination of insertions. They are also non-randomly distributed along the chromosomes. The evolutionary mechanisms constraining retrotransposon copy number and chromosomal distribution are still poorly understood. RESULTS: I investigated the evolutionary dynamics of long terminal repeat (LTR) retrotransposons in the compact Arabidopsis thaliana genome, using an automated method to obtain genome-wide age and physical distribution profiles for different groups of elements, and then compared the distributions of young and old insertions. Elements of the Pseudoviridae family insert randomly along the chromosomes and have been recently active, but insertions tend to be lost from euchromatic regions, where they are less likely to become fixed, with a half-life estimated at approximately 470,000 years. In contrast, members of the Metaviridae (particularly Athila) preferentially target heterochromatin and were more active in the past. CONCLUSION: Diverse evolutionary mechanisms have constrained both the copy number and the chromosomal distribution of retrotransposons within a single genome. In A. thaliana, their non-random genomic distribution is due both to selection against insertions in euchromatin and to preferential targeting of heterochromatin. Constant turnover of euchromatic insertions and a decline in activity of the elements that target heterochromatin have both limited the contribution of retrotransposon DNA to genome size expansion in A. thaliana.
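Under the simplifying assumption of exponential loss, the reported half-life of about 470,000 years translates into the fraction of euchromatic insertions expected to survive to a given age. The sketch also shows one way, assumed here rather than taken from the study, to recover a half-life from an age distribution: a least-squares fit of ln(count) against insertion age, which presumes a constant insertion rate. Function names are hypothetical.

```python
import math
from typing import Sequence


def surviving_fraction(age_years: float, half_life_years: float) -> float:
    """Fraction of insertions of a given age still present under exponential loss."""
    return 0.5 ** (age_years / half_life_years)


def fit_half_life(ages: Sequence[float], counts: Sequence[float]) -> float:
    """Estimate a half-life by least-squares regression of ln(count) on age.

    Assumes exponential loss and a constant insertion rate; counts must be > 0.
    """
    if len(ages) != len(counts) or len(ages) < 2:
        raise ValueError("need at least two (age, count) pairs")
    logs = [math.log(c) for c in counts]
    n = len(ages)
    mean_t = sum(ages) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ages, logs))
    sxx = sum((t - mean_t) ** 2 for t in ages)
    slope = sxy / sxx  # slope = -lambda, the per-year loss rate
    if slope >= 0:
        raise ValueError("counts do not decline with age")
    return math.log(2) / -slope  # half-life = ln 2 / lambda


if __name__ == "__main__":
    for age in (100_000, 470_000, 1_000_000):
        print(age, round(surviving_fraction(age, 470_000), 3))
```

With a 470,000-year half-life, half of the insertions are expected to survive 470,000 years and a quarter 940,000 years. In real data a declining age distribution can also reflect a drop in transposition activity, so a fit like this cannot by itself separate loss from reduced activity.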
Hundreds of putatively functional small open reading frames in Drosophila
Background: The relationship between DNA sequence and encoded information is still an unsolved puzzle. The number of protein-coding genes in higher eukaryotes identified by genome projects is lower than expected, while a considerable amount of putatively non-coding transcription has been detected. Functional small open reading frames (smORFs) are known to exist in several organisms. However, coding sequence detection methods are biased against detecting such very short open reading frames. Thus, a substantial number of non-canonical coding regions encoding short peptides might await characterization.
Results: Using bioinformatic methods, we have searched for smORFs of fewer than 100 amino acids in the putatively non-coding euchromatic DNA of Drosophila melanogaster, and initially identified nearly 600,000 of them. We have studied the pattern of conservation of these smORFs as coding entities between D. melanogaster and Drosophila pseudoobscura, their presence in syntenic and in transcribed regions of the genome, and their ratio of conservative versus non-conservative nucleotide changes. As a negative control, we compared the results with those obtained for random short sequences; as a positive control, we used smORFs validated by proteomics data.
Conclusions: The combination of these analyses led us to postulate the existence of at least 401 functional smORFs in Drosophila, and possibly as many as 4,561.
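The first step of such a search, enumerating short ORFs, can be sketched as a scan of the three forward reading frames for an ATG followed in frame by a stop codon (TAA, TAG or TGA), keeping ORFs shorter than 100 codons. This is a hypothetical helper, not the authors' pipeline: the conservation, synteny and substitution analyses described above are not shown, and the reverse strand can be handled by running the same function on the reverse complement.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_small_orfs(seq: str, max_aa: int = 99,
                    min_aa: int = 1) -> List[Tuple[int, int, int]]:
    """Return (start, end, n_aa) for ATG...stop ORFs on the forward strand.

    `start` is the 0-based index of the A in ATG, `end` the index just past
    the stop codon, and `n_aa` the peptide length excluding the stop. Only
    the first ATG of each ORF (giving the longest ORF) is reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # i + 3 <= len(seq) guarantees a complete codon.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if min_aa <= n_aa <= max_aa:
                    orfs.append((start, i + 3, n_aa))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)


if __name__ == "__main__":
    print(find_small_orfs("CCATGAAATTTTAGCC"))
```

Internal ATGs inside an open ORF are treated as methionine codons rather than new starts, so each frame yields at most one ORF per stop codon.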
The Effect of Transposable Element Insertions on Gene Expression Evolution in Rodents
Background: Many genomes contain a substantial number of transposable elements (TEs), a few of which are known to be involved in regulating gene expression. However, recent observations suggest that TEs may have played a very important role in the evolution of gene expression, because many conserved non-genic sequences, some of which are known to be involved in gene regulation, resemble TEs. Results: Here we investigate whether new TE insertions affect gene expression profiles by testing whether gene expression divergence between mouse and rat is correlated with the number of new TEs inserted near genes. We show that expression divergence is significantly correlated with the number of new LTR and SINE elements, but not with the number of LINEs. We also show that in most cases expression divergence is not significantly correlated with the number of ancestral TEs, which suggests that the correlations between expression divergence and the number of new TEs are causal. We quantify the effect and estimate that TE insertion has accounted for ~20% (95% confidence interval: 12% to 26%) of all expression profile divergence in rodents. Conclusions: We conclude that TE insertions may have had a major impact on the evolution of gene expression levels in rodents.
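The abstract does not state which correlation statistic was used. As an illustration only, the sketch below assumes Spearman's rank correlation, which copes with the many tied values expected when counting new TEs per gene. Significance testing (for example, by permutation) is omitted, and all names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant input")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    new_te_counts = [0, 0, 1, 2, 0, 3, 1]  # hypothetical counts per gene
    expression_divergence = [0.10, 0.20, 0.35, 0.50, 0.15, 0.40, 0.30]
    print(round(spearman_rho(new_te_counts, expression_divergence), 3))
```

A positive rho indicates that genes with more nearby new insertions tend to have more divergent expression; it says nothing about causation on its own, which is why the comparison with ancestral TEs matters.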
Effects of Recombination Rate on Human Endogenous Retrovirus Fixation and Persistence
Endogenous retroviruses (ERVs) result from germ line infections by exogenous retroviruses. They can proliferate within the genome of their host species until they are either inactivated by mutation or removed by recombinational deletion. ERVs belong to a diverse group of mobile genetic elements collectively termed transposable elements (TEs). Numerous studies have attempted to elucidate the factors determining the genomic distribution and persistence of TEs. Here we show that, within humans, gene density and not recombination rate correlates with fixation of endogenous retroviruses, whereas the local recombination rate determines their persistence in a full-length state. Recombination does not appear to influence fixation either via the ectopic exchange model or via indirect models based on the efficacy of selection. We propose a model linking rates of meiotic recombination to the probability of recombinational deletion to explain the effect of recombination rate on persistence. Chromosomes 19 and Y are exceptions, possessing more elements than other regions. We suggest that this is due to low gene density and elevated rates of human ERV integration in males in the case of chromosome Y, and to segmental duplication in the case of chromosome 19.
A Problem With the Correlation Coefficient as a Measure of Gene Expression Divergence
The correlation coefficient is commonly used as a measure of the divergence of gene expression profiles between different species. Here we point out a potential problem with this statistic: if measurement error is large relative to the differences in expression across tissues or time points, the correlation coefficient will tend to indicate high divergence for genes that have relatively uniform levels of expression. We show that, in a data set from mouse, rat, and human, genes with a conserved uniform pattern of expression have significantly higher levels of expression divergence than other genes when divergence is measured using the correlation coefficient. We also show that the Euclidean distance yields low estimates of expression divergence for genes with a conserved uniform pattern of expression.
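The effect can be reproduced with a small, hypothetical example: two genes measured in five tissues in two species, each with similar measurement noise. One gene is uniformly expressed; the other has a strongly tissue-specific but conserved profile. Divergence is taken here as 1 - r, where r is the Pearson correlation coefficient (an assumed convention), and compared with the Euclidean distance. All values are invented for illustration.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Gene 1: uniform expression in both species; differences are pure noise.
uniform_sp1 = [10.1, 9.9, 10.0, 10.2, 9.8]
uniform_sp2 = [9.9, 10.1, 10.2, 9.8, 10.0]
# Gene 2: strongly tissue-specific, conserved profile plus similar noise.
specific_sp1 = [1.0, 5.0, 20.0, 2.0, 8.0]
specific_sp2 = [1.2, 4.8, 19.7, 2.3, 8.1]

for name, a, b in [("uniform", uniform_sp1, uniform_sp2),
                   ("tissue-specific", specific_sp1, specific_sp2)]:
    print(f"{name}: 1 - r = {1 - pearson_r(a, b):.3f}, "
          f"Euclidean = {euclidean(a, b):.3f}")
```

For the uniform gene, 1 - r is about 1.6 (r is -0.6), whereas for the tissue-specific gene it is close to 0; the Euclidean distances are similar (about 0.57 and 0.52). The noise alone makes the uniformly expressed gene look highly diverged by the correlation measure.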
International Nosocomial Infection Control Consortium report, data summary of 50 countries for 2010-2015: Device-associated module
• We report INICC device-associated module data from 50 countries for 2010-2015.
• We collected prospective data from 861,284 patients in 703 ICUs over 3,506,562 days.
• DA-HAI rates and bacterial resistance were higher in the INICC ICUs than in CDC-NHSN ICUs.
• The device utilization ratio in the INICC ICUs was similar to that in CDC-NHSN ICUs.
Background: We report the results of the International Nosocomial Infection Control Consortium (INICC) surveillance study conducted from January 2010 to December 2015 in 703 intensive care units (ICUs) in Latin America, Europe, the Eastern Mediterranean, Southeast Asia, and the Western Pacific.
Methods: During the 6-year study period, using Centers for Disease Control and Prevention National Healthcare Safety Network (CDC-NHSN) definitions for device-associated health care-associated infection (DA-HAI), we collected prospective data from 861,284 patients hospitalized in INICC ICUs for an aggregate of 3,506,562 days.
Results: Although device use in INICC ICUs was similar to that reported from CDC-NHSN ICUs, DA-HAI rates were higher in the INICC ICUs. In the INICC medical-surgical ICUs, the pooled rate of central line-associated bloodstream infection (4.1 per 1,000 central line-days) was about 5-fold higher than the 0.8 per 1,000 central line-days reported from comparable US ICUs; the overall rate of ventilator-associated pneumonia was also higher (13.1 vs 0.9 per 1,000 ventilator-days), as was the rate of catheter-associated urinary tract infection (5.07 vs 1.7 per 1,000 catheter-days). In blood culture samples, frequencies of resistance of Pseudomonas isolates to amikacin (29.87% vs 10%) and to imipenem (44.3% vs 26.1%), and of Klebsiella pneumoniae isolates to ceftazidime (73.2% vs 28.8%) and to imipenem (43.27% vs 12.8%), were also higher in the INICC ICUs than in CDC-NHSN ICUs.
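The rates above follow the usual CDC-NHSN conventions: infections per 1,000 device-days, and a device utilization ratio of device-days divided by patient-days. The sketch below, with hypothetical function names, computes both and the fold differences implied by the pooled rates reported in the Results for central line-associated bloodstream infection (CLABSI), ventilator-associated pneumonia (VAP) and catheter-associated urinary tract infection (CAUTI).

```python
def rate_per_1000_device_days(infections: int, device_days: int) -> float:
    """DA-HAI rate: infections per 1,000 device-days."""
    if device_days <= 0:
        raise ValueError("device_days must be positive")
    return infections / device_days * 1000


def device_utilization_ratio(device_days: int, patient_days: int) -> float:
    """Device-days divided by patient-days (share of days with the device in place)."""
    if patient_days <= 0:
        raise ValueError("patient_days must be positive")
    return device_days / patient_days


# Pooled rates per 1,000 device-days as reported above: (INICC, CDC-NHSN).
reported = {
    "CLABSI": (4.1, 0.8),
    "VAP": (13.1, 0.9),
    "CAUTI": (5.07, 1.7),
}

for infection, (inicc, nhsn) in reported.items():
    print(f"{infection}: {inicc / nhsn:.1f}-fold")
```

The printed ratios are roughly 5.1 for CLABSI, 14.6 for VAP and 3.0 for CAUTI. Because the rates are normalized by device-days, a similar device utilization ratio in both networks means the higher rates cannot simply be attributed to more device use.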
Conclusions: Although DA-HAI rates in INICC ICU patients remain higher than those reported from CDC-NHSN ICUs representing the developed world, we have observed a significant trend toward reduction of DA-HAI rates in INICC ICUs, as shown in each international report. INICC's main goal is to continue facilitating education, training, and basic, cost-effective tools and resources, such as standardized forms and an online platform, to tackle this problem effectively and systematically.