
43 research outputs found

Expedited batch processing and analysis of transposon insertions

Author: A Bohne
AL Price
BC Meyers
CM Bergman
David A Ray
ES Lander
GD Schuler
JE Stajich
Jeremy D Smith
PL Deininger
RC Edgar
RH Waterston
Sela
SF Altschul
Publication venue: BioMed Central
Publication date: 01/11/2011

Abstract

Background: With advances in sequencing technology, ever greater amounts of eukaryotic genome data are becoming available. Large portions of these genomes often consist of transposable elements, which frequently account for 50% or more of the genome in vertebrates. Each transposable element family may have thousands or tens of thousands of individual copies within a given genome, so processing the data in a meaningful fashion can take an exorbitant amount of time and effort.

Findings: To combat this problem, we developed a set of bioinformatics techniques and programs to streamline the analysis. These include a unique Perl script that takes BLAST, RepeatMasker, and similar output and automates the extraction and manipulation of the hit sequences from the genome. This script, called Process_hits, uses an object-oriented methodology to compile all hit locations from a given file for processing, organize these data into usable categories, and output them in multiple formats.

Conclusions: The program proved capable of handling large amounts of transposon data efficiently. It is equipped with a number of useful sub-functions, each contained within its own sub-module to allow for greater expandability and to serve as a foundation for future program design.
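The hit-extraction step described in this abstract can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not the Process_hits code. It assumes the input is BLAST tabular output (12 tab-separated columns, subject start and end in columns 9 and 10) and that the genome is already loaded as a dictionary of sequences. The names Hit, parse_blast_tabular, extract_sequence, and to_fasta are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    query: str
    subject: str
    start: int   # 1-based inclusive, always start <= end
    end: int
    strand: str  # "+" or "-"


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_blast_tabular(lines):
    """Collect hit locations from BLAST tabular lines (assumed 12-column layout)."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        sstart, send = int(fields[8]), int(fields[9])
        # In tabular output a minus-strand hit is reported with sstart > send.
        strand = "+" if sstart <= send else "-"
        hits.append(Hit(fields[0], fields[1],
                        min(sstart, send), max(sstart, send), strand))
    return hits


def extract_sequence(genome, hit, flank=0):
    """Return the hit sequence (plus optional flanks), reverse-complemented on '-'."""
    seq = genome[hit.subject]
    lo = max(hit.start - 1 - flank, 0)
    hi = min(hit.end + flank, len(seq))
    sub = seq[lo:hi]
    if hit.strand == "-":
        sub = sub.translate(_COMPLEMENT)[::-1]
    return sub


def to_fasta(genome, hits):
    """Format every hit as a FASTA record (one of several possible outputs)."""
    records = []
    for h in hits:
        header = f">{h.query}|{h.subject}:{h.start}-{h.end}({h.strand})"
        records.append(header + "\n" + extract_sequence(genome, h))
    return "\n".join(records)
```

With a genome dictionary such as {"chr1": "AAACCCGGGTTT"}, calling to_fasta(genome, parse_blast_tabular(lines)) would yield one FASTA record per hit. Grouping hits by query name before output is one way to organize them into categories.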

Crossref

Directory of Open Access Journals

PubMed Central

A Bioinformatics Approach for Detecting Repetitive Nested Motifs using Pattern Matching

Author: Carballido Jessica Andrea
Echenique Carmen Viviana
Garbus Ingrid
Ponzoni Ignacio
Romero José Rodolfo
Publication venue: Bioinformatics Inst
Publication date: 30/10/2016

The identification of nested motifs in genomic sequences is a complex computational problem. Detecting these patterns is important for discovering transposable element (TE) insertions, incomplete reverse transcripts, deletions, and/or mutations. Here, we designed a de novo strategy for detecting patterns that represent nested motifs, based on exhaustive searches for pairs of motifs and combinatorial pattern analysis. These patterns can be grouped into three categories: motifs within other motifs, motifs flanked by other motifs, and motifs of large size. Our methodology, applied to genomic sequences from the plant species Aegilops tauschii and Oryza sativa, revealed that it is possible to find putative nested TEs by detecting these three types of patterns. The results were validated through BLAST alignments, which revealed the efficacy and usefulness of the new method, which we call Mamushka.

Affiliation: Romero, José Rodolfo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Centro de Recursos Naturales Renovables de la Zona Semiárida. Universidad Nacional del Sur. Centro de Recursos Naturales Renovables de la Zona Semiárida; Argentina

Affiliation: Carballido, Jessica Andrea. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Cs. E Ingeniería de la Computacion; Argentina

Affiliation: Garbus, Ingrid. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Centro de Recursos Naturales Renovables de la Zona Semiárida. Universidad Nacional del Sur. Centro de Recursos Naturales Renovables de la Zona Semiárida; Argentina

Affiliation: Echenique, Carmen Viviana. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Centro de Recursos Naturales Renovables de la Zona Semiárida. Universidad Nacional del Sur. Centro de Recursos Naturales Renovables de la Zona Semiárida; Argentina

Affiliation: Ponzoni, Ignacio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Cs. E Ingeniería de la Computacion; Argentina
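The three pattern categories named in this abstract can be illustrated with a small sketch over motif occurrences that have already been located. It is not the Mamushka algorithm, whose exhaustive pair search and combinatorial analysis are not detailed here. The Occurrence type, the function names, and the max_gap and min_len thresholds are all assumptions made for illustration.

```python
from typing import NamedTuple


class Occurrence(NamedTuple):
    name: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def find_nested(occs):
    """Pairs (inner, outer) where inner lies inside a strictly longer outer motif."""
    pairs = []
    for i, inner in enumerate(occs):
        for j, outer in enumerate(occs):
            if i == j:
                continue
            contained = outer.start <= inner.start and inner.end <= outer.end
            shorter = (inner.end - inner.start) < (outer.end - outer.start)
            if contained and shorter:
                pairs.append((inner, outer))
    return pairs


def find_flanked(occs, max_gap=10):
    """Triples (left, middle, right): same-named motifs close on both sides."""
    triples = []
    for m, middle in enumerate(occs):
        for i, left in enumerate(occs):
            if i == m or not (left.end <= middle.start
                              and middle.start - left.end <= max_gap):
                continue
            for j, right in enumerate(occs):
                if j in (i, m) or right.name != left.name:
                    continue
                if right.start >= middle.end and right.start - middle.end <= max_gap:
                    triples.append((left, middle, right))
    return triples


def find_large(occs, min_len=100):
    """Occurrences whose length reaches an (assumed) size threshold."""
    return [o for o in occs if o.end - o.start >= min_len]
```

The quadratic and cubic loops keep the sketch short. Sorting occurrences by start position would be a natural refinement for genome-scale input.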

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

Bioinformatics: Strategies, Trends, and Perspectives

Author: Adriane Beatriz de Souza Serapião
Carlos Norberto Fischer
Publication venue: IntechOpen
Publication date: 01/03/2010

IntechOpen

A machine learning based framework to identify and classify long terminal repeat retrotransposons

Author: Blockeel Hendrik
Carareto Claudia MA
Cerri Ricardo
Costa Eduardo
Fischer Carlos N
Ramon Jan
Schietgat Leander
Vens Celine
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2018

Transposable elements (TEs) are repetitive nucleotide sequences that make up a large portion of eukaryotic genomes. They can move and duplicate within a genome, increasing genome size and contributing to genetic diversity within and across species. Accurate identification and classification of the TEs present in a genome is an important step towards understanding their effects on genes and their role in genome evolution. We introduce TE-LEARNER, a framework based on machine learning that automatically identifies TEs in a given genome and assigns a classification to them. We present an implementation of our framework for LTR retrotransposons, a particular type of TE characterized by long terminal repeats (LTRs) at their boundaries. We evaluate the predictive performance of our framework on the well-annotated genomes of Drosophila melanogaster and Arabidopsis thaliana, and we compare our results for three LTR retrotransposon superfamilies with the results of three widely used methods for TE identification or classification: REPEATMASKER, CENSOR and LTRDIGEST. In contrast to these methods, TE-LEARNER is the first to incorporate machine learning techniques, outperforming them in terms of predictive performance while learning models and making predictions efficiently. Moreover, we show that our method was able to identify TEs that none of the above methods could find, and we investigated TE-LEARNER's predictions that did not correspond to an official annotation. It turns out that many of these predictions are in fact strongly homologous to a known TE.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Ghent University Academic Bibliography

Directory of Open Access Journals

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Elements transposables, de l'excepció a la norma (Transposable elements, from the exception to the rule)

Author: Casals Ferran
Publication venue: Institut d'Estudis Catalans
Publication date: 01/01/2009

Transposable elements are sequences with the ability to change their position in the genome. They are very abundant, representing up to 50% of the sequence in the case of the human genome. Despite their great diversity, they can be grouped into two major classes according to their mechanism of mobilization. They are essentially considered intracellular parasites, with a great ability to replicate and to avoid elimination by the host. Besides mobilizing within the genome and being transmitted vertically to descendants, many transposable elements have been able to cross the species barrier and transfer horizontally between genomes. Genetics has developed different methods to detect transposable elements in genome sequences and to study their behavior both within and between species. In some cases the genome has domesticated a transposable element, which then performs a cellular function. Finally, they are a source of variability, the raw material for the evolution of species.

Revistes Catalanes amb Accés Obert

Hemeroteca Cientifica Catalana

MITE Digger, an efficient and accurate algorithm for genome wide discovery of miniature inverted repeat transposable elements

Author: B Yaakov
C Feschotte
C Lu
CM Bergman
D Cantu
E Lerat
F Isam
G Yang
GJ Yang
Guojun Yang
HH Kuang
I Fattash
J Piriyapongsa
JE Stajich
K Naito
KC Park
M Janicki
M Momose
M Yano
MJ Han
N Jiang
R Rooke
S Tempel
S Wang
T Tanaka
TE Bureau
V Nene
Y Chen
Y Han
YS Yan
ZJ Tu
Publication venue: Springer Science and Business Media LLC

Crossref

MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences

Author: Bureau
Chen
Edgar
Feschotte
Jiang
Jurka
Kirkness
Lander
Mathew
Moreno-Vázquez
Oki
Osborne
Paterson
S. R. Wessler
Santiago
Schnable
Smit
Tu
Waterston
Y. Han
Yang
Publication venue: Oxford University Press
Publication date: 01/01/2010

Miniature inverted-repeat transposable elements (MITEs) are a special type of Class 2 non-autonomous transposable element (TE) that are abundant in the non-coding regions of the genes of many plant and animal species. The accurate identification of MITEs has been a challenge for existing programs because they lack coding sequences and, as such, evolve very rapidly. Because of their importance to gene and genome evolution, we developed MITE-Hunter, a program pipeline that can identify MITEs as well as other small Class 2 non-autonomous TEs from genomic DNA data sets. The output of MITE-Hunter consists of consensus TE sequences grouped into families, which can be used as a library file for homology-based TE detection programs such as RepeatMasker. MITE-Hunter was evaluated by searching the rice genomic database and comparing the output with known rice TEs. It discovered most of the previously reported rice MITEs (97.6%) and found sixteen new elements. MITE-Hunter was also compared with two other MITE discovery programs, FINDMITE and MUST. Unlike MITE-Hunter, neither of these programs can search large genomic data sets, including whole genome sequences. More importantly, MITE-Hunter is significantly more accurate than either FINDMITE or MUST, as the vast majority of their outputs are false positives.
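As the name indicates, a MITE carries inverted repeats at its two ends, so one basic structural check is whether the start of a candidate sequence matches the reverse complement of its end. The sketch below shows only that check and is not the MITE-Hunter pipeline. The function names and the tir_len and max_mismatches defaults are assumptions chosen for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tir_mismatches(candidate, tir_len):
    """Count mismatches between the left end and the reverse-complemented right end."""
    if tir_len <= 0 or len(candidate) < 2 * tir_len:
        raise ValueError("candidate is too short for the requested TIR length")
    left = candidate[:tir_len].upper()
    right_rc = reverse_complement(candidate[-tir_len:])
    return sum(a != b for a, b in zip(left, right_rc))


def has_tir(candidate, tir_len=10, max_mismatches=1):
    """True if the candidate's ends form an (approximate) terminal inverted repeat."""
    return tir_mismatches(candidate, tir_len) <= max_mismatches
```

For example, a candidate built as "GGCCAGTCAC" + "ATATATATAT" + reverse_complement("GGCCAGTCAC") passes has_tir, while a sequence with identical (not inverted) poly-A ends does not. A real discovery tool would add further filters, which this sketch leaves out.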

CiteSeerX

Crossref

PubMed Central

eScholarship - University of California


Transposable Element Abundance and Variability in 28 Different Species in the Family Solanaceae

Author: Mendieta John P
Publication venue: CU Scholar
Publication date: 01/01/2015

Transposable Elements (TEs) are small nucleic acid parasites that replicate and reinsert themselves into the genome of their host organism. These small genetic parasites have recently come to be seen as possible evolutionary drivers in the development and evolution of genomic adaptations as well as genomic architecture. While much is known about the possible effects of TEs on an individual organism, little is known about their dynamics at the family level. To investigate this relationship, TE types and abundances were analyzed for 28 species in the highly diverse plant family Solanaceae. Transposable Elements were identified and investigated by running the program RepeatExplorer on whole genome shotgun data sets from 28 different species in the Physaleae and Solaneae tribes of the Solanaceae family. I identified the genomic proportion of repetitive elements in all species and found that, at the family level, two TE types, LTR gypsy and unclassified repetitive content, were the most abundant in all species. At the family level, Class II TEs made up a far smaller genomic proportion but were far more variable between individual species. These results indicated that while LTR gypsy and unclassified TEs are more important for long-term genomic dynamics, Class II TEs act more significantly in the short term. Clade membership also appears to be related to TE abundance, with more closely related species having similar genomic percentages of TEs, but because the phylogeny lacked branch lengths I was unable to calculate this metric. Finally, while these results are interesting, there is currently no all-encompassing biological explanation of exactly why these family-level genomic trends are exhibited.

CU Scholar Institutional Repository

RiTE database: a resource database for genus-wide rice genomics and evolutionary biology

Crossref