15 research outputs found

    Fig 1 -

    No full text
    Internal structure and organization of LTR-retrotransposons in plants for: (A) Ty1/copia superfamily and (B) Ty3/gypsy superfamily. Depending on the position of the integrase (INT) domain, the element can be classify to Ty1/copia or Ty3/gypsy superfamily.</p

    Neural network architecture of YORO.

    No full text
    Analysis of eukaryotic genomes requires the detection and classification of transposable elements (TEs), a crucial but complex and time-consuming task. To improve the performance of tools that accomplish these tasks, Machine Learning approaches (ML) that leverage computer resources, such as GPUs (Graphical Processing Unit) and multiple CPU (Central Processing Unit) cores, have been adopted. However, until now, the use of ML techniques has mostly been limited to classification of TEs. Herein, a detection-classification strategy (named YORO) based on convolutional neural networks is adapted from computer vision (YOLO) to genomics. This approach enables the detection of genomic objects through the prediction of the position, length, and classification in large DNA sequences such as fully sequenced genomes. As a proof of concept, the internal protein-coding domains of LTR-retrotransposons are used to train the proposed neural network. Precision, recall, accuracy, F1-score, execution times and time ratios, as well as several graphical representations were used as metrics to measure performance. These promising results open the door for a new generation of Deep Learning tools for genomics. YORO architecture is available at https://github.com/simonorozcoarias/YORO.</div

    Tools and approaches that used ML or DL approaches to analyze TEs.

    No full text
    TIR-Learner uses neural network, k-nearest neighbors, random forest, and Adaboost for the ensemble method, while ClassifyTE uses k-nearest neighbors, extra trees, random forest, support vector machine, AdaBoost, logistic regression, Gradient Boosting Classifiers and XGBoost Classifier for the stacking method. Abbreviations: RFSB: Random forest selective binary classifier, C: Classification, D: detection, A: annotation, CL: curation of TE libraries, NI: novel insertions, TU: TransposonUltimate.</p

    Fig 5 -

    No full text
    Nucleotide distances between domains reported by [12] for: (A) Ty3/gypsy and (B) Ty1/copia, and predicted by the Genomic Object Detection approach (throught YORO) for: (C) Ty3/gypsy and (D) Ty1/copia.</p

    Structure of the labels encoding the domain information of the LTR-RT domains.

    No full text
    The initial nucleotide sequence (upper section) is divided into 100 bp sections each one with 22 labels (lower section). Those labels represent the following: (1) Detection and structure information: Presence of a domain, starting position of the domain in the 100 bp section and length; (2) Domain classification: to which domain it is related (in one-hot coding, e.g. GAG = 1,0,0,0,0,0); (3) Lineage classification: to which lineage the domain is related (in one-hot coding, e.g. Tork = 1,0,0,0,0,0,0,0,0,0,0,0,0). Both the domain start position and the length are normalized to be values between 0 and 1. Following this approach, a neural network can learn how to do three different task at once (domain detection, domain classification, and lineage classification).</p
    corecore