Search CORE

15 research outputs found

Genomic object detection: An improved approach for transposable elements detection and classification using convolutional neural networks

Author: Estiven Valencia-Castrillon
Gustavo Isaza
Johan S. Piña
Luis Castillo-Ossa
Luis Humberto Lopez-Murillo
Reinel Tabares-Soto
Romain Guyot
Simon Orozco-Arias
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2023
Field of study

Directory of Open Access Journals

COCO [email protected] calculations for domain classification.

Author: Estiven Valencia-Castrillon (17020681)
Gustavo Isaza (4386541)
Johan S. Piña (10153621)
Luis Castillo-Ossa (17020684)
Luis Humberto Lopez-Murillo (17020678)
Reinel Tabares-Soto (10153624)
Romain Guyot (166092)
Simon Orozco-Arias (12065342)
Publication venue
Publication date: 21/09/2023
Field of study

COCO [email protected] calculations for domain classification.

FigShare

Fig 1 -

Author: Estiven Valencia-Castrillon (17020681)
Gustavo Isaza (4386541)
Johan S. Piña (10153621)
Luis Castillo-Ossa (17020684)
Luis Humberto Lopez-Murillo (17020678)
Reinel Tabares-Soto (10153624)
Romain Guyot (166092)
Simon Orozco-Arias (12065342)
Publication venue
Publication date: 21/09/2023
Field of study

Internal structure and organization of LTR-retrotransposons in plants for: (A) Ty1/copia superfamily and (B) Ty3/gypsy superfamily. Depending on the position of the integrase (INT) domain, the element can be classified into the Ty1/copia or the Ty3/gypsy superfamily.

FigShare

Neural network architecture of YORO.

Author: Estiven Valencia-Castrillon (17020681)
Gustavo Isaza (4386541)
Johan S. Piña (10153621)
Luis Castillo-Ossa (17020684)
Luis Humberto Lopez-Murillo (17020678)
Reinel Tabares-Soto (10153624)
Romain Guyot (166092)
Simon Orozco-Arias (12065342)
Publication venue
Publication date: 21/09/2023
Field of study

Analysis of eukaryotic genomes requires the detection and classification of transposable elements (TEs), a crucial but complex and time-consuming task. To improve the performance of tools that accomplish these tasks, Machine Learning (ML) approaches that leverage computing resources, such as GPUs (Graphics Processing Units) and multiple CPU (Central Processing Unit) cores, have been adopted. However, until now, the use of ML techniques has mostly been limited to the classification of TEs. Herein, a detection-classification strategy (named YORO) based on convolutional neural networks is adapted from computer vision (YOLO) to genomics. This approach enables the detection of genomic objects through the prediction of their position, length, and classification in large DNA sequences such as fully sequenced genomes. As a proof of concept, the internal protein-coding domains of LTR-retrotransposons are used to train the proposed neural network. Precision, recall, accuracy, F1-score, execution times and time ratios, as well as several graphical representations, were used to measure performance. These promising results open the door for a new generation of Deep Learning tools for genomics. The YORO architecture is available at https://github.com/simonorozcoarias/YORO.
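The four classification metrics named in the abstract can be illustrated with a minimal sketch. It assumes that predictions have already been reduced to counts of true/false positives and negatives; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the YORO repository or the paper's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, accuracy and F1-score from confusion counts.

    Hypothetical helper for illustration; ratios with an empty denominator
    are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Made-up counts, used only to exercise the function.
metrics = classification_metrics(tp=80, fp=20, tn=90, fn=10)
```

How a predicted domain is matched to an annotated one (and therefore how the counts are obtained) is not described in this listing, so that step is left out of the sketch.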

FigShare

Average distance between domains for the Ty1/copia superfamily as observed in the analysis of 300 plant genomes [12] and in the YORO prediction.

Author: Estiven Valencia-Castrillon (17020681)
Gustavo Isaza (4386541)
Johan S. Piña (10153621)
Luis Castillo-Ossa (17020684)
Luis Humberto Lopez-Murillo (17020678)
Reinel Tabares-Soto (10153624)
Romain Guyot (166092)
Simon Orozco-Arias (12065342)
Publication venue
Publication date: 21/09/2023
Field of study

Average distance between domains for the Ty1/copia superfamily as observed in the analysis of 300 plant genomes [12] and in the YORO prediction.

FigShare

Tools and approaches that use ML or DL to analyze TEs.

Author: Estiven Valencia-Castrillon (17020681)
Gustavo Isaza (4386541)
Johan S. Piña (10153621)
Luis Castillo-Ossa (17020684)
Luis Humberto Lopez-Murillo (17020678)
Reinel Tabares-Soto (10153624)
Romain Guyot (166092)
Simon Orozco-Arias (12065342)
Publication venue
Publication date: 21/09/2023
Field of study

TIR-Learner uses a neural network, k-nearest neighbors, random forest, and AdaBoost for the ensemble method, while ClassifyTE uses k-nearest neighbors, extra trees, random forest, support vector machine, AdaBoost, logistic regression, gradient boosting, and XGBoost classifiers for the stacking method. Abbreviations: RFSB: random forest selective binary classifier, C: classification, D: detection, A: annotation, CL: curation of TE libraries, NI: novel insertions, TU: TransposonUltimate.

FigShare

Precision and recall for YORO predictions on the Oryza sativa ssp. indica genome versus its publicly available annotation [48].

Author: Estiven Valencia-Castrillon (17020681)
Gustavo Isaza (4386541)
Johan S. Piña (10153621)
Luis Castillo-Ossa (17020684)
Luis Humberto Lopez-Murillo (17020678)
Reinel Tabares-Soto (10153624)
Romain Guyot (166092)
Simon Orozco-Arias (12065342)
Publication venue
Publication date: 21/09/2023
Field of study

Precision and recall for YORO predictions on the Oryza sativa ssp. indica genome versus its publicly available annotation [48].

FigShare

Average distance between domains for the Ty3/gypsy superfamily as observed in the analysis of 300 plant genomes [12] and in the YORO prediction.

Author: Estiven Valencia-Castrillon (17020681)
Gustavo Isaza (4386541)
Johan S. Piña (10153621)
Luis Castillo-Ossa (17020684)
Luis Humberto Lopez-Murillo (17020678)
Reinel Tabares-Soto (10153624)
Romain Guyot (166092)
Simon Orozco-Arias (12065342)
Publication venue
Publication date: 21/09/2023
Field of study

Average distance between domains for the Ty3/gypsy superfamily as observed in the analysis of 300 plant genomes [12] and in the YORO prediction.

FigShare

Fig 5 -

Author: Estiven Valencia-Castrillon (17020681)
Gustavo Isaza (4386541)
Johan S. Piña (10153621)
Luis Castillo-Ossa (17020684)
Luis Humberto Lopez-Murillo (17020678)
Reinel Tabares-Soto (10153624)
Romain Guyot (166092)
Simon Orozco-Arias (12065342)
Publication venue
Publication date: 21/09/2023
Field of study

Nucleotide distances between domains reported by [12] for: (A) Ty3/gypsy and (B) Ty1/copia, and predicted by the Genomic Object Detection approach (through YORO) for: (C) Ty3/gypsy and (D) Ty1/copia.

FigShare

Structure of the labels encoding the domain information of the LTR-RT domains.

Author: Estiven Valencia-Castrillon (17020681)
Gustavo Isaza (4386541)
Johan S. Piña (10153621)
Luis Castillo-Ossa (17020684)
Luis Humberto Lopez-Murillo (17020678)
Reinel Tabares-Soto (10153624)
Romain Guyot (166092)
Simon Orozco-Arias (12065342)
Publication venue
Publication date: 21/09/2023
Field of study

The initial nucleotide sequence (upper section) is divided into 100 bp sections, each with 22 labels (lower section). The labels represent the following: (1) Detection and structure information: presence of a domain, starting position of the domain within the 100 bp section, and length; (2) Domain classification: the domain to which it belongs (in one-hot encoding, e.g. GAG = 1,0,0,0,0,0); (3) Lineage classification: the lineage to which the domain belongs (in one-hot encoding, e.g. Tork = 1,0,0,0,0,0,0,0,0,0,0,0,0). Both the domain start position and the length are normalized to values between 0 and 1. Following this approach, a neural network can learn to perform three different tasks at once (domain detection, domain classification, and lineage classification).
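The 22-label layout described in this caption (3 detection values, 6 domain classes, 13 lineage classes) can be sketched roughly as follows. This is an illustration under stated assumptions, not the YORO implementation: only "GAG" and "Tork" are named in the caption, so the remaining class names are placeholders; the caption does not say how length is normalized, so `MAX_DOMAIN_LEN` is a hypothetical constant; and assigning a domain to the section that contains its start position is likewise an assumption.

```python
SECTION_LEN = 100  # bp per section, as stated in the caption

# Only "GAG" and "Tork" appear in the caption; other names are placeholders.
DOMAINS = ["GAG"] + [f"domain_{i}" for i in range(2, 7)]      # 6 classes
LINEAGES = ["Tork"] + [f"lineage_{i}" for i in range(2, 14)]  # 13 classes
MAX_DOMAIN_LEN = 2000  # hypothetical constant used to scale length into [0, 1]


def one_hot(index, size):
    """Return a list of `size` zeros with a single 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def encode_domain(start, length, domain, lineage):
    """Return (section_index, 22-value label) for one annotated domain."""
    section, offset = divmod(start, SECTION_LEN)
    detection = [
        1.0,                                # presence of a domain
        offset / SECTION_LEN,               # start position within the section
        min(length / MAX_DOMAIN_LEN, 1.0),  # normalized length (assumed scaling)
    ]
    label = (detection
             + one_hot(DOMAINS.index(domain), len(DOMAINS))
             + one_hot(LINEAGES.index(lineage), len(LINEAGES)))
    return section, label


def encode_sequence(seq_len, annotations):
    """Build one 22-value label per 100 bp section of a sequence."""
    n_labels = 3 + len(DOMAINS) + len(LINEAGES)
    n_sections = -(-seq_len // SECTION_LEN)  # ceiling division
    labels = [[0.0] * n_labels for _ in range(n_sections)]
    for start, length, domain, lineage in annotations:
        section, label = encode_domain(start, length, domain, lineage)
        labels[section] = label
    return labels


# A 300 bp sequence with one GAG domain of the Tork lineage starting at 150.
labels = encode_sequence(300, [(150, 500, "GAG", "Tork")])
```

In this example the sequence yields three sections; only the second one carries a domain, with its start encoded as 0.5 (offset 50 of 100) and its length as 0.25 under the assumed scaling.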

FigShare