19 research outputs found

Number of syntactic relations classified by age and label, recovered from each whole graph (hence, including the giant connected component and other smaller networks).

Author: Diego Serna Salazar (4386544)
Gustavo Isaza (4386541)
Lluís Barceló-Coblijn (4386535)
Luis F. Castillo Ossa (4386538)
Manuel G. Bedia (694599)

FigShare

Main features of the networks as a result from the analysis of corpora by means of Netlang.

Author: Diego Serna Salazar (4386544)
Gustavo Isaza (4386541)
Lluís Barceló-Coblijn (4386535)
Luis F. Castillo Ossa (4386538)
Manuel G. Bedia (694599)

Analysis of the giant connected component of the graph: C, the clustering coefficient; Nodes, the number of different words; Edges, the number of syntactic links; <k>, the average number of edges per node; L, the characteristic path length. Age is typically written “years;months.days”, hence the bilingual child is 2 years, 1 month and 20 days old.
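
The features listed above can be illustrated with a minimal Python sketch. It is not Netlang's implementation: the function names, the toy word graph, and the choice of the average local clustering coefficient are assumptions made for illustration only.

```python
from collections import deque
from itertools import combinations


def giant_component(adj):
    """Return the node set of the largest connected component (BFS)."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in comp:
                    comp.add(nb)
                    queue.append(nb)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


def network_features(edges):
    """Compute Nodes, Edges, <k>, C and L on the giant connected component."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    if not adj:
        raise ValueError("the graph has no edges")
    gcc = giant_component(adj)
    sub = {n: adj[n] for n in gcc}  # neighbours of a GCC node are in the GCC
    n_nodes = len(sub)
    n_edges = sum(len(nbs) for nbs in sub.values()) // 2
    avg_degree = 2 * n_edges / n_nodes  # <k>

    # C: average of the local clustering coefficients (assumed definition).
    local = []
    for nbs in sub.values():
        k = len(nbs)
        if k < 2:
            local.append(0.0)
            continue
        links = sum(1 for u, v in combinations(nbs, 2) if v in sub[u])
        local.append(2 * links / (k * (k - 1)))
    clustering = sum(local) / n_nodes

    # L: mean shortest-path length over all ordered node pairs.
    total, pairs = 0, 0
    for src in sub:
        dist, queue = {src: 0}, deque([src])
        while queue:
            node = queue.popleft()
            for nb in sub[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1
    path_length = total / pairs if pairs else 0.0

    return {"nodes": n_nodes, "edges": n_edges, "avg_degree": avg_degree,
            "clustering": clustering, "path_length": path_length}


# Hypothetical toy graph: words as nodes, syntactic links as edges.
toy_edges = [("the", "dog"), ("dog", "runs"), ("the", "runs"),
             ("runs", "fast"), ("a", "cat")]
features = network_features(toy_edges)
```

In the toy graph, the pair "a" and "cat" forms a smaller network and is excluded, so the features describe only the four-word giant component.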

FigShare

Hubs (or highly connected words) of the networks in three temporal periods, at 2.6, 3.1 and 7 years of the child’s life.

Author: Diego Serna Salazar (4386544)
Gustavo Isaza (4386541)
Lluís Barceló-Coblijn (4386535)
Luis F. Castillo Ossa (4386538)
Manuel G. Bedia (694599)

FigShare

Six networks reflecting the twins’ language ontogeny.

Author: Diego Serna Salazar (4386544)
Gustavo Isaza (4386541)
Lluís Barceló-Coblijn (4386535)
Luis F. Castillo Ossa (4386538)
Manuel G. Bedia (694599)

The networks correspond to three different periods of the twins’ lives: at 2 years and 6 months, at 3 years and 1 month, and at 7 years. The child MH (files NAM), shown in panels (A), (B) and (C), had a focal lesion.

FigShare

The bilingual network.

Author: Diego Serna Salazar (4386544)
Gustavo Isaza (4386541)
Lluís Barceló-Coblijn (4386535)
Luis F. Castillo Ossa (4386538)
Manuel G. Bedia (694599)

Lexical categories have been customized in order to reflect whether the word is English or Spanish. An additional third color has been selected for proper names. Syntactic relations are also reflected in the network.

FigShare

Structure of the labels encoding the domain information of the LTR-RT domains.

Author: Estiven Valencia-Castrillon (17020681)
Gustavo Isaza (4386541)
Johan S. Piña (10153621)
Luis Castillo-Ossa (17020684)
Luis Humberto Lopez-Murillo (17020678)
Reinel Tabares-Soto (10153624)
Romain Guyot (166092)
Simon Orozco-Arias (12065342)
Publication date: 21/09/2023

The initial nucleotide sequence (upper section) is divided into 100 bp sections, each one with 22 labels (lower section). Those labels represent the following: (1) Detection and structure information: presence of a domain, starting position of the domain in the 100 bp section, and length; (2) Domain classification: to which domain it is related (in one-hot coding, e.g. GAG = 1,0,0,0,0,0); (3) Lineage classification: to which lineage the domain is related (in one-hot coding, e.g. Tork = 1,0,0,0,0,0,0,0,0,0,0,0,0). Both the domain start position and the length are normalized to values between 0 and 1. Following this approach, a neural network can learn how to do three different tasks at once (domain detection, domain classification, and lineage classification).
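
The 22-label layout described above (3 detection values, a 6-way domain one-hot vector and a 13-way lineage one-hot vector) can be sketched in Python as follows. This is an illustration, not YORO's code: the function names are hypothetical, and the `max_length_bp` normalization constant is an assumption, since the caption states only that the length is scaled to a value between 0 and 1.

```python
SECTION_BP = 100   # section size stated in the caption
N_DOMAINS = 6      # length of the domain one-hot vector in the caption
N_LINEAGES = 13    # length of the lineage one-hot vector in the caption
N_LABELS = 3 + N_DOMAINS + N_LINEAGES  # 22 labels per section


def one_hot(index, size):
    """Return a list of `size` zeros with a single 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def encode_section(start_in_section, length_bp, domain_idx, lineage_idx,
                   max_length_bp):
    """Encode one 100 bp section that contains the start of a domain.

    `max_length_bp` is an assumed normalization constant (hypothetical);
    the source does not say how the length is scaled.
    """
    presence = 1.0
    start = start_in_section / SECTION_BP          # normalized to [0, 1)
    length = min(length_bp / max_length_bp, 1.0)   # normalized to [0, 1]
    return ([presence, start, length]
            + one_hot(domain_idx, N_DOMAINS)
            + one_hot(lineage_idx, N_LINEAGES))


def empty_section():
    """Encode a 100 bp section in which no domain starts (all zeros)."""
    return [0.0] * N_LABELS


# Index 0 is GAG (domain) and Tork (lineage), as in the caption's examples.
# The position, length and max_length_bp values are made up.
labels = encode_section(start_in_section=25, length_bp=450,
                        domain_idx=0, lineage_idx=0, max_length_bp=1000)
```

Because every section yields a vector of the same size, a single network output per section can carry detection, domain class and lineage class together.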

FigShare

Tools and approaches that used ML or DL approaches to analyze TEs.

Author: Estiven Valencia-Castrillon (17020681)
Gustavo Isaza (4386541)
Johan S. Piña (10153621)
Luis Castillo-Ossa (17020684)
Luis Humberto Lopez-Murillo (17020678)
Reinel Tabares-Soto (10153624)
Romain Guyot (166092)
Simon Orozco-Arias (12065342)
Publication date: 21/09/2023

TIR-Learner uses a neural network, k-nearest neighbors, random forest, and AdaBoost for the ensemble method, while ClassifyTE uses k-nearest neighbors, extra trees, random forest, support vector machine, AdaBoost, logistic regression, Gradient Boosting Classifiers and XGBoost Classifier for the stacking method. Abbreviations: RFSB: Random forest selective binary classifier, C: Classification, D: Detection, A: Annotation, CL: Curation of TE libraries, NI: Novel insertions, TU: TransposonUltimate.

FigShare

Neural network architecture of YORO.

Author: Estiven Valencia-Castrillon (17020681)
Gustavo Isaza (4386541)
Johan S. Piña (10153621)
Luis Castillo-Ossa (17020684)
Luis Humberto Lopez-Murillo (17020678)
Reinel Tabares-Soto (10153624)
Romain Guyot (166092)
Simon Orozco-Arias (12065342)
Publication date: 21/09/2023

Analysis of eukaryotic genomes requires the detection and classification of transposable elements (TEs), a crucial but complex and time-consuming task. To improve the performance of tools that accomplish these tasks, Machine Learning (ML) approaches that leverage computer resources, such as GPUs (Graphics Processing Units) and multiple CPU (Central Processing Unit) cores, have been adopted. However, until now, the use of ML techniques has mostly been limited to classification of TEs. Herein, a detection-classification strategy (named YORO) based on convolutional neural networks is adapted from computer vision (YOLO) to genomics. This approach enables the detection of genomic objects through the prediction of the position, length, and classification in large DNA sequences such as fully sequenced genomes. As a proof of concept, the internal protein-coding domains of LTR-retrotransposons are used to train the proposed neural network. Precision, recall, accuracy, F1-score, execution times and time ratios, as well as several graphical representations, were used as metrics to measure performance. These promising results open the door for a new generation of Deep Learning tools for genomics. YORO architecture is available at https://github.com/simonorozcoarias/YORO.

FigShare

Average distance between domains for the Ty1/copia superfamily as observed in the analysis of 300 plant genomes [12] and in the YORO prediction.

Author: Estiven Valencia-Castrillon (17020681)
Gustavo Isaza (4386541)
Johan S. Piña (10153621)
Luis Castillo-Ossa (17020684)
Luis Humberto Lopez-Murillo (17020678)
Reinel Tabares-Soto (10153624)
Romain Guyot (166092)
Simon Orozco-Arias (12065342)
Publication date: 21/09/2023

FigShare

YORO’s performance in detecting LTR-retrotransposon’s internal part grouped by at least three domains with a maximum distance of 3,000 bp, according to the Genomic Object Detection approach.

Author: Estiven Valencia-Castrillon (17020681)
Gustavo Isaza (4386541)
Johan S. Piña (10153621)
Luis Castillo-Ossa (17020684)
Luis Humberto Lopez-Murillo (17020678)
Reinel Tabares-Soto (10153624)
Romain Guyot (166092)
Simon Orozco-Arias (12065342)
Publication date: 21/09/2023

(A) Precision-Recall curve, with TP (True Positive), TN (True Negative), FP (False Positive) and FN (False Negative) defined on a nucleotide basis, for clusters with a minimum of three domains and a maximum separation of 3,000 bp. (B) Parity plot for the positions of the beginning of the clusters with a minimum of three domains and a maximum separation of 3,000 bp. (C) Visualization of the clusters with a minimum of three domains and a maximum separation of 3,000 bp, in the 50,000-bp window. The top section corresponds to the predictions by YORO. The bottom section corresponds to the real label. There is a false negative. (D) Visualization of the clusters with a minimum of three domains and a maximum separation of 3,000 bp, in the 50,000-bp window. The upper section corresponds to the predictions by YORO. The lower section corresponds to the real label.
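
The grouping rule and the nucleotide-based counts mentioned above can be sketched in Python as follows. This is a hedged illustration rather than YORO's evaluation code: the function names and coordinates are made up, and measuring the separation as the gap between the end of one domain and the start of the next is an assumption the caption does not confirm.

```python
def cluster_domains(domains, max_gap=3000, min_domains=3):
    """Group (start, end) domains into clusters.

    Consecutive domains separated by at most `max_gap` bp share a cluster;
    clusters with fewer than `min_domains` domains are discarded.
    Intervals are half-open: `end` is not included.
    """
    clusters = []
    group_start, group_end, count = None, None, 0
    for start, end in sorted(domains):
        if count and start - group_end > max_gap:
            # The gap is too large: close the current group.
            if count >= min_domains:
                clusters.append((group_start, group_end))
            count = 0
        if count == 0:
            group_start, group_end = start, end
        else:
            group_end = max(group_end, end)
        count += 1
    if count >= min_domains:
        clusters.append((group_start, group_end))
    return clusters


def covered(intervals):
    """Return the set of nucleotide positions covered by the intervals."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def nucleotide_metrics(predicted, truth):
    """Count TP, FP and FN per nucleotide, then derive precision and recall."""
    pred, true = covered(predicted), covered(truth)
    tp, fp, fn = len(pred & true), len(pred - true), len(true - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Hypothetical domain coordinates inside a 50,000-bp window.
true_domains = [(1000, 1500), (2000, 2600), (4000, 4800), (20000, 20500)]
pred_domains = [(1100, 1500), (2000, 2600), (4000, 5000)]

true_clusters = cluster_domains(true_domains)
pred_clusters = cluster_domains(pred_domains)
metrics = nucleotide_metrics(pred_clusters, true_clusters)
```

In this toy input, the isolated domain at position 20,000 is dropped because it does not reach the three-domain minimum, so only one labeled cluster is compared with the prediction.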

FigShare