19 research outputs found

    Number of syntactic relations classified by age and label, recovered from each whole graph (hence, including the giant connected component and other smaller networks).


    Main features of the networks resulting from the analysis of corpora by means of Netlang.

    Analysis of the giant connected component of the graph: clustering coefficient (C), nodes (number of different words), edges (number of syntactic links), average number of edges per node (⟨k⟩), and characteristic path length (L). Age is typically written “years;months.days”, hence the bilingual child is 2 years, 1 month and 20 days old.
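The features listed in this caption can be illustrated with a minimal sketch in plain Python. It is not Netlang's implementation: the function names are hypothetical, the graph is assumed to be undirected, and C is assumed to be the average of the local clustering coefficients (the caption does not say which definition is used).

```python
from collections import deque

def giant_component(adj):
    """Return the node set of the largest connected component."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best

def network_features(edges):
    """Compute nodes, edges, <k>, C and L on the giant connected component.

    `edges` is an iterable of (word, word) pairs, treated as undirected
    syntactic links (an assumption of this sketch).
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    gcc = giant_component(adj)
    n = len(gcc)
    if n == 0:
        return {"nodes": 0, "edges": 0, "k": 0.0, "C": 0.0, "L": 0.0}
    sub = {u: adj[u] for u in gcc}
    m = sum(len(nb) for nb in sub.values()) // 2

    # Average local clustering coefficient (nodes of degree < 2 count as 0).
    total_c = 0.0
    for nb in sub.values():
        k = len(nb)
        if k < 2:
            continue
        links = sum(1 for a in nb for b in nb if a < b and b in sub[a])
        total_c += 2 * links / (k * (k - 1))

    # Characteristic path length: mean shortest-path distance over all pairs.
    total_d, pairs = 0, 0
    for s in sub:
        dist, queue = {s: 0}, deque([s])
        while queue:
            u = queue.popleft()
            for v in sub[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total_d += sum(dist.values())
        pairs += len(dist) - 1

    return {
        "nodes": n,
        "edges": m,
        "k": 2 * m / n,
        "C": total_c / n,
        "L": total_d / pairs if pairs else 0.0,
    }
```

For example, `network_features([("the", "dog"), ("dog", "runs"), ("the", "runs"), ("runs", "fast"), ("a", "cat")])` reports the four-word component and ignores the isolated pair, mirroring the caption's restriction to the giant connected component.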

    Hubs (or highly connected words) of the networks in three temporal periods, at 2.6, 3.1 and 7 years of the child’s life.


    Six networks reflecting the twins’ language ontogeny.

    The networks cover three different periods of the twins’ lives: at 2 years and 6 months, at 3 years and 1 month, and at 7 years. The child MH (files NAM), labeled (A), (B) and (C), had a focal lesion.

    The bilingual network.

    Lexical categories have been customized in order to reflect whether the word is English or Spanish. An additional third color has been selected for proper names. Syntactic relations are also reflected in the network.

    Structure of the labels encoding the domain information of the LTR-RT domains.

    The initial nucleotide sequence (upper section) is divided into 100 bp sections, each with 22 labels (lower section). The labels represent the following: (1) Detection and structure information: presence of a domain, starting position of the domain in the 100 bp section, and length; (2) Domain classification: to which domain it is related (in one-hot coding, e.g. GAG = 1,0,0,0,0,0); (3) Lineage classification: to which lineage the domain is related (in one-hot coding, e.g. Tork = 1,0,0,0,0,0,0,0,0,0,0,0,0). Both the domain start position and the length are normalized to values between 0 and 1. Following this approach, a neural network can learn to perform three different tasks at once (domain detection, domain classification, and lineage classification).
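The label layout described in this caption (3 detection values, 6 domain classes and 13 lineage classes, 22 in total) can be sketched as follows. This is an illustrative encoding, not YORO's code: the function names, the all-zero vector for empty sections, and the constant `MAX_DOMAIN_BP` used to normalize the length are assumptions, since the caption does not state how normalization is done.

```python
SECTION_BP = 100       # section size, from the caption
N_DOMAINS = 6          # length of the domain one-hot vector (e.g. GAG)
N_LINEAGES = 13        # length of the lineage one-hot vector (e.g. Tork)
MAX_DOMAIN_BP = 1000   # hypothetical upper bound used to scale lengths

def one_hot(index, size):
    """Return a one-hot list of `size` floats with a 1.0 at `index`."""
    if not 0 <= index < size:
        raise ValueError(f"index {index} out of range for size {size}")
    return [1.0 if i == index else 0.0 for i in range(size)]

def encode_section(start_bp=None, length_bp=0, domain_idx=None, lineage_idx=None):
    """Build the 22-value label vector for one 100 bp section.

    A section without a domain (start_bp is None) is encoded as all zeros,
    which is an assumption of this sketch.
    """
    if start_bp is None:
        return [0.0] * (3 + N_DOMAINS + N_LINEAGES)
    if not 0 <= start_bp < SECTION_BP:
        raise ValueError("start_bp must fall inside the section")
    detection = [
        1.0,                                   # presence of a domain
        start_bp / SECTION_BP,                 # start position scaled to [0, 1)
        min(length_bp / MAX_DOMAIN_BP, 1.0),   # length scaled to [0, 1]
    ]
    return detection + one_hot(domain_idx, N_DOMAINS) + one_hot(lineage_idx, N_LINEAGES)
```

With index 0 standing for GAG and for Tork, as in the caption's examples, `encode_section(start_bp=25, length_bp=500, domain_idx=0, lineage_idx=0)` yields a 22-value vector whose first three entries are the detection values and whose remaining entries are the two one-hot blocks.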

    Tools and approaches that used ML or DL to analyze TEs.

    TIR-Learner uses a neural network, k-nearest neighbors, random forest, and AdaBoost for the ensemble method, while ClassifyTE uses k-nearest neighbors, extra trees, random forest, support vector machine, AdaBoost, logistic regression, Gradient Boosting Classifiers and XGBoost Classifier for the stacking method. Abbreviations: RFSB: random forest selective binary classifier, C: classification, D: detection, A: annotation, CL: curation of TE libraries, NI: novel insertions, TU: TransposonUltimate.

    Neural network architecture of YORO.

    Analysis of eukaryotic genomes requires the detection and classification of transposable elements (TEs), a crucial but complex and time-consuming task. To improve the performance of tools that accomplish these tasks, Machine Learning (ML) approaches that leverage computer resources, such as GPUs (Graphics Processing Units) and multiple CPU (Central Processing Unit) cores, have been adopted. However, until now, the use of ML techniques has mostly been limited to the classification of TEs. Herein, a detection-classification strategy (named YORO) based on convolutional neural networks is adapted from computer vision (YOLO) to genomics. This approach enables the detection of genomic objects through the prediction of their position, length, and classification in large DNA sequences such as fully sequenced genomes. As a proof of concept, the internal protein-coding domains of LTR-retrotransposons are used to train the proposed neural network. Precision, recall, accuracy, F1-score, execution times and time ratios, as well as several graphical representations, were used to measure performance. These promising results open the door for a new generation of Deep Learning tools for genomics. The YORO architecture is available at https://github.com/simonorozcoarias/YORO.

    YORO’s performance in detecting the internal part of LTR-retrotransposons, grouped by at least three domains with a maximum distance of 3,000 bp, according to the Genomic Object Detection approach.

    (A) Precision-Recall curve with TP (True Positive), TN (True Negative), FP (False Positive) and FN (False Negative) defined on a nucleotide basis, for clusters with a minimum of three domains and a maximum separation of 3,000 bp. (B) Parity plot for the start positions of the clusters with a minimum of three domains and a maximum separation of 3,000 bp. (C) Visualization of the clusters with a minimum of three domains and a maximum separation of 3,000 bp in the 50,000-bp window. The top section corresponds to the predictions by YORO. The bottom section corresponds to the real label. There is a false negative. (D) Visualization of the clusters with a minimum of three domains and a maximum separation of 3,000 bp in the 50,000-bp window. The upper section corresponds to the predictions by YORO. The lower section corresponds to the real label.
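The grouping rule and the nucleotide-level counts mentioned in this caption can be sketched as below. This is a hedged illustration rather than YORO's implementation: the function names are hypothetical, domains are assumed to be half-open `(start, end)` intervals, and the separation is assumed to be measured from the end of the running cluster to the start of the next domain.

```python
def cluster_domains(domains, max_gap=3000, min_domains=3):
    """Group domains separated by at most `max_gap` bp.

    Only groups with at least `min_domains` members are returned,
    each as a single (start, end) interval.
    """
    clusters, group, group_end = [], [], None
    for start, end in sorted(domains):
        if group and start - group_end > max_gap:
            # Gap too large: close the current group and start a new one.
            if len(group) >= min_domains:
                clusters.append((group[0][0], group_end))
            group = []
        group.append((start, end))
        group_end = end if len(group) == 1 else max(group_end, end)
    if len(group) >= min_domains:
        clusters.append((group[0][0], group_end))
    return clusters

def covered(intervals, window):
    """Return the set of nucleotide positions inside the window covered by intervals."""
    return {p for s, e in intervals for p in range(max(s, 0), min(e, window))}

def nucleotide_metrics(predicted, actual, window=50_000):
    """Count TP, FP, FN and TN per nucleotide and derive precision, recall and F1."""
    pred, real = covered(predicted, window), covered(actual, window)
    tp = len(pred & real)
    fp = len(pred - real)
    fn = len(real - pred)
    tn = window - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn,
            "precision": precision, "recall": recall, "F1": f1}
```

As a usage example with made-up coordinates, five domains at `(100, 400)`, `(1000, 1500)`, `(3000, 3600)`, `(20000, 20500)` and `(21000, 21400)` produce one cluster: the first three are chained by gaps under 3,000 bp, while the last two form a group that is too small and is discarded.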