    Comparative overview of the evaluation results – with and without domain-specific dictionaries.

    This comparative overview shows the difference in performance between all aggregation techniques. We can see that the difference between the best performing individual classifier and the best aggregation technique – the voting mechanism – is almost 1%.

    Evaluation results for the voting aggregation technique – with and without domain-specific dictionaries.

    The results of the voting method are in line with those of the other aggregation methods. The highest score (71.52%/71.06%) is achieved by using MALLET as veto owner.
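    The text does not spell out the exact veto semantics, so the following is a minimal sketch of majority voting with a veto owner, assuming a span is accepted when more than half of the classifiers predict it and that the veto owner can strike any span it did not itself predict. The classifier names other than MALLET are placeholders.

        from collections import Counter

        def vote(predictions, veto_owner):
            """Majority-vote aggregation of entity spans with a veto owner.

            predictions: dict mapping classifier name -> set of (start, end) spans.
            """
            counts = Counter()
            for spans in predictions.values():
                counts.update(spans)
            # A span is kept when more than half of the classifiers predict it...
            majority = {s for s, c in counts.items() if c > len(predictions) / 2}
            # ...unless the veto owner disagrees (assumed veto semantics).
            return majority & predictions[veto_owner]

        accepted = vote(
            {"MALLET": {(0, 4), (7, 9)}, "ClassifierB": {(0, 4)}, "ClassifierC": {(7, 9)}},
            veto_owner="MALLET",
        )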

    Evaluation results for the simple set aggregation technique – with and without domain-specific dictionaries.

    The best scoring direct set operations are those that include MALLET in their composition, which is in line with the individual classification results. The italicised results demonstrate the effect of the set operations: union increases recall by almost 13%, while intersection increases precision by around 12%.
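    A short sketch of why the two operations pull the metrics in opposite directions, using exact-match spans; the helpers below are illustrative, not the paper's implementation.

        def precision_recall(predicted, gold):
            """Exact-match precision and recall over sets of (start, end) spans."""
            tp = len(predicted & gold)
            precision = tp / len(predicted) if predicted else 0.0
            recall = tp / len(gold) if gold else 0.0
            return precision, recall

        def paired_set_aggregation(spans_a, spans_b):
            # Union admits every span either classifier found, so recall rises
            # (more gold spans are covered) while precision tends to drop;
            # intersection keeps only agreed spans, so precision rises instead.
            return spans_a | spans_b, spans_a & spans_b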

    Evaluation results for MALLET ten-fold cross validation with leave-one-out feature.

    This overview shows the individual importance of each feature in the overall classification model. The large majority of features have very little impact on the model, i.e., leaving any one of them out decreases performance by only 1–2%. The only two features that make a difference are the Prefix and the token context (Token_Bi3), which affect the overall performance by almost 15%.
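    The leave-one-out protocol itself is straightforward; a sketch follows, assuming a hypothetical train_and_eval(features) helper that runs MALLET's ten-fold cross-validation on the given feature set and returns an F1 score.

        def leave_one_out_impact(features, train_and_eval):
            """Score each feature by the F1 drop observed when it is left out."""
            baseline = train_and_eval(features)
            impact = {}
            for feature in features:
                reduced = [f for f in features if f != feature]
                impact[feature] = baseline - train_and_eval(reduced)
            # Prefix and Token_Bi3 would stand out here with drops of almost
            # 15%, while most other features cost only 1-2%.
            return impact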

    Evaluation results for MALLET ten-fold cross validation using single features.

    The graph groups the features according to the categories used to describe them in the Materials and Methods section. We can observe that the simple and morphological features perform best, with the Prefix feature achieving an F1 score of 66.22%. Among the token context features, token bigrams with a window of 3 provide the best configuration (almost 30% F1). Dictionary-based features, both generic and domain-specific, perform poorly, which is associated with their lack of discriminative power.
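    For reference, the F1 scores quoted throughout are the harmonic mean of precision and recall:

        def f1(precision, recall):
            """F1: the harmonic mean of precision and recall."""
            if precision + recall == 0.0:
                return 0.0
            return 2 * precision * recall / (precision + recall)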

    Evaluation results for individual classifiers – with and without domain-specific dictionaries.

    We can see that MALLET consistently outperforms all the other approaches, with a margin of almost 5% when no dictionaries are used and almost 3% when domain-specific dictionaries are used. The surprising aspect is the decrease in performance when using dictionaries, as opposed to the setting that omits them.

    Evaluation results for the paired set aggregation technique – with and without domain-specific dictionaries.


    Statistics of the phenotype descriptions corpus.

    The corpus used for training the classifiers was manually compiled from 395 randomly selected publications from three different academic journals. It consists of 1,194 image captions containing 5,423 phenotype descriptions. The total number of tokens in the corpus is 64,052, with an average of 5 tokens per phenotype description. The longest phenotype description comprises 31 tokens, while the shortest consists of a single token.
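    The figures above follow directly from per-description token counts; a sketch, assuming the corpus is available as one token list per phenotype description:

        def corpus_stats(descriptions):
            """descriptions: list of token lists, one per phenotype description."""
            lengths = [len(tokens) for tokens in descriptions]
            return {
                "descriptions": len(lengths),
                "total_tokens": sum(lengths),
                "avg_tokens": sum(lengths) / len(lengths),
                "longest": max(lengths),
                "shortest": min(lengths),
            }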

    Relative distribution of dysplasia diagnoses according to different ranges of number of cases.

    More than 70% of the bone dysplasias present in the ESDN dataset have a very small number of cases (up to 5), while those that are well represented (i.e., over 50 cases) account for a mere 4% of the total.
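    The distribution can be reproduced by bucketing diagnoses by case count. Only the "up to 5" and "over 50" boundaries appear in the text; the middle ranges in this sketch are illustrative assumptions.

        from collections import Counter

        def case_distribution(cases_per_diagnosis,
                              buckets=((1, 5), (6, 20), (21, 50), (51, None))):
            """Relative share of diagnoses falling into each case-count range."""
            counts = Counter()
            for n in cases_per_diagnosis.values():
                for lo, hi in buckets:
                    if lo <= n and (hi is None or n <= hi):
                        counts[(lo, hi)] += 1
                        break
            total = sum(counts.values()) or 1
            return {bucket: counts[bucket] / total for bucket in buckets}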

    Experimental results: Overall comparative accuracy across all considered approaches.

    Our solution outperforms the five Machine Learning approaches we considered in our experiments: around 4% higher accuracy than Naive Bayes and around 6% higher than SVM. Although Naive Bayes performed very well, its results are boosted by overfitting the classes that had more data at the expense of others, for at least one of which it achieved 0 precision and recall. Unlike Naive Bayes, our approach performed fairly uniformly and consistently across all classes.
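    Majority-class overfitting of the kind described is easiest to spot with a per-class breakdown rather than overall accuracy; a minimal sketch:

        def per_class_report(gold, predicted, labels):
            """Per-class precision and recall from parallel label sequences."""
            report = {}
            for label in labels:
                tp = sum(1 for g, p in zip(gold, predicted) if g == p == label)
                fp = sum(1 for g, p in zip(gold, predicted) if p == label and g != label)
                fn = sum(1 for g, p in zip(gold, predicted) if g == label and p != label)
                report[label] = (
                    tp / (tp + fp) if tp + fp else 0.0,  # precision
                    tp / (tp + fn) if tp + fn else 0.0,  # recall
                )
            return report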