    Comparative overview of the evaluation results – with and without domain-specific dictionaries.

    This comparative overview shows the difference in performance between all aggregation techniques. We can see that the difference between the best performing individual classifier and the best aggregation technique – the voting mechanism – is almost 1%.

    Evaluation results for the voting aggregation technique – with and without domain-specific dictionaries.

    The results of the voting method are in line with those of the other aggregation methods. The highest score (71.52%/71.06%) is achieved by using MALLET as veto owner.
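    The text does not spell out the exact veto semantics, so the following is a minimal sketch of majority voting with a veto owner, assuming a span is accepted when more than half of the classifiers predict it and that the veto owner can strike any span it did not itself predict. The classifier names other than MALLET are placeholders.

        from collections import Counter

        def vote(predictions, veto_owner):
            """Majority-vote aggregation of entity spans with a veto owner.

            predictions: dict mapping classifier name -> set of (start, end) spans.
            """
            counts = Counter()
            for spans in predictions.values():
                counts.update(spans)
            # A span is kept when more than half of the classifiers predict it...
            majority = {s for s, c in counts.items() if c > len(predictions) / 2}
            # ...unless the veto owner disagrees (assumed veto semantics).
            return majority & predictions[veto_owner]

        accepted = vote(
            {"MALLET": {(0, 4), (7, 9)}, "ClassifierB": {(0, 4)}, "ClassifierC": {(7, 9)}},
            veto_owner="MALLET",
        )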

    Evaluation results for the simple set aggregation technique – with and without domain-specific dictionaries.

    The best scoring direct set operations are those that include MALLET in their composition, which is in line with the individual classification results. The italicised results demonstrate the effect of the set operations: union increases recall by almost 13%, while intersection increases precision by around 12%.
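    A short sketch of why the two operations pull the metrics in opposite directions, using exact-match spans; the helpers below are illustrative, not the paper's implementation.

        def precision_recall(predicted, gold):
            """Exact-match precision and recall over sets of (start, end) spans."""
            tp = len(predicted & gold)
            precision = tp / len(predicted) if predicted else 0.0
            recall = tp / len(gold) if gold else 0.0
            return precision, recall

        def paired_set_aggregation(spans_a, spans_b):
            # Union admits every span either classifier found, so recall rises
            # (more gold spans are covered) while precision tends to drop;
            # intersection keeps only agreed spans, so precision rises instead.
            return spans_a | spans_b, spans_a & spans_b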

    Evaluation results for MALLET ten-fold cross validation with leave-one-out feature.

    This overview shows the individual importance of each feature in the overall classification model. The large majority of features have very little impact on the model, i.e., leaving any one of them out decreases performance by only 1–2%. The only two features that make a difference are the Prefix and the token context (Token_Bi3), which affect the overall performance by almost 15%.
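    The leave-one-out protocol itself is straightforward; a sketch follows, assuming a hypothetical train_and_eval(features) helper that runs MALLET's ten-fold cross-validation on the given feature set and returns an F1 score.

        def leave_one_out_impact(features, train_and_eval):
            """Score each feature by the F1 drop observed when it is left out."""
            baseline = train_and_eval(features)
            impact = {}
            for feature in features:
                reduced = [f for f in features if f != feature]
                impact[feature] = baseline - train_and_eval(reduced)
            # Prefix and Token_Bi3 would stand out here with drops of almost
            # 15%, while most other features cost only 1-2%.
            return impact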

    Evaluation results for MALLET ten-fold cross validation using single features.

    The graph groups the features according to the categories used to describe them in the Materials and Methods section. We can observe that the simple and morphological features perform best, with the Prefix feature achieving an F1 score of 66.22%. Among the token context features, token bigrams with a window of 3 provide the best configuration (almost 30% F1). Dictionary-based features, both generic and domain-specific, perform poorly, which is associated with their lack of discriminative power.
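    For reference, the F1 scores quoted throughout are the harmonic mean of precision and recall:

        def f1(precision, recall):
            """F1: the harmonic mean of precision and recall."""
            if precision + recall == 0.0:
                return 0.0
            return 2 * precision * recall / (precision + recall)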

    Evaluation results for individual classifiers – with and without domain-specific dictionaries.

    We can see that MALLET consistently outperforms all the other approaches, with a margin of almost 5% when no dictionaries are used and almost 3% when domain-specific dictionaries are used. The surprising aspect is the decrease in performance when using dictionaries, as opposed to the setting that omits them.

    Evaluation results for the paired set aggregation technique – with and without domain-specific dictionaries.


    Statistics of the phenotype descriptions corpus.

    The corpus used for training the classifiers was manually compiled from 395 randomly selected publications from three different academic journals. It consists of 1,194 image captions containing 5,423 phenotype descriptions. The total number of tokens in the corpus is 64,052, with an average of 5 tokens per phenotype description. The longest phenotype description comprises 31 tokens, while the shortest consists of a single token.
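    The figures above follow directly from per-description token counts; a sketch, assuming the corpus is available as one token list per phenotype description:

        def corpus_stats(descriptions):
            """descriptions: list of token lists, one per phenotype description."""
            lengths = [len(tokens) for tokens in descriptions]
            return {
                "descriptions": len(lengths),
                "total_tokens": sum(lengths),
                "avg_tokens": sum(lengths) / len(lengths),
                "longest": max(lengths),
                "shortest": min(lengths),
            }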

    Relative distribution of dysplasia diagnoses according to different ranges of number of cases.

    More than 70% of the bone dysplasias present in the ESDN dataset have a very small number of cases (up to 5), while those that are well represented (i.e., over 50 cases) account for a mere 4% of the total.
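    The distribution can be reproduced by bucketing diagnoses by case count. Only the "up to 5" and "over 50" boundaries appear in the text; the middle ranges in this sketch are illustrative assumptions.

        from collections import Counter

        def case_distribution(cases_per_diagnosis,
                              buckets=((1, 5), (6, 20), (21, 50), (51, None))):
            """Relative share of diagnoses falling into each case-count range."""
            counts = Counter()
            for n in cases_per_diagnosis.values():
                for lo, hi in buckets:
                    if lo <= n and (hi is None or n <= hi):
                        counts[(lo, hi)] += 1
                        break
            total = sum(counts.values()) or 1
            return {bucket: counts[bucket] / total for bucket in buckets}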

    Experimental results: Overall comparative accuracy across all considered approaches.

    Our solution outperforms the five Machine Learning approaches we considered in our experiments: around 4% higher accuracy than Naive Bayes and around 6% higher than SVM. Although Naive Bayes performed very well, its results are boosted by overfitting the classes that had more data at the expense of others, for at least one of which it achieved 0 precision and recall. Unlike Naive Bayes, our approach performed fairly uniformly and consistently across all classes.
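    Majority-class overfitting of the kind described is easiest to spot with a per-class breakdown rather than overall accuracy; a minimal sketch:

        def per_class_report(gold, predicted, labels):
            """Per-class precision and recall from parallel label sequences."""
            report = {}
            for label in labels:
                tp = sum(1 for g, p in zip(gold, predicted) if g == p == label)
                fp = sum(1 for g, p in zip(gold, predicted) if p == label and g != label)
                fn = sum(1 for g, p in zip(gold, predicted) if g == label and p != label)
                report[label] = (
                    tp / (tp + fp) if tp + fp else 0.0,  # precision
                    tp / (tp + fn) if tp + fn else 0.0,  # recall
                )
            return report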