
    Meaning of each feature of the dataset.

    We reported a detailed description of each feature in the Supplementary Information.

    Results of the computational predictions of patient diagnosis on the complete dataset.

    Matthews correlation coefficient (MCC): Eq 3. Accuracy: Eq 1. F1 score: Eq 4. Sensitivity (true positive rate): Eq 5. Specificity (true negative rate): Eq 6. The scores are the medians over ten separate program executions. We report the results of applying each method to all the dataset features, plus the results of the decision tree applied only to the two selected features: the row entitled “Decision tree (applied only to lung side & platelet count)”. Dataset imbalance: 29.63% positive data instances (all the 96 mesothelioma patients) and 70.37% negative data instances (all the 228 non-mesothelioma patients).
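    The five scores above can be computed directly from the confusion-matrix counts. A minimal sketch, assuming the standard definitions of these metrics (the equation numbers refer to the paper; the counts passed in below are illustrative, not the paper's results):

    ```python
    # Evaluation metrics from confusion-matrix counts:
    # tp = true positives, tn = true negatives, fp = false positives, fn = false negatives.
    def metrics(tp, tn, fp, fn):
        accuracy = (tp + tn) / (tp + tn + fp + fn)                    # Eq 1
        mcc_den = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
        mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0       # Eq 3
        f1 = 2 * tp / (2 * tp + fp + fn)                              # Eq 4
        sensitivity = tp / (tp + fn)                                  # Eq 5 (true positive rate)
        specificity = tn / (tn + fp)                                  # Eq 6 (true negative rate)
        return mcc, accuracy, f1, sensitivity, specificity

    # Illustrative counts only (not taken from the paper's tables):
    print(metrics(90, 220, 8, 6))
    ```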

    Strip plot of platelet count (PLT) by lung side.

    We exclude one outlier on the X axis with 3,335 platelets/microliter. Vertical blue dotted line: lower boundary of the platelet count normality test.

    Dataset features with ranges and measurement units.

    We removed “diagnosis method” from the classification and feature selection phases, because it has the same values as the “class of diagnosis” target we predict. We changed some feature names for clarity: “blood lactic dehydrogenise (LDH)” into “lactate dehydrogenase test”, “cell count (WBC)” into “white blood cells (WBC)”, “cytology” into “cytology exam of pleural fluid”, “hemoglobin (HGB)” into “hemoglobin normality test”, “keep side” into “lung side”, “pleural glucose” into “pleural fluid glucose”, and “white blood” into “pleural fluid WBC count”.
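    The removal and renaming steps above can be expressed as a small data-preparation sketch. All names come from the caption; the `clean_feature_names` helper is hypothetical, not code from the paper:

    ```python
    # Feature-name changes listed in the caption; "diagnosis method" is dropped
    # because it duplicates the "class of diagnosis" target.
    RENAMES = {
        "blood lactic dehydrogenise (LDH)": "lactate dehydrogenase test",
        "cell count (WBC)": "white blood cells (WBC)",
        "cytology": "cytology exam of pleural fluid",
        "hemoglobin (HGB)": "hemoglobin normality test",
        "keep side": "lung side",
        "pleural glucose": "pleural fluid glucose",
        "white blood": "pleural fluid WBC count",
    }

    def clean_feature_names(features):
        # Drop the leaky feature, then apply the renames
        return [RENAMES.get(f, f) for f in features if f != "diagnosis method"]

    print(clean_feature_names(["keep side", "diagnosis method", "age"]))
    # ['lung side', 'age']
    ```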

    Architecture of a multi-layer perceptron-based neural network.

    In our model, the input layer has 33 neurons. We found a different optimized number of hidden layers and hidden units for each program execution. The top architecture among the ten executions had one hidden layer with 20 hidden units.
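    A minimal forward pass matching the best architecture found (33 inputs, one hidden layer of 20 units, one output). This is a sketch only: the weights are random placeholders and the tanh/sigmoid activations are assumptions, not the paper's trained model:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(20, 33)), np.zeros(20)   # input -> hidden (20 units)
    W2, b2 = rng.normal(size=(1, 20)), np.zeros(1)     # hidden -> output (1 unit)

    def mlp_forward(x):
        h = np.tanh(W1 @ x + b1)                       # hidden layer activations
        return 1 / (1 + np.exp(-(W2 @ h + b2)))        # sigmoid: diagnosis probability

    x = rng.normal(size=33)                            # one placeholder patient vector
    print(mlp_forward(x).shape)                        # (1,)
    ```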

    Results of the computational predictions of patient diagnosis, after under-sampling.

    Matthews correlation coefficient (MCC): Eq 3. Accuracy: Eq 1. F1 score: Eq 4. Sensitivity (true positive rate): Eq 5. Specificity (true negative rate): Eq 6. The scores are the medians over ten separate program executions, each run with the training set, validation set, and test set contents selected randomly every time. We report the results of applying each method to all the dataset features, plus the results of the decision tree applied only to the two selected features: the row entitled “Decision tree (applied only to lung side & platelet count)”. Dataset balance: 50% positive data instances (all the 96 mesothelioma patients) and 50% negative data instances (96 non-mesothelioma patients, randomly selected). Perceptron: learning rate = 0.1.
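    The under-sampling step described above keeps every positive instance and randomly draws an equal number of negatives. A minimal sketch with the counts from the caption (96 positives, 228 negatives); the `undersample` helper is illustrative, not the paper's code:

    ```python
    import random

    # Balance the classes 50/50: keep all positives, sample as many negatives.
    def undersample(positives, negatives, seed=42):
        rng = random.Random(seed)
        return positives + rng.sample(negatives, k=len(positives))

    positives = list(range(96))            # placeholder IDs for the 96 mesothelioma patients
    negatives = list(range(96, 96 + 228))  # placeholder IDs for the 228 non-mesothelioma patients
    balanced = undersample(positives, negatives)
    print(len(balanced))                   # 192 (96 + 96)
    ```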

    Decision tree.

    An example of a decision tree, which classifies each patient as healthy (non-mesothelioma) or unhealthy (mesothelioma). The random forest generates a set of such predictive decision trees.
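    In code, such a tree reduces to nested threshold tests. The sketch below uses the two selected features (lung side and platelet count); the split value and side condition are hypothetical placeholders, not the thresholds learned in the paper:

    ```python
    # Hypothetical two-feature decision rule in the spirit of the tree shown above.
    PLT_THRESHOLD = 330_000  # placeholder platelet-count split (platelets/microliter)

    def classify(lung_side, platelet_count):
        # Both split points here are illustrative, not the paper's learned values.
        if lung_side == "both":
            return "mesothelioma"
        if platelet_count > PLT_THRESHOLD:
            return "mesothelioma"
        return "non-mesothelioma"

    print(classify("left", 250_000))  # non-mesothelioma
    ```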

    Architecture of the probabilistic neural network.

    In our model, there are 33 neurons in the input layer, 33 neurons in the pattern layer, and 2 neurons in the summation layer.
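    A probabilistic neural network with those sizes can be sketched as follows: each pattern-layer neuron applies a Gaussian kernel against one stored training sample, and the two summation neurons average the activations per class. The stored patterns, labels, and kernel width below are placeholders, not the paper's data:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    train_X = rng.normal(size=(33, 33))             # 33 stored patterns, 33 features each
    train_y = np.array([0, 1] * 16 + [0])           # placeholder class label per pattern

    def pnn_predict(x, sigma=1.0):
        # Pattern layer: Gaussian kernel between x and each stored pattern
        k = np.exp(-np.sum((train_X - x) ** 2, axis=1) / (2 * sigma ** 2))
        # Summation layer: one neuron per class, averaging its patterns' activations
        scores = [k[train_y == c].mean() for c in (0, 1)]
        return int(np.argmax(scores))               # output: predicted class
    ```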

    Gini impurity decreases of each random forest tree node.

    Random forest feature selection relies on bootstrap aggregation (bagging), and therefore does not use separate training, validation, and test sets [69]. The bars represent the importance of each feature, measured as the sum of all the Gini impurity index decreases for that specific feature [39] (Methods).
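    The quantity summed per feature is the Gini impurity decrease of each split on that feature. A minimal sketch of both computations, assuming the standard binary-split definition:

    ```python
    # Gini impurity of a label list, and the impurity decrease of one split.
    def gini(labels):
        n = len(labels)
        return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

    def impurity_decrease(parent, left, right):
        # Parent impurity minus the size-weighted impurities of the two children
        n = len(parent)
        return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)

    # A perfect split of a balanced node removes all impurity:
    print(impurity_decrease([0, 0, 1, 1], [0, 0], [1, 1]))  # 0.5
    ```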

    Merged rank of features.

    We sorted the features by combining the ranking of the node impurity decrease and the ranking of the percentage of MSE decrease in accuracy (Methods).
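    One simple way to merge two rankings is to sum each feature's rank positions and sort by the total; this is an assumption for illustration, as the caption does not specify the exact combination rule. Feature names and ranks below are placeholders:

    ```python
    # Merge two feature rankings (dicts: feature name -> rank, 1 = best)
    # by summing rank positions and sorting by the combined score.
    def merge_ranks(rank_a, rank_b):
        combined = {f: rank_a[f] + rank_b[f] for f in rank_a}
        return sorted(combined, key=combined.get)

    impurity_rank = {"lung side": 1, "platelet count": 2, "age": 3}
    accuracy_rank = {"lung side": 2, "platelet count": 1, "age": 3}
    print(merge_ranks(impurity_rank, accuracy_rank))
    # ['lung side', 'platelet count', 'age']  (tie broken by insertion order)
    ```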