32 research outputs found

    The blind leading the blind? Filling the knowledge gap by student peer assessment

    Academic presentation at the MNT conference, 16.03.-17.03.23, Stavanger, Norway. The conference was arranged by the University of Stavanger. https://realfagsrekruttering.no/konferanser/mnt-konferansen-2023#om-konferansen. Prior knowledge of certain mathematical topics is essential for a fundamental understanding of most STEM subjects, and closing the gap from secondary education is a prerequisite for success. The teacher's dilemma in higher education is the time dedicated to teaching secondary-level maths versus the course's actual curriculum. To close the knowledge gap for a group of PhD students at the Faculty of Health Sciences at UiT, the students were given a set of exercises to solve at home and then carried out peer assessment in groups in the classroom. This contribution presents the pros and cons of active learning in the form of cooperative formative peer assessment, exemplified by a two-hour math seminar. Although both students and teaching staff were positive, there are several risks to consider. In conclusion, the math seminar succeeded in time-efficient assessment, but the final quality control is missing. The described balance between resources and quality can hopefully spark discussion within the framework of everyday higher education.

    The 99% accuracy club

    Melanoma Classification - a $10,000 competition. For the 2020 Melanoma Classification competition hosted by Kaggle, 33,126 images were made available for training (of which 2% were melanomas), and an additional 10,982 were used for the final ranking of the 3,308 teams who entered the competition with eyes on the $10,000 prize. The task was simple: provide a probability of melanoma (deadly skin cancer) for each image. The ranking was based on the area under the ROC curve (AUC). Up until the deadline, contestants could submit their training results for an intermediate ranking. And the prize goes to... A team of three Kaggle grandmasters ran away with the first prize with an AUC of 0.9490. Their intermediate ranking was 881st - not even in the top 25%. The dynamics between intermediate and final ranking are easily explained by overfitting - the real enigma is how come computer scientists seemingly never learn.
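
    The drop from intermediate to final rank is the winner's curse of multiplicity: the intermediate score is a noisy estimate of true skill, so the intermediate leader is selected for luck as much as skill. A minimal simulation sketch of this effect, in which everything except the team count is an assumed, illustrative number:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    n_teams = 3308                               # teams in the 2020 competition
    true_auc = rng.uniform(0.85, 0.93, n_teams)  # hypothetical true skill per team
    sd_public = 0.02    # assumed noise of the intermediate (public) leaderboard
    sd_private = 0.01   # assumed noise of the final (private) leaderboard

    public = true_auc + rng.normal(0.0, sd_public, n_teams)
    private = true_auc + rng.normal(0.0, sd_private, n_teams)

    leader = np.argmax(public)   # team on top of the intermediate leaderboard
    print(f"leader's intermediate AUC: {public[leader]:.4f}")
    print(f"leader's final AUC:        {private[leader]:.4f}")
    print(f"leader's final rank:       {int(np.sum(private >= private[leader]))}")
    ```

    With thousands of teams, the intermediate leader almost surely owes part of its score to a large positive noise term, so its final score and rank regress back toward its true skill.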

    Replication study: Development and validation of deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs

    Replication studies are essential for validating new methods, crucial for maintaining the high standards of scientific publications, and necessary for using the results in practice. We have attempted to replicate the main method in 'Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs', published in JAMA 2016; 316(22). We re-implemented the method, since the source code is not available, and used publicly available data sets. The original study used non-public fundus images from EyePACS and three hospitals in India for training; we used a different EyePACS data set, from Kaggle. The original study used the benchmark data set Messidor-2 to evaluate the algorithm's performance, and we used the same data set. In the original study, ophthalmologists re-graded all images for diabetic retinopathy, macular edema, and image gradability. Our data sets had one diabetic retinopathy grade per image, and we assessed image gradability ourselves. Hyper-parameter settings were not described in the original study, but some of them were published later. We were not able to replicate the original study. Our algorithm's area under the receiver operating characteristic curve (AUC) of 0.94 on the Kaggle EyePACS test set and 0.80 on Messidor-2 did not come close to the reported AUC of 0.99 in the original study. This may be caused by the use of a single grade per image, different data, or different, undescribed hyper-parameter settings. This study shows the challenges of replicating deep learning methods, and the need for more replication studies to validate them, especially for medical image analysis. Our source code and instructions are available at: https://github.com/mikevoets/jama16-retina-replication. Comment: The third version of this paper includes results from replication after certain hyper-parameters were published in a later article. 16 pages, 6 figures, 1 table, presented at NOBIM 2018.
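
    Behind the headline numbers, the evaluation step reduces to scoring one predicted probability per image against a binary referable-DR label. A minimal sketch of that step using synthetic stand-in data (the authors' actual pipeline is in the linked repository):

    ```python
    import numpy as np
    from sklearn.metrics import roc_auc_score, roc_curve

    rng = np.random.default_rng(0)

    # Synthetic stand-ins: one binary referable-DR label and one predicted
    # probability per image; 1,748 is Messidor-2's image count.
    y_true = rng.integers(0, 2, 1748)
    y_prob = np.clip(0.6 * y_true + rng.normal(0.2, 0.2, 1748), 0.0, 1.0)

    print(f"AUC: {roc_auc_score(y_true, y_prob):.2f}")

    # Screening applications typically fix a high-sensitivity operating point
    fpr, tpr, _ = roc_curve(y_true, y_prob)
    i = int(np.argmax(tpr >= 0.95))    # first threshold reaching 95% sensitivity
    print(f"at {tpr[i]:.0%} sensitivity: specificity {1 - fpr[i]:.0%}")
    ```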

    What is the state of the art? Accounting for multiplicity in machine learning benchmark performance

    Machine learning methods are commonly evaluated and compared by their performance on data sets from public repositories. This allows multiple methods, oftentimes several thousand, to be evaluated under identical conditions and across time. The highest-ranked performance on a problem is referred to as state-of-the-art (SOTA) performance, and is used, among other things, as a reference point for the publication of new methods. However, the highest-ranked performance is a biased estimator of SOTA, giving overly optimistic results. The mechanisms at play are those of multiplicity, a topic that is well studied in the context of multiple comparisons and multiple testing, but that has, as far as the authors are aware, been nearly absent from the discussion of SOTA estimates. The optimistic state-of-the-art estimate is used as a standard for evaluating new methods, and methods with substantially inferior results are easily overlooked. In this article, we provide a probability distribution for the case of multiple classifiers, so that known analysis methods can be engaged and a better SOTA estimate can be provided. We demonstrate the impact of multiplicity through a simulated example with independent classifiers. We show how classifier dependency impacts the variance, but also that the impact is limited when the accuracy is high. Finally, we discuss a real-world example: a Kaggle competition from 2020.
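
    To make the bias concrete: if m independent classifiers all have true accuracy p, each observed accuracy on a test set of n examples is Binomial(n, p)/n, and the maximum over the m classifiers has CDF F(x)^m. A sketch under those assumptions, with illustrative parameters not taken from the article:

    ```python
    import numpy as np
    from scipy.stats import binom

    p = 0.90    # assumed true accuracy, shared by all classifiers
    n = 1000    # test set size
    m = 2000    # number of classifiers evaluated on the same benchmark

    # CDF of the maximum of m independent Binomial(n, p) correct-counts
    k = np.arange(n + 1)
    cdf_max = binom.cdf(k, n, p) ** m

    # pmf of the maximum, then its expectation, expressed as an accuracy
    pmf_max = np.diff(np.concatenate(([0.0], cdf_max)))
    expected_sota = np.sum(k * pmf_max) / n
    print(f"true accuracy {p:.3f}, expected 'SOTA' estimate {expected_sota:.3f}")
    ```

    The expected maximum sits several standard errors above p, so the naive SOTA estimate is optimistic even when every method is equally good.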

    Cancer detection for white urban Americans

    Poster presentation at the NORA Annual Conference 2023, 05.06.-06.06.23, Tromsø, Norway. Development, validation and comparison of machine learning methods require access to data, sometimes lots of data. Within health applications, data sharing can be restricted due to patient privacy, and the few publicly available data sets become even more valuable for the machine learning community. One such type of data is H&E whole slide images (WSI), stained tumour tissue used in hospitals to detect and classify cancer (see Fig. 1). The Cancer Genome Atlas (TCGA) has made an enormous contribution to publicly available data sets. For breast cancer H&E WSI it is by far the largest data set, with more than 1,000 patients, twice as many as the second-largest contributor, the two Camelyon competition data sets [1], with 399 + 200 patients.

    On Data-Independent Properties for Density-Based Dissimilarity Measures in Hybrid Clustering

    Hybrid clustering combines partitional and hierarchical clustering for computational effectiveness and versatility in cluster shape. In such clustering, a dissimilarity measure plays a crucial role in the hierarchical merging. The dissimilarity measure has a great impact on the final clustering, and data-independent properties are needed to choose the right dissimilarity measure for the problem at hand. Properties for distance-based dissimilarity measures have been studied for decades, but properties for density-based dissimilarity measures have so far received little attention. Here, we propose six data-independent properties to evaluate density-based dissimilarity measures associated with hybrid clustering, regarding equality, orthogonality, symmetry, outlier and noise observations, and light-tailed models for heavy-tailed clusters. The significance of the properties is investigated, and we study some well-known dissimilarity measures, based on Shannon entropy, misclassification rate, Bhattacharyya distance and Kullback-Leibler divergence, with respect to the proposed properties. As none of them satisfies all the proposed properties, we introduce a new dissimilarity measure based on the Kullback-Leibler information and show that it satisfies all the proposed properties. The effect of the proposed properties is also illustrated on several real and simulated data sets.
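
    To give a flavour of such a property, consider symmetry: d(f, g) = d(g, f) for any two cluster densities f and g (the six properties themselves are defined in the paper). A small sketch checking it for two of the studied measures on univariate Gaussian cluster models:

    ```python
    import numpy as np

    def kl_gauss(m1, s1, m2, s2):
        """Closed-form KL divergence KL(N(m1, s1^2) || N(m2, s2^2))."""
        return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

    def bhattacharyya_gauss(m1, s1, m2, s2):
        """Bhattacharyya distance between two univariate Gaussians."""
        return (0.25 * (m1 - m2)**2 / (s1**2 + s2**2)
                + 0.5 * np.log((s1**2 + s2**2) / (2 * s1 * s2)))

    a, b = (0.0, 1.0), (1.0, 3.0)   # two hypothetical cluster models
    print(kl_gauss(*a, *b), kl_gauss(*b, *a))    # unequal: KL is asymmetric
    print(bhattacharyya_gauss(*a, *b), bhattacharyya_gauss(*b, *a))  # equal
    ```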

    Computer-Aided Decision Support for Melanoma Detection Applied on Melanocytic and Nonmelanocytic Skin Lesions: A Comparison of Two Systems Based on Automatic Analysis of Dermoscopic Images

    Commercially available clinical decision support systems (CDSSs) for skin cancer have been designed for the detection of melanoma only. Correct use of the systems requires expert knowledge, hampering their utility for nonexperts. Furthermore, there are no systems to detect other common skin cancer types, that is, nonmelanoma skin cancer (NMSC). As early diagnosis of skin cancer is essential, there is a need for a CDSS that is applicable to all types of skin lesions and is suitable for nonexperts. Nevus Doctor (ND) is a CDSS being developed by the authors. Here we investigate ND's ability to detect both melanoma and NMSC, and the opportunities for improvement. An independent test set of dermoscopic images of 870 skin lesions, including 44 melanomas and 101 NMSCs, was analysed by ND. Its sensitivity to melanoma and NMSC was compared to that of Mole Expert (ME), a commercially available CDSS, using the same set of lesions. ND and ME had similar sensitivity to melanoma. For ND at 95 percent melanoma sensitivity, the NMSC sensitivity was 100 percent and the specificity was 12 percent. The melanomas misclassified by ND at 95 percent sensitivity were correctly classified by ME, and vice versa. ND is able to detect NMSC without sacrificing melanoma sensitivity.
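
    The reported operating point amounts to fixing ND's decision threshold so that 95 percent of melanoma scores lie above it, then reading off NMSC sensitivity and benign-lesion specificity at that same threshold. A minimal sketch of the bookkeeping, with synthetic scores in which only the class sizes match the test set:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Synthetic per-lesion malignancy scores; 725 + 44 + 101 = 870 lesions
    benign = rng.normal(0.35, 0.15, 725)
    melanoma = rng.normal(0.65, 0.15, 44)
    nmsc = rng.normal(0.70, 0.15, 101)

    # Threshold chosen so that 95% of melanoma scores lie above it
    thr = np.quantile(melanoma, 0.05)

    print(f"melanoma sensitivity: {np.mean(melanoma >= thr):.0%}")
    print(f"NMSC sensitivity:     {np.mean(nmsc >= thr):.0%}")
    print(f"specificity (benign): {np.mean(benign < thr):.0%}")
    ```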