32 research outputs found
The blind leading the blind? Filling the knowledge gap by student peer assessment
Academic presentation at the MNT conference, 16.03. - 17.03.23, Stavanger, Norway. The conference was arranged by the University of Stavanger.
https://realfagsrekruttering.no/konferanser/mnt-konferansen-2023#om-konferansen
Prior knowledge of certain mathematical topics is essential for a fundamental understanding of most STEM subjects, and closing the gap from secondary education is a prerequisite for success. The teacher's dilemma in higher education is the time dedicated to teaching secondary-level maths versus the course's actual curriculum. To close the knowledge gap for a group of PhD students at the Faculty of Health Sciences at UiT, they were given a set of exercises to solve at home and then did peer assessment in groups in the classroom.
This contribution presents pros and cons of active learning in the form of cooperative formative peer assessment, exemplified in a two-hour math seminar. Although both students and teaching staff were positive, there are several risks to be considered. In conclusion, the math seminar succeeded in time-efficient assessment, but the final quality control is missing. The described balance between resources and quality can hopefully spark discussion within the framework of everyday higher education.
The 99% accuracy club
Melanoma Classification - a 10,000 prize. The task was simple: provide a probability of melanoma (a deadly skin cancer) for each image. The ranking was based on the area under the ROC curve (AUC). Up until the deadline, contestants could submit their results for an intermediate ranking.
And the prize goes to... A team of three Kaggle grandmasters ran away with the first prize with an AUC of 0.9490. Their intermediate ranking was 881st, not even in the top 25%. The dynamics between intermediate and final rankings are easily explained by overfitting; the real enigma is why computer scientists seemingly never learn.
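The public-versus-private leaderboard dynamic described above can be sketched with a small simulation (all numbers hypothetical, not from the competition): when many equally skilled models are ranked on a small intermediate split, the top intermediate score overstates the true accuracy, and the intermediate winner regresses toward the truth on the final split.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 1,000 teams whose models all have the same true
# accuracy, scored on a small intermediate split and a larger final split.
n_public, n_private, n_teams = 500, 5000, 1000
true_acc = 0.80

public = rng.binomial(n_public, true_acc, n_teams) / n_public
private = rng.binomial(n_private, true_acc, n_teams) / n_private

# The team topping the intermediate ranking got there largely by luck.
best_public = np.argmax(public)
print(f"intermediate score of intermediate winner: {public[best_public]:.3f}")
print(f"final score of intermediate winner:        {private[best_public]:.3f}")
```

The intermediate winner's score sits well above the true 0.80, while the same team's final score falls back near it: selecting on a noisy ranking is a form of overfitting even when no model is retrained.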
Replication study: Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs
Replication studies are essential for validating new methods, maintaining the
high standards of scientific publications, and putting results into practice.
We have attempted to replicate the main method in
'Development and validation of a deep learning algorithm for detection of
diabetic retinopathy in retinal fundus photographs' published in JAMA 2016;
316(22). We re-implemented the method since the source code is not available,
and we used publicly available data sets. The original study used non-public
fundus images from EyePACS and three hospitals in India for training. We used a
different EyePACS data set from Kaggle. The original study used the benchmark
data set Messidor-2 to evaluate the algorithm's performance. We used the same
data set. In the original study, ophthalmologists re-graded all images for
diabetic retinopathy, macular edema, and image gradability. There was one
diabetic retinopathy grade per image for our data sets, and we assessed image
gradability ourselves. Hyper-parameter settings were not described in the
original study, but some of them were published later. We were not able to
replicate the original study. Our algorithm's area under the receiver
operating characteristic curve (AUC) of 0.94 on the Kaggle EyePACS test set and 0.80 on Messidor-2 did
not come close to the reported AUC of 0.99 in the original study. This may be
caused by the use of a single grade per image, different data, or different,
undescribed hyper-parameter settings. This study shows the challenges of
replicating deep learning, and the need for more replication studies to
validate deep learning methods, especially for medical image analysis.
Our source code and instructions are available at:
https://github.com/mikevoets/jama16-retina-replication
Comment: The third version of this paper includes results from replication
after certain hyper-parameters were published in a later article. 16 pages, 6
figures, 1 table, presented at NOBIM 201
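The evaluation metric used throughout, AUC, equals the probability that a randomly chosen positive case is scored above a randomly chosen negative case. A minimal sketch of that rank-based definition (not the study's implementation):

```python
import numpy as np

def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic: the fraction of
    positive/negative pairs where the positive is scored higher
    (ties count as half a win)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Toy example: 3 positives, 3 negatives; 8 of the 9 pairs are ranked
# correctly, so AUC = 8/9.
y = [1, 1, 0, 0, 1, 0]
s = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(auc(y, s))  # 0.888...
```

A perfectly separating scorer gives AUC 1.0, and a random one about 0.5, which is why the reported gap between 0.99 and 0.80 is substantial.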
What is the state of the art? Accounting for multiplicity in machine learning benchmark performance
Machine learning methods are commonly evaluated and compared by their
performance on data sets from public repositories. This allows for multiple
methods, oftentimes several thousands, to be evaluated under identical
conditions and across time. The highest ranked performance on a problem is
referred to as state-of-the-art (SOTA) performance, and is used, among other
things, as a reference point for publication of new methods. The
highest-ranked performance is, however, a biased estimator of SOTA,
giving overly optimistic results. The mechanisms at play are those of
multiplicity, a topic that is well-studied in the context of multiple
comparisons and multiple testing, but has, as far as the authors are aware,
been nearly absent from the discussion regarding SOTA estimates. The optimistic
state-of-the-art estimate is used as a standard for evaluating new methods, and
methods with substantially inferior results are easily overlooked. In this
article, we provide a probability distribution for the case of multiple
classifiers so that known analysis methods can be engaged and a better SOTA
estimate can be provided. We demonstrate the impact of multiplicity through a
simulated example with independent classifiers. We show how classifier
dependency impacts the variance, but also that the impact is limited when the
accuracy is high. Finally, we discuss a real-world example: a Kaggle
competition from 2020.
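The bias of the highest-ranked performance can be made concrete with a toy calculation (all numbers hypothetical, not from the article): for m independent classifiers with identical true accuracy p on a test set of size n, the number of correct predictions of each is Binomial(n, p), so P(max <= k) = F(k)^m, and the expected maximum drifts upward as m grows.

```python
import math

# Hypothetical test-set size and true accuracy shared by every classifier.
n, p = 1000, 0.9

# Binomial pmf and cdf for one classifier's number of correct predictions.
pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
cdf, total = [], 0.0
for v in pmf:
    total += v
    cdf.append(min(total, 1.0))

results = {}
for m in (1, 100, 10000):
    # For m independent classifiers, P(max <= k) = cdf[k]**m, so
    # E[max] = sum_k P(max > k); dividing by n expresses it as an accuracy.
    results[m] = sum(1 - cdf[k] ** m for k in range(n)) / n
    print(f"m={m:6d}: expected best observed accuracy = {results[m]:.4f}")
```

With one classifier the expected best equals the true accuracy of 0.90, but with thousands of submissions the expected "SOTA" climbs several standard deviations above it without any method actually being better.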
Cancer detection for white urban Americans
Poster presentation at the NORA Annual Conference 2023, 05.06. - 06.06.23, Tromsø, Norway.
Development, validation and comparison of machine learning methods require access to
data, sometimes lots of data. Within health applications, data sharing can be restricted due to patient
privacy, and the few publicly available data sets become even more valuable for the machine learning community. One such data type is H&E whole slide images (WSIs), stained tumour tissue used
in hospitals to detect and classify cancer, see Fig. 1. The Cancer Genome Atlas (TCGA) has made an enormous contribution to publicly available data sets. For breast cancer H&E WSIs it is by far the
largest data set, with more than 1,000 patients, twice as many as the second largest contributor, the two
Camelyon competition data sets [1] with 399 + 200 patients.
On Data-Independent Properties for Density-Based Dissimilarity Measures in Hybrid Clustering
Hybrid clustering combines partitional and hierarchical clustering for
computational effectiveness and versatility in cluster shape. In such
clustering, a dissimilarity measure plays a crucial role in the hierarchical
merging. The dissimilarity measure has great impact on the final clustering,
and data-independent properties are needed to choose the right dissimilarity
measure for the problem at hand. Properties for distance-based dissimilarity
measures have been studied for decades, but properties for density-based
dissimilarity measures have so far received little attention. Here, we propose
six data-independent properties to evaluate density-based dissimilarity
measures associated with hybrid clustering, regarding equality, orthogonality,
symmetry, outlier and noise observations, and light-tailed models for
heavy-tailed clusters. The significance of the properties is investigated, and
we study some well-known dissimilarity measures based on Shannon entropy,
misclassification rate, Bhattacharyya distance and Kullback-Leibler divergence
with respect to the proposed properties. As none of them satisfy all the
proposed properties, we introduce a new dissimilarity measure based on the
Kullback-Leibler information and show that it satisfies all proposed
properties. The effect of the proposed properties is also illustrated on
several real and simulated data sets.
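The symmetry property among those proposed can be illustrated with closed-form expressions for univariate Gaussians: the Kullback-Leibler divergence is direction-dependent, while the Bhattacharyya distance is not. A minimal sketch using standard textbook formulas (not the paper's code):

```python
import math

def kl_gauss(m1, s1, m2, s2):
    """KL divergence KL(N(m1, s1^2) || N(m2, s2^2)), closed form."""
    return math.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2 * s2**2) - 0.5

def bhattacharyya_gauss(m1, s1, m2, s2):
    """Bhattacharyya distance between two univariate Gaussians."""
    v1, v2 = s1**2, s2**2
    return 0.25 * (m1 - m2) ** 2 / (v1 + v2) + 0.5 * math.log((v1 + v2) / (2 * s1 * s2))

# Two clusters with different spreads: swapping the arguments changes the
# KL divergence but leaves the Bhattacharyya distance unchanged.
print(kl_gauss(0, 1, 1, 3), kl_gauss(1, 3, 0, 1))                     # asymmetric
print(bhattacharyya_gauss(0, 1, 1, 3), bhattacharyya_gauss(1, 3, 0, 1))  # symmetric
```

Whether such asymmetry is desirable depends on the problem, which is exactly why data-independent properties are needed to choose a dissimilarity measure before seeing the data.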
Computer-Aided Decision Support for Melanoma Detection Applied on Melanocytic and Nonmelanocytic Skin Lesions: A Comparison of Two Systems Based on Automatic Analysis of Dermoscopic Images
Commercially available clinical decision support systems (CDSSs) for skin
cancer have been designed for the detection of melanoma only. Correct use of
the systems requires expert knowledge, hampering their utility for nonexperts.
Furthermore, there are no systems to detect other common skin cancer types,
that is, nonmelanoma skin cancer (NMSC). As early diagnosis of skin cancer is
essential, there is a need for a CDSS that is applicable to all types of skin
lesions and is suitable for nonexperts. Nevus Doctor (ND) is a CDSS being
developed by the authors. We here investigate ND's ability to detect both
melanoma and NMSC and the opportunities for improvement. An independent test
set of dermoscopic images of 870 skin lesions, including 44 melanomas and 101
NMSCs, was analysed by ND. Its sensitivity to melanoma and NMSC was compared
to that of Mole Expert (ME), a commercially available CDSS, using the same set
of lesions. ND and ME had similar sensitivity to melanoma. For ND at 95 percent
melanoma sensitivity, the NMSC sensitivity was 100 percent, and the specificity
was 12 percent. The melanomas misclassified by ND at 95 percent sensitivity
were correctly classified by ME, and vice versa. ND is able to detect NMSC
without sacrificing melanoma sensitivity.
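The operating-point logic (fix melanoma sensitivity at 95 percent, then read off the other rates at that threshold) can be sketched on synthetic scores; all distributions and numbers below are hypothetical and not from the study:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical lesion scores, higher meaning "more suspicious";
# the group sizes mirror the test set (44 melanomas, 101 NMSCs, rest benign).
melanoma = rng.normal(0.8, 0.15, 44)
nmsc     = rng.normal(0.7, 0.20, 101)
benign   = rng.normal(0.4, 0.20, 725)

# Choose the highest threshold that still keeps melanoma sensitivity >= 95%.
t = np.quantile(melanoma, 0.05, method="lower")

sens_mel  = np.mean(melanoma >= t)   # melanoma sensitivity
sens_nmsc = np.mean(nmsc >= t)       # NMSC sensitivity at that threshold
spec      = np.mean(benign < t)      # specificity on benign lesions
print(f"threshold={t:.3f}  melanoma sens={sens_mel:.2f}  "
      f"NMSC sens={sens_nmsc:.2f}  specificity={spec:.2f}")
```

Fixing a high sensitivity for the most dangerous class drives the threshold down, which is why specificity at 95 percent melanoma sensitivity can end up very low, as reported above.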
Effect of Codend Design and Postponed Bleeding on Hemoglobin in Cod Fillets Caught by Bottom Trawl in the Barents Sea Demersal Fishery