A New Suite of Statistical Algorithms for Bayesian Model Fitting with Both Intrinsic and Extrinsic Uncertainties in Two Dimensions
Fitting a statistical model to data is one of the most important tools in any scientific or data-driven field, and rigorously fitting a two-dimensional statistical model to data that has intrinsic uncertainties (error bars) in both the independent and dependent variables is a daunting task, especially if the data also has extrinsic uncertainty (sample variance) that cannot be fully accounted for by the error bars. Here, I introduce a novel statistic (the Trotter, Reichart, Konz statistic, or TRK), developed in Trotter (2011), that is advantageous for model fitting in this "worst-case data" scenario, especially when compared to other methods. I implemented this statistic as a suite of fitting algorithms in C++ with many capabilities, including support for any nonlinear model; probability distribution generation, correlation removal, and custom priors for model parameters; asymmetric uncertainties in the data and/or model; and more. I also built an end-to-end website through which the algorithm can be used easily and generally, with a high degree of customizability. The statistic is applicable to practically any data-driven field, and I show a few examples of its usage within the realm of astronomy. This thesis, along with Trotter (2011), forms the foundation for Trotter, Reichart, and Konz (2020), in preparation. The TRK source code and web-based calculator can be found at https://github.com/nickk124/TRK and https://skynet.unc.edu/rcr/calculator/trk, respectively.
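For a sense of the kind of likelihood such a statistic generalizes, below is a minimal sketch of the classic "effective variance" approach to fitting a line with error bars in both coordinates plus an extrinsic-scatter term. This is not the TRK statistic itself (see Trotter 2011 for that); the linear model, parameter names, and toy data are purely illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, x, y, sx, sy):
    """Effective-variance Gaussian likelihood for y = m*x + b with error
    bars in both coordinates (sx, sy) plus an extrinsic-scatter term sigma
    for sample variance the bars cannot account for."""
    m, b, log_sigma = params
    sigma = np.exp(log_sigma)              # keeps the scatter positive
    var = sy**2 + (m * sx)**2 + sigma**2   # the slope projects x-errors onto y
    resid = y - (m * x + b)
    return 0.5 * np.sum(resid**2 / var + np.log(2 * np.pi * var))

# Toy data with extrinsic scatter that the error bars underestimate
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 40)
y = 2.0 * x + 1.0 + rng.normal(0, 1.5, x.size)
sx, sy = np.full(x.size, 0.3), np.full(x.size, 0.5)

fit = minimize(neg_log_likelihood, x0=[1.0, 0.0, 0.0], args=(x, y, sx, sy))
m_hat, b_hat, sigma_hat = fit.x[0], fit.x[1], np.exp(fit.x[2])
```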
The Effect of Intrinsic Dataset Properties on Generalization: Unraveling Learning Differences Between Natural and Medical Images
This paper investigates discrepancies in how neural networks learn from
different imaging domains, which are commonly overlooked when adopting computer
vision techniques from the domain of natural images to other specialized
domains such as medical images. Recent works have found that the generalization
error of a trained network typically increases with the intrinsic dimension
($d_{data}$) of its training set. Yet, the steepness of this relationship
varies significantly between medical (radiological) and natural imaging
domains, with no existing theoretical explanation. We address this gap in
knowledge by establishing and empirically validating a generalization scaling
law with respect to $d_{data}$, and propose that the substantial scaling
discrepancy between the two considered domains may be at least partially
attributed to the higher intrinsic ``label sharpness'' ($K_F$) of
medical imaging datasets, a metric which we propose. Next, we demonstrate an
additional benefit of measuring the label sharpness of a training set: it is
negatively correlated with the trained model's adversarial robustness, which
notably leads to models for medical images having a substantially higher
vulnerability to adversarial attack. Finally, we extend our $d_{data}$
formalism to the related metric of learned representation intrinsic dimension
($d_{repr}$), derive a generalization scaling law with respect to $d_{repr}$,
and show that $d_{data}$ serves as an upper bound for $d_{repr}$. Our
theoretical results are supported by thorough experiments with six models and
eleven natural and medical imaging datasets over a range of training set sizes.
Our findings offer insights into the influence of intrinsic dataset properties
on generalization, representation learning, and robustness in deep neural
networks. Comment: ICLR 2024. Code: https://github.com/mazurowski-lab/intrinsic-properties
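The $d_{data}$ above is an estimate of a dataset's intrinsic dimension. As a concrete illustration, here is a sketch of the TwoNN estimator (Facco et al. 2017), one common estimator for this quantity; whether it matches the paper's exact choice is an assumption, and the data below is synthetic.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_id(X):
    """TwoNN intrinsic-dimension estimate (Facco et al. 2017):
    d = N / sum(log(r2/r1)) over each point's two nearest neighbors."""
    nn = NearestNeighbors(n_neighbors=3).fit(X)  # index 0 is the point itself
    dist, _ = nn.kneighbors(X)
    mu = dist[:, 2] / dist[:, 1]                 # r2 / r1 per point
    return len(X) / np.sum(np.log(mu))

# Sanity check: a 2-D plane embedded in 50-D ambient space should give d ~ 2
rng = np.random.default_rng(0)
Z = rng.normal(size=(2000, 2))
X = Z @ rng.normal(size=(2, 50))
print(twonn_id(X))  # ~2
```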
COMPARATIVE BIOMECHANICAL ANALYSIS OF A FEMALE HAMMER THROW ATHLETE FOR BACK-TO-BACK AMERICAN RECORD YEARS
Hammer athletes must optimize performance variables to maximize their official distance. Analysis of key performance variables might explain how the subject improved from an American record year in 2018 to another record year in 2019. A 3-D analysis was performed on trial videos from 2018 and 2019. Release height, release velocity, release angle, and hip-shoulder separation were compared between years and across throws, and their relationship with official distance was assessed. Release height (p < 0.01) and release angle (p < 0.01) were more consistent in 2019 than in 2018. The relationships among official distance, release height (p = 0.06), and hip-shoulder separation (p = 0.04) differed between years. The efficient use of hip-shoulder separation could be responsible for the increase in official distance between years.
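For context on how the measured release variables jointly determine distance, a vacuum-flight approximation is shown below (ignoring aerodynamic drag on the hammer, so an illustration rather than the study's actual 3-D analysis):

```python
import math

def flight_distance(v, angle_deg, h, g=9.81):
    """Vacuum-flight range of a projectile released at speed v (m/s),
    at angle_deg above the horizontal, from height h (m)."""
    th = math.radians(angle_deg)
    vx, vy = v * math.cos(th), v * math.sin(th)
    t = (vy + math.sqrt(vy**2 + 2 * g * h)) / g  # time until landing at y = 0
    return vx * t

# e.g. a 27 m/s release at 40 degrees from 1.7 m travels roughly 75 m
print(flight_distance(27.0, 40.0, 1.7))
```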
A systematic study of the foreground-background imbalance problem in deep learning for object detection
The class imbalance problem in deep learning has been explored in several
studies, but there has yet to be a systematic analysis of this phenomenon in
object detection. Here, we present comprehensive analyses and experiments of
the foreground-background (F-B) imbalance problem in object detection, which is
very common and caused by small, infrequent objects of interest. We
experimentally study the effects of different aspects of F-B imbalance (object
size, number of objects, dataset size, object type) on detection performance.
In addition, we compare 9 leading methods for addressing this problem,
including Faster-RCNN, SSD, OHEM, Libra-RCNN, Focal-Loss, GHM, PISA, YOLO-v3,
and GFL with a range of datasets from different imaging domains. We conclude
that (1) the F-B imbalance can indeed cause a significant drop in detection
performance, (2) the detection performance is more affected by F-B imbalance when fewer training data are available, (3) in most cases, decreasing object size leads to a larger performance drop than decreasing the number of objects, given the same change in the ratio of object pixels to non-object pixels, (4) among all selected methods, Libra-RCNN and PISA demonstrate the best performance in addressing the issue of F-B imbalance, (5) when the training dataset size is large, the choice of method is not impactful, and (6) soft-sampling methods, including Focal-Loss, GHM, and GFL, perform fairly well on average but are relatively unstable.
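Of the soft-sampling methods compared, focal loss has the most compact form; below is a minimal sketch of the binary variant from Lin et al. (2017), with illustrative hyperparameters.

```python
import torch

def binary_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss (Lin et al. 2017): scales cross-entropy by
    (1 - p_t)^gamma so well-classified (mostly background) pixels
    contribute little, softening the F-B imbalance."""
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)       # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    loss = -alpha_t * (1 - p_t) ** gamma * torch.log(p_t.clamp(min=1e-8))
    return loss.mean()

# Usage: targets are 0/1 floats with the same shape as the logits
loss = binary_focal_loss(torch.randn(8, 1, 64, 64), torch.zeros(8, 1, 64, 64))
```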
Deep Learning for Breast MRI Style Transfer with Limited Training Data
In this work we introduce a novel medical image style transfer method,
StyleMapper, that can transfer medical scans to an unseen style with access to
limited training data. This is made possible by training our model on an
unlimited variety of simulated random medical imaging styles applied to the
training set, making our approach more computationally efficient than other
style transfer methods. Moreover, our method enables arbitrary style transfer:
transferring images to styles unseen in training. This is useful for medical
imaging, where images are acquired using different protocols and different
scanner models, resulting in a variety of styles that data may need to be
transferred between. Methods: Our model disentangles image content from style
and can modify an image's style by simply replacing the style encoding with one
extracted from a single image of the target style, with no additional
optimization required. This also allows the model to distinguish between
different styles of images, including among those that were unseen in training.
We propose a formal description of the proposed model. Results: Experimental
results on breast magnetic resonance images indicate the effectiveness of our
method for style transfer. Conclusion: Our style transfer method allows for the
alignment of medical images taken with different scanners into a single unified
style dataset, allowing for the training of downstream models on such a dataset for tasks such as classification, object detection, and others. Comment: preprint version, accepted in the Journal of Digital Imaging (JDIM). 16 pages (+ author names + references + supplementary), 6 figures.
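The style-swap operation the abstract describes can be sketched schematically as follows; the module shapes, the FiLM-style modulation, and all names here are illustrative stand-ins, not StyleMapper's actual architecture.

```python
import torch
import torch.nn as nn

class StyleSwap(nn.Module):
    """Schematic content/style disentanglement: encode content and style
    separately, then decode content features modulated by a style vector
    (a FiLM-style modulation; the real StyleMapper design may differ)."""
    def __init__(self, ch=16, style_dim=8):
        super().__init__()
        self.content = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
        self.style = nn.Sequential(nn.Conv2d(1, style_dim, 3, padding=1),
                                   nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.mod = nn.Linear(style_dim, 2 * ch)   # per-channel scale and shift
        self.decode = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, content_img, style_img):
        c = self.content(content_img)
        gamma, beta = self.mod(self.style(style_img)).chunk(2, dim=1)
        c = gamma[..., None, None] * c + beta[..., None, None]
        return self.decode(c)

# Transfer: content from scan x, style taken from a single image y of the target style
model = StyleSwap()
out = model(torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64))
```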
Medical Image Segmentation with InTEnt: Integrated Entropy Weighting for Single Image Test-Time Adaptation
Test-time adaptation (TTA) refers to adapting a trained model to a new domain
during testing. Existing TTA techniques rely on having multiple test images
from the same domain, yet this may be impractical in real-world applications
such as medical imaging, where data acquisition is expensive and imaging
conditions vary frequently. Here, we address this task: adapting a medical
image segmentation model using only a single unlabeled test image. Most
TTA approaches, which directly minimize the entropy of predictions, fail to
improve performance significantly in this setting, in which we also observe the
choice of batch normalization (BN) layer statistics to be a highly important
yet unstable factor when only a single test-domain example is available. To
overcome this, we propose to instead integrate over predictions made with
various estimates of target domain statistics between the training and test
statistics, weighted based on their entropy statistics. Our method, validated on 24 source/target domain splits across 3 medical image datasets, surpasses the leading method by 2.9% Dice coefficient on average. Comment: Code and pre-trained weights: https://github.com/mazurowski-lab/single-image-test-time-adaptation
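A minimal sketch of the integration idea: blend BatchNorm statistics between the stored training values and those estimated from the single test image, then average the resulting predictions with entropy-based weights. The interpolation grid and the exact weighting below are illustrative; see the linked repository for the actual method.

```python
import torch

@torch.no_grad()
def integrated_tta_predict(model, x, alphas=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Average segmentation predictions made under different blends of
    train-time and test-image BatchNorm statistics, weighting each
    prediction by its (inverse) entropy. Illustrative sketch only."""
    bns = [m for m in model.modules() if isinstance(m, torch.nn.BatchNorm2d)]
    train_stats = [(m.running_mean.clone(), m.running_var.clone()) for m in bns]

    # Estimate test statistics from the single test image
    for m in bns:
        m.reset_running_stats()
        m.momentum = 1.0            # one forward pass sets stats to batch stats
    model.train()
    model(x)
    test_stats = [(m.running_mean.clone(), m.running_var.clone()) for m in bns]

    model.eval()
    preds, weights = [], []
    for a in alphas:
        for m, (tm, tv), (sm, sv) in zip(bns, train_stats, test_stats):
            m.running_mean.copy_((1 - a) * tm + a * sm)
            m.running_var.copy_((1 - a) * tv + a * sv)
        p = torch.softmax(model(x), dim=1)
        ent = -(p * p.clamp(min=1e-8).log()).sum(dim=1).mean()
        preds.append(p)
        weights.append((-ent).exp())  # lower entropy -> higher weight
    w = torch.stack(weights)
    w = w / w.sum()
    return sum(wi * pi for wi, pi in zip(w, preds))
```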
The Intrinsic Manifolds of Radiological Images and their Role in Deep Learning
The manifold hypothesis is a core mechanism behind the success of deep
learning, so understanding the intrinsic manifold structure of image data is
central to studying how neural networks learn from the data. Intrinsic dataset
manifolds and their relationship to learning difficulty have recently begun to
be studied for the common domain of natural images, but little such research
has been attempted for radiological images. We address this here. First, we
compare the intrinsic manifold dimensionality of radiological and natural
images. We also investigate the relationship between intrinsic dimensionality
and generalization ability over a wide range of datasets. Our analysis shows
that natural image datasets generally have a higher number of intrinsic
dimensions than radiological images. However, the relationship between
generalization ability and intrinsic dimensionality is much stronger for
medical images, which could be explained by radiological images having
intrinsic features that are more difficult to learn. These results give a more
principled underpinning for the intuition that radiological images can be more
challenging to apply deep learning to than natural image datasets common to
machine learning research. We believe that rather than directly applying
models developed for natural images to the radiological imaging domain, more
care should be taken in developing architectures and algorithms more
tailored to the specific characteristics of this domain. The research shown in
our paper, demonstrating these characteristics and the differences from natural
images, is an important first step in this direction. Comment: preprint version, accepted for MICCAI 2022 (25th International
Conference on Medical Image Computing and Computer Assisted Intervention). 8
pages (+ author names + references + supplementary), 4 figures. Code
available at https://github.com/mazurowski-lab/radiologyintrinsicmanifold
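The strength of the error-versus-dimension relationship referred to above can be probed with a simple power-law fit across datasets; here is a sketch with made-up numbers (not the paper's measurements):

```python
import numpy as np

# Hypothetical (intrinsic dimension, test error) pairs for each domain
radiology = np.array([(8, 0.06), (10, 0.09), (13, 0.15), (15, 0.21)])
natural   = np.array([(19, 0.12), (25, 0.15), (32, 0.19), (43, 0.24)])

for name, pts in [("radiology", radiology), ("natural", natural)]:
    # Fit log(error) = slope * log(d) + const; a steeper slope means
    # generalization depends more strongly on intrinsic dimension
    slope, _ = np.polyfit(np.log(pts[:, 0]), np.log(pts[:, 1]), 1)
    print(f"{name}: error ~ d^{slope:.2f}")
```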
Understanding the Inner Workings of Language Models Through Representation Dissimilarity
As language models are applied to an increasing number of real-world
applications, understanding their inner workings has become an important issue
in model trust, interpretability, and transparency. In this work we show that
representation dissimilarity measures, which are functions that measure the
extent to which two models' internal representations differ, can be a valuable
tool for gaining insight into the mechanics of language models. Among our
insights are: (i) an apparent asymmetry in the internal representations of
models using SoLU and GeLU activation functions, (ii) evidence that
dissimilarity measures can identify and locate generalization properties of
models that are invisible via in-distribution test set performance, and (iii)
new evaluations of how language model features vary as width and depth are
increased. Our results suggest that dissimilarity measures are a promising set
of tools for shedding light on the inner workings of language models. Comment: EMNLP 2023 (main).
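One widely used representation (dis)similarity measure is linear CKA (Kornblith et al. 2019); whether the paper relies on this particular measure is not stated here, so treat it as a representative example.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two representation matrices
    (n_examples x n_features); 1 means identical up to rotation and isotropic
    scaling. A dissimilarity measure is then 1 - CKA."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

# Compare the same layer's activations from two models on shared inputs
rng = np.random.default_rng(0)
A = rng.normal(size=(512, 768))
print(1 - linear_cka(A, A @ rng.normal(size=(768, 768))))  # dissimilarity
```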
Rethinking Perceptual Metrics for Medical Image Translation
Modern medical image translation methods use generative models for tasks such
as the conversion of CT images to MRI. Evaluating these methods typically
relies on some chosen downstream task in the target domain, such as
segmentation. On the other hand, task-agnostic metrics are attractive, such as
the network feature-based perceptual metrics (e.g., FID) that are common to
image translation in general computer vision. In this paper, we investigate
evaluation metrics for medical image translation on two medical image
translation tasks (GE breast MRI to Siemens breast MRI and lumbar spine MRI to
CT), tested on various state-of-the-art translation methods. We show that
perceptual metrics do not generally correlate with segmentation metrics
because they extend poorly to the anatomical constraints of this sub-field, with FID
being especially inconsistent. However, we find that the lesser-used
pixel-level SWD metric may be useful for subtle intra-modality translation. Our
results demonstrate the need for further research into helpful metrics for
medical image translation.
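For reference, the highlighted SWD metric reduces to comparing sorted 1-D projections of two point sets; here is a minimal sketch (the patch extraction and pyramid scales used in practice are omitted, and the inputs below are synthetic).

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=128, seed=0):
    """Approximate sliced Wasserstein distance between two point sets
    (n_samples x dim): project onto random unit directions, then compare
    sorted 1-D projections (the closed-form 1-D Wasserstein distance)."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_proj, X.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    px, py = np.sort(X @ dirs.T, axis=0), np.sort(Y @ dirs.T, axis=0)
    return np.mean(np.abs(px - py))

# e.g., compare pixel-patch distributions of translated vs. real CT slices
rng = np.random.default_rng(1)
print(sliced_wasserstein(rng.normal(size=(1000, 49)),
                         rng.normal(0.2, 1, (1000, 49))))
```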
Attributing Learned Concepts in Neural Networks to Training Data
By now there is substantial evidence that deep learning models learn certain
human-interpretable features as part of their internal representations of data.
As having the right (or wrong) concepts is critical to trustworthy machine
learning systems, it is natural to ask which inputs from the model's original
training set were most important for learning a concept at a given layer. To
answer this, we combine data attribution methods with methods for probing the
concepts learned by a model. Training network and probe ensembles for two
concept datasets on a range of network layers, we use the recently developed
TRAK method for large-scale data attribution. We find some evidence for
convergence, where removing the 10,000 top-attributing images for a concept and
retraining the model does not change the location of the concept in the network
nor the probing sparsity of the concept. This suggests that rather than being
highly dependent on a few specific examples, the features that inform the
development of a concept are spread in a more diffuse manner across its
exemplars, implying robustness in concept formation.
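The probing half of such a pipeline can be as simple as a sparse linear classifier on frozen layer activations, whose fraction of nonzero weights gives a probing sparsity like the one mentioned above; here is a sketch with placeholder activations (TRAK itself is a separate attribution method not shown here).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder activations: rows are layer-L features for images that do /
# do not contain the concept; in practice these would come from a forward
# hook on the trained network.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0.0, 1, (500, 256)),
                   rng.normal(0.3, 1, (500, 256))])
labels = np.repeat([0, 1], 500)

# L1 penalty encourages a sparse probe localized to few units
probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
probe.fit(feats, labels)
sparsity = np.mean(probe.coef_ != 0)  # fraction of layer units the concept uses
print(f"probe accuracy: {probe.score(feats, labels):.2f}, sparsity: {sparsity:.2f}")
```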