SizeGAN: Improving Size Representation in Clothing Catalogs
Online clothing catalogs lack diversity in body shape and garment size.
Brands commonly display their garments on models of one or two sizes, rarely
including plus-size models. In this work, we propose a new method, SizeGAN, for
generating images of garments on different-sized models. To change the garment
and model size while maintaining a photorealistic image, we incorporate image
alignment ideas from the medical imaging literature into the StyleGAN2-ADA
architecture. Our method learns deformation fields at multiple resolutions and
uses a spatial transformer to modify the garment and model size. We evaluate
our approach along three dimensions: realism, garment faithfulness, and size.
To our knowledge, SizeGAN is the first method to focus on this size
under-representation problem for modeling clothing. We provide an analysis
comparing SizeGAN to other plausible approaches and additionally provide the
first clothing dataset with size labels. In a user study comparing SizeGAN and
two recent virtual try-on methods, we show that our method ranks first in each
dimension and is preferred by a wide margin for realism and garment faithfulness. In
comparison to most previous work, which has focused on generating
photorealistic images of garments, our work shows that it is possible to
generate images that are both photorealistic and representative of diverse garment sizes.
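
To make the deformation-field idea concrete, the sketch below shows the generic
spatial-transformer warp it builds on: a dense displacement field resampled via
grid sampling. This is a minimal PyTorch illustration under assumed tensor
shapes, with a hypothetical function name; it is not SizeGAN's actual
architecture, which learns such fields at multiple resolutions inside
StyleGAN2-ADA.

    # Generic spatial-transformer warp: a dense displacement field (in
    # pixels) is converted to normalized sampling coordinates and applied
    # with grid_sample. Illustrative only, not SizeGAN's implementation.
    import torch
    import torch.nn.functional as F

    def warp_with_deformation_field(image, displacement):
        """image: (N, C, H, W); displacement: (N, 2, H, W) in pixels."""
        n, _, h, w = image.shape
        # Identity sampling grid in normalized [-1, 1] coordinates.
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                                torch.linspace(-1, 1, w), indexing="ij")
        grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, h, w, 2)
        # Convert pixel displacements to normalized offsets, then resample.
        offset = torch.stack((displacement[:, 0] * 2.0 / (w - 1),
                              displacement[:, 1] * 2.0 / (h - 1)), dim=-1)
        return F.grid_sample(image, grid + offset, align_corners=True)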
GELDA: A generative language annotation framework to reveal visual biases in datasets
Bias analysis is a crucial step in the process of creating fair datasets for
training and evaluating computer vision models. The bottleneck in dataset
analysis is annotation, which typically requires: (1) specifying a list of
attributes relevant to the dataset domain, and (2) classifying each
image-attribute pair. While the second step has been rapidly automated, the
first remains human-centered, requiring an experimenter to
compile lists of in-domain attributes. However, an experimenter may have
limited foresight leading to annotation "blind spots," which in turn can lead
to flawed downstream dataset analyses. To combat this, we propose GELDA, a
nearly automatic framework that leverages generative large language models
(LLMs) to propose and label various attributes for a domain. GELDA takes a
user-defined domain caption (e.g., "a photo of a bird," "a photo of a living
room") and uses an LLM to hierarchically generate attributes. In addition,
GELDA uses the LLM to decide which of a set of vision-language models (VLMs) to
use to classify each attribute in images. Results on real datasets show that
GELDA can generate accurate and diverse visual attribute suggestions, and
uncover biases such as confounding between class labels and background
features. Results on synthetic datasets demonstrate that GELDA can be used to
evaluate the biases of text-to-image diffusion models and generative
adversarial networks. Overall, we show that while GELDA is not accurate enough
to replace human annotators, it can serve as a complementary tool to help
humans analyze datasets in a cheap, low-effort, and flexible manner.
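
The two-stage flow described here is easy to picture in code. Below is a
schematic sketch in which llm and vlm are placeholder callables for whatever
generative language model and vision-language models a user plugs in; the
function name, prompts, and threshold are hypothetical, not GELDA's actual
interface.

    # Stage 1: the LLM hierarchically proposes attribute categories and
    # values for a user-defined domain caption. Stage 2: a VLM scores every
    # image-attribute pair; values above threshold become labels.
    from typing import Callable, Dict, List

    def annotate_dataset(domain_caption: str,
                         images: List[str],
                         llm: Callable[[str], List[str]],
                         vlm: Callable[[str, str], float],
                         threshold: float = 0.5) -> Dict[str, Dict[str, List[str]]]:
        categories = llm(f"List attribute categories for: {domain_caption}")
        attributes = {c: llm(f"List values of '{c}' for {domain_caption}")
                      for c in categories}
        labels = {}
        for img in images:
            labels[img] = {c: [v for v in vals
                               if vlm(img, f"{c}: {v}") > threshold]
                           for c, vals in attributes.items()}
        return labels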
Bayesian information-theoretic calibration of patient-specific radiotherapy sensitivity parameters for informing effective scanning protocols in cancer
Advances in technology now make it possible to collect data for a variety of
metrics describing tumor growth, including tumor volume,
composition, and vascularity, among others. For any proposed model of tumor
growth and treatment, we observe large variability among individual patients'
parameter values, particularly those relating to treatment response; thus,
exploiting the use of these various metrics for model calibration can be
helpful to infer such patient-specific parameters both accurately and early, so
that treatment protocols can be adjusted mid-course for maximum efficacy.
However, taking measurements can be costly and invasive, limiting clinicians to
a sparse collection schedule. As such, the determination of optimal times and
metrics for which to collect data in order to best inform proper treatment
protocols could be of great assistance to clinicians. In this investigation, we
employ a Bayesian information-theoretic calibration protocol for experimental
design in order to identify the optimal times at which to collect data for
informing treatment parameters. Within this procedure, data collection times
are chosen sequentially to maximize the reduction in parameter uncertainty with
each added measurement, ensuring that a budget of high-fidelity
experimental measurements results in maximum information gain about the
low-fidelity model parameter values. In addition to investigating the optimal
temporal pattern for data collection, we also develop a framework for deciding
which metrics should be utilized at each data collection point. We illustrate
this framework with a variety of toy examples, each utilizing a radiotherapy
treatment regimen. For each scenario, we analyze the dependence of the
predictive power of the low-fidelity model upon the measurement budget.
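
As a toy illustration of the sequential selection criterion described above,
the sketch below greedily picks the candidate measurement time with the
largest expected information gain about the parameters, approximated with a
simple Gaussian ansatz over an ensemble of prior draws. The model, names, and
criterion are simplifications for illustration, not the paper's full Bayesian
calibration protocol.

    # Greedy one-step design: for each candidate time t, approximate the
    # expected information gain as 0.5 * log(1 + Var[f(theta, t)] / sigma^2),
    # a Gaussian approximation to the mutual information between the
    # parameters and a noisy measurement at that time.
    import numpy as np

    def pick_next_time(model, theta_samples, candidate_times, noise_std):
        gains = []
        for t in candidate_times:
            preds = np.array([model(th, t) for th in theta_samples])
            gains.append(0.5 * np.log1p(preds.var() / noise_std**2))
        return candidate_times[int(np.argmax(gains))]

    # Example: logistic tumor-volume curve with an uncertain growth rate.
    rng = np.random.default_rng(0)
    growth = lambda theta, t: 1.0 / (1.0 + np.exp(-theta * (t - 10.0)))
    t_next = pick_next_time(growth, rng.normal(0.5, 0.2, size=200),
                            np.linspace(0.0, 20.0, 41), noise_std=0.05)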
Designing experimental conditions to use the Lotka-Volterra model to infer tumor cell line interaction types
The Lotka-Volterra model is widely used to model interactions between two
species. Here, we generate synthetic data mimicking competitive, mutualistic
and antagonistic interactions between two tumor cell lines, and then use the
Lotka-Volterra model to infer the interaction type. Structural identifiability
of the Lotka-Volterra model is confirmed, and practical identifiability is
assessed for three experimental designs: (a) use of a single data set, with a
mixture of both cell lines observed over time, (b) a sequential design where
growth rates and carrying capacities are estimated using data from experiments
in which each cell line is grown in isolation, and then interaction parameters
are estimated from an experiment involving a mixture of both cell lines, and
(c) a parallel experimental design where all model parameters are fitted to
data from two mixtures simultaneously. In addition to assessing each design for
practical identifiability, we investigate how the predictive power of the
model (i.e., its ability to fit data for initial ratios other than those to
which it was calibrated) is affected by the choice of experimental design. The
parallel calibration procedure is found to be optimal and is further tested on
in silico data generated from a spatially-resolved cellular automaton model,
which accounts for oxygen consumption and allows for variation in the intensity
level of the interaction between the two cell lines. We use this study to
highlight the care that must be taken when interpreting parameter estimates for
the spatially-averaged Lotka-Volterra model when it is calibrated against data
produced by the spatially-resolved cellular automaton model, since baseline
competition for space and resources in the CA model may contribute to a
discrepancy between the type of interaction used to generate the CA data and
the type of interaction inferred by the LV model.
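
For reference, a standard parameterization of the two-species Lotka-Volterra
system discussed here (written in LaTeX; the paper's exact parameterization
may differ) is:

    \begin{aligned}
    \frac{dN_1}{dt} &= r_1 N_1 \left(1 - \frac{N_1 + \alpha_{12} N_2}{K_1}\right),\\
    \frac{dN_2}{dt} &= r_2 N_2 \left(1 - \frac{N_2 + \alpha_{21} N_1}{K_2}\right),
    \end{aligned}

where the r_i are growth rates, the K_i carrying capacities, and the signs of
the interaction parameters \alpha_{12}, \alpha_{21} encode the interaction
type: competitive (both positive), mutualistic (both negative), or
antagonistic (mixed signs).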
Generating Image-Specific Text Improves Fine-grained Image Classification
Recent vision-language models outperform vision-only models on many image
classification tasks. However, because of the absence of paired text/image
descriptions, it remains difficult to fine-tune these models for fine-grained
image classification. In this work, we propose a method, GIST, for generating
image-specific fine-grained text descriptions from image-only datasets, and
show that these text descriptions can be used to improve classification. Key
parts of our method include (1) prompting a pretrained large language model with
domain-specific prompts to generate diverse fine-grained text descriptions for
each class, and (2) using a pretrained vision-language model to match each image
to label-preserving text descriptions that capture relevant visual features in
the image. We demonstrate the utility of GIST by fine-tuning vision-language
models on the image-and-generated-text pairs to learn an aligned
vision-language representation space for improved classification. We evaluate
our learned representation space in full-shot and few-shot scenarios across
four diverse fine-grained classification datasets, each from a different
domain. Our method achieves average accuracy improvements over both CLIP
linear probes and the
previous state-of-the-art image-text classification method on the full-shot
datasets. Our method achieves similar improvements across few-shot regimes.
Code is available at https://github.com/emu1729/GIST.
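
The matching step can be illustrated with a public CLIP checkpoint: score an
image against the fine-grained descriptions generated for its class label and
keep the top-scoring, label-preserving ones. The snippet below is a stand-in
sketch, not the released implementation (see the repository above for that).

    # Rank generated class descriptions by CLIP image-text similarity and
    # keep the top k as the image-specific fine-grained text.
    import torch
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def top_descriptions(image, class_descriptions, k=3):
        # image: a PIL image; class_descriptions: texts for its class label.
        inputs = processor(text=class_descriptions, images=image,
                           return_tensors="pt", padding=True)
        with torch.no_grad():
            sims = model(**inputs).logits_per_image[0]  # one score per text
        return [class_descriptions[i] for i in sims.topk(k).indices.tolist()]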
Intuitively Assessing ML Model Reliability through Example-Based Explanations and Editing Model Inputs
Interpretability methods aim to help users build trust in and understand the
capabilities of machine learning models. However, existing approaches often
rely on abstract, complex visualizations that poorly map to the task at hand or
require non-trivial ML expertise to interpret. Here, we present two visual
analytics modules that facilitate an intuitive assessment of model reliability.
To help users better characterize and reason about a model's uncertainty, we
visualize raw and aggregate information about a given input's nearest
neighbors. Using an interactive editor, users can manipulate this input in
semantically-meaningful ways, determine the effect on the output, and compare
against their prior expectations. We evaluate our interface using an
electrocardiogram beat classification case study. Compared to a baseline
feature importance interface, we find that 14 physicians are better able to
align the model's uncertainty with domain-relevant factors and build intuition
about its capabilities and limitations.
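
The nearest-neighbor module described above boils down to retrieving a query's
closest training examples in some embedding space and summarizing their
labels. A bare-bones scikit-learn sketch follows; the embedding space, names,
and data are placeholders for a real model (e.g., the penultimate layer of the
beat classifier), not the paper's interface.

    # Fetch the k nearest training neighbors of an embedded query and report
    # both the raw neighbors (label, distance) and an aggregate label count,
    # mirroring the "raw and aggregate" views in the interface.
    from collections import Counter
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def neighbor_summary(query_emb, train_embs, train_labels, k=10):
        nn = NearestNeighbors(n_neighbors=k).fit(train_embs)
        dist, idx = nn.kneighbors(np.asarray(query_emb).reshape(1, -1))
        labels = [train_labels[i] for i in idx[0]]
        return list(zip(labels, dist[0].round(3))), Counter(labels)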
Environmental assessment of sugar beet production
Environmental impact assessments are often used to aid decision-making on complex planning issues, and the use of such techniques within agriculture is about to come of age. Sophisticated risk assessment methods are now available for planning pesticide strategies, and mathematical models have been developed that simulate the nitrogen dynamics within arable land to generate field-specific fertiliser recommendations. In addition, energy budgeting techniques have been published in the scientific press. However, to date few have attempted to draw these techniques together to quantify the environmental impact of a specific crop. The British Beet Research Organisation has a key research target to improve the environmental impact of the sugar beet crop and the sugar industry. Consequently, it has funded a research project to use state-of-the-art tools to compare the potential environmental impact of a range of conventional beet production systems in the UK and to present the findings alongside an economic assessment. This paper will provide insight into the techniques being used and will present the interim project findings.