
    SizeGAN: Improving Size Representation in Clothing Catalogs

    Online clothing catalogs lack diversity in body shape and garment size. Brands commonly display their garments on models of one or two sizes, rarely including plus-size models. In this work, we propose a new method, SizeGAN, for generating images of garments on different-sized models. To change the garment and model size while maintaining a photorealistic image, we incorporate image alignment ideas from the medical imaging literature into the StyleGAN2-ADA architecture. Our method learns deformation fields at multiple resolutions and uses a spatial transformer to modify the garment and model size. We evaluate our approach along three dimensions: realism, garment faithfulness, and size. To our knowledge, SizeGAN is the first method to focus on this size under-representation problem for modeling clothing. We provide an analysis comparing SizeGAN to other plausible approaches and additionally provide the first clothing dataset with size labels. In a user study comparing SizeGAN and two recent virtual try-on methods, our method ranked first in each dimension and was strongly preferred for realism and garment faithfulness. In comparison to most previous work, which has focused on generating photorealistic images of garments, our work shows that it is possible to generate images that are both photorealistic and cover diverse garment sizes.
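
    As a rough illustration of the core mechanism, not the authors' implementation: a spatial transformer applies a learned displacement field by warping the sampling grid used to resample the image. The PyTorch sketch below assumes a displacement field given in normalized grid coordinates; the function and variable names are illustrative.

        import torch
        import torch.nn.functional as F

        def apply_deformation(image, displacement):
            # image: (N, C, H, W); displacement: (N, H, W, 2), offsets in [-1, 1] grid coords
            n, _, h, w = image.shape
            ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                                    torch.linspace(-1, 1, w), indexing="ij")
            identity = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
            # Warp the sampling grid and resample the image (spatial transformer step).
            return F.grid_sample(image, identity + displacement, align_corners=True)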

    GELDA: A generative language annotation framework to reveal visual biases in datasets

    Bias analysis is a crucial step in the process of creating fair datasets for training and evaluating computer vision models. The bottleneck in dataset analysis is annotation, which typically requires: (1) specifying a list of attributes relevant to the dataset domain, and (2) classifying each image-attribute pair. While the second step has made rapid progress in automation, the first has remained human-centered, requiring an experimenter to compile lists of in-domain attributes. However, an experimenter may have limited foresight, leading to annotation "blind spots," which in turn can lead to flawed downstream dataset analyses. To combat this, we propose GELDA, a nearly automatic framework that leverages large generative language models (LLMs) to propose and label various attributes for a domain. GELDA takes a user-defined domain caption (e.g., "a photo of a bird," "a photo of a living room") and uses an LLM to hierarchically generate attributes. In addition, GELDA uses the LLM to decide which of a set of vision-language models (VLMs) to use to classify each attribute in images. Results on real datasets show that GELDA can generate accurate and diverse visual attribute suggestions, and uncover biases such as confounding between class labels and background features. Results on synthetic datasets demonstrate that GELDA can be used to evaluate the biases of text-to-image diffusion models and generative adversarial networks. Overall, we show that while GELDA is not accurate enough to replace human annotators, it can serve as a complementary tool to help humans analyze datasets in a cheap, low-effort, and flexible manner. Comment: 21 pages, 15 figures, 9 tables.
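
    The two-stage structure described above can be sketched schematically as follows; this is an outline under assumed interfaces (the llm and vlm_classify callables are placeholders), not GELDA's actual code.

        from typing import Callable, Dict, List

        def annotate_dataset(
            domain_caption: str,
            images: list,
            llm: Callable[[str], List[str]],                  # prompt -> list of strings (assumed interface)
            vlm_classify: Callable[[object, List[str]], str], # (image, candidate labels) -> best label
        ) -> Dict[int, Dict[str, str]]:
            # Stage 1: the LLM proposes attribute categories for the domain,
            # then possible values within each category (hierarchical generation).
            categories = llm(f"List visual attribute categories for: {domain_caption}")
            values = {c: llm(f"List possible values of '{c}' in {domain_caption}") for c in categories}

            # Stage 2: a vision-language model labels every image-attribute pair zero-shot.
            return {
                i: {c: vlm_classify(img, vals) for c, vals in values.items()}
                for i, img in enumerate(images)
            }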

    Bayesian information-theoretic calibration of patient-specific radiotherapy sensitivity parameters for informing effective scanning protocols in cancer

    With new advancements in technology, it is now possible to collect data for a variety of different metrics describing tumor growth, including tumor volume, composition, and vascularity, among others. For any proposed model of tumor growth and treatment, we observe large variability among individual patients' parameter values, particularly those relating to treatment response; thus, exploiting the use of these various metrics for model calibration can be helpful to infer such patient-specific parameters both accurately and early, so that treatment protocols can be adjusted mid-course for maximum efficacy. However, taking measurements can be costly and invasive, limiting clinicians to a sparse collection schedule. As such, determining the optimal times and metrics at which to collect data in order to best inform treatment protocols could be of great assistance to clinicians. In this investigation, we employ a Bayesian information-theoretic calibration protocol for experimental design in order to identify the optimal times at which to collect data for informing treatment parameters. Within this procedure, data collection times are chosen sequentially to maximize the reduction in parameter uncertainty with each added measurement, ensuring that a budget of n high-fidelity experimental measurements results in maximum information gain about the low-fidelity model parameter values. In addition to investigating the optimal temporal pattern for data collection, we also develop a framework for deciding which metrics should be utilized at each data collection point. We illustrate this framework with a variety of toy examples, each utilizing a radiotherapy treatment regimen. For each scenario, we analyze the dependence of the predictive power of the low-fidelity model upon the measurement budget.
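
    A standard way to formalize "maximize the reduction in parameter uncertainty" is the expected information gain; this is written here as an assumed formalization, since the abstract does not give the criterion explicitly. The next measurement time maximizes the expected Kullback-Leibler divergence from the current posterior to the updated posterior,

        t_{k+1} = \arg\max_{t} \; \mathbb{E}_{y \sim p(y \mid t, \mathcal{D}_k)}
                  \left[ D_{\mathrm{KL}}\!\left( p(\theta \mid y, t, \mathcal{D}_k) \,\|\, p(\theta \mid \mathcal{D}_k) \right) \right],

    where \theta denotes the treatment-response parameters, \mathcal{D}_k the measurements collected so far, and y the simulated outcome of a candidate measurement at time t.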

    Designing experimental conditions to use the Lotka-Volterra model to infer tumor cell line interaction types

    The Lotka-Volterra model is widely used to model interactions between two species. Here, we generate synthetic data mimicking competitive, mutualistic and antagonistic interactions between two tumor cell lines, and then use the Lotka-Volterra model to infer the interaction type. Structural identifiability of the Lotka-Volterra model is confirmed, and practical identifiability is assessed for three experimental designs: (a) use of a single data set, with a mixture of both cell lines observed over time, (b) a sequential design where growth rates and carrying capacities are estimated using data from experiments in which each cell line is grown in isolation, and then interaction parameters are estimated from an experiment involving a mixture of both cell lines, and (c) a parallel experimental design where all model parameters are fitted to data from two mixtures simultaneously. In addition to assessing each design for practical identifiability, we investigate how the predictive power of the model (i.e., its ability to fit data for initial ratios other than those to which it was calibrated) is affected by the choice of experimental design. The parallel calibration procedure is found to be optimal and is further tested on in silico data generated from a spatially-resolved cellular automaton model, which accounts for oxygen consumption and allows for variation in the intensity level of the interaction between the two cell lines. We use this study to highlight the care that must be taken when interpreting parameter estimates for the spatially-averaged Lotka-Volterra model when it is calibrated against data produced by the spatially-resolved cellular automaton model, since baseline competition for space and resources in the CA model may contribute to a discrepancy between the type of interaction used to generate the CA data and the type of interaction inferred by the LV model. Comment: 25 pages, 18 figures.
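
    For reference, the two-species Lotka-Volterra system usually meant in this setting (written here in a standard notation, since the abstract does not state it) is

        \frac{dN_1}{dt} = r_1 N_1 \left( 1 - \frac{N_1 + \alpha_{12} N_2}{K_1} \right), \qquad
        \frac{dN_2}{dt} = r_2 N_2 \left( 1 - \frac{N_2 + \alpha_{21} N_1}{K_2} \right),

    where r_i are growth rates, K_i carrying capacities, and the signs and magnitudes of the interaction coefficients \alpha_{12} and \alpha_{21} determine whether the inferred interaction is competitive, mutualistic, or antagonistic.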

    Generating Image-Specific Text Improves Fine-grained Image Classification

    Recent vision-language models outperform vision-only models on many image classification tasks. However, because of the absence of paired text/image descriptions, it remains difficult to fine-tune these models for fine-grained image classification. In this work, we propose a method, GIST, for generating image-specific fine-grained text descriptions from image-only datasets, and show that these text descriptions can be used to improve classification. Key parts of our method include (1) prompting a pretrained large language model with domain-specific prompts to generate diverse fine-grained text descriptions for each class and (2) using a pretrained vision-language model to match each image to label-preserving text descriptions that capture relevant visual features in the image. We demonstrate the utility of GIST by fine-tuning vision-language models on the image-and-generated-text pairs to learn an aligned vision-language representation space for improved classification. We evaluate our learned representation space in full-shot and few-shot scenarios across four diverse fine-grained classification datasets, each from a different domain. Our method achieves an average improvement of 4.1% in accuracy over CLIP linear probes and an average improvement of 1.1% in accuracy over the previous state-of-the-art image-text classification method on the full-shot datasets. Our method achieves similar improvements across few-shot regimes. Code is available at https://github.com/emu1729/GIST. Comment: The first two authors contributed equally to this work.
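
    One plausible instantiation of the matching step, using Hugging Face CLIP, is sketched below; this is an illustrative sketch under assumed names, not the released GIST code at the URL above. It scores an image against the generated descriptions for its own class and keeps the best matches, so the pairing stays label-preserving.

        import torch
        from PIL import Image
        from transformers import CLIPModel, CLIPProcessor

        model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
        processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

        def match_descriptions(image_path, class_descriptions, top_k=3):
            # Rank the class's generated descriptions by CLIP image-text similarity.
            image = Image.open(image_path).convert("RGB")
            inputs = processor(text=class_descriptions, images=image,
                               return_tensors="pt", padding=True)
            with torch.no_grad():
                scores = model(**inputs).logits_per_image[0]
            best = scores.topk(min(top_k, len(class_descriptions))).indices.tolist()
            return [class_descriptions[i] for i in best]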

    Intuitively Assessing ML Model Reliability through Example-Based Explanations and Editing Model Inputs

    Interpretability methods aim to help users build trust in and understand the capabilities of machine learning models. However, existing approaches often rely on abstract, complex visualizations that poorly map to the task at hand or require non-trivial ML expertise to interpret. Here, we present two visual analytics modules that facilitate an intuitive assessment of model reliability. To help users better characterize and reason about a model's uncertainty, we visualize raw and aggregate information about a given input's nearest neighbors. Using an interactive editor, users can manipulate this input in semantically meaningful ways, determine the effect on the output, and compare against their prior expectations. We evaluate our interface using an electrocardiogram beat classification case study. In a study with 14 physicians, we find that, compared to a baseline feature-importance interface, participants are better able to align the model's uncertainty with domain-relevant factors and to build intuition about the model's capabilities and limitations.
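
    A minimal sketch of the nearest-neighbor summary that such an interface might surface, assuming precomputed embeddings from the classifier (the function name and return format are illustrative, not the authors' implementation):

        import numpy as np
        from sklearn.neighbors import NearestNeighbors

        def neighbor_summary(query_embedding, train_embeddings, train_labels, k=10):
            # Retrieve the k nearest training examples in the model's embedding space
            # and summarize their labels (the "aggregate" view) alongside the raw neighbors.
            nn = NearestNeighbors(n_neighbors=k).fit(train_embeddings)
            dist, idx = nn.kneighbors(np.asarray(query_embedding).reshape(1, -1))
            neighbor_labels = np.asarray(train_labels)[idx[0]]
            counts = {lbl: int((neighbor_labels == lbl).sum()) for lbl in np.unique(neighbor_labels)}
            return {"indices": idx[0].tolist(), "distances": dist[0].tolist(), "label_counts": counts}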

    Environmental assessment of sugar beet production

    Environmental impact assessments are often used to aid decision-making on complex planning issues, and the use of such techniques within agriculture is about to come of age. Sophisticated risk assessment methods are now available for planning pesticide strategies, and mathematical models have been developed which simulate the nitrogen dynamics within arable land to generate field-specific fertiliser recommendations. In addition, energy budgeting techniques have been published in the scientific press. However, to date few have attempted to draw together these techniques to quantify the environmental impact of a specific crop. The British Beet Research Organisation has a key research target to improve the environmental impact of the sugar beet crop and the sugar industry. Consequently, it has funded a research project to use state-of-the-art tools to compare the potential environmental impact of a range of conventional beet production systems in the UK and to present the findings alongside an economic assessment. This paper provides an insight into the techniques being used and presents the interim project findings.