5 research outputs found

    Estimating the size distribution of plastics ingested by animals

    The ingestion of plastics appears to be widespread throughout the animal kingdom, with risks to individuals, ecosystems and human health. Despite growing information on the location, abundance and size distribution of plastics in the environment, it cannot be assumed that any given animal will ingest all sizes of plastic encountered. Here, we use published data to develop an allometric relationship between plastic consumption and animal size to estimate the size distribution of plastics feasibly ingested by animals. Based on more than 2000 gut content analyses from animals ranging over three orders of magnitude in size (lengths 9 mm to 10 m), body length alone accounts for 42% of the variance in the length of plastic an animal may ingest and indicates a size ratio of roughly 20:1 between animal body length and the largest plastic the animal may ingest. We expect this work to improve global assessments of plastic pollution risk by introducing a quantifiable link between animals and the plastics they can ingest.

    MaWGAN: a generative adversarial network to create synthetic data from datasets with missing data

    The creation of synthetic data is important for a range of applications, for example, to anonymise sensitive datasets or to increase the volume of data in a dataset. When the target dataset has missing data, it is common to simply discard incomplete observations, even though this necessarily means some loss of information. However, when the proportion of missing data is large, discarding incomplete observations may not leave enough data to accurately estimate their joint distribution. Thus, there is a need for data synthesis methods capable of using datasets with missing data, to improve accuracy and, in more extreme cases, to make data synthesis possible. To achieve this, we propose a novel generative adversarial network (GAN) called MaWGAN (for masked Wasserstein GAN), which creates synthetic data directly from datasets with missing values. As with existing GAN approaches, the MaWGAN synthetic data generator generates samples from the full joint distribution. We introduce a novel methodology for comparing the generator output with the original data that does not require us to discard incomplete observations, based on a modification of the Wasserstein distance and easily implemented using masks generated from the pattern of missing data in the original dataset. Numerical experiments are used to demonstrate the superior performance of MaWGAN compared to (a) discarding incomplete observations before using a GAN, and (b) imputing missing values (using the GAIN algorithm) before using a GAN.
