Generative Adversarial Networks for Bitcoin Data Augmentation
In Bitcoin entity classification, results are strongly conditioned by the
ground-truth dataset, especially when applying supervised machine learning
approaches. However, these ground-truth datasets are frequently affected by
significant class imbalance as generally they contain much more information
regarding legal services (Exchange, Gambling), than regarding services that may
be related to illicit activities (Mixer, Service). Class imbalance increases
the complexity of applying machine learning techniques and reduces the quality
of classification results, especially for underrepresented, but critical
classes.
In this paper, we propose to address this problem by using Generative
Adversarial Networks (GANs) for Bitcoin data augmentation, as GANs have recently
shown promising results in the domain of image classification. However, there
is no "one-fits-all" GAN solution that works for every scenario. In fact,
setting GAN training parameters is non-trivial and heavily affects the quality
of the generated synthetic data. We therefore evaluate how GAN parameters such
as the optimization function, the size of the dataset and the chosen batch size
affect GAN implementation for one underrepresented entity class (Mining Pool)
and demonstrate how a "good" GAN configuration can be obtained that achieves
high similarity between synthetically generated and real Bitcoin address data.
To the best of our knowledge, this is the first study presenting GANs as a
valid tool for generating synthetic address data for data augmentation in
Bitcoin entity classification. Comment: 8 pages, 5 figures, 4 tables
Data Augmentation Using Generative Adversarial Networks
Most labelled real-world data is not uniformly distributed across classes, which can have a severe impact on the prediction quality of classification models. A common approach to overcoming this issue is to modify the original data to restore the balance of the classes. This thesis deals with balancing image datasets by data augmentation using generative adversarial neural networks. The primary focus is on generating images of underrepresented classes in imbalanced datasets, a process known as class balancing. The aim of this thesis is to analyse and compare data augmentation techniques including standard methods, generative adversarial networks and autoencoders. Evaluation is done using classifiers trained on the original, unbalanced and augmented datasets. The results suggest that the performance of the methods deteriorates as the imbalance rate and the diversity of the datasets increase.
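As a rough illustration of what class balancing asks of a generator, the sketch below counts how many synthetic samples each class would need to reach the size of the largest class. The function name `synthetic_samples_needed` and the toy labels are assumptions made for this example; they are not taken from the thesis, which may define its balancing target differently.

```python
from collections import Counter


def synthetic_samples_needed(labels):
    """Per class, how many extra samples would match the majority class size."""
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class
    return {cls: target - n for cls, n in counts.items()}


# Hypothetical imbalanced label set: 50 / 20 / 5 samples per class.
labels = ["cat"] * 50 + ["dog"] * 20 + ["bird"] * 5
needed = synthetic_samples_needed(labels)
print(needed)  # {'cat': 0, 'dog': 30, 'bird': 45}
```

The rarer the class, the more of its training data ends up synthetic, which is one plausible reason augmentation gets harder as the imbalance grows.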
Generative Adversarial Networks for Data Augmentation
One way to expand the available dataset for training AI models in the medical
field is through the use of Generative Adversarial Networks (GANs) for data
augmentation. GANs work by employing a generator network to create new data
samples that are then assessed by a discriminator network to determine their
similarity to real samples. The discriminator network is taught to
differentiate between actual and synthetic samples, while the generator network
is trained to generate data that closely resemble real ones. The process is
repeated until the generator network can produce synthetic data that is
indistinguishable from genuine data. GANs have been utilized in medical image
analysis for various tasks, including data augmentation, image creation, and
domain adaptation. They can generate synthetic samples that can be used to
increase the available dataset, especially in cases where obtaining large
amounts of genuine data is difficult or unethical. However, it is essential to
note that the use of GANs in medical imaging is still an active area of
research, with work ongoing to ensure that the produced images are of high quality and suitable
for use in clinical settings. Comment: 13 pages, 6 figures, 1 table; chapter accepted for the
Springer book "Data-driven approaches to medical imaging"
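The alternating generator/discriminator training described in this abstract can be sketched on a toy one-dimensional problem. Everything here is an assumption made for illustration: the "real" data is drawn from a Gaussian, the generator is the linear map G(z) = a*z + b, the discriminator is a logistic unit D(x) = sigmoid(w*x + c), and the generator uses the common non-saturating objective log D(G(z)). It is not the chapter's model, and such a small GAN may oscillate rather than converge.

```python
import math
import random


def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def d_step(w, c, x_real, x_fake, lr):
    """One gradient-ascent step on log D(x_real) + log(1 - D(x_fake))."""
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    grad_w = (1.0 - d_real) * x_real - d_fake * x_fake
    grad_c = (1.0 - d_real) - d_fake
    return w + lr * grad_w, c + lr * grad_c


def g_step(a, b, w, c, z, lr):
    """One gradient-ascent step on log D(G(z)), with G(z) = a*z + b."""
    d_fake = sigmoid(w * (a * z + b) + c)
    common = (1.0 - d_fake) * w  # derivative of log D with respect to the fake sample
    return a + lr * common * z, b + lr * common


def train(steps=500, lr=0.01, real_mean=4.0, seed=0):
    """Alternate discriminator and generator updates (hypothetical settings)."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters
    w, c = 0.0, 0.0  # discriminator parameters
    for _ in range(steps):
        x_real = rng.gauss(real_mean, 1.0)
        z = rng.gauss(0.0, 1.0)
        w, c = d_step(w, c, x_real, a * z + b, lr)
        z = rng.gauss(0.0, 1.0)
        a, b = g_step(a, b, w, c, z, lr)
    return a, b, w, c


a, b, w, c = train()
print(f"generator after training: G(z) = {a:.3f} * z + {b:.3f}")
```

In practice both networks are deep models trained with an automatic-differentiation library; the hand-written gradients here only make the two opposing update rules visible.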
GAN Augmentation: Augmenting Training Data using Generative Adversarial Networks
One of the biggest issues facing the use of machine learning in medical
imaging is the lack of availability of large, labelled datasets. The annotation
of medical images is not only expensive and time-consuming but also highly
dependent on the availability of expert observers. The limited amount of
training data can inhibit the performance of supervised machine learning
algorithms which often need very large quantities of data on which to train to
avoid overfitting. So far, much effort has been directed at extracting as much
information as possible from what data is available. Generative Adversarial
Networks (GANs) offer a novel way to unlock additional information from a
dataset by generating synthetic samples with the appearance of real images.
This paper demonstrates the feasibility of introducing GAN-derived synthetic
data to the training datasets in two brain segmentation tasks, leading to
improvements in Dice Similarity Coefficient (DSC) of between 1 and 5 percentage
points under different conditions, with the strongest effects seen when fewer than
ten training image stacks are available.
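For readers unfamiliar with the metric, the Dice Similarity Coefficient of two binary masks A and B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it for flattened masks; the function name `dice_coefficient`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for this example, not details from the paper.

```python
def dice_coefficient(pred, truth):
    """Dice Similarity Coefficient for two flat binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# Hypothetical flattened segmentation masks (1 = foreground voxel).
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 0, 1, 1]
dsc = dice_coefficient(pred, truth)
print(f"DSC = {dsc:.3f}")  # DSC = 0.667
```

On this scale, an improvement of 1 to 5 percentage points corresponds to a DSC increase of 0.01 to 0.05.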