Generation of realistic synthetic validation healthcare datasets using generative adversarial networks
Background:
Assurance of digital health interventions involves, among other steps, clinical validation, which requires large datasets to test an application in realistic clinical scenarios. Developing such datasets is time-consuming and makes it difficult to maintain patient anonymity and consent.
Objective:
The development of synthetic datasets that maintain the statistical properties of the real datasets.
Method:
A generative adversarial network (GAN) built from artificial neural networks was implemented and trained on numerical and categorical variables from the MIMIC-III dataset, including ICD-9 codes, to produce a synthetic dataset.
Results:
The synthetic dataset exhibits a correlation matrix highly similar to that of the real dataset, achieves good Jaccard similarity, and passes the Kolmogorov-Smirnov (KS) test.
Conclusions:
The proof of concept was successful, and the approach is promising for further work.
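The three checks named in the Results (correlation matrix similarity, Jaccard similarity, KS test) can be illustrated with a minimal sketch. The abstract does not say how each check was computed, so everything here is an assumption: the function names are hypothetical, the correlation comparison is summarised as the maximum absolute difference between Pearson correlation matrices, Jaccard similarity is taken over sets of ICD-9 codes, and "passing" the KS test is read as not rejecting the null hypothesis at α = 0.05 using the asymptotic two-sample critical value. Only the Python standard library is used.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a list of numeric columns."""
    return [[pearson(a, b) for b in columns] for a in columns]


def max_corr_difference(real_cols, synth_cols):
    """Hypothetical summary: largest absolute gap between real and synthetic correlation matrices."""
    r = correlation_matrix(real_cols)
    s = correlation_matrix(synth_cols)
    return max(abs(r[i][j] - s[i][j]) for i in range(len(r)) for j in range(len(r)))


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two collections, e.g. ICD-9 code sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D = sup_t |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    # Walk both sorted samples, evaluating each empirical CDF at every observed value.
    while i < n and j < m:
        t = min(xs[i], ys[j])
        while i < n and xs[i] <= t:
            i += 1
        while j < m and ys[j] <= t:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_passes(x, y, alpha=0.05):
    """True if D is within the asymptotic critical value, i.e. 'same distribution' is not rejected."""
    n, m = len(x), len(y)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)  # about 1.358 for alpha = 0.05
    return ks_statistic(x, y) <= c_alpha * math.sqrt((n + m) / (n * m))


if __name__ == "__main__":
    # Made-up toy values, only to exercise the functions.
    real_age = [54, 61, 70, 45, 80, 66]
    synth_age = [52, 63, 71, 47, 78, 65]
    print(ks_statistic(real_age, synth_age), ks_passes(real_age, synth_age))
    print(jaccard({"401.9", "428.0"}, {"401.9", "250.00"}))
```

Hand-rolling the KS statistic keeps the sketch dependency-free; in practice a library routine would normally be preferred, and the asymptotic critical value is only approximate for small samples.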
UNCLES: Method for the identification of genes differentially consistently co-expressed in a specific subset of datasets
Background: Collective analysis of the growing number of gene expression datasets is required. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method can combine clustering results from multiple datasets to identify, in a tuneable manner, the subsets of genes that are consistently co-expressed in all of the provided datasets. However, validating results and setting parameters are issues that complicate the design of such methods. Moreover, although it is common practice to test methods on synthetic datasets, the mathematical models used to synthesise such datasets are usually based on approximations that may not be sufficiently representative of real datasets.
Results: Here, we propose an unsupervised method for the unification of clustering results from multiple datasets using external specifications (UNCLES). This method can identify the subsets of genes that are consistently co-expressed in one subset of datasets while being poorly co-expressed in another subset, and it can identify the subsets of genes consistently co-expressed in all given datasets. We also propose the M-N scatter plots validation technique and use it to set the parameters of UNCLES, such as the number of clusters, automatically. Additionally, we propose an approach for synthesising gene expression datasets from real data profiles that combines the ground-truth knowledge of synthetic data with the realistic expression values of real data, and therefore overcomes the problem of faithfulness in synthetic expression data modelling. By applying UNCLES to those datasets, we validate it and compare it with other conventional clustering methods and, of particular relevance, biclustering methods. We further validate UNCLES by applying it to a set of 14 real genome-wide yeast datasets, on which it produces focused clusters that conform well to known biological facts. Furthermore, we draw in-silico hypotheses about the function of a few previously unknown genes in those focused clusters.
Conclusions: The UNCLES method, the M-N scatter plots technique, and the expression data synthesis approach will have wide application for the comprehensive analysis of genomic and other sources of multiple complex biological datasets. Moreover, the derived in-silico biological hypotheses represent subjects for future functional studies.
The National Institute for Health Research (NIHR) under its Programme Grants for Applied Research Programme (Grant Reference Number RP-PG-0310-1004)
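The consensus step that Bi-CoPaM performs, combining clustering results from several datasets and then binarising them with a tuneable threshold, can be sketched as follows. This is an illustrative simplification under stated assumptions, not the authors' implementation: each dataset's clustering is given as a K × G binary partition matrix whose cluster labels are assumed to be already aligned across datasets (a separate step in the real method), the consensus partition matrix is taken as the element-wise mean of those matrices, and binarisation uses a single value threshold `delta`. The function names are hypothetical. Passing only a chosen subset of datasets' partitions yields a consensus over that subset alone.

```python
def consensus_partition_matrix(partitions):
    """Element-wise mean of aligned K x G binary partition matrices (one per dataset)."""
    d = len(partitions)
    k, g = len(partitions[0]), len(partitions[0][0])
    return [[sum(p[i][j] for p in partitions) / d for j in range(g)] for i in range(k)]


def binarise(copam, delta):
    """Value-threshold binarisation: gene j joins cluster i if its consensus membership >= delta."""
    return [[1 if v >= delta else 0 for v in row] for row in copam]


def cluster_members(binary, genes):
    """List the gene names assigned to each cluster of a binary K x G matrix."""
    return [[genes[j] for j, flag in enumerate(row) if flag] for row in binary]


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4"]
    # Hypothetical, already-aligned clusterings of the same four genes in two datasets.
    dataset_a = [[1, 1, 0, 0], [0, 0, 1, 1]]
    dataset_b = [[1, 0, 1, 0], [0, 1, 0, 1]]
    copam = consensus_partition_matrix([dataset_a, dataset_b])
    print(cluster_members(binarise(copam, 1.0), genes))  # → [['g1'], ['g4']]
    print(cluster_members(binarise(copam, 0.5), genes))  # → [['g1', 'g2', 'g3'], ['g2', 'g3', 'g4']]
```

With `delta = 1.0` a gene stays in a cluster only if every dataset agrees, giving tight clusters; lowering `delta` widens them and, as the second call shows, can place a gene in more than one cluster.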
Play and Learn: Using Video Games to Train Computer Vision Models
Video games are a compelling source of annotated data because they can readily provide fine-grained ground truth for diverse tasks. However, it is not clear whether synthetically generated data resembles real-world images closely enough to improve the performance of computer vision models in practice. We present experiments assessing how well systems trained on synthetic RGB images extracted from a video game perform on real-world data. We collected more than 60,000 synthetic samples from a modern video game under conditions similar to those of the real-world CamVid and Cityscapes datasets. We provide several experiments demonstrating that the synthetically generated RGB images can be used to improve the performance of deep neural networks on both image segmentation and depth estimation. These results show that a convolutional network trained on synthetic data achieves a test error similar to that of a network trained on real-world data for dense image classification. Furthermore, the synthetically generated RGB images can provide similar or better results compared with the real-world datasets if a simple domain adaptation technique is applied. Our results suggest that collaborating with game developers on an accessible interface for gathering data is a potentially fruitful direction for future work in computer vision.
Comment: To appear in the British Machine Vision Conference (BMVC), September 2016. v2: fixed a typo in the reference.
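The abstract compares networks by their test error on dense (per-pixel) image classification but does not define the metric in this listing. As an assumed illustration only, the sketch below computes per-pixel error and mean intersection-over-union (IoU), a metric commonly reported on CamVid and Cityscapes, from flattened label maps. The function names and the `ignore_label` convention for unlabelled pixels are hypothetical.

```python
def pixel_error(pred, truth, ignore_label=None):
    """Fraction of labelled pixels whose predicted class differs from the ground truth."""
    pairs = [(p, t) for p, t in zip(pred, truth) if t != ignore_label]
    if not pairs:
        return float("nan")  # no labelled pixels to score
    wrong = sum(1 for p, t in pairs if p != t)
    return wrong / len(pairs)


def mean_iou(pred, truth, num_classes, ignore_label=None):
    """Mean over classes of |pred == c AND truth == c| / |pred == c OR truth == c|."""
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for p, t in zip(pred, truth):
            if t == ignore_label:
                continue  # unlabelled pixels do not count
            if p == c and t == c:
                inter += 1
            if p == c or t == c:
                union += 1
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    truth = [0, 1, 1, 1, 255]  # 255 marks an unlabelled pixel (hypothetical convention)
    pred = [0, 0, 1, 1, 1]
    print(pixel_error(pred, truth, ignore_label=255))  # → 0.25
    print(mean_iou(pred, truth, num_classes=2, ignore_label=255))
```

Real label maps would normally be 2-D arrays; flattening them to lists keeps the sketch free of dependencies without changing either metric.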