A multi-stage GAN for multi-organ chest X-ray image generation and segmentation
Multi-organ segmentation of X-ray images is of fundamental importance for
computer aided diagnosis systems. However, the most advanced semantic
segmentation methods rely on deep learning and require a huge amount of labeled
images, which are rarely available due to both the high cost of human resources
and the time required for labeling. In this paper, we present a novel
multi-stage generation algorithm based on Generative Adversarial Networks
(GANs) that can produce synthetic images along with their semantic labels and
can be used for data augmentation. The main feature of the method is that,
unlike other approaches, generation occurs in several stages, which simplifies
the procedure and allows it to be used on very small datasets. The method has
been evaluated on the segmentation of chest radiographic images, showing
promising results. The multi-stage approach achieves state-of-the-art
performance and, when very few images are used to train the GANs, outperforms
the corresponding single-stage approach.
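
The abstract does not spell out the stages, but one plausible reading is a pipeline in which a first GAN synthesizes a semantic label map and a second GAN renders an image conditioned on it, so each synthetic image is born paired with its label. The PyTorch sketch below illustrates this assumed decomposition; the module names, layer sizes, and the two-stage split itself are illustrative assumptions, not the paper's exact architecture.

```python
# Hedged sketch of a two-stage generate-labels-then-images pipeline.
import torch
import torch.nn as nn

class MaskGenerator(nn.Module):
    """Stage 1 (assumed): synthesize a multi-organ label map from noise."""
    def __init__(self, z_dim=128, n_classes=3, size=64):
        super().__init__()
        self.n_classes, self.size = n_classes, size
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, n_classes * size * size),
        )

    def forward(self, z):
        logits = self.net(z).view(-1, self.n_classes, self.size, self.size)
        return logits.softmax(dim=1)  # soft per-pixel class probabilities

class ImageGenerator(nn.Module):
    """Stage 2 (assumed): render an X-ray conditioned on the label map."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_classes, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Tanh(),
        )

    def forward(self, mask):
        return self.net(mask)

# Inference: every synthetic image arrives with its semantic label,
# so the pairs can be used directly to augment a segmentation dataset.
g_mask, g_img = MaskGenerator(), ImageGenerator()
z = torch.randn(4, 128)
masks = g_mask(z)             # (4, 3, 64, 64) soft label maps
images = g_img(masks)         # (4, 1, 64, 64) synthetic X-rays
labels = masks.argmax(dim=1)  # hard labels for augmentation
```

Splitting generation this way keeps each stage's learning problem small, which is consistent with the paper's claim that the method remains usable on very small datasets.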
UPOCR: Towards Unified Pixel-Level OCR Interface
In recent years, the optical character recognition (OCR) field has been
proliferating with plentiful cutting-edge approaches for a wide spectrum of
tasks. However, these approaches are task-specifically designed with divergent
paradigms, architectures, and training strategies, which significantly
increases the complexity of research and maintenance and hinders fast
deployment in applications. To this end, we propose UPOCR, a
simple-yet-effective generalist model for Unified Pixel-level OCR interface.
Specifically, UPOCR unifies the paradigm of diverse OCR tasks as
image-to-image transformation and the architecture as a vision Transformer
(ViT)-based encoder-decoder. Learnable task prompts are introduced to push the
general feature representations extracted by the encoder toward task-specific
spaces, endowing the decoder with task awareness. Moreover, the model training
is uniformly aimed at minimizing the discrepancy between the generated and
ground-truth images regardless of the inhomogeneity among tasks. Experiments
are conducted on three pixel-level OCR tasks including text removal, text
segmentation, and tampered text detection. Without bells and whistles, the
experimental results showcase that the proposed method can simultaneously
achieve state-of-the-art performance on three tasks with a unified single
model, which provides valuable strategies and insights for future research on
generalist OCR models. Code will be publicly available.
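
To make the unified design concrete, here is a minimal PyTorch sketch of the idea: one ViT-style encoder-decoder treats every task as image-to-image transformation, and a learnable per-task prompt shifts the shared encoder features toward a task-specific space. The layer sizes, the additive way the prompt is injected, and the L1 objective are assumptions made for illustration, not the paper's exact configuration.

```python
# Hedged sketch of a unified pixel-level OCR model with task prompts.
import torch
import torch.nn as nn

class UnifiedPixelOCR(nn.Module):
    def __init__(self, n_tasks=3, dim=256, patch=16, img=224):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=4)
        # One learnable prompt per task (e.g. text removal, text
        # segmentation, tampered text detection).
        self.task_prompts = nn.Parameter(torch.zeros(n_tasks, dim))
        self.decoder = nn.ConvTranspose2d(dim, 3, kernel_size=patch, stride=patch)
        self.grid = img // patch

    def forward(self, x, task_id):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        feats = self.encoder(tokens)
        # Push the general features toward the task-specific space.
        feats = feats + self.task_prompts[task_id]
        feats = feats.transpose(1, 2).reshape(x.size(0), -1, self.grid, self.grid)
        return self.decoder(feats)  # predicted output image

# Training is uniform across tasks: minimize the pixel discrepancy
# between the generated image and that task's ground-truth image.
model = UnifiedPixelOCR()
x = torch.randn(2, 3, 224, 224)
target = torch.randn(2, 3, 224, 224)
loss = nn.functional.l1_loss(model(x, task_id=0), target)
loss.backward()
```

Because the loss is the same image-reconstruction objective for every task, adding a new pixel-level task in this scheme only requires a new prompt vector and its ground-truth images, not a new head or training recipe.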
Weak supervision for generating pixel-level annotations in scene text segmentation
Providing pixel-level supervision for scene text segmentation is inherently difficult and costly, so only a few small datasets are available for this task. To face the scarcity of training data, previous approaches based on Convolutional Neural Networks (CNNs) rely on a synthetic dataset for pre-training. However, synthetic data cannot reproduce the complexity and variability of natural images. In this work, we propose a weakly supervised learning approach to reduce the domain shift between synthetic and real data. Leveraging the bounding-box supervision of the COCO-Text and MLT datasets, we generate weak pixel-level supervision for real images. In particular, the COCO-Text-Segmentation (COCO_TS) and MLT-Segmentation (MLT_S) datasets are created and released. These two datasets are used to train a CNN, the Segmentation Multiscale Attention Network (SMANet), which is specifically designed to address some peculiarities of the scene text segmentation task. The SMANet is trained end-to-end on the proposed datasets, and the experiments show that COCO_TS and MLT_S are a valid alternative to synthetic images, requiring only a fraction of the training samples while significantly improving performance.
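
One plausible way to turn bounding boxes into weak pixel-level labels, sketched below in NumPy, is to trust the text/background scores of a model pre-trained on synthetic data only inside the annotated boxes, force everything outside the boxes to background, and ignore uncertain pixels. The thresholds and the function name are illustrative assumptions, not the paper's exact generation procedure.

```python
# Hedged sketch: weak pixel-level labels from bounding-box supervision.
import numpy as np

BACKGROUND, TEXT, IGNORE = 0, 1, 255

def weak_pixel_labels(text_prob, boxes, lo=0.3, hi=0.7):
    """text_prob: (H, W) text probability from a model pre-trained on
    synthetic data; boxes: list of (x0, y0, x1, y1) word bounding boxes."""
    h, w = text_prob.shape
    # Everything outside an annotated box is safely background.
    labels = np.full((h, w), BACKGROUND, dtype=np.uint8)
    for x0, y0, x1, y1 in boxes:
        crop = text_prob[y0:y1, x0:x1]
        region = np.full(crop.shape, IGNORE, dtype=np.uint8)
        region[crop >= hi] = TEXT        # confident text pixels
        region[crop <= lo] = BACKGROUND  # confident background pixels
        labels[y0:y1, x0:x1] = region    # uncertain pixels stay IGNORE
    return labels

# Example: a dummy probability map with one annotated word box.
prob = np.random.rand(64, 64)
annotation = weak_pixel_labels(prob, boxes=[(10, 10, 40, 30)])
```

Labels produced this way are noisy but come from real images, which is why they can reduce the domain shift that purely synthetic training data leaves behind.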