41,036 research outputs found
Text segmentation with character-level text embeddings
Learning word representations has recently seen much success in computational
linguistics. However, assuming sequences of word tokens as input to linguistic
analysis is often unjustified. For many languages word segmentation is a
non-trivial task and naturally occurring text is sometimes a mixture of natural
language strings and other character data. We propose to learn text
representations directly from raw character sequences by training a simple
recurrent network to predict the next character in text. The network uses its
hidden layer to evolve abstract representations of the character sequences it
sees. To demonstrate the usefulness of the learned text embeddings, we use them
as features in a supervised character level text segmentation and labeling
task: recognizing spans of text containing programming language code. By using
the embeddings as features, we are able to substantially improve over a
baseline that uses only surface character n-grams.
Comment: Workshop on Deep Learning for Audio, Speech and Language Processing, ICML 2013
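To make the approach above concrete, here is a minimal sketch, not the authors' code: a simple recurrent network is trained on next-character prediction, and its hidden states then serve as per-character features for the segmentation and labeling task. The toy corpus and all hyperparameters are illustrative assumptions (PyTorch).

```python
import torch
import torch.nn as nn

text = "def f(x): return x + 1  # code mixed with prose"
vocab = sorted(set(text))
stoi = {c: i for i, c in enumerate(vocab)}
ids = torch.tensor([stoi[c] for c in text])

emb = nn.Embedding(len(vocab), 16)
rnn = nn.RNN(16, 32, batch_first=True)   # the "simple recurrent network"
head = nn.Linear(32, len(vocab))         # next-character prediction head
opt = torch.optim.Adam(
    [*emb.parameters(), *rnn.parameters(), *head.parameters()], lr=1e-2)

x, y = ids[:-1].unsqueeze(0), ids[1:].unsqueeze(0)  # predict char t+1 from prefix
for _ in range(100):
    h, _ = rnn(emb(x))                   # h: (1, T, 32) hidden states
    loss = nn.functional.cross_entropy(head(h).transpose(1, 2), y)
    opt.zero_grad(); loss.backward(); opt.step()

# After training, h[0, t] is a learned embedding of the text up to character
# t; it can be fed, alongside surface character n-grams, to a supervised
# tagger that marks spans of programming-language code.
features, _ = rnn(emb(x))
```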
Self-Supervised Learning for Segmentation using Image Reconstruction
Deep learning is the engine piloting tremendous growth across industry segments by consuming valuable fuel called data. We are witnessing many businesses adopting this technology, be it healthcare, transportation, defense, semiconductors, or retail. But most of the accomplishments we see now rely on supervised learning. Supervised learning needs a substantial volume of labeled data, usually annotated by humans, an arduous and expensive task that often leads to datasets that are insufficient in size or contain human labeling errors. The performance of deep learning models is only as good as the data. Self-supervised learning minimizes the need for labeled data, as it extracts the pertinent context and inherent content of the data. We are inspired by image interpolation, where we resize an image from one pixel grid to another. We introduce a novel self-supervised learning method specialized for semantic segmentation tasks. We use image reconstruction as a pretext task in which pixels and/or a pixel channel (the R, G, or B channel) of the input image are dropped in a defined or random manner, and the original image serves as the ground truth. We use the ImageNet dataset for the pretext learning task and PASCAL VOC to evaluate the efficacy of the proposed method. In segmentation tasks, the decoder is as important as the encoder; since our proposed method learns both the encoder and the decoder as part of the pretext task, it outperforms existing self-supervised segmentation methods.
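A minimal sketch of the pretext task described above, assuming a toy encoder-decoder rather than the paper's actual architecture: pixels and/or one colour channel are dropped from the input, and the network is trained to reconstruct the original image, so that both encoder and decoder are pre-trained before segmentation fine-tuning. Shapes and the drop rate are illustrative.

```python
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1))

    def forward(self, x):
        return self.dec(self.enc(x))

def corrupt(img, drop_rate=0.5, drop_channel=None):
    """Drop random pixels and, optionally, zero out one RGB channel."""
    keep = (torch.rand(img.shape[0], 1, *img.shape[2:]) > drop_rate).float()
    out = img * keep
    if drop_channel is not None:
        out[:, drop_channel] = 0.0
    return out

model = TinyEncoderDecoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
img = torch.rand(4, 3, 64, 64)           # stand-in for an ImageNet batch
loss = nn.functional.mse_loss(model(corrupt(img, drop_channel=1)), img)
opt.zero_grad(); loss.backward(); opt.step()
# For downstream segmentation, keep both enc and dec and swap the 3-channel
# reconstruction output for a per-pixel class head.
```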
Structured prediction models for argumentative claim parsing from text
The internet abounds with opinions expressed in text. While a number of natural language processing techniques have been proposed for opinion analysis from text, most offer only a shallow analysis without providing any insights into the reasons supporting the opinions. In online discussions, however, opinions are typically expressed as arguments, consisting of a set of claims endowed with internal semantic structure amenable to deeper analysis. In this article, we introduce the task of argumentative claim parsing (ACP), which aims at extracting the semantic structures of claims from argumentative text. The task is split into two subtasks: claim segmentation and claim structuring. We present a new dataset on two discussion topics with claims manually annotated for both subtasks. Inspired by structured prediction approaches, we propose a number of supervised machine learning models for the ACP task, including deep learning, chain classifier, and joint learning models. Our experiments reveal that claim segmentation is a relatively feasible task, with the best-performing model achieving exact and lenient macro-averaged F1-scores of up to 0.37 and 0.79, respectively. Claim structuring, however, proved to be a more challenging task, with the best-performing models achieving at most a 0.08 macro-averaged F1-score.
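As an illustration of the claim segmentation subtask, here is a minimal sketch that casts it as BIO sequence labeling with a small neural tagger; the example sentence, tag set, and model are assumptions for illustration, not the article's dataset or models.

```python
import torch
import torch.nn as nn

TAGS = ["O", "B-CLAIM", "I-CLAIM"]
tokens = "I think fares should be lower because buses are public".split()
gold = ["O", "O", "B-CLAIM", "I-CLAIM", "I-CLAIM", "I-CLAIM",
        "O", "B-CLAIM", "I-CLAIM", "I-CLAIM"]

vocab = {w: i for i, w in enumerate(sorted(set(tokens)))}
x = torch.tensor([vocab[w] for w in tokens]).unsqueeze(0)
y = torch.tensor([TAGS.index(t) for t in gold]).unsqueeze(0)

emb = nn.Embedding(len(vocab), 32)
lstm = nn.LSTM(32, 32, batch_first=True, bidirectional=True)
head = nn.Linear(64, len(TAGS))
opt = torch.optim.Adam(
    [*emb.parameters(), *lstm.parameters(), *head.parameters()], lr=1e-2)

for _ in range(200):
    h, _ = lstm(emb(x))
    loss = nn.functional.cross_entropy(head(h).transpose(1, 2), y)
    opt.zero_grad(); loss.backward(); opt.step()

pred = head(lstm(emb(x))[0]).argmax(-1)[0]
print([TAGS[int(i)] for i in pred])      # contiguous B/I spans = claim segments
# Claim structuring, the harder subtask, would then map each extracted span
# onto the annotated semantic structure (e.g., with a joint or chain model).
```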
Towards Omni-supervised Referring Expression Segmentation
Referring Expression Segmentation (RES) is an emerging task in computer
vision, which segments the target instances in images based on text
descriptions. However, its development is plagued by the expensive segmentation
labels. To address this issue, we propose a new learning task for RES called
Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to
make full use of unlabeled, fully labeled and weakly labeled data, e.g.,
referring points or grounding boxes, for efficient RES training. To accomplish
this task, we also propose a novel yet strong baseline method for Omni-RES
based on the recently popular teacher-student learning, where the weak labels
are not directly transformed into supervision signals but used as a yardstick
to select and refine high-quality pseudo-masks for teacher-student learning. To
validate the proposed Omni-RES method, we apply it to a set of state-of-the-art
RES models and conduct extensive experiments on a range of RES datasets. The
experimental results show the clear merits of Omni-RES over both
fully supervised and semi-supervised training schemes. For instance, with only
10% fully labeled data, Omni-RES helps the base model reach 100% of its fully
supervised performance, and it also outperforms the semi-supervised alternative
by a large margin, e.g., +14.93% on RefCOCO and +14.95% on RefCOCO+,
respectively. More importantly, Omni-RES also enables the use of large-scale
vision-language data like Visual Genome to facilitate low-cost RES training,
achieving new state-of-the-art RES performance, e.g., 80.66 on RefCOCO.
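A minimal sketch of the yardstick idea described above, under assumed details: the weak label (here a grounding box) is never used as a training mask itself; it only scores the teacher's pseudo-masks so that low-quality ones are filtered out before supervising the student. The IoU criterion and the 0.5 threshold are illustrative assumptions.

```python
import torch

def box_mask(box, h, w):
    """Rasterize an (x1, y1, x2, y2) box into a binary mask."""
    m = torch.zeros(h, w)
    x1, y1, x2, y2 = box
    m[y1:y2, x1:x2] = 1.0
    return m

def pseudo_mask_quality(pseudo_probs, box, h, w):
    """IoU between the teacher's binarized pseudo-mask and the weak box."""
    b = box_mask(box, h, w)
    p = (pseudo_probs > 0.5).float()
    inter = (p * b).sum()
    union = ((p + b) > 0).float().sum().clamp(min=1.0)
    return (inter / union).item()

teacher_probs = torch.rand(64, 64)       # teacher's pseudo-mask for one image
weak_box = (10, 12, 40, 50)              # weak label for the same expression
if pseudo_mask_quality(teacher_probs, weak_box, 64, 64) > 0.5:
    # Refine: keep mask evidence inside the box, then use it as the
    # student's supervision signal (student loss elided in this sketch).
    refined = (teacher_probs > 0.5).float() * box_mask(weak_box, 64, 64)
```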
Shatter and Gather: Learning Referring Image Segmentation with Text Supervision
Referring image segmentation, the task of segmenting any arbitrary entities
described in free-form texts, opens up a variety of vision applications.
However, manual labeling of training data for this task is prohibitively
costly, leading to a lack of labeled data for training. We address this issue
with a weakly supervised learning approach that uses text descriptions of
training images as the only source of supervision. To this end, we first
present a new model that discovers semantic entities in an input image and
then combines those entities relevant to the text query to predict the mask of
the referent. We also
present a new loss function that allows the model to be trained without any
further supervision. Our method was evaluated on four public benchmarks for
referring image segmentation, where it clearly outperformed the existing method
for the same task and recent open-vocabulary segmentation models on all the
benchmarks.
Comment: Accepted to ICCV 2023, Project page: https://southflame.github.io/sag
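A minimal sketch of the two-stage idea in this abstract, with stand-in tensors in place of the actual entity-discovery module and training loss: the model first "shatters" the image into entity masks, then "gathers" the ones relevant to the text query into the referent's mask.

```python
import torch

E, H, W, D = 8, 64, 64, 256              # entities, mask size, feature dim
entity_masks = torch.rand(E, H, W)       # discovered ("shattered") entities
entity_feats = torch.randn(E, D)         # one feature vector per entity
text_feat = torch.randn(D)               # pooled embedding of the query

# Score each entity's relevance to the text (scaled dot-product attention),
# then combine ("gather") the entity masks into the referent's mask.
scores = torch.softmax(entity_feats @ text_feat / D ** 0.5, dim=0)   # (E,)
referent_mask = (scores[:, None, None] * entity_masks).sum(0)        # (H, W)
```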
Enhancing Self-Supervised Learning for Remote Sensing with Elevation Data: A Case Study with Scarce and High-Level Semantic Labels
This work proposes a hybrid unsupervised and supervised learning method to
pre-train models applied in Earth observation downstream tasks when only a
handful of labels denoting very general semantic concepts are available. We
combine a contrastive approach to pre-train models with a pixel-wise regression
pre-text task to predict coarse elevation maps, which are commonly available
worldwide. We hypothesize that this will allow the model to pre-learn useful
representations, as there is generally some correlation between elevation maps
and targets in many remote sensing tasks. We assess the performance of our
approach on a binary semantic segmentation task and a binary image
classification task, both derived from a dataset created for the northwest of
Colombia. In both cases, we pre-train our models with 39k unlabeled images,
fine-tune them on the downstream tasks with only 80 labeled images, and
evaluate them with 2944 labeled images. Our experiments show that our methods,
GLCNet+Elevation for segmentation, and SimCLR+Elevation for classification,
outperform their counterparts without the pixel-wise regression pre-text task,
namely SimCLR and GLCNet, in terms of macro-average F1 Score and Mean
Intersection over Union (MIoU). Our study not only encourages the development
of pre-training methods that leverage readily available geographical
information, such as elevation data, to enhance the performance of
self-supervised methods when applied to Earth observation tasks, but also
promotes the use of datasets with high-level semantic labels, which are more
likely to be updated frequently. Project code is available at
https://github.com/omarcastano/Elevation-Aware-SSL
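A minimal sketch of the hybrid objective described above: a SimCLR-style contrastive loss on two augmented views plus a pixel-wise regression head predicting a coarse elevation map. The simplified NT-Xent loss, the stand-in tensors, and the balancing weight are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """Simplified SimCLR (NT-Xent) loss over a batch of paired embeddings."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.t() / tau
    sim.fill_diagonal_(float("-inf"))    # exclude self-similarity
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Stand-ins: projected embeddings of two views of each image, plus the
# regression head's predicted elevation and the coarse elevation target.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
pred_elev = torch.rand(8, 1, 64, 64)
coarse_elev = torch.rand(8, 1, 64, 64)

lam = 1.0                                # assumed task-balancing weight
loss = nt_xent(z1, z2) + lam * F.mse_loss(pred_elev, coarse_elev)
```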
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
Rich and dense human labeled datasets are among the main enabling factors for
the recent advance on vision-language understanding. Many seemingly distant
annotations (e.g., semantic segmentation and visual question answering (VQA))
are inherently connected in that they reveal different levels and perspectives
of human understanding about the same visual scenes, and even the same set
of images (e.g., of COCO). The popularity of COCO correlates those annotations
and tasks. Explicitly linking them up may significantly benefit both individual
tasks and the unified vision and language modeling. We present the preliminary
work of linking the instance segmentations provided by COCO to the questions
and answers (QAs) in the VQA dataset, and name the collected links visual
questions and segmentation answers (VQS). They transfer human supervision
between the previously separate tasks, offer more effective leverage to
existing problems, and also open the door for new research problems and models.
We study two applications of the VQS data in this paper: supervised attention
for VQA and a novel question-focused semantic segmentation task. For the
former, we obtain state-of-the-art results on the VQA real multiple-choice task
by simply augmenting the multilayer perceptrons with some attention features
that are learned using the segmentation-QA links as explicit supervision. To
put the latter in perspective, we study two plausible methods and compare them
to an oracle method that assumes the instance segmentations are given at the
test stage.
Comment: To appear in ICCV 2017
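A minimal sketch of the supervised-attention use of the VQS links, under assumed details: for each question, the linked segmentation gives a ground-truth distribution over image regions, and the model's attention map is trained to match it alongside the usual answer loss. The grid size, KL formulation, and weighting are illustrative.

```python
import torch
import torch.nn.functional as F

R = 14 * 14                              # image regions on a 14x14 grid
att_logits = torch.randn(1, R, requires_grad=True)    # model's attention
gt_mask = torch.zeros(1, R)
gt_mask[0, 20:40] = 1.0                  # regions covered by the linked mask

att = F.log_softmax(att_logits, dim=1)
target = gt_mask / gt_mask.sum(dim=1, keepdim=True)   # mask -> distribution
att_loss = F.kl_div(att, target, reduction="batchmean")
# total_loss = answer_loss + alpha * att_loss   # alpha: assumed weight
att_loss.backward()
```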