41,036 research outputs found

    Text segmentation with character-level text embeddings

    Learning word representations has recently seen much success in computational linguistics. However, assuming sequences of word tokens as input to linguistic analysis is often unjustified. For many languages word segmentation is a non-trivial task, and naturally occurring text is sometimes a mixture of natural language strings and other character data. We propose to learn text representations directly from raw character sequences by training a simple recurrent network to predict the next character in text. The network uses its hidden layer to evolve abstract representations of the character sequences it sees. To demonstrate the usefulness of the learned text embeddings, we use them as features in a supervised character-level text segmentation and labeling task: recognizing spans of text containing programming language code. By using the embeddings as features we are able to substantially improve over a baseline which uses only surface character n-grams. Comment: Workshop on Deep Learning for Audio, Speech and Language Processing, ICML 201
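
As a rough illustration of the idea in this abstract, the sketch below runs the forward pass of a simple (Elman-style) recurrent network over characters and collects one hidden state per character as a text embedding. It is only a sketch under stated assumptions: the class name `ElmanCharRNN`, the hidden size, and the random untrained weights are hypothetical, and no training loop is shown, so it is not the authors' implementation.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Small random weight matrix (untrained; for illustration only)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class ElmanCharRNN:
    """Hypothetical minimal simple recurrent network over characters."""

    def __init__(self, alphabet, hidden_size=8, seed=0):
        rng = random.Random(seed)
        self.index = {ch: i for i, ch in enumerate(alphabet)}
        self.hidden_size = hidden_size
        vocab = len(alphabet)
        self.w_xh = make_matrix(hidden_size, vocab, rng)        # input -> hidden
        self.w_hh = make_matrix(hidden_size, hidden_size, rng)  # hidden -> hidden
        self.w_hy = make_matrix(vocab, hidden_size, rng)        # hidden -> output

    def step(self, ch, h_prev):
        """Consume one character; return the new hidden state and
        a probability distribution over the next character."""
        x = self.index[ch]  # a one-hot input selects one column of w_xh
        h = [
            math.tanh(
                self.w_xh[j][x]
                + sum(self.w_hh[j][k] * h_prev[k] for k in range(self.hidden_size))
            )
            for j in range(self.hidden_size)
        ]
        logits = [sum(row[k] * h[k] for k in range(self.hidden_size)) for row in self.w_hy]
        top = max(logits)
        exps = [math.exp(z - top) for z in logits]  # numerically stable softmax
        total = sum(exps)
        return h, [e / total for e in exps]

    def embed(self, text):
        """Return one hidden-state vector per character of `text`."""
        h = [0.0] * self.hidden_size
        states = []
        for ch in text:
            h, _ = self.step(ch, h)
            states.append(h)
        return states
```

In a setup like the one described, the per-character vectors returned by `embed` would be passed as features to a separate supervised sequence labeler, after the network has been trained on next-character prediction.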

    Self-Supervised Learning for Segmentation using Image Reconstruction

    Deep learning is the engine that is piloting tremendous growth in various segments of the industry by consuming valuable fuel called data. We are witnessing many businesses adopting this technology, be it in healthcare, transportation, defense, semiconductors, or retail. But most of the accomplishments that we see now rely on supervised learning. Supervised learning needs a substantial volume of labeled data, which are usually annotated by humans, an arduous and expensive task that often leads to datasets that are insufficient in size or contain human labeling errors. The performance of deep learning models is only as good as the data. Self-supervised learning minimizes the need for labeled data as it extracts the pertinent context and inherited data content. We are inspired by image interpolation, where we resize an image from one pixel grid to another. We introduce a novel self-supervised learning method specialized for semantic segmentation tasks. We use image reconstruction as a pretext task where pixels and/or pixel channels (R, G, or B) in the input images are dropped in a defined or random manner and the original image serves as ground truth. We use the ImageNet dataset for the pretext learning task, and PASCAL VOC to evaluate the efficacy of the proposed methods. In segmentation tasks the decoder is as important as the encoder. Because our proposed method learns both the encoder and the decoder as part of the pretext task, it outperforms existing self-supervised segmentation methods
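
The corruption step of such a reconstruction pretext task can be sketched as follows. This is a minimal sketch, assuming an image stored as nested lists of `[R, G, B]` pixels and assuming that "dropping" means setting values to zero; the function names, the drop probability, and the zero fill value are hypothetical choices, not details taken from the paper.

```python
import copy
import random


def drop_channel(image, channel):
    """Zero out one colour channel (0=R, 1=G, 2=B) in every pixel."""
    corrupted = copy.deepcopy(image)
    for row in corrupted:
        for pixel in row:
            pixel[channel] = 0
    return corrupted


def drop_random_pixels(image, drop_prob, seed=0):
    """Zero out whole pixels independently with probability `drop_prob`."""
    rng = random.Random(seed)
    corrupted = copy.deepcopy(image)
    for row in corrupted:
        for pixel in row:
            if rng.random() < drop_prob:
                pixel[:] = [0, 0, 0]
    return corrupted


def make_pretext_pair(image, drop_prob=0.5, seed=0):
    """Return (network input, reconstruction target).

    The untouched original image serves as ground truth.
    """
    return drop_random_pixels(image, drop_prob, seed), image
```

An encoder-decoder network would then be trained to map the corrupted input back to the original image, which is what lets both halves of the network be pretrained.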

    Structured prediction models for argumentative claim parsing from text

    The internet abounds with opinions expressed in text. While a number of natural language processing techniques have been proposed for opinion analysis from text, most offer only a shallow analysis without providing any insights into reasons supporting the opinions. In online discussions, however, opinions are typically expressed as arguments, consisting of a set of claims endowed with internal semantic structure amenable to deeper analysis. In this article, we introduce the task of argumentative claim parsing (ACP), which aims at extracting semantic structures of claims from argumentative text. The task is split into two subtasks: claim segmentation and claim structuring. We present a new dataset on two discussion topics with claims manually annotated for both subtasks. Inspired by structured prediction approaches, we propose a number of supervised machine learning models for the ACP task, including deep learning, chain classifier, and joint learning models. Our experiments reveal that claim segmentation is a relatively feasible task, with the best-performing model achieving up to 0.37 and 0.79 exact and lenient macro-averaged F1-score, respectively. Claim structuring, however, proved to be a more challenging task, with the best-performing models achieving at most 0.08 macro-averaged F1-score
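
For readers unfamiliar with the metric quoted above, macro-averaged F1 computes an F1 score per label and then takes the unweighted mean. The sketch below assumes one label per item (for example, per token); how the paper defines its units and its exact versus lenient matching is not stated in this abstract, so the function is a generic illustration only.

```python
def macro_f1(gold, predicted):
    """Unweighted mean of per-label F1 over all labels seen in either list."""
    labels = sorted(set(gold) | set(predicted))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never matches
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because every label counts equally, a rare label that is never predicted correctly pulls the macro average down sharply, which is one reason macro-averaged scores can look low on structured tasks.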

    Towards Omni-supervised Referring Expression Segmentation

    Referring Expression Segmentation (RES) is an emerging task in computer vision, which segments the target instances in images based on text descriptions. However, its development is plagued by the expensive segmentation labels. To address this issue, we propose a new learning task for RES called Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to make full use of unlabeled, fully labeled and weakly labeled data, e.g., referring points or grounding boxes, for efficient RES training. To accomplish this task, we also propose a novel yet strong baseline method for Omni-RES based on the recently popular teacher-student learning, where the weak labels are not directly transformed into supervision signals but used as a yardstick to select and refine high-quality pseudo-masks for teacher-student learning. To validate the proposed Omni-RES method, we apply it to a set of state-of-the-art RES models and conduct extensive experiments on several RES datasets. The experimental results show clear advantages of Omni-RES over the fully-supervised and semi-supervised training schemes. For instance, with only 10% fully labeled data, Omni-RES can help the base model achieve 100% of fully supervised performance, and it also outperforms the semi-supervised alternative by a large margin, e.g., +14.93% on RefCOCO and +14.95% on RefCOCO+, respectively. More importantly, Omni-RES also enables the use of large-scale vision-language datasets like Visual Genome to facilitate low-cost RES training, and achieves new SOTA performance of RES, e.g., 80.66 on RefCOCO
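
One plausible reading of "using weak labels as a yardstick" is a filter that keeps a teacher's pseudo-mask only when it agrees with the weak grounding box. The sketch below is an assumption-laden illustration, not the paper's procedure: the IoU-based agreement test, the `threshold` value, and all function names are hypothetical.

```python
def mask_bbox(mask):
    """Tight bounding box (top, left, bottom, right), inclusive, of a binary mask."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))


def box_area(box):
    """Area of an inclusive (top, left, bottom, right) box in pixels."""
    return (box[2] - box[0] + 1) * (box[3] - box[1] + 1)


def box_iou(a, b):
    """Intersection over union of two inclusive boxes."""
    height = min(a[2], b[2]) - max(a[0], b[0]) + 1
    width = min(a[3], b[3]) - max(a[1], b[1]) + 1
    inter = max(0, height) * max(0, width)
    return inter / (box_area(a) + box_area(b) - inter)


def keep_pseudo_mask(mask, weak_box, threshold=0.5):
    """Keep the pseudo-mask only if its bounding box agrees with the weak box.

    `threshold` is an illustrative value, not one reported in the paper.
    """
    box = mask_bbox(mask)
    return box is not None and box_iou(box, weak_box) >= threshold
```

Under this reading, the weak box never becomes a training target itself; it only decides which pseudo-masks the student is allowed to learn from.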

    Shatter and Gather: Learning Referring Image Segmentation with Text Supervision

    Referring image segmentation, the task of segmenting any arbitrary entities described in free-form texts, opens up a variety of vision applications. However, manual labeling of training data for this task is prohibitively costly, leading to a lack of labeled data for training. We address this issue by a weakly supervised learning approach using text descriptions of training images as the only source of supervision. To this end, we first present a new model that discovers semantic entities in the input image and then combines such entities relevant to the text query to predict the mask of the referent. We also present a new loss function that allows the model to be trained without any further supervision. Our method was evaluated on four public benchmarks for referring image segmentation, where it clearly outperformed the existing method for the same task and recent open-vocabulary segmentation models on all the benchmarks. Comment: Accepted to ICCV 2023, Project page: https://southflame.github.io/sag

    Enhancing Self-Supervised Learning for Remote Sensing with Elevation Data: A Case Study with Scarce And High Level Semantic Labels

    This work proposes a hybrid unsupervised and supervised learning method to pre-train models applied in Earth observation downstream tasks when only a handful of labels denoting very general semantic concepts are available. We combine a contrastive approach to pre-train models with a pixel-wise regression pre-text task to predict coarse elevation maps, which are commonly available worldwide. We hypothesize that this will allow the model to pre-learn useful representations, as there is generally some correlation between elevation maps and targets in many remote sensing tasks. We assess the performance of our approach on a binary semantic segmentation task and a binary image classification task, both derived from a dataset created for the northwest of Colombia. In both cases, we pre-train our models with 39k unlabeled images, fine-tune them on the downstream tasks with only 80 labeled images, and evaluate them with 2944 labeled images. Our experiments show that our methods, GLCNet+Elevation for segmentation, and SimCLR+Elevation for classification, outperform their counterparts without the pixel-wise regression pre-text task, namely SimCLR and GLCNet, in terms of macro-average F1 Score and Mean Intersection over Union (MIoU). Our study not only encourages the development of pre-training methods that leverage readily available geographical information, such as elevation data, to enhance the performance of self-supervised methods when applied to Earth observation tasks, but also promotes the use of datasets with high-level semantic labels, which are more likely to be updated frequently. Project code can be found at https://github.com/omarcastano/Elevation-Aware-SSL
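
Mean Intersection over Union, one of the two metrics named in this abstract, can be sketched for flattened label maps as below. This is a generic illustration, assuming integer class ids and assuming that classes absent from both the reference and the prediction are skipped; that skipping rule is a common convention, not something stated in the abstract.

```python
def mean_iou(gold, predicted, num_classes=2):
    """Mean of per-class IoU over flattened integer label maps."""
    pairs = list(zip(gold, predicted))
    ious = []
    for cls in range(num_classes):
        inter = sum(1 for g, p in pairs if g == cls and p == cls)
        union = sum(1 for g, p in pairs if g == cls or p == cls)
        if union:  # skip classes that appear in neither map (assumed convention)
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For a binary task such as the one described, `num_classes=2` averages the IoU of the background and foreground classes.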

    VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation

    Rich and dense human-labeled datasets are among the main enabling factors for the recent advances in vision-language understanding. Many seemingly distant annotations (e.g., semantic segmentation and visual question answering (VQA)) are inherently connected in that they reveal different levels and perspectives of human understanding of the same visual scenes, and even the same set of images (e.g., of COCO). The popularity of COCO correlates those annotations and tasks. Explicitly linking them up may significantly benefit both individual tasks and the unified vision and language modeling. We present the preliminary work of linking the instance segmentations provided by COCO to the questions and answers (QAs) in the VQA dataset, and name the collected links visual questions and segmentation answers (VQS). They transfer human supervision between the previously separate tasks, offer more effective leverage to existing problems, and also open the door for new research problems and models. We study two applications of the VQS data in this paper: supervised attention for VQA and a novel question-focused semantic segmentation task. For the former, we obtain state-of-the-art results on the VQA real multiple-choice task by simply augmenting the multilayer perceptrons with some attention features that are learned using the segmentation-QA links as explicit supervision. To put the latter in perspective, we study two plausible methods and compare them to an oracle method assuming that the instance segmentations are given at the test stage. Comment: To appear on ICCV 201