Data Augmentation for Spoken Language Understanding via Joint Variational Generation
Data scarcity is one of the main obstacles to domain adaptation in spoken language understanding (SLU) due to the high cost of creating manually tagged SLU datasets. Recent work on neural text generative models, particularly latent variable models such as the variational autoencoder (VAE), has shown promising results in generating plausible and natural sentences. In this paper, we propose a novel generative architecture that leverages the generative power of latent variable models to jointly synthesize fully annotated utterances. Our experiments show that existing SLU models trained on the additional synthetic examples achieve performance gains. Our approach not only helps alleviate the data scarcity issue in the SLU task for many datasets but also consistently improves language understanding performance across various SLU models, supported by extensive experiments and rigorous statistical testing. Comment: 8 pages, 3 figures, 4 tables, Accepted in AAAI 201
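For illustration only, the sketch below shows one way such joint generation could be set up in PyTorch: a small sequence VAE whose decoder emits both word ids and slot-tag ids from a shared latent code, so a sample from the latent space yields a fully annotated utterance. The class names, layer sizes, and loss weighting are assumptions made for the sketch, not the paper's architecture.

# Minimal, illustrative sketch (not the paper's exact model): a VAE whose decoder
# jointly reconstructs the utterance and predicts a slot tag for every token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointVAE(nn.Module):
    def __init__(self, vocab_size, num_tags, emb=128, hid=256, z_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.GRU(emb, hid, batch_first=True)
        self.to_mu = nn.Linear(hid, z_dim)
        self.to_logvar = nn.Linear(hid, z_dim)
        self.decoder = nn.GRU(emb + z_dim, hid, batch_first=True)
        self.word_head = nn.Linear(hid, vocab_size)   # reconstructs the utterance
        self.tag_head = nn.Linear(hid, num_tags)      # jointly emits slot tags

    def forward(self, tokens):
        x = self.embed(tokens)                                    # (B, T, emb)
        _, h = self.encoder(x)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        z_seq = z.unsqueeze(1).expand(-1, x.size(1), -1)
        out, _ = self.decoder(torch.cat([x, z_seq], dim=-1))
        return self.word_head(out), self.tag_head(out), mu, logvar

def joint_vae_loss(word_logits, tag_logits, tokens, tags, mu, logvar):
    # token reconstruction + tag prediction + KL regularizer (equal weights assumed)
    rec_words = F.cross_entropy(word_logits.transpose(1, 2), tokens)
    rec_tags = F.cross_entropy(tag_logits.transpose(1, 2), tags)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec_words + rec_tags + kld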
Learning to Compose Task-Specific Tree Structures
For years, recursive neural networks (RvNNs) have been shown to be suitable for encoding text into fixed-length vectors and have achieved good performance on several natural language processing tasks. However, the main drawback of RvNNs is that they require structured input, which makes data preparation and model implementation difficult. In this paper, we propose Gumbel Tree-LSTM, a novel tree-structured long short-term memory architecture that efficiently learns how to compose task-specific tree structures from plain text data alone. Our model uses the Straight-Through Gumbel-Softmax estimator to dynamically decide the parent node among candidates and to calculate gradients of the discrete decision. We evaluate the proposed model on natural language inference and sentiment analysis, and show that our model outperforms or is at least comparable to previous models. We also find that our model converges significantly faster than other models. Comment: AAAI 201
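As a rough illustration of the selection step described above (the scorer, tensor shapes, and composition step are assumptions, not the authors' code), PyTorch's built-in gumbel_softmax with hard=True returns a one-hot choice in the forward pass while gradients flow through the underlying softmax:

# Straight-Through Gumbel-Softmax parent selection, sketched with assumed shapes.
import torch
import torch.nn.functional as F

def select_parent(candidates, scorer, tau=1.0):
    """candidates: (num_candidates, dim) composed parent vectors for one sentence;
    scorer: e.g. torch.nn.Linear(dim, 1) assigning a score to each candidate."""
    logits = scorer(candidates).squeeze(-1)             # (num_candidates,)
    # hard=True: discrete one-hot sample forward, soft gradients backward
    choice = F.gumbel_softmax(logits, tau=tau, hard=True)
    parent = choice @ candidates                        # (dim,) the selected parent vector
    return parent, int(choice.argmax())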
Self-Guided Contrastive Learning for BERT Sentence Representations
Although BERT and its variants have reshaped the NLP landscape, it still
remains unclear how best to derive sentence embeddings from such pre-trained
Transformers. In this work, we propose a contrastive learning method that
utilizes self-guidance for improving the quality of BERT sentence
representations. Our method fine-tunes BERT in a self-supervised fashion, does
not rely on data augmentation, and enables the usual [CLS] token embeddings to
function as sentence vectors. Moreover, we redesign the contrastive learning
objective (NT-Xent) and apply it to sentence representation learning. We
demonstrate with extensive experiments that our approach is more effective than
competitive baselines on diverse sentence-related tasks. We also show it is
efficient at inference and robust to domain shifts. Comment: ACL 202
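For reference, a minimal sketch of the NT-Xent objective the paper builds on (the paper redesigns this objective; the SimCLR-style form and the temperature below are assumptions):

# Standard NT-Xent: each embedding's positive is the other view of the same sentence.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.05):
    """z1, z2: (B, d) embeddings of two views of the same B sentences."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=-1)   # (2B, d), cosine geometry
    sim = z @ z.t() / temperature                          # pairwise similarities
    sim.fill_diagonal_(float('-inf'))                      # drop self-similarity
    b = z1.size(0)
    targets = torch.cat([torch.arange(b, 2 * b), torch.arange(0, b)])  # index of each positive
    return F.cross_entropy(sim, targets)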
Computer use at work is associated with self-reported depressive and anxiety disorder
Adjusted OR* of DAD considering the combined effect of computer use and occupational group, education, and job status. (DOC 61 kb)
Prompt-Augmented Linear Probing: Scaling Beyond The Limit of Few-shot In-Context Learners
Through in-context learning (ICL), large-scale language models are effective
few-shot learners without additional model fine-tuning. However, the ICL
performance does not scale well with the number of available training samples
as it is limited by the inherent input length constraint of the underlying
language model. Meanwhile, many studies have revealed that language models are
also powerful feature extractors, allowing them to be utilized in a black-box
manner and enabling the linear probing paradigm, where lightweight
discriminators are trained on top of the pre-extracted input representations.
This paper proposes prompt-augmented linear probing (PALP), a hybrid of linear probing and ICL that leverages the best of both worlds. PALP inherits the scalability of linear probing and the ability of prompts to push language models toward deriving more meaningful representations by tailoring the input into a more natural form. Through in-depth investigations on various datasets, we verify that PALP significantly enhances the input representations, closing the gap between ICL in the data-hungry scenario and fine-tuning in the data-abundant scenario with little training overhead, potentially making PALP a strong alternative in a black-box scenario. Comment: AAAI 202
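A rough sketch of the linear-probing side of this idea, assuming a prompt template, mean pooling, and GPT-2 as the frozen feature extractor (none of these choices are taken from the paper): wrap each input in a prompt, extract frozen representations, and fit a lightweight classifier on top.

# Prompt-augmented features + linear probe, with an illustrative template and toy data.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModel.from_pretrained("gpt2").eval()

def encode(texts, template="Review: {} Sentiment:"):
    prompts = [template.format(t) for t in texts]            # tailor input via a prompt
    batch = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state             # (B, T, d), model stays frozen
    mask = batch["attention_mask"].unsqueeze(-1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()     # mean-pool non-pad tokens

train_texts, train_labels = ["great movie", "terrible plot"], [1, 0]   # toy placeholders
probe = LogisticRegression(max_iter=1000).fit(encode(train_texts), train_labels)
print(probe.predict(encode(["surprisingly good"])))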
Continuous Decomposition of Granularity for Neural Paraphrase Generation
While Transformers have had significant success in paragraph generation, they
treat sentences as linear sequences of tokens and often neglect their
hierarchical information. Prior work has shown that decomposing the levels of
granularity (e.g., word, phrase, or sentence) for input tokens has produced
substantial improvements, suggesting the possibility of enhancing Transformers
via more fine-grained modeling of granularity. In this work, we propose a
continuous decomposition of granularity for neural paraphrase generation
(C-DNPG). In order to efficiently incorporate granularity into sentence
encoding, C-DNPG introduces a granularity-aware attention (GA-Attention)
mechanism which extends the multi-head self-attention with: 1) a granularity
head that automatically infers the hierarchical structure of a sentence by
neurally estimating the granularity level of each input token; and 2) two novel
attention masks, namely, granularity resonance and granularity scope, to
efficiently encode granularity into attention. Experiments on two benchmarks, namely Quora question pairs and Twitter URLs, have shown that C-DNPG
outperforms baseline models by a remarkable margin and achieves
state-of-the-art results on many metrics. Qualitative analysis reveals that C-DNPG indeed captures fine-grained levels of granularity effectively. Comment: Accepted to be published in COLING 202
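As a loose illustration of the general idea only (explicitly not C-DNPG's formulation), the sketch below attaches a granularity head to standard multi-head attention and softly penalizes attention between tokens whose predicted granularity levels disagree; the masking scheme and all names are assumptions.

# Toy "granularity-aware" attention: predict a per-token level and bias attention with it.
import torch
import torch.nn as nn

class ToyGranularityAttention(nn.Module):
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.granularity_head = nn.Linear(dim, 1)         # neural estimate of granularity

    def forward(self, x):                                  # x: (B, T, dim)
        g = torch.sigmoid(self.granularity_head(x))        # (B, T, 1), 0 = fine, 1 = coarse
        diff = (g - g.transpose(1, 2)).abs()               # (B, T, T) level mismatch
        mask = -diff.repeat_interleave(self.attn.num_heads, dim=0)  # additive attention bias
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return out, g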
Evaluation of Left Atrial Volumes Using Multidetector Computed Tomography: Comparison with Echocardiography
OBJECTIVE: To prospectively assess the relationship between two different measurement methods for the evaluation of left atrial (LA) volume using cardiac multidetector computed tomography (MDCT) and to compare the results between cardiac MDCT and echocardiography. MATERIALS AND METHODS: Thirty-five patients (20 men, 15 women; mean age, 60 years) underwent cardiac MDCT angiography for coronary artery disease. The LA volumes were measured using two different methods: the two-dimensional (2D) length-based (LB) method, measured along the three orthogonal planes of the LA, and the 3D volumetric threshold-based (VTB) method, measured by threshold-based 3D segmentation of the LA. The results obtained by cardiac MDCT were compared with those obtained by echocardiography. RESULTS: The LA end-systolic and end-diastolic volumes (LAESV and LAEDV) measured by the 2D-LB method correlated well with those measured by the 3D-VTB method using cardiac MDCT (r = 0.763 and r = 0.786, respectively; p = 0.001). However, there was a significant difference in the LAESVs between the two measurement methods using cardiac MDCT (p < 0.05). The LAESV measured by cardiac MDCT correlated well with measurements by echocardiography (r = 0.864, p = 0.001), albeit with a significant difference (p < 0.01) in their volumes. Cardiac MDCT overestimated the LAESV by 22% compared to measurements by echocardiography. CONCLUSION: A significant correlation was found between the two different measurement methods for evaluating LA volumes by cardiac MDCT. Further, cardiac MDCT correlates well with echocardiography in evaluating the LA volume. However, there are significant differences in the LAESV between the two measurement methods using cardiac MDCT and between cardiac MDCT and echocardiography.
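For context, length-based LA volume estimation from three orthogonal diameters is commonly computed with the ellipsoid approximation V = (π/6) × D1 × D2 × D3; the snippet below is a worked example under that assumption (the abstract does not state the study's exact formula).

import math

def la_volume_length_based(d1_cm, d2_cm, d3_cm):
    """Ellipsoid approximation: LA volume in mL from three orthogonal diameters in cm."""
    return math.pi / 6.0 * d1_cm * d2_cm * d3_cm

print(round(la_volume_length_based(5.0, 4.5, 4.0), 1))   # 47.1 mL for these example diameters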