Blessing Nonvital Tooth with Life through Revascularization
In recent times, revascularization has been found to be a better alternative for the treatment of an immature, nonvital tooth with a blunderbuss canal, since it enables radiographic formation of the root apex, which allows the clinician to achieve a better hermetic seal in the apical area. Success of the treatment also depends greatly upon disinfection of the canal, which is achieved not only by the use of intracanal irrigants but also by intracanal medicaments such as triple antibiotic paste, followed by a good coronal seal to prevent orthograde infection during the procedure. However, the long-term prognosis of the treatment and the tissue occupying the canal space require further investigation.
A novel multimodal dynamic fusion network for disfluency detection in spoken utterances
Disfluency, though originating in human spoken utterances, is primarily
studied as a unimodal, text-based Natural Language Processing (NLP) task. In
this paper, we propose a novel multimodal architecture for disfluency
detection from individual utterances, based on early fusion and
self-attention-based multimodal interaction between the text and acoustic
modalities. Our
architecture leverages a multimodal dynamic fusion network that adds minimal
parameters over an existing text encoder commonly used in prior art to leverage
the prosodic and acoustic cues hidden in speech. Through experiments, we show
that our proposed model achieves state-of-the-art results on the widely used
English Switchboard corpus for disfluency detection and outperforms prior
unimodal and multimodal systems in the literature by a significant margin. In
addition, we conduct a thorough qualitative analysis and show that, unlike
text-only systems, which
suffer from spurious correlations in the data, our system overcomes this
problem through additional cues from speech signals. We make all our code
publicly available on GitHub.
Comment: Submitted to ICASSP 2023. arXiv admin note: text overlap with arXiv:2203.1679
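
To make the fusion idea concrete, below is a minimal PyTorch-style sketch of early fusion with a self-attention block and a lightweight gate on top of a text encoder's output; the module names, dimensions, and gating scheme are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): early fusion of text and
# acoustic features via self-attention, with a gated residual back to text.
import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    def __init__(self, text_dim=768, audio_dim=128, n_heads=8):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, text_dim)   # align modalities
        self.attn = nn.MultiheadAttention(text_dim, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * text_dim, text_dim)

    def forward(self, text, audio):
        # text:  (B, T, text_dim) tokens from a pre-trained text encoder
        # audio: (B, A, audio_dim) frame-level acoustic features
        audio = self.audio_proj(audio)
        fused = torch.cat([text, audio], dim=1)            # early fusion
        fused, _ = self.attn(fused, fused, fused)          # cross-modal attention
        fused_text = fused[:, : text.size(1)]              # keep text positions
        g = torch.sigmoid(self.gate(torch.cat([text, fused_text], dim=-1)))
        return g * fused_text + (1 - g) * text             # gated residual

out = DynamicFusion()(torch.randn(2, 12, 768), torch.randn(2, 40, 128))
print(out.shape)  # torch.Size([2, 12, 768])
```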
AdVerb: Visually Guided Audio Dereverberation
We present AdVerb, a novel audio-visual dereverberation framework that uses
visual cues in addition to the reverberant sound to estimate clean audio.
Although audio-only dereverberation is a well-studied problem, our approach
incorporates the complementary visual modality to perform audio
dereverberation. Given an image of the environment where the reverberated sound
signal has been recorded, AdVerb employs a novel geometry-aware cross-modal
transformer architecture that captures scene geometry and the audio-visual
cross-modal relationship to generate a complex ideal ratio mask, which, when
applied to the reverberant audio, predicts the clean sound. The effectiveness of
our method is demonstrated through extensive quantitative and qualitative
evaluations. Our approach significantly outperforms traditional audio-only and
audio-visual baselines on three downstream tasks: speech enhancement, speech
recognition, and speaker verification, with relative improvements in the range
of 18% - 82% on the LibriSpeech test-clean set. We also achieve highly
satisfactory RT60 error scores on the AVSpeech dataset.
Comment: Accepted at ICCV 2023. For project page, see https://gamma.umd.edu/researchdirections/speech/adver
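
As a rough illustration of the final masking step (the geometry-aware transformer that predicts the mask is elided), a predicted complex ideal ratio mask is multiplied element-wise with the reverberant signal's STFT; function and parameter names below are assumptions.

```python
# Illustrative sketch: applying a predicted complex ideal ratio mask (cIRM)
# to the STFT of a reverberant signal to estimate the clean waveform.
import torch

def apply_cirm(reverb_wav, mask_real, mask_imag, n_fft=512, hop=128):
    # mask_real / mask_imag: network outputs, same shape as the STFT below
    window = torch.hann_window(n_fft)
    spec = torch.stft(reverb_wav, n_fft, hop, window=window, return_complex=True)
    mask = torch.complex(mask_real, mask_imag)
    clean_spec = mask * spec                  # element-wise complex product
    return torch.istft(clean_spec, n_fft, hop, window=window)
```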
CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network
The tremendous growth of social media users interacting in online
conversations has also led to significant growth in hate speech. Most of the
prior works focus on detecting explicit hate speech, which is overt and
leverages hateful phrases, with very little work focusing on detecting hate
speech that is implicit or denotes hatred through indirect or coded language.
In this paper, we present CoSyn, a user- and conversational-context synergized
network for detecting implicit hate speech in online conversation trees. CoSyn
first models the user's personal historical and social context using a novel
hyperbolic Fourier attention mechanism and hyperbolic graph convolution
network. Next, we jointly model the user's personal context and the
conversational context using a novel context interaction mechanism in the
hyperbolic space that clearly captures the interplay between the two and makes
independent assessments of how much information to retrieve from each
context. CoSyn performs all operations in hyperbolic space to account for
the scale-free dynamics of social media. We demonstrate the effectiveness of
CoSyn both qualitatively and quantitatively on an open-source hate speech
dataset with Twitter conversations and show that CoSyn outperforms all our
baselines in detecting implicit hate speech with absolute improvements in the
range of 8.15% - 19.50%.
Comment: Under review at IJCAI 202
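
For readers unfamiliar with hyperbolic networks, the sketch below shows two standard Poincaré-ball operations (Möbius addition and the exponential map at the origin) that hyperbolic graph convolutions of this kind are typically built on; this is textbook background, not CoSyn's code.

```python
# Standard Poincare-ball operations (curvature c > 0) underlying many
# hyperbolic neural networks; small epsilons guard against division by zero.
import torch

def mobius_add(x, y, c=1.0):
    xy = (x * y).sum(-1, keepdim=True)
    x2 = (x * x).sum(-1, keepdim=True)
    y2 = (y * y).sum(-1, keepdim=True)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / den.clamp_min(1e-9)

def expmap0(v, c=1.0):
    # map a Euclidean (tangent-space) vector onto the Poincare ball
    norm = v.norm(dim=-1, keepdim=True).clamp_min(1e-9)
    return torch.tanh(c ** 0.5 * norm) * v / (c ** 0.5 * norm)

p = expmap0(0.1 * torch.randn(4, 16))   # points inside the unit ball
q = mobius_add(p, p)                    # hyperbolic "translation", stays in ball
```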
DALE: Generative Data Augmentation for Low-Resource Legal NLP
We present DALE, a novel and effective generative Data Augmentation framework
for low-resource LEgal NLP. DALE addresses the challenges existing frameworks
face in generating effective data augmentations of legal documents: legal
language, with its specialized vocabulary and complex semantics, morphology,
and syntax, does not benefit from data augmentations that merely rephrase the
source sentence. To address this, DALE, built on an Encoder-Decoder Language
Model, is pre-trained on a novel unsupervised text denoising objective based on
selective masking: our masking strategy exploits the domain-specific language
characteristics of templatized legal documents to mask collocated spans of
text. Denoising these spans helps DALE acquire knowledge about legal concepts,
principles, and language usage. Consequently, it develops the ability to
generate coherent and diverse augmentations with novel contexts. Finally, DALE
performs conditional generation to generate synthetic augmentations for
low-resource Legal NLP tasks. We demonstrate the effectiveness of DALE on 13
datasets spanning 6 tasks and 4 low-resource settings. DALE outperforms all our
baselines, including LLMs, qualitatively and quantitatively, with improvements
of 1%-50%.
Comment: Accepted to EMNLP 2023 Main Conference. Code: https://github.com/Sreyan88/DAL
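
Below is a minimal sketch of span-level masking for such a denoising objective; the random span-selection heuristic and sentinel token are assumptions standing in for DALE's selective masking of collocated spans.

```python
# Illustrative sketch: replace contiguous spans with a single sentinel so a
# seq2seq model can be trained to reconstruct them (T5-style denoising).
import random

def mask_spans(tokens, mask_ratio=0.3, mean_span=3, sentinel="<mask>"):
    budget = int(len(tokens) * mask_ratio)   # how many tokens to hide
    out, i = [], 0
    while i < len(tokens):
        if budget > 0 and random.random() < mask_ratio / mean_span:
            span = min(mean_span, budget, len(tokens) - i)
            out.append(sentinel)             # collapse the whole span
            i += span
            budget -= span
        else:
            out.append(tokens[i])
            i += 1
    return out

print(mask_spans("the lessee shall pay rent on the first day".split()))
```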
ASPIRE: Language-Guided Augmentation for Robust Image Classification
Neural image classifiers can often learn to make predictions by overly
relying on non-predictive features that are spuriously correlated with the
class labels in the training data. This leads to poor performance in real-world
atypical scenarios where such features are absent. Supplementing the training
dataset with images without such spurious features can aid robust learning
against spurious correlations via better generalization. This paper presents
ASPIRE (Language-guided data Augmentation for SPurIous correlation REmoval), a
simple yet effective solution for expanding the training dataset with synthetic
images without spurious features. ASPIRE, guided by language, generates these
images without requiring any form of additional supervision or existing
examples. Specifically, we first employ LLMs to extract foreground and background
features from textual descriptions of an image, followed by advanced
language-guided image editing to discover the features that are spuriously
correlated with the class label. Finally, we personalize a text-to-image
generation model to generate diverse in-domain images without spurious
features. We demonstrate the effectiveness of ASPIRE on 4 datasets, including
the very challenging Hard ImageNet dataset, and 9 baselines and show that
ASPIRE improves the classification accuracy of prior methods by 1% - 38%. Code
soon at: https://github.com/Sreyan88/ASPIRE.
Comment: Pre-print under review
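
The skeleton below lays out one plausible ordering of the steps the abstract describes; every callable passed in stands for an external model (captioner, LLM, image editor, classifier, text-to-image generator), and none of it is ASPIRE's actual API.

```python
# Hypothetical pipeline skeleton only; all helpers are caller-supplied
# stand-ins for external models, not functions from the ASPIRE codebase.
from typing import Callable, List

def augment_without_spurious(images: List, label: str, caption: Callable,
                             llm_split: Callable, edit: Callable,
                             classify: Callable, generate: Callable) -> List:
    synthetic = []
    for img in images:
        fg, bg = llm_split(caption(img))    # LLM splits caption into fg/bg cues
        edited = edit(img, remove=bg)       # language-guided edit drops the bg
        if classify(edited) != label:       # prediction flips: bg was spurious
            synthetic.append(generate(fg, label))  # in-domain image without bg
    return synthetic
```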
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
A fundamental characteristic of audio is its compositional nature.
Audio-language models (ALMs) trained using a contrastive approach (e.g., CLAP)
that learns a shared representation between audio and language modalities have
improved performance in many downstream applications, including zero-shot audio
classification, audio retrieval, etc. However, the ability of these models to
effectively perform compositional reasoning remains largely unexplored and
necessitates additional research. In this paper, we propose CompA, a collection
of two expert-annotated benchmarks with a majority of real-world audio samples,
to evaluate compositional reasoning in ALMs. Our proposed CompA-order evaluates
how well an ALM understands the order or occurrence of acoustic events in
audio, and CompA-attribute evaluates attribute binding of acoustic events. An
instance from either benchmark consists of two audio-caption pairs, where both
audio clips contain the same acoustic events but in different compositions. An ALM is
evaluated on how well it matches the right audio to the right caption. Using
this benchmark, we first show that current ALMs perform only marginally better
than random chance, indicating that they struggle with compositional
reasoning. Next, we
propose CompA-CLAP, where we fine-tune CLAP using a novel learning method to
improve its compositional reasoning abilities. To train CompA-CLAP, we first
propose improvements to contrastive training with composition-aware hard
negatives, allowing for more focused training. Next, we propose a novel modular
contrastive loss that helps the model learn fine-grained compositional
understanding and overcomes the acute scarcity of openly available
compositional audios. CompA-CLAP significantly improves over all our baseline
models on the CompA benchmark, indicating its superior compositional reasoning
capabilities.
Comment: Pre-print under review
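
A minimal sketch of contrastive training with a composition-aware hard negative (an InfoNCE-style loss with one extra negative column per audio) is shown below; it illustrates the general technique, not the paper's exact modular loss.

```python
# Illustrative sketch: in-batch contrastive loss plus one hard negative per
# audio, e.g. a caption with the same events in a different composition.
import torch
import torch.nn.functional as F

def contrastive_with_hard_negs(audio_emb, text_emb, hard_text_emb, tau=0.07):
    # all inputs: (B, D), L2-normalized embeddings
    logits = audio_emb @ text_emb.t() / tau                        # (B, B)
    hard = (audio_emb * hard_text_emb).sum(-1, keepdim=True) / tau # (B, 1)
    logits = torch.cat([logits, hard], dim=1)      # extra negative column
    targets = torch.arange(audio_emb.size(0))      # diagonal entries = positives
    return F.cross_entropy(logits, targets)
```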
An ultra-thin quad-band metamaterial inspired absorber using symmetric bent-arrow shaped resonator for sensing and imaging in defense applications
In this paper, a novel compact quad-band polarization-insensitive metamaterial absorber is proposed for the microwave frequency regime. The unit-cell geometry comprises four symmetric bent-arrow shaped resonators, each bounded by an open ring. The resultant structure is further surrounded by a closed ring to obtain an extra resonance band. Full-wave simulation under normal incidence shows quad-band operation with absorption peaks at 3.24 GHz (S-band), 6.55 GHz (C-band), 15.22 GHz (Ku-band), and 15.94 GHz (Ku-band), with absorptivity levels of 99.57%, 99.94%, 96.10%, and 98.65%, respectively. It also shows full width at half maximum (FWHM) bandwidths of 100 MHz, 200 MHz, and 1210 MHz in the first, second, and third bands, respectively. Furthermore, the proposed structure has four-fold symmetry and therefore exhibits polarization-insensitive behaviour, unlike conventional absorbers. The structure is fabricated on a 1 mm FR4 glass-epoxy substrate, equivalent to 0.0108 λ_0 in thickness, and can hence be used as an absorber coating for planar surfaces. The designed absorber has been fabricated, and the experimental results are in good agreement with the simulated responses, enabling wide application in technologies such as stealth, radar cross-section reduction, anechoic chambers, electromagnetic interference/compatibility, and radio frequency identification.
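
For reference, absorptivity in such designs is computed from the scattering parameters as A = 1 − |S11|² − |S21|², where the transmission S21 is negligible when, as is typical, the absorber is backed by a metal ground plane; a small numerical check against the reported peak:

```python
# Absorptivity from S-parameters; with a metal-backed absorber S21 ~ 0,
# so absorption reduces to 1 - |S11|^2.
import numpy as np

def absorptivity(s11, s21=0.0):
    return 1.0 - np.abs(s11) ** 2 - np.abs(s21) ** 2

# |S11| = 0.066 gives ~99.56% absorption, in line with the
# reported 99.57% peak at 3.24 GHz
print(absorptivity(0.066))
```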
M-MELD: A Multilingual Multi-Party Dataset for Emotion Recognition in Conversations
Expression of emotions is a crucial part of daily human communication.
Emotion recognition in conversations (ERC) is an emerging field of study, where
the primary task is to identify the emotion behind each utterance in a
conversation. Though a great deal of work has been done on ERC in the past, it
has focused almost exclusively on the English language, ignoring all other
languages. In this paper, we present Multilingual MELD (M-MELD), where we
extend the Multimodal EmotionLines Dataset (MELD) (Poria et al., 2018) to 4
other languages beyond English, namely Greek, Polish, French, and Spanish.
Beyond just establishing strong baselines for all of these 4 languages, we also
propose a novel architecture, DiscLSTM, that uses both sequential and
conversational discourse context in a conversational dialogue for ERC. Our
proposed approach is computationally efficient, can transfer across languages
using just a cross-lingual encoder, and achieves better performance than most
uni-modal text approaches in the literature on both MELD and M-MELD. We make
our data and code publicly available on GitHub.
Comment: Submitted to ICASSP 202
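
A minimal sketch of how sequential and discourse context might be combined per utterance (a BiLSTM over cross-lingual sentence embeddings, with each utterance also drawing on the state of its discourse parent) follows; the details are assumptions for illustration, not DiscLSTM itself.

```python
# Illustrative sketch: a BiLSTM gives sequential context; gathering each
# utterance's discourse-parent state adds conversational discourse context.
import torch
import torch.nn as nn

class ConversationERC(nn.Module):
    def __init__(self, dim=768, n_emotions=7):
        super().__init__()
        self.lstm = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
        self.clf = nn.Linear(2 * dim, n_emotions)

    def forward(self, utt_emb, parent_idx):
        # utt_emb: (B, N, dim) embeddings from a cross-lingual sentence encoder
        # parent_idx: (B, N) index of each utterance's discourse parent
        seq, _ = self.lstm(utt_emb)                         # sequential context
        idx = parent_idx.unsqueeze(-1).expand(-1, -1, seq.size(-1))
        parents = torch.gather(seq, 1, idx)                 # parent states
        return self.clf(torch.cat([seq, parents], dim=-1))  # per-utterance logits

logits = ConversationERC()(torch.randn(1, 5, 768),
                           torch.tensor([[0, 0, 1, 2, 2]]))
print(logits.shape)  # torch.Size([1, 5, 7])
```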