10 research outputs found

    Blessing Nonvital Tooth with Life through Revascularization

    Get PDF
    In recent times, revascularization has been found to be a better alternative for treating an immature, nonvital tooth with a blunderbuss canal, since it enables radiographically evident formation of the root apex, which allows the clinician to achieve a better hermetic seal in the apical area. Success of the treatment also depends greatly upon disinfection of the canal, which is achieved not only with intracanal irrigants but also with intracanal medicaments such as triple antibiotic paste, followed by a good coronal seal to prevent orthograde infection during the procedure. However, the long-term prognosis of the treatment and the nature of the tissue occupying the canal space require further investigation.

    A novel multimodal dynamic fusion network for disfluency detection in spoken utterances

    Full text link
    Disfluency, though originating from human spoken utterances, is primarily studied as a unimodal, text-based Natural Language Processing (NLP) task. In this paper, we propose a novel multimodal architecture for disfluency detection from individual utterances, based on early fusion and self-attention-based multimodal interaction between the text and acoustic modalities. Our architecture leverages a multimodal dynamic fusion network that adds minimal parameters over an existing text encoder commonly used in prior art, in order to exploit the prosodic and acoustic cues hidden in speech. Through experiments, we show that our proposed model achieves state-of-the-art results on the widely used English Switchboard corpus for disfluency detection and outperforms prior unimodal and multimodal systems in the literature by a significant margin. In addition, we present a thorough qualitative analysis and show that, unlike text-only systems, which suffer from spurious correlations in the data, our system overcomes this problem through additional cues from speech signals. We make all our code publicly available on GitHub.
    Comment: Submitted to ICASSP 2023. arXiv admin note: text overlap with arXiv:2203.1679
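
    To make the fusion idea concrete, below is a minimal PyTorch sketch of early fusion with self-attention over concatenated text and acoustic sequences; the dimensions and module names are illustrative assumptions, not the authors' released architecture.

        import torch
        import torch.nn as nn

        class EarlyFusionTagger(nn.Module):
            """Sketch: fuse token embeddings with projected acoustic frames,
            run self-attention over the joint sequence, and tag text tokens
            as fluent/disfluent. All dimensions are assumptions."""
            def __init__(self, text_dim=768, audio_dim=128, n_tags=2):
                super().__init__()
                self.audio_proj = nn.Linear(audio_dim, text_dim)  # align acoustic frames to text space
                self.attn = nn.MultiheadAttention(text_dim, num_heads=8, batch_first=True)
                self.classifier = nn.Linear(text_dim, n_tags)

            def forward(self, text_emb, audio_feats):
                # text_emb: (B, T_text, text_dim); audio_feats: (B, T_audio, audio_dim)
                fused = torch.cat([text_emb, self.audio_proj(audio_feats)], dim=1)  # early fusion
                fused, _ = self.attn(fused, fused, fused)  # self-attention across both modalities
                return self.classifier(fused[:, :text_emb.size(1)])  # tag only text positions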

    AdVerb: Visually Guided Audio Dereverberation

    Full text link
    We present AdVerb, a novel audio-visual dereverberation framework that uses visual cues in addition to the reverberant sound to estimate clean audio. Although audio-only dereverberation is a well-studied problem, our approach incorporates the complementary visual modality to perform audio dereverberation. Given an image of the environment where the reverberant sound signal was recorded, AdVerb employs a novel geometry-aware cross-modal transformer architecture that captures scene geometry and the audio-visual cross-modal relationship to generate a complex ideal ratio mask which, when applied to the reverberant audio, predicts the clean sound. The effectiveness of our method is demonstrated through extensive quantitative and qualitative evaluations. Our approach significantly outperforms traditional audio-only and audio-visual baselines on three downstream tasks: speech enhancement, speech recognition, and speaker verification, with relative improvements in the range of 18%-82% on the LibriSpeech test-clean set. We also achieve highly satisfactory RT60 error scores on the AVSpeech dataset.
    Comment: Accepted at ICCV 2023. For the project page, see https://gamma.umd.edu/researchdirections/speech/adver
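
    The final step the abstract describes, applying a complex ideal ratio mask to the reverberant spectrogram, reduces to a per-bin complex multiplication; a minimal sketch follows. The mask would come from the cross-modal transformer (not shown), and the names here are illustrative assumptions.

        import torch

        def apply_complex_mask(reverb_stft: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
            """Both tensors are complex-valued with matching shapes, e.g. (B, F, T).
            Complex multiplication rescales magnitude and shifts phase per
            time-frequency bin, estimating the clean (dereverberated) STFT."""
            return reverb_stft * mask

        # Usage sketch: invert the masked spectrogram to recover a waveform.
        # spec = torch.stft(reverb_wav, n_fft=512, return_complex=True)
        # clean = torch.istft(apply_complex_mask(spec, predicted_mask), n_fft=512)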

    CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network

    Full text link
    The tremendous growth of social media users interacting in online conversations has also led to significant growth in hate speech. Most prior work focuses on detecting explicit hate speech, which is overt and leverages hateful phrases, with very little work focusing on detecting implicit hate speech, which denotes hatred through indirect or coded language. In this paper, we present CoSyn, a user- and conversational-context synergized network for detecting implicit hate speech in online conversation trees. CoSyn first models the user's personal historical and social context using a novel hyperbolic Fourier attention mechanism and a hyperbolic graph convolution network. Next, we jointly model the user's personal context and the conversational context using a novel context interaction mechanism in hyperbolic space that explicitly captures the interplay between the two and makes independent assessments of the amount of information to be retrieved from each context. CoSyn performs all operations in hyperbolic space to account for the scale-free dynamics of social media. We demonstrate the effectiveness of CoSyn both qualitatively and quantitatively on an open-source hate speech dataset of Twitter conversations and show that CoSyn outperforms all our baselines in detecting implicit hate speech, with absolute improvements in the range of 8.15%-19.50%.
    Comment: Under review at IJCAI 202
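
    For readers unfamiliar with hyperbolic representations, the sketch below computes the geodesic distance in the Poincaré ball, the basic primitive behind hyperbolic attention and graph convolution; this is the standard formula, not code from the paper.

        import torch

        def poincare_distance(x, y, c=1.0, eps=1e-6):
            """Geodesic distance in the Poincaré ball of curvature -c. Volume
            grows exponentially with radius, which is what makes the space a
            natural fit for scale-free social graphs."""
            sq = lambda t: (t * t).sum(dim=-1)
            num = 2.0 * c * sq(x - y)
            den = (1.0 - c * sq(x)).clamp(min=eps) * (1.0 - c * sq(y)).clamp(min=eps)
            return torch.acosh(1.0 + num / den) / c ** 0.5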

    DALE: Generative Data Augmentation for Low-Resource Legal NLP

    Full text link
    We present DALE, a novel and effective generative Data Augmentation framework for low-resource LEgal NLP. DALE addresses the challenges existing frameworks face in generating effective data augmentations of legal documents: legal language, with its specialized vocabulary and complex semantics, morphology, and syntax, does not benefit from augmentations that merely rephrase the source sentence. To address this, DALE, built on an encoder-decoder language model, is pre-trained on a novel unsupervised text-denoising objective based on selective masking; our masking strategy exploits the domain-specific language characteristics of templatized legal documents to mask collocated spans of text. Denoising these spans helps DALE acquire knowledge about legal concepts, principles, and language usage, and consequently develop the ability to generate coherent and diverse augmentations with novel contexts. Finally, DALE performs conditional generation to produce synthetic augmentations for low-resource Legal NLP tasks. We demonstrate the effectiveness of DALE on 13 datasets spanning 6 tasks and 4 low-resource settings. DALE outperforms all our baselines, including LLMs, qualitatively and quantitatively, with improvements of 1%-50%.
    Comment: Accepted to EMNLP 2023 Main Conference. Code: https://github.com/Sreyan88/DAL
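
    A minimal sketch of the selective masking step appears below, assuming spans of collocated legal phrases have already been identified (the span-finding statistic is not specified here). Collapsing each chosen span to a single mask token mirrors common text-denoising objectives; this is an illustration, not DALE's released code.

        import random

        def mask_collocated_spans(tokens, spans, mask_token="<mask>", p=0.3):
            """tokens: list of strings; spans: sorted, non-overlapping
            (start, end) index pairs over collocated legal phrases. Each span
            is selected with probability p and collapsed to one mask token;
            the model is then trained to reconstruct the original sequence."""
            out, i = [], 0
            for start, end in (s for s in spans if random.random() < p):
                out.extend(tokens[i:start])
                out.append(mask_token)
                i = end
            out.extend(tokens[i:])
            return out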

    ASPIRE: Language-Guided Augmentation for Robust Image Classification

    Full text link
    Neural image classifiers can often learn to make predictions by relying excessively on non-predictive features that are spuriously correlated with the class labels in the training data. This leads to poor performance in real-world atypical scenarios where such features are absent. Supplementing the training dataset with images that lack such spurious features can aid robust learning against spurious correlations via better generalization. This paper presents ASPIRE (Language-guided data Augmentation for SPurIous correlation REmoval), a simple yet effective solution for expanding the training dataset with synthetic images that are free of spurious features. ASPIRE, guided by language, generates these images without requiring any form of additional supervision or existing examples. Specifically, we employ LLMs to first extract foreground and background features from textual descriptions of an image, followed by advanced language-guided image editing to discover the features that are spuriously correlated with the class label. Finally, we personalize a text-to-image generation model to generate diverse in-domain images without spurious features. We demonstrate the effectiveness of ASPIRE on 4 datasets, including the very challenging Hard ImageNet dataset, and against 9 baselines, and show that ASPIRE improves the classification accuracy of prior methods by 1%-38%. Code soon at: https://github.com/Sreyan88/ASPIRE.
    Comment: Pre-print under review
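
    The three-stage pipeline can be summarized as below. Every helper is a stub standing in for an external model (an LLM, language-guided image editing, a personalized text-to-image generator); none of these names come from the paper's code, and the control flow is one reading of the abstract, not a definitive implementation.

        def llm_extract_features(caption: str):
            """Stub: ask an LLM for (foreground, background) features of a caption."""
            raise NotImplementedError

        def find_spurious_feature(images, label) -> str:
            """Stub: language-guided image editing to locate the feature
            spuriously correlated with the class label."""
            raise NotImplementedError

        def personalized_t2i(prompt: str):
            """Stub: personalized text-to-image model producing in-domain images."""
            raise NotImplementedError

        def aspire_augment(images, captions, label):
            spurious = find_spurious_feature(images, label)
            prompts = []
            for cap in captions:
                fg, bg = llm_extract_features(cap)
                if spurious in bg:  # rebuild the prompt without the spurious feature
                    prompts.append(f"a photo of {fg}")
            return [personalized_t2i(p) for p in prompts]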

    CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models

    Full text link
    A fundamental characteristic of audio is its compositional nature. Audio-language models (ALMs) trained using a contrastive approach (e.g., CLAP) that learns a shared representation between the audio and language modalities have improved performance in many downstream applications, including zero-shot audio classification and audio retrieval. However, the ability of these models to effectively perform compositional reasoning remains largely unexplored and necessitates additional research. In this paper, we propose CompA, a collection of two expert-annotated benchmarks with a majority of real-world audio samples, to evaluate compositional reasoning in ALMs. Our proposed CompA-order evaluates how well an ALM understands the order or occurrence of acoustic events in audio, and CompA-attribute evaluates attribute binding of acoustic events. An instance from either benchmark consists of two audio-caption pairs, where both audio clips contain the same acoustic events but in different compositions. An ALM is evaluated on how well it matches the right audio to the right caption. Using this benchmark, we first show that current ALMs perform only marginally better than random chance and thus struggle with compositional reasoning. Next, we propose CompA-CLAP, where we fine-tune CLAP using a novel learning method to improve its compositional reasoning abilities. To train CompA-CLAP, we first propose improvements to contrastive training with composition-aware hard negatives, allowing for more focused training. Next, we propose a novel modular contrastive loss that helps the model learn fine-grained compositional understanding and overcomes the acute scarcity of openly available compositional audio. CompA-CLAP significantly improves over all our baseline models on the CompA benchmark, indicating its superior compositional reasoning capabilities.
    Comment: Pre-print under review
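
    The composition-aware hard-negative idea can be illustrated with an InfoNCE-style loss in which each audio is scored against the in-batch captions plus hard-negative captions describing the same events in a different composition; shapes and names are assumptions, not the paper's modular loss.

        import torch
        import torch.nn.functional as F

        def contrastive_loss_hard_negatives(audio_emb, text_emb, hard_neg_emb, tau=0.07):
            """audio_emb, text_emb: (B, D); hard_neg_emb: (B, K, D), K hard
            negatives per sample (same events, shuffled order/attributes).
            The positive for audio i is caption i; everything else is negative."""
            a = F.normalize(audio_emb, dim=-1)
            t = F.normalize(text_emb, dim=-1)
            n = F.normalize(hard_neg_emb, dim=-1)
            logits_batch = a @ t.T / tau                           # (B, B) in-batch similarities
            logits_hard = torch.einsum("bd,bkd->bk", a, n) / tau   # (B, K) hard negatives
            logits = torch.cat([logits_batch, logits_hard], dim=1)
            labels = torch.arange(a.size(0), device=a.device)      # positives on the diagonal
            return F.cross_entropy(logits, labels)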

    An ultra-thin quad-band metamaterial inspired absorber using symmetric bent-arrow shaped resonator for sensing and imaging in defense applications

    No full text
    In this paper, a novel compact quad-band polarization-insensitive metamaterial absorber is proposed for the microwave frequency regime. The unit-cell geometry comprises four symmetric bent-arrow-shaped resonators, each bounded by an open ring. The resultant structure is further surrounded by a closed ring to obtain an extra resonance band. Full-wave simulation under normal incidence shows quad-band operation with absorption peaks at 3.24 GHz (S-band), 6.55 GHz (C-band), 15.22 GHz (Ku-band), and 15.94 GHz (Ku-band), with absorptivity levels of 99.57%, 99.94%, 96.10%, and 98.65%, respectively. It also shows full-width-at-half-maximum (FWHM) bandwidths of 100 MHz, 200 MHz, and 1210 MHz in the first, second, and third bands, respectively. Furthermore, because the proposed structure has four-fold symmetry, it exhibits polarization-insensitive behaviour, unlike conventional absorbers. The structure is fabricated on a 1 mm FR4 glass-epoxy substrate, equivalent to 0.0108 λ₀, and can hence be used as an absorber coating for planar surfaces. The designed absorber has been fabricated, and the experimental results are in good agreement with the simulated responses, enabling its wide application in technologies such as stealth, radar cross-section reduction, anechoic chambers, electromagnetic interference/electromagnetic compatibility (EMI/EMC), and radio frequency identification.
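
    For reference, absorption peaks like those quoted above follow the standard relation A(f) = 1 - |S11|² - |S21|²; with a continuous metal ground plane the transmission S21 is approximately zero, so A ≈ 1 - |S11|². The |S11| magnitude in the sketch below is back-computed for illustration only.

        def absorptivity(s11: complex, s21: complex = 0j) -> float:
            """Absorptivity from scattering parameters: A = 1 - |S11|^2 - |S21|^2."""
            return 1.0 - abs(s11) ** 2 - abs(s21) ** 2

        # A reflection-coefficient magnitude of about 0.066 reproduces the
        # reported 99.57% peak at 3.24 GHz (assuming negligible transmission):
        print(absorptivity(0.0656))  # ~0.9957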

    M-MELD: A Multilingual Multi-Party Dataset for Emotion Recognition in Conversations

    Full text link
    Expression of emotions is a crucial part of daily human communication. Emotion recognition in conversations (ERC) is an emerging field of study, where the primary task is to identify the emotion behind each utterance in a conversation. Although considerable work has been done on ERC in the past, these works focus exclusively on ERC in English, ignoring other languages. In this paper, we present Multilingual MELD (M-MELD), in which we extend the Multimodal EmotionLines Dataset (MELD) \cite{poria2018meld} to 4 languages beyond English, namely Greek, Polish, French, and Spanish. Beyond establishing strong baselines for all 4 languages, we also propose a novel architecture, DiscLSTM, that uses both sequential and conversational discourse context in a conversational dialogue for ERC. Our proposed approach is computationally efficient, can transfer across languages using just a cross-lingual encoder, and achieves better performance than most unimodal text approaches in the literature on both MELD and M-MELD. We make our data and code publicly available on GitHub.
    Comment: Submitted to ICASSP 202
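
    A minimal sketch of the sequential-context half of a DiscLSTM-style model is given below: a frozen cross-lingual sentence encoder (not shown) embeds each utterance, and a BiLSTM adds dialogue-level context before per-utterance emotion classification. The discourse-graph component of the real model is omitted, and all dimensions are assumptions.

        import torch.nn as nn

        class SequentialERC(nn.Module):
            """Per-utterance emotion tagging over a conversation, given
            utterance embeddings from any multilingual sentence encoder."""
            def __init__(self, emb_dim=768, hidden=256, n_emotions=7):
                super().__init__()
                self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
                self.head = nn.Linear(2 * hidden, n_emotions)

            def forward(self, utt_embs):        # utt_embs: (B, n_utterances, emb_dim)
                ctx, _ = self.lstm(utt_embs)    # sequential conversational context
                return self.head(ctx)           # emotion logits per utterance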