
    Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances

    Despite the great progress of Visual Question Answering (VQA), current VQA models rely heavily on superficial correlations between the question type and its frequent answers (i.e., language priors) to make predictions, without really understanding the input. In this work, we define training instances with the same question type but different answers as superficially similar instances, and attribute language priors to the VQA model's confusion on such instances. To solve this problem, we propose a novel training framework that explicitly encourages the VQA model to distinguish between superficially similar instances. Specifically, for each training instance, we first construct a set containing its superficially similar counterparts. We then use the proposed distinguishing module to increase the distance between the instance and its counterparts in the answer space. In this way, the VQA model is forced to attend to parts of the input beyond the question type, which helps overcome the language priors. Experimental results show that our method achieves state-of-the-art performance on VQA-CP v2. Code is available at https://github.com/wyk-nku/Distinguishing-VQA.git (Distinguishing-VQA).
    Comment: Published in COLING 202
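    A minimal sketch of the core idea, not the authors' released code: a hinge-style loss that pushes each instance's answer-space representation away from its superficially similar counterparts (same question type, different answer). All tensor shapes and the margin value are illustrative assumptions.

    ```python
    import torch
    import torch.nn.functional as F

    def distinguishing_loss(anchor_logits, counterpart_logits, margin=1.0):
        """anchor_logits: (B, A) answer logits for each training instance.
        counterpart_logits: (B, K, A) answer logits of K superficially
        similar counterparts per instance. Penalises counterparts that sit
        closer than `margin` to the anchor in the (normalised) answer space."""
        anchor = F.normalize(anchor_logits, dim=-1).unsqueeze(1)   # (B, 1, A)
        counterparts = F.normalize(counterpart_logits, dim=-1)     # (B, K, A)
        dist = torch.norm(anchor - counterparts, dim=-1)           # (B, K)
        return F.relu(margin - dist).mean()

    # Toy usage; in training this would be added to the usual VQA answer loss.
    B, K, A = 4, 3, 10
    loss = distinguishing_loss(torch.randn(B, A), torch.randn(B, K, A))
    print(loss.item())
    ```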

    A Bayesian Updating Scheme for Pandemics: Estimating the Infection Dynamics of COVID-19

    Epidemic models play a key role in understanding and responding to the emerging COVID-19 pandemic. Widely used compartmental models are static and of limited use for evaluating intervention strategies to combat the pandemic. Applying the technology of data assimilation, we propose a Bayesian updating approach for estimating epidemiological parameters from observable information to assess the impacts of different intervention strategies. We adopt a concise renewal model and propose new parameters by disentangling the reduction of the instantaneous reproduction number R_t into mitigation and suppression factors, quantifying intervention impacts at a finer granularity. A data assimilation framework is developed to estimate these parameters, including constructing an observation function and developing a Bayesian updating scheme. A statistical analysis framework is built to quantify the impacts of intervention strategies by monitoring the evolution of the estimated parameters. We reveal the intervention impacts in European countries and Wuhan, and the resurgence risk in the United States.
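    To make the updating idea concrete, here is a toy grid-based posterior update for R_t under a Poisson renewal likelihood. This is our own illustration, not the paper's scheme: it assumes a daily uniform prior and omits the mitigation/suppression decomposition; the generation-interval parameters and case counts are made up.

    ```python
    import numpy as np
    from scipy.stats import poisson, gamma

    def update_Rt(incidence, w, grid=np.linspace(0.01, 6, 600)):
        """incidence: observed daily case counts; w: generation-interval pmf.
        Renewal model: I_t ~ Poisson(R_t * Lambda_t), where Lambda_t is past
        incidence weighted by w. Returns the posterior mean of R_t per day."""
        means = []
        for t in range(1, len(incidence)):
            s = min(t, len(w))
            lam = np.dot(incidence[t - s:t][::-1], w[:s])  # infection pressure
            like = poisson.pmf(incidence[t], grid * max(lam, 1e-9))
            if like.sum() == 0:
                like = np.ones_like(grid)
            post = like / like.sum()
            means.append(np.dot(grid, post))
        return np.array(means)

    # Toy usage with a discretised gamma generation interval.
    w = gamma.pdf(np.arange(1, 15), a=2.5, scale=2.0); w /= w.sum()
    cases = np.array([10, 12, 15, 20, 26, 30, 35, 42, 50, 55, 60, 70, 80, 85])
    print(update_Rt(cases, w).round(2))
    ```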

    MMBench: Is Your Multi-modal Model an All-around Player?

    Large vision-language models have recently achieved remarkable progress, exhibiting great perception and reasoning abilities concerning visual information. However, how to effectively evaluate these large vision-language models remains a major obstacle hindering future model development. Traditional benchmarks like VQAv2 or COCO Caption provide quantitative performance measurements but suffer from a lack of fine-grained ability assessment and non-robust evaluation metrics. Recent subjective benchmarks, such as OwlEval, offer comprehensive evaluations of a model's abilities by incorporating human labour, but they are not scalable and display significant bias. In response to these challenges, we propose MMBench, a novel multi-modality benchmark. MMBench develops a comprehensive evaluation pipeline comprising two main elements. The first is a meticulously curated dataset that surpasses existing similar benchmarks in the number and variety of evaluation questions and abilities. The second is a novel CircularEval strategy that incorporates ChatGPT to convert free-form predictions into pre-defined choices, facilitating a more robust evaluation of the model's predictions. MMBench is a systematically designed objective benchmark for robustly evaluating the various abilities of vision-language models. We hope MMBench will assist the research community in better evaluating their models and encourage future advancements in this domain. Project page: https://opencompass.org.cn/mmbench
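    A sketch of the CircularEval idea as the abstract describes it (our paraphrase, not the OpenCompass implementation): each N-choice question is asked N times with the options circularly shifted, and the model only gets credit if it picks the correct option under every rotation.

    ```python
    from collections import deque

    def circular_eval(question, options, correct_idx, ask_model):
        """ask_model(question, options) -> index of the chosen option.
        Returns True only if the model is right under all rotations."""
        opts = deque(options)
        target = options[correct_idx]
        for _ in range(len(options)):
            choice = ask_model(question, list(opts))
            if opts[choice] != target:
                return False
            opts.rotate(1)  # shift the options for the next round
        return True

    # A stub model that always answers index 0 fails once the options rotate.
    always_first = lambda q, o: 0
    print(circular_eval("What colour is the sky?",
                        ["blue", "red", "green"], 0, always_first))  # False
    ```

    The rotation defeats positional shortcuts: a model that guesses a fixed letter scores near zero instead of near chance.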

    LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

    We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method achieving state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock and metal. Our novel, training-free approach utilizes Whisper, a weakly supervised robust speech recognition model, and GPT-4, today's most performant chat-based large language model. In the proposed method, Whisper functions as the "ear" by transcribing the audio, while GPT-4 serves as the "brain," acting as an annotator with strong performance for contextualized output selection and correction. Our experiments show that LyricWhiz significantly reduces Word Error Rate compared to existing methods in English and can effectively transcribe lyrics across multiple languages. Furthermore, we use LyricWhiz to create the first publicly available, large-scale, multilingual lyrics transcription dataset with a CC-BY-NC-SA copyright license, based on MTG-Jamendo, and offer a human-annotated subset for noise level estimation and evaluation. We anticipate that our proposed method and dataset will advance the development of multilingual lyrics transcription, a challenging and emerging task.
    Comment: 9 pages, 2 figures, 5 tables, accepted by ISMIR 202
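    A minimal "ear + brain" pipeline in the spirit of the abstract; this is our own sketch, not the LyricWhiz codebase. The model names, sampling temperature, and prompt are illustrative assumptions, and an OPENAI_API_KEY must be set in the environment.

    ```python
    import whisper
    from openai import OpenAI

    def transcribe_lyrics(audio_path: str, n_hypotheses: int = 3) -> str:
        # The "ear": sample several Whisper transcripts (temperature > 0
        # makes runs differ, giving the LLM alternatives to choose from).
        ear = whisper.load_model("large")
        hypotheses = [ear.transcribe(audio_path, temperature=0.4)["text"]
                      for _ in range(n_hypotheses)]

        # The "brain": ask a chat LLM to select and correct the lyrics.
        brain = OpenAI()
        prompt = ("These are candidate lyric transcriptions of one song. "
                  "Pick the most plausible one and fix obvious errors:\n\n"
                  + "\n---\n".join(hypotheses))
        reply = brain.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}])
        return reply.choices[0].message.content

    # print(transcribe_lyrics("song.mp3"))
    ```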

    On the Effectiveness of Speech Self-supervised Learning for Music

    Self-supervised learning (SSL) has shown promising results in various speech and natural language processing applications. However, its efficacy in music information retrieval (MIR) remains largely unexplored. Previous SSL models pre-trained on music recordings have been mostly closed-source, while recent speech models such as wav2vec2.0 have shown promise in music modelling; even so, research on applying speech SSL models to music recordings has been limited. We explore the music adaptation of SSL with two distinctive speech-related models, data2vec1.0 and HuBERT, and refer to them as music2vec and musicHuBERT, respectively. We train 12 SSL models with 95M parameters under various pre-training configurations and systematically evaluate performance on 13 different MIR tasks. Our findings suggest that training with music data can generally improve performance on MIR tasks, even when models are trained using paradigms designed for speech. However, we identify the limitations of such existing speech-oriented designs, especially in modelling polyphonic information. Based on the experimental results, we also give empirical suggestions for designing future musical SSL strategies and paradigms.
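    A sketch of the evaluation setup this line of work implies: freeze a speech SSL encoder (here HuBERT via HuggingFace), pool its hidden states over time, and fit a linear probe for an MIR task. The checkpoint name and the 10-class genre task are assumptions for illustration.

    ```python
    import torch
    from transformers import HubertModel, Wav2Vec2FeatureExtractor

    CKPT = "facebook/hubert-base-ls960"
    extractor = Wav2Vec2FeatureExtractor.from_pretrained(CKPT)
    encoder = HubertModel.from_pretrained(CKPT).eval()

    @torch.no_grad()
    def embed(waveform_16k: torch.Tensor) -> torch.Tensor:
        """waveform_16k: 1-D mono audio at 16 kHz -> pooled (768,) embedding."""
        inputs = extractor(waveform_16k.numpy(), sampling_rate=16000,
                           return_tensors="pt")
        hidden = encoder(**inputs).last_hidden_state   # (1, T, 768)
        return hidden.mean(dim=1).squeeze(0)           # time-average pooling

    # Linear probe on top of the frozen embeddings (e.g. 10 genre classes).
    probe = torch.nn.Linear(768, 10)
    logits = probe(embed(torch.randn(16000 * 5)))  # 5 s of dummy audio
    print(logits.shape)
    ```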

    MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

    Self-supervised learning (SSL) has recently emerged as a promising paradigm for training generalisable models on large-scale data in the fields of vision, text, and speech. Although SSL has proven effective in speech and audio, its application to music audio has yet to be thoroughly explored, primarily due to the distinctive challenges of modelling musical knowledge, particularly the tonal and pitched characteristics of music. To address this research gap, we propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels for masked language modelling (MLM)-style acoustic pre-training. In our exploration, we identified a superior combination of teacher models that outperforms conventional speech and audio approaches: an acoustic teacher based on a Residual Vector Quantization Variational AutoEncoder (RVQ-VAE) and a musical teacher based on the Constant-Q Transform (CQT). These teachers effectively guide our student model, a BERT-style transformer encoder, to better model music audio. In addition, we introduce an in-batch noise mixture augmentation to enhance representation robustness. Furthermore, we explore a wide range of settings to overcome instability in acoustic language model pre-training, allowing our paradigm to scale from 95M to 330M parameters. Experimental results indicate that our model generalises and performs well on 14 music understanding tasks and attains state-of-the-art (SOTA) overall scores. The code and models are online: https://github.com/yizhilll/MERT
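    To illustrate the "musical teacher", here is how CQT targets of the kind the abstract mentions can be computed; this is our own sketch, not the MERT training code (the acoustic RVQ-VAE teacher is omitted, and the sample rate, hop length, and bin counts are assumptions).

    ```python
    import librosa
    import numpy as np

    def cqt_targets(y: np.ndarray, sr: int = 24000) -> np.ndarray:
        """Log-magnitude Constant-Q Transform, shape (n_bins, T): a
        pitch-aligned spectral target a masked student could be trained
        to reconstruct at masked positions."""
        C = librosa.cqt(y, sr=sr, hop_length=512,
                        n_bins=84, bins_per_octave=12)  # 7 octaves, semitone bins
        return np.log1p(np.abs(C))

    y, sr = librosa.load(librosa.ex("trumpet"), sr=24000)
    print(cqt_targets(y, sr).shape)  # (84, T)
    ```

    The semitone-spaced bins are the point: unlike mel filters, each CQT bin tracks one pitch class per octave, which is what makes it a natural teacher for tonal content.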

    Phosphorus Release and Adsorption Properties of Polyurethane–Biochar Crosslinked Material as a Filter Additive in Bioretention Systems

    Bioretention systems are frequently employed in stormwater treatment to reduce phosphorus pollution and prevent eutrophication. Filter additives are required to enhance their efficiency, but the traditional materials currently used cannot simultaneously provide excellent hydraulic properties and outstanding release and adsorption capacities. In this research, a polyurethane-biochar crosslinked material (PCB) was produced by mixing hardwood biochar (HB) with polyurethane to improve on traditional filter additives. In basic parameter tests, the saturated water content of PCB doubled and its permeability coefficient increased by two orders of magnitude. Owing to the polyurethane, phosphorus leached more slowly in the batch experiments and fewer metal cations leached. Moreover, in isothermal adsorption experiments PCB adsorbed 93–206 mg/kg PO₄³⁻ at PO₄³⁻ concentrations typical of stormwater runoff, 1.32–1.58 times more than HB. In the simulated column experiments, the weaker hydraulic power reduced the PO₄³⁻ leaching of PCB, which maintained a stable phosphate removal rate of 93.84%. This study demonstrates the potential of PCB as a filter additive in bioretention systems to achieve hydraulic goals and improve phosphate adsorption capacity.
