73 research outputs found

    MMBench: Is Your Multi-modal Model an All-around Player?

    Full text link
    Large vision-language models have recently achieved remarkable progress, exhibiting great perception and reasoning abilities concerning visual information. However, how to effectively evaluate these large vision-language models remains a major obstacle, hindering future model development. Traditional benchmarks like VQAv2 or COCO Caption provide quantitative performance measurements but suffer from a lack of fine-grained ability assessment and non-robust evaluation metrics. Recent subjective benchmarks, such as OwlEval, offer comprehensive evaluations of a model's abilities by incorporating human labor, but they are not scalable and display significant bias. In response to these challenges, we propose MMBench, a novel multi-modality benchmark. MMBench methodically develops a comprehensive evaluation pipeline, primarily comprised of two elements. The first element is a meticulously curated dataset that surpasses existing similar benchmarks in terms of the number and variety of evaluation questions and abilities. The second element introduces a novel CircularEval strategy and incorporates the use of ChatGPT. This implementation is designed to convert free-form predictions into pre-defined choices, thereby facilitating a more robust evaluation of the model's predictions. MMBench is a systematically-designed objective benchmark for robustly evaluating the various abilities of vision-language models. We hope MMBench will assist the research community in better evaluating their models and encourage future advancements in this domain. Project page: https://opencompass.org.cn/mmbench
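    To make the CircularEval idea above concrete, here is a minimal sketch of how such a rotation-based check could be scored. The `ask_model` callable and the simple letter-free interface are hypothetical; the actual MMBench pipeline additionally uses ChatGPT to map free-form model output onto the predefined choices.

```python
from typing import Callable, List

def circular_eval(question: str, choices: List[str], answer_idx: int,
                  ask_model: Callable[[str, List[str]], int]) -> bool:
    """Ask the same question once per rotation of the choice order.

    The model is credited only if it picks the correct option in *every*
    rotation, which penalises positional or label bias.
    """
    n = len(choices)
    for shift in range(n):
        rotated = choices[shift:] + choices[:shift]   # rotate the options
        correct_pos = (answer_idx - shift) % n        # where the answer moved to
        if ask_model(question, rotated) != correct_pos:
            return False                              # a single miss fails the sample
    return True

# Usage with a hypothetical VLM wrapper that returns the index of its chosen option:
# passed = circular_eval("What is shown in the image?",
#                        ["a cat", "a dog", "a car", "a tree"],
#                        answer_idx=1, ask_model=my_vlm_choice)
```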

    LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

    Full text link
    We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method achieving state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock and metal. Our novel, training-free approach utilizes Whisper, a weakly supervised robust speech recognition model, and GPT-4, today's most performant chat-based large language model. In the proposed method, Whisper functions as the "ear" by transcribing the audio, while GPT-4 serves as the "brain," acting as an annotator with strong performance in contextualized output selection and correction. Our experiments show that LyricWhiz significantly reduces the Word Error Rate compared to existing methods in English and can effectively transcribe lyrics across multiple languages. Furthermore, we use LyricWhiz to create the first publicly available, large-scale, multilingual lyrics transcription dataset with a CC-BY-NC-SA copyright license, based on MTG-Jamendo, and offer a human-annotated subset for noise level estimation and evaluation. We anticipate that our proposed method and dataset will advance the development of multilingual lyrics transcription, a challenging and emerging task. Comment: 9 pages, 2 figures, 5 tables, accepted by ISMIR 202
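    A rough sketch of the "ear + brain" pipeline described above, assuming the open-source openai-whisper package and the OpenAI chat API. The prompt wording, model names, and the number of Whisper passes are illustrative, not the authors' exact setup.

```python
import whisper                      # openai-whisper: the "ear"
from openai import OpenAI           # chat API client: the "brain"

def transcribe_lyrics(audio_path: str, n_runs: int = 3) -> str:
    """Transcribe a song several times with Whisper, then let a chat LLM
    select and correct the most plausible lyrics from the candidates."""
    asr = whisper.load_model("large")
    candidates = [asr.transcribe(audio_path, temperature=0.2 * i)["text"]
                  for i in range(n_runs)]             # diverse ASR hypotheses

    prompt = ("You are an expert lyrics annotator. Below are several noisy "
              "transcriptions of the same song. Select and correct the most "
              "plausible lyrics, returning only the final text.\n\n"
              + "\n---\n".join(candidates))

    client = OpenAI()
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content
```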

    On the Effectiveness of Speech Self-supervised Learning for Music

    Full text link
    Self-supervised learning (SSL) has shown promising results in various speech and natural language processing applications. However, its efficacy in music information retrieval (MIR) remains largely unexplored. While previous SSL models pre-trained on music recordings may have been mostly closed-sourced, recent speech models such as wav2vec2.0 have shown promise in music modelling. Nevertheless, research exploring the effectiveness of applying speech SSL models to music recordings has been limited. We explore the music adaptation of SSL with two distinctive speech-related models, data2vec1.0 and HuBERT, and refer to them as music2vec and musicHuBERT, respectively. We train 12 SSL models with 95M parameters under various pre-training configurations and systematically evaluate their performance on 13 different MIR tasks. Our findings suggest that training with music data can generally improve performance on MIR tasks, even when models are trained using paradigms designed for speech. However, we identify the limitations of such existing speech-oriented designs, especially in modelling polyphonic information. Based on the experimental results, empirical suggestions are also given for designing future musical SSL strategies and paradigms.
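    A simplified sketch of the probing protocol implied above: freeze an SSL encoder, average its hidden states over time, and train a lightweight classifier per MIR task. The checkpoint name and the mean-pooling strategy are placeholders, not the paper's exact experimental setup.

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoFeatureExtractor, AutoModel

def extract_embeddings(model_name: str, waveforms, sr: int = 16000):
    """Return one time-averaged embedding per clip from a frozen SSL encoder."""
    fe = AutoFeatureExtractor.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).eval()
    feats = []
    with torch.no_grad():
        for wav in waveforms:                                    # each wav: 1-D float array
            inputs = fe(wav, sampling_rate=sr, return_tensors="pt")
            hidden = model(**inputs).last_hidden_state           # (1, T, D)
            feats.append(hidden.mean(dim=1).squeeze(0).numpy())  # mean-pool over time
    return feats

# Frozen-feature probing on a downstream MIR task (e.g. genre labels):
# X_train = extract_embeddings("facebook/hubert-base-ls960", train_wavs)
# clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# X_test = extract_embeddings("facebook/hubert-base-ls960", test_wavs)
# print(clf.score(X_test, y_test))
```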

    MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

    Full text link
    Self-supervised learning (SSL) has recently emerged as a promising paradigm for training generalisable models on large-scale data in the fields of vision, text, and speech. Although SSL has been proven effective in speech and audio, its application to music audio has yet to be thoroughly explored. This is primarily due to the distinctive challenges of modelling musical knowledge, particularly the tonal and pitched characteristics of music. To address this research gap, we propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels for masked language modelling (MLM)-style acoustic pre-training. In our exploration, we identified a superior combination of teacher models that outperforms conventional speech and audio approaches. This combination includes an acoustic teacher based on a Residual Vector Quantization-Variational AutoEncoder (RVQ-VAE) and a musical teacher based on the Constant-Q Transform (CQT). These teachers effectively guide our student model, a BERT-style transformer encoder, to better model music audio. In addition, we introduce an in-batch noise mixture augmentation to enhance representation robustness. Furthermore, we explore a wide range of settings to overcome the instability of acoustic language model pre-training, which allows our designed paradigm to scale from 95M to 330M parameters. Experimental results indicate that our model generalises and performs well on 14 music understanding tasks, attaining state-of-the-art (SOTA) overall scores. The code and models are online: https://github.com/yizhilll/MERT
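    A small sketch of how a CQT-based "musical teacher" target, as described above, could be computed with librosa and paired with a masked-prediction loss. Frame alignment, codebooks, and the RVQ-VAE acoustic teacher are omitted, and the sample rate and hop size are assumptions, so treat this purely as an illustration of the idea.

```python
import librosa
import numpy as np
import torch
import torch.nn.functional as F

def cqt_teacher_targets(wav: np.ndarray, sr: int = 24000, hop: int = 480) -> torch.Tensor:
    """Log-magnitude Constant-Q Transform frames used as regression targets
    for the musical (pitch/harmony) teacher."""
    cqt = np.abs(librosa.cqt(wav, sr=sr, hop_length=hop, n_bins=84, bins_per_octave=12))
    return torch.from_numpy(np.log1p(cqt).T).float()   # shape: (frames, 84)

def masked_cqt_loss(student_frames: torch.Tensor, targets: torch.Tensor,
                    mask: torch.Tensor) -> torch.Tensor:
    """MLM-style objective: the student reconstructs CQT frames only at masked positions."""
    return F.mse_loss(student_frames[mask], targets[mask])

# student_frames: (frames, 84) predictions from a head on the BERT-style encoder
# mask:           boolean tensor marking the masked time steps
```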

    Record Maximum Oscillation Frequency in C-face Epitaxial Graphene Transistors

    Full text link
    The maximum oscillation frequency (fmax) quantifies the practical upper bound for useful circuit operation. We report here an fmax of 70 GHz in transistors using epitaxial graphene grown on the C-face of SiC. This is a significant improvement over the Si-face epitaxial graphene used in prior high-frequency transistor studies, exemplifying the superior electronics potential of C-face epitaxial graphene. Careful transistor design, using a high-κ dielectric T-gate and self-aligned contacts, further contributed to the record-breaking fmax.
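    For context, fmax is conventionally taken as the frequency at which Mason's unilateral power gain U falls to unity (0 dB), usually obtained by extrapolating the measured U along its roughly -20 dB/decade roll-off. This is the standard RF figure-of-merit definition, not a detail stated in the abstract.

```latex
% Unilateral-gain roll-off and the resulting extrapolation for f_max:
U(f) \approx \left(\frac{f_{\max}}{f}\right)^{2}
\quad\Longrightarrow\quad
f_{\max} \approx f\,\sqrt{U(f)},
\qquad \text{with } U(f_{\max}) = 1 \ (0~\mathrm{dB}).
```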

    GmFT2a, a Soybean Homolog of FLOWERING LOCUS T, Is Involved in Flowering Transition and Maintenance

    Get PDF
    BACKGROUND: Flowering reversion can be induced in soybean (Glycine max L. Merr.), a typical short-day (SD) dicot, by switching from SD to long-day (LD) photoperiods. This process may involve florigen, putatively encoded by FLOWERING LOCUS T (FT) in Arabidopsis thaliana. However, little is known about the potential function of soybean FT homologs in flowering reversion. METHODS: A photoperiod-responsive FT homologue, GmFT (renamed GmFT2a hereafter), was cloned from the photoperiod-sensitive cultivar Zigongdongdou. GmFT2a expression under different photoperiods was analyzed by real-time quantitative PCR. In situ hybridization provided direct evidence of its expression during flowering-related processes. GmFT2a was shown to promote flowering in transgenic studies in Arabidopsis and soybean. The effects of photoperiod and temperature on GmFT2a expression were also analyzed in two cultivars with different photoperiod sensitivities. RESULTS: GmFT2a expression is regulated by photoperiod. Analyses of GmFT2a transcripts revealed a strong correlation between GmFT2a expression and flowering maintenance. GmFT2a transcripts were observed continuously within the vascular tissue up to the shoot apex during flowering. By contrast, transcripts decreased to undetectable levels during flowering reversion. In grafting experiments, the early-flowering, photoperiod-insensitive stock Heihe27 promoted the appearance of GmFT2a transcripts in the shoot apex of the scion Zigongdongdou under noninductive LD conditions. The photothermal effects on GmFT2a expression diversity in cultivars with different photoperiod sensitivities were analyzed, and a hypothesis is proposed. CONCLUSION: GmFT2a expression is associated with flowering induction and maintenance. Therefore, GmFT2a is a potential target gene for soybean breeding, with the aim of increasing the geographic adaptation of this crop.

    Molecular Characterization and Expression Patterns of the HkSVP Gene Reveal Distinct Roles in Inflorescence Structure and Floral Organ Development in Hemerocallis fulva

    No full text
    SHORT VEGETATIVE PHASE (SVP) genes are members of the well-known MADS-box gene family that play a key role in regulating vital developmental processes in plants. Hemerocallis are perennial herbs that exhibit continuous flowering development and have been extensively used in landscaping. However, there are few reports on the regulatory mechanism of flowering in Hemerocallis. To better understand the molecular basis of floral formation in Hemerocallis, we identified and characterized the SVP-like gene HkSVP from the Hemerocallis cultivar ‘Kanai Sensei’. Quantitative RT-PCR (qRT-PCR) indicated that the HkSVP transcript was mainly expressed in the vegetative growth stage, with the highest expression in leaves, low expression in petals, pedicels and fruits, and no expression in pistils. The HkSVP-encoded protein localized to the nucleus in Arabidopsis protoplasts and in onion epidermal cells. A yeast two-hybrid assay revealed that HkSVP interacts with Hemerocallis AP1 and TFL1. Moreover, overexpression of HkSVP in Arabidopsis resulted in delayed flowering and abnormal phenotypes, including enriched trichomes, increased basal inflorescence branches and inhibition of inflorescence formation. These observations suggest that the HkSVP gene may play an important role in maintaining vegetative growth by participating in the construction of inflorescence structure and the development of floral organs.

    Scenery deconstruction: a new approach to understanding the historical characteristics of Nanjing cultural landscape

    No full text
    The “Eight Scenic Views Paintings” represent crucial visual materials for investigating the history of cultural landscapes. However, traditional methods of interpreting such materials struggle to discern the inherent connections between different landscape elements. This study proposes an approach for deconstructing historical images, taking as its example the Forty Scenic Views of Late Ming Dynasty Nanjing, China. To explore the co-occurrence structure, hierarchical clustering, and correlation features among the elements, several digital humanities quantification methods were applied, including spatial analysis in ArcGIS, co-occurrence and clustering analysis in KH Coder, and correlation analysis in SPSS. This study reveals the characteristics of the landscape construction of Late Ming Nanjing: natural landscape as the foundation, artificial landscape as the core, and the advocacy of tradition as the fashion. It also uncovers the landscape order: mountains, waters, and scenic views were interwoven and coexistent, and nature and culture were intertwined and clustered. In addition, multiple information graphs of the subordinate and co-occurrence relationships of 20 landscape elements were constructed, 5 landscape paradigms were extracted, and 36 pairs of related relationships were discovered, deepening the historical understanding of the urban landscape construction of Late Ming Nanjing. This paper puts forward the idea of analyzing historical images with digital methods, providing an essential and detailed historical basis for explaining the value of cultural landscape heritage and shaping the contemporary urban landscape.
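    A compact sketch of the kind of co-occurrence and clustering analysis described above, using pandas and SciPy on a hypothetical element-by-scene presence table. The study itself used ArcGIS, KH Coder, and SPSS, so this only mirrors the general workflow, and the element names and data are invented.

```python
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical presence/absence table: rows = scenic views, columns = landscape elements
views = pd.DataFrame(
    [[1, 1, 0, 1], [0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 1, 0]],
    columns=["mountain", "water", "temple", "pavilion"],
)

# Element co-occurrence matrix: how often two elements appear in the same view
cooc = views.T @ views

# Hierarchical clustering of elements from their correlation across views
corr = views.corr()                                    # Pearson correlation (SPSS-style)
dist = squareform(1 - corr.values, checks=False)       # condensed distance matrix
tree = linkage(dist, method="average")
clusters = fcluster(tree, t=2, criterion="maxclust")   # split elements into 2 groups

print(cooc)
print(dict(zip(views.columns, clusters)))
```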

    Bilateral Symmetrical Nodules on the Thumbs in a Female Patient: A Quiz

    No full text
    Abstract is missing (Quiz).

    Exploring the factors influencing the abrasion resistance of hydraulic concrete based on underwater steel ball test

    No full text
    Hydraulic structures may be subjected to erosion and damage by high-speed, sediment-carrying water flow in the overflow area, seriously impairing their normal operation. Because research on the abrasion resistance of ordinary concrete used in water-conveying structures is limited, this paper took concrete from actual engineering projects as examples to study the effects of compressive strength, aggregate particle size, sand volume fraction, pore size parameters, water flow velocity, and bed load and suspended load content. The results showed that, when the underwater steel ball method was used for testing, there was no strong correspondence between concrete abrasion resistance and compressive strength: concretes with strength grades from C40 to C55 showed similar abrasion resistance. At comparable compressive strengths, the abrasion resistance of concrete was 1.42–1.68 times that of mortar. The smaller the difference between the sand volume fraction and the compact porosity of the coarse aggregate, the higher the abrasion resistance. The proportion of 50–100 nm pores in the pore structure was negatively correlated with the abrasion resistance of concrete, while the fractal dimension of pore volume was positively correlated with it. When the water flow velocity decreased from 1.5 m/s to 1.25 m/s, the abrasion resistance increased by 97%. The influence of suspended load on concrete abrasion damage was minimal compared with that of bed load. This study comprehensively examines the influence of different factors on the abrasion resistance of concrete, providing theoretical guidance and practical experience for improving it.
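    The underwater steel ball test quantifies abrasion resistance from the specimen's mass loss; a minimal sketch of that calculation is given below, using the commonly reported abrasion-resistance strength (test time × abraded area / mass loss). The formula, units, and example numbers follow the usual convention for this test and are not details stated in the abstract.

```python
def abrasion_resistance(test_hours: float, abraded_area_m2: float,
                        mass_loss_kg: float) -> float:
    """Abrasion-resistance strength in h·m²/kg: higher means the surface
    loses less mass per unit area over the test duration."""
    if mass_loss_kg <= 0:
        raise ValueError("mass loss must be positive")
    return test_hours * abraded_area_m2 / mass_loss_kg

# Illustrative numbers: 72 h test, 0.0314 m² abraded face, 0.45 kg mass loss
print(abrasion_resistance(72, 0.0314, 0.45))   # ≈ 5.0 h·m²/kg
```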