MMBench: Is Your Multi-modal Model an All-around Player?
Large vision-language models have recently achieved remarkable progress,
exhibiting great perception and reasoning abilities concerning visual
information. However, how to effectively evaluate these large vision-language
models remains a major obstacle, hindering future model development.
Traditional benchmarks like VQAv2 or COCO Caption provide quantitative
performance measurements but suffer from a lack of fine-grained ability
assessment and non-robust evaluation metrics. Recent subjective benchmarks,
such as OwlEval, offer comprehensive evaluations of a model's abilities by
incorporating human labor, but they are not scalable and display significant
bias. In response to these challenges, we propose MMBench, a novel
multi-modality benchmark. MMBench methodically develops a comprehensive
evaluation pipeline comprising two main elements. The first element is
a meticulously curated dataset that surpasses existing similar benchmarks in
terms of the number and variety of evaluation questions and abilities. The
second element introduces a novel CircularEval strategy and incorporates the
use of ChatGPT. This implementation is designed to convert free-form
predictions into pre-defined choices, thereby facilitating a more robust
evaluation of the model's predictions. MMBench is a systematically designed
objective benchmark for robustly evaluating the various abilities of
vision-language models. We hope MMBench will assist the research community in
better evaluating their models and encourage future advancements in this
domain. Project page: https://opencompass.org.cn/mmbench
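The CircularEval idea can be sketched in a few lines: a question with N options is posed N times with circularly shifted option orders, and the model is credited only if it answers correctly every time. This is a minimal sketch; `ask_model` is a hypothetical callable standing in for the full pipeline, which additionally uses ChatGPT to map free-form output onto one of the choices.

```python
def circular_eval(ask_model, question, choices, answer):
    """CircularEval sketch: credit the model with a question only if it
    picks the correct option under every circular shift of the choices.
    `ask_model(question, choices)` is a hypothetical callable returning
    the chosen option; the real pipeline maps free-form text to a choice
    with ChatGPT first."""
    n = len(choices)
    for shift in range(n):
        shifted = choices[shift:] + choices[:shift]
        if ask_model(question, shifted) != answer:
            return False  # one failed pass fails the whole question
    return True
```

A model that always latches onto the first option fails CircularEval even though it would score well under a single fixed ordering, which is exactly the robustness the strategy targets.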
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT
We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic
lyrics transcription method achieving state-of-the-art performance on various
lyrics transcription datasets, even in challenging genres such as rock and
metal. Our novel, training-free approach utilizes Whisper, a weakly supervised
robust speech recognition model, and GPT-4, today's most performant chat-based
large language model. In the proposed method, Whisper functions as the "ear" by
transcribing the audio, while GPT-4 serves as the "brain," acting as an
annotator with strong performance in contextualized output selection and
correction. Our experiments show that LyricWhiz significantly reduces Word
Error Rate compared to existing methods in English and can effectively
transcribe lyrics across multiple languages. Furthermore, we use LyricWhiz to
create the first publicly available, large-scale, multilingual lyrics
transcription dataset with a CC-BY-NC-SA copyright license, based on
MTG-Jamendo, and offer a human-annotated subset for noise level estimation and
evaluation. We anticipate that our proposed method and dataset will advance the
development of multilingual lyrics transcription, a challenging and emerging
task. Comment: 9 pages, 2 figures, 5 tables, accepted by ISMIR 202
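The ear/brain division of labour described above can be sketched as a small orchestration loop: run several (possibly stochastic) transcription passes, then let a language model pick and correct the best candidate. Both `transcribe` and `judge` are hypothetical callables standing in for a Whisper decoding pass and a GPT-4 selection-and-correction prompt; the authors' exact interfaces and prompts are not given in the abstract.

```python
def lyricwhiz_sketch(audio, transcribe, judge, n_runs=3):
    """Hypothetical sketch of the ear/brain split: `transcribe(audio)`
    stands in for one Whisper decoding pass (stochastic, so repeated runs
    yield different candidates), and `judge(candidates)` stands in for a
    GPT-4 prompt that selects and corrects the most plausible lyrics.
    Both callables are assumptions, not the authors' exact interfaces."""
    candidates = [transcribe(audio) for _ in range(n_runs)]
    return judge(candidates)
```

The design point is that the speech model never needs fine-tuning: all lyric-specific knowledge is injected by the judging step, which keeps the whole method zero-shot.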
On the Effectiveness of Speech Self-supervised Learning for Music
Self-supervised learning (SSL) has shown promising results in various speech
and natural language processing applications. However, its efficacy in music
information retrieval (MIR) remains largely unexplored. While previous
SSL models pre-trained on music recordings have been mostly closed-source,
recent speech models such as wav2vec2.0 have shown promise in music modelling.
Nevertheless, research exploring the effectiveness of applying speech SSL
models to music recordings has been limited. We explore the music adaptation of
SSL with two distinct speech-related models, data2vec1.0 and HuBERT, and
refer to them as music2vec and musicHuBERT, respectively. We train SSL
models with 95M parameters under various pre-training configurations and
systematically evaluate the MIR task performances with 13 different MIR tasks.
Our findings suggest that training with music data can generally improve
performance on MIR tasks, even when models are trained using paradigms designed
for speech. However, we identify the limitations of such existing
speech-oriented designs, especially in modelling polyphonic information. Based
on the experimental results, empirical suggestions are also given for designing
future musical SSL strategies and paradigms.
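The evaluation protocol implied above is the standard SSL probing setup: freeze the pre-trained encoder, pool its features, and fit a lightweight classifier per MIR task. The nearest-centroid probe below is only an illustrative stand-in for such a classifier (real MIR probes typically use a linear layer or small MLP on the frozen features).

```python
def probe_frozen_features(train_feats, train_labels, test_feats):
    """Minimal probing sketch: treat frozen SSL features as fixed vectors
    and classify test vectors by the nearest class centroid. A stand-in
    for the usual linear/MLP probe, not the paper's actual classifier."""
    # accumulate per-class sums and counts over the frozen training features
    sums, counts = {}, {}
    for vec, lab in zip(train_feats, train_labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

    def nearest(vec):
        # squared Euclidean distance to each class centroid
        return min(centroids,
                   key=lambda lab: sum((a - b) ** 2
                                       for a, b in zip(centroids[lab], vec)))

    return [nearest(v) for v in test_feats]
```

Because the encoder stays frozen, probe accuracy across the 13 tasks reflects what the pre-training itself captured, which is why the protocol isolates speech-oriented design limitations such as weak polyphony modelling.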
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Self-supervised learning (SSL) has recently emerged as a promising paradigm
for training generalisable models on large-scale data in the fields of vision,
text, and speech. Although SSL has been proven effective in speech and audio,
its application to music audio has yet to be thoroughly explored. This is
primarily due to the distinctive challenges associated with modelling musical
knowledge, particularly its tonal and pitched characteristics. To
address this research gap, we propose an acoustic Music undERstanding model
with large-scale self-supervised Training (MERT), which incorporates teacher
models to provide pseudo labels for masked language modelling (MLM)-style
acoustic pre-training. In our exploration, we identified a combination of
teacher models that outperforms conventional speech and audio approaches.
This combination includes an acoustic teacher based on
Residual Vector Quantization - Variational AutoEncoder (RVQ-VAE) and a musical
teacher based on the Constant-Q Transform (CQT). These teachers effectively
guide our student model, a BERT-style transformer encoder, to better model
music audio. In addition, we introduce an in-batch noise mixture augmentation
to enhance the representation robustness. Furthermore, we explore a wide range
of settings to overcome the instability in acoustic language model
pre-training, which allows our designed paradigm to scale from 95M to 330M
parameters. Experimental results indicate that our model can generalise and
perform well on 14 music understanding tasks and attains state-of-the-art
(SOTA) overall scores. The code and models are online:
https://github.com/yizhilll/MERT
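The in-batch noise mixture augmentation mentioned above can be sketched as mixing each waveform with another randomly drawn waveform from the same batch. The mixing weight `snr_weight` and the avoid-self rule are illustrative assumptions; MERT's exact weighting scheme is not given in the abstract.

```python
import random

def in_batch_noise_mix(batch, snr_weight=0.5, rng=None):
    """Sketch of in-batch noise mixture augmentation: each waveform is
    mixed with a randomly chosen other waveform from the same batch,
    scaled by `snr_weight` (an illustrative assumption, not MERT's exact
    scheme). Waveforms are plain lists of samples here."""
    rng = rng or random.Random()
    mixed = []
    for i, wav in enumerate(batch):
        j = rng.randrange(len(batch))
        if j == i:  # avoid mixing a sample with itself
            j = (j + 1) % len(batch)
        noise = batch[j]
        mixed.append([a + snr_weight * b for a, b in zip(wav, noise)])
    return mixed
```

Drawing the "noise" from the batch itself costs nothing extra at training time and forces the student to produce representations that survive musically realistic interference.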
Record Maximum Oscillation Frequency in C-face Epitaxial Graphene Transistors
The maximum oscillation frequency (fmax) quantifies the practical upper bound
for useful circuit operation. We report here an fmax of 70 GHz in transistors
using epitaxial graphene grown on the C-face of SiC. This is a significant
improvement over Si-face epitaxial graphene used in the prior high frequency
transistor studies, exemplifying the superior electronics potential of C-face
epitaxial graphene. Careful transistor design, using a high-κ dielectric
T-gate and self-aligned contacts, further contributed to the record-breaking
fmax.
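For context, fmax is conventionally extracted from Mason's unilateral power gain U(f): the measured U(f) rolls off at roughly −20 dB per decade, and fmax is the extrapolated frequency at which U reaches unity (0 dB). The sketch below states that convention as an assumption; the paper's exact extraction procedure is not given in the abstract.

```latex
% Standard f_max extraction convention (an assumption; not stated above):
% Mason's unilateral power gain U(f) rolls off as 1/f^2 (-20 dB/decade),
% and f_max is the extrapolated frequency at which U(f) = 1 (0 dB).
U(f) \approx U(f_0)\left(\frac{f_0}{f}\right)^{2},
\qquad
U(f_{\max}) = 1 \;\Rightarrow\; f_{\max} = f_0\,\sqrt{U(f_0)}
```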
GmFT2a, a Soybean Homolog of FLOWERING LOCUS T, Is Involved in Flowering Transition and Maintenance
BACKGROUND: Flowering reversion can be induced in soybean (Glycine max L. Merr.), a typical short-day (SD) dicot, by switching from SD to long-day (LD) photoperiods. This process may involve florigen, putatively encoded by FLOWERING LOCUS T (FT) in Arabidopsis thaliana. However, little is known about the potential function of soybean FT homologs in flowering reversion. METHODS: A photoperiod-responsive FT homologue GmFT (renamed GmFT2a hereafter) was cloned from the photoperiod-sensitive cultivar Zigongdongdou. GmFT2a gene expression under different photoperiods was analyzed by real-time quantitative PCR. In situ hybridization provided direct evidence of its expression during flowering-related processes. GmFT2a was shown to promote flowering in transgenic studies in Arabidopsis and soybean. The effects of photoperiod and temperature on GmFT2a expression were also analyzed in two cultivars with different photoperiod-sensitivities. RESULTS: GmFT2a expression is regulated by photoperiod. Analyses of GmFT2a transcripts revealed a strong correlation between GmFT2a expression and flowering maintenance. GmFT2a transcripts were observed continuously within the vascular tissue up to the shoot apex during flowering. By contrast, transcripts decreased to undetectable levels during flowering reversion. In grafting experiments, the early-flowering, photoperiod-insensitive stock Heihe27 promotes the appearance of GmFT2a transcripts in the shoot apex of scion Zigongdongdou under noninductive LD conditions. The photothermal effects on GmFT2a expression diversity in cultivars with different photoperiod-sensitivities were analyzed, and a hypothesis is proposed. CONCLUSION: GmFT2a expression is associated with flowering induction and maintenance. Therefore, GmFT2a is a potential target gene for soybean breeding, with the aim of increasing the geographic adaptation of this crop.
Molecular Characterization and Expression Patterns of the HkSVP Gene Reveal Distinct Roles in Inflorescence Structure and Floral Organ Development in Hemerocallis fulva
SHORT VEGETATIVE PHASE (SVP) genes are members of the well-known MADS-box gene family that play a key role in regulating vital developmental processes in plants. Hemerocallis are perennial herbs that exhibit continuous flowering development and have been extensively used in landscaping. However, there are few reports on the regulatory mechanism of flowering in Hemerocallis. To better understand the molecular basis of floral formation of Hemerocallis, we identified and characterized the SVP-like gene HkSVP from the Hemerocallis cultivar ‘Kanai Sensei’. Quantitative RT-PCR (qRT-PCR) indicated that the HkSVP transcript was mainly expressed in the vegetative growth stage and had the highest expression in leaves, low expression in petals, pedicels and fruits, and no expression in pistils. The HkSVP-encoded protein was localized in the nucleus of Arabidopsis protoplasts and the nucleus of onion epidermal cells. Yeast two-hybrid assays revealed that HkSVP interacted with Hemerocallis AP1 and TFL1. Moreover, overexpression of HkSVP in Arabidopsis resulted in delayed flowering and abnormal phenotypes, including enriched trichomes, increased basal inflorescence branches and inhibition of inflorescence formation. These observations suggest that the HkSVP gene may play an important role in maintaining vegetative growth by participating in the construction of inflorescence structure and the development of flower organs.
Scenery deconstruction: a new approach to understanding the historical characteristics of Nanjing cultural landscape
The “Eight Scenic Views Paintings” represent crucial visual materials for investigating the history of cultural landscapes. However, traditional methods of interpreting materials struggle to discern the inherent connections between different landscape elements. This study proposes an approach for deconstructing historical images, taking the example of the Forty Scenic Views in the Late Ming Dynasty in Nanjing, China. To explore the co-occurrence structure, hierarchical clustering, and correlation features among various elements, various digital humanities quantification methods were applied, including spatial analysis in ArcGIS, co-occurrence and clustering in KH Coder, and correlation analysis in SPSS. This study reveals the characteristics of the landscape construction of Nanjing in the Late Ming: natural landscape as the foundation, artificial landscape as the core, and advocating tradition as the fashion. It also uncovers the landscape order: mountains, waters, and scenic views interwove and coexisted, and nature and culture intertwined and clustered. In addition, multiple information graphs of the subordinate and co-occurrence relationships of 20 landscape elements were constructed, 5 landscape paradigms were extracted, and 36 pairs of related relationships were discovered, deepening the historical understanding of the urban landscape construction of Nanjing in the Late Ming. This paper puts forward the idea of analyzing historical images by digital methods, which provides an essential and detailed historical basis for explaining the value of cultural landscape heritage and shaping contemporary urban landscape.
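The element co-occurrence analysis described above can be sketched as counting how often each pair of landscape elements appears in the same scenic view. The element names in the usage example are illustrative, not the study's actual coding scheme.

```python
from itertools import combinations
from collections import Counter

def cooccurrence_counts(scenes):
    """Co-occurrence sketch: each scenic view is a collection of coded
    landscape elements; count how often each unordered element pair
    appears in the same view. Pairs are stored with sorted keys so
    (a, b) and (b, a) are counted together."""
    pair_counts = Counter()
    for elements in scenes:
        for a, b in combinations(sorted(set(elements)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts
```

Such a pair-count matrix is the usual input to the KH Coder-style co-occurrence networks and hierarchical clustering the study reports.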
Bilateral Symmetrical Nodules on the Thumbs in a Female Patient: A Quiz
Abstract is missing (Quiz).
Exploring the factors influencing the abrasion resistance of hydraulic concrete based on underwater steel ball test
Hydraulic structures may be subjected to erosion and damage by high-speed sand-carrying water flow in the overflow area, seriously impairing the normal operation of the structure. Because research on the wear resistance of ordinary concrete as a drainage structure is limited, this paper took concrete from actual engineering projects as examples to study the effects of compressive strength, aggregate particle size, sand volume fraction, pore size parameters, water flow velocity, bed load and suspended load content. The results showed that, when tested with the underwater steel ball method, there was no strong correspondence between concrete abrasion resistance and compressive strength: concrete of strength grades C40–C55 showed similar abrasion resistance. When the compressive strength was close, the abrasion resistance of concrete was 1.42–1.68 times that of mortar. The smaller the difference between the volume fraction of sand and the compact porosity of coarse aggregate, the higher the abrasion resistance. The proportion of 50–100 nm pores in the pore structure was negatively correlated with the abrasion resistance of concrete, while the fractal dimension of pore volume was positively correlated with it. When the water flow speed decreased from 1.5 m/s to 1.25 m/s, the abrasion resistance increased by 97%. The influence of suspended load on concrete abrasion damage was minimal compared to that of bed load. This work comprehensively studies the influence of different factors on the abrasion resistance of concrete, providing theoretical guidance and practical experience for improving it.
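For reference, underwater steel-ball tests commonly report abrasion resistance as test duration times abraded area divided by mass loss (h·m²/kg), so larger values mean more wear-resistant concrete. A sketch assuming that metric; the paper's exact definition is not stated in the abstract.

```python
def abrasion_resistance(mass_loss_kg, area_m2, hours):
    """Abrasion-resistance sketch for underwater steel-ball tests:
    duration (h) * abraded area (m^2) / mass loss (kg), giving h*m^2/kg.
    Larger values mean more wear-resistant concrete. This definition is
    an assumption for illustration, not necessarily the paper's metric."""
    return hours * area_m2 / mass_loss_kg
```

Expressing resistance per unit area and time is what lets specimens of different sizes and test durations, such as the concrete-versus-mortar comparison above, be placed on one scale.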