Evolving Network With Different Edges
We propose an evolving network model consisting of a fixed set of nodes whose edges change over time. Competition between nodes and between different types of links is introduced. Using continuum theory, we show that the model exhibits scale-free properties, and different network topologies can be generated through a few tunable parameters. Simulation results confirm the analytical predictions.
Comment: 14 pages, 9 figures, some contents revised, fluctuation of x degree added
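The abstract does not spell out the update rules, so the following is only a minimal sketch, assuming a generic model of this flavour: a fixed node set whose edges are redrawn over time, with competition between nodes modelled as degree-preferential attachment controlled by a tunable exponent alpha. The function name and parameters are illustrative, not the authors' actual model.

    import random
    from collections import defaultdict

    def evolve_network(n_nodes=100, n_steps=2000, alpha=1.0, seed=0):
        """Toy evolving network: nodes are fixed, edges are redrawn over time.

        At each step one endpoint is chosen with probability proportional to
        (degree + 1)**alpha (the 'competition' between nodes); alpha is the
        tunable parameter shaping the resulting topology. Purely illustrative.
        """
        rng = random.Random(seed)
        degree = defaultdict(int)
        edges = set()
        for _ in range(n_steps):
            weights = [(degree[v] + 1) ** alpha for v in range(n_nodes)]
            target = rng.choices(range(n_nodes), weights=weights, k=1)[0]
            source = rng.randrange(n_nodes)
            if source != target:
                edges.add((min(source, target), max(source, target)))
                degree[source] += 1
                degree[target] += 1
        return edges, degree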
HERB: Measuring Hierarchical Regional Bias in Pre-trained Language Models
Fairness, which addresses biases targeting certain social groups such as genders and religions, has become a trending topic in natural language processing (NLP). However, regional bias in language models (LMs), a long-standing global discrimination problem, remains unexplored. This paper bridges the gap by analysing the regional bias learned by the pre-trained language models that are broadly used in NLP tasks. In addition to verifying the existence of regional bias in LMs, we find that biases against regional groups can be strongly influenced by the geographical clustering of those groups. We accordingly propose a HiErarchical Regional Bias evaluation method (HERB) that utilises information from sub-region clusters to quantify the bias in pre-trained LMs. Experiments show that our hierarchical metric can effectively evaluate regional bias with respect to comprehensive topics and measure the potential regional bias that can be propagated to downstream tasks. Our code is available at https://github.com/Bernard-Yang/HERB.
Comment: Accepted at AACL 2022 as a long Findings paper
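HERB's exact scoring function is not given in the abstract; the sketch below only illustrates the general shape of a hierarchical bias metric, in which per-region bias scores are first aggregated within each sub-region cluster and the cluster-level results are then combined into one overall score. The aggregation rule and all names are assumptions for illustration, not the paper's formula.

    from statistics import mean, pstdev

    def hierarchical_bias_score(region_scores, clusters):
        """Illustrative hierarchical aggregation of per-region bias scores.

        region_scores: dict mapping region name -> scalar bias score
        clusters: dict mapping cluster name -> list of region names
        Combines each cluster's average bias with its internal disparity.
        Not HERB's actual formula, just a sketch of the hierarchical idea.
        """
        cluster_scores = []
        for regions in clusters.values():
            scores = [region_scores[r] for r in regions]
            # Mean captures the cluster's bias level; the spread captures
            # disparity among its sub-regions.
            cluster_scores.append(mean(scores) + pstdev(scores))
        return mean(cluster_scores)

    # Usage with toy numbers:
    regions = {"A1": 0.2, "A2": 0.4, "B1": 0.1}
    clusters = {"A": ["A1", "A2"], "B": ["B1"]}
    print(hierarchical_bias_score(regions, clusters))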
Chinese Open Instruction Generalist: A Preliminary Release
Instruction tuning is widely recognized as a key technique for building generalist language models, and it has attracted the attention of researchers and the public alike with the release of InstructGPT (Ouyang et al., 2022) and ChatGPT (https://chat.openai.com/). Despite impressive progress in English-oriented large language models (LLMs), it remains under-explored whether English-based foundation LLMs, given well-designed instruction tuning, can perform comparably on multilingual tasks, and how the corpora needed for such tuning can be constructed.
To remedy this gap, we propose this project as an attempt to create a Chinese instruction dataset through various methods adapted to the intrinsic characteristics of four sub-tasks. We collect around 200k Chinese instruction-tuning samples, which have been manually checked to guarantee high quality. We also summarize the existing English and Chinese instruction corpora and briefly describe some potential applications of the newly constructed Chinese instruction corpora. The resulting Chinese Open Instruction Generalist (COIG) corpora are available on Hugging Face (https://huggingface.co/datasets/BAAI/COIG) and GitHub (https://github.com/FlagOpen/FlagInstruct), and will be continuously updated.
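Since the corpora are hosted at the Hugging Face URL above, one minimal way to inspect them is via the datasets library. The abstract does not state the configuration or split layout, so the snippet below is a sketch that may need adjusting to the repository's actual file structure.

    # Minimal sketch: load the COIG corpora from Hugging Face with the
    # `datasets` library. The dataset ID comes from the abstract; the
    # exact configuration/split layout is an assumption and may require
    # selecting a specific data file.
    from datasets import load_dataset

    coig = load_dataset("BAAI/COIG")
    print(coig)  # show available splits
    first_split = list(coig.keys())[0]
    print(next(iter(coig[first_split])))  # peek at one sample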
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT
We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic
lyrics transcription method achieving state-of-the-art performance on various
lyrics transcription datasets, even in challenging genres such as rock and
metal. Our novel, training-free approach utilizes Whisper, a weakly supervised
robust speech recognition model, and GPT-4, today's most performant chat-based
large language model. In the proposed method, Whisper functions as the "ear" by transcribing the audio, while GPT-4 serves as the "brain," acting as an annotator with strong performance in contextualized output selection and correction. Our experiments show that LyricWhiz significantly reduces the Word Error Rate relative to existing methods in English and can effectively transcribe lyrics across multiple languages. Furthermore, we use LyricWhiz to create the first publicly available, large-scale, multilingual lyrics transcription dataset, released under a CC-BY-NC-SA license and based on MTG-Jamendo, and offer a human-annotated subset for noise-level estimation and
evaluation. We anticipate that our proposed method and dataset will advance the
development of multilingual lyrics transcription, a challenging and emerging
task.
Comment: 9 pages, 2 figures, 5 tables, accepted by ISMIR 2023
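The ear/brain split described above suggests a simple two-stage pipeline: Whisper produces candidate transcriptions and GPT-4 selects and corrects them. The sketch below uses the open-source whisper package and the OpenAI chat API; the candidate-sampling scheme and prompt wording are assumptions, not the paper's exact procedure.

    # Hedged sketch of the Whisper-as-ear / GPT-4-as-brain pipeline.
    # The candidate sampling and prompt are illustrative assumptions;
    # see the paper for the actual prompting and selection procedure.
    import whisper
    from openai import OpenAI

    def transcribe_lyrics(audio_path: str, n_candidates: int = 3) -> str:
        ear = whisper.load_model("large")
        # "Ear": produce several candidate transcriptions by resampling.
        candidates = [
            ear.transcribe(audio_path, temperature=0.2 * i)["text"]
            for i in range(n_candidates)
        ]
        # "Brain": ask GPT-4 to pick and correct the most plausible lyrics.
        brain = OpenAI()
        prompt = (
            "These are candidate lyric transcriptions of one song:\n"
            + "\n---\n".join(candidates)
            + "\nReturn the single most plausible, corrected lyrics."
        )
        reply = brain.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        return reply.choices[0].message.content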
On the Effectiveness of Speech Self-supervised Learning for Music
Self-supervised learning (SSL) has shown promising results in various speech and natural language processing applications. However, its efficacy in music information retrieval (MIR) remains largely unexplored. While most previous SSL models pre-trained on music recordings have been closed-source, recent speech models such as wav2vec 2.0 have shown promise in music modelling. Nevertheless, research exploring the effectiveness of applying speech SSL models to music recordings has been limited. We explore the music adaptation of SSL with two distinctive speech-related models, data2vec 1.0 and HuBERT, and refer to them as music2vec and musicHuBERT, respectively. We train SSL models with 95M parameters under various pre-training configurations and systematically evaluate performance on 13 different MIR tasks. Our findings suggest that training with music data can generally improve performance on MIR tasks, even when models are trained using paradigms designed for speech. However, we identify the limitations of such existing speech-oriented designs, especially in modelling polyphonic information. Based on the experimental results, we also give empirical suggestions for designing future musical SSL strategies and paradigms.
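As a hedged illustration of this evaluation setup, a common way to test a speech SSL model on an MIR task is to freeze it as a feature extractor and train a linear probe on top. The checkpoint and mean-pooling choice below are assumptions rather than the paper's exact protocol.

    # Hedged sketch: probe a frozen speech SSL model (HuBERT here) on a
    # music classification task. Checkpoint and pooling are illustrative.
    import torch
    from transformers import Wav2Vec2FeatureExtractor, HubertModel

    extractor = Wav2Vec2FeatureExtractor.from_pretrained(
        "facebook/hubert-base-ls960")
    encoder = HubertModel.from_pretrained("facebook/hubert-base-ls960").eval()

    def extract_features(waveform: torch.Tensor, sr: int = 16000) -> torch.Tensor:
        """Mean-pool the last hidden states into one clip-level vector."""
        inputs = extractor(waveform.numpy(), sampling_rate=sr,
                           return_tensors="pt")
        with torch.no_grad():
            hidden = encoder(**inputs).last_hidden_state  # (1, T, 768)
        return hidden.mean(dim=1).squeeze(0)              # (768,)

    # A linear probe trained on top of the frozen features:
    probe = torch.nn.Linear(768, 10)  # e.g. 10 genre classes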
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Self-supervised learning (SSL) has recently emerged as a promising paradigm for training generalisable models on large-scale data in the fields of vision, text, and speech. Although SSL has been proven effective in speech and audio, its application to music audio has yet to be thoroughly explored. This is primarily due to the distinctive challenges associated with modelling musical knowledge, particularly the tonal and pitched characteristics of music. To address this research gap, we propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models that provide pseudo labels for masked language modelling (MLM)-style acoustic pre-training. In our exploration, we identified a superior combination of teacher models that outperforms conventional speech and audio approaches: an acoustic teacher based on a Residual Vector Quantization Variational AutoEncoder (RVQ-VAE) and a musical teacher based on the Constant-Q Transform (CQT). These teachers effectively guide our student model, a BERT-style transformer encoder, to better model music audio. In addition, we introduce an in-batch noise mixture augmentation to enhance representation robustness. Furthermore, we explore a wide range of settings to overcome the instability in acoustic language model pre-training, which allows our designed paradigm to scale from 95M to 330M parameters. Experimental results indicate that our model can generalise and perform well on 14 music understanding tasks, attaining state-of-the-art (SOTA) overall scores. The code and models are available at https://github.com/yizhilll/MERT.
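As a hedged illustration of the CQT-based musical teacher, the snippet below computes Constant-Q Transform frames with librosa that a student encoder could be trained to predict for masked positions. The sample rate, hop length, and log-magnitude choice are illustrative assumptions, not MERT's exact configuration.

    # Hedged sketch of the CQT "musical teacher" target: compute
    # Constant-Q Transform magnitudes that a student encoder could be
    # trained to predict for masked frames. Parameters are illustrative.
    import librosa
    import numpy as np

    def cqt_targets(path: str, sr: int = 24000, hop: int = 480) -> np.ndarray:
        audio, _ = librosa.load(path, sr=sr, mono=True)
        cqt = librosa.cqt(audio, sr=sr, hop_length=hop,
                          n_bins=84, bins_per_octave=12)
        # Log-magnitude per frame -> (n_frames, 84) teacher targets.
        return np.log1p(np.abs(cqt)).T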