42 research outputs found

Med-BERT: pre-trained contextualized embeddings on large-scale structured electronic health records for disease prediction

Author: Rasmy Laila
Tao Cui
Xiang Yang
Xie Ziqian
Zhi Degui
Publication date: 22/05/2020

Deep learning (DL) based predictive models from electronic health records (EHR) deliver impressive performance in many clinical tasks. Large training cohorts, however, are often required to achieve high accuracy, hindering the adoption of DL-based models in scenarios with limited training data. Recently, bidirectional encoder representations from transformers (BERT) and related models have achieved tremendous success in the natural language processing domain. The pre-training of BERT on a very large training corpus generates contextualized embeddings that can boost the performance of models trained on smaller datasets. We propose Med-BERT, which adapts the BERT framework for pre-training contextualized embedding models on structured diagnosis data from an EHR dataset of 28,490,650 patients. Fine-tuning experiments are conducted on two disease-prediction tasks: (1) prediction of heart failure in patients with diabetes and (2) prediction of pancreatic cancer from two clinical databases. Med-BERT substantially improves prediction accuracy, boosting the area under the receiver operating characteristic curve (AUC) by 2.02-7.12%. In particular, pre-trained Med-BERT substantially improves the performance of tasks with very small fine-tuning training sets (300-500 samples), boosting the AUC by more than 20% or matching the AUC obtained with a training set 10 times larger. We believe that Med-BERT will benefit disease-prediction studies with small local training datasets, reduce data collection expenses, and accelerate the pace of artificial intelligence aided healthcare.
Comment: L.R., X.Y., and Z.X. share first authorship of this work.
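The AUC metric quoted in the abstract above can be illustrated with a small worked example. The sketch below uses the rank (Mann-Whitney) formulation: AUC is the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as half. It is not the Med-BERT evaluation code; the function name `roc_auc` and the toy labels and scores are made up for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    Illustrative helper only; not taken from the Med-BERT code base.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): two negatives, two positives.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)  # 3 of 4 positive/negative pairs are ranked correctly
```

The pairwise loop is quadratic in the number of examples, which is fine for a demonstration; a sort-based rank computation would be the usual choice for large cohorts.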

arXiv.org e-Print Archive

PubMed Central

DigitalCommons@The Texas Medical Center

Pseudo-Siamese Network based Timbre-reserved Black-box Adversarial Attack in Speaker Identification

Author: Guo Pengcheng
Wang Qing
Wang Ziqian
Xie Lei
Yao Jixun
Publication date: 30/05/2023

In this study, we propose a timbre-reserved adversarial attack approach for speaker identification (SID) that not only exploits the weakness of the SID model but also preserves the timbre of the target speaker in a black-box attack setting. Specifically, we generate timbre-reserved fake audio by adding an adversarial constraint during the training of the voice conversion model. Then, we leverage a pseudo-Siamese network architecture to learn from the black-box SID model, constraining both intrinsic similarity and structural similarity simultaneously. The intrinsic similarity loss is used to learn an intrinsic invariance, while the structural similarity loss ensures that the substitute SID model shares a similar decision boundary with the fixed black-box SID model. The substitute model can be used as a proxy to generate timbre-reserved fake audio for attacking. Experimental results on the Audio Deepfake Detection (ADD) challenge dataset indicate that the attack success rate of our proposed approach reaches up to 60.58% and 55.38% in the white-box and black-box scenarios, respectively, and that the generated audio can deceive both human beings and machines.
Comment: 5 pages

arXiv.org e-Print Archive

Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features

Author: Bi Mengxiao
Ning Ziqian
Wang Zhichao
Xie Lei
Xie Qicong
Xue Liumeng
Yao Jixun
Zhu Pengcheng
Publication date: 09/11/2022

Voice conversion for highly expressive speech is challenging. Current approaches struggle to balance speaker similarity, intelligibility and expressiveness. To address this problem, we propose Expressive-VC, a novel end-to-end voice conversion framework that leverages the advantages of both the neural bottleneck feature (BNF) approach and the information perturbation approach. Specifically, we use a BNF encoder and a Perturbed-Wav encoder to form a content extractor that learns linguistic and para-linguistic features respectively, where BNFs come from a robust pre-trained ASR model and the perturbed wave becomes speaker-irrelevant after signal perturbation. We further fuse the linguistic and para-linguistic features through an attention mechanism in which speaker-dependent prosody features are adopted as the attention query. These prosody features are produced by a prosody encoder that takes the target speaker embedding and the normalized pitch and energy of the source speech as input. Finally, the decoder consumes the integrated features and the speaker-dependent prosody feature to generate the converted speech. Experiments demonstrate that Expressive-VC is superior to several state-of-the-art systems, achieving both high expressiveness captured from the source speech and high speaker similarity with the target speaker, while intelligibility is well maintained.

arXiv.org e-Print Archive

Preserving background sound in noise-robust voice conversion via multi-task learning

Author: Guo Pengcheng
Lei Yi
Li Hai
Liu Junhui
Ning Ziqian
Wang Qing
Xie Danming
Xie Lei
Yao Jixun
Publication date: 06/11/2022

Background sound is an informative form of art that is helpful in providing a more immersive experience in real-application voice conversion (VC) scenarios. However, prior VC research, which mainly focuses on clean voices, pays little attention to VC with background sound. The critical problems in preserving background sound in VC are the inevitable speech distortion introduced by the neural separation model and the cascade mismatch between the source separation model and the VC model. In this paper, we propose an end-to-end framework via multi-task learning which sequentially cascades a source separation (SS) module, a bottleneck feature extraction module and a VC module. Specifically, the source separation task explicitly considers critical phase information and confines the distortion caused by the imperfect separation process. The source separation task, the typical VC task and the unified task share a uniform reconstruction loss constrained by joint training to reduce the mismatch between the SS and VC modules. Experimental results demonstrate that our proposed framework significantly outperforms the baseline systems while achieving comparable quality and speaker similarity to the VC models trained with clean data.
Comment: Submitted to ICASSP 202

arXiv.org e-Print Archive

A novel image fusion algorithm based on bandelet transform

Author: Bengang Chen
Guofu Xie
Jingwen Yan
Xiaobo Qu
Ziqian Zhu
Publication venue: CHINESE OPTICS LETTERS
Publication date: 01/01/2007

A novel image fusion algorithm based on bandelet transform is proposed. Bandelet transform can take advantage of the geometrical regularity of image structure and represent sharp image transitions such as edges efficiently in image fusion. For reconstructing the fused image, the maximum rule is used to select source images’ geometric flow and bandelet coefficients. Experimental results indicate that the bandelet-based fusion algorithm represents the edge and detailed information well and outperforms the wavelet-based and Laplacian pyramid-based fusion algorithms, especially when the source images contain abundant texture and edges.
Navigation Science Foundation (No. 05F07001) and the National Natural Science Foundation of China (No. 60472081)
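The "maximum rule" mentioned in the abstract above can be sketched on plain coefficient lists. This sketch assumes the rule means keeping, at each position, the coefficient with the larger absolute value, which is a common reading in transform-domain fusion but is not spelled out in the abstract. It does not implement the bandelet transform or geometric-flow selection; `fuse_max_abs` and the sample coefficients are hypothetical.

```python
def fuse_max_abs(coeffs_a, coeffs_b):
    """Fuse two equally sized coefficient lists by a max-absolute-value rule.

    Assumption: "maximum rule" = pick the coefficient of larger magnitude at
    each position (ties keep the coefficient from the first image).
    """
    if len(coeffs_a) != len(coeffs_b):
        raise ValueError("coefficient lists must have the same length")
    return [a if abs(a) >= abs(b) else b for a, b in zip(coeffs_a, coeffs_b)]


# Hypothetical transform coefficients from two source images.
image_a = [0.2, -1.5, 0.0, 3.0]
image_b = [-0.9, 1.0, 0.4, -3.0]
fused = fuse_max_abs(image_a, image_b)
```

In a full pipeline the fused coefficients would be passed to the inverse transform to reconstruct the fused image; that step is omitted here.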

CiteSeerX

Xiamen University Institutional Repository

PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts

Author: Hu Yanni
Lei Yi
Lu Heng
Ning Ziqian
Pan Yu
Xie Lei
Yang Yuguang
Yao Jixun
Yin Jingjing
Zhou Hongbin
Publication date: 17/09/2023

Style voice conversion aims to transform the style of source speech to a desired style according to real-world application demands. However, current style voice conversion approaches rely on pre-defined labels or reference speech to control the conversion process, which limits style diversity and falls short in the intuitiveness and interpretability of the style representation. In this study, we propose PromptVC, a novel style voice conversion approach that employs a latent diffusion model to generate a style vector driven by natural language prompts. Specifically, the style vector is extracted by a style encoder during training, and then the latent diffusion model is trained independently to sample the style vector from noise, with this process being conditioned on natural language prompts. To improve style expressiveness, we leverage HuBERT to extract discrete tokens and replace them with the K-Means center embedding to serve as the linguistic content, which minimizes residual style information. Additionally, we deduplicate the same discrete token and employ a differentiable duration predictor to re-predict the duration of each token, which can adapt the duration of the same linguistic content to different styles. The subjective and objective evaluation results demonstrate the effectiveness of our proposed system.
Comment: Submitted to ICASSP 202
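The token deduplication step described in the abstract above can be illustrated as follows. The sketch assumes that "deduplicate" means collapsing runs of consecutive identical discrete tokens while recording each run length as the original duration in frames. The learned duration predictor from the paper is not modeled, and `deduplicate_tokens` and the sample token IDs are hypothetical.

```python
from itertools import groupby


def deduplicate_tokens(tokens):
    """Collapse consecutive repeats of a token and record each run length.

    Assumption: only adjacent duplicates are merged, so a token that recurs
    later in the sequence is kept as a separate unit.
    """
    units, durations = [], []
    for token, run in groupby(tokens):
        units.append(token)
        durations.append(sum(1 for _ in run))  # frames covered by this unit
    return units, durations


# Hypothetical frame-level cluster IDs (e.g. K-Means indices of speech features).
frame_tokens = [12, 12, 12, 7, 7, 12, 3]
units, durations = deduplicate_tokens(frame_tokens)
```

The recorded run lengths show what information is discarded by deduplication; in the approach described above, a duration predictor supplies new durations for each unit instead of reusing these.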

arXiv.org e-Print Archive

Unsupervised Deep Representation Learning Enables Phenotype Discovery For Genetic association Studies of Brain Imaging

Author: Chen Han
Fletcher Evan
Fornage Myriam
Giancardo Luca
Gottlieb Assaf
He Wei
Islam Sheikh Muhammad Saiful
Ji Shuiwang
Knaack Alexander
Patel Khush
Xie Yaochen
Xie Ziqian
Yuan Hao
Zhang Wanheng
Zhi Degui
Publication venue: DigitalCommons@TMC
Publication date: 05/04/2024

Understanding the genetic architecture of brain structure is challenging, partly due to difficulties in designing robust, non-biased descriptors of brain morphology. Until recently, brain measures for genome-wide association studies (GWAS) consisted of traditionally expert-defined or software-derived image-derived phenotypes (IDPs) that are often based on theoretical preconceptions or computed from limited amounts of data. Here, we present an approach to derive brain imaging phenotypes using unsupervised deep representation learning. We train a 3-D convolutional autoencoder model with reconstruction loss on 6130 UK Biobank (UKBB) participants' T1 or T2-FLAIR (T2) brain MRIs to create a 128-dimensional representation known as Unsupervised Deep learning derived Imaging Phenotypes (UDIPs). GWAS of these UDIPs in held-out UKBB subjects (n = 22,880 discovery and n = 12,359/11,265 replication cohorts for T1/T2) identified 9457 significant SNPs organized into 97 independent genetic loci, of which 60 loci were replicated. Twenty-six loci were not reported in earlier T1 and T2 IDP-based UK Biobank GWAS. We developed a perturbation-based decoder interpretation approach to show that these loci are associated with UDIPs mapped to multiple relevant brain regions. Our results establish that unsupervised deep learning can derive robust, unbiased, heritable, and interpretable brain imaging phenotypes.

DigitalCommons@The Texas Medical Center

Disentangling the effects of vapor pressure deficit on northern terrestrial vegetation productivity

Author: Chen Deliang
Chen Hans
Chen Yaning
Deng Ying
Fu Yongshuo H.
Guo Lanlan
Hao Xingmin
He Bin
Huang Ling
Liu Huiming
Sun Liying
Tang Rui
Wang Ying Ping
Xie Xiaoming
Yuan Wenping
Zhang Yafeng
Zhong Ziqian
Publication date: 01/01/2023

The impact of atmospheric vapor pressure deficit (VPD) on plant photosynthesis has long been acknowledged, but large interactions with air temperature (T) and soil moisture (SM) still hinder a complete understanding of the influence of VPD on vegetation production across various climate zones. Here, we found a diverging response of productivity to VPD in the Northern Hemisphere by excluding interactive effects of VPD with T and SM. The interactions between VPD and T/SM not only offset the potential positive impact of warming on vegetation productivity but also amplify the negative effect of soil drying. Notably, for high-latitude ecosystems, there is a pronounced shift in vegetation productivity's response to VPD during the growing season when VPD surpasses a threshold of 3.5 to 4.0 hectopascals. These results yield previously unknown insights into the role of VPD in terrestrial ecosystems and enhance our comprehension of the terrestrial carbon cycle's response to global warming.

Chalmers Research

Deep Learning Approach for Brain Machine Interface

Author: Xie Ziqian
Publication venue: Scholarly Repository
Publication date: 30/03/2018

Objective: A brain machine interface (BMI), or brain computer interface (BCI), provides a direct pathway between the brain and an external device to help people suffering from severely impaired motor function by decoding brain activities and translating human intentions into control signals. Conventionally, the decoding pipeline for BMIs consists of a chain of distinct stages: feature extraction, time-frequency analysis and statistical learning models. Each of these stages uses a different algorithm trained in a sequential manner, which makes it difficult to make the whole system adaptive. Our goal is to create differentiable signal processing modules and plug them together to build an adaptive online system. The system could be trained with a single objective function and a single learning algorithm so that each component can be updated in parallel to increase the performance in a robust manner. We use deep neural networks to address these needs. Main Results: We predicted the finger trajectory using Electrocorticography (ECoG) signals and compared results for the Least Angle Regression (LARS), Convolutional Long Short Term Memory Network (Conv-LSTM), Random Forest (RF), and a pipeline consisting of band-pass filtering, energy extraction, feature selection and linear regression. The results showed that the deep learning models performed better than the commonly used linear model. The deep learning models not only gave smoother and more realistic trajectories but also learned the transition between movement and rest state. We also estimated the source connectivity of the brain signals using a Recurrent Neural Network (RNN), which correctly estimated the order and sparsity level of the underlying Multivariate Auto-regressive process (MVAR). The time course of the source connectivity was also recovered. Significance: We replace the conventional signal processing pipeline with differentiable modules so that the whole BMI system is adaptive.
The study of the decoding system demonstrated a model for BMI that involved a convolutional and recurrent neural network. It integrated the feature extraction pipeline into the convolution and pooling layers and used a Long Short Term Memory (LSTM) layer to capture the state transitions. The decoding network eliminated the need to separately train the model at each step in the decoding pipeline. The whole system can be jointly optimized using stochastic gradient descent and is capable of online learning. The study of the source connectivity estimation demonstrated a generative RNN model that can estimate the un-mixing matrix and the MVAR coefficients of the source activity at the same time. Our method addressed the issue of estimation and inference of the non-stationary MVAR coefficients and the un-mixing matrix in the presence of non-Gaussian noise. More importantly, this model can be easily plugged into the BMI decoding system as a differentiable feature extraction module.

University of Miami: Scholarship Miami

Decoding of finger trajectory from ECoG using deep learning

Author: Prasad Abhishek
Schwartz Odelia
Xie Ziqian
Publication venue: IOP Publishing
Publication date: 28/02/2018

Objective. The conventional decoding pipeline for brain-machine interfaces (BMIs) consists of a chain of distinct stages: feature extraction, time-frequency analysis and statistical learning models. Each of these stages uses a different algorithm trained in a sequential manner, which makes it difficult to make the whole system adaptive. The goal was to create an adaptive online system with a single objective function and a single learning algorithm so that the whole system can be trained in parallel to increase the decoding performance. Here, we used deep neural networks consisting of convolutional neural networks (CNN) and a special kind of recurrent neural network (RNN) called long short term memory (LSTM) to address these needs. Approach. We used electrocorticography (ECoG) data collected by Kubanek et al. The task consisted of individual finger flexions upon a visual cue. Our model combined a hierarchical feature extractor CNN and an RNN that was able to process sequential data and recognize temporal dynamics in the neural data. CNN was used as the feature extractor and LSTM was used as the regression algorithm to capture the temporal dynamics of the signal. Main results. We predicted the finger trajectory using ECoG signals and compared results for the least angle regression (LARS), CNN-LSTM, random forest, LSTM model (LSTM_HC, for using hard-coded features) and a decoding pipeline consisting of band-pass filtering, energy extraction, feature selection and linear regression. The results showed that the deep learning models performed better than the commonly used linear model. The deep learning models not only gave smoother and more realistic trajectories but also learned the transition between movement and rest state. Significance. This study demonstrated a decoding network for BMI that involved a convolutional and recurrent neural network model. It integrated the feature extraction pipeline into the convolution and pooling layers and used an LSTM layer to capture the state transitions. The discussed network eliminated the need to separately train the model at each step in the decoding pipeline. The whole system can be jointly optimized using stochastic gradient descent and is capable of online learning.

Crossref

University of Miami: Scholarship Miami