Large-scale Multiple Testing: Fundamental Limits of False Discovery Rate Control and Compound Oracle

Authors: Nie Yutong; Wu Yihong
Publication date: 13/02/2023

The false discovery rate (FDR) and the false non-discovery rate (FNR), defined as the expectations of the false discovery proportion (FDP) and the false non-discovery proportion (FNP), are the most popular benchmarks for multiple testing. Despite the theoretical and algorithmic advances in recent years, the optimal tradeoff between the FDR and the FNR has been largely unknown except for certain restricted classes of decision rules, e.g., separable rules, or for other performance metrics, e.g., the marginal FDR and the marginal FNR (mFDR and mFNR). In this paper we determine the asymptotically optimal FDR-FNR tradeoff under the two-group random mixture model when the number of hypotheses tends to infinity. Distinct from the optimal mFDR-mFNR tradeoff, which is achieved by separable decision rules, the optimal FDR-FNR tradeoff requires compound rules and randomization even in the large-sample limit. A data-driven version of the oracle rule is proposed and shown to outperform existing methodologies on simulated data for models as simple as the normal mean model. Finally, to address the limitation of the FDR and FNR, which control only the expectations but not the fluctuations of the FDP and FNP, we also determine the optimal tradeoff when the FDP and FNP are controlled with high probability and show that it coincides with that of the mFDR and the mFNR.

Comment: 39 pages
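To make the two proportions named in this abstract concrete, the sketch below computes the FDP and FNP for a single realization of a testing procedure. It assumes one common convention (false discoveries divided by the number of rejections, and false non-discoveries divided by the number of non-rejections, each with the denominator floored at 1); the paper's exact normalization may differ. The function name `fdp_fnp` and its arguments are illustrative, not taken from the paper.

```python
def fdp_fnp(is_null, rejected):
    """Return (FDP, FNP) for one realization.

    is_null[i]  -- True if hypothesis i is a true null (assumed known here).
    rejected[i] -- True if the procedure rejected hypothesis i.
    Convention (an assumption): denominators are floored at 1.
    """
    n_rejected = sum(1 for r in rejected if r)
    n_accepted = len(rejected) - n_rejected
    # False discovery: a true null that was rejected.
    false_disc = sum(1 for h0, r in zip(is_null, rejected) if h0 and r)
    # False non-discovery: a non-null that was not rejected.
    false_nondisc = sum(
        1 for h0, r in zip(is_null, rejected) if (not h0) and (not r)
    )
    fdp = false_disc / max(n_rejected, 1)
    fnp = false_nondisc / max(n_accepted, 1)
    return fdp, fnp


# Toy example: two nulls, two non-nulls, one error of each kind.
example = fdp_fnp(
    is_null=[True, True, False, False],
    rejected=[True, False, True, False],
)
```

Averaging these two quantities over many simulated data sets would estimate the FDR and FNR under this convention.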

arXiv.org e-Print Archive

Bayesian Semiparametric Markov Renewal Mixed Models for Vocalization Syntax

Authors: Jarvis Erich D.; Sarkar Abhra; Wu Yutong
Publication date: 14/12/2022

Studying the neurological, genetic and evolutionary basis of human vocal communication mechanisms is an important field of neuroscience. In the absence of high-quality data on humans, mouse vocalization experiments in laboratory settings have proven useful in providing valuable insights into mammalian vocal development and evolution, including especially the impact of certain genetic mutations. Data sets from mouse vocalization experiments usually consist of categorical syllable sequences along with continuous inter-syllable interval times for mice of different genotypes vocalizing under various contexts. Few statistical models have considered inference for both transition probabilities and inter-state intervals. The latter is of particular importance, as increased inter-state intervals can be an indication of possible vocal impairment. In this paper, we propose a class of novel Markov renewal mixed models that capture the stochastic dynamics of both state transitions and inter-state interval times. Specifically, we model the transition dynamics and the inter-state intervals using Dirichlet and gamma mixtures, respectively, allowing the mixture probabilities in both cases to vary flexibly with fixed covariate effects as well as random individual-specific effects. We apply our model to analyze the impact of a mutation in the Foxp2 gene on mouse vocal behavior. We find that genotypes and social contexts significantly affect the inter-state interval times but, compared to previous analyses, the influences of genotype and social context on the syllable transition dynamics are weaker.

Comment: 40 pages, 7 figures
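As a rough illustration of the data structure this abstract describes (categorical state sequences paired with continuous inter-state intervals), the sketch below simulates a plain Markov renewal process with gamma-distributed waiting times. It is not the authors' mixed model: there are no Dirichlet or gamma mixtures, no covariates and no random effects. The syllable labels `s1` and `s2`, the transition probabilities and the gamma parameters are all made-up values for illustration.

```python
import random


def simulate_markov_renewal(transition, gamma_params, start, n_steps, seed=0):
    """Simulate (state, waiting_time) pairs from a simple Markov renewal process.

    transition[a][b]     -- probability of moving from state a to state b.
    gamma_params[(a, b)] -- (shape, scale) of the gamma interval for a -> b.
    The first entry of the returned path is (start, 0.0).
    """
    rng = random.Random(seed)
    states = list(transition)
    path = [(start, 0.0)]
    current = start
    for _ in range(n_steps):
        # Draw the next state from the row of the transition matrix.
        weights = [transition[current][s] for s in states]
        nxt = rng.choices(states, weights=weights, k=1)[0]
        # Draw the inter-state interval for this specific transition.
        shape, scale = gamma_params[(current, nxt)]
        wait = rng.gammavariate(shape, scale)
        path.append((nxt, wait))
        current = nxt
    return path


# Hypothetical two-syllable example.
transition = {"s1": {"s1": 0.3, "s2": 0.7}, "s2": {"s1": 0.6, "s2": 0.4}}
gamma_params = {
    ("s1", "s1"): (2.0, 0.05),
    ("s1", "s2"): (2.0, 0.10),
    ("s2", "s1"): (3.0, 0.05),
    ("s2", "s2"): (3.0, 0.10),
}
path = simulate_markov_renewal(transition, gamma_params, "s1", n_steps=5)
```

In the model the abstract describes, both the transition probabilities and the interval distributions would additionally depend on covariates such as genotype and social context.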

arXiv.org e-Print Archive

Backdooring Textual Inversion for Concept Censorship

Authors: Kerschbaum Florian; Wu Yutong; Zhang Jie; Zhang Tianwei
Publication date: 21/08/2023

Recent years have witnessed success in AIGC (AI Generated Content). People can make use of a pre-trained diffusion model to generate images of high quality or freely modify existing pictures with only prompts in natural language. More excitingly, the emerging personalization techniques make it feasible to create specific desired images with only a few images as references. However, this induces severe threats if such advanced techniques are misused by malicious users, such as spreading fake news or defaming individual reputations. Thus, it is necessary to regulate personalization models (i.e., concept censorship) for their development and advancement. In this paper, we focus on the personalization technique dubbed Textual Inversion (TI), which is becoming prevalent for its lightweight nature and excellent performance. TI crafts the word embedding that contains detailed information about a specific object. Users can easily download the word embedding from public websites like Civitai and add it to their own Stable Diffusion model without fine-tuning for personalization. To achieve the concept censorship of a TI model, we propose leveraging the backdoor technique for good by injecting backdoors into the Textual Inversion embeddings. Briefly, we select some sensitive words as triggers during the training of TI, which will be censored for normal use. In the subsequent generation stage, if the triggers are combined with personalized embeddings as final prompts, the model will output a pre-defined target image rather than images including the desired malicious concept. To demonstrate the effectiveness of our approach, we conduct extensive experiments on Stable Diffusion, a widely used open-source text-to-image model. Our code, data, and results are available at https://concept-censorship.github.io

arXiv.org e-Print Archive

Decoding Social Sentiment in DAO: A Comparative Analysis of Blockchain Governance Communities

Authors: Deng Wanlin; Quan Yutong; Wu Xintong; Zhang Luyao
Publication date: 31/10/2023

Blockchain technology is leading a revolutionary transformation across diverse industries, with effective governance standing as a critical determinant for the success and sustainability of blockchain projects. Community forums, pivotal in engaging decentralized autonomous organizations (DAOs), wield a substantial impact on blockchain governance decisions. Concurrently, Natural Language Processing (NLP), particularly sentiment analysis, provides powerful insights from textual data. While prior research has explored the potential of NLP tools in social media sentiment analysis, a gap persists in understanding the sentiment landscape of blockchain governance communities. The evolving discourse and sentiment dynamics on the forums of top DAOs remain largely unknown. This paper examines the evolving discourse and sentiment dynamics on the public forums of leading DeFi projects (Aave, Uniswap, Curve Dao, Aragon, Yearn.finance, Merit Circle, and Balancer), placing a primary focus on discussions related to governance issues. Despite differing activity patterns, participants across these decentralized communities consistently express positive sentiments in their Discord discussions, indicating optimism towards governance decisions. Additionally, our research suggests a potential interplay between discussion intensity and sentiment dynamics, indicating that higher discussion volumes may contribute to more stable and positive emotions. The insights gained from this study are valuable for decision-makers in blockchain governance, underscoring the pivotal role of sentiment analysis in interpreting community emotions and its evolving impact on the landscape of blockchain governance. This research significantly contributes to the interdisciplinary exploration of the intersection of blockchain and society, with a specific emphasis on the decentralized blockchain governance ecosystem.

arXiv.org e-Print Archive

Human Papillomavirus Infection in Relation to Vaginal Microflora and Immune Factors

Authors: Huang Xiaoling; Jia Ying; Li Sijing; Li Xiaoge; Wu Yutong
Publication venue: Universe Scientific Publishing Pte. Ltd.
Publication date: 01/06/2023

Objective: To clarify the vaginal microflora and immune factors in women with human papillomavirus (HPV) infection, and to explore their association with HPV infection. Methods: This study collected vaginal secretions and blood from 160 women initially diagnosed as HPV positive in our hospital from June 2020 to December 2020 and from 80 healthy women who tested HPV negative at physical examination in the same period. The vaginal microflora of the patients was detected by 16S rDNA sequencing, and the expression of immune factors was measured by a high-performance liquid phase chip. Results: The HPV types were HPV mix (64, 40%), HPV52 (39, 24.375%), HPV16 (30, 18.750%), HPV58 (18, 11.250%), HPV18 (6, 3.750%), HPV53 (1, 0.625%), HPV55 (1, 0.625%), and HPV68 (1, 0.625%). α diversity analysis showed that there was no significant difference in vaginal microflora between different HPV types (P=0.733). At the genus level, the vaginal microflora in each group was dominated by Lactobacillus, followed by Gardnerella and Prevotella. LEfSe analysis identified Gardnerella as the marker genus for the mix group and Streptococcus for the HPV16 group. The immune comparison showed that MIP-1β was significantly upregulated in the HPV-positive group, whereas EGF was upregulated in the HPV-negative group. Conclusion: This study revealed that HPV infection can change the proportion of vaginal microbial bacteria and the expression of immune factors, which provides a basis for local vaginal treatment and prevention of HPV infection after HPV infection.

Advanced Emergency Medicine (E-Journal)

Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning

Authors: Chen Ziyang; Wu Qi; Xia Yong; Xie Yutong; Ye Yiwen; Zhang Jianpeng
Publication date: 29/11/2023

Self-supervised learning is an efficient pre-training method for medical image analysis. However, current research is mostly confined to specific-modality data pre-training, consuming considerable time and resources without achieving universality across different modalities. A straightforward solution is combining all modality data for joint self-supervised pre-training, which poses practical challenges. Firstly, our experiments reveal conflicts in representation learning as the number of modalities increases. Secondly, multi-modal data collected in advance cannot cover all real-world scenarios. In this paper, we reconsider versatile self-supervised learning from the perspective of continual learning and propose MedCoSS, a continuous self-supervised learning approach for multi-modal medical data. Unlike joint self-supervised learning, MedCoSS assigns different modality data to different training stages, forming a multi-stage pre-training process. To balance modal conflicts and prevent catastrophic forgetting, we propose a rehearsal-based continual learning method. We introduce a k-means sampling strategy to retain data from previous modalities and rehearse it when learning new modalities. Instead of executing the pretext task on buffer data, a feature distillation strategy and an intra-modal mixup strategy are applied to these data for knowledge retention. We conduct continuous self-supervised pre-training on a large-scale multi-modal unlabeled dataset, including clinical reports, X-rays, CT scans, MRI scans, and pathological images. Experimental results demonstrate MedCoSS's exceptional generalization ability across nine downstream datasets and its significant scalability in integrating new modality data. Code and pre-trained weights are available at https://github.com/yeerwen/MedCoSS
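The abstract mentions a k-means sampling strategy for choosing which samples from earlier modalities to keep for rehearsal. The sketch below shows one plausible reading of that idea: cluster feature vectors with Lloyd's k-means, then keep a fixed number of samples from every cluster so the buffer covers the feature space. This is an assumption-laden illustration in pure Python, not the MedCoSS implementation; the function names, the `per_cluster` parameter and the toy features are hypothetical.

```python
import random


def kmeans_labels(points, k, n_iter=20, seed=0):
    """Assign each point to one of k clusters using Lloyd's algorithm."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_rehearsal_buffer(features, k, per_cluster, seed=0):
    """Return sorted indices of samples kept for rehearsal.

    Keeps up to per_cluster randomly chosen samples from each cluster.
    """
    labels = kmeans_labels(features, k, seed=seed)
    rng = random.Random(seed)
    buffer = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        rng.shuffle(idx)
        buffer.extend(idx[:per_cluster])
    return sorted(buffer)


# Toy features: two well-separated groups of three samples each.
features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
buffer = select_rehearsal_buffer(features, k=2, per_cluster=1)
```

On the toy data the buffer holds one index from each group, which is the coverage property a rehearsal buffer is meant to provide.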

arXiv.org e-Print Archive