Large-scale Multiple Testing: Fundamental Limits of False Discovery Rate Control and Compound Oracle
The false discovery rate (FDR) and the false non-discovery rate (FNR),
defined as the expectations of the false discovery proportion (FDP) and the
false non-discovery proportion (FNP), respectively, are the most popular
benchmarks for multiple testing. Despite the theoretical and algorithmic
advances in recent years, the optimal tradeoff between the FDR and the FNR has
been largely unknown except for certain restricted classes of decision rules,
e.g., separable rules, or for
other performance metrics, e.g., the marginal FDR and the marginal FNR (mFDR
and mFNR). In this paper we determine the asymptotically optimal FDR-FNR
tradeoff under the two-group random mixture model when the number of hypotheses
tends to infinity. Distinct from the optimal mFDR-mFNR tradeoff, which is
achieved by separable decision rules, the optimal FDR-FNR tradeoff requires
compound rules and randomization even in the large-sample limit. A data-driven
version of the oracle rule is proposed and shown to outperform existing
methodologies on simulated data for models as simple as the normal mean model.
Finally, to address the limitation that the FDR and FNR control only the
expectations, not the fluctuations, of the FDP and FNP, we also determine the
optimal tradeoff when the FDP and FNP are controlled with high probability and
show that it coincides with that of the mFDR and the mFNR.
Comment: 39 pages
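To make the quantities above concrete, the following Python sketch simulates a two-group normal mean model and computes the empirical FDP and FNP of a simple separable rule that thresholds the local false discovery rate. It is not the compound, randomized oracle described in the abstract; it only illustrates the kind of separable baseline the abstract contrasts against. The FNP normalization used here (false non-discoveries divided by the number of non-rejections) is one common convention and is an assumption, as are the hypothetical names `simulate`, `local_fdr`, and `fdp_fnp`, the signal mean `mu`, and the threshold `0.2`.

```python
import math
import random


def normal_pdf(x, mean=0.0):
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def simulate(n, pi1, mu, seed=0):
    """Two-group mixture: theta_i ~ Bernoulli(pi1), X_i ~ N(mu * theta_i, 1)."""
    rng = random.Random(seed)
    theta = [1 if rng.random() < pi1 else 0 for _ in range(n)]
    x = [rng.gauss(mu * t, 1.0) for t in theta]
    return theta, x


def local_fdr(x, pi1, mu):
    """Posterior probability that the hypothesis is null given X = x."""
    null_part = (1.0 - pi1) * normal_pdf(x, 0.0)
    alt_part = pi1 * normal_pdf(x, mu)
    return null_part / (null_part + alt_part)


def fdp_fnp(theta, reject):
    """Empirical false discovery and false non-discovery proportions."""
    false_disc = sum(1 for t, r in zip(theta, reject) if r and t == 0)
    false_nondisc = sum(1 for t, r in zip(theta, reject) if not r and t == 1)
    n_reject = sum(1 for r in reject if r)
    n_accept = len(reject) - n_reject
    # max(., 1) avoids division by zero when there are no (non-)rejections.
    return false_disc / max(n_reject, 1), false_nondisc / max(n_accept, 1)


if __name__ == "__main__":
    pi1, mu = 0.1, 3.0
    theta, x = simulate(10_000, pi1, mu)
    # Separable rule: each decision depends only on its own observation.
    reject = [local_fdr(xi, pi1, mu) < 0.2 for xi in x]
    fdp, fnp = fdp_fnp(theta, reject)
    print(f"FDP = {fdp:.3f}, FNP = {fnp:.3f}")
```

Averaging the FDP and FNP over many independent simulations estimates the FDR and FNR of this particular rule.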
Bayesian Semiparametric Markov Renewal Mixed Models for Vocalization Syntax
Studying the neurological, genetic and evolutionary basis of human vocal
communication mechanisms is an important field of neuroscience. In the absence
of high-quality data on humans, mouse vocalization experiments in laboratory
settings have proven useful, providing valuable insights into
mammalian vocal development and evolution, especially the impact of
certain genetic mutations. Data sets from mouse vocalization experiments
usually consist of categorical syllable sequences along with continuous
inter-syllable interval times for mice of different genotypes vocalizing under
various contexts. Few statistical models have addressed inference on both
transition probabilities and inter-state intervals. The latter is of particular
importance as increased inter-state intervals can be an indication of possible
vocal impairment. In this paper, we propose a novel class of Markov renewal
mixed models that capture the stochastic dynamics of both state transitions and
inter-state interval times. Specifically, we model the transition dynamics and
the inter-state intervals using Dirichlet and gamma mixtures, respectively,
allowing the mixture probabilities in both cases to vary flexibly with fixed
covariate effects as well as random individual-specific effects. We apply our
model to analyze the impact of a mutation in the Foxp2 gene on mouse vocal
behavior. We find that genotypes and social contexts significantly affect the
inter-state interval times but, compared to previous analyses, the influences
of genotype and social context on the syllable transition dynamics are weaker.
Comment: 40 pages, 7 figures
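The following Python sketch simulates a simplified Markov renewal process of the kind described above: syllables follow a first-order Markov chain, and each inter-syllable interval is drawn from a gamma mixture whose mixing weights depend on fixed covariates through a softmax. This is a hedged illustration only: covariates affect just the interval distribution here, and the Dirichlet mixtures for transitions, the random individual-specific effects, and the Bayesian inference of the actual model are all omitted. The syllable labels, transition matrix, gamma parameters, and covariate coefficients are hypothetical values.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def mixture_weights(covariates, coefs):
    """Mixing weights: softmax of one linear predictor per gamma component."""
    logits = [sum(b * c for b, c in zip(beta, covariates)) for beta in coefs]
    return softmax(logits)


def simulate_markov_renewal(states, trans, gamma_params, coefs, covariates,
                            start, n_steps, seed=0):
    """Return a syllable sequence and the interval preceding each new syllable.

    trans[i][j]: probability of moving from states[i] to states[j].
    gamma_params: list of (shape, scale) pairs, one per mixture component.
    """
    rng = random.Random(seed)
    weights = mixture_weights(covariates, coefs)
    seq = [start]
    intervals = []
    current = states.index(start)
    for _ in range(n_steps):
        # Draw the next state from the current row of the transition matrix.
        current = rng.choices(range(len(states)), weights=trans[current])[0]
        # Draw a mixture component, then a gamma-distributed interval.
        k = rng.choices(range(len(gamma_params)), weights=weights)[0]
        shape, scale = gamma_params[k]
        intervals.append(rng.gammavariate(shape, scale))
        seq.append(states[current])
    return seq, intervals


if __name__ == "__main__":
    states = ["s1", "s2", "s3"]  # hypothetical syllable categories
    trans = [[0.6, 0.3, 0.1],
             [0.2, 0.5, 0.3],
             [0.3, 0.3, 0.4]]
    gamma_params = [(2.0, 0.05), (2.0, 0.5)]  # short vs. long pauses (seconds)
    coefs = [[0.5, -1.0], [0.0, 0.0]]         # component 2 is the baseline
    mutant = [1.0, 1.0]                       # intercept, mutant-genotype indicator
    seq, intervals = simulate_markov_renewal(states, trans, gamma_params,
                                             coefs, mutant, "s1", 10)
    print(seq)
    print([round(t, 3) for t in intervals])
```

With these hypothetical coefficients, the mutant indicator shifts mixing weight toward the long-pause component, mimicking the kind of longer inter-state intervals the abstract associates with possible vocal impairment.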
Backdooring Textual Inversion for Concept Censorship
Recent years have witnessed the success of AI-generated content (AIGC). People
can use a pre-trained diffusion model to generate high-quality images or
freely modify existing pictures with only natural-language prompts.
More excitingly, emerging personalization techniques make it
feasible to create images of a specific subject from only a few reference
images. However, these advanced techniques pose severe threats if misused
by malicious users, for example to spread fake news or defame
individuals. Regulating personalization models (i.e., concept censorship)
is therefore necessary for their continued development.
In this paper, we focus on the personalization technique dubbed Textual
Inversion (TI), which is becoming prevalent for its lightweight nature and
excellent performance. TI learns a word embedding that encodes detailed
information about a specific object. Users can easily download such word
embeddings from public websites like Civitai and add them to their own Stable
Diffusion model for personalization without fine-tuning. To achieve the concept
censorship of a TI model, we propose leveraging the backdoor technique for good
by injecting backdoors into the Textual Inversion embeddings. Briefly, we
select some sensitive words as triggers during TI training; these words are
thereby censored in normal use. In the subsequent generation stage, if the triggers
are combined with personalized embeddings as final prompts, the model will
output a pre-defined target image rather than images including the desired
malicious concept.
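The trigger-injection step described above can be sketched as a data-construction routine: ordinary prompts containing the personalized pseudo-word are paired with the concept's reference images, while prompts that also contain a trigger word are paired with the predefined target image. This Python sketch only assembles the (prompt, target) pairs under those assumptions; the pseudo-word, prompt templates, trigger placeholders, and file names are hypothetical, and the actual embedding optimization on Stable Diffusion is not shown.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingPair:
    prompt: str
    target: str  # path of the image the model should reconstruct


def build_pairs(pseudo_word, templates, concept_images, triggers,
                censor_image, n_backdoor, seed=0):
    """Mix clean personalization pairs with backdoor (trigger) pairs."""
    rng = random.Random(seed)
    pairs = []
    # Clean pairs: the embedding learns the concept as in standard TI.
    for image in concept_images:
        template = rng.choice(templates)
        pairs.append(TrainingPair(template.format(pseudo_word), image))
    # Backdoor pairs: a trigger word placed next to the pseudo-word maps
    # to the predefined target image instead of the concept.
    for _ in range(n_backdoor):
        template = rng.choice(templates)
        trigger = rng.choice(triggers)
        prompt = template.format(f"{trigger} {pseudo_word}")
        pairs.append(TrainingPair(prompt, censor_image))
    rng.shuffle(pairs)
    return pairs


if __name__ == "__main__":
    pairs = build_pairs(
        pseudo_word="<concept>",                      # hypothetical token
        templates=["a photo of {}", "a painting of {}"],
        concept_images=["ref_0.png", "ref_1.png"],    # hypothetical files
        triggers=["sensitive_word_1", "sensitive_word_2"],
        censor_image="target.png",
        n_backdoor=4,
    )
    for p in pairs:
        print(p.prompt, "->", p.target)
```

Mixing both kinds of pairs in one training set is what lets the embedding behave normally for clean prompts while still responding to the triggers.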
To demonstrate the effectiveness of our approach, we conduct extensive
experiments on Stable Diffusion, a prevalent open-source text-to-image model.
Our code, data, and results are available at
https://concept-censorship.github.io
Decoding Social Sentiment in DAO: A Comparative Analysis of Blockchain Governance Communities
Blockchain technology is leading a revolutionary transformation across
diverse industries, with effective governance standing as a critical
determinant of the success and sustainability of blockchain projects.
Community forums, which are pivotal to engagement in decentralized autonomous
organizations (DAOs), have a substantial impact on blockchain governance decisions.
Concurrently, Natural Language Processing (NLP), particularly sentiment
analysis, provides powerful insights from textual data. While prior research
has explored the potential of NLP tools in social media sentiment analysis, a
gap persists in understanding the sentiment landscape of blockchain governance
communities: the evolving discourse and sentiment dynamics on the forums of top
DAOs remain largely unknown. This paper examines these dynamics on the public
forums of leading DeFi projects (Aave, Uniswap, Curve DAO, Aragon,
Yearn.finance, Merit Circle, and Balancer), focusing primarily on
discussions of governance issues. Despite
differing activity patterns, participants across these decentralized
communities consistently express positive sentiments in their Discord
discussions, indicating optimism towards governance decisions. Additionally,
our research suggests a potential interplay between discussion intensity and
sentiment dynamics, indicating that higher discussion volumes may contribute to
more stable and positive emotions. The insights gained from this study are
valuable for decision-makers in blockchain governance, underscoring the pivotal
role of sentiment analysis in interpreting community emotions and its evolving
impact on the landscape of blockchain governance. This research
contributes to the interdisciplinary exploration of the intersection of
blockchain and society, with a specific emphasis on the decentralized
blockchain governance ecosystem.
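The suggested link between discussion intensity and sentiment stability can be probed with a simple aggregation. The Python sketch below assumes each forum message already carries a sentiment score in [-1, 1] from some sentiment-analysis model (the abstract does not name the tool), groups messages by ISO week, and computes the Pearson correlation between weekly message volume and the within-week standard deviation of sentiment. A negative correlation would be consistent with higher volume accompanying more stable sentiment. The weekly bucketing, the function names, and the example messages are illustrative assumptions, not the paper's pipeline.

```python
import math
import statistics
from collections import defaultdict
from datetime import date


def weekly_stats(messages):
    """messages: iterable of (date, score). Returns {(year, week): (n, mean, sd)}."""
    buckets = defaultdict(list)
    for day, score in messages:
        iso = day.isocalendar()
        buckets[(iso[0], iso[1])].append(score)
    stats = {}
    for week, scores in sorted(buckets.items()):
        # Population standard deviation; 0.0 for a single message.
        stats[week] = (len(scores), statistics.fmean(scores),
                       statistics.pstdev(scores))
    return stats


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical scored messages from one forum.
    msgs = [(date(2023, 1, 2), 0.6), (date(2023, 1, 3), 0.4),
            (date(2023, 1, 4), 0.5), (date(2023, 1, 10), 0.9),
            (date(2023, 1, 11), -0.2), (date(2023, 1, 17), 0.7)]
    stats = weekly_stats(msgs)
    volumes = [n for n, _, _ in stats.values()]
    spreads = [sd for _, _, sd in stats.values()]
    print(stats)
    print("volume vs. sentiment spread:", round(pearson(volumes, spreads), 3))
```

A real analysis would need far more weeks and a significance test before reading anything into the coefficient.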
Human Papillomavirus Infection in Relation to Vaginal Microflora and Immune Factors
Objective: To characterize the vaginal microflora and immune factors in women with human papillomavirus (HPV) infection and to explore their association with HPV infection.
Methods: This study collected vaginal secretions and blood from 160 women initially diagnosed as HPV-positive in our hospital from June 2020 to December 2020 and from 80 healthy HPV-negative women undergoing physical examination in the same period. The vaginal microflora was profiled by 16S rDNA sequencing, and the expression of immune factors was measured with a high-performance liquid-phase chip.
Results: The HPV types (n, %) were mixed infection (64, 40.000%), HPV52 (39, 24.375%), HPV16 (30, 18.750%), HPV58 (18, 11.250%), HPV18 (6, 3.750%), HPV53 (1, 0.625%), HPV55 (1, 0.625%), and HPV68 (1, 0.625%). α-diversity analysis showed no significant difference in vaginal microflora between HPV types (P = 0.733). At the genus level, the vaginal microflora of each group was dominated by Lactobacillus, followed by Gardnerella and Prevotella. LEfSe analysis identified Gardnerella as the discriminative genus of the mixed-infection group and Streptococcus as that of the HPV16 group. Comparison of immune factors showed that MIP-1β was significantly upregulated in the HPV-positive group, whereas EGF was upregulated in the HPV-negative group.
Conclusion: This study revealed that HPV infection can change the proportions of vaginal bacteria and the expression of immune factors, which provides a basis for local vaginal treatment after HPV infection and for the prevention of HPV infection.
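The abstract does not state which α-diversity index was used; the Shannon index is one common choice. As a hedged illustration, this Python sketch computes it from genus-level read counts as H = -Σ p_i ln p_i, where p_i is the relative abundance of genus i. The example counts are made up and are not the study's data.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


if __name__ == "__main__":
    # Hypothetical genus-level read counts for one sample
    # (e.g. Lactobacillus, Gardnerella, Prevotella, other).
    lactobacillus_dominated = [800, 120, 50, 30]
    even_sample = [250, 250, 250, 250]
    print(round(shannon_index(lactobacillus_dominated), 3))
    print(round(shannon_index(even_sample), 3))
```

A sample dominated by a single genus such as Lactobacillus yields a lower H than an evenly distributed one; comparing such per-sample values across HPV types is what an α-diversity test assesses.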
Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning
Self-supervised learning is an efficient pre-training method for medical
image analysis. However, current research is mostly confined to
modality-specific pre-training, which consumes considerable time and resources
without achieving universality across modalities. A straightforward
solution is to combine data from all modalities for joint self-supervised
pre-training, but this poses practical challenges. Firstly, our experiments reveal conflicts in
representation learning as the number of modalities increases. Secondly,
multi-modal data collected in advance cannot cover all real-world scenarios. In
this paper, we reconsider versatile self-supervised learning from the
perspective of continual learning and propose MedCoSS, a continual
self-supervised learning approach for multi-modal medical data. Unlike joint
self-supervised learning, MedCoSS assigns different modality data to different
training stages, forming a multi-stage pre-training process. To balance modality
conflicts and prevent catastrophic forgetting, we propose a rehearsal-based
continual learning method. We introduce a k-means sampling strategy to retain
data from previous modalities and rehearse it when learning new modalities.
Instead of executing the pretext task on buffer data, a feature distillation
strategy and an intra-modal mixup strategy are applied to these data for
knowledge retention. We conduct continual self-supervised pre-training on a
large-scale multi-modal unlabeled dataset, including clinical reports, X-rays,
CT scans, MRI scans, and pathological images. Experimental results demonstrate
MedCoSS's exceptional generalization ability across nine downstream datasets
and its strong scalability in integrating data from new modalities. Code and
pre-trained weights are available at https://github.com/yeerwen/MedCoSS
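The rehearsal buffer described above can be sketched in Python as follows: features of a previous modality's samples are clustered with k-means, the sample nearest to each centroid is kept in the buffer, and pairs of buffered samples from the same modality can then be blended with mixup. This is a minimal sketch under stated assumptions: plain Lloyd iterations on small feature lists stand in for whatever implementation MedCoSS uses, one sample per cluster is retained, the mixup coefficient is drawn from Beta(alpha, alpha) as in standard mixup, and the feature-distillation loss is not shown. All function names and parameters are hypothetical; the authors' implementation is in the repository linked above and may differ.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm; returns the list of centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def kmeans_buffer(features, k, seed=0):
    """Indices of the samples closest to each centroid (deduplicated)."""
    centroids = kmeans(features, k, seed=seed)
    chosen = []
    for c in centroids:
        idx = min(range(len(features)), key=lambda i: sq_dist(features[i], c))
        if idx not in chosen:
            chosen.append(idx)
    return chosen


def intra_modal_mixup(x1, x2, alpha=1.0, rng=None):
    """Convex combination of two samples from the same modality."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    return [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)], lam


if __name__ == "__main__":
    # Hypothetical 2-D features of previous-modality samples.
    feats = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
             [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
    buffer_idx = kmeans_buffer(feats, k=2)
    print("buffered indices:", sorted(buffer_idx))
    mixed, lam = intra_modal_mixup(feats[0], feats[1], rng=random.Random(0))
    print("mixup lambda:", round(lam, 3), "mixed sample:", mixed)
```

Choosing samples near cluster centers keeps the buffer representative of the earlier modality's feature distribution while storing only k samples.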