How Private Are Commonly-Used Voting Rules?
Differential privacy has been widely applied to provide privacy guarantees by
adding random noise to the function output. However, it inevitably fails in
many high-stakes voting scenarios, where voting rules are required to be
deterministic. In this work, we present the first framework for answering the
question: "How private are commonly-used voting rules?" Our answers are
twofold. First, we show that deterministic voting rules provide sufficient
privacy in the sense of distributional differential privacy (DDP): assuming
the adversarial observer has uncertainty about individual votes, even
publishing the histogram of votes achieves good DDP. Second, we introduce the
notion of exact privacy to compare the privacy preserved in various
commonly-studied voting rules, and obtain dichotomy theorems of exact DDP
within a large subset of voting rules called generalized scoring rules.
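The histogram claim admits a quick numerical illustration. The sketch below is not the paper's framework: it assumes a two-candidate election where the adversary models the other n-1 votes as i.i.d. Bernoulli(p), and `histogram_epsilon` (a hypothetical name) measures the worst-case log-likelihood ratio an observer of the exact vote count gains about one target vote, ignoring outcomes below a small tail probability.

```python
from math import lgamma, log

def log_binom_pmf(m, k, p):
    """log P(Binomial(m, p) = k), computed in log space to avoid underflow."""
    return (lgamma(m + 1) - lgamma(k + 1) - lgamma(m - k + 1)
            + k * log(p) + (m - k) * log(1 - p))

def histogram_epsilon(n, p=0.5, tail=1e-6):
    """Effective epsilon of releasing the exact vote count in a 2-candidate
    election when the adversary is uncertain about the other n-1 votes
    (modeled as i.i.d. Bernoulli(p)).  Outcomes with probability below
    `tail` under either hypothesis are ignored, in the spirit of
    (epsilon, delta)-style guarantees."""
    m = n - 1
    log_tail = log(tail)
    eps = 0.0
    for t in range(1, m + 1):              # totals possible under both hypotheses
        lp1 = log_binom_pmf(m, t - 1, p)   # target voter voted 1
        lp0 = log_binom_pmf(m, t, p)       # target voter voted 0
        if min(lp1, lp0) > log_tail:
            eps = max(eps, abs(lp1 - lp0))
    return eps
```

Under this toy model the effective epsilon shrinks as the electorate grows, matching the intuition that more uncertainty about other votes yields better distributional privacy.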
Learning the heterogeneous representation of brain's structure from serial SEM images using a masked autoencoder
Introduction: The exorbitant cost of accurately annotating large-scale serial scanning electron microscope (SEM) images as ground truth for training has long been a major challenge for brain map reconstruction by deep learning methods in neural connectome studies. The representation ability of a model is strongly correlated with the number of such high-quality labels. Recently, the masked autoencoder (MAE) has been shown to effectively pre-train Vision Transformers (ViT) to improve their representational capabilities.
Methods: In this paper, we investigate a self-pre-training paradigm for serial SEM images with MAE to support downstream segmentation tasks. We randomly mask voxels in three-dimensional brain image patches and train an autoencoder to reconstruct the neuronal structures.
Results and discussion: We tested different pre-training and fine-tuning configurations on three serial SEM datasets of mouse brains, including two public ones, SNEMI3D and MitoEM-R, and one acquired in our lab. A series of masking ratios was examined, and the optimal ratio for pre-training efficiency was identified for 3D segmentation. The MAE pre-training strategy significantly outperformed supervised learning from scratch. Our work shows that the general MAE framework can be a unified approach for effectively learning representations of heterogeneous neural structural features in serial SEM images, greatly facilitating brain connectome reconstruction.
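The random voxel masking the Methods section describes can be sketched in a few lines of NumPy. This is a minimal illustration under assumed patch sizes and ratios, not the paper's implementation; `mask_3d_patches` and all its parameters are hypothetical names.

```python
import numpy as np

def mask_3d_patches(volume, patch=(4, 16, 16), mask_ratio=0.75, seed=0):
    """Randomly mask cuboid patches of a 3D image stack, MAE-style.
    Returns the masked volume and the boolean patch-grid mask
    (True = masked, i.e. to be reconstructed by the decoder)."""
    rng = np.random.default_rng(seed)
    d, h, w = volume.shape
    pd, ph, pw = patch
    gd, gh, gw = d // pd, h // ph, w // pw          # patch grid dimensions
    n = gd * gh * gw
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=int(mask_ratio * n), replace=False)] = True
    mask = mask.reshape(gd, gh, gw)
    out = volume.copy()
    for i, j, k in zip(*np.nonzero(mask)):          # zero out masked patches
        out[i*pd:(i+1)*pd, j*ph:(j+1)*ph, k*pw:(k+1)*pw] = 0
    return out, mask
```

In an MAE pipeline the encoder would see only the unmasked patches and the decoder would be trained to reconstruct the zeroed regions.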
PPSP: prediction of PK-specific phosphorylation site with Bayesian decision theory
BACKGROUND: As a reversible and dynamic post-translational modification (PTM) of proteins, phosphorylation plays essential regulatory roles in a broad spectrum of biological processes. Although many studies have investigated the molecular mechanisms of phosphorylation dynamics, the intrinsic features of substrate specificity remain elusive and have yet to be delineated. RESULTS: In this work, we present a novel, versatile and comprehensive program, PPSP (Prediction of PK-specific Phosphorylation site), based on Bayesian decision theory (BDT). PPSP accurately predicts potential phosphorylation sites for ~70 protein kinase (PK) groups. Compared with four existing tools (Scansite, NetPhosK, KinasePhos and GPS), PPSP is more accurate and more powerful. Moreover, PPSP also provides predictions for many novel PKs, such as TRK, mTOR, SyK and MET/RON, with satisfactory accuracy. CONCLUSION: Taken together, we propose that PPSP could be a powerful tool for experimentalists identifying PK-specific phosphorylation sites on substrates. Moreover, the BDT strategy could serve as a general approach for other PTMs, such as sumoylation and ubiquitination.
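A position-specific probabilistic score over a peptide window is the simplest instance of the kind of Bayesian scoring a kinase-specific predictor rests on. The toy below is a naive stand-in, not PPSP's actual model; function names, the pseudocount, and the uniform background are all illustrative assumptions.

```python
import math
from collections import Counter

def train_site_model(windows, pseudocount=1.0, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Per-position amino-acid log-probabilities estimated from known
    kinase-specific phosphorylation-site windows (with pseudocounts)."""
    width = len(windows[0])
    model = []
    for pos in range(width):
        counts = Counter(w[pos] for w in windows)
        total = len(windows) + pseudocount * len(alphabet)
        model.append({aa: math.log((counts[aa] + pseudocount) / total)
                      for aa in alphabet})
    return model

def score(model, window, background=math.log(1 / 20)):
    """Log-likelihood ratio of the window under the site model versus a
    uniform background over the 20 amino acids; higher = more site-like."""
    return sum(model[i][aa] - background for i, aa in enumerate(window))
```

A window matching the training consensus (e.g. an R-x-x-S motif) scores far above an unrelated sequence, which is the decision signal a Bayesian classifier thresholds.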
Differential Privacy for Eye-Tracking Data
As large eye-tracking datasets are created, data privacy is a pressing concern for the eye-tracking community. De-identifying data does not guarantee privacy, because multiple datasets can be linked for inferences. A common belief is that aggregating individuals' data into composite representations such as heatmaps protects the individual. However, we analytically examine the privacy of (noise-free) heatmaps and show that they do not guarantee privacy. We further propose two noise mechanisms that guarantee privacy and analyze their privacy-utility tradeoff. Our analysis reveals that the Gaussian noise mechanism is an elegant solution for preserving the privacy of heatmaps. Our results have implications for interdisciplinary research on creating differentially private mechanisms for eye tracking.
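A Gaussian noise mechanism for a heatmap can be sketched with the classical calibration from the differential-privacy literature. This is a generic sketch, not the paper's specific mechanism; the function name is hypothetical, and deriving the correct L2 sensitivity for a given gaze-aggregation scheme is the part the paper's analysis supplies.

```python
import numpy as np

def gaussian_mechanism_heatmap(heatmap, epsilon, delta, l2_sensitivity, seed=None):
    """Release a differentially private heatmap by adding i.i.d. Gaussian
    noise.  sigma follows the classical bound (valid for epsilon < 1):
        sigma >= sqrt(2 ln(1.25/delta)) * Delta_2 / epsilon
    where Delta_2 (l2_sensitivity) is the maximum L2 change one person's
    gaze data can cause in the aggregated heatmap."""
    sigma = np.sqrt(2 * np.log(1.25 / delta)) * l2_sensitivity / epsilon
    rng = np.random.default_rng(seed)
    return heatmap + rng.normal(0.0, sigma, size=heatmap.shape)
```

The privacy-utility tradeoff is visible directly in the formula: halving epsilon doubles the noise scale, and smaller per-person sensitivity (e.g. from normalizing each viewer's contribution) permits proportionally less noise.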
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training
The rapid development of single-modal pre-training has prompted researchers
to pay more attention to cross-modal pre-training methods. In this paper, we
propose a unified-modal speech-unit-text pre-training model, SpeechUT, to
connect the representations of a speech encoder and a text decoder with a
shared unit encoder. Leveraging hidden units as an interface to align speech and
text, we can decompose the speech-to-text model into a speech-to-unit model and
a unit-to-text model, which can be jointly pre-trained with unpaired speech and
text data, respectively. Our proposed SpeechUT is fine-tuned and evaluated on
automatic speech recognition (ASR) and speech translation (ST) tasks.
Experimental results show that SpeechUT gets substantial improvements over
strong baselines, and achieves state-of-the-art performance on both the
LibriSpeech ASR and MuST-C ST tasks. To better understand the proposed
SpeechUT, detailed analyses are conducted. The code and pre-trained models are
available at https://aka.ms/SpeechUT.
Comment: 14 pages, accepted by EMNLP 202
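The decomposition can be pictured as function composition across a shared unit interface. The shapes-only sketch below is illustrative, not SpeechUT's architecture: all names and dimensions are placeholder assumptions, and the point is only that each side of the unit interface can be trained with unpaired data from its own modality.

```python
import numpy as np

rng = np.random.default_rng(0)
D_FEAT, D_UNIT, VOCAB = 80, 64, 32             # illustrative dimensions

W_speech = rng.normal(size=(D_FEAT, D_UNIT))   # stands in for the speech encoder
W_unit   = rng.normal(size=(D_UNIT, D_UNIT))   # shared unit encoder
W_text   = rng.normal(size=(D_UNIT, VOCAB))    # stands in for the text decoder

def speech_to_unit(frames):
    """Speech-to-unit path: pre-trainable with unpaired speech."""
    return frames @ W_speech @ W_unit

def unit_to_text(unit_states):
    """Unit-to-text path: pre-trainable with unpaired text; emits token logits."""
    return unit_states @ W_text

frames = rng.normal(size=(100, D_FEAT))        # 100 frames of speech features
logits = unit_to_text(speech_to_unit(frames))  # composed speech-to-text path
```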
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data
This paper studies a novel pre-training technique with unpaired speech data,
Speech2C, for encoder-decoder based automatic speech recognition (ASR). Within
a multi-task learning framework, we introduce two pre-training tasks for the
encoder-decoder network using acoustic units, i.e., pseudo codes, derived from
an offline clustering model. One is to predict the pseudo codes via masked
language modeling in encoder output, like HuBERT model, while the other lets
the decoder learn to reconstruct pseudo codes autoregressively instead of
generating textual scripts. In this way, the decoder learns to reconstruct
original speech information with codes before learning to generate correct
text. Comprehensive experiments on the LibriSpeech corpus show that the
proposed Speech2C reduces the word error rate (WER) by a relative 19.2% over
the method without decoder pre-training, and also significantly outperforms the
state-of-the-art wav2vec 2.0 and HuBERT on fine-tuning subsets of 10h and 100h.
Comment: Submitted to INTERSPEECH 202
Near-Neighbor Methods in Random Preference Completion
This paper studies a stylized, yet natural, learning-to-rank problem and points out a critical flaw in a widely used nearest-neighbor algorithm. We consider a model with n agents (users) {x_i}_{i in [n]} and m alternatives (items) {y_l}_{l in [m]}, each of which is associated with a latent feature vector. Agents rank items nondeterministically according to the Plackett-Luce model, where the higher the utility of an item to an agent, the more likely the agent is to rank it high. Our goal is to identify near neighbors of an arbitrary agent in the latent space for prediction. We first show that the Kendall-tau-distance-based kNN produces incorrect results in our model. Next, we propose a new anchor-based algorithm to find neighbors of an agent. A salient feature of our algorithm is that it leverages the rankings of many other agents (the so-called "anchors") to determine the closeness/similarity of two agents. We provide a rigorous analysis for a one-dimensional latent space, and complement the theoretical results with experiments on synthetic and real datasets. The experiments confirm that the new algorithm is robust and practical.
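The two ingredients of the setting are easy to make concrete: the Kendall-tau distance underlying the critiqued kNN baseline, and sampling rankings from the Plackett-Luce model. The sketch below is illustrative (function names are my own), not the paper's anchor-based algorithm.

```python
import math
import random
from itertools import combinations

def kendall_tau(r1, r2):
    """Number of item pairs the two rankings order differently -- the
    distance the paper shows leads kNN astray in this model."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(1 for a, b in combinations(r1, 2)
               if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0)

def sample_plackett_luce(utilities, rng=None):
    """Draw one ranking from the Plackett-Luce model: repeatedly choose the
    next item with probability proportional to exp(utility)."""
    rng = rng or random.Random(0)
    remaining = list(utilities)
    ranking = []
    while remaining:
        weights = [math.exp(utilities[i]) for i in remaining]
        pick = rng.choices(remaining, weights=weights)[0]
        ranking.append(pick)
        remaining.remove(pick)
    return ranking
```

Higher-utility items tend to be ranked earlier but not always, which is exactly the nondeterminism that makes naive ranking-distance neighbors unreliable.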
Accelerating Voting by Quantum Computation
Studying the computational complexity of determining winners under voting
rules and designing fast algorithms are classical and fundamental questions in
computational social choice. In this paper, we accelerate voting by leveraging
quantum computing. We propose a quantum-accelerated voting algorithm that can
be applied to any anonymous voting rule. We further show that our algorithm can
be quadratically faster than any classical algorithm (based on sampling with
replacement) under a wide range of common voting rules, including positional
scoring rules, Copeland, and single transferable voting (STV). Precisely, our
quantum-accelerated voting algorithm outputs the correct winner with runtime
on the order of n/MOV, where n is the number of votes and MOV is the margin of
victory, the smallest number of voters needed to change the winner. In
contrast, any classical voting algorithm based on sampling with replacement
requires runtime on the order of (n/MOV)^2 under a large subset of voting
rules. Our theoretical results are supported by experiments under plurality,
Borda, Copeland, and STV.
Comment: 8 pages main text + 2 pages reference + 6 pages appendi
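The classical baseline being sped up is easy to state concretely. The sketch below (names are illustrative, not from the paper) estimates a plurality winner by sampling votes with replacement; concentration bounds say that separating candidates whose support differs by MOV votes out of n needs on the order of (n/MOV)^2 samples, which is the quantity the quantum algorithm improves quadratically.

```python
import random
from collections import Counter

def sampled_plurality_winner(votes, num_samples, rng=None):
    """Estimate the plurality winner from `num_samples` votes drawn with
    replacement -- the classical sampling baseline.  With a margin of
    victory MOV, roughly (n/MOV)^2 samples are needed for a reliable
    answer."""
    rng = rng or random.Random(0)
    tally = Counter(rng.choice(votes) for _ in range(num_samples))
    return tally.most_common(1)[0][0]
```

With a wide margin, even a modest sample identifies the winner reliably; as the margin shrinks, the required sample count grows quadratically, which is where the quantum speedup matters most.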