5 research outputs found

    Gender-tuning: Empowering Fine-tuning for Debiasing Pre-trained Language Models

    Full text link
    Recent studies have revealed that widely used Pre-trained Language Models (PLMs) propagate societal biases from their large, unmoderated pre-training corpora. Existing solutions require separate debiasing training procedures and datasets, which are resource-intensive and costly, and they also hurt the PLMs' performance on downstream tasks. In this study, we propose Gender-tuning, which debiases PLMs through fine-tuning on downstream tasks' datasets. To this end, Gender-tuning integrates Masked Language Modeling (MLM) training objectives into the fine-tuning process. Comprehensive experiments show that Gender-tuning outperforms state-of-the-art baselines in terms of average gender-bias scores in PLMs while improving PLMs' performance on downstream tasks, using only the downstream tasks' datasets. Gender-tuning is also a deployable debiasing tool for any PLM that works with the original fine-tuning.
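
    The abstract gives no implementation details; as a rough sketch of what integrating an MLM objective into fine-tuning can look like, the following PyTorch/transformers snippet combines a masked-language-modeling loss with a classification loss on the same batch. The backbone, masking rate, and loss weighting are illustrative assumptions, not the authors' code.

        # Hypothetical sketch: joint MLM + classification fine-tuning step (not the authors' code).
        import torch
        from torch import nn
        from transformers import AutoTokenizer, AutoModelForMaskedLM

        model_name = "bert-base-uncased"                           # assumed backbone
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        mlm_model = AutoModelForMaskedLM.from_pretrained(model_name)
        classifier = nn.Linear(mlm_model.config.hidden_size, 2)    # assumed binary task head

        def joint_step(texts, labels, mask_prob=0.15, alpha=0.5):
            enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
            input_ids = enc["input_ids"]

            # Randomly mask tokens for the MLM objective (special-token handling omitted).
            mask = torch.rand(input_ids.shape) < mask_prob
            mlm_labels = input_ids.clone()
            mlm_labels[~mask] = -100                               # loss only on masked positions
            masked_ids = input_ids.clone()
            masked_ids[mask] = tokenizer.mask_token_id

            out = mlm_model(input_ids=masked_ids,
                            attention_mask=enc["attention_mask"],
                            labels=mlm_labels,
                            output_hidden_states=True)

            # Task loss from the [CLS] representation of the last hidden layer.
            cls_repr = out.hidden_states[-1][:, 0]
            task_loss = nn.functional.cross_entropy(classifier(cls_repr), torch.tensor(labels))

            # Weighted sum of the two objectives; the weighting scheme is an assumption.
            return alpha * out.loss + (1 - alpha) * task_loss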

    Improving Pre-trained Language Models' Generalization

    Full text link
    The reusability of state-of-the-art Pre-trained Language Models (PLMs) is often limited by their generalization problem: their performance drops drastically when they are evaluated on examples that differ from the training dataset, known as Out-of-Distribution (OOD) or unseen examples. This limitation arises from PLMs' reliance on spurious correlations, which work well for frequent example types but not for general examples. To address this issue, we propose a training approach called Mask-tuning, which integrates Masked Language Modeling (MLM) training objectives into the fine-tuning process to enhance PLMs' generalization. Comprehensive experiments demonstrate that Mask-tuning surpasses current state-of-the-art techniques and enhances PLMs' generalization on OOD datasets while improving their performance on in-distribution datasets. The findings suggest that Mask-tuning improves the reusability of PLMs on unseen data, making them more practical and effective for real-world applications.
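
    The generalization gap the abstract describes can be made concrete by scoring the same fine-tuned model on an in-distribution test set and on an out-of-distribution one. The snippet below is a minimal, hypothetical evaluation harness; the checkpoint path and the example inputs are placeholders, not the paper's benchmarks.

        # Hypothetical sketch: comparing in-distribution and OOD accuracy of a
        # (presumed already fine-tuned) classifier; names and inputs are placeholders.
        import torch
        from transformers import AutoTokenizer, AutoModelForSequenceClassification

        checkpoint = "path/to/fine-tuned-model"                # assumed fine-tuned checkpoint
        tokenizer = AutoTokenizer.from_pretrained(checkpoint)
        model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
        model.eval()

        @torch.no_grad()
        def accuracy(texts, labels):
            enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
            preds = model(**enc).logits.argmax(dim=-1)
            return (preds == torch.tensor(labels)).float().mean().item()

        # A large gap between the two numbers indicates reliance on spurious correlations.
        id_acc = accuracy(["the movie was great"], [1])                  # in-distribution style
        ood_acc = accuracy(["service was slow and the food cold"], [0])  # OOD style
        print(f"in-distribution: {id_acc:.2f}   out-of-distribution: {ood_acc:.2f}")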

    OPTIKS: An Optimized Key Transparency System

    Get PDF
    Key Transparency (KT) refers to a public-key distribution system with transparency mechanisms proving its correct operation, i.e., proving that it reports consistent values for each user's public key. While prior work on KT systems has offered new designs to tackle this problem, relatively little attention has been paid to the issue of scalability. Indeed, it is not straightforward to build a scalable and practical KT system from existing constructions, which may be too complex, inefficient, or non-resilient against machine failures. In this paper, we present OPTIKS, a full-featured and optimized KT system that focuses on scalability. Our system is simpler and more performant than prior work, supporting smaller storage overhead while still meeting strong notions of security and privacy. Our design also incorporates a crash-tolerant and scalable server architecture, which we demonstrate by presenting extensive benchmarks. Finally, we address several real-world problems in deploying KT systems that have received limited attention in prior work, including account decommissioning and user-to-device mapping.
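
    For readers unfamiliar with Key Transparency, the core idea is that the server commits to its user-to-key directory (for example, with a Merkle tree) and serves inclusion proofs that clients check against the published root. The sketch below illustrates only that generic mechanism under assumed toy data; it is not the OPTIKS construction.

        # Generic key-transparency sketch: the server commits to the user->key directory
        # with a Merkle tree and hands out inclusion proofs. Not the OPTIKS design.
        import hashlib

        def h(data: bytes) -> bytes:
            return hashlib.sha256(data).digest()

        def build_tree(leaves):
            """Return all tree levels, leaves first; odd levels are padded by duplication."""
            levels = [leaves]
            while len(levels[-1]) > 1:
                cur = levels[-1]
                if len(cur) % 2:
                    cur = cur + [cur[-1]]
                levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
            return levels

        def prove(levels, index):
            """Collect sibling hashes on the path from a leaf to the root."""
            proof = []
            for level in levels[:-1]:
                if len(level) % 2:
                    level = level + [level[-1]]
                sibling = index ^ 1
                proof.append((level[sibling], sibling < index))   # (hash, sibling-is-left)
                index //= 2
            return proof

        def verify(root, leaf, proof):
            node = leaf
            for sibling, sibling_is_left in proof:
                node = h(sibling + node) if sibling_is_left else h(node + sibling)
            return node == root

        directory = {"alice": b"alice-public-key", "bob": b"bob-public-key"}  # toy data
        leaves = [h(user.encode() + b"|" + pk) for user, pk in sorted(directory.items())]
        levels = build_tree(leaves)
        root = levels[-1][0]                               # the commitment the server publishes
        print(verify(root, leaves[0], prove(levels, 0)))   # client check for "alice": True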

    Labeled PSI from Homomorphic Encryption with Reduced Computation and Communication

    Get PDF
    It is known that fully homomorphic encryption (FHE) can be used to build efficient (labeled) Private Set Intersection protocols in the unbalanced setting, where one of the sets is much larger than the other (Chen et al., CCS'17, CCS'18). In this paper we demonstrate multiple algorithmic improvements upon these works. In particular, our protocol has an asymptotically better computation cost, requiring only $O(\sqrt{|X|})$ homomorphic multiplications, and communication complexity sublinear in the larger set size $|X|$. We demonstrate that our protocol is significantly better than that of Chen et al. (CCS'18) for many practical parameters, especially in terms of online communication cost. For example, when intersecting $2^{28}$ and 2048 item sets, our protocol reduces the online computation time by more than 83% and communication by more than 32%. When intersecting $2^{24}$ and 4096 item sets, our protocol reduces the online computation time by 50% and communication by 52%. Our comparison to other state-of-the-art unbalanced PSI protocols shows that our protocol has the best total communication complexity when $|X| \geq 2^{24}$. For labeled PSI our protocol also outperforms Chen et al. (CCS'18). When intersecting $2^{20}$ and 256 item sets, with the larger set having associated 288-byte labels, our protocol reduces the online computation time by more than 85% and communication by 36%. Finally, we demonstrate a modification that results in nearly constant communication cost in the larger set size $|X|$, but impractically high computation complexity on today's CPUs. For example, to intersect a 210-item set with sets of size $2^{22}$, $2^{24}$, or $2^{26}$, our proof-of-concept implementation requires only 0.76 MB of online communication, which is more than a 24-fold improvement over Chen et al. (CCS'18).
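
    The polynomial machinery behind FHE-based unbalanced PSI can be shown in the clear: the sender encodes its large set X as the roots of a polynomial, so evaluating it on a receiver item gives zero exactly on intersection, and a second interpolated polynomial returns the matching label. The toy sketch below omits the homomorphic encryption entirely and uses made-up values, so it is only a conceptual illustration, not the protocol from the paper.

        # Toy illustration of the polynomial trick behind FHE-based labeled PSI; the
        # homomorphic-encryption layer is omitted, so this is not the paper's protocol.
        p = 2**31 - 1                                    # prime modulus for the toy field

        def membership_eval(X, y):
            """Evaluate prod_{x in X} (y - x) mod p; the result is 0 iff y is in X."""
            acc = 1
            for x in X:
                acc = acc * (y - x) % p
            return acc

        def label_eval(points, y):
            """Lagrange-evaluate the polynomial through (item, label) points at y, mod p."""
            total = 0
            for i, (xi, li) in enumerate(points):
                num, den = 1, 1
                for j, (xj, _) in enumerate(points):
                    if i != j:
                        num = num * (y - xj) % p
                        den = den * (xi - xj) % p
                total = (total + li * num * pow(den, -1, p)) % p
            return total

        sender_labels = {101: 11, 202: 22, 303: 33}      # sender's (toy-sized) labeled set
        points = sorted(sender_labels.items())

        for y in (202, 999):                             # receiver's items
            if membership_eval(sender_labels, y) == 0:
                print(y, "in intersection, label =", label_eval(points, y))
            else:
                print(y, "not in intersection")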

    Modeling of human-like characters from images using a geometr

    No full text