14 research outputs found

    New Threats to Privacy-preserving Text Representations

    Users’ privacy concerns require data publishers to protect privacy by anonymizing data before sharing it with data consumers. Thus, the ultimate goal of privacy-preserving representation learning is to protect user privacy while ensuring the utility, e.g., the accuracy, of the published data for future tasks and usages. Privacy-preserving embeddings are usually functions that encode an input text into low-dimensional vectors to protect privacy while preserving important semantic information about the text. We demonstrate that these embeddings still leak private information, even though the low-dimensional embeddings encode generic semantics. We develop two classes of attacks, i.e., adversarial classification attack and adversarial generation attack, to study the threats to these embeddings. In particular, the threats are (1) these embeddings may reveal sensitive attributes, regardless of whether they explicitly exist in the input text, and (2) the embedding vectors can be partially recovered via generation models. Besides, our experimental results show that our approach can produce higher-performing adversary models than other adversary baselines.

    Privacy-Preserving Representation Learning for Text-Attributed Networks with Simplicial Complexes

    Although recent network representation learning (NRL) works in text-attributed networks have demonstrated superior performance on various graph inference tasks, learning network representations can raise privacy concerns when nodes represent people or human-related variables. Moreover, standard NRL methods that leverage structural information from a graph proceed by first encoding pairwise relationships into learned representations and then analysing their properties. This approach is fundamentally misaligned with problems where the relationships involve multiple points and topological structure must be encoded beyond pairwise interactions. Fortunately, the machinery of topological data analysis (TDA) and, in particular, simplicial neural networks (SNNs) offers a mathematically rigorous framework to evaluate not only higher-order interactions but also global invariant features of the observed graph, so as to systematically learn topological structures. It is critical to investigate whether the representation outputs from SNNs are more vulnerable than regular representation outputs from graph neural networks (GNNs) based on pairwise interactions. In my dissertation, I will first study learning the representations with text attributes for simplicial complexes (RT4SC) via SNNs. Then, I will conduct research on two potential attacks on the representation outputs from SNNs: (1) membership inference attacks, which infer whether a certain node of a graph is inside the training data of the GNN model; and (2) graph reconstruction attacks, which infer the confidential edges of a text-attributed network. Finally, I will study a privacy-preserving deterministic differentially private alternating direction method of multipliers to learn secure representation outputs from SNNs that capture multi-scale relationships and facilitate the passage from local structure to global invariant features on text-attributed networks.

    Towards Fair and Selectively Privacy-Preserving Models Using Negative Multi-Task Learning (Student Abstract)

    Deep learning models have shown strong performance in natural language processing tasks. While much attention has been paid to improvements in utility, privacy leakage and social bias are two major concerns arising in trained models. To tackle these problems, we protect individuals' sensitive information and mitigate gender bias simultaneously. First, we propose a selective privacy-preserving method that only obscures individuals' sensitive information. Then we propose a negative multi-task learning framework, which contains a main task and a gender prediction task, to mitigate gender bias. We analyze two existing word embeddings and evaluate them on sentiment analysis and a medical text classification task. Our experimental results show that our negative multi-task learning framework can mitigate gender bias while keeping models’ utility.

    Measuring the Privacy Leakage via Graph Reconstruction Attacks on Simplicial Neural Networks (Student Abstract)

    In this paper, we measure privacy leakage by studying whether graph representations can be inverted, via a graph reconstruction attack (GRA), to recover the graph used to generate them. We propose a GRA that recovers a graph's adjacency matrix from the representations via a graph decoder that minimizes the reconstruction loss between the partial graph and the reconstructed graph. We study three types of representations that are trained on the graph, i.e., representations output from a graph convolutional network (GCN), a graph attention network (GAT), and our proposed simplicial neural network (SNN) via a higher-order combinatorial Laplacian. Unlike the first two types of representations, which only encode pairwise relationships, the third type of representation, i.e., SNN outputs, encodes higher-order interactions (e.g., homological features) between nodes. We find that the SNN outputs show the weakest ability to defend against the GRA, followed by those of GATs and GCNs, which indicates the importance of building more private representations with higher-order node information that could defend against potential threats, such as GRAs.

    Complete chloroplast genome assembly and phylogenetic analysis of blackcurrant (Ribes nigrum), red and white currant (Ribes rubrum), and gooseberry (Ribes uva-crispa) provide new insights into the phylogeny of Grossulariaceae

    Background: Blackcurrant (Ribes nigrum), red currant (R. rubrum), white currant (R. rubrum), and gooseberry (R. uva-crispa) belong to Grossulariaceae and are popular small-berry crops worldwide. The lack of genomic data has severely limited their systematic classification and molecular breeding. Methods: The complete chloroplast (cp) genomes of these four taxa were assembled for the first time using MGI-DNBSEQ reads, and their genome structures, repeat elements and protein-coding genes were annotated. By genomic comparison of the present four and the five previously released Ribes cp genomes, the genomic variations were identified. By phylogenetic analysis based on maximum-likelihood and Bayesian methods, the phylogeny of Grossulariaceae and the infrageneric relationships of Ribes were revealed. Results: The four cp genomes have lengths ranging from 157,450 to 157,802 bp and 131 shared genes. A total of 3,322 SNPs and 485 indels were identified from the nine released Ribes cp genomes. Red currant and white currant have 100% identical cp genomes, partially supporting the hypothesis that white currant (R. rubrum) is a fruit color variant of red currant (R. rubrum). The most polymorphic genic and intergenic regions are ycf1 and trnT-psbD, respectively. The phylogenetic analysis demonstrated the monophyly of Grossulariaceae in Saxifragales and the paraphyletic relationship between Saxifragaceae and Grossulariaceae. Notably, the Grossularia subgenus is well nested within the Ribes subgenus and shows a paraphyletic relationship with the co-ancestor of the Calobotrya and Coreosma sections, which challenges the dichotomous subclassification of the Ribes genus based on morphology (subgenus Ribes and subgenus Grossularia). These data, results, and insights lay a foundation for the phylogenetic research and breeding of Ribes species.

    Cytochrome P450 promiscuity leads to a bifurcating biosynthetic pathway for tanshinones

    Cytochromes P450 (CYPs) play a key role in generating the structural diversity of terpenoids, the largest group of plant natural products. However, functional characterization of CYPs has been challenging because of the expansive families found in plant genomes, their diverse reactivity, and the inaccessibility of their substrates and products. Here we present the characterization of two CYPs, CYP76AH3 and CYP76AK1, that act sequentially to form a bifurcating pathway for the biosynthesis of tanshinones, the oxygenated diterpenoids from the Chinese medicinal plant Danshen. These CYPs had similar transcription profiles to that of the known gene responsible for tanshinone production in elicited Danshen hairy roots. Biochemical and RNA interference studies demonstrated that both CYPs are promiscuous. CYP76AH3 oxidizes ferruginol at two different carbon centers, and CYP76AK1 hydroxylates C-20 of two of the resulting intermediates. Together, these convert ferruginol into 11,20-dihydroxy ferruginol and 11,20-dihydroxy sugiol en route to tanshinones. Moreover, we demonstrate the utility of these CYPs by engineering yeast for heterologous production of six oxygenated diterpenoids, which in turn enabled structural characterization of three novel compounds produced by CYP-mediated oxidation. Our results highlight the incorporation of multiple CYPs in diterpenoid metabolic engineering, and a continuing trend of CYP promiscuity generating complex networks in terpenoid biosynthesis.
    This is the peer-reviewed version of the following article: Guo, J., Ma, X., Cai, Y., Ma, Y., Zhan, Z., Zhou, Y. J., Liu, W., Guan, M., Yang, J., Cui, G., Kang, L., Yang, L., Shen, Y., Tang, J., Lin, H., Ma, X., Jin, B., Liu, Z., Peters, R. J., Zhao, Z. K. and Huang, L. (2016), Cytochrome P450 promiscuity leads to a bifurcating biosynthetic pathway for tanshinones. New Phytol, 210: 525–534, which has been published in final form at doi:10.1111/nph.13790. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for self-archiving.