50 research outputs found

    Watermarking Classification Dataset for Copyright Protection

    Substantial research has shown that deep models, e.g., models pre-trained on large corpora, can learn universal language representations that benefit downstream NLP tasks. However, these powerful models are also vulnerable to various privacy attacks, and their training datasets contain much sensitive information. An attacker can easily steal sensitive information from public models, e.g., individuals' email addresses and phone numbers. To address these issues, particularly the unauthorized use of private data, we introduce TextMarker, a novel watermarking technique built on a backdoor-based membership inference approach, which can safeguard diverse forms of private information embedded in the training text data. Specifically, TextMarker only requires data owners to mark a small number of samples for data copyright protection, assuming black-box access to the target model. Through extensive evaluation, we demonstrate the effectiveness of TextMarker on various real-world datasets, e.g., marking only 0.1% of the training dataset is practically sufficient for effective membership inference with negligible effect on model utility. We also discuss potential countermeasures and show that TextMarker is stealthy enough to bypass them.

    OptIForest: Optimal Isolation Forest for Anomaly Detection

    Anomaly detection plays an increasingly important role in various fields for critical tasks such as intrusion detection in cybersecurity, financial risk detection, and human health monitoring. A variety of anomaly detection methods have been proposed, and a category based on the isolation forest mechanism stands out due to its simplicity, effectiveness, and efficiency, e.g., iForest is often employed as a state-of-the-art detector in real deployments. While the majority of isolation forests use a binary structure, the LSHiForest framework has demonstrated that a multi-fork isolation tree structure can lead to better detection performance. However, no theoretical work has answered the fundamentally and practically important question of the optimal tree structure for an isolation forest with respect to the branching factor. In this paper, we establish a theory of isolation efficiency to answer this question and determine the optimal branching factor for an isolation tree. Based on this theoretical underpinning, we design a practical optimal isolation forest, OptIForest, which incorporates clustering-based learning to hash so that more information can be learned from data for better isolation quality. The rationale of our approach relies on a better bias-variance trade-off achieved by bias reduction in OptIForest. Extensive experiments on a series of benchmark datasets, covering comparative and ablation studies, demonstrate that our approach can efficiently and robustly achieve better detection performance in general than state-of-the-art methods, including deep-learning-based ones. Comment: This paper has been accepted by the International Joint Conference on Artificial Intelligence (IJCAI-23).
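
The isolation idea referred to in the abstract above can be illustrated with a minimal sketch. This is the plain binary isolation-tree scheme (random axis-aligned splits), not OptIForest or LSHiForest, and every name and parameter here (`path_length`, `mean_path_length`, `n_trees`, `sample_size`, `max_depth`) is a hypothetical choice made for illustration. It assumes 2-D points and uses only the Python standard library.

```python
import random

def path_length(point, data, rng, depth=0, max_depth=8):
    """Number of random splits followed before `point` is isolated (capped at max_depth)."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    dim = rng.randrange(len(point))            # pick a random feature
    lo = min(row[dim] for row in data)
    hi = max(row[dim] for row in data)
    if lo == hi:                               # cannot split on this feature
        return depth
    split = rng.uniform(lo, hi)                # random cut between min and max
    # Binary split: keep only the rows on the same side of the cut as `point`.
    if point[dim] < split:
        side = [row for row in data if row[dim] < split]
    else:
        side = [row for row in data if row[dim] >= split]
    return path_length(point, side, rng, depth + 1, max_depth)

def mean_path_length(point, data, n_trees=200, sample_size=64, seed=0):
    """Average path length over many random subsamples; shorter suggests more anomalous."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trees):
        sample = rng.sample(data, min(sample_size, len(data)))
        total += path_length(point, sample, rng)
    return total / n_trees

# Synthetic data (an assumption for the demo): one Gaussian cluster plus one far-away point.
gen = random.Random(42)
inliers = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)]
data = inliers + [(8.0, 8.0)]

outlier_score = mean_path_length((8.0, 8.0), data)
inlier_score = mean_path_length((0.0, 0.0), data)
print(outlier_score, inlier_score)
```

The far-away point is separated after fewer splits on average than a point in the middle of the cluster, which is the signal an isolation forest uses. A multi-fork variant would partition each node into more than two children; the branching factor is the quantity the abstract says the paper optimizes.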

    Mutation screening of the SLC26A4 gene in a cohort of 192 Chinese patients with congenital hypothyroidism

    ABSTRACT Objective: Pendred syndrome (PS) is an autosomal recessive disorder characterised by sensorineural hearing loss and thyroid dyshormonogenesis. It is caused by biallelic mutations in the SLC26A4 gene, which encodes pendrin. Hypothyroidism in PS can be present from birth and therefore diagnosed by neonatal screening. The aim of this study was to examine the SLC26A4 mutation spectrum and prevalence among congenital hypothyroidism (CH) patients in the Guangxi Zhuang Autonomous Region of China and to establish how frequently PS causes hearing impairment in our patients with CH. Subjects and methods: Blood samples were collected from 192 CH patients in Guangxi Zhuang Autonomous Region, China, and genomic DNA was extracted from peripheral blood leukocytes. All exons of the SLC26A4 gene together with their exon-intron boundaries were screened by next-generation sequencing. Patients with SLC26A4 mutations underwent a complete audiological evaluation including otoscopic examination, audiometry and morphological evaluation of the inner ear. Results: Next-generation sequencing analysis of SLC26A4 in 192 CH patients revealed five different heterozygous variations in eight individuals (8/192, 4%). The prevalence of SLC26A4 mutations was 4% among the studied Chinese CH patients. Three of the eight were diagnosed with enlargement of the vestibular aqueduct (EVA); no PS cases were found among our 192 CH patients. The mutations included one novel missense variant, p.P469S, as well as four known missense variants, namely p.V233L, p.M147I, p.V609G and p.D661E. Of the eight patients identified with SLC26A4 variations in our study, seven showed normal size and location of the thyroid gland, and one showed a thyroid gland of decreased size. Conclusions: The prevalence of SLC26A4 pathogenic variants was 4% among the studied Chinese patients with CH. Our study expanded the SLC26A4 mutation spectrum, provided the best estimate of the SLC26A4 mutation rate for Chinese CH patients and indicated the rarity of PS as a cause of CH. Arch Endocrinol Metab. 2016;60(4):323-

    capture targets bed file of customized short stature panel by NimbleDesign

    capture targets bed file of customized short stature panel by NimbleDesign

    primary targets bed file of customized short stature panel performed on NimbleDesign

    primary targets bed file of customized short stature panel performed on NimbleDesign