8 research outputs found

    Many-to-One Boundary Labeling with Backbones

    Full text link
    In this paper we study \emph{many-to-one boundary labeling with backbone leaders}. In this new many-to-one model, a horizontal backbone reaches out of each label into the feature-enclosing rectangle. Feature points that need to be connected to this label are linked via vertical line segments to the backbone. We present dynamic programming algorithms for label number and total leader length minimization of crossing-free backbone labelings. When crossings are allowed, we aim to obtain solutions with the minimum number of crossings. This can be achieved efficiently in the case of fixed label order, however, in the case of flexible label order we show that minimizing the number of leader crossings is NP-hard.Comment: 23 pages, 10 figures, this is the full version of a paper that is about to appear in GD'1

    Who Wrote this Code? Watermarking for Code Generation

    Full text link
    Large language models for code have recently shown remarkable performance in generating executable code. However, this rapid advancement has been accompanied by many legal and ethical concerns, such as code licensing issues, code plagiarism, and malware generation, making watermarking machine-generated code a very timely problem. Despite such imminent needs, we discover that existing watermarking and machine-generated text detection methods for LLMs fail to function with code generation tasks properly. Hence, in this work, we propose a new watermarking method, SWEET, that significantly improves upon previous approaches when watermarking machine-generated code. Our proposed method selectively applies watermarking to the tokens with high enough entropy, surpassing a defined threshold. The experiments on code generation benchmarks show that our watermarked code has superior quality compared to code produced by the previous state-of-the-art LLM watermarking method. Furthermore, our watermark method also outperforms DetectGPT for the task of machine-generated code detection

    KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application

    Full text link
    Large language models (LLMs) learn not only natural text generation abilities but also social biases against different demographic groups from real-world data. This poses a critical risk when deploying LLM-based applications. Existing research and resources are not readily applicable in South Korea due to the differences in language and culture, both of which significantly affect the biases and targeted demographic groups. This limitation requires localized social bias datasets to ensure the safe and effective deployment of LLMs. To this end, we present KO SB I, a new social bias dataset of 34k pairs of contexts and sentences in Korean covering 72 demographic groups in 15 categories. We find that through filtering-based moderation, social biases in generated content can be reduced by 16.47%p on average for HyperCLOVA (30B and 82B), and GPT-3.Comment: 17 pages, 8 figures, 12 tables, ACL 202

    SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration

    Full text link
    The potential social harms that large language models pose, such as generating offensive content and reinforcing biases, are steeply rising. Existing works focus on coping with this concern while interacting with ill-intentioned users, such as those who explicitly make hate speech or elicit harmful responses. However, discussions on sensitive issues can become toxic even if the users are well-intentioned. For safer models in such scenarios, we present the Sensitive Questions and Acceptable Response (SQuARe) dataset, a large-scale Korean dataset of 49k sensitive questions with 42k acceptable and 46k non-acceptable responses. The dataset was constructed leveraging HyperCLOVA in a human-in-the-loop manner based on real news headlines. Experiments show that acceptable response generation significantly improves for HyperCLOVA and GPT-3, demonstrating the efficacy of this dataset.Comment: 19 pages, 10 figures, ACL 202

    A Partially Distributed Intrusion Detection System for Wireless Sensor Networks

    Get PDF
    The increasing use of wireless sensor networks, which normally comprise several very small sensor nodes, makes their security an increasingly important issue. They can be practically and efficiently secured using intrusion detection systems. Conventional security mechanisms are not usually applicable due to the sensor nodes having limitations of computational power, memory capacity, and battery power. Therefore, specific security systems should be designed to function under constraints of energy or memory. A partially distributed intrusion detection system with low memory and power demands is proposed here. It employs a Bloom filter, which allows reduced signature code size. Multiple Bloom filters can be combined to reduce the signature code for each Bloom filter array. The mechanism could then cope with potential denial of service attacks, unlike many previous detection systems with Bloom filters. The mechanism was evaluated and validated through analysis and simulation
    corecore