8 research outputs found

Many-to-One Boundary Labeling with Backbones

Authors: Bekos Michael A.; Cornelsen Sabine; Fink Martin; Hong Seokhee; Kaufmann Michael; Nöllenburg Martin; Rutter Ignaz; Symvonis Antonios
Publication date: 01/01/2013

In this paper we study many-to-one boundary labeling with backbone leaders. In this new many-to-one model, a horizontal backbone reaches out of each label into the feature-enclosing rectangle. Feature points that need to be connected to this label are linked via vertical line segments to the backbone. We present dynamic programming algorithms for label number and total leader length minimization of crossing-free backbone labelings. When crossings are allowed, we aim to obtain solutions with the minimum number of crossings. This can be achieved efficiently in the case of fixed label order; in the case of flexible label order, however, we show that minimizing the number of leader crossings is NP-hard.

Comment: 23 pages, 10 figures, this is the full version of a paper that is about to appear in GD'1

arXiv.org e-Print Archive

Crossref

Graph Drawing E-print Archive

Who Wrote this Code? Watermarking for Code Generation

Authors: Ahn Jaewoo; Hong Ilgee; Hong Seokhee; Kim Gunhee; Lee Hwaran; Lee Taehyun; Shin Jamin; Yun Sangdoo
Publication date: 24/05/2023

Large language models for code have recently shown remarkable performance in generating executable code. However, this rapid advancement has been accompanied by many legal and ethical concerns, such as code licensing issues, code plagiarism, and malware generation, making watermarking machine-generated code a very timely problem. Despite such imminent needs, we discover that existing watermarking and machine-generated text detection methods for LLMs fail to function properly on code generation tasks. Hence, in this work, we propose a new watermarking method, SWEET, that significantly improves upon previous approaches when watermarking machine-generated code. Our proposed method selectively applies watermarking only to tokens whose entropy exceeds a defined threshold. The experiments on code generation benchmarks show that our watermarked code has superior quality compared to code produced by the previous state-of-the-art LLM watermarking method. Furthermore, our watermarking method also outperforms DetectGPT on the task of machine-generated code detection.

arXiv.org e-Print Archive
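
The entropy gating mentioned in the SWEET abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy distributions, and the threshold value are all assumptions chosen for illustration, and the sketch covers only the selection of high-entropy positions, not the watermark itself.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_positions(distributions, threshold):
    """Return the indices whose next-token entropy exceeds the threshold.

    `distributions` is a list of next-token probability lists, one per
    generated position. Both the name and the threshold are hypothetical.
    """
    return [
        i
        for i, probs in enumerate(distributions)
        if shannon_entropy(probs) > threshold
    ]


# Toy next-token distributions (hypothetical values, not model output).
toy = [
    [0.97, 0.01, 0.01, 0.01],  # nearly deterministic token, low entropy
    [0.25, 0.25, 0.25, 0.25],  # uniform choice, highest entropy
    [0.70, 0.10, 0.10, 0.10],  # moderately uncertain token
]

# Only positions above the threshold would be candidates for watermarking.
selected = select_positions(toy, threshold=1.0)
```

The intuition is that near-deterministic tokens, which are common in code, are left untouched, so only positions where the model has real freedom of choice are altered.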

KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application

Authors: Ha Jung-Woo; Hong Seokhee; Kim Gunhee; Kim Takyoung; Lee Hwaran; Park Joonsuk
Publication date: 29/05/2023

Large language models (LLMs) learn not only natural text generation abilities but also social biases against different demographic groups from real-world data. This poses a critical risk when deploying LLM-based applications. Existing research and resources are not readily applicable in South Korea due to the differences in language and culture, both of which significantly affect the biases and targeted demographic groups. This limitation requires localized social bias datasets to ensure the safe and effective deployment of LLMs. To this end, we present KoSBi, a new social bias dataset of 34k pairs of contexts and sentences in Korean covering 72 demographic groups in 15 categories. We find that through filtering-based moderation, social biases in generated content can be reduced by 16.47%p on average for HyperCLOVA (30B and 82B) and GPT-3.

Comment: 17 pages, 8 figures, 12 tables, ACL 202

arXiv.org e-Print Archive

SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration

Authors: Cha Meeyoung; Choi Yejin; Ha Jung-Woo; Hong Seokhee; Kim Byoung Pil; Kim Gunhee; Kim Takyoung; Lee Eun-Ju; Lee Hwaran; Lim Yong; Oh Alice; Park Joonsuk; Park Sangchul
Publication date: 28/05/2023

The potential social harms that large language models pose, such as generating offensive content and reinforcing biases, are steeply rising. Existing works focus on coping with this concern while interacting with ill-intentioned users, such as those who explicitly make hate speech or elicit harmful responses. However, discussions on sensitive issues can become toxic even if the users are well-intentioned. For safer models in such scenarios, we present the Sensitive Questions and Acceptable Responses (SQuARe) dataset, a large-scale Korean dataset of 49k sensitive questions with 42k acceptable and 46k non-acceptable responses. The dataset was constructed leveraging HyperCLOVA in a human-in-the-loop manner based on real news headlines. Experiments show that acceptable response generation significantly improves for HyperCLOVA and GPT-3, demonstrating the efficacy of this dataset.

Comment: 19 pages, 10 figures, ACL 202

arXiv.org e-Print Archive

A Partially Distributed Intrusion Detection System for Wireless Sensor Networks

Authors: Choong Seon Hong; Eung Jun Cho; Seokhee Jeon; Sungwon Lee
Publication venue: MDPI AG
Publication date: 01/11/2013

The increasing use of wireless sensor networks, which normally comprise several very small sensor nodes, makes their security an increasingly important issue. They can be practically and efficiently secured using intrusion detection systems. Conventional security mechanisms are not usually applicable because sensor nodes are limited in computational power, memory capacity, and battery power. Therefore, specific security systems should be designed to function under constraints of energy or memory. A partially distributed intrusion detection system with low memory and power demands is proposed here. It employs a Bloom filter, which allows reduced signature code size. Multiple Bloom filters can be combined to reduce the signature code for each Bloom filter array. The mechanism could then cope with potential denial of service attacks, unlike many previous detection systems with Bloom filters. The mechanism was evaluated and validated through analysis and simulation.

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

PubMed Central
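
The abstract above relies on a Bloom filter to keep attack signatures compact. The following is a minimal, generic Bloom filter sketch, not the paper's design: the bit-array size, the number of hash functions, the use of SHA-256, and the signature strings are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: compact set membership with possible false
    positives but no false negatives."""

    def __init__(self, num_bits=256, num_hashes=3):
        # Sizes are hypothetical; a real sensor node would tune them to
        # its memory budget and acceptable false-positive rate.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive num_hashes bit positions by salting SHA-256 with a seed byte.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Store two made-up signature strings in 32 bytes of filter state.
signatures = BloomFilter()
signatures.add(b"example-signature-a")
signatures.add(b"example-signature-b")
known = signatures.might_contain(b"example-signature-a")
```

The filter's memory footprint is fixed by `num_bits` regardless of how long the stored signatures are, which is the property that makes the structure attractive on memory-constrained nodes.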