19 research outputs found

A Self-Supervised Automatic Post-Editing Data Generation Tool

Author: Eo Sugyeong
Lee SeungJun
Lim Heuiseok
Moon Hyeonseok
Park Chanjun
Seo Jaehyung
Publication date: 09/06/2022

Data building for automatic post-editing (APE) requires extensive and expert-level human effort, as it contains an elaborate process that involves identifying errors in sentences and providing suitable revisions. Hence, we develop a self-supervised data generation tool, deployable as a web application, that minimizes human supervision and constructs personalized APE data from a parallel corpus for several language pairs with English as the target language. Data-centric APE research can be conducted using this tool, involving many language pairs that have not been studied thus far owing to the lack of suitable data. Comment: Accepted for DataPerf workshop at ICML 202

arXiv.org e-Print Archive

Simpler and Faster BFV Bootstrapping for Arbitrary Plaintext Modulus from CKKS

Author: Jaehyung Kim
Jinyeong Seo
Yongsoo Song
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 26/01/2024

Bootstrapping is a key operation in fully homomorphic encryption schemes that enables the evaluation of arbitrary multiplicative depth circuits. In the BFV scheme, bootstrapping corresponds to reducing the size of accumulated noise in lower bits while preserving the plaintext in the upper bits. The previous instantiation of BFV bootstrapping is achieved through the digit extraction procedure. However, its performance is highly dependent on the plaintext modulus, so only a limited form of the plaintext modulus, a power of a small prime number, was used for the efficiency of bootstrapping. In this paper, we present a novel approach to instantiate BFV bootstrapping, distinct from the previous digit extraction-based method. The core idea of our bootstrapping is to utilize CKKS bootstrapping as a subroutine, so the performance of our method mainly depends on the underlying CKKS bootstrapping rather than the plaintext modulus. We implement our method at a proof-of-concept level to provide concrete benchmark results. When performing the bootstrapping operation for a 51-bit plaintext modulus, our method improves on the previous digit extraction-based method by a factor of 37.9 in latency and 29.4 in throughput. Additionally, we achieve viable bootstrapping performance for large plaintext moduli, such as 144 bits and 234 bits, which has never been measured before.

Cryptology ePrint Archive

QUAK: A Synthetic Quality Estimation Dataset for Korean-English Neural Machine Translation

Author: Eo Sugyeong
Kim Gyeongmin
Lee Jungseob
Lim Heuiseok
Moon Hyeonseok
Park Chanjun
Seo Jaehyung
Publication date: 30/09/2022

With the recent advance in neural machine translation demonstrating its importance, research on quality estimation (QE) has been steadily progressing. QE aims to automatically predict the quality of machine translation (MT) output without reference sentences. Despite its high utility in the real world, there remain several limitations concerning manual QE data creation: inevitably incurred non-trivial costs due to the need for translation experts, and issues with data scaling and language expansion. To tackle these limitations, we present QUAK, a Korean-English synthetic QE dataset generated in a fully automatic manner. This consists of three sub-QUAK datasets QUAK-M, QUAK-P, and QUAK-H, produced through three strategies that are relatively free from language constraints. Since each strategy requires no human effort, which facilitates scalability, we scale our data up to 1.58M for QUAK-P, H and 6.58M for QUAK-M. As an experiment, we quantitatively analyze word-level QE results in various ways while performing statistical analysis. Moreover, we show that datasets scaled in an efficient way also contribute to performance improvements by observing meaningful performance gains in QUAK-M, P when adding data up to 1.58M

arXiv.org e-Print Archive

Ice Velocity Mapping of Ross Ice Shelf, Antarctica by Matching Surface Undulations Measured by Icesat Laser Altimetry

Author: Han Shin-Chan
Lee Choon-Ki
Scambos Ted A.
Seo Ki-Weon
Yu Jaehyung

We present a novel method for estimating the surface horizontal velocity on ice shelves using laser altimetry data from the Ice, Cloud, and land Elevation Satellite (ICESat; 2003-2009). The method matches undulations measured at crossover points between successive campaigns
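The matching idea in this abstract can be illustrated with a much simplified sketch. It is not the authors' method, which the truncated abstract does not spell out. It assumes two one-dimensional, uniformly sampled elevation profiles of the same track taken at two epochs, finds the sample shift that best aligns their undulations, and converts that shift to a speed. The names best_lag and velocity, the sample spacing, and the time interval are all hypothetical.

```python
from typing import List, Sequence


def _demean(values: Sequence[float]) -> List[float]:
    """Subtract the mean so that only the undulations remain."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def best_lag(earlier: Sequence[float], later: Sequence[float], max_lag: int) -> int:
    """Return the shift (in samples) of `later` relative to `earlier`
    that maximizes the mean product of the overlapping, demeaned samples."""
    a, b = _demean(earlier), _demean(later)
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Pair each sample of the earlier profile with the shifted later one.
        pairs = [(a[i], b[i + lag]) for i in range(len(a)) if 0 <= i + lag < len(b)]
        if not pairs:
            continue
        score = sum(x * y for x, y in pairs) / len(pairs)
        if score > best_score:
            best, best_score = lag, score
    return best


def velocity(lag_samples: int, spacing_m: float, dt_years: float) -> float:
    """Convert a sample shift to a speed in metres per year."""
    return lag_samples * spacing_m / dt_years


# Hypothetical profiles: the same bump appears two samples further along.
earlier = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
later = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]
lag = best_lag(earlier, later, max_lag=4)
speed = velocity(lag, spacing_m=100.0, dt_years=0.5)
```

Keep max_lag small relative to the profile length, since large shifts leave few overlapping samples and make the score unreliable.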

NASA Technical Reports Server

25th annual computational neuroscience meeting: CNS-2016

Author: Abbott L.F.
Abeysuriya Romesh G.
Aertsen Ad
Agnes Everton J.
Ahamed Tosif
Ahmadabadi Majid Nili
Ahn Sora
Aihara Kazuyuki
Andreassen Ole A.
Ardestani Mohammad Hovaidi
Arroyo David
Aton Sara J.
Babichev Andrey
Bachmann Claudia
Badel Laurent
Baek Hyeon-Man
Baek JeongHun
Baek Kwangyeol
Bahuguna Jyotika
Bak Ji Hyun
Baker Chris I.
Bakker Rembrandt
Balaguer‑Ballester Emili
Bard G.
Barnett William H.
Baroni Fabiano
Basnayake Kanishka
Baysal Velt
Bennett Matthew R.
Bernard Christophe
Berry Hugues
Beuth Frederick
Bezgin Gleb
Bill Johannes
Birgolias Justas
Blackwell Justin
Bohnenkamp Lisa
Bojak Ingo
Borisyuk Roman
Bos Hannah
Bradley Samual P.
Breakspear Michael
Breitwieser Oliver
Briaire Jeroen J.
Briggman Kevin L
Brinkman Braden A.
Brown John
Brown Ritchie E.
Brunel Nicolas
Buhry Laure
Buice Michael
Burkitt Anthony N.
Burton Shawn D.
Buttler Simone
Bytschok Ilja
Cantarelli Matteo
Chakravarthy V.Srinivasa
Chan Ho Ka
Chapman Phillip D.
Chatzikalymniou Alexandra Pierri
Chavane Frédéric
Chen Liang
Chen Weiliang
Cheung Chung Ching
Chhabria Karishma
Chintaluri Chaitanya
Choe Yoonsuck
Choi Hannah
Choi Hansol
Choi Ilhwan
Choi Jee Hyun
Choi Woochul
Choi Yun Seo
Choung Oh‑hyeon
Chung SueYeon
Clarke Eric F.
Clements Katie
Cloherty Shaun L.
Clopath Claudia
Cocchi Luca
Cohen Yale E.
Cook Mark
Crook Sharon M.
Cserpán Dorottya
Culmone Viviana
Dabaghian Yuri
Dale Anders M.
Daly Kevin C.
Dasgupta Sakyasingha
Davey Neil
Davison Andrew
de Weerd Peter
Deco Gustavo
Demkó László
Demutz Harald
Denk Cornelia
Destexhe Alain
Devor Anna
DeVuti Justin
Diamond Alan
Diesmann Markus
Dillen Kim
Doya Kenji
Dragoi Valentin
Draguljić Daniel
Drew Jordan
Drysdale Peter M.
Duarte Renato
Dura‑Bernal Salvador
Edwards Andy
Einevoll Gaute T.
Elices Irene
Elnevoll Gaute T.
Ernst Udo A.
Esler Timothy B.
Esposito Elric
Faraji Mohammad Java
Fedorov Leonid A.
Fenk Lisa M.
Ferguson Katie
Ferrario Andrea
Filipovi Marko
Fink Christian G.
Fink Gereon R.
Fishman Yonatan I.
Fornito Alex
Forrow Csaba
Fouquet Coralie
Frangou Sophia
Freestone Dean R.
Frijns Johan H. M.
Fulcher Ben D.
Fung Felix
Gajic N. Alex Cayco
Gallimore Andrew R.
Gallinaro Júlia
Gerkin Richard C.
Gerstner Wulfram
Giaffar Hamza
Giese Martin
Giese Martin A.
Gilson Matthieu
Gips Bart
Gleeson Padraig
Gliske Stephen V.
Glomb Katharina
Goetze Felix
Goldsworthy Mitchell R.
Gollo Leonardo L.
Goncharenko Julia
Goodarzinic Abdorreza
Graham Bruce P.
Grayden David B.
Grewe Jan
Hadrava Michal
Hagen Espen
Halnes Geir
Hamade Khaldoun
Hamker Fred H.
Han Hio-Been
Han Seung Kee
Hansen Mads
Harper Zachary J.
He Hu
Helias Moritz
Hermann Christoph S.
Hilgetag Claus‑Christian
Hines Michael L.
Hlinka Jaroslav
Hof Patrick R.
Holman Katherine A.
Hong Sungho
Hordacre Brenton
Howard Jr. James H.
Huang Guang-Bin
Huang Haiping
Huerta Ramon
Huh Dongsung
Hutt Axel
Hwang Dong‑Uk
Hwang Eunjin
Hye Jr. Eoon
Iannella Nicolangelo
Ibbotson Michael R.
Ionta Silvlo
Ishii Shin
Issa Fadi A.
Iyer Ramakrishnan
Jacobs Heidi
Jang Hyun Jae
Jang Jaeson
Jensen Ole
Jeong Jaeseung
Jeong Jaesung
Jeong Yong
Jirsa Viktor K.
Jo Sumin
Joo Pangyu
Josić Kresimir
Ju Huiwen
Jun Eunji
Jun Sang Beom
Jung Nam
Jung Woo-Sung
Jung Younginha
Kahng B.
Kale Penelope J.
Kalkman Randy K.
Kameneva Tatiana
Kang Jiyoung
Karoly Philippa J.
Kasumi Ohta
Kavalali Enge T.
Kawato Mitsuo
Kazama Hokto
Kedziora David J.
Kekona Tyler
Keller Daniel
Kennedy Henry
Kepple Daniel
Kerr Cliff C.
Kerr Robert R.
Kilpatrick Zachary P.
Kim Ammo J.
Kim Bowon
Kim Chang Sub
Kim DaeEun
Kim Hojeong
Kim Hoon-Hee
Kim Hyoungkyu
Kim Jae Kyoung
Kim Jimin
Kim Jinseop
Kim Juhee
Kim Minjung
Kim Seongkyun
Kim Su Hyun
Kim Sung-Phil
Kim Tae
Kim Taegyo
Kim Won Sup
Kim Youngsoo
Kiser Seth A.
Klanner Felix
Kleberg Florence I.
Klingbeil Guido
Knösche Thomas
Koren Veronika
Kotaleski Jeanette Hellgren
Koulakov Alex
Kralik Jerald D.
Kringelbach Morten L.
Kruscha Alexandra
Kuhlmann Levin
Kukolja Juraj
Kumar Arvind
Kundu Prantik
Kunze Tim
Kuravi Pradeep
Kwag Jeehyun
Kwon Jaehyung
Lai Pik‑Yin
Lakatos Peter
Latorre Roberto
Leahy Will
Lee Changju
Lee Chungho
Lee Dan D.
Lee Do-won
Lee Heonsoo
Lee Hyang Jung
Lee Hyang Woon
Lee Hyeonsu
Lee Jae Woo
Lee Jaejin
Lee Jeungmin
Lee Joonwon
Lee Jung H.
Lee Sang Wan
Lee Sang-Hun
Lee Seungjun
Lee Soohyun
Lee Sue-Hyun
Lee Tae Ho
Lee Won Hee
Lee Yong‑il
Lefebvre Baptiste
Lefebvre Jérémie
Leleu Timothée
Leng Luziwei
Levi Rafael
Levina Anna
Levy Brandon A.
Li Luozheng
Liang Guangsheng
Lidner Benjamin
Liedtke Joscha
Lim Daeseob
Lim Sewoong
Lin Xiahoan
Linder Benjamin
Lines Glenn T.
Lizler Joseph T.
Lochmann Timm
Lowet Eric
Luebke Jennifer
Lytton William W.
Lyu Cheng
Ma Hailin
Maeng Seung Eu
Malmon Gabby
Mandall Alekhya
Maouene M.
Marcelli Angelo
Marin Boris
Markin Sergey
Markram Henry
Marre Olivier
Marsalek Petr
Marsat Gary
Martel Roman
Marucci Lucia
Maturana Matias I.
McCarley Robert W.
McDonnell Mark D.
McKenna James T.
McLauchlan Campbell
Meffin Hamish
Mehta Hima
Meier Karlheinz
Meijas Jorge F.
Mellen Nick
Memmeshei Raol-Martin
Menzies Rosemary J.
Merriosn-Hort Robert
Metzner Christoph
Mi Yuanyuan
Mihalas Stefan
Miller Thomas
Moezzi Bahar
Molkov Yaroslav I.
Moon Jangsup
Moon Seok-hun
Morris Laurel S.
Morrison Abigail
Mosqueiro Thiago S
Mu Shang
Muler Eilif
Muralidharan Vignesh
Murray John D.
Murray Micha M.
Mäki‑Marttunen Tuomo
Neymotin Samuel
Neymotin Samuel A.
Niry Mohammad
Nishikawa Isao
Nolte Max
Nowotny Thomas
Oba Shigeyuki
Obermayer Klaus
Ognjanovski Nicolette
Ouyang Guang
Ozer Mahmut
Paik Se-Bum
Palmer S.E.
Palva Matias J.
Paninski Liam
Pariz Aref
Park Chang-hyun
Park Choongseok
Park Hae‑Jeong
Park Ji Sung
Park Memming
Park Sang-Min
Park Sol
Parsi Shervin S.
Parziale Antonio
Pasupathy Anitha
Perotti Luca
Peterson Andre
Petkoski Spase
Petrovici Mihai A.
Petterson Klas H.
Philips Ryan T.
Phillips Ryan S.
Pillow Jonathan
Pittà Maurizio De
Plogmacher Lukas
Podlaski William
Pollonini Luca
Ponce‑Alvarez Adrián
Popp Pamela Osborn
Preuschoff Kerstin
Priesemann Viola
Priyadharsini B. Praga
Psarrou Maria
Quang Le Anh
Quintana Adrian
Ramsey Julia
Ranjan Rajnish
Rankin James
Rasch Malte J.
Rasuli Nader
Ratnadurai‑Giridharan Shivakeshavan
Reig Ramon
Reimann Michael W.
Rennle Chris J.
Reyes Amy
Richter René
Ridding Michael C.
Rieke Fred
Rinberg Dima
Rinzel John
Ritter Petra
Roach James P.
Robb Daniel T.
Roberts Mark J.
Robinson Peter A.
Rodriguez Francisco B.
Rotter Stefan
Rubchinsky Leonid L.
Rubinov Mikail
Rumbell Timothy
Rupp André
Rybak Ilya A.
Ryu Juhyoung
Sadeh Sadra
Saggio Maria L.
Sander Leonard M.
Sanger Terence D.
Sanz-Leon Paula
Saska Daniel
Schaworonkow Natalie
Schemmel Johannes
Scheutz Matthias
Schiff Steven J.
Schilstra Maria
Schilstra Marla
Schmidt Maximilian
Schmidt Robert
Schottdorf Manual
Schutter Erik De
Schwikard Achim
Seeholzer Alexander
Seidenstein Alexandra
Sejnowski Terrence J.
Sekulić Vladisla
Senatore Rosa
Senk Johanna
Seo Sat Byul
Seung H. Sebastian
Sharpee Tatyana O.
Shea Steven
Shea-Brown Eric
Shen Kelly
Shiau LieJune
Shimazaki Hideaki
Shin Hee‑sup
Shin In-Seob
Shivkumar Sabyasach
Shlizerman Eli
Shomali Safura Rashid
Siep Silvan F.
Silberberg Gilad
Silver Angus
Silver R. Angus
Skiker K.
Skilling Quinton M.
Skinner Frances K.
Smit Daniel
Smith Brian
Smith Jeffrey
Soh Jaehyun
Soman Karthik
Somogyvári Zoltán
Sompolinsky Haim
Song Min
Song Min-Ho
Song Youngjo
Soundry Daniel
Sourina Olga
Spampinato Giulia Lia Beatrice
Spiegler Andreas
Spinney Richard E.
Sprecher Simon
Stacey William C.
Stephens Greg
Stern Merav
Steuber Volker
Steyn-Ross D. Alistair
Steyn-Ross Moira L.
Stimberg Marcel
Strube‑Bloss Martin F.
Stöckel David
Su Jianzhong
Sun Haoqi
Sweeney Yann
Tabas Alejandro
Tahayori Bahman
Takashima Akira
Tam Nicoladie D.
Tamagnini Francesco
Tang Rongxiang
Tang Yi-Yuan
Teka Wondimu
Tetzlaff Tom
Tezuka Taro
Toporikova Natalia
Torres Joaquin J.
Toyoizumi Taro
Tran Patricia H. P.
Trembleau Alain
Triesch Jochen
Trisch Jochen
Tsaneva‑Atanasova Krasimira
Tsuchimoto Yoshiko
Tuomo Maki-Martun
Tveito Aslak
Valizadeh Alireza
van Albada Sacha J
van Albada Sacha J.
van der Eerden Jan
Varona Pablo
Veale Richard
Viriyopase Atthaphon
Vitay Julien
Vogels Rufin
Vogels Tim
Vogels Tim P.
Vogt Simon M.
Voon Valerie
Voronenko Sergej O.
Vuust Peter
Vörös János
Wallentin Mikkel
Wang Dahui
Wang Jisung
Wang Sheng-Ju
Wang Yuzhe
Warburton Julia M.
Weaver Christina M.
Wegener Detlef
Weidel Philipp
Welzig Charles M.
Werdt Stephen Van
Wibral Michael
Wickens Jeffery R.
Widmer Yves
Witek Maria A. G.
Witting Jens
Wolf Fred
Wong Michael
Wu Si
Wu Sl
Wójcik Daniel K.
Xu Zhiheng
Yamada Yasnori
Yamamura Yorkio
Yang Huei-Fang
Yang Xu
Yeon Ji Won
Yger Pierre
Yilmaz Ergin
Yoo Minsu
Yoon Sangsup
Yoshimoto Junichiro
Young-Ah Rho
Yu Suin
Zaho Yuan
Zamora Criseida
Zaptocky Martin
Zhang Mingsha
Zhang Wenhao
Zhao Chang
Zhao Xiaochen
Zhao Xuelong
Zhou Changsong
Zochowski Michal
Zochowski Michal R.
Zouridakis George
Zurowski Bartosz
Publication venue: BMC
Publication date: 01/01/2016

The same neuron may play different functional roles in the neural circuits to which it belongs. For example, neurons in the Tritonia pedal ganglia may participate in variable phases of the swim motor rhythms [1]. While such neuronal functional variability is likely to play a major role in the delivery of the functionality of neural systems, it is difficult to study in most nervous systems. We work on the pyloric rhythm network of the crustacean stomatogastric ganglion (STG) [2]. Typically, network models of the STG treat neurons of the same functional type as a single model neuron (e.g. PD neurons), assuming the same conductance parameters for these neurons and implying their synchronous firing [3, 4]. However, simultaneous recording of PD neurons shows differences between the timings of spikes of these neurons. This may indicate functional variability of these neurons. Here we modelled separately the two PD neurons of the STG in a multi-neuron model of the pyloric network. Our neuron models comply with known correlations between conductance parameters of ionic currents. Our results reproduce the experimental finding of increasing spike time distance between spikes originating from the two model PD neurons during their synchronised burst phase. The PD neuron with the larger calcium conductance generates its spikes before the other PD neuron. Larger potassium conductance values in the follower neuron imply longer delays between spikes, see Fig. 17. Neuromodulators change the conductance parameters of neurons and maintain the ratios of these parameters [5]. Our results show that such changes may shift the individual contribution of two PD neurons to the PD-phase of the pyloric rhythm, altering their functionality within this rhythm. Our work paves the way towards an accessible experimental and computational framework for the analysis of the mechanisms and impact of functional variability of neurons within the neural circuits to which they belong.

HAL AMU

ScholarWorks@UNIST

Juelich Shared Electronic Resources

Central Archive at the University of Reading

Crossref

IUPUIScholarWorks

Springer - Publisher Connector

Harvard University - DASH

Heidelberger Dokumentenserver

PubMed Central

Archivio della Ricerca - Università di Salerno

Apollo (Cambridge)

Repository@Napier

University of Hertfordshire Research Archive

DSpace at Rice University

Deep Blue Documents at the University of Michigan

Dense-to-Question and Sparse-to-Answer: Hybrid Retriever System for Industrial Frequently Asked Questions

Author: Jaehyung Seo
Publication venue: MDPI AG
Publication date: 18/04/2022

The term “Frequently asked questions” (FAQ) refers to a query that is asked repeatedly and produces a manually constructed response. It is one of the most important factors influencing customer repurchase and brand loyalty; thus, most industry domains invest heavily in it. This has led to deep-learning-based retrieval models being studied. However, training a model and creating a database specializing in each industry domain comes at a high cost, especially when using a chatbot-based conversation system, as a large amount of resources must be continuously input for the FAQ system’s maintenance. It is also difficult for small- and medium-sized companies and national institutions to build individualized training data and databases and obtain satisfactory results. As a result, based on the deep learning information retrieval module, we propose a method of returning responses to customer inquiries using only data that can be easily obtained from companies. We hybridize dense embedding and sparse embedding in this work to make it more robust in professional terms, and we propose new functions to adjust the weight ratio and scale the results returned by the two modules
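The hybridization described in this abstract can be sketched in a few lines. This is a minimal illustration, not the paper's actual weighting or scaling functions, which the abstract does not specify. It assumes that each module returns a score per FAQ entry, rescales both score sets to [0, 1] with min-max scaling, and mixes them with a weight alpha. The names min_max_scale, hybrid_scores, rank, and alpha are hypothetical.

```python
from typing import Dict, List


def min_max_scale(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score set maps to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(
    dense: Dict[str, float], sparse: Dict[str, float], alpha: float = 0.5
) -> Dict[str, float]:
    """Weighted sum of scaled dense and sparse scores.

    alpha is the weight on the dense module; (1 - alpha) goes to the
    sparse module. An entry missing from one module counts as 0.0 there.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    d, s = min_max_scale(dense), min_max_scale(sparse)
    return {
        doc: alpha * d.get(doc, 0.0) + (1.0 - alpha) * s.get(doc, 0.0)
        for doc in set(d) | set(s)
    }


def rank(dense: Dict[str, float], sparse: Dict[str, float], alpha: float = 0.5) -> List[str]:
    """Return entry ids ordered by hybrid score, best first (ties by id)."""
    scores = hybrid_scores(dense, sparse, alpha)
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Hypothetical scores: cosine similarity (dense) and a BM25-like score (sparse).
dense = {"faq1": 0.9, "faq2": 0.1}
sparse = {"faq1": 2.0, "faq2": 12.0}
ordering = rank(dense, sparse, alpha=0.8)
```

Scaling before mixing matters because the two modules usually score on different ranges. Without it, the module with the larger raw values would dominate whatever weight is chosen.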

Multidisciplinary Digital Publishing Institute

Comparative Analysis of Current Approaches to Quality Estimation for Neural Machine Translation

Author: Chanjun Park
Heuiseok Lim
Hyeonseok Moon
Jaehyung Seo
Sugyeong Eo
Publication venue: MDPI AG
Publication date: 01/07/2021

Quality estimation (QE) has recently gained increasing interest as it can predict the quality of machine translation results without a reference translation. QE is an annual shared task at the Conference on Machine Translation (WMT), and most recent studies have applied the multilingual pretrained language model (mPLM) to address this task. Recent studies have focused on the performance improvement of this task using data augmentation with finetuning based on a large-scale mPLM. In this study, we eliminate the effects of data augmentation and conduct a pure performance comparison between various mPLMs. Separate from the recent performance-driven QE research involved in competitions addressing a shared task, we utilize the comparison for sub-tasks from WMT20 and identify an optimal mPLM. Moreover, we demonstrate QE using the multilingual BART model, which has not yet been utilized, and conduct comparative experiments and analyses with cross-lingual language models (XLMs), multilingual BERT, and XLM-RoBERTa

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

The ASR Post-Processor Performance Challenges of BackTranScription (BTS): Data-Centric and Model-Centric Approaches

Author: Chanhee Lee
Chanjun Park
Heuiseok Lim
Jaehyung Seo
Seolhwa Lee
Publication venue: MDPI AG
Publication date: 01/01/2022

Training an automatic speech recognition (ASR) post-processor based on sequence-to-sequence (S2S) requires a parallel pair (e.g., speech recognition result and human post-edited sentence) to construct the dataset, which demands a great amount of human labor. BackTranScription (BTS) proposes a data-building method to mitigate the limitations of the existing S2S-based ASR post-processors, which can automatically generate vast amounts of training datasets, reducing time and cost in data construction. Despite the emergence of this novel approach, the BTS-based ASR post-processor still has research challenges and is mostly untested in diverse approaches. In this study, we highlight these challenges through detailed experiments by analyzing the data-centric approach (i.e., controlling the amount of data without model alteration) and the model-centric approach (i.e., model modification). In other words, we attempt to point out problems with the current trend of research pursuing a model-centric approach and caution against ignoring the importance of the data. Our experiment results show that the data-centric approach outperformed the model-centric approach by +11.69, +17.64, and +19.02 in the F1-score, BLEU, and GLEU tests.

Directory of Open Access Journals

Copenhagen University Research Information System

BERTOEIC: Solving TOEIC Problems Using Simple and Efficient Data Augmentation Techniques with Pretrained Transformer Encoders

Author: Chanjun Park
Heuiseok Lim
Hyeonseok Moon
Jaehyung Seo
Jeongwoo Lee
Sugyeong Eo
Publication venue: MDPI AG
Publication date: 01/07/2022

Recent studies have attempted to understand natural language and infer answers. Machine reading comprehension is a representative task, and several related datasets have been released. However, there are few official open datasets for the Test of English for International Communication (TOEIC), which is widely used for evaluating people’s English proficiency, and research for further advancement is not being actively conducted. We attribute the difficulty of deep learning research for TOEIC to the data scarcity problem, so we propose two data augmentation methods to improve the model in a low-resource environment. Considering the attributes of the semantic and grammar problem types in TOEIC, the proposed methods can augment the data similar to the real TOEIC problem by using POS-tagging and lemmatizing. In addition, we confirmed the importance of understanding semantics and grammar in TOEIC through experiments on each proposed methodology and experiments according to the amount of data. The proposed methods address the data shortage problem of TOEIC and enable an acceptable human-level performance.

Directory of Open Access Journals

Uncovering the Risks and Drawbacks Associated With the Use of Synthetic Data for Grammatical Error Correction

Author: Chanjun Park
Heuiseok Lim
Hyeonseok Moon
Jaehyung Seo
Seolhwa Lee
Seonmin Koo
Sugyeong Eo
Publication venue: IEEE
Publication date: 01/01/2023

In a Data-Centric AI paradigm, the model performance is enhanced without altering the model architecture, as evidenced by real-world and benchmark dataset demonstrations. With the advancements of large language models (LLM), it has become increasingly feasible to generate high-quality synthetic data, while considering the need to construct fully synthetic datasets for real-world data containing numerous personal information. However, in-depth validation of the solely synthetic data setting has yet to be conducted, despite the increased possibility of models trained exclusively on fully synthetic data emerging in the future. Therefore, we examined the question, “Do data quality control techniques (known to positively impact data-centric AI) consistently aid models trained exclusively on synthetic datasets?”. To explore this query, we performed detailed analyses using synthetic datasets generated for speech recognition postprocessing using the BackTranScription (BTS) approach. Our study primarily addressed the potential adverse effects of data quality control measures (e.g., noise injection and balanced data) and training strategies in the context of synthetic-only experiments. As a result of the experiment, we observed the negative effect that the data-centric methodology drops by a maximum of 44.03 points in the fully synthetic data setting

Directory of Open Access Journals