Analyse syntaxique de langues faiblement dotées à partir de plongements de mots multilingues: Application au same du nord et au komi-zyriène

Author: Lim KyungTae
Partanen Niko
Poibeau Thierry
Publication venue: 'Associacio catalana de Salut Laboral'
Publication date: 01/01/2018

This article presents an attempt to apply efficient parsing methods based on recursive neural networks to languages for which very few resources are available. We propose an original approach based on multilingual word embeddings acquired from different languages, which may be more or less typologically close, so as to determine the best language combination for learning. The approach yields competitive results in contexts considered linguistically difficult. The source code is available online (see https://github.com/jujbob).

Yet Another Format of Universal Dependencies for Korean

Author: Chen Yige
Jo Eunkyul Leah
Lim KyungTae
Park Jungyeul
Silfverberg Miikka
Tyers Francis M.
Yao Yundong
Publication date: 20/09/2022

In this study, we propose a morpheme-based scheme for Korean dependency parsing and adopt the proposed scheme to Universal Dependencies. We present the linguistic rationale that illustrates the motivation for and the necessity of adopting the morpheme-based format, and develop scripts that convert automatically between the original format used by Universal Dependencies and the proposed morpheme-based format. The effectiveness of the proposed format for Korean dependency parsing is then tested with both statistical and neural models, including UDPipe and Stanza, using our carefully constructed morpheme-based word embedding for Korean. morphUD outperforms parsing results for all Korean UD treebanks, and we also present detailed error analyses. Comment: COLING2022, Poste

arXiv.org e-Print Archive

The first Komi-Zyrian Universal Dependencies treebanks

Author: Blokland Rogier
Lim KyungTae
Partanen Niko
Poibeau Thierry
Rießler Michael
Publication date: 01/01/2018

Partanen N, Blokland R, Lim KT, Poibeau T, Rießler M. The first Komi-Zyrian Universal Dependencies treebanks. Presented at the 2018 Conference on Empirical Methods in Natural Language Processing (Universal Dependencies Workshop 2018), Brussels

Crossref

Publikationer från Uppsala Universitet

Publications at Bielefeld University

Digitala Vetenskapliga Arkivet - Academic Archive On-line

A functional genetic toolbox for human tissue-derived organoids.

Author: Evans Lewis
Lim Kyungtae
Lutolf Matthias
Perrone Francesca
Rawlins Emma L
Rezakhani Saba
Sokleva Vanesa
Sun Dawei
Zilbauer Matthias
Publication venue: eLife
Publication date: 29/10/2021

Funder: Alzheimers Research UK Stem Cell Research Centre

Human organoid systems recapitulate key features of organs offering platforms for modelling developmental biology and disease. Tissue-derived organoids have been widely used to study the impact of extrinsic niche factors on stem cells. However, they are rarely used to study endogenous gene function due to the lack of efficient gene manipulation tools. Previously, we established a human foetal lung organoid system (Nikolić et al., 2017). Here, using this organoid system as an example we have systematically developed and optimised a complete genetic toolbox for use in tissue-derived organoids. This includes 'Organoid Easytag' our efficient workflow for targeting all types of gene loci through CRISPR-mediated homologous recombination followed by flow cytometry for enriching correctly-targeted cells. Our toolbox also incorporates conditional gene knock-down or overexpression using tightly-inducible CRISPR interference and CRISPR activation which is the first efficient application of these techniques to tissue-derived organoids. These tools will facilitate gene perturbation studies in tissue-derived organoids facilitating human disease modelling and providing a functional counterpart to many on-going descriptive studies, such as the Human Cell Atlas Project.

Apollo (Cambridge)

Dependency parsing of code-switching data with cross-lingual feature representations

Author: Lim KyungTae
Partanen Niko
Pirinen Tommi A.
Poibeau Thierry
Rießler Michael
Rueter Jack
Trosterud Trond
Tyers Francis M.
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2018

Partanen N, Lim KT, Rießler M, Poibeau T. Dependency parsing of code-switching data with cross-lingual feature representations. In: Pirinen TA, Rießler M, Rueter J, Trosterud T, Tyers FM, eds. Proceedings of the 4th International Workshop for Computational Linguistics for Uralic Languages. Helsinki: Association for Computational Linguistics; 2018: 1-17

Publications at Bielefeld University

Relatório de estágio em farmácia comunitária

Author: Abrams Mitchell
Ackermann Elia
Aepli Noëmi
Aghaei Hamid
Agić Željko
Ahmadi Amir
Ahrenberg Lars
Ajede Chika Kennedy
Aleksandravičiūtė Gabrielė
Alfina Ika
Antonsen Lene
Aplonova Katya
Aquino Angelina
Aragon Carolina
Aranzabe Maria Jesus
Arnardóttir Þórunn
Arutie Gashaw
Arwidarasti Jessica Naraiswari
Asahara Masayuki
Ateyah Luma
Atmaca Furkan
Attia Mohammed
Atutxa Aitziber
Augustinus Liesbeth
Badmaeva Elena
Balasubramani Keerthana
Ballesteros Miguel
Banerjee Esha
Bank Sebastian
Barbu Mititelu Verginica
Basmov Victoria
Batchelor Colin
Bauer John
Bedir Seyyit Talha
Bengoetxea Kepa
Berk Gözde
Berzak Yevgeni
Bhat Irshad Ahmad
Bhat Riyaz Ahmad
Biagetti Erica
Bick Eckhard
Bielinskienė Agnė
Bjarnadóttir Kristín
Blokland Rogier
Bobicev Victoria
Boizou Loïc
Borges Völker Emanuel
Bosco Cristina
Bouma Gosse
Bowman Sam
Boyd Adriane
Brokaitė Kristina
Burchardt Aljoscha
Börstell Carl
Candito Marie
Caron Bernard
Caron Gauthier
Cavalcanti Tatiana
Cebiroğlu Eryiğit Gülşen
Cecchini Flavio Massimiliano
Celano Giuseppe G. A.
Cetin Savas
Chalub Fabricio
Chi Ethan
Cho Yongseok
Choi Jinho
Chun Jayeol
Cignarella Alessandra T.
Cinková Silvie
Collomb Aurélie
Connor Miriam
Courtin Marine
Davidson Elizabeth
de Marneffe Marie-Catherine
de Paiva Valeria
de Souza Elvis
Derin Mehmet Oguz
Diaz de Ilarraza Arantza
Dickerson Carly
Dinakaramani Arawinda
Dione Bamba
Dirix Peter
Dobrovoljc Kaja
Dozat Timothy
Droganova Kira
Dwivedi Puneet
Eckhoff Hanne
Eli Marhaba
Elkahky Ali
Ephrem Binyam
Erina Olga
Erjavec Tomaž
Etienne Aline
Evelyn Wograine
Facundes Sidney
Farkas Richárd
Fernanda Marília
Fernandez Alcalde Hector
Foster Jennifer
Freitas Cláudia
Fujita Kazunori
Gajdošová Katarína
Galbraith Daniel
Garcia Marcos
Garza Sebastian
Gerardi Fabrício Ferraz
Gerdes Kim
Ginter Filip
Goenaga Iakes
Gojenola Koldo
Goldberg Yoav
González Saavedra Berta
Griciūtė Bernadeta
Grioni Matias
Grobol Loïc
Grūzītis Normunds
Guillaume Bruno
Guillot-Barbance Céline
Gärdenfors Moa
Gómez Guinovart Xavier
Gökırmak Memduh
Güngör Tunga
Habash Nizar
Hafsteinsson Hinrik
Hajič jr. Jan
Hajič Jan
Han Na-Rae
Hanifmuti Muhammad Yudistira
Hardwick Sam
Harris Kim
Haug Dag
Heinecke Johannes
Hellwig Oliver
Hennig Felix
Hladká Barbora
Hlaváčová Jaroslava
Hociung Florinel
Hohle Petter
Huber Eva
Hwang Jena
Hà Mỹ Linh
Hämäläinen Mika
Ikeda Takumi
Ingason Anton Karl
Ion Radu
Irimia Elena
Ishola Ọlájídé
Jelínek Tomáš
Johannsen Anders
Juutinen Markus
Jónsdóttir Hildur
Jørgensen Fredrik
K Sarveswaran
Kaasen Andre
Kabaeva Nadezhda
Kahane Sylvain
Kanayama Hiroshi
Kanerva Jenna
Katz Boris
Kayadelen Tolga
Kaşıkara Hüner
Kenney Jessica
Kettnerová Václava
Kirchner Jesse
Klementieva Elena
Kopacewicz Kamil
Korkiakangas Timo
Kotsyba Natalia
Kovalevskaitė Jolanta
Krek Simon
Krishnamurthy Parameswari
Kwak Sookyoung
Köhn Arne
Köksal Abdullatif
Laippala Veronika
Lam Lucia
Lambertino Lorenzo
Lando Tatiana
Larasati Septina Dian
Lavrentiev Alexei
Lee John
Lenci Alessandro
Lertpradit Saran
Leung Herman
Levina Maria
Li Cheuk Ying
Li Josie
Li Keying
Li Yuan
Lim KyungTae
Lindén Krister
Ljubešić Nikola
Loginova Olga
Luthfi Andry
Luukko Mikko
Lyashevskaya Olga
Lynn Teresa
Lê Hồng Phương
Macketanz Vivien
Makazhanov Aibek
Mandl Michael
Manning Christopher
Manurung Ruli
Mareček David
Marheinecke Katrin
Martins André
Martínez Alonso Héctor
Matsuda Hiroshi
Matsumoto Yuji
Mašek Jan
McDonald Ryan
McGuinness Sarah
Mendonça Gustavo
Miekka Niko
Mischenkova Karina
Misirpashayeva Margarita
Missilä Anna
Mititelu Cătălin
Mitrofan Maria
Miyao Yusuke
Mojiri Foroushani AmirHossein
Moloodi Amirsaeid
Montemagni Simonetta
More Amir
Moreno Romero Laura
Mori Keiko Sophie
Mori Shinsuke
Morioka Tomohiko
Moro Shigeki
Mortensen Bjartur
Moskalevskyi Bohdan
Muischnek Kadri
Munro Robert
Murawaki Yugo
Müürisep Kaili
Mărănduc Cătălina
Nainwani Pinkey
Nakhlé Mariam
Navarro Horñiacek Juan Ignacio
Nedoluzhko Anna
Nešpore-Bērzkalne Gunta
Nguyễn Thị Minh Huyền
Nguyễn Thị Lương
Nikaido Yoshihiro
Nikolaev Vitaly
Nitisaroj Rattima
Nivre Joakim
Nourian Alireza
Nurmi Hanna
Ojala Stina
Ojha Atul Kr.
Olúòkun Adédayọ̀
Omura Mai
Onwuegbuzia Emeka
Osenova Petya
Partanen Niko
Pascual Elena
Passarotti Marco
Patejuk Agnieszka
Paulino-Passos Guilherme
Peljak-Łapińska Angelika
Peng Siyao
Perez Cenel-Augusto
Perkova Natalia
Perrier Guy
Petrov Slav
Petrova Daria
Phelan Jason
Piitulainen Jussi
Pirinen Tommi A
Pitler Emily
Plank Barbara
Poibeau Thierry
Ponomareva Larisa
Popel Martin
Pretkalniņa Lauma
Prokopidis Prokopis
Przepiórkowski Adam
Prévost Sophie
Puolakainen Tiina
Pyysalo Sampo
Qi Peng
Rademaker Alexandre
Rama Taraka
Ramasamy Loganathan
Ramisch Carlos
Rashel Fam
Rasooli Mohammad Sadegh
Ravishankar Vinit
Real Livy
Rebeja Petru
Reddy Siva
Rehm Georg
Riabov Ivan
Rießler Michael
Rimkutė Erika
Rinaldi Larissa
Rituma Laura
Rocha Luisa
Romanenko Mykhailo
Rosa Rudolf
Rovati Davide
Roșca Valentin
Rudina Olga
Rueter Jack
Rääbis Andriela
Rögnvaldsson Eiríkur
Rúnarsson Kristján
Sadde Shoval
Safari Pegah
Sagot Benoît
Sahala Aleksi
Saleh Shadi
Salomoni Alessio
Samardžić Tanja
Samson Stephanie
Sanguinetti Manuela
Saulīte Baiba
Sawanakunanon Yanin
Scannell Kevin
Scarlata Salvatore
Schneider Nathan
Schuster Sebastian
Seddah Djamé
Seeker Wolfgang
Seraji Mojgan
Shen Mo
Shimada Atsuko
Shirasu Hiroyuki
Shohibussirri Muh
Sichinava Dmitry
Sigurðsson Einar Freyr
Silveira Aline
Silveira Natalia
Simi Maria
Simionescu Radu
Simkó Katalin
Simov Kiril
Skachedubova Maria
Smith Aaron
Soares-Bastos Isabela
Spadine Carolyn
Steingrímsson Steinþór
Stella Antonio
Straka Milan
Strickland Emmett
Strnadová Jana
Suhr Alane
Sulestio Yogi Lesmana
Sulubacak Umut
Suzuki Shingo
Szántó Zsolt
Särg Dage
Taji Dima
Takahashi Yuta
Tamburini Fabio
Tan Mary Ann C.
Tanaka Takaaki
Tella Samson
Tellier Isabelle
Thomas Guillaume
Torga Liisi
Toska Marsida
Trosterud Trond
Trukhina Anna
Tsarfaty Reut
Tyers Francis
Türk Utku
Uematsu Sumire
Untilov Roman
Urešová Zdeňka
Uria Larraitz
Uszkoreit Hans
Utka Andrius
Vajjala Sowmya
van Niekerk Daniel
van Noord Gertjan
Varga Viktor
Villemonte de la Clergerie Eric
Vincze Veronika
Wakasa Aya
Wallenberg Joel C.
Wallin Lars
Walsh Abigail
Wang Jing Xian
Washington Jonathan North
Wendt Maximilan
Widmer Paul
Williams Seyi
Wirén Mats
Wittern Christian
Woldemariam Tsegay
Wong Tak-sum
Wróblewska Alina
Yako Mary
Yamashita Kayo
Yamazaki Naoki
Yan Chunxiao
Yasuoka Koichi
Yavrumyan Marat M.
Yu Zhuoran
Zahra Shorouq
Zeldes Amir
Zeman Daniel
Zhu Hanzhi
Zhuravleva Anna
Çetinoğlu Özlem
Çöltekin Çağrı
Östling Robert
Özateş Şaziye Betül
Özgür Arzucan
Öztürk Başaran Balkız
Øvrelid Lilja
Čéplö Slavomír
Šimková Mária
Žabokrtský Zdeněk
Publication date: 01/09/2016

Internship report completed as part of the Integrated Master's in Pharmaceutical Sciences, presented to the Faculdade de Farmácia da Universidade de Coimbr

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

Méthodes d’amorçage pour l’analyse en dépendances de langues peu dotées

Author: Lim Kyungtae
Publication venue: HAL CCSD
Publication date: 24/02/2020

Dependency parsing is an essential component of several NLP applications owing to its ability to capture complex relational information in a sentence. Due to the wider availability of dependency treebanks, most dependency parsing systems are built using supervised learning techniques. These systems require a significant amount of annotated data and are thus targeted toward specific languages for which this type of data is available. Unfortunately, producing sufficient annotated data for low-resource languages is time- and resource-consuming. To address this issue, the present study investigates three bootstrapping methods: (1) multilingual transfer learning, (2) deep contextualized embedding, and (3) co-training. Multilingual transfer learning is a typical supervised learning approach that can transfer dependency knowledge using multilingual training data based on multilingual lexical representations. Deep contextualized embedding maximizes the use of lexical features during supervised learning based on enhanced sub-word representations and a language model (LM). Lastly, co-training is a semi-supervised learning method that improves parsing accuracy using unlabeled data. Our approaches have the advantage of requiring only a small bilingual dictionary or easily obtainable unlabeled resources (e.g., Wikipedia) to improve parsing accuracy in low-resource conditions. We evaluated our parser on 57 official CoNLL shared task languages as well as on Komi, a language for which we developed training and evaluation corpora for low-resource scenarios. The evaluation results demonstrated outstanding performance of our approaches in both low- and high-resource dependency parsing in the 2017 and 2018 CoNLL shared tasks.
A survey of both model transfer learning and semi-supervised methods for low-resource dependency parsing was conducted, where the effect of each method under different conditions was extensively investigated.
Dependency parsing is an essential component of many NLP (natural language processing) applications, since it provides an analysis of the relations between the main elements of a sentence. Most dependency parsing systems rely on supervised learning techniques trained on large annotated corpora. This type of analysis is therefore limited to the few languages that have adequate resources. For low-resource languages, producing annotated data is most often an impossible task, for lack of funding and available annotators. To address this problem, the thesis examines three bootstrapping methods: (1) multilingual transfer learning, (2) deep contextualized embeddings, and (3) co-training. The first, multilingual transfer learning, transfers knowledge from a language with abundant resources, and hence effective processing tools, to a low-resource language. Deep contextualized embeddings, for their part, provide an optimal representation of word meaning in context, thanks to the notion of a language model. Finally, co-training is a semi-supervised learning method that improves system performance by using the large amounts of unannotated data often available for the languages concerned. Our approaches require only a small bilingual dictionary or easily obtainable unlabeled resources (from Wikipedia, for example) to improve parsing accuracy for languages whose available resources are insufficient.
We evaluated our parser on 57 languages through participation in the evaluation campaigns organized as part of the CoNLL conference. We also conducted experiments on other languages, such as Komi, a Finno-Ugric language spoken in Russia: Komi offers a realistic scenario for testing the ideas put forward in the thesis. Our system obtained very competitive results in official evaluation campaigns, notably the CoNLL 2017 and 2018 campaigns. This thesis thus offers interesting perspectives for the automatic processing of low-resource languages, a major challenge for NLP in the years to come.

Thèses en Ligne

Theses.fr

Multilingual Dependency Parsing for Low-Resource Languages: Case Studies on North Saami and Komi-Zyrian

Author: Lim KyungTae
Partanen Niko
Poibeau Thierry
Publication venue: HAL CCSD
Publication date: 07/05/2018

The paper presents a method for parsing low-resource languages with very small training corpora using multilingual word embeddings and annotated corpora of larger languages. The study demonstrates that specific language combinations enable improved dependency parsing when compared to previous work, allowing for wider reuse of pre-existing resources when parsing low-resource languages. The study also explores the question of whether contemporary contact languages or genetically related languages would be the most fruitful starting point for multilingual parsing scenarios.
