29 research outputs found

Automatic alignment of the Psalterium Sinaiticum and the Septuagint Psalms

Author: Eckhoff Hanne
Publication venue: Cyrillo-Methodian Research Center at the Bulgarian Academy of Sciences
Publication date: 01/12/2021

This paper describes the work on automatically aligning the Psalterium Sinaiticum with the Septuagint psalms in the Tromsø Old Russian and OCS Treebank (TOROT). It briefly accounts for the transcription, text processing and manual annotation of the Psalterium Sinaiticum itself. It then explains the choice of Greek text, describes the automatic lemmatisation and morphological tagging of the Greek text, and calculates and analyses the success rate in a small sample. Next, the algorithm for automatic token-level alignment of texts is briefly described, and its success rate calculated and analysed. The results seem quite good from a quantitative perspective (over 90% accuracy in most cases), and it may seem tempting to try to use the data directly. However, a pilot study of aspect in the Greek and OCS text shows that the automatically processed Greek parallel leads to considerable data loss, and that much manual sifting of apparent mismatch examples is necessary to arrive at a preliminary analysis. In a low-resourced historical language such as Old Church Slavonic we cannot afford to work with this amount of noise and data loss. We can use automatic tagging and alignment to ease our workload, but we have to manually post-correct the output.

Oxford University Research Archive

OldSlavNet: A scalable Early Slavic dependency parser trained on modern language data

Author: Hanne Eckhoff
Nilo Pedrazzini
Publication venue: Modern Language Association
Publication date: 01/01/2021

Historical languages are increasingly being modelled computationally. Syntactically annotated texts are often a sine qua non in their modelling, but parsing of pre-modern language varieties faces great data sparsity, intensified by high levels of orthographic variation. In this paper we present a good-quality Early Slavic dependency parser, attained via manipulation of modern Slavic data to resemble the orthography and morphosyntax of pre-modern varieties. The tool can be deployed to expand historical treebanks, which are crucial for data collection and quantification, and beneficial to downstream NLP tasks and historical text mining.

Humanities Commons

Internship report in community pharmacy

Author: Abrams Mitchell
Ackermann Elia
Aepli Noëmi
Aghaei Hamid
Agić Željko
Ahmadi Amir
Ahrenberg Lars
Ajede Chika Kennedy
Aleksandravičiūtė Gabrielė
Alfina Ika
Antonsen Lene
Aplonova Katya
Aquino Angelina
Aragon Carolina
Aranzabe Maria Jesus
Arnardóttir Þórunn
Arutie Gashaw
Arwidarasti Jessica Naraiswari
Asahara Masayuki
Ateyah Luma
Atmaca Furkan
Attia Mohammed
Atutxa Aitziber
Augustinus Liesbeth
Badmaeva Elena
Balasubramani Keerthana
Ballesteros Miguel
Banerjee Esha
Bank Sebastian
Barbu Mititelu Verginica
Basmov Victoria
Batchelor Colin
Bauer John
Bedir Seyyit Talha
Bengoetxea Kepa
Berk Gözde
Berzak Yevgeni
Bhat Irshad Ahmad
Bhat Riyaz Ahmad
Biagetti Erica
Bick Eckhard
Bielinskienė Agnė
Bjarnadóttir Kristín
Blokland Rogier
Bobicev Victoria
Boizou Loïc
Borges Völker Emanuel
Bosco Cristina
Bouma Gosse
Bowman Sam
Boyd Adriane
Brokaitė Kristina
Burchardt Aljoscha
Börstell Carl
Candito Marie
Caron Bernard
Caron Gauthier
Cavalcanti Tatiana
Cebiroğlu Eryiğit Gülşen
Cecchini Flavio Massimiliano
Celano Giuseppe G. A.
Cetin Savas
Chalub Fabricio
Chi Ethan
Cho Yongseok
Choi Jinho
Chun Jayeol
Cignarella Alessandra T.
Cinková Silvie
Collomb Aurélie
Connor Miriam
Courtin Marine
Davidson Elizabeth
de Marneffe Marie-Catherine
de Paiva Valeria
de Souza Elvis
Derin Mehmet Oguz
Diaz de Ilarraza Arantza
Dickerson Carly
Dinakaramani Arawinda
Dione Bamba
Dirix Peter
Dobrovoljc Kaja
Dozat Timothy
Droganova Kira
Dwivedi Puneet
Eckhoff Hanne
Eli Marhaba
Elkahky Ali
Ephrem Binyam
Erina Olga
Erjavec Tomaž
Etienne Aline
Evelyn Wograine
Facundes Sidney
Farkas Richárd
Fernanda Marília
Fernandez Alcalde Hector
Foster Jennifer
Freitas Cláudia
Fujita Kazunori
Gajdošová Katarína
Galbraith Daniel
Garcia Marcos
Garza Sebastian
Gerardi Fabrício Ferraz
Gerdes Kim
Ginter Filip
Goenaga Iakes
Gojenola Koldo
Goldberg Yoav
González Saavedra Berta
Griciūtė Bernadeta
Grioni Matias
Grobol Loïc
Grūzītis Normunds
Guillaume Bruno
Guillot-Barbance Céline
Gärdenfors Moa
Gómez Guinovart Xavier
Gökırmak Memduh
Güngör Tunga
Habash Nizar
Hafsteinsson Hinrik
Hajič jr. Jan
Hajič Jan
Han Na-Rae
Hanifmuti Muhammad Yudistira
Hardwick Sam
Harris Kim
Haug Dag
Heinecke Johannes
Hellwig Oliver
Hennig Felix
Hladká Barbora
Hlaváčová Jaroslava
Hociung Florinel
Hohle Petter
Huber Eva
Hwang Jena
Hà Mỹ Linh
Hämäläinen Mika
Ikeda Takumi
Ingason Anton Karl
Ion Radu
Irimia Elena
Ishola Ọlájídé
Jelínek Tomáš
Johannsen Anders
Juutinen Markus
Jónsdóttir Hildur
Jørgensen Fredrik
K Sarveswaran
Kaasen Andre
Kabaeva Nadezhda
Kahane Sylvain
Kanayama Hiroshi
Kanerva Jenna
Katz Boris
Kayadelen Tolga
Kaşıkara Hüner
Kenney Jessica
Kettnerová Václava
Kirchner Jesse
Klementieva Elena
Kopacewicz Kamil
Korkiakangas Timo
Kotsyba Natalia
Kovalevskaitė Jolanta
Krek Simon
Krishnamurthy Parameswari
Kwak Sookyoung
Köhn Arne
Köksal Abdullatif
Laippala Veronika
Lam Lucia
Lambertino Lorenzo
Lando Tatiana
Larasati Septina Dian
Lavrentiev Alexei
Lee John
Lenci Alessandro
Lertpradit Saran
Leung Herman
Levina Maria
Li Cheuk Ying
Li Josie
Li Keying
Li Yuan
Lim KyungTae
Lindén Krister
Ljubešić Nikola
Loginova Olga
Luthfi Andry
Luukko Mikko
Lyashevskaya Olga
Lynn Teresa
Lê Hồng Phương
Macketanz Vivien
Makazhanov Aibek
Mandl Michael
Manning Christopher
Manurung Ruli
Mareček David
Marheinecke Katrin
Martins André
Martínez Alonso Héctor
Matsuda Hiroshi
Matsumoto Yuji
Mašek Jan
McDonald Ryan
McGuinness Sarah
Mendonça Gustavo
Miekka Niko
Mischenkova Karina
Misirpashayeva Margarita
Missilä Anna
Mititelu Cătălin
Mitrofan Maria
Miyao Yusuke
Mojiri Foroushani AmirHossein
Moloodi Amirsaeid
Montemagni Simonetta
More Amir
Moreno Romero Laura
Mori Keiko Sophie
Mori Shinsuke
Morioka Tomohiko
Moro Shigeki
Mortensen Bjartur
Moskalevskyi Bohdan
Muischnek Kadri
Munro Robert
Murawaki Yugo
Müürisep Kaili
Mărănduc Cătălina
Nainwani Pinkey
Nakhlé Mariam
Navarro Horñiacek Juan Ignacio
Nedoluzhko Anna
Nešpore-Bērzkalne Gunta
Nguyễn Thị Minh Huyền
Nguyễn Thị Lương
Nikaido Yoshihiro
Nikolaev Vitaly
Nitisaroj Rattima
Nivre Joakim
Nourian Alireza
Nurmi Hanna
Ojala Stina
Ojha Atul Kr.
Olúòkun Adédayọ̀
Omura Mai
Onwuegbuzia Emeka
Osenova Petya
Partanen Niko
Pascual Elena
Passarotti Marco
Patejuk Agnieszka
Paulino-Passos Guilherme
Peljak-Łapińska Angelika
Peng Siyao
Perez Cenel-Augusto
Perkova Natalia
Perrier Guy
Petrov Slav
Petrova Daria
Phelan Jason
Piitulainen Jussi
Pirinen Tommi A
Pitler Emily
Plank Barbara
Poibeau Thierry
Ponomareva Larisa
Popel Martin
Pretkalniņa Lauma
Prokopidis Prokopis
Przepiórkowski Adam
Prévost Sophie
Puolakainen Tiina
Pyysalo Sampo
Qi Peng
Rademaker Alexandre
Rama Taraka
Ramasamy Loganathan
Ramisch Carlos
Rashel Fam
Rasooli Mohammad Sadegh
Ravishankar Vinit
Real Livy
Rebeja Petru
Reddy Siva
Rehm Georg
Riabov Ivan
Rießler Michael
Rimkutė Erika
Rinaldi Larissa
Rituma Laura
Rocha Luisa
Romanenko Mykhailo
Rosa Rudolf
Rovati Davide
Roșca Valentin
Rudina Olga
Rueter Jack
Rääbis Andriela
Rögnvaldsson Eiríkur
Rúnarsson Kristján
Sadde Shoval
Safari Pegah
Sagot Benoît
Sahala Aleksi
Saleh Shadi
Salomoni Alessio
Samardžić Tanja
Samson Stephanie
Sanguinetti Manuela
Saulīte Baiba
Sawanakunanon Yanin
Scannell Kevin
Scarlata Salvatore
Schneider Nathan
Schuster Sebastian
Seddah Djamé
Seeker Wolfgang
Seraji Mojgan
Shen Mo
Shimada Atsuko
Shirasu Hiroyuki
Shohibussirri Muh
Sichinava Dmitry
Sigurðsson Einar Freyr
Silveira Aline
Silveira Natalia
Simi Maria
Simionescu Radu
Simkó Katalin
Simov Kiril
Skachedubova Maria
Smith Aaron
Soares-Bastos Isabela
Spadine Carolyn
Steingrímsson Steinþór
Stella Antonio
Straka Milan
Strickland Emmett
Strnadová Jana
Suhr Alane
Sulestio Yogi Lesmana
Sulubacak Umut
Suzuki Shingo
Szántó Zsolt
Särg Dage
Taji Dima
Takahashi Yuta
Tamburini Fabio
Tan Mary Ann C.
Tanaka Takaaki
Tella Samson
Tellier Isabelle
Thomas Guillaume
Torga Liisi
Toska Marsida
Trosterud Trond
Trukhina Anna
Tsarfaty Reut
Tyers Francis
Türk Utku
Uematsu Sumire
Untilov Roman
Urešová Zdeňka
Uria Larraitz
Uszkoreit Hans
Utka Andrius
Vajjala Sowmya
van Niekerk Daniel
van Noord Gertjan
Varga Viktor
Villemonte de la Clergerie Eric
Vincze Veronika
Wakasa Aya
Wallenberg Joel C.
Wallin Lars
Walsh Abigail
Wang Jing Xian
Washington Jonathan North
Wendt Maximilan
Widmer Paul
Williams Seyi
Wirén Mats
Wittern Christian
Woldemariam Tsegay
Wong Tak-sum
Wróblewska Alina
Yako Mary
Yamashita Kayo
Yamazaki Naoki
Yan Chunxiao
Yasuoka Koichi
Yavrumyan Marat M.
Yu Zhuoran
Zahra Shorouq
Zeldes Amir
Zeman Daniel
Zhu Hanzhi
Zhuravleva Anna
Çetinoğlu Özlem
Çöltekin Çağrı
Östling Robert
Özateş Şaziye Betül
Özgür Arzucan
Öztürk Başaran Balkız
Øvrelid Lilja
Čéplö Slavomír
Šimková Mária
Žabokrtský Zdeněk
Publication date: 01/09/2016

Internship report carried out as part of the Integrated Master's degree in Pharmaceutical Sciences, presented to the Faculty of Pharmacy of the University of Coimbra

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

Verbal constructional profiles: reliability, distinction power and practical applications

Author: Berdicevskis Aleksandrs
Eckhoff Hanne
Publication venue: University of Tübingen
Publication date: 04/12/2014

In this paper we explore the notion of constructional profiles (the frequency distribution of a given linguistic item across syntactic environments) from two angles, methodological and applied. We concentrate on verbal constructional profiles, using Russian argument frame data in two different dependency formats. We first test the profiles' stability and distinction power across sample sizes, and then use the profiles in two tasks concerning Russian aspect: to identify the aspectual partner of a given verb and to guess whether a given verb is perfective or imperfective.
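
As a rough illustration of the definition given in the abstract above (not the authors' implementation), a constructional profile can be sketched as the relative frequency of each syntactic environment in which an item occurs. The function name and the argument-frame labels below are hypothetical, chosen only for the example.

```python
from collections import Counter


def constructional_profile(frames):
    """Return the relative frequency of each argument frame observed for one verb.

    `frames` is a list of frame labels, one per attested occurrence.
    An empty list yields an empty profile.
    """
    counts = Counter(frames)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {frame: n / total for frame, n in counts.items()}


# Hypothetical observations: frames attested with a single verb.
observed = ["sub+obj", "sub+obj", "sub", "sub+obl"]
profile = constructional_profile(observed)
```

Two verbs can then be compared by comparing their profiles; which distance measure suits that comparison is left open here.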

ZENODO

Corpus (based) research and Slavic

Author: Divjak Dagmar
Eckhoff Hanne
Publication venue: Brill
Publication date: 03/04/2020

Slavic linguistics has a long history with corpus linguistics in the widest sense of the term. Large collections of authentic texts are a prerequisite for historical linguistics, which is where Slavic linguistics started, and more recently this strand of research has yielded large historical corpora. Slavic l

Oxford University Research Archive

Diachronic Treebanks for Historical Linguistics

Author: Eckhoff Hanne Martine
Luraghi Silvia
Passarotti Marco
Publication venue: Amsterdam
Publication date: 01/01/2020

Over the last few decades, the widespread diffusion of digital technology has increased the availability of primary textual sources, radically changing the everyday life of scholars in the humanities, who are now able to access, query and process a wealth of empirical evidence in ways not possible before. For ancient languages, too, corpora enhanced with increasingly complex layers of metalinguistic information, such as part-of-speech tagging and syntactic annotation (called 'treebanks'), are now available. In particular, diachronic treebanks, which provide data for a language across several of its historical stages, allow for a new approach to diachronic studies of syntactic phenomena where scholars previously had to content themselves with empirical work on a much smaller scale. This volume brings together a set of papers that report research on various diachronic matters supported by evidence from diachronic treebanks. The contents of the papers cover a wide range of languages, including English, French, Russian, Old Church Slavonic, Latin and Ancient Greek.

Archivio Istituzionale della Ricerca - Università degli Studi di Pavia

Linguistics vs. digital editions: The Tromsø Old Russian and OCS Treebank

Author: Berdicevskis Aleksandrs
Eckhoff Hanne Martine
Publication venue: Institute for Literature, Bulgarian Academy of Sciences
Publication date: 01/01/2015

The Tromsø Old Russian and OCS Treebank (TOROT, nestor.uit.no) is, along with its parent treebank, the PROIEL corpus (foni.uio.no), the only existing treebank of Old Church Slavonic (OCS), Old East Slavic and Middle Russian texts. There are other tagged resources, such as the Old Russian subcorpus of the Russian National Corpus and the Manuskript corpus, but none of them, to our knowledge, currently provide syntactic annotation. The TOROT presently contains approximately 160,000 word tokens of fully annotated OCS (Codex Marianus and Codex Suprasliensis), 85,000 word tokens of fully annotated Kiev-era Old East Slavic, and 60,000 word tokens of fully annotated 15th–17th-century Middle Russian. In addition, it contains the Codex Zographensis with automatic and partially hand-corrected morphological annotation and lemmatisation (sections of the Gospels missing in the Codex Marianus also have full syntactic annotation), and the PROIEL version of the Greek Gospels, with which the Codex Marianus and the Codex Zographensis are both aligned at token level (automatically, then hand-corrected).

Munin - Open Research Archive

NORA - Norwegian Open Research Archives

Automatic Identification of Shared Arguments in Verbal Coordinations

Author: Berdicevskis Aleksandrs
Eckhoff Hanne Martine
Publication date: 01/01/2015

We describe automatic conversion of the SynTagRus dependency treebank of Russian to the PROIEL format (with the ultimate purpose of obtaining a single-format diachronic treebank spanning more than a thousand years), focusing on analysis of shared arguments in verbal coordinations. Whether arguments are shared or private is not marked in the SynTagRus native format, but the PROIEL format indicates sharing by means of secondary dependencies. In order to recover missing information and insert secondary dependencies into the converted SynTagRus, we create a simple guessing algorithm based on four probabilistic features: how likely a given argument type is to be shared; how likely an argument in a given position is to be shared; how likely a given verb is to have a given argument; how likely a given verb is to have a given argument frame. Boosted with a few deterministic rules and trained on a small manually annotated sample (346 sentences), the guesser very successfully inserts shared subjects (F-score 0.97), which results in excellent overall performance (F-score 0.92). Non-subject arguments are shared much more rarely, and for them the results are poorer (0.31 for objects; 0.22 for obliques). We show, however, that there are strong reasons to believe that performance can be increased if a larger training sample is used and the guesser gets to see enough positive examples. Apart from describing a useful practical solution, the paper also provides quantitative data about and offers non-trivial insights into Russian verbal coordination

Munin - Open Research Archive

NORA - Norwegian Open Research Archives

Why doing Russian and Slavic linguistics matters

Author: Eckhoff Hanne
Fortuin Egbert
Sonnenhauser Barbara
Publication venue: Springer Science and Business Media LLC
Publication date: 01/04/2021

With the journal Russian Linguistics entering the third decade of the twenty-first century and a new team of editors, assisted by a new editorial board of renowned experts, taking over, it is a good time to look back and to the future.

ZORA

Special issue: “Ukrainian Linguistics”

Author: Eckhoff Hanne
Fortuin Egbert
Sonnenhauser Barbara
Publication venue: Springer Science and Business Media LLC
Publication date: 01/11/2022

Russia’s war against Ukraine threatens its people and its existence as an independent state. Russia’s war threatens Ukraine’s cultural heritage by denying its history and its language. Language is being used as a weapon. In the face of this aggression, a journal such as Russian Linguistics cannot remain silent. As linguists we are committed to the principles of science, based on empirical observations. For us as Slavic linguists, it is a truism to consider Russian as one among a wealth of Slavic languages and as a member of the East Slavic branch, together with Ukrainian and Belarusian. Recognizing this diversity is what the journal’s subtitle stands for: International Journal for the Study of Russian and other Slavic Languages.

ZORA