8 research outputs found

    On the development of a tagset for Northern Sotho with special reference to the issue of standardisation

    No full text
    Working with corpora in the South African Bantu languages has until now been limited to the use of raw corpora. Such corpora, however, have limited functionality, so the next logical step for any NLP application is the development of software for the automatic tagging of electronic texts. The development of a tagset is one of the first steps in corpus annotation. The authors of this article argue that the design of a tagset cannot be isolated from its purpose, or from the place of the tagset and its design within the bigger picture of the architecture of corpus annotation. Usage-related aspects therefore feature prominently in the design of the tagset for Northern Sotho. It is explained why the proposed tagset is biased towards human readability rather than machine readability; the choice of a stochastic tagger is motivated, and the relationship between tokenising, tagging, morphological analysis and parsing is discussed. In order to account, at least to some extent, for the morphological complexity of Northern Sotho at the tagging level, a multilevel annotation is opted for: the first level comprises obligatory information and the second optional and recommended information. Finally, aspects of standardisation are considered against the background of reuse, the sharing of resources, and possible adaptation for use by other disjunctively written South African Bantu languages. It is not the aim of this article to evaluate the results of any tagging procedure using the proposed tagset; it only describes the design and motivates the choices made with regard to the tagset design. However, an evaluation is in progress and results will be published in the near future (cf. Faaß et al., s.a.).
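    The two-level scheme described in the abstract (an obligatory first level plus an optional, recommended second level) can be sketched as a small data structure. This is a hypothetical illustration only: the tag labels and rendering convention below are invented for the example and are not drawn from the actual Northern Sotho tagset.

    ```python
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class AnnotatedToken:
        """A token with obligatory level-1 and optional level-2 annotation."""
        surface: str                  # the token as it appears in the text
        level1: str                   # obligatory: coarse part-of-speech tag
        level2: Optional[str] = None  # optional: recommended finer-grained info

        def render(self) -> str:
            # Emit a human-readable tag string; level 2 only when present.
            tag = f"{self.surface}/{self.level1}"
            return tag + (f".{self.level2}" if self.level2 else "")

    # Invented tags for illustration: a class-2 concord and a bare verb tag.
    tokens = [
        AnnotatedToken("ba", "CONC", "cl2"),
        AnnotatedToken("reka", "V"),
    ]
    print(" ".join(t.render() for t in tokens))  # → ba/CONC.cl2 reka/V
    ```

    Making the second level optional keeps the annotation human-readable while still allowing richer morphological detail where an annotator chooses to supply it.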

    Die ontwikkeling van 'n stel annoteringsmerkers vir Noord-Sotho, met spesiale verwysing na standaardiseringsaangeleenthede [The development of a set of annotation tags for Northern Sotho, with special reference to standardisation matters]

    No full text
    Working with corpora in the South African Bantu languages has until now been limited to the mining of raw corpora. The usability of this type of corpus is, however, limited. The next logical step in any natural language processing application is therefore the development of software for automatic text annotation. The development of a set of annotation tags is one of the first steps in corpus annotation. The authors of this article argue that the design of a tagset is directly related to the purpose of such a set, and to its position within the larger framework of the architecture of corpus annotation. Usage aspects are therefore central to the design of a tagset for Northern Sotho. It is explained why this set provides for human readability rather than machine readability; the choice of a stochastic tagger is motivated, and the relationship between tokenisation, annotation, and morphological and syntactic analysis is discussed. In order to provide, at the annotation level, at least partially for the morphological complexity of Northern Sotho, a multilevel annotation was chosen in which the first annotation level contains obligatory information and the second level optional and recommended information. Finally, aspects of standardisation are considered against the background of reusability, the sharing of resources and possible adaptation for use by other disjunctively written South African Bantu languages. It is not the aim of this article to evaluate any annotation process in which this set of annotation tags is used; it only describes the design and motivates the choices made during the design of the tagset. An evaluation is currently under way and the results will be published in Faaß et al. (s.a.).

    Multi-Domain Adapted Machine Translation Using Unsupervised Text Clustering

    No full text
    Domain adaptation in machine translation means taking a machine translation system that is restricted to a specific context and enabling it to translate text from a different domain. The paper presents a two-step domain adaptation strategy: first, unlabelled training material is exploited through an unsupervised algorithm, the Self-Organizing Map, to create auxiliary language models; these models are then included dynamically in a machine translation pipeline.
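    The clustering step described above can be sketched as follows: a tiny one-dimensional Self-Organizing Map groups documents by bag-of-words similarity, and a unigram frequency table stands in for the per-cluster auxiliary language model. This is a toy illustration under simplifying assumptions (two map nodes, neighbourhood update omitted), not the paper's actual implementation.

    ```python
    import numpy as np
    from collections import Counter

    # Toy corpus: two "medical" and two "legal" documents.
    docs = [
        "the patient received a dose of the drug",
        "the drug dose was reduced for the patient",
        "the court ruled the contract was void",
        "the judge ruled on the contract dispute",
    ]
    vocab = sorted({w for d in docs for w in d.split()})
    X = np.array([[d.split().count(w) for w in vocab] for d in docs], float)
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # length-normalise vectors

    rng = np.random.default_rng(0)
    nodes = rng.normal(size=(2, len(vocab)))        # two SOM nodes on a 1-D map
    for t in range(50):                             # online SOM training
        lr = 0.5 * (1 - t / 50)                     # decaying learning rate
        for x in X:
            bmu = int(np.argmin(((nodes - x) ** 2).sum(axis=1)))  # best match
            nodes[bmu] += lr * (x - nodes[bmu])     # pull winner toward input
            # (with only 2 nodes, the neighbourhood update adds little,
            #  so this degenerates to online k-means)

    # Assign each document to its best-matching node and build one unigram
    # frequency table ("language model") per cluster.
    clusters = [int(np.argmin(((nodes - x) ** 2).sum(axis=1))) for x in X]
    lms = {c: Counter(w for d, cl in zip(docs, clusters) if cl == c
                      for w in d.split()) for c in set(clusters)}
    print(clusters)
    ```

    At translation time, an input sentence would be mapped to its best-matching node in the same way, and the corresponding auxiliary language model would then be weighted into the pipeline dynamically.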

    Grundlagen des Investitionscontrollings [Fundamentals of Investment Controlling]

    No full text