An exploratory research on grammar checking of Bangla sentences using statistical language models
N-gram based language models are popular and extensively used statistical methods for solving various natural language processing problems, including grammar checking. Smoothing is one of the most effective techniques used in building a language model to deal with the data sparsity problem, and Kneser-Ney is one of the most prominent and successful smoothing techniques for language modelling. In our previous work, we presented a Witten-Bell smoothing based language modelling technique for checking the grammatical correctness of Bangla sentences, which showed promising results and outperformed previous methods. In this work, we propose an improved method using a Kneser-Ney smoothed n-gram language model for grammar checking and perform a comparative performance analysis between the Kneser-Ney and Witten-Bell smoothing techniques for the same purpose. We also provide an improved technique for calculating the optimum threshold, which further enhances the results. Our experimental results show that Kneser-Ney outperforms Witten-Bell as a smoothing technique when used with n-gram LMs for checking the grammatical correctness of Bangla sentences.
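The smoothing-and-threshold idea in the abstract above can be illustrated with a minimal, self-contained sketch: an interpolated Kneser-Ney bigram model scores a sentence, and a log-probability threshold decides whether it is flagged as ungrammatical. The training sentences, discount value, and threshold below are illustrative stand-ins, not the paper's actual data or parameters.

```python
import math
from collections import Counter

def train_kn(sentences, discount=0.75):
    """Collect the counts needed for an interpolated Kneser-Ney bigram model."""
    big = Counter()          # bigram counts c(u, w)
    ctx = Counter()          # context counts c(u)
    for sent in sentences:
        toks = ["<s>"] + sent + ["</s>"]
        for u, w in zip(toks, toks[1:]):
            big[(u, w)] += 1
            ctx[u] += 1
    follow = Counter(u for (u, _) in big)   # N1+(u, .): distinct followers of u
    cont = Counter(w for (_, w) in big)     # N1+(., w): distinct contexts of w
    total_types = len(big)                  # total number of bigram types

    def prob(u, w):
        p_cont = cont[w] / total_types      # continuation probability
        if ctx[u] == 0:                     # unseen context: back off fully
            return p_cont
        disc = max(big[(u, w)] - discount, 0) / ctx[u]
        lam = discount * follow[u] / ctx[u] # interpolation weight
        return disc + lam * p_cont

    return prob

def sentence_logprob(prob, sent):
    toks = ["<s>"] + sent + ["</s>"]
    return sum(math.log(prob(u, w)) for u, w in zip(toks, toks[1:]))

train = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
prob = train_kn(train)

def is_grammatical(sent, threshold=-1.5):
    """Flag a sentence by comparing per-token log-probability to a threshold."""
    avg = sentence_logprob(prob, sent) / (len(sent) + 1)
    return avg >= threshold
```

With this toy corpus, `is_grammatical(["the", "cat", "sat"])` passes while the scrambled `["cat", "the", "sat"]` falls below the threshold; the paper's contribution includes choosing that threshold automatically rather than by hand.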
Mii *eai leat gal vuollánan – Vi *ha neimen ikke gitt opp
Machine learning is the dominant paradigm in natural language processing nowadays. It requires vast amounts of manually annotated or synthetically generated text data. In the GiellaLT infrastructure, on the other hand, we have worked with rule-based methods, where the linguists have full control over the development of the tools. In this article we debunk the myth that machine learning is cheaper than a rule-based approach by showing how much work lies behind data generation, whether via corpus annotation or via creating tools that automatically mark up the corpus. Earlier we have shown that the correction of grammatical errors, in particular compound errors, benefits from hybrid methods. Agreement errors, on the other hand, depend to a higher degree on the larger grammatical context. Our experiments show that machine learning methods for this error type, even when supplemented by rule-based methods generating massive amounts of data, cannot compete with the state-of-the-art rule-based approach.
Mii *eai leat gal vuollánan – Vi *ha neimen ikke gitt opp: En hybrid grammatikkontroll for å rette kongruensfeil
Machine learning is the dominant paradigm in natural language processing nowadays. It requires vast amounts of manually annotated or synthetically generated text data. In the GiellaLT infrastructure, on the other hand, we have worked with rule-based methods, where the linguists have full control over the development of the tools. In this article we debunk the myth that machine learning is cheaper than a rule-based approach by showing how much work lies behind data generation, whether via corpus annotation or via creating tools that automatically mark up the corpus. Earlier we have shown that the correction of grammatical errors, in particular compound errors, benefits from hybrid methods. Agreement errors, on the other hand, depend to a higher degree on the larger grammatical context. Our experiments show that machine learning methods for this error type, even when supplemented by rule-based methods generating massive amounts of data, cannot compete with the state-of-the-art rule-based approach.

Machine learning techniques that make no use of linguistic expertise dominate language technology today. This requires manually annotating a large amount of data in advance. In the GiellaLT infrastructure, by contrast, we have worked with rule-based methods in which the linguist controls how the tools work. The choice of method is not only for technical reasons: growing knowledge of Sámi grammar, quality assurance, and controllability (the tools do what they are supposed to do, also by human standards) underlie the preference for working rule-based. In this article we attempt to dispel the myth that machine learning is cheaper than rule-based methods. Nevertheless, we believe that machine learning methods can be useful where we want broader coverage of error correction. We show that machine learning models with access to only small amounts of data (in this case, for small languages) depend on good rule-based tools as a substitute for manual annotation.
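The cost comparison above hinges on how training data is produced. A rule-based corruption script of the kind the article describes, which takes correct sentences and injects agreement errors to yield (erroneous, correct) training pairs, can be sketched as follows. The tiny English verb table and example sentences are hypothetical placeholders; GiellaLT's actual tools operate on North Sámi morphology via finite-state analysers.

```python
# Hypothetical singular -> plural verb-form table used to inject agreement errors.
SG_TO_PL = {"is": "are", "has": "have", "runs": "run"}
PL_TO_SG = {v: k for k, v in SG_TO_PL.items()}

def corrupt(tokens):
    """Swap the first verb form found, producing one synthetic agreement error.

    Returns (erroneous_tokens, correct_tokens), or None if no verb matched.
    """
    for i, tok in enumerate(tokens):
        swapped = SG_TO_PL.get(tok) or PL_TO_SG.get(tok)
        if swapped:
            bad = tokens[:i] + [swapped] + tokens[i + 1:]
            return bad, tokens
    return None

def make_training_pairs(corpus):
    """Corrupt every sentence that contains a known verb form."""
    pairs = [corrupt(sent) for sent in corpus]
    return [p for p in pairs if p is not None]

corpus = [["the", "dog", "runs"], ["they", "have", "time"]]
pairs = make_training_pairs(corpus)
# pairs[0] == (["the", "dog", "run"], ["the", "dog", "runs"])
```

The point of the article is precisely that building and validating such generation rules for a morphologically rich language is itself substantial linguistic work, not a free shortcut around annotation.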
Erroreak automatikoki detektatzeko tekniken azterlana eta euskararentzako aplikazioak
In this article, we study the techniques used for detecting errors in Natural Language Processing (NLP). We classify the techniques according to their approach (symbolic or empirical) and then describe them in depth. Following that, we describe the systems we have developed for detecting syntactic errors in Basque, using the underlying technique as the criterion for classifying those systems, and illustrating each with examples.
Automatic annotation of error types for grammatical error correction
Grammatical Error Correction (GEC) is the task of automatically detecting and correcting grammatical errors in text. Although previous work has focused on developing systems that target specific error types, the current state of the art uses machine translation to correct all error types simultaneously. A significant disadvantage of this approach is that machine translation does not produce annotated output and so error type information is lost. This means we can only evaluate a system in terms of overall performance and cannot carry out a more detailed analysis of different aspects of system performance.
In this thesis, I develop a system to automatically annotate parallel original and corrected sentence pairs with explicit edits and error types. In particular, I first extend the Damerau-Levenshtein alignment algorithm to make use of linguistic information when aligning parallel sentences, and supplement this alignment with a set of merging rules to handle multi-token edits. The output from this algorithm surpasses other edit extraction approaches in terms of approximating human edit annotations and is the current state of the art. Having extracted the edits, I next classify them according to a new rule-based error type framework that depends only on automatically obtained linguistic properties of the data, such as part-of-speech tags. This framework was inspired by existing frameworks, and human judges rated the appropriateness of the predicted error types as ‘Good’ (85%) or ‘Acceptable’ (10%) in a random sample of 200 edits. The whole system is called the ERRor ANnotation Toolkit (ERRANT) and is the first toolkit capable of automatically annotating parallel sentences with error types.
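The align-then-classify pipeline described above can be approximated in a few lines. The sketch below uses Python's standard-library difflib as a simplified stand-in for the thesis's linguistically informed Damerau-Levenshtein alignment, and a toy lookup table in place of its POS-based classification rules; it shows the shape of the output (edit spans plus coarse types), not ERRANT's actual behaviour.

```python
import difflib

# Toy stand-in for POS-driven error typing; ERRANT derives types from
# automatically obtained linguistic properties such as POS tags instead.
TOY_TYPES = {("are", "is"): "VERB:SVA", ("a", "an"): "DET"}

def extract_edits(orig, corr):
    """Align two token lists and return (orig_span, corr_span, type) edits."""
    edits = []
    sm = difflib.SequenceMatcher(None, orig, corr)
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            continue                      # unchanged spans are not edits
        o, c = orig[i1:i2], corr[j1:j2]
        etype = TOY_TYPES.get((" ".join(o), " ".join(c)), "OTHER")
        edits.append((o, c, etype))
    return edits

edits = extract_edits(["This", "are", "a", "test"], ["This", "is", "a", "test"])
# edits == [(["are"], ["is"], "VERB:SVA")]
```

The thesis's contribution lies exactly where this sketch cuts corners: the alignment cost function uses linguistic information, merging rules group multi-token edits, and the type rules generalise beyond a fixed lookup.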
I demonstrate the value of ERRANT by applying it to the system output produced by the participants of the CoNLL-2014 shared task, and carry out a detailed error type analysis of system performance for the first time. I also develop a simple language model based approach to GEC that does not require annotated training data, and show how it can be improved using ERRANT error types.
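The "simple language model based approach to GEC" mentioned at the end can be sketched as: generate single-substitution candidates from a confusion set and keep the one the language model scores highest, with no annotated training data involved. The confusion sets, corpus, and crude add-one bigram-count scorer below are illustrative, not the thesis's actual model.

```python
import math
from collections import Counter

# Illustrative confusion sets; real systems use larger, curated lists.
CONFUSION = {"their": ["there"], "there": ["their"], "is": ["are"], "are": ["is"]}

def bigram_counts(sentences):
    """Count bigrams over sentence-padded token lists."""
    counts = Counter()
    for sent in sentences:
        toks = ["<s>"] + sent + ["</s>"]
        counts.update(zip(toks, toks[1:]))
    return counts

def score(counts, sent):
    """Crude log-score: sum of log(count + 1) over the sentence's bigrams."""
    toks = ["<s>"] + sent + ["</s>"]
    return sum(math.log(counts[bg] + 1) for bg in zip(toks, toks[1:]))

def correct(counts, sent):
    """Try each single confusion-set substitution; keep the best-scoring one."""
    candidates = [sent]
    for i, tok in enumerate(sent):
        for alt in CONFUSION.get(tok, []):
            candidates.append(sent[:i] + [alt] + sent[i + 1:])
    return max(candidates, key=lambda c: score(counts, c))

train = [["there", "is", "a", "cat"], ["there", "is", "a", "dog"],
         ["they", "are", "happy"]]
counts = bigram_counts(train)
fixed = correct(counts, ["their", "is", "a", "cat"])
# fixed == ["there", "is", "a", "cat"]
```

Running ERRANT over the difference between input and output of such a system is what lets the thesis report which error types the language model actually helps with.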
DEVELOPING A GRAMMAR CHECKER FOR SWEDISH
A grammar checker for Swedish, launched on the market as Grammatifix, was developed at Lingsoft in 1997-1999. This paper first gives a brief background of grammar checking projects for the Nordic languages, with an emphasis on Swedish. Then, the concept and definition of a grammar checker in general is discussed, followed by an overview of the starting points and limitations that Lingsoft had in setting up the Grammatifix development project. After this, the initial product development process is described, leading to an overview of the error types presently covered by Grammatifix. The error treatment scheme in Grammatifix is presented, with a focus on its relationship with the error detection rules. Finally, the error types included in Grammatifix are compared to those of two other known projects, namely SCARRIE and Granska.