29 research outputs found

Memory-Based Shallow Parsing

Author: Erik F. Tjong Kim Sang
Publication date: 01/01/2002

We present memory-based learning approaches to shallow parsing and apply these to five tasks: base noun phrase identification, arbitrary base phrase recognition, clause detection, noun phrase parsing and full parsing. We use feature selection techniques and system combination methods for improving the performance of the memory-based learner. Our approach is evaluated on standard data sets and the results are compared with those of other systems. This reveals that our approach works well for base phrase identification while its application towards recognizing embedded structures leaves some room for improvement.

arXiv.org e-Print Archive

CiteSeerX

Institutional Repository Universiteit Antwerpen

Tilburg University Repository

Introduction to the CoNLL-2002 Shared Task: Language-Independent Named Entity Recognition

Author: Erik F. Tjong Kim Sang
Publication date: 01/01/2002

We describe the CoNLL-2002 shared task: language-independent named entity recognition. We give background information on the data sets and the evaluation method, present a general overview of the systems that have taken part in the task and discuss their performance. Comment: 4 pages

arXiv.org e-Print Archive

Institutional Repository Universiteit Antwerpen

Tilburg University Repository

Introduction to the CoNLL-2000 Shared Task: Chunking

Author: Sabine Buchholz
Erik F. Tjong Kim Sang
Publication date: 01/01/2000

We describe the CoNLL-2000 shared task: dividing text into syntactically related non-overlapping groups of words, so-called text chunking. We give background information on the data sets, present a general overview of the systems that have taken part in the shared task and briefly discuss their performance. Comment: 6 pages

arXiv.org e-Print Archive

Institutional Repository Universiteit Antwerpen

Tilburg University Repository

Introduction to the CoNLL-2001 Shared Task: Clause Identification

Author: Hervé Déjean
Erik F. Tjong Kim Sang
Publication date: 01/01/2001

We describe the CoNLL-2001 shared task: dividing text into clauses. We give background information on the data sets, present a general overview of the systems that have taken part in the shared task and briefly discuss their performance.

arXiv.org e-Print Archive

Institutional Repository Universiteit Antwerpen

Tilburg University Repository

Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition

Author: Fien De Meulder
Erik F. Tjong Kim Sang
Publication date: 01/01/2003

We describe the CoNLL-2003 shared task: language-independent named entity recognition. We give background information on the data sets (English and German) and the evaluation method, present a general overview of the systems that have taken part in the task and discuss their performance.

arXiv.org e-Print Archive

CiteSeerX

Tilburg University Repository

Utilizing a transparency-driven environment toward trusted automatic genre classification: A case study in journalism history

Author: Bilgin A. (Aysenur)
Broersma M. (Marcel)
Harbers F. (Frank)
Hollink L. (Laura)
Ossenbruggen J.R. (Jacco) van
Smeenk K. (Kim)
Tjong Kim Sang E. (Erik)
Publication date: 01/10/2018

CWI's Institutional Repository

Towards Transparent Linguistic Analysis of Dutch Newspaper Article Genres using Machine Learning

Author: Bilgin A. (Aysenur)
Broersma M. (Marcel)
Harbers F. (Frank)
Hollink L. (Laura)
Klaver T. (Tom)
Ossenbruggen J.R. (Jacco) van
Smeenk K. (Kim)
Tjong Kim Sang E. (Erik)
Publication date: 31/01/2019

Systematic study of genre in newspapers sheds light on the development of journalism discourse. The genre conventions that can be discerned in a newspaper text signal the underlying discursive norms and practices of journalism as a profession. Historical newspapers are increasingly becoming available thanks to digital newspaper archives (in the Netherlands available through Delpher.nl), providing the opportunity for large-scale empirical research. However, the digital archives do not contain fine-grained genre information that is required for this purpose. Therefore, we use machine learning to automatically assign genre labels to newspaper articles. Machine learning facilitates substantial improvements to the outcomes of existing research by providing increased amounts of enriched data. However, the decision-making process of the machine learning pipeline needs to be verified. Our previous findings (Bilgin et al., 2018) show that accuracy scores alone are not enough to assess the performance of these pipelines and that making an informed choice not only empowers optimal study of the historical development of genre, but also increases the trustworthiness of the results. This work shows that employing a transparent approach driven by model interpretability facilitates fair comparison as well as validation of the underlying decision-making criteria of the machine learning pipelines. The criteria are presented in the form of important features, creating insights on interactions between genre-related linguistic features and bag-of-words features.

CWI's Institutional Repository