Introduction to the CoNLL-2001 Shared Task: Clause Identification
We describe the CoNLL-2001 shared task: dividing text into clauses. We give
background information on the data sets, present a general overview of the
systems that have taken part in the shared task and briefly discuss their
performance.
Introduction to the CoNLL-2000 Shared Task: Chunking
We describe the CoNLL-2000 shared task: dividing text into syntactically
related non-overlapping groups of words, so-called text chunking. We give
background information on the data sets, present a general overview of the
systems that have taken part in the shared task and briefly discuss their
performance.
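Text chunking of this kind is commonly represented with BIO tags, where B-X opens a chunk of type X, I-X continues it, and O marks words outside any chunk. A minimal sketch (illustrative code, not the shared-task software) of recovering chunks from such tags:

```python
# Illustrative sketch: convert a BIO-tagged sentence into non-overlapping
# chunks. B-X opens a chunk of type X, I-X continues it, O is outside.

def bio_to_chunks(tokens, tags):
    """Convert parallel token/BIO-tag lists into (type, phrase) chunks."""
    chunks, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                chunks.append(current)
            current = (tag[2:], [token])
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(token)
        else:  # an O tag or an inconsistent I- tag closes the open chunk
            if current:
                chunks.append(current)
            current = None
    if current:
        chunks.append(current)
    return [(label, " ".join(words)) for label, words in chunks]

tokens = ["He", "reckons", "the", "current", "deficit", "will", "narrow"]
tags = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "B-VP", "I-VP"]
print(bio_to_chunks(tokens, tags))
```

The example sentence and tags follow the conventions of the shared-task data; the function itself is a toy decoder, not an evaluation tool.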
Memory-Based Shallow Parsing
We present memory-based learning approaches to shallow parsing and apply
these to five tasks: base noun phrase identification, arbitrary base phrase
recognition, clause detection, noun phrase parsing and full parsing. We use
feature selection techniques and system combination methods for improving the
performance of the memory-based learner. Our approach is evaluated on standard
data sets and the results are compared with those of other systems. This reveals
that our approach works well for base phrase identification, while its
application to recognizing embedded structures leaves some room for
improvement.
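Memory-based learning classifies a new instance by looking up its nearest neighbours among all stored training instances. A toy sketch under that assumption (a drastic simplification of what memory-based learners such as TiMBL implement; the feature layout is illustrative only):

```python
# Hedged sketch of memory-based tagging: store every training instance
# and tag new instances by majority vote among the k nearest neighbours,
# here with a simple feature-overlap distance. Features and tags are toy
# values, not the paper's actual feature set.

def overlap_distance(a, b):
    """Count mismatching feature values between two instances."""
    return sum(x != y for x, y in zip(a, b))

def knn_tag(instance, memory, k=1):
    """Return the majority tag among the k nearest stored instances."""
    nearest = sorted(memory, key=lambda m: overlap_distance(instance, m[0]))[:k]
    tags = [tag for _, tag in nearest]
    return max(set(tags), key=tags.count)

# memory: ((previous word, focus word, next word), chunk tag)
memory = [
    (("the", "cat", "sat"), "I-NP"),
    (("cat", "sat", "on"), "B-VP"),
    (("sat", "on", "the"), "B-PP"),
]
print(knn_tag(("the", "dog", "sat"), memory))  # nearest match: ("the", "cat", "sat")
```

With k=1 this is plain nearest-neighbour lookup; real memory-based learners add feature weighting and more refined distance metrics.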
Automatic identification and translation of multiword expressions
A thesis submitted in partial fulfilment of the requirements of the
University of Wolverhampton for the degree of Doctor of Philosophy.
Multiword Expressions (MWEs) belong to a class of phraseological phenomena
that is ubiquitous in the study of language. They are heterogeneous
lexical items consisting of more than one word and feature lexical, syntactic,
semantic and pragmatic idiosyncrasies. Scholarly research on MWEs benefits
both natural language processing (NLP) applications and end users.
This thesis involves designing new methodologies to identify and translate
MWEs. In order to deal with MWE identification, we first develop datasets
of annotated verb-noun MWEs in context. We then propose a method which
employs word embeddings to disambiguate between literal and idiomatic usages
of the verb-noun expressions. The existence of expression types with various
idiomatic and literal distributions leads us to re-examine their modelling and
evaluation. We propose a type-aware train and test splitting approach to
prevent models from overfitting and avoid misleading evaluation results.
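The splitting idea can be sketched as follows: partition by expression type, so that every instance of a given MWE type lands on one side of the split and models are never tested on types they have memorised. This is an assumed interface for illustration, not the thesis code:

```python
# Illustrative sketch of a type-aware train/test split: all instances of
# a given MWE type go entirely to either the train or the test side.
import random

def type_aware_split(instances, test_ratio=0.5, seed=0):
    """instances: list of (expression_type, sentence) pairs."""
    types = sorted({t for t, _ in instances})
    random.Random(seed).shuffle(types)
    test_types = set(types[: int(len(types) * test_ratio)])
    train = [x for x in instances if x[0] not in test_types]
    test = [x for x in instances if x[0] in test_types]
    return train, test

data = [
    ("take place", "The meeting took place."),
    ("take place", "It takes place daily."),
    ("make sense", "That makes sense."),
    ("lose face", "He lost face."),
]
train, test = type_aware_split(data)
# By construction, no expression type appears on both sides.
assert not {t for t, _ in train} & {t for t, _ in test}
```

A naive random split over instances would let both occurrences of "take place" straddle the boundary, which is exactly the leakage this scheme rules out.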
Identification of MWEs in context can be modelled with sequence tagging
methodologies. To this end, we devise a new neural network architecture,
which combines convolutional neural networks and long short-term memory
networks with an optional conditional random field layer on top. We
conduct extensive evaluations on several languages, demonstrating better
performance than state-of-the-art systems. Experiments show that the model's
ability to generalise to unseen MWEs is significantly better than that of previous systems.
In order to find translations for verb-noun MWEs, we propose a bilingual
distributional similarity approach derived from a word embedding model that
supports arbitrary contexts. The technique is devised to extract translation
equivalents from comparable corpora, which are an alternative resource to
costly parallel corpora. We finally conduct a series of experiments to investigate
the effects of size and quality of comparable corpora on automatic
extraction of translation equivalents.
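The core ranking step of such a distributional approach can be sketched as follows, assuming source- and target-language embeddings already live in one shared space (for instance via a learned cross-lingual mapping); the expressions and vectors below are toy values, not extracted data.

```python
# Hedged sketch: rank candidate target-language expressions by cosine
# similarity to a source expression in a shared embedding space.
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def best_translation(source_vec, target_vecs):
    """Pick the target-language expression closest to the source vector."""
    return max(target_vecs, key=lambda w: cosine(source_vec, target_vecs[w]))

source = [0.9, 0.1, 0.2]                      # toy vector, e.g. for "take place"
candidates = {
    "avoir lieu": [0.85, 0.15, 0.25],         # close in the shared space
    "prendre place": [0.1, 0.9, 0.3],
}
print(best_translation(source, candidates))   # → "avoir lieu"
```

In the comparable-corpora setting described above, the candidate set would come from the target side of the corpus rather than from a bilingual dictionary, which is what makes the quality and size of the corpora matter.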