OpenTagger: A flexible and user-friendly linguistic tagger

Abstract

Linguistic annotation adds valuable information to a corpus. Annotated corpora are highly useful for linguists because they widen the range of linguistic phenomena that can be registered, categorised and retrieved. They also matter for machines: Natural Language Processing applications depend on well-annotated data (e.g. Imran, Mitra and Castillo 2016), and some machine learning classifiers use annotated data to train or test new language annotation tools, among other uses. In this regard, Pustejovsky and Stubbs (2012) describe the stages involved in building annotated corpora to train machine learning algorithms. This paper describes OpenTagger, a new linguistic tagger that allows users to add any type of information to the paragraphs, sentences or words that make up a text. OpenTagger is characterised by its high usability and flexibility. It is a web application that allows users to annotate texts manually, either with their own predefined tag set or by creating a new one. It thus offers an answer to any need for a tailor-made annotation system. The tag set may include nested categories, and multiple layers of annotation are possible. The annotation process is straightforward and offers two options: i) selecting text and then tagging it; ii) selecting a tag and then annotating as much text as needed. OpenTagger also includes a search box to query the text and retrieve relevant sections for tagging. In sum, the open character of this tool and its user-friendliness extend the benefits of annotation to a wider variety of research questions. OpenTagger differs from other well-known taggers such as NooJ (Silberztein, 2005) in its simplicity and web access, as it is not specialised for grammar construction or other complex processes. Potential users range from novice linguistics researchers to experts.
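As a rough illustration of the features described above (a nested tag set, several annotation layers, the two annotation options and the search box), the following Python sketch models them as stand-off annotations over character offsets. It is not OpenTagger's implementation: the class names `Tag`, `Annotation` and `AnnotatedText`, the `/`-separated tag paths, the layer names and the sample sentence are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tag:
    """A tag that may contain nested sub-tags (hypothetical model)."""
    name: str
    children: List["Tag"] = field(default_factory=list)

    def paths(self, prefix=""):
        """Yield the full path of this tag and of every nested tag."""
        path = f"{prefix}/{self.name}" if prefix else self.name
        yield path
        for child in self.children:
            yield from child.paths(path)


@dataclass(frozen=True)
class Annotation:
    """A stand-off annotation: a character span, a tag path and a layer."""
    start: int
    end: int
    tag: str
    layer: str


class AnnotatedText:
    """A text plus its annotations (assumed structure, for illustration)."""

    def __init__(self, text, tagset):
        self.text = text
        self.valid_tags = {p for tag in tagset for p in tag.paths()}
        self.annotations = []

    def tag_span(self, start, end, tag, layer="default"):
        """Option i: select a stretch of text, then assign it a tag."""
        if tag not in self.valid_tags:
            raise ValueError(f"unknown tag: {tag}")
        if not 0 <= start < end <= len(self.text):
            raise ValueError("span outside the text")
        self.annotations.append(Annotation(start, end, tag, layer))

    def tag_many(self, tag, spans, layer="default"):
        """Option ii: select a tag, then apply it to several spans."""
        for start, end in spans:
            self.tag_span(start, end, tag, layer)

    def search(self, query):
        """Return the (start, end) span of every occurrence of query."""
        spans, pos = [], self.text.find(query)
        while pos != -1:
            spans.append((pos, pos + len(query)))
            pos = self.text.find(query, pos + 1)
        return spans


# A user-defined tag set with one level of nesting.
tagset = [Tag("modality", [Tag("epistemic"), Tag("deontic")])]
doc = AnnotatedText("You must go. She must be tired.", tagset)

# Option i: tag one selected span on a "semantics" layer.
doc.tag_span(4, 8, "modality/deontic", layer="semantics")

# Option ii combined with search: tag every hit on a second layer.
doc.tag_many("modality", doc.search("must"), layer="lexis")
```

Keeping annotations separate from the text, as spans with a layer label, is one simple way to let several layers overlap on the same words without altering the text itself.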
Finally, integration with the corpus analysis software ACTRES Corpus Manager (ACM; Sanjurjo-González, 2017) is planned. Together with ACM, OpenTagger will make the process of building and querying custom annotated corpora more straightforward.

ACTRES, TRALIMA/ITZULIK, GIU19/067, Gobierno Vasco IT1209/1
