Technologies in computerized lexicography
Since the early eighties, computer technology has become increasingly relevant to lexicography. Computer science will probably not be the only technological discipline with implications for future computerized lexicography. Some developments in the fields of language technology, information technology and knowledge engineering may support lexicographical practice and enhance the quality of the resulting dictionary. The present paper discusses how the analysis and interpretation of electronic corpus data by the lexicographer may be improved by automatic linguistic analysis, by better access to the corpus, and by more flexible communication with the computer system. As a frame of reference, the state of the art in computerized lexicography is first indicated through a concise discussion of three projects at the Institute for Dutch Lexicology (INL), considered in an international context: the conversion of the Woordenboek der Nederlandsche Taal (WNT, Dictionary of the Dutch Language Based on Historical Principles) to electronic form, the compilation of the Vroegmiddelnederlands Woordenboek (Dictionary of Early Middle Dutch) in a computerized lexicographer's workbench, and the INL Taalbank (INL Language Database). Although the topic of this paper is technology, the focus is on functional rather than technical aspects of computerized lexicography.
Keywords: computerized lexicography, electronic dictionary, electronic text corpus, lexicographer's workbench, integrated language database, automatic linguistic analysis, information retrieval, user interface
A 38 million words Dutch text corpus and its users
The use of text corpora has increased considerably in the past few years, not only in the field of lexicography but also in computational linguistics and language technology. Consequently, corpus data and expertise developed by lexicographical institutions have gained a broader scope of application. In the European context this has led to a revised view of corpus design. In line with these developments, the Institute for Dutch Lexicology (INL) has since 1994 been providing external access to steadily improving corpora via the Internet. In August 1996, the 38 Million Words Corpus was available for consultation by the international research community. The present paper reports on the characteristics of this corpus (design, text classification, linguistic annotation) and on its use, both in dictionary projects and in linguistic research. In spite of limitations with respect to corpus design, the INL corpora accessible via the Internet have proved to meet external needs. By providing these facilities, the INL has acquired much broader experience in corpus-building than before, which is essential for new, internal dictionary projects. Giving external access to corpus data that was developed primarily for internal purposes may be profitable for all parties involved.
Keywords: large electronic Dutch text corpus, corpus design, text classification, topic, publication medium, linguistic annotation, on-line access via internet, corpus user
Buitenlandsheid en begrijpelijkheid in het Nederlands van buitenlandse arbeiders, een verkennende studie (Foreignness and comprehensibility in the Dutch of foreign workers: an exploratory study)
Scientific publication, Faculty of Arts
Putting the Dutch PAROLE Corpus to Work
We discuss the work on developing the retrieval application for the Dutch PAROLE Corpus. Compared to the other corpora developed by INL, the PAROLE Corpus has been encoded with more extensive types of metadata, conforming to the TEI standard for text encoding. A search engine and a web-based user interface, both newly developed by INL, allow the user to explore the corpus not only at the level of the text, but also at the level of the metadata or a combination of the two.