    Technologies in computerized lexicography

    Since the early eighties, computer technology has become increasingly relevant to lexicography. Computer science will probably not be the only technological discipline with implications for future computerized lexicography. Some developments in the fields of language technology, information technology and knowledge engineering may support lexicographical practice and enhance the quality of the resulting dictionary. The present paper discusses how the lexicographer's analysis and interpretation of electronic corpus data may be improved by automatic linguistic analysis, by better access to the corpus, and by more flexible communication with the computer system. As a frame of reference, the state of the art in computerized lexicography is first indicated through a concise discussion of three projects at the Institute for Dutch Lexicology (INL), considered in an international context: the conversion of the Woordenboek der Nederlandsche Taal (WNT, Dictionary of the Dutch Language Based on Historical Principles) to electronic form, the compilation of the Vroegmiddelnederlands Woordenboek (Dictionary of Early Middle Dutch) in a computerized lexicographer's workbench, and the INL Taalbank (INL Language Database). Although the topic of this paper is technology, the focus is on functional rather than technical aspects of computerized lexicography.
    Keywords: computerized lexicography, electronic dictionary, electronic text corpus, lexicographer's workbench, integrated language database, automatic linguistic analysis, information retrieval, user interface

    A 38 million words Dutch text corpus and its users

    The use of text corpora has increased considerably in the past few years, not only in the field of lexicography but also in computational linguistics and language technology. Consequently, corpus data and expertise developed by lexicographical institutions have gained a broader scope of application. In the European context this has led to a revised view of corpus design. In line with these developments, the Institute for Dutch Lexicology (INL) has since 1994 been providing external access to steadily improving corpora via the Internet. In August 1996, the 38 Million Words Corpus was available for consultation by the international research community. The present paper reports on the characteristics of this corpus (design, text classification, linguistic annotation) and on its use, both in dictionary projects and in linguistic research. In spite of limitations with respect to corpus design, the INL corpora accessible via the Internet have proved to meet external needs. By providing these facilities, the INL has acquired much broader experience in corpus-building than before, which is essential for new, internal dictionary projects. Giving external access to corpus data that was developed primarily for internal purposes may be profitable for all parties involved.
    Keywords: large electronic Dutch text corpus, corpus design, text classification, topic, publication medium, linguistic annotation, on-line access via Internet, corpus user

    Buitenlandsheid en begrijpelijkheid in het Nederlands van buitenlandse arbeiders, een verkennende studie (Foreignness and intelligibility in the Dutch of foreign workers: an exploratory study)

    Scientific publication. Faculty of Arts (Faculteit der Letteren)

    Putting the Dutch PAROLE Corpus to Work

    We discuss the work towards developing the retrieval application for the Dutch PAROLE Corpus. Compared to the other corpora developed by INL, the PAROLE Corpus has been encoded with more extensive types of metadata, conforming to the TEI standard for text encoding. A search engine and a web-based user interface, both newly developed by INL, allow the user to explore the corpus not only at the level of the text, but also at the level of the metadata or a combination of the two.