8,553 research outputs found

    Stabilizing knowledge through standards - A perspective for the humanities

    Get PDF
    It is usual to consider that standards generate mixed feelings among scientists. They are often seen as not really reflecting the state of the art in a given domain and a hindrance to scientific creativity. Still, scientists should theoretically be at the best place to bring their expertise into standard developments, being even more neutral on issues that may typically be related to competing industrial interests. Even if it could be thought of as even more complex to think about developping standards in the humanities, we will show how this can be made feasible through the experience gained both within the Text Encoding Initiative consortium and the International Organisation for Standardisation. By taking the specific case of lexical resources, we will try to show how this brings about new ideas for designing future research infrastructures in the human and social sciences

    Computerization of African languages-French dictionaries

    Get PDF
    This paper relates work done during the DiLAF project. It consists in converting 5 bilingual African language-French dictionaries originally in Word format into XML following the LMF model. The languages processed are Bambara, Hausa, Kanuri, Tamajaq and Songhai-zarma, still considered as under-resourced languages concerning Natural Language Processing tools. Once converted, the dictionaries are available online on the Jibiki platform for lookup and modification. The DiLAF project is first presented. A description of each dictionary follows. Then, the conversion methodology from .doc format to XML files is presented. A specific point on the usage of Unicode follows. Then, each step of the conversion into XML and LMF is detailed. The last part presents the Jibiki lexical resources management platform used for the project.Comment: 8 page

    Russian Lexicographic Landscape: a Tale of 12 Dictionaries

    Full text link
    The paper reports on quantitative analysis of 12 Russian dictionaries at three levels: 1) headwords: The size and overlap of word lists, coverage of large corpora, and presence of neologisms; 2) synonyms: Overlap of synsets in different dictionaries; 3) definitions: Distribution of definition lengths and numbers of senses, as well as textual similarity of same-headword definitions in different dictionaries. The total amount of data in the study is 805,900 dictionary entries, 892,900 definitions, and 84,500 synsets. The study reveals multiple connections and mutual influences between dictionaries, uncovers differences in modern electronic vs. traditional printed resources, as well as suggests directions for development of new and improvement of existing lexical semantic resources

    The construction of a linguistic linked data framework for bilingual lexicographic resources

    Get PDF
    Little-known lexicographic resources can be of tremendous value to users once digitised. By extending the digitisation efforts for a lexicographic resource, converting the human readable digital object to a state that is also machine-readable, structured data can be created that is semantically interoperable, thereby enabling the lexicographic resource to access, and be accessed by, other semantically interoperable resources. The purpose of this study is to formulate a process when converting a lexicographic resource in print form to a machine-readable bilingual lexicographic resource applying linguistic linked data principles, using the English-Xhosa Dictionary for Nurses as a case study. This is accomplished by creating a linked data framework, in which data are expressed in the form of RDF triples and URIs, in a manner which allows for extensibility to a multilingual resource. Click languages with characters not typically represented by the Roman alphabet are also considered. The purpose of this linked data framework is to define each lexical entry as “historically dynamic”, instead of “ontologically static” (Rafferty, 2016:5). For a framework which has instances in constant evolution, focus is thus given to the management of provenance and linked data generation thereof. The output is an implementation framework which provides methodological guidelines for similar language resources in the interdisciplinary field of Library and Information Science

    The combined Wordnet Bahasa

    Get PDF

    Introduction to the special issue on cross-language algorithms and applications

    Get PDF
    With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and efective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing.Postprint (published version
    corecore