2 research outputs found

    Modelling frequency and attestations for OntoLex-Lemon

    The OntoLex vocabulary enjoys increasing popularity as a means of publishing lexical resources with RDF and as Linked Data. The recent publication of a new OntoLex module for lexicography, lexicog, reflects its increasing importance for digital lexicography. However, not all aspects of digital lexicography have been covered to the same extent. In particular, supplementary information drawn from corpora, such as frequency information, links to attestations, and collocation data, was considered to be beyond the scope of lexicog. Therefore, the OntoLex community has put forward a proposal for a novel module for frequency, attestation and corpus information (FrAC) that not only covers the requirements of digital lexicography but also accommodates essential data structures for lexical information in natural language processing. This paper introduces the current state of the OntoLex-FrAC vocabulary and describes its structure, some selected use cases, elementary concepts and fundamental definitions, with a focus on frequency and attestations.

    Dictionaries in digital age - information technology support for Serbian language

    Serbian morphological dictionaries are an electronic language resource with a significant history of development and use in natural language processing. Because they were stored as files whose number kept growing, managing the dictionaries became difficult, and it became necessary to store the dictionary information in a lexicographic database. To let several users work on dictionary development simultaneously, a web application based on the lexicographic database was needed. To examine the functionalities that dictionaries provide in a digital environment and to find the best solution for application development, various examples of digital dictionaries of several languages were analyzed using the descriptive method. To select an adequate model for the lexicographic database, three standardized models for representing dictionary information were considered: TEI, LMF and lemon. The model of the developed lexicographic database is based on a combination of the LMF and lemon models. Descriptive and informatics scientific methods were used while considering and developing the database model. The lexicographic database enabled advanced search as well as the establishment of relations between lexical entries. Relations are established by defining groups of rules that the lexical entries to be linked must satisfy. Using the lexicographic database and the application for browsing and managing dictionaries made it possible to upgrade the Serbian morphological dictionaries as a resource. Lexical entries are enriched with links to external lexical resources such as Wordnet, Termi, BabelNet, Glosbe and Wikidata. It is also possible to link to entries from digitized traditional dictionaries of the Serbian language, which could be made available to groups of users who have the right to use them in digital form. Lexical entries are linked with corpora using regular expressions and finite automata, in the form of concordance searches that contain the lemma of an entry or predefined patterns of word occurrence. The entries are also extended with information on the frequency of occurrence of lemmas and word forms in certain corpora. The developed application and database were tested on dictionaries excerpted from GeoSrpKor, a corpus from the geological domain that was developed for the purpose of this research.