2 research outputs found
Modelling frequency and attestations for OntoLex-Lemon
The OntoLex vocabulary enjoys increasing popularity as a means of publishing lexical resources with RDF and as Linked Data. The
recent publication of a new OntoLex module for lexicography, lexicog, reflects its increasing importance for digital lexicography.
However, not all aspects of digital lexicography have been covered to the same extent. In particular, supplementary information drawn
from corpora such as frequency information, links to attestations, and collocation data were considered to be beyond the scope of
lexicog. Therefore, the OntoLex community has put forward a proposal for a novel module for frequency, attestation and corpus
information (FrAC) that not only covers the requirements of digital lexicography, but also accommodates essential data structures for
lexical information in natural language processing. This paper introduces the current state of the OntoLex-FrAC vocabulary and describes
its structure, selected use cases, elementary concepts and fundamental definitions, with a focus on frequency and attestations.
Dictionaries in digital age - information technology support for Serbian language
Serbian morphological dictionaries are an electronic language resource with a significant history of development and use in natural language processing. Because they were stored as files whose number kept growing, managing the dictionaries became increasingly difficult, and it became necessary to store the dictionary information in a lexicographic database. To allow several users to work on dictionary development simultaneously, a web application based on the lexicographic database was needed.
To assess the functionalities that dictionaries provide in the digital environment, and to find the best solution for application development, various examples of digital dictionaries of several languages were analyzed using the descriptive method.
To establish an adequate model for the development of the lexicographic database, three standardized models for representing dictionary information were considered: TEI, LMF and lemon. The model of the developed lexicographic database is based on a combination of the LMF and lemon models. Descriptive and informatics methods were used while the database model was being considered and developed. The lexicographic database enabled advanced search as well as the establishment of relations between lexical entries. Relations are established by defining sets of rules that the lexical entries to be linked must satisfy.
Using the lexicographic database and the application for browsing and managing dictionaries made it possible to upgrade the Serbian morphological dictionaries as a resource. Lexical entries are enriched with links to external lexical resources such as Wordnet, Termi, BabelNet, Glosbe and Wikidata. Entries can also be linked to entries from digitized traditional dictionaries of the Serbian language; these links could be made available to groups of users who have the right to use those dictionaries in digital form.
Lexical entries are linked with corpora using regular expressions and finite automata, which makes it possible to search for concordances that contain the lemma of a lexical entry or predefined patterns of word occurrence. The lexical entries are also extended with information on the frequency of occurrence of lemmas and word forms in specific corpora.
The developed application and database were tested on dictionaries excerpted from GeoSrpKor, a corpus from the geological domain that was developed for the purpose of this research.
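The corpus linking described in this abstract (concordance search for the forms of a lemma, plus frequency counts per word form) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the toy English corpus and the hand-listed word forms are all assumptions made for illustration, and a plain regular expression stands in for the regular expressions and finite automata mentioned above.

```python
import re
from collections import Counter


def build_pattern(forms):
    """Compile one whole-word, case-insensitive regex matching any listed word form.

    Longer forms are placed first so the alternation prefers the longest match.
    """
    ordered = sorted(forms, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(map(re.escape, ordered)) + r")\b", re.IGNORECASE)


def concordances(text, forms, width=20):
    """Return (left context, matched form, right context) for every occurrence."""
    pattern = build_pattern(forms)
    hits = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits


def form_frequencies(text, forms):
    """Count occurrences of each word form (lowercased) in the text."""
    pattern = build_pattern(forms)
    return Counter(m.group(0).lower() for m in pattern.finditer(text))


# Hypothetical toy corpus and the inflected forms of the lemma "rock".
corpus = "The rock was hard. Rocks of this type are common. A rock sample was taken."
rock_forms = ["rock", "rocks"]

hits = concordances(corpus, rock_forms)
freqs = form_frequencies(corpus, rock_forms)
lemma_frequency = sum(freqs.values())  # lemma frequency = sum over its word forms
```

In this sketch the lemma frequency is derived by summing the counts of its word forms; a real system would take the forms from the morphological dictionary rather than from a hand-written list.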