70 research outputs found
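Rječnik suvremenoga slovenskog jezika: od slovenske leksičke baze do digitalne rječničke baze (Dictionary of Modern Slovene: From the Slovene Lexical Database to the Digital Dictionary Database)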
The ability to process language data has become fundamental to the development of technologies in various areas of human life in the digital world. The development of digitally readable linguistic resources, methods, and tools is, therefore, also a key challenge for the contemporary Slovene language. This challenge has been recognized in the Slovene language community both at the professional and state level and has been the subject of many activities over the past ten years, which will be presented in this paper.
The idea of a comprehensive dictionary database covering all levels of linguistic description in modern Slovene, from the morphological and lexical levels to the syntactic level, was first formulated within the framework of the European Social Fund's Communication in Slovene (2008-2013) project; the Slovene Lexical Database (SLD) was also created within this project. Two goals were pursued in designing the SLD: creating linguistic descriptions of Slovene intended for human users, and ensuring that those descriptions would also be useful for the machine processing of Slovene. Ever since the construction of the first Slovene corpus, it has been evident that there is a need for a description of modern Slovene based on real language data, and that it is necessary to understand the needs of language users in order to create useful language reference works. It also became apparent that only the digital medium allows a comprehensive language description and that the design of the database must be adapted to it from the start. In addition, the description must follow best practices as closely as possible in terms of formats and international standards, as this enables the inclusion of Slovene in a wider network of resources, such as Open Linked Data, BabelNet and ELEXIS. Due to time pressures and trends in lexicography, procedures for automating the extraction of linguistic data from corpora and for including crowdsourcing in the lexicographic process were also taken into consideration.
Following the essential idea of creating an all-inclusive digital dictionary database for Slovene, a few independent databases have been created over the past two years: the Collocations Dictionary of Modern Slovene and the automatically generated Thesaurus of Modern Slovene, both of which also exist as independent online dictionary portals. One of the novelties that we put forward together with both dictionaries is the "responsive dictionary" concept, which includes crowdsourcing methods. Ultimately, the Digital Dictionary Database provides all (other) levels of linguistic description: the morphological level with the Sloleks database upgrade, the phraseological level with the construction of a multi-word expressions lexicon, and the syntactic level with the formalization of Slovene verb valency patterns. Each of these databases contains its own specific language data that will ultimately be included in the comprehensive Slovene Digital Dictionary Database, which will represent basic linguistic descriptions of Slovene for both the human and the machine user.
The idea of a comprehensive dictionary database that includes all levels of linguistic description of modern Slovene, from the morphological and lexical to the syntactic, was originally formulated within the Communication in Slovene project (2008-2013). To realize the idea of a comprehensive digital dictionary database, two independent databases were created: the Collocations Dictionary of Modern Slovene and the automatically generated Thesaurus of Modern Slovene. One of the novelties in both dictionaries is the concept of the responsive dictionary, which includes crowdsourcing. The Digital Dictionary Database contains all levels of linguistic description: the morphological level, upgraded with Sloleks; the phraseological level, with a description of the constructions of multi-word units; and the syntactic level, with the formalization of verb valency patterns. Each of the existing databases contains specific language data that will be included in the comprehensive Slovene Digital Dictionary Database, which will contain a basic linguistic description of Slovene whose users may be both humans and machines.
Leksikalna baza za slovenščino: komu, zakaj in kako (naprej)? (A Lexical Database for Slovene: For Whom, Why, and How (to Proceed)?)
This article describes the guidelines in the formation of the Slovenian lexical database, especially the issue of various users and the types and manners of structuring lexical and grammatical information in this database. Special emphasis is placed on questions dealing with the scope and selection of lexical units and the arrangement of lexical and grammatical information, while taking into account the premise that information in the lexical database is primarily intended for web applications and modern electronic media.
The paper describes the guidelines for designing a lexical database for Slovene, in particular the question of its different users and the types and manner of structuring the lexical and grammatical data in it. Particular attention is given to the dilemmas concerning the scope and selection of lexical units and the arrangement of lexical and grammatical data, on the premise that the data in the lexical database for Slovene will be intended primarily for web applications and modern electronic media.
Slovar sodobnega slovenskega jezika: leksikografska tradicija in/ali inovacija (Dictionary of Modern Slovene: Lexicographic Tradition and/or Innovation)
When the Proposal for the Compilation of the Dictionary of Modern Slovene (Predlog za izdelavo Slovarja sodobnega slovenskega jezika) was published at the end of May 2013, a debate developed both in professional forums and in the media about whether the new dictionary of Slovene should follow the lexicographic tradition shaped by the Slovar slovenskega knjižnega jezika (SSKJ) or depart from it. Because differing views emerged on how dictionary tradition should be understood and on the inclusion of modern lexicographic practices, this paper aims to establish which elements of lexicographic theory and practice can be regarded as traditional and which are the proposed innovations in Slovene lexicography. It does so on the basis of an analysis of the design of SSKJ and SNB and of contributions that relate in any way to the concept of the future dictionary of Slovene. In parallel, we propose a design for the new dictionary in its key segments, i.e. from the perspective of the user, the medium, and the use of language-technology knowledge. This design would meet the requirements of a description of modern Slovene that satisfies, to the greatest possible extent, the needs of the language community in the present time and circumstances.
Temeljne prvine zasnove frazeološkega slovarja (Basic Elements of the Design of a Phraseological Dictionary)
By applying an analytic-synthetic method of comparing lexicographic solutions in phraseological dictionaries, it is possible to isolate the elements of dictionary design that a comprehensive dictionary description of a phraseological unit requires. The specific design of a phraseological dictionary, as presented in the article, takes into account the connection between the phraseological and the phraseographic system. A survey of the design elements within individual segments of the dictionary description also suggests possible solutions for a phraseological dictionary of Slovene.
Stalne besedne zveze v slovenščini (Fixed Expressions in Slovene)
The central object of study in the book Stalne besedne zveze v slovenščini: korpusni pristop (Fixed Expressions in Slovene: A Corpus Approach) is lexical units that as a rule consist of more than one word. Particular emphasis is placed on situating them within the modern Slovene lexicon on the basis of an empirical analysis of language data drawn from the Slovene reference electronic text corpora FIDA and FidaPLUS. This approach observes language exclusively on the basis of real texts that form a universe of discourse and are captured in a concrete text corpus. Describing the Slovene lexicon on this basis offers Slovene linguistics a new vantage point, both in the quality and quantity of language data and in the methodology of linguistic analysis. A key consequence of this approach is the blurring of boundaries between single-word and multi-word lexical units, and the extension of phraseological issues beyond lexicology to syntax and text linguistics.
Uvodnik (Editorial)
With the first issue of its second volume, the journal Slovenščina 2.0: empirične, aplikativne in interdisciplinarne raziskave (Slovenščina 2.0: Empirical, Applied and Interdisciplinary Research), which those already familiar with it call SLO 2.0 for short, consolidates its central role in presenting the results of research on Slovene and other languages that combines an empirical and interdisciplinary, and above all language-technology, approach with an applied orientation. With the publication of issue 1 (2014), we are also bringing another novelty to Slovene scholarly periodicals: publication on a rolling basis
Uvodnik (Editorial)
Digitized language resources, natural language processing, corpus analyses of grammatical and other linguistic phenomena, text mining, taggers, extractors, lexicographic tools, speech synthesis, machine translation, avatar conversational partners, smart homes ... The common denominator: language
Defining collocation for Slovenian lexical resources
In this paper, we define the notion of collocation for use in machine-readable language resources, which will serve in the creation of electronic dictionaries and language applications for Slovene. Based on theoretical and lexicographically driven studies, we define collocation as a lexical phenomenon characterized by three key aspects: statistical, syntactic, and semantic. We take lexicographic relevance as the point of departure for defining collocations within the typology of word combinations, as well as for distinguishing them from free combinations. Free combinations are (frequent) syntactically valid word combinations without lexicographic value; consequently, there is no need to describe their meaning or syntactic role. Next, we distinguish collocations from all multiword lexical units (compounds, phraseological units and lexico-grammatical units) using the lexicographic view that multiword lexical units, whose meaning is not the sum of their parts, require a description of their meaning, whereas collocations do not. In the final part, we return to the three aspects of collocation and their role in the automatic extraction of collocational information from corpora. The semantic criterion, or dictionary relevance of extracted collocations, has particularly exposed the problem of semantically broad collocates, such as certain types of adverbs, adjectives and verbs, and of words which feature in different syntactic roles (e.g. pronouns and adjuncts). We also discuss a particular issue of collocations related to proper names, and the decisions about their inclusion in the dictionary based on the evaluation of lexicographers.