
    From language resources to language commentary: the interview with Ben Zimmer

    Ben Zimmer is a linguist, lexicographer, and all-around word nut. He is the language columnist for The Wall Street Journal and a former columnist for The Boston Globe and The New York Times Magazine. He received the first-ever Linguistics Journalism Award. He has worked as executive editor of Vocabulary.com and the Visual Thesaurus, as editor for American dictionaries at Oxford University Press, and as a consultant to the Oxford English Dictionary. This interview took place at eLex 2017, a biennial conference on electronic lexicography, where Ben Zimmer gave a keynote talk titled »Defining the Digital Dictionary: How to Build More Useful Online Lexical References«. The interview was conducted by Dr. Iztok Kosem from the University of Ljubljana (Faculty of Arts and Centre for Language Resources and Technologies).

    Fran: smart and intuitive?

    Dictionaries are often offered to users (also) through dictionary portals, which bring together different dictionaries and often other reference resources as well. Portals respond to the needs of the modern dictionary user, who is accustomed to accessing different types of information simultaneously and in one place. Dictionary portals have existed in Slovenia for some time, but the recently launched portal Fran of the Fran Ramovš Institute of the Slovenian Language ZRC SAZU is the first to focus solely on monolingual dictionary resources (its own), and it has thus brought about a significant shift in the use of monolingual dictionaries in Slovenia. In this paper, we first review a number of foreign and Slovenian dictionary portals and then focus on a review and evaluation of the Fran portal. We examined how dictionary information is presented in terms of clarity and transparency, tested various search options, and assessed the portal's functionalities and user help. The evaluation also took into account the findings of dictionary user studies. In the conclusion, we summarize the main findings and reflect on the significance of the portal for the future of Slovenian lexicography.

    Slovenščina 2.0: “Lexicography”

    At the eLex 2011 conference on electronic lexicography, a round table was held at which the panellists discussed whether dictionaries would still exist in 2020. The prevailing view was that dictionaries would still exist, but that they would differ from the dictionaries we had been used to, and that some of them would probably no longer be called dictionaries. The panellists also agreed in predicting that the online medium would become increasingly dominant in lexicography. Subsequent developments in the field have shown that lexicographers are well aware of all this, as is also confirmed by the founding of the European Network of e-Lexicography (ENeL). Today, lexicographers no longer ask whether dictionaries will exist in the future, but rather how to design and compile electronic, especially online, dictionaries that meet the needs of modern users.

    In search of the lexicographically relevant collocation: the case of structures with adverbs

    This lexico-grammatical study presents the results of analyses of structures with adverbs, carried out within the basic research project Collocations as a Basis for Language Description: Semantic and Temporal Perspectives (KOLOS; J6-8255). Building on the results of an earlier pilot crowdsourcing task, in which linguist annotators assessed collocation candidates automatically extracted from the Gigafida corpus and decided what is and what is not a lexicographically relevant collocation, we analysed all collocationally productive structures with adverbs. The aim of the analysis was to define collocation as a semantically relevant co-occurrence of two (or more) words, and thereby to distinguish lexicographically relevant collocations from statistically identified, or weaker, collocations, which perform no semantic function and are consequently irrelevant for a collocations dictionary. The analyses showed that semantic relevance and inclusion in the dictionary must be decided, for individual collocation candidates or types of collocates, at the level of the individual structure. One example is adverbs that can act either as intensifiers (type kar pošteno [načeti]) or in the semantically less relevant role of emphasis or particle-like use (type kar prekiniti). Similar linguistic treatment is needed for broader groups of numeral adverbs, such as those expressing multiplicity or sequence (ordinal adverbs), which cannot be restricted structurally because their semantic relevance varies (četrtič doktorirati 'to earn a doctorate for the fourth time' vs. stokrat povedati 'to say a hundred times'). The data obtained from these analyses will enable more detailed and further analyses and, above all, a comprehensive description of each collocational structure and its collocability. The errors identified as arising from the automatic annotation of structures will make it possible to upgrade the existing extraction patterns and, above all, to improve automatic extraction for problematic structures. The data will also be very useful for including and treating new structures that were initially excluded because of considerable noise. All findings can be implemented in the lexicographic workflow, thereby improving the data (the staged completion of entries) in the dictionary. The training set of 17,576 candidates can also be used in other activities of the KOLOS project: for clustering collocates, for comparing synonyms by means of their collocations and, not least, for studying collocational trends over time.
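
    The abstract describes deciding relevance structure by structure on the basis of annotators' judgements. As a minimal sketch of how such judgements could be aggregated (an assumption: the abstract does not say how votes were combined), the Python below applies a simple majority vote per candidate and reports, for each structure, the share of candidates judged relevant. The function names, structure labels, candidate IDs, and votes are hypothetical toy data, not KOLOS results.

```python
from collections import defaultdict


def majority_relevant(votes):
    """A candidate counts as relevant if more than half of its annotators said so."""
    return sum(votes) * 2 > len(votes)


def relevance_by_structure(judgements):
    """Return, per structure, the share of candidates judged relevant by majority vote.

    `judgements` is an iterable of (structure, candidate, is_relevant) triples.
    """
    votes = defaultdict(list)  # (structure, candidate) -> list of boolean votes
    for structure, candidate, is_relevant in judgements:
        votes[(structure, candidate)].append(is_relevant)

    totals = defaultdict(int)    # structure -> number of distinct candidates
    relevant = defaultdict(int)  # structure -> candidates passing the majority vote
    for (structure, _candidate), candidate_votes in votes.items():
        totals[structure] += 1
        if majority_relevant(candidate_votes):
            relevant[structure] += 1
    return {s: relevant[s] / totals[s] for s in totals}


# Hypothetical toy judgements: labels, candidates and votes are invented.
toy = [
    ("ADV_INTENSIFIER+V", "cand_1", True),
    ("ADV_INTENSIFIER+V", "cand_1", True),
    ("ADV_INTENSIFIER+V", "cand_1", False),
    ("ADV_INTENSIFIER+V", "cand_2", True),
    ("ADV_INTENSIFIER+V", "cand_2", False),  # a tie does not count as relevant
    ("ADV_PARTICLE+V", "cand_3", False),
    ("ADV_PARTICLE+V", "cand_3", False),
    ("ADV_PARTICLE+V", "cand_3", True),
]
print(relevance_by_structure(toy))
```

    A strict majority is only one possible aggregation rule; ties, as with cand_2 here, could equally be sent back to annotators.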

    Editorial


    Key word analysis of discourses in Slovene speech: differences and similarities

    One aspect of speech that remains under-researched is its internal variation, i.e. the differences and similarities between different types of speech. This paper aims to contribute to this research by comparing different discourses of spontaneous Slovene speech, focusing on vocabulary use. Key word analysis (Scott, 1997), conducted on a million‑word corpus of spoken Slovene, was used to identify lexical items and groups of lexical items that are typical of a particular spoken discourse or common to different types of spoken discourse. The results indicate that the presence or absence of a particular word class in the key word list can be a good indicator of a type of spoken discourse or of several discourses.
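
    Key word analysis compares how often each word occurs in one discourse against a reference. The abstract does not name the keyness statistic; as an assumption, the sketch below uses log-likelihood (G2), one of the measures offered in Scott's WordSmith Tools alongside chi-square. The function names, the 3.84 cut-off (p < 0.05, one degree of freedom), and the toy counts are illustrative, not the paper's settings.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood (G2) keyness of one word: target discourse vs reference."""
    total_freq = freq_target + freq_ref
    total_size = size_target + size_ref
    # Expected frequencies if the word were equally likely in both corpora.
    expected_target = size_target * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_target, expected_target), (freq_ref, expected_ref)):
        if observed > 0:  # by convention 0 * ln(0) contributes 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def key_words(target_counts, ref_counts, min_g2=3.84):
    """Return words over-represented in the target, sorted by descending G2.

    3.84 is the chi-square critical value for p < 0.05 with one degree of freedom.
    """
    size_t = sum(target_counts.values())
    size_r = sum(ref_counts.values())
    keys = []
    for word, f_t in target_counts.items():
        f_r = ref_counts.get(word, 0)
        if f_t / size_t <= f_r / size_r:
            continue  # keep positive key words only
        g2 = log_likelihood(f_t, size_t, f_r, size_r)
        if g2 >= min_g2:
            keys.append((word, g2))
    return sorted(keys, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy counts for two spoken discourses.
target = {"ja": 20, "je": 80}
reference = {"ja": 5, "je": 95}
print(key_words(target, reference))
```

    Grouping the resulting key words by word class would then support the kind of comparison the abstract reports; that step is not shown here.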

    Devising a Sketch Grammar for Academic Portuguese

    This paper presents the development of a new sketch grammar designed specifically for CoPEP, a newly compiled 40-million-word corpus of texts from academic journals, tagged with Freeling v3, the default tagger available in the Sketch Engine for corpora of Portuguese. We first provide an overview and evaluation of existing sketch grammars for Portuguese, followed by a detailed description of the development of the new sketch grammar and of some of the problems encountered. We conclude by summarizing the main findings, highlighting important implications, and suggesting further improvements to the sketch grammar. The new grammar yields more accurate and more varied word sketches than the current default sketch grammar, which indicates that it can be used for advanced lexicographic tasks such as the automatic extraction of lexical data from CoPEP, the knowledge-acquisition method planned for compiling the proposed dictionary of Portuguese for university students. Moreover, the new sketch grammar can be used with any other corpus of Portuguese tagged with Freeling v3, which makes it an important resource for lexicographic and corpus-linguistic research on Portuguese.
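
    A sketch grammar defines each grammatical relation as a corpus query pattern over part-of-speech tags; a verb-object rule might look roughly like 1:"V.*" "D.*"? 2:"N.*". The Python below is a simplified, hypothetical illustration of what such a rule matches, not Sketch Engine's implementation or the CoPEP grammar. It assumes EAGLES-style tags of the kind Freeling produces (verbs starting with V, determiners with D, nouns with N); the function name, example sentence, and tags are illustrative.

```python
from collections import Counter


def object_pairs(tokens):
    """Yield (verb_lemma, noun_lemma) pairs matching the pattern V.* (D.*)? N.*

    `tokens` is a list of (form, lemma, tag) triples with EAGLES-style tags.
    """
    for i, (_form, verb_lemma, tag) in enumerate(tokens):
        if not tag.startswith("V"):
            continue
        j = i + 1
        if j < len(tokens) and tokens[j][2].startswith("D"):
            j += 1  # skip one optional determiner
        if j < len(tokens) and tokens[j][2].startswith("N"):
            yield verb_lemma, tokens[j][1]


# Illustrative tagged sentence: "apresentamos os resultados e discutimos implicações".
sentence = [
    ("apresentamos", "apresentar", "VMIP1P0"),
    ("os", "o", "DA0MP0"),
    ("resultados", "resultado", "NCMP000"),
    ("e", "e", "CC"),
    ("discutimos", "discutir", "VMIP1P0"),
    ("implicações", "implicação", "NCFP000"),
]
# A word sketch is essentially such pair counts aggregated over a whole corpus.
print(Counter(object_pairs(sentence)))
```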

    Collocation ranking: frequency vs semantics

    Collocations play a very important role in language description, especially in identifying the meanings of words. In modern lexicography, lists of collocates ranked by a statistical measure are an indispensable part of deducing meaning. In this paper, we compare two approaches to ranking collocates: (a) the logDice method, which is frequency-based and predominantly used, and (b) a new, semantics-based method using fastText word embeddings. The comparison was carried out on two Slovene datasets: one containing general-language headwords and their collocates, and the other containing headwords and their collocates extracted from a corpus of language for special purposes. Two evaluation methods were used: the quantitative evaluation relied on supervised machine learning, using a support-vector machine (SVM) classifier scored by the area under the ROC curve (AUC), while in the qualitative evaluation lexicographers assessed the rankings produced by the two methods. The results were somewhat inconsistent: the quantitative evaluation showed that the machine-learning-based approach ranked collocates better than the frequency-based one, whereas lexicographers in most cases considered the collocate lists produced by the two methods very similar.
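
    The two kinds of ranking signal can be sketched side by side. logDice is computed from co-occurrence and marginal frequencies as 14 + log2(2·f_xy / (f_x + f_y)). For the embedding side, one simple way to turn vectors into a ranking is cosine similarity to the headword; this is an assumption, since the abstract does not say exactly how the fastText method scores collocates, and the SVM/AUC evaluation is not shown. The function names, frequencies, and two-dimensional vectors are hypothetical toy values.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """logDice = 14 + log2(2 * f_xy / (f_x + f_y)); the theoretical maximum is 14."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_log_dice(headword_freq, collocates):
    """collocates maps collocate -> (co-occurrence frequency, collocate frequency)."""
    scores = {c: log_dice(f_xy, headword_freq, f_y) for c, (f_xy, f_y) in collocates.items()}
    return sorted(scores, key=scores.get, reverse=True)


def rank_by_similarity(head_vector, collocate_vectors):
    """collocate_vectors maps collocate -> embedding vector."""
    scores = {c: cosine(head_vector, vec) for c, vec in collocate_vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy frequencies and 2-d vectors, invented for illustration.
freqs = {"a": (50, 500), "b": (20, 100), "c": (5, 5000)}
vectors = {"a": [0.0, 1.0], "b": [1.0, 1.0], "c": [1.0, 0.1]}
print(rank_by_log_dice(1000, freqs))             # frequency-based order
print(rank_by_similarity([1.0, 0.0], vectors))   # embedding-based order
```

    In practice the vectors would come from a trained model; with the fasttext Python package, for example, model.get_word_vector(word) returns a word's vector.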

    Defining collocation for Slovenian lexical resources

    In this paper, we define the notion of collocation for use in machine-readable language resources, which will serve in the creation of electronic dictionaries and language applications for Slovene. Drawing on theoretical and lexicographically driven studies, we define collocation as a lexical phenomenon with three key aspects: statistical, syntactic, and semantic. We take lexicographic relevance as the point of departure both for defining collocations within the typology of word combinations and for distinguishing them from free combinations. Free combinations are (frequent) syntactically valid word combinations without lexicographic value, so neither their meaning nor their syntactic role needs to be described. Next, we distinguish collocations from all multiword lexical units (compounds, phraseological units, and lexico-grammatical units), adopting the lexicographic view that multiword lexical units, whose meaning is not the sum of their parts, require a description of their meaning, whereas collocations do not. In the final part, we return to the three aspects of collocation and their role in the automatic extraction of collocational information from corpora. The semantic criterion, i.e. the dictionary relevance of extracted collocations, has brought to light in particular the problem of semantically broad collocates, such as certain types of adverbs, adjectives, and verbs, and of words that occur in different syntactic roles (e.g. pronouns and adjuncts). We also discuss the particular issue of collocations involving proper names and decisions on their inclusion in the dictionary based on lexicographers' evaluations.
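
    The three aspects can be read as successive filters in an extraction pipeline. In the minimal sketch below, which is hypothetical throughout (the threshold, structure labels, and list of broad collocates are invented for illustration), candidates must pass a statistical score threshold and belong to a valid syntactic structure. The semantic aspect, which rests on lexicographers' judgement, is represented only by flagging semantically broad collocates for manual review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    headword: str
    collocate: str
    structure: str  # syntactic pattern label (hypothetical labels below)
    score: float    # statistical association score, e.g. logDice


# Hypothetical settings, chosen only for illustration.
MIN_SCORE = 5.0
VALID_STRUCTURES = {"ADJ+NOUN", "VERB+NOUN", "ADV+VERB"}
# Semantically broad lemmas (Slovene 'very', 'good', 'to have') sent to lexicographers.
BROAD_COLLOCATES = {"zelo", "dober", "imeti"}


def triage(candidates):
    """Split candidates into a shortlist and a list flagged for manual review."""
    shortlist, flagged = [], []
    for cand in candidates:
        if cand.score < MIN_SCORE:                  # statistical aspect
            continue
        if cand.structure not in VALID_STRUCTURES:  # syntactic aspect
            continue
        if cand.collocate in BROAD_COLLOCATES:      # semantic aspect: needs human judgement
            flagged.append(cand)
        else:
            shortlist.append(cand)
    return shortlist, flagged


cands = [
    Candidate("hiša", "velik", "ADJ+NOUN", 7.2),
    Candidate("hiša", "zelo", "ADV+NOUN", 9.0),
    Candidate("delo", "imeti", "VERB+NOUN", 6.0),
    Candidate("delo", "opraviti", "VERB+NOUN", 3.0),
]
shortlist, flagged = triage(cands)
print([c.collocate for c in shortlist], [c.collocate for c in flagged])
```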