
    Wiktionary Popularity from a Citation Analysis Point of View

    Wiktionary is a collaborative web-based project to produce a free-content multilingual dictionary of terms in all natural languages and in a number of artificial languages. This study aims to provide an overview of the citation rate of Wiktionary. The primary source of data was the Scopus database. A reference search of the Scopus citation index was conducted in June 2023 to find citations to Wiktionary, and Bibliometrix was used to build the co-occurrence network of author-supplied keywords of the citing documents. The study determines to what extent Wiktionary is used and cited by papers indexed in Scopus. The total number of citations to Wiktionary since 2006 was 1,766; the highest annual count was 161 citations in 2017 and the lowest was five in 2006. Wiktionary is cited most heavily in the subject areas of computer science, social sciences, and arts and humanities. The analysis of the language distribution of citing documents indicates that authors used Wiktionary in a range of languages, although English was by far the most common language of citing documents, with 1,642 citations (93%). In sum, Wiktionary was cited 1,766 times in Scopus in documents written in several languages (especially English, German, and French) and from many countries (especially the U.S. with 335 citations, Germany with 295, and France with 122), mainly in computer science, the social sciences, and the arts and humanities. The significance of Wiktionary from a citation analysis point of view goes well beyond open access and enhanced opportunities for citation in linguistics, natural language processing systems, computational linguistics, semantics, and ontology.
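
    The keyword co-occurrence analysis mentioned above was carried out with the R package Bibliometrix; purely as an illustration of the underlying idea, the sketch below counts keyword pairs that co-occur in the same citing record. The file name, the "Author Keywords" column, and the semicolon separator are assumptions about a typical Scopus CSV export, not the study's actual data or pipeline.

        # Illustrative sketch only: keyword co-occurrence counts from author keywords.
        # Assumes a Scopus-style CSV export with a semicolon-separated "Author Keywords"
        # column; the study itself used the R package Bibliometrix, not this code.
        import csv
        from collections import Counter
        from itertools import combinations

        def keyword_cooccurrence(rows, field="Author Keywords", sep=";"):
            """Count how often each pair of keywords appears in the same record."""
            pairs = Counter()
            for row in rows:
                keywords = sorted({k.strip().lower() for k in row.get(field, "").split(sep) if k.strip()})
                pairs.update(combinations(keywords, 2))
            return pairs

        if __name__ == "__main__":
            # "scopus_citing_wiktionary.csv" is a hypothetical name for the export file.
            with open("scopus_citing_wiktionary.csv", newline="", encoding="utf-8") as f:
                counts = keyword_cooccurrence(csv.DictReader(f))
            for (kw1, kw2), n in counts.most_common(10):
                print(f"{kw1} -- {kw2}: {n}")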

    Lexicographical Explorations of Neologisms in the Digital Age. Tracking New Words Online and Comparing Wiktionary Entries with ‘Traditional’ Dictionary Representations

    This thesis explores neologisms in two distinct but related contexts: dictionaries and newspapers. Both present neologisms to the world, the former through information and elucidation of meaning, the latter through exemplification of real-world use and behaviour. The thesis first explores the representation of new words in a range of different dictionary types and formats, comparing entries from the collaborative dictionary Wiktionary with those in expert-produced dictionaries, both those categorised here as ‘corpus-based’ and those termed ‘corpus-informed’. The former represent the most current of the expert-produced dictionary models, drawing on corpora for almost all of the data they include in an entry, while the latter draw on a mixture of old-style citations and Reading Programmes for much of their data, although this is supplemented with corpus information in some areas. The purpose of this part of the study was to compare degrees of comprehensiveness between the expert and collaborative dictionaries as demonstrated by the level and quality of detail included in new-word entries and in the dictionaries’ responsiveness to new words. This is done by comparing the number and quality of components that appear in a dictionary entry, from the standardised elements found in all of the dictionary types, such as the ‘headword’ at the top of the entry, to the non-standardised elements such as Discussion Forums found almost exclusively in Wiktionary. Wiktionary is found to provide more detailed entries on new words than the expert dictionaries, and to be generally more flexible, responding more quickly and effectively to neologisms. This is due in no small part to the fact that the entire site updates every time an entry or discussion is saved, whereas expert-produced online dictionaries update once a quarter at best. The thesis further explores the way in which the same neologisms are used in four UK national newspapers across the course of their neologic life-cycle. In order to do this, a new methodology is devised for the collection of web-based data for context-rich, genre-specific corpus studies. This produced highly detailed, contextualised data showing that some newspapers (the Independent) are more likely to use less well-established neologisms, while others (The Guardian) have an overall stronger record of neologism usage across the 14 years of the study. As well as generating findings on the use and behaviour of neologisms in these newspapers, the manual methodology devised here is compared with a similar automated system, to assess which approach is more appropriate for use in this kind of context-rich database/corpus. The ability to accurately date each article in the study, using information which only the manual methods could access, coupled with the more targeted approach they offer by excluding unwanted texts from the outset, made the manual methodology the more appropriate approach.
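
    The thesis compares a manual web-collection approach with a similar automated system, partly on how reliably each can date articles. Purely to illustrate the automated side of such a comparison, the sketch below reads candidate publication dates from common HTML metadata fields; the meta-tag names and the placeholder URL are assumptions about typical newspaper markup, not the thesis's actual tooling or data.

        # Illustrative sketch only: pull candidate publication dates from <meta> tags.
        # The tag names below are common conventions in newspaper markup, not the
        # thesis's methodology; the URL in the example is a placeholder.
        import urllib.request
        from html.parser import HTMLParser

        class DateMetaParser(HTMLParser):
            """Collect content values of <meta> tags that usually carry publication dates."""
            DATE_KEYS = {"article:published_time", "og:published_time", "date", "dc.date"}

            def __init__(self):
                super().__init__()
                self.dates = []

            def handle_starttag(self, tag, attrs):
                if tag != "meta":
                    return
                attrs = dict(attrs)
                key = (attrs.get("property") or attrs.get("name") or "").lower()
                if key in self.DATE_KEYS and attrs.get("content"):
                    self.dates.append(attrs["content"])

        def published_dates(url):
            """Fetch a page and return any publication dates found in its metadata."""
            with urllib.request.urlopen(url) as resp:
                html = resp.read().decode("utf-8", errors="replace")
            parser = DateMetaParser()
            parser.feed(html)
            return parser.dates

        if __name__ == "__main__":
            # Placeholder URL; substitute a real article page to test.
            print(published_dates("https://www.theguardian.com/world"))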

    Meilės skaitymui diskurso sklaida Europoje nuo XVIII a. iki XX a. [The Spread of the Discourse on the Love of Reading in Europe from the 18th to the 20th Century]

    When reading in the 18th century became an activity common among an ever growing part of the European population and thereby a socially more visible cultural phenomenon, a need arose to create concepts and linguistic terms to refer to the new types of reading behavior. The new masses of readers seemingly did not have a rational goal for their reading; they just read for the sake of reading itself. Therefore, an explanation for their behavior was that they had a love of reading. To speak about people’s love of reading became a recurrent feature of the discourse on reading, a sub-discourse of its own, the discourse on the love of reading. The birthplace of the discourse may have been in 17th century France, wherefrom it was mediated into other countries and language areas. Even the contemporaries believed that the reading mania was contagious, and expected, feared, or hoped that something similar would happen in their own country. This caused debate and the use, even invention, of words and phrases that belong to the discourse on the love of reading. Even the words and phrases used for speaking about reading migrated over linguistic, political, and social borders. The initiation, growth, and diffusion of the discourse can be followed by searching for the typical words and phrases that indicate the presence of the discourse. Data were obtained from Google Books Ngram Viewer and national full-text databases of books and newspapers. A map representing the geographical diffusion of the discourse in Europe until the 20th century is constructed. The historical conditions for the diffusion of the discourse are discussed. Methodological problems are discussed and future research is outlined.
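
    The diffusion of the discourse is traced by searching for indicator phrases in Google Books Ngram Viewer and in national full-text databases. As a hedged illustration of the first of these, the sketch below queries the undocumented JSON endpoint behind the Ngram Viewer for one phrase; the endpoint's behaviour, the example phrase, and the corpus identifier are assumptions for illustration, not the study's actual queries.

        # Illustrative sketch only: yearly relative frequencies of a phrase from the
        # JSON endpoint behind Google Books Ngram Viewer. The endpoint is undocumented
        # and may change; the phrase and corpus below are examples, not the study's queries.
        import json
        import urllib.parse
        import urllib.request

        def ngram_series(phrase, year_start=1700, year_end=1900, corpus="en-2019"):
            params = urllib.parse.urlencode({
                "content": phrase,
                "year_start": year_start,
                "year_end": year_end,
                "corpus": corpus,
                "smoothing": 0,
            })
            with urllib.request.urlopen(f"https://books.google.com/ngrams/json?{params}") as resp:
                data = json.load(resp)
            # One entry per matched ngram; "timeseries" holds one relative frequency per year.
            return {entry["ngram"]: entry["timeseries"] for entry in data}

        if __name__ == "__main__":
            for ngram, series in ngram_series("love of reading").items():
                print(ngram, series[:5], "...")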

    Language technologies for a multilingual Europe

    This volume of the series “Translation and Multilingual Natural Language Processing” includes most of the papers presented at the Workshop “Language Technology for a Multilingual Europe”, held at the University of Hamburg on September 27, 2011 in the framework of the conference GSCL 2011 with the topic “Multilingual Resources and Multilingual Applications”, along with several additional contributions. In addition to an overview article on Machine Translation and two contributions on the European initiatives META-NET and Multilingual Web, the volume includes six full research articles. Our intention with this workshop was to bring together various groups concerned with the umbrella topics of multilingualism and language technology, especially multilingual technologies. This encompassed, on the one hand, representatives from research and development in the field of language technologies, and, on the other hand, users from diverse areas such as, among others, industry, administration and funding agencies. The Workshop “Language Technology for a Multilingual Europe” was co-organised by the two GSCL working groups “Text Technology” and “Machine Translation” (http://gscl.info) as well as by META-NET (http://www.meta-net.eu).

    Wiktionary: The Metalexicographic and the Natural Language Processing Perspective

    Dictionaries are the main reference works for our understanding of language. They are used by humans and likewise by computational methods. So far, the compilation of dictionaries has almost exclusively been the profession of expert lexicographers. The ease of collaboration on the Web and the rising initiatives of collecting open-licensed knowledge, such as in Wikipedia, gave rise to a new type of dictionary that is voluntarily created by large communities of Web users. This collaborative construction approach presents a new paradigm for lexicography that poses new research questions to dictionary research on the one hand and provides a very valuable knowledge source for natural language processing applications on the other hand. The subject of our research is Wiktionary, which is currently the largest collaboratively constructed dictionary project. In the first part of this thesis, we study Wiktionary from the metalexicographic perspective. Metalexicography is the scientific study of lexicography, including the analysis and criticism of dictionaries and lexicographic processes. To this end, we discuss three contributions related to this area of research: (i) We first provide a detailed analysis of Wiktionary and its various language editions and dictionary structures. (ii) We then analyze the collaborative construction process of Wiktionary. Our results show that the traditional phases of the lexicographic process do not apply well to Wiktionary, which is why we propose a novel process description that is based on the frequent and continual revision and discussion of the dictionary articles and the lexicographic instructions. (iii) We perform a large-scale quantitative comparison of Wiktionary and a number of other dictionaries regarding the covered languages, lexical entries, word senses, pragmatic labels, lexical relations, and translations. We conclude the metalexicographic perspective by finding that the collaborative Wiktionary is not an appropriate replacement for expert-built dictionaries due to its inconsistencies, quality flaws, one-size-fits-all approach, and strong dependence on expert-built dictionaries. However, Wiktionary's rapid and continual growth, its high coverage of languages, newly coined words, domain-specific vocabulary and non-standard language varieties, as well as the kind of evidence based on the authors' intuition provide promising opportunities for both lexicography and natural language processing. In particular, we find that Wiktionary and expert-built wordnets and thesauri contain largely complementary entries. In the second part of the thesis, we study Wiktionary from the natural language processing perspective with the aim of making its linguistic knowledge available for computational applications. Such applications require vast amounts of high-quality structured data. Expert-built resources have been found to suffer from insufficient coverage and high construction and maintenance cost, whereas fully automatic extraction from corpora or the Web often yields resources of limited quality. Collaboratively built encyclopedias present a viable solution, but do not adequately cover the linguistically oriented knowledge found in dictionaries. That is why we propose extracting linguistic knowledge from Wiktionary, which we achieve by the following three main contributions: (i) We propose the novel multilingual ontology OntoWiktionary that is created by extracting and harmonizing the weakly structured dictionary articles in Wiktionary.
A particular challenge in this process is the ambiguity of semantic relations and translations, which we resolve by automatic word sense disambiguation methods. (ii) We automatically align Wiktionary with WordNet 3.0 at the word sense level. The largely complementary information from the two dictionaries yields an aligned resource with higher coverage and an enriched representation of word senses. (iii) We represent Wiktionary according to the ISO standard Lexical Markup Framework, which we adapt to the peculiarities of collaborative dictionaries. This standardized representation is of great importance for fostering the interoperability of resources and hence the dissemination of Wiktionary-based research. Our work thereby presents a foundational step towards the large-scale integrated resource UBY, which facilitates unified access to a number of standardized dictionaries by means of a shared web interface for human users and an application programming interface for natural language processing applications. A user can, in particular, switch between and combine information from Wiktionary and other dictionaries without completely changing the software. Our final resource and the accompanying datasets and software are publicly available and can be employed in many different natural language processing applications. In particular, it fills the gap between the small expert-built wordnets and the large amount of encyclopedic knowledge from Wikipedia. We provide a survey of previous work utilizing Wiktionary, and we exemplify the usefulness of our work in two case studies on measuring verb similarity and detecting cross-lingual marketing blunders, which make use of our Wiktionary-based resource and the results of our metalexicographic study. We conclude the thesis by emphasizing the usefulness of collaborative dictionaries when combined with expert-built resources, a combination which still holds much untapped potential.
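
    The extraction and alignment tooling described above (culminating in OntoWiktionary and UBY) is not reproduced here; the sketch below only illustrates the kind of weakly structured input such a pipeline starts from, fetching one entry's wikitext via the MediaWiki API and listing its definition lines. The example headword and the "# "-prefixed definition convention are assumptions about current English Wiktionary markup.

        # Illustrative sketch only: fetch one English Wiktionary entry's wikitext via
        # the MediaWiki API and list its definition lines (which conventionally start
        # with "# "). This is not the thesis's extraction pipeline (OntoWiktionary/UBY);
        # it only shows the weakly structured wikitext such a pipeline must harmonize.
        import json
        import urllib.parse
        import urllib.request

        API = "https://en.wiktionary.org/w/api.php"

        def entry_wikitext(title):
            params = urllib.parse.urlencode({
                "action": "parse",
                "page": title,
                "prop": "wikitext",
                "format": "json",
                "formatversion": 2,
            })
            with urllib.request.urlopen(f"{API}?{params}") as resp:
                return json.load(resp)["parse"]["wikitext"]

        def definition_lines(wikitext):
            """Return gloss lines; Wiktionary definitions conventionally start with '# '."""
            return [line[2:].strip() for line in wikitext.splitlines() if line.startswith("# ")]

        if __name__ == "__main__":
            for gloss in definition_lines(entry_wikitext("dictionary"))[:5]:
                print(gloss)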

    DARIAH and the Benelux

    Lexicography of coronavirus-related neologisms

    This volume brings together contributions by international experts reflecting on Covid-19-related neologisms and their lexicographic processing and representation. The papers analyze new words, new meanings of existing words, and new multiword units: where they come from, how they are transmitted (or differ) across languages, and how their use and meaning are reflected in dictionaries of all sorts. Recent trends in as many as ten languages are considered, covering general and specialized language, and monolingual as well as bilingual, printed as well as online dictionaries.