
    Lexikos at eighteen: an analysis

    At eighteen, Lexikos became a major player in the field of linguistics by being awarded an Impact Factor. This article presents a double analysis of the foundation that led to this success. On the one hand, a thorough statistical study is undertaken with regard to all contributors and their contributions to Lexikos. To this end a metadata database was designed, with the aim to answer the question: 'Who publishes what type of material from where and when?' On the other hand, a content analysis is carried out which focuses on the actual topics (i.e. 'keywords') in Lexikos. To this end an all-inclusive text corpus containing all the Lexikos material was built, with the aim to answer the question: 'What are the major trends in Lexikos?'

    Techniques for Automatic Normalization of Orthographically Variant Yiddish Texts

    Yiddish is characterized by a multitude of orthographic systems. A number of approaches to automatic normalization of variant orthography have been explored for the processing of historic texts of languages whose orthography has since been standardized. However, these approaches have not yet been applied to Yiddish. Using a manually normalized set of 16 Yiddish documents as a training and test corpus, four techniques for automatic normalization were compared: a hand-crafted set of transformation rules, an off-the-shelf spell checker, edit distance minimization with manually set weights, and edit distance minimization with weights learned through a training set. Performance was evaluated by calculating the proportion of correctly normalized words in a test set, and by measuring precision and recall in a test of information retrieval. For the given test corpus, normalization by minimization of edit distance with multi-character edit operations and learned weights was found to perform best in all tests.
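The abstract above mentions normalization by edit distance minimization with weighted operations. As a rough illustration only, and not the authors' implementation, the sketch below computes a weighted Levenshtein distance and picks the closest entry from a lexicon. It is restricted to single-character operations (the abstract's best system used multi-character operations and learned weights), and the names `weighted_edit_distance`, `normalize`, `toy_sub_cost`, the cost values, and the Latin-letter toy words are all assumptions made for this example.

```python
from typing import Callable, Iterable

# Hypothetical, hand-set weights: substituting k<->g is treated as cheap.
CHEAP_PAIRS = {("k", "g"), ("g", "k")}


def toy_sub_cost(a: str, b: str) -> float:
    """Cost of replacing character a with b (illustrative values only)."""
    if a == b:
        return 0.0
    return 0.2 if (a, b) in CHEAP_PAIRS else 1.0


def weighted_edit_distance(
    source: str,
    target: str,
    sub_cost: Callable[[str, str], float],
    indel_cost: float = 1.0,
) -> float:
    """Levenshtein distance with a custom substitution cost function."""
    # prev[j] holds the distance between the processed prefix of source
    # and the first j characters of target.
    prev = [j * indel_cost for j in range(len(target) + 1)]
    for i, s in enumerate(source, 1):
        curr = [i * indel_cost]
        for j, t in enumerate(target, 1):
            curr.append(min(
                prev[j] + indel_cost,          # delete s
                curr[j - 1] + indel_cost,      # insert t
                prev[j - 1] + sub_cost(s, t),  # substitute or match
            ))
        prev = curr
    return prev[-1]


def normalize(word: str, lexicon: Iterable[str],
              sub_cost: Callable[[str, str], float]) -> str:
    """Return the lexicon entry with the smallest weighted distance."""
    return min(lexicon, key=lambda c: weighted_edit_distance(word, c, sub_cost))


# With uniform costs "tok" and "dog" would tie for "dok";
# the cheap k->g substitution breaks the tie in favour of "dog".
print(normalize("dok", ["tok", "dog"], toy_sub_cost))
```

In a learned-weight setting, the substitution costs would be estimated from aligned pairs of variant and normalized words rather than set by hand as they are here.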

    Lessons from free-range language

    Synopsis: Current research in grammatical analysis and sociolinguistics points to two core characteristics of language that seem incommensurable at first sight: (1) research on linguistic structure indicates internal organisation and coherence, and the workings and interactions of distinct grammatical systems, but (2) sociolinguistic research suggests that language borders and bound ‘languages’ are counterfactual social constructs that cannot capture the diversity and fluidity of actual language use. This seems to constitute something like a “quantum-linguistic” paradox: language systems aren’t real (they are just ideological constructions), but at the same time, they are a reflection of actual structure. This book shows how this paradox can be resolved through an architecture that allows for grammatical systems without presupposing language borders: this architecture puts communicative situations, rather than languages, at the core of linguistic systematicity, while named languages are captured as optional sociolinguistic indices. The approach builds on insights from “free-range” language, a metaphor for language in settings that are less confined by monoglossic ideologies. The author looks at four different kinds of settings: urban markets, heritage language settings, multiethnic adolescent peer-groups, and digital social media. Central lessons to be learned from such free-range language settings are: (1) communicative situations support linguistic differentiation and can thus be the basis for fluid registers; (2) grammatical systematicity is grounded in communicative situations and does not require bound languages and linguistic borders; (3) named ‘languages’ can emerge as social indices signalling belonging, but this is an optional, not a necessary development.

    Romani Language Dictionaries: (1755-2019): An Annotated Critical Bibliography

    The purpose of this work is to present an up-to-date bibliography of monographic dictionaries and selected wordlists that record the lexicological wealth of the many varieties of the Romani language. While salient features of each dictionary will be discussed, the bibliography is not intended to offer a comprehensive book review of each dictionary. It is also not within scope of the bibliography to present a detailed overview of the Romani language and its dialects per se, as this has been covered well in myriad other resources, including in the additional information found in many of the dictionaries cited in this bibliography. I will, however, give a brief overview of certain features of Romani insofar as this will help understand the organization of the bibliography and clarify some of the features of the works which I have chosen to highlight in the annotations. This bibliography is intended primarily for scholars and students interested in Romani from the linguistic perspective to aid in identifying appropriate dictionaries for their research. It can also be of use to libraries in assessing their collection to determine if they have key lexicological resources for those researching Romani

    Contemporary research in minoritized and diaspora languages of Europe

    Synopsis: This volume provides a collection of research reports on multilingualism and language contact ranging from Romance, to Germanic, Greco and Slavic languages in situations of contact and diaspora. Most of the contributions are empirically-oriented studies presenting first-hand data based on original fieldwork, and a few focus directly on the methodological issues in such research. Owing to the multifaceted nature of contact and diaspora phenomena (e.g. the intrinsic transnational essence of contact and diaspora, and the associated interplay between majority and minoritized languages and multilingual practices in different contact settings, contact-induced language change, and issues relating to convergence) the disciplinary scope is broad, and includes ethnography, qualitative and quantitative sociolinguistics, formal linguistics, descriptive linguistics, contact linguistics, historical linguistics, and language acquisition. Case studies are drawn from Italo-Romance varieties in the Americas, Spanish-Nahuatl contact, Castellano Andino, Greko/Griko in Southern Italy, Yiddish in Anglophone communities, Frisian in the Netherlands, Wymysiöryś in Poland, Sorbian in Germany, and Pomeranian and Zeelandic Flemish in Brazil.

    Lexikos at Eighteen: An Analysis

    Abstract: At eighteen, Lexikos became a major player in the field of linguistics by being awarded an Impact Factor. This article presents a double analysis of the foundation that led to this success. On the one hand, a thorough statistical study is undertaken with regard to all contributors and their contributions to Lexikos. To this end a metadata database was designed, with the aim to answer the question: 'Who publishes what type of material from where and when?' On the other hand, a content analysis is carried out which focuses on the actual topics (i.e. 'keywords') in Lexikos. To this end an all-inclusive text corpus containing all the Lexikos material was built, with the aim to answer the question: 'What are the major trends in Lexikos?'

    Keywords: LEXIKOS, LEXICOGRAPHY, METALEXICOGRAPHY, DICTIONARIES, LEXICOGRAPHERS, METADATA DATABASE, TEXT CORPUS, CONTRIBUTIONS, CONTRIBUTORS, AFFILIATIONS, STATISTICS, TRENDS, ENGLISH, AFRIKAANS, BANTU

    Summary (translated from Dutch): Lexikos at the age of eighteen: An analysis. At the age of eighteen, Lexikos became a world-class player within linguistics by obtaining an Impact Factor. This article presents a double analysis of the foundation that led to this success. On the one hand, a detailed statistical study is undertaken with regard to all authors and their contributions to Lexikos. To this end a metadata database was designed, which answers the following question: 'Who publishes what type of material, from where and when?' On the other hand, a content analysis is carried out which focuses on the actual topics (namely 'keywords') in Lexikos. To this end an all-inclusive text corpus with all the Lexikos material was built, which answers the following question: 'What are the main trends in Lexikos?'

    Keywords (translated from Dutch): LEXIKOS, LEXICOGRAPHY, METALEXICOGRAPHY, DICTIONARIES, LEXICOGRAPHERS, METADATA DATABASE, TEXT CORPUS, CONTRIBUTIONS, AUTHORS, AFFILIATIONS, STATISTICS, TRENDS, ENGLISH, AFRIKAANS, BANTU

    Reflexive constructions in the world's languages

    Synopsis: This landmark publication brings together 28 papers on reflexive constructions in languages from all continents, representing very diverse language types. While reflexive constructions have been discussed in the past from a variety of angles, this is the first edited volume of its kind. All the chapters are based on original data, and they are broadly comparable through a common terminological framework. The volume opens with two introductory chapters by the editors that set the stage and lay out the main comparative concepts, and it concludes with a chapter presenting generalizations on the basis of the studies of individual languages.

    Graphemic Normalization of the Perso-Arabic Script

    Since its original appearance in 1991, the Perso-Arabic script representation in Unicode has grown from 169 to over 440 atomic isolated characters spread over several code pages representing standard letters, various diacritics and punctuation for the original Arabic and numerous other regional orthographic traditions. This paper documents the challenges that Perso-Arabic presents beyond the best-documented languages, such as Arabic and Persian, building on earlier work by the expert community. We particularly focus on the situation in natural language processing (NLP), which is affected by multiple, often neglected, issues such as the use of visually ambiguous yet canonically nonequivalent letters and the mixing of letters from different orthographies. Among the contributing conflating factors are the lack of input methods, the instability of modern orthographies, insufficient literacy, and loss or lack of orthographic tradition. We evaluate the effects of script normalization on eight languages from diverse language families in the Perso-Arabic script diaspora on machine translation and statistical language modeling tasks. Our results indicate statistically significant improvements in performance in most conditions for all the languages considered when normalization is applied. We argue that better understanding and representation of Perso-Arabic script variation within regional orthographic traditions, where those are present, is crucial for further progress of modern computational NLP techniques, especially for languages with a paucity of resources.
    Comment: Pre-print to appear in the Proceedings of Grapholinguistics in the 21st Century (G21C), 2022. Telecom Paris, Palaiseau, France, June 8-10, 2022. 41 pages, 38 tables, 3 figures.
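To make the point about "visually ambiguous yet canonically nonequivalent letters" concrete, here is a minimal sketch, not the paper's method. It shows that standard Unicode NFC normalization leaves ARABIC LETTER KAF (U+0643) and ARABIC LETTER KEHEH (U+06A9) distinct, so a language-specific mapping is needed on top of it. The function name `normalize_persian` and the two-entry mapping table are assumptions for illustration; a real table would be larger and would need to be defined per language.

```python
import unicodedata

# Hypothetical two-entry mapping for text known to be Persian:
# Arabic-style code points are replaced by the Persian-style ones.
PERSIAN_MAP = str.maketrans({
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
})


def normalize_persian(text: str) -> str:
    """Apply NFC, then the illustrative language-specific letter mapping."""
    return unicodedata.normalize("NFC", text).translate(PERSIAN_MAP)


# NFC alone does not unify the two kaf variants.
print(unicodedata.normalize("NFC", "\u0643") == "\u06A9")

# After the extra mapping, both spellings compare equal.
print(normalize_persian("\u0643\u064A") == normalize_persian("\u06A9\u06CC"))
```

The mapping must depend on the target language: the same replacement applied to Arabic text would introduce errors rather than remove them.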

    The 'Schizoid' Nature of Modern Hebrew Linguistics: A Contact Language in Search of a Genetic Past

    This conflict in views is the main issue of this paper. It involves an enduring tension between synchrony and diachrony which has characterized almost all analysis of the Modern Hebrew revival. This tension has made Modern Hebrew, especially the spoken variety of native Israeli Jews, one of the most fascinating objects of study for both linguists and non-linguists, who have explored Modern Hebrew to express both highly conventional and highly unorthodox opinions regarding its character. Some consider it the direct descendant of an ongoing linguistic legacy, transcending certain principles of linguistic behavior (e.g. Tur-Sinai 1960). Others vehemently assert its autonomy from Hebrew's past, stressing its uniqueness exclusively in structural linguistic terms (e.g. Rosen 1956). And most intriguingly, some refine the finer points of both views to posit rather unorthodox facts regarding the nature of Modern Hebrew (e.g. Wexler 1990b). How is it that a single language, covering so small a geographical area and used only so recently by native speakers, whose internal past and external history are so well-documented, has been so divergently analyzed?

    Issues in Rusyn language standardisation

    This thesis is an examination of the factors which have led to the standardisation of several variants of the Rusyn language in central and eastern Europe since 1989. It includes an assessment of aspects of the linguistic and extra-linguistic language planning activities carried out within and between the different Rusyn standard languages. The thesis considers the development of Rusyn standard languages with particular focus on those created for the Rusyns of the Prešov Region of Slovakia and the Lemkos of Poland, with reference to the language situation in the Transcarpathian Region of Ukraine and that of Vojvodina Rusyn in Serbia and Croatia. It also considers factors which have facilitated and militated against the creation of standard languages in the regions concerned and sets the development of Rusyn standardisation in the context of the development of regional and minority languages elsewhere and as an element of identity construction and assertion. A study is made of the prospects for the so-called Rusyn koiné, an auxiliary standard proposed for use across all Rusyn groups.