
    Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-10)


    Evaluation of taxonomic and neural embedding methods for calculating semantic similarity

    Modelling semantic similarity plays a fundamental role in lexical semantic applications. A natural way of calculating semantic similarity is to access handcrafted semantic networks, but similarity can also be predicted in a distributional vector space. Similarity calculation continues to be a challenging task, even with the latest breakthroughs in deep neural language models. We first examined popular methodologies for measuring taxonomic similarity, including edge-counting, which solely employs semantic relations in a taxonomy, as well as more complex methods that estimate concept specificity. We further extrapolated three weighting factors in modelling taxonomic similarity. To study the distinct mechanisms of taxonomic and distributional similarity measures, we ran head-to-head comparisons of each measure with human similarity judgements from the perspectives of word frequency, polysemy degree and similarity intensity. Our findings suggest that, without fine-tuning the uniform distance, taxonomic similarity measures can depend on the shortest path length as a prime factor to predict semantic similarity; that, in contrast to distributional semantics, edge-counting is free from sense distribution bias in use and can measure word similarity both literally and metaphorically; and that the synergy of retrofitting neural embeddings with concept relations in similarity prediction may indicate a new trend to leverage knowledge bases on transfer learning. It appears that a large gap still exists in computing semantic similarity across different ranges of word frequency, polysemy degree and similarity intensity.
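    As a rough illustration of the edge-counting idea mentioned in this abstract, the sketch below scores two concepts by the shortest path between them in a taxonomy. It is only a sketch under stated assumptions: the toy taxonomy, the function names and the 1 / (1 + path length) scoring are illustrative choices, not the measures or data evaluated in the paper.

```python
from collections import deque

# Hypothetical toy is-a taxonomy (child -> parent); not taken from the paper.
TOY_TAXONOMY = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
    "bird": "animal",
}


def build_graph(child_to_parent):
    """Turn child -> parent links into an undirected adjacency map."""
    graph = {}
    for child, parent in child_to_parent.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the edge count, or None if unreachable."""
    if source not in graph or target not in graph:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_similarity(graph, a, b):
    """Assumed scoring: 1 / (1 + shortest path length); 0.0 if no path exists."""
    length = shortest_path_length(graph, a, b)
    return 0.0 if length is None else 1.0 / (1.0 + length)


GRAPH = build_graph(TOY_TAXONOMY)
```

    Every edge counts as one step here, which corresponds to a uniform distance with no tuning; weighting edges by depth or concept specificity would need a different search, such as Dijkstra's algorithm.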

    Planning non-existent dictionaries

    In 2013, a conference entitled Planning non-existent dictionaries was held at the University of Lisbon. Scholars and lexicographers were invited to present and submit for discussion their research and practices, focusing on aspects that are traditionally perceived as shortcomings by dictionary makers and dictionary users. This book contains a collection of papers divided into three sections. The first section is devoted to heritage dictionaries, referring to lexicographic projects that aim to register all the documented words in a language, particularly those that can be described as early linguistic evidence. The second section is devoted to dictionaries for special purposes and gathers papers that describe innovative lexicographic projects. The last section in this volume provides an overview of contemporary e-lexicography projects.

    Similarity in conceptual analysis and concept as proper function

    In the last decades, experimental philosophers have introduced the notion that conceptual analysis could use empirical evidence to back some of its claims. This opens up the possibility of developing a corpus-based conceptual analysis. However, progress in this direction is contingent on the development of a proper account of concepts, and of corpus-based conceptual analysis itself, that can be leveraged on textual data. In this essay, I address this problem through the question of similarity: how do we evaluate similarity between two concepts, as similarity relates to identity? After a survey of prominent conceptual analysis methods, I propose a cursory account of corpus-based conceptual analysis. Then I formulate the question of similarity, and argue for an account that is functionalist in Millikan's (1984) sense. In this process, I propose a new account of concepts that bases itself on Millikanian teleosemantics in order to account for concepts' contribution to discourse. I then illustrate its fruitfulness by showing how it enables accounts of concept presence detection in textual data, both automatically and by a human judge.

    Unsupervised methods in multilingual and multimodal semantic modeling

    In the first part of this project, independent component analysis (ICA) was applied to extract word clusters from two Farsi corpora. Both word-document and word-context matrices were considered for extracting such clusters. Applying ICA to the word-document matrices extracted from these two corpora led to the detection of syntagmatic word clusters, while using the word-context matrix resulted in the extraction of both syntagmatic and paradigmatic word clusters. Furthermore, we discuss some potential benefits of this automatically extracted thesaurus. In such a thesaurus, a word is defined by other words without being connected to external physical objects. To fill this gap, philosophers have proposed symbol grounding as a mechanism that might connect words to their physical referents. From their point of view, if words are properly connected to their referents, their meaning might be realized. Once this objective is achieved, a promising new horizon would open in the realm of artificial intelligence. In the second part of the project, we offer a simple but novel method for grounding words based on features from the visual modality. First, indexical grounding is implemented. In this naïve symbol grounding method, a word is characterized using video indexes as its context. Second, these indexical word vectors are normalized according to the features calculated for motion videos. This multimodal fusion is referred to as pattern grounding. Third, the indexical word vectors are normalized using randomly generated data instead of the original motion features; this case is called randomized grounding. These three cases of symbol grounding were compared in terms of translation performance. In addition, word clusters were extracted by comparing vector distances and from the dendrograms generated by an agglomerative hierarchical clustering method. We observed that pattern grounding outperformed indexical grounding in the translation of the motion-annotated words, while randomized grounding degraded the translation significantly. Moreover, pattern grounding led to clusters in which each word fit semantically with the other members, whereas with indexical grounding some closely related words were dispersed into arbitrary clusters.

    Experimentalism and Innovation in the Kurdish Short Story in Bahdinan Since 1991

    Within the framework of experimentation and innovation in the short story, this study examines the most significant creative aspects of the Kurdish short story written in the Kurmanji dialect in Bahdinan in Iraqi Kurdistan. The period covered starts in 1991, as this represented the genesis of a new era in Kurdish literature. Although the short story has experienced a rapid renewal, there is still ongoing debate among scholars as to whether or not the creations in Bahdinan are modernist. Consequently, the current study aims to contribute to this debate by assessing the experimental and innovative aspects of Kurdish short stories. Eight of the most experimental and innovative writers, whose works have played a crucial role in recent history, were chosen and their texts analysed within the frame of three phenomena of contemporary fiction, namely mixing genres, intertextuality and the impact of the memory of traumatic events on the structure of the short story. The study of the notion of genre in the Kurdish short story in Bahdinan has led to the discovery that many writers have explored the genre concept by crossing generic boundaries as a means of writing experimental texts. That is, their texts have been formed by a combination of the formulated conventions of more than one literary genre. This has been achieved through the employment of different strategies, such as the short story cycle, the short-short story cycle and the combination of many scenes. As a consequence of their dealing with big topics and short texts, the majority of their texts can be placed between the totality of the novel or epic and the limitation of the short story. The examination of intertextuality as an aspect of the contemporary short story has shown that several authors have transposed pre-existent literary and religious heritage practices for new purposes, such as to criticise society and many of its taboos. In addition to meeting aesthetic requirements, intertextuality has been employed to avoid religious, social and political censorship. When tackling traumatic events, a number of Kurmanji writers have incorporated the influence of the nature of their memories into the structure of their texts. These texts are, on the whole, fragmented and present new ways of narrating plots. This has been achieved through the adoption of various strategies, such as nightmares, dreams, repetition, images and scenes. According to the techniques that have been employed by Kurdish authors, their texts can be considered as ‘acting out’ or ‘working through’, while sometimes the two concepts are blurred. Although Kurdish authors have presented both personal and collective trauma, they have placed greater emphasis on the psychological effects on individuals and society and on the fictional side rather than the factual historical context. Through exploring the three above-mentioned literary phenomena, this study has uncovered a rich vein of experimentation and innovation in the Kurdish short story. Finally, Kurdish writers in Bahdinan, whilst drawing on historical events to experiment in their short story creations, have also taken inspiration from other nations’ literary forms, especially Western modernism. However, what they have produced through their innovative works is a body of literature that succinctly addresses the historical and cultural particularities of the Kurdish people. In addition, these short stories illustrate complex relations between politics, Kurdish identity and experience, and literature. Higher Education and Scientific Research, Iraq, Kurdistan regio

    Term-driven E-Commerce

    This thesis addresses the textual dimension of e-commerce. Its basic hypothesis is that information and transactions in electronic commerce are bound to text. Wherever products and services are offered, sought, perceived and evaluated, natural-language expressions are used. It follows, on the one hand, that capturing the variance of textual descriptions in e-commerce is important; on the other hand, the extensive textual resources that arise from e-commerce interactions can be drawn on to gain a better understanding of natural language.

    Aspects of multilingual storage, processing and retrieval

    My long-term interest in multilinguality derives from the fact that, as I explain in Chapter One, a large proportion of language users and learners in the world are no longer monolingual, nor even bilingual, but rather multilingual. The spread of English as a lingua franca has also contributed to the development of multilinguality around the world. The spread of multilinguality relates not only to the natural setting of multilingual societies and mixed marriages, but also to formal instruction contexts, where the introduction of at least two foreign languages has become an educational norm. Consequently, as teachers of EFL we are often faced with learners who are also acquiring or learning another foreign language, and this is clearly reflected in the cross-linguistic influences we can observe in their language production, in terms of both positive, facilitative effects and interference. Shouldn’t their increased learning experience be harnessed in our teaching and in their learning practices?