
    Dependency Based Bilingual word Embeddings without word alignment

    In this work, we trained different bilingual word embedding models without word alignments (BilBOWA) using linear bag-of-words contexts and dependency-based contexts. BilBOWA embedding models learn distributed representations of words by jointly optimizing a monolingual and a bilingual objective. Including dependency features in the monolingual objective improves the accuracy of learning bilingual word embeddings by up to 6 percentage points for the English-Spanish (En-Es) language pair and by up to 2.5 percentage points for the English-German (En-De) language pair on the word translation task, compared to the baseline model. However, using these dependency features in both the monolingual and bilingual objectives does not lead to any improvement for the En-Es language pair and shows only minor improvement for En-De. Moreover, our results provide evidence that the effect of using dependency features in bilingual word embeddings differs depending on the syntactic and sentence-structure similarity of the language pair.

    Cross Lingual Sentiment Analysis: A Clustering-Based Bee Colony Instance Selection and Target-Based Feature Weighting Approach

    The lack of sentiment resources in low-resource languages poses challenges for sentiment analysis that involves machine learning. Cross-lingual and semi-supervised learning approaches are the most common ways of overcoming this issue. However, the performance of existing methods degrades due to the poor quality of translated resources, data sparseness and, more specifically, language divergence. An integrated learning model is proposed that uses a semi-supervised model and an ensemble model while utilizing the available sentiment resources to tackle issues related to language divergence. Additionally, to reduce the impact of translation errors and handle the instance selection problem, we propose a clustering-based bee-colony sample selection method for the optimal selection of the most distinguishing features representing the target data. To evaluate the proposed model, various experiments are conducted on an English-Arabic cross-lingual data set. Simulation results demonstrate that the proposed model outperforms the baseline approaches in terms of classification performance. Furthermore, the statistical outcomes indicate the advantages of the proposed training data sampling and target-based feature selection in reducing the negative effect of translation errors. These results highlight that the proposed approach achieves performance close to that of in-language supervised models.

    Dependency-based Bilingual Word Embeddings and Neural Machine Translation

    Bilingual word embeddings, which represent lexicons from various languages in a common embedding space, are critical for facilitating semantic and knowledge transfers in a wide range of cross-lingual NLP applications. The significance of learning bilingual word embedding representations in many Natural Language Processing (NLP) tasks motivates us to investigate the effect of many factors, including syntactic information, on the learning process for different languages with varying levels of structural complexity. By analysing the components that influence the learning process of bilingual word embeddings (BWEs), this thesis examines some factors for learning bilingual word embeddings effectively. Our findings demonstrate that increasing the embedding size has a positive impact on the learning process for BWEs across language pairs. The effect of sentence length, however, depends on the language: short sentences perform better than long ones in the En-Es experiment, whereas increasing the sentence length improves model accuracy in the En-Ar and En-De experiments. Arabic segmentation, according to the En-Ar experiments, is essential to the learning process for BWEs and can boost model accuracy by up to 10%. Incorporating dependency features into the learning process enhances the trained models' performance and results in improved BWEs in all language pairs. Finally, we investigated how the dependency-based pretrained BWEs affected the neural machine translation (NMT) model. The findings indicate that, on various MT evaluation metrics, the trained dependency-based NMT models outperform the baseline NMT model.

    En-Ar Bilingual word Embeddings without Word Alignment : Factors Effects

    This paper presents the first attempt to investigate the effect of morphological segmentation on En-Ar bilingual word embeddings, using the bilingual word embeddings without word alignment (BilBOWA) model. We also investigate the effect of sentence length and embedding size on the learning process. Our experiments show that using the D3 segmentation scheme improves the accuracy of learning bilingual word embeddings by up to 10 percentage points compared to the ATB and D0 schemes across all training settings.

    Sentiment analysis on film review in Gujarati language using machine learning

    Sentiment analysis is among the most fundamental areas of natural language processing. It deals with classifying text to determine the attitude of its source, which may be one of appreciation (positive) or criticism (negative). This paper compares the results achieved by applying classification with different classifiers, for instance K-nearest neighbors and multinomial naive Bayes. These techniques are used to label an opinion as either a positive or a negative comment. The collected data are drawn from film polarity datasets, and the results are compared with the available evidence for a careful evaluation. This paper investigates the influence of the word-level count vectorizer and term frequency-inverse document frequency (TF-IDF) on film sentiment analysis. We conclude that the multinomial naive Bayes (MNB) classifier generates more accurate results with the TF-IDF vectorizer than with CountVectorizer, while the K-nearest neighbors (KNN) classifier achieves the same accuracy with TF-IDF and CountVectorizer.

    An Historiographical Reading of the Founding of Canada's National Theatre School

    On November 2, 1960, French director and teacher Michel Saint-Denis declared the National Theatre School of Canada (NTS), the nation's first professional theatre training institution, open, and the Canadian theatre, its English and French traditions, entered a new stage of professional development. But how did it get there? This historiographical study of the NTS's founding is the first thorough examination of the complex process through which the only bi-cultural, co-lingual school in Canada was established, from first inklings in the nineteenth century to its official opening in 1960. This dissertation utilizes Thomas Postlewait's four-part model of historiographical theory to explore and document the various contexts which helped to shape the ways in which the School was structured, operated, and received by the public at the time it opened. While the National Theatre School of Canada is clearly recognized as an important part of the professional Canadian theatre, it is argued here that the details of the School's founding, even now, remain contradictory, forcing the discussion to focus more on the results of the school after it officially opened rather than on the ideas which created it. After half a century, it seems time to articulate, at the very least, those founding debates, adding them to Canada's theatre history and giving them relevance in today's increasingly diverse Canada.

    In Search of a Single Voice: The Politics of Form, Use and Belief in the Kernewek Language

    This dissertation is based upon fieldwork performed between 2007 and 2011 in Cornwall, a region of Southwestern Britain notable for its ambiguous ethnic identity - caught between England and the Celtic nations - and its unique, revived Celtic language, Kernewek. During the course of the research, work focused upon the role of the language revival movement as a tool for ethnic identification: hardening boundaries, shoring up faltering communities and nationalist purification. However, the language movement is divided into three primary factions, which take differing approaches to the language, and to their corresponding language ideology based upon their relationship to Cornish identity. These relationships are based upon speakers' sense of ethnic self as formed through class, kinship, linguistic self-perception, religious and political affiliations and place of birth and childhood. However, since the 2006 recognition of the language by the British states, all of these debates have become intensified due to pressure to standardize. This study examines specific examples including: teaching materials and pedagogical approaches in the language, debates over the minutiae of spelling, aesthetic sensibilities, and practices of the naming and renaming of people and places.

    Augustanism in Scotland: the pastoral and the georgic in the work of Allan Ramsay and James Thomson

    Contemporaries Allan Ramsay (1684-1758) and James Thomson (1700-1748) are two of Scotland’s most influential literary figures. Without the invention and impetus of these two writers, it is difficult to imagine how the work of Burns or Scott would have been possible. Ironically, the cultural legacy left by these two writers has, to a certain extent, been misunderstood by traditional criticism. Conventional criticism portrays Thomson as personifying a British literary identity while, conversely, Ramsay has been appropriated by Scottish nationalist criticism; for these critics Ramsay represents an ardent Scottish literary and cultural patriotism. While these critical constructions are not without justification and validity, the portrayal of Ramsay and Thomson as literary opposites operating within separate cultural and national spheres is both reductive and misleading. Recent scholarship, that of Mary Jane Scott for example, has attempted to repatriate Thomson into the Scottish literary canon whilst critics such as Gerard Carruthers and Carol McGuirk have begun to explore the extent of Ramsay’s Augustan and nuanced British literary identity. In the course of this dissertation I will build upon this line of research as a more fruitful and less limiting formulation for interpreting the creative output of Scottish writers following the Union of 1707. The term ‘Scottish Augustan’ has of course been used as a label for Scottish writers of this period in the past. I intend, however, to counter the often negative connotations of this label which render it a somewhat derogatory term for a group of Scottish writers who were contemporaneous with the English Augustans and who are perceived as mere imitators, and rather poor ones at that, of the literary styles and modes which were in literary currency in England during the period. 
    In order to evaluate Ramsay's and Thomson's contribution to the Augustan milieu, this dissertation focuses on the way in which the pastoral and the georgic modes are manipulated in their corpora. By pursuing such a line of enquiry, I hope not to present an Anglocentric view of the period, in which Scottish writers are judged by the English Augustan norms and standards of the day, but to offer an alternative formulation whereby the creative contribution of Scottish writers becomes a significant factor in the development of a British literature in the wake of the Union of 1707.