20 research outputs found

    Building WordNet for Afaan Oromoo

    WordNet is a lexical database whose many relations help disambiguate word senses in natural languages. Among the WordNet relations, synonymy and hyponymy play a major role in natural language processing and artificial intelligence applications. In this paper, word embedding (Word2Vec) and lexico-syntactic patterns (LSP) are used to automatically extract synonyms and hyponyms, respectively. The word embedding is evaluated with two algorithms, continuous bag of words and skip-gram, and shows superior results. Applying the Word2Vec algorithms to Afaan Oromoo texts registered 80.09% and 85.04% for continuous bag of words and skip-gram, respectively. According to the results of this study, the skip-gram algorithm handles frequent word pairs better than continuous bag of words, but continuous bag of words is faster. Lexico-syntactic patterns, with and without Word2Vec, are also evaluated using the information retrieval metrics precision, recall and F-measure for extracting hyponym relations from Afaan Oromoo texts. Without Word2Vec, the lexico-syntactic patterns registered a precision, recall and F-measure of 66.73%, 72%, and 69.26%, respectively; combined with Word2Vec, they registered 81.14%, 80.8%, and 81.1%, respectively. Two factors could affect the accuracy of the results: 1) the writing style of Afaan Oromoo authors, who often attach many adjectives to a noun in a noun phrase; and 2) some instances of the LSP may be missed due to misspellings and other typographical errors. Keywords: Afaan Oromoo WordNet, Word embedding, Lexico syntactic patterns, Extraction of WordNet relations. DOI: 10.7176/CEIS/11-3-01 Publication date: May 31st 202
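
    As a reading aid for the metrics quoted in this abstract, the following minimal sketch shows how an F-measure is conventionally derived from precision and recall. It assumes the balanced F1 form (the harmonic mean); the abstract does not state which variant the authors used, and the function name `f_measure` is hypothetical.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract does not say which F-measure variant was used;
    this is the common F1 form. Inputs and output share the same unit
    (percentages here).
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the patterns without Word2Vec.
score_without_w2v = f_measure(66.73, 72.0)
print(f"{score_without_w2v:.2f}")
```

    Under that assumption, the precision and recall reported for the patterns without Word2Vec give a value that agrees with the 69.26% stated in the abstract.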

    Semantic Similarity Analysis for Paraphrase Identification in Arabic Texts

    Leveraging analytics to produce compelling and profitable film content

    Producing compelling film content profitably is a top priority for the long-term prosperity of the film industry. Advances in digital technologies, the increasing availability of granular big data, the rapid diffusion of analytic techniques, and intensified competition from user-generated content and original content produced by Subscription Video on Demand (SVOD) platforms have created unparalleled needs and opportunities for film producers to leverage analytics in content production. Building upon the theories of value creation and film production, this article proposes a conceptual framework of key analytic techniques that film producers may engage throughout the production process, such as script analytics, talent analytics, and audience analytics. The article further synthesizes the state-of-the-art research on and applications of these analytics, discusses the prospect of leveraging analytics in film production, and suggests fruitful avenues for future research with important managerial implications.

    XGV-BERT: Leveraging Contextualized Language Model and Graph Neural Network for Efficient Software Vulnerability Detection

    With the advancement of deep learning (DL) in various fields, there have been many attempts to reveal software vulnerabilities with data-driven approaches. Nonetheless, existing works lack an effective representation that can retain the non-sequential semantic characteristics and contextual relationships of source code attributes. Hence, in this work, we propose XGV-BERT, a framework that combines the pre-trained CodeBERT model and Graph Neural Network (GCN) to detect software vulnerabilities. By jointly training the CodeBERT and GCN modules within XGV-BERT, the proposed model leverages the advantages of large-scale pre-training, harnessing vast raw data, and transfer learning by learning representations for training data through graph convolution. The research results demonstrate that the XGV-BERT method significantly improves vulnerability detection accuracy compared to two existing methods, VulDeePecker and SySeVR. On the VulDeePecker dataset, XGV-BERT achieves an F1-score of 97.5%, significantly outperforming VulDeePecker, which achieved an F1-score of 78.3%. Likewise, on the SySeVR dataset, XGV-BERT achieves an F1-score of 95.5%, surpassing SySeVR, which achieved an F1-score of 83.5%.

    Journalism Education and Fake News: A Literature Review

    This article offers a scholarly review of the literature and research on journalism education and fake news from an international and a local (Croatian) perspective. The purpose of this paper is to examine the connection between the education of journalists as a scholarly and academic discipline (as well as a teaching practice) and the issues caused by fake news in the digital age of mass media. Based on a comprehensive critical conceptual analysis of the body of knowledge available on the subject, it was determined that there is a diverse discussion about the status of journalism education regarding fake news. In that context, fake news has so far been internationally researched from several angles – curriculum content, journalism students, journalism and media studies, journalism practice, media audience, etc. When addressing the issue of the education of journalists and fake news, three streams can be singled out. The first and most voluminous one refers to systematic formal or additional education in media and information literacy. The next one refers to various changes related to the higher education system for journalists, but without any concrete propositions for reconstructing or upgrading the system. The last one advocates providing additional professional education to employed journalists. From the local perspective, even though only two articles suggest journalism education as a solution for the problems caused by fake news, based on thorough research it can be concluded that fake news and journalism education are not yet topics of interest among communication scholars in Croatia.

    Learner Corpus Research Meets Chinese as a Second Language Acquisition: Achievements and Challenges

    The article sheds light on Chinese Learner Corpus Research (CLCR), emphasizing advances and gaps in this field. First, the paper describes the potential of learner corpora in the investigation of learner language. The specificity of learner corpus data compared to learner data in Second Language Acquisition (SLA) studies is also analyzed. Second, it provides an overview of Chinese learner corpus-based research and reviews existing L2 Chinese learner corpora. The paper highlights the lack of L2 Chinese learner corpora collecting data from Italian learners and discusses the challenges and needs of compiling L2 Chinese corpora to conduct studies on the acquisition of L2 Chinese by learners whose L1 is other than English or an Asian language. This issue is addressed by taking into account recent projects integrating the LCR methodology with L2 Chinese studies for Italian-speaking learners. Finally, the paper encourages a concrete integration between the application of the methodological framework of LCR and the implementation of the theoretical interpretation of data of SLA research in the design of Chinese acquisitional studies.

    ANNABELL, a cognitive system able to learn different languages

    © 2018 The authors and IOS Press. All rights reserved. ANNABELL is a cognitive system entirely based on a large-scale neural architecture capable of learning to communicate through natural language starting from a tabula rasa condition. In order to shed light on the level of cognitive development required for language acquisition, in this work the model is used to study the acquisition of a new language, namely Albanian, in addition to English. The aim is to evaluate, in a completely different and more complex language, the ability of the model to acquire new information through several examples introduced in the new language and to process the acquired information, answering questions that require the use of different language patterns. The results show that the system is capable of learning cumulatively in either language and of developing a broad range of language processing functionalities in both languages.