45 research outputs found

    Building WordNet for Afaan Oromoo

    WordNet is a lexical database whose many relations help disambiguate word senses in natural languages. Among the WordNet relations, synonymy and hyponymy play a major role in natural language processing and artificial intelligence applications. In this paper, word embedding (Word2Vec) and lexico-syntactic patterns (LSP) are used to automatically extract synonyms and hyponyms, respectively. The word embedding is evaluated with two specialized domain algorithms, continuous bag of words (CBOW) and skip-gram, and shows superior results. Applying the Word2Vec algorithms to Afaan Oromoo texts registered 80.09% and 85.04% for CBOW and skip-gram, respectively. According to the results of this study, the skip-gram algorithm does a better job for frequent pairs of words than CBOW. However, the CBOW algorithm is faster, while skip-gram is slower. Lexico-syntactic patterns, with and without the combination of Word2Vec, are also evaluated using information retrieval evaluation metrics (precision, recall and F-measure) for extracting hyponym relations from Afaan Oromoo texts. Lexico-syntactic patterns without Word2Vec registered a precision, recall and F-measure of 66.73%, 72% and 69.26%, respectively; with Word2Vec they registered 81.14%, 80.8% and 81.1%, respectively. Two factors could affect the accuracy of the results: 1) the writing style of Afaan Oromoo authors, who often write a noun phrase with many adjectives to describe the noun for the reader; and 2) some instances of the LSP may be missed due to misspellings and other typographical errors. Keywords: Afaan Oromoo WordNet, Word embedding, Lexico-syntactic patterns, Extraction of WordNet relations. DOI: 10.7176/CEIS/11-3-01 Publication date: May 31st 202
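
The F-measure figures in this abstract can be related to the reported precision and recall with a few lines of Python. This is a minimal sketch that assumes "F-measure" means the balanced F1 score (the harmonic mean of precision and recall); the abstract does not say which variant was used, and the function name `f_measure` is illustrative only.

```python
def f_measure(precision, recall):
    """Balanced F1: harmonic mean of precision and recall.

    Inputs and output share the same scale (here: percentages).
    Assumption: the paper's "F-measure" is the balanced F1 score.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs as reported in the abstract (percentages).
without_w2v = f_measure(66.73, 72.0)
with_w2v = f_measure(81.14, 80.8)

print(f"LSP without Word2Vec: {without_w2v:.2f}")
print(f"LSP with Word2Vec:    {with_w2v:.2f}")
```

Under this assumption the first pair reproduces the reported 69.26%. The second pair comes out near 80.97%, slightly below the reported 81.1%; the abstract does not explain the rounding or averaging scheme, so the difference may simply reflect that.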

    Proceedings of FACTS-IR 2019

    Copy-cat Bot for Narendra Modi which generates plausible new speeches in Modi’s style using machine learning approaches

    Many consequences in the human past can be traced back to one well-written, well-presented speech. Speeches have the power to move nations or touch hearts as long as they are well thought out. This is why expertise in speech giving and speech writing is something we should all intend to gain. A copy-cat bot is a model that can learn the writing and talking style of a certain person and replicate it. The main objective of this research study is to apply a simple Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM) recurrent neural networks and the Gated Recurrent Unit (GRU) in developing a speech generation system that deep learns one text and then generates new text. This research looks into the generation of English transcripts of Narendra Modi’s speeches. The text generated using the LSTM and GRU models has great potential. The output produced by the RNN is less realistic and pragmatic, but its variants LSTM and GRU performed better. Although grammatical correctness and sentence transitions were absent in the text generated by LSTM and GRU, their output is somewhat logical compared to that of the RNN. LSTM and GRU performed better, as they generated more realistic text with smaller training loss, smaller perplexity and higher mean probability than the RNN
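
The abstract compares models by training loss, perplexity and mean probability. A minimal sketch of how these quantities relate is given below, assuming perplexity is the exponential of the mean negative log-likelihood per token and "mean probability" is the average probability assigned to the correct next token. The abstract does not define either metric, and the probability lists are hypothetical, not data from the study.

```python
import math


def perplexity(token_probs):
    """exp of the mean negative log-likelihood over the predicted tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


def mean_probability(token_probs):
    """Average probability the model assigned to the correct token."""
    return sum(token_probs) / len(token_probs)


# Hypothetical per-token probabilities for two imaginary models.
confident_model = [0.5, 0.5, 0.5, 0.5]
uncertain_model = [0.1, 0.1, 0.1, 0.1]

for name, probs in [("confident", confident_model), ("uncertain", uncertain_model)]:
    print(name, round(perplexity(probs), 3), round(mean_probability(probs), 3))
```

A model that assigns higher probability to the correct tokens has lower loss and lower perplexity, which matches the direction of the comparison reported for LSTM and GRU against the simple RNN.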

    A Comparative Study of Word Embedding Techniques for SMS Spam Detection

    The file attached to this record is the author's final peer-reviewed version. The publisher's final version can be found by following the DOI link. E-mail and SMS are the most popular communication tools used by businesses, organizations and educational institutions. Every day, people receive hundreds of messages which could be either spam or ham. Spam is any form of unsolicited, unwanted digital communication, usually sent out in bulk. Spam emails and SMS waste resources by unnecessarily flooding network lines and consuming storage space. Therefore, it is important to develop high-accuracy spam detection models that effectively block spam messages, so as to optimize resources and protect users. Various word-embedding techniques such as Bag of Words (BOW), N-grams, TF-IDF, Word2Vec and Doc2Vec have been widely applied to NLP problems; however, a comparative study of these techniques for SMS spam detection is currently lacking. Hence, in this paper, we provide a comparative analysis of these popular word-embedding techniques for SMS spam detection by evaluating their performance on a publicly available ham and spam dataset. We investigate the performance of the word-embedding techniques using five different machine learning classifiers, i.e. Multinomial Naive Bayes (MNB), KNN, SVM, Random Forest and Extra Trees. Based on the dataset employed in the study, N-gram, BOW and TF-IDF with oversampling recorded the highest F1 scores of 0.99 for ham and 0.94 for spam
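
To illustrate two of the representations named in this abstract, the sketch below builds bag-of-words and word-bigram counts for a single message using only the Python standard library. It is not the paper's pipeline: the whitespace tokenizer, the function names and the sample message are all assumptions made for illustration.

```python
from collections import Counter


def tokenize(message):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return message.lower().split()


def ngrams(tokens, n):
    """Contiguous n-token tuples, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(message):
    """Token counts, ignoring word order."""
    return Counter(tokenize(message))


def bag_of_ngrams(message, n=2):
    """Counts of contiguous n-grams, which retain local word order."""
    return Counter(ngrams(tokenize(message), n))


# Hypothetical SMS text, not taken from the dataset used in the study.
sample = "free entry win a free prize"
bow = bag_of_words(sample)
bigrams = bag_of_ngrams(sample, 2)

print(bow.most_common(1))
print(sorted(bigrams))
```

Count vectors like these are the kind of input a classifier such as Multinomial Naive Bayes consumes; TF-IDF reweights the same counts by how rare each term is across messages.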

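    Opinion Mining Terhadap Toko Online Di Media Sosial Menggunakan Algoritma Naïve Bayes (Studi Kasus: Akun Facebook Dugal Delivry) [Opinion Mining of Online Stores on Social Media Using the Naïve Bayes Algorithm (Case Study: Dugal Delivry Facebook Account)]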

    The Internet era has had an impact on various sectors of human life. One is the economic sector. Economic transactions have changed from the traditional (face-to-face) pattern to online. Customers no longer need to ask a close friend or family member about the condition of an item they want to buy; they can simply review the product through the comments of previous buyers. Products that get good reviews are taken to be of good quality. However, a problem arises when the volume of comment data is very large, which makes it difficult for customers to summarize the quality. Therefore, an automatic opinion mining system is required that can directly draw conclusions about the quality of a product. This research builds an opinion mining system by applying the Naïve Bayes algorithm, taking the Facebook account Dugal Delivry as a case study. Measurement with a confusion matrix gives a precision of 88.89%, a recall of 80% and an accuracy of 85%
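
The three figures reported here all derive from the four cells of a binary confusion matrix. The sketch below shows the calculation with hypothetical counts (TP=8, FP=1, FN=2, TN=9) chosen only because they happen to reproduce the reported percentages; the abstract does not give the actual counts, and any proportional scaling of these numbers would give the same result.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Precision, recall and accuracy from binary confusion-matrix counts."""
    precision = tp / (tp + fp)                   # correct among predicted positives
    recall = tp / (tp + fn)                      # correct among actual positives
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # correct among all predictions
    return precision, recall, accuracy


# Hypothetical counts, NOT taken from the paper.
precision, recall, accuracy = confusion_metrics(tp=8, fp=1, fn=2, tn=9)
print(f"precision={precision:.2%} recall={recall:.2%} accuracy={accuracy:.2%}")
# prints "precision=88.89% recall=80.00% accuracy=85.00%"
```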

    FVEC feature and Machine Learning Approach for Indonesian Opinion Mining on YouTube Comments

    Mining opinions from Indonesian comments on YouTube videos is required to extract interesting patterns and valuable information from consumer feedback. Opinions can consist of a combination of sentiments and topics from comments. The features considered in opinion mining are one of the important keys to obtaining quality opinions. This paper proposes to utilize FVEC and TF-IDF features to represent the comments. In addition, two popular machine learning approaches in the field of opinion mining, i.e., SVM and CNN, are explored separately to extract opinions from Indonesian comments on YouTube videos. The experimental results show that the use of FVEC features with SVM and CNN has a very significant effect on the quality of the opinions obtained, in terms of accuracy

    A Unique Multi-Agent-Based Approach for Enhanced QoS Resource Allocation in Multi Cloud Environment while Maintaining Minimized Energy and Maximize Revenue

    The use of multi-cloud data storage in one heterogeneous service is a polynimbus cloud strategy. Cloud computing uses a pay-as-you-go model to deliver services to a variety of end users. Thanks to cloud computing, customers can outsource daunting tasks to cloud data centres for processing and producing results. Cloud computing has become a popular IT brand that provides various on-demand services over the internet. This technology is devoted to distributing computer and software resources. The availability of data from advanced scientific tools has proven the usefulness of workflows for enabling relevant scientific achievements. Scheduling algorithms are essential in order to automate these strenuous workflows efficiently. A number of new heuristics based on a cloud resource model have been developed. The majority of these heuristics address QoS issues in one or two dimensions. Cloud computing technology offers a decentralised pool of services and resources, with various models, provided to customers across the Internet in an on-demand, continuously distributed, pay-per-use model. The key challenge we address in this paper is to maximise revenue while maintaining minimum energy consumption, with enhanced QoS for resource allocation. The results obtained from the proposed method, when compared with existing state-of-the-art methods, are observed to be novel and better