212 research outputs found

    Classification of Under-Resourced Language Documents Using English Ontology

    Automatic document classification, which aims to assign a document to a predefined category based on its contents, is an important task given the rapid growth in the number of electronic documents. It plays an important role in information extraction, summarization, text retrieval, question answering, e-mail spam detection, web page content filtering, automatic message routing, etc. Most existing methods and techniques in the field of document classification are keyword based, and because they do not take semantics into account, they perform poorly. In contrast, documents can also be classified by their semantics, using an ontology as a knowledge base for classification; however, building an ontology for an under-resourced language is very challenging. Hence, this approach is limited to well-resourced languages (i.e. English). As a result, documents written in under-resourced languages do not benefit from such ontology-based classification. This paper describes the design of automatic classification for documents written in under-resourced languages. In this work, we propose an approach that classifies such documents on top of an English ontology. We used a bilingual dictionary with a part-of-speech feature for word-by-word text translation to enable the classification of documents without any language barrier. The design has a concept-mapping component, which uses lexical and semantic features to map the translated senses onto the ontology concepts. In addition, the design has a categorization component, which determines the category of a given document based on the weights of the mapped concepts. To evaluate the performance of the proposed approach, 20 test documents for Amharic and Tigrinya and 15 test documents for Afaan Oromo in each news category were used. To observe the effect of the incorporated features (i.e. lemma-based index term selection, pre-processing strategies during concept mapping, and lexical and semantic concept mapping), five experimental techniques were conducted. The experimental results indicated that the proposed approach, with all features and components incorporated, achieved an average F-measure of 92.37%, 86.07% and 88.12% for Amharic, Afaan Oromo and Tigrinya documents respectively. Keywords: under-resourced language, Multilingual, Documents or text Classification, knowledge base, Ontology based text categorization, multilingual text classification, Ontology. DOI: 10.7176/CEIS/10-6-02 Publication date: July 31st 201
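    The pipeline this abstract describes (dictionary translation, concept mapping, weight-based categorization) can be pictured with a minimal sketch. It is not the authors' implementation: the dictionary entries, the ontology fragment, the weights, and the function names are all hypothetical placeholders, and the source tokens are made-up stand-ins rather than real Amharic, Afaan Oromo or Tigrinya words.

```python
from collections import defaultdict

# Hypothetical bilingual dictionary keyed by (source word, part of speech).
# The source tokens are placeholders, not real words in any language.
BILINGUAL_DICT = {
    ("src_ball", "NOUN"): "ball",
    ("src_team", "NOUN"): "team",
    ("src_market", "NOUN"): "market",
}

# Hypothetical ontology fragment: English concept -> (category, weight).
ONTOLOGY = {
    "ball": ("sport", 1.0),
    "team": ("sport", 0.8),
    "market": ("business", 1.0),
}


def translate(tagged_tokens):
    """Word-by-word translation; tokens missing from the dictionary are skipped."""
    return [BILINGUAL_DICT[t] for t in tagged_tokens if t in BILINGUAL_DICT]


def categorize(tagged_tokens):
    """Return the category with the largest summed concept weight, or None."""
    scores = defaultdict(float)
    for lemma in translate(tagged_tokens):
        if lemma in ONTOLOGY:
            category, weight = ONTOLOGY[lemma]
            scores[category] += weight
    if not scores:
        return None
    return max(scores, key=scores.get)


document = [("src_ball", "NOUN"), ("src_team", "NOUN"), ("src_market", "NOUN")]
print(categorize(document))
```

    Here the lookup is an exact lexical match only; the semantic mapping the abstract mentions (for example via synonyms or related concepts) is left out to keep the sketch short.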

    Recognition of compound characters in Kannada language

    Recognition of degraded printed compound Kannada characters is a challenging research problem. It has been verified experimentally that noise removal is an essential preprocessing step. Two methods are proposed for the degraded Kannada character recognition problem. Method 1 uses the conventional histogram of oriented gradients (HOG) feature extraction for character recognition. The extracted features are transformed and reduced using principal component analysis (PCA), and classification is then performed. Various classifiers are tried. Simple compound character classification is satisfactory (more than 98% accuracy) with this method. However, the method does not perform well on the other two compound types. Method 2 is a deep convolutional neural network (CNN) model for classification. This outperforms HOG features and classification. The highest classification accuracy is found to be 98.8% for simple compound character classification. The performance of the deep CNN is far better for the other two compound types. The deep CNN also turns out to be better for pooled character classes

    Query by Example of Speaker Audio Signals using Power Spectrum and MFCCs

    Search engine is the popular term for an information retrieval (IR) system. Typically, a search engine is based on full-text indexing. Moving from text data to multimedia data types, such as retrieving images or sounds from large databases, makes the information retrieval process more complex. This paper introduces the use of language- and text-independent speech as input queries to a large sound database by using a speaker identification algorithm. The method consists of two main processing steps: first, we separate vocal from non-vocal segments; then the vocal segments are used for speaker identification, enabling audio query by speaker voice. For the speaker identification and audio query process, we estimate the similarity between the example signal and the samples in the queried database by calculating the Euclidean distance between their acoustic features, namely the Mel frequency cepstral coefficients (MFCC) and the energy spectrum. The simulations show good performance at a sustainable computational cost, with an average accuracy rate of more than 90%
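    The similarity step in this abstract, ranking database samples by Euclidean distance to the query's feature vector, can be sketched as follows. The short vectors are made-up stand-ins for MFCC and energy features, and the names `euclidean` and `query_by_example` are hypothetical; feature extraction and the vocal/non-vocal separation are not shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def query_by_example(query, database):
    """Return database keys ordered from most to least similar to the query."""
    return sorted(database, key=lambda name: euclidean(query, database[name]))


# Hypothetical per-speaker feature vectors (stand-ins for MFCC + energy features).
database = {
    "speaker_a": [1.0, 2.0, 3.0],
    "speaker_b": [4.0, 6.0, 3.0],
    "speaker_c": [1.0, 2.0, 4.0],
}

ranking = query_by_example([1.0, 2.0, 3.0], database)
print(ranking)
```

    In practice each recording yields one MFCC vector per frame, so some aggregation (for instance averaging over frames) would be needed before a single distance can be computed; that choice is an assumption the abstract does not spell out.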

    A review of mass hysteria among Muslim adolescents in Malaysia (Sorotan histeria massa remaja Muslim di Malaysia)

    Hysteria is a social problem that is often reported to occur in the community. Episodes of hysteria, whether individual or collective, indicate pressure within the group or the victim that leads to an extreme and uncontrollable psychological outburst. Hysteria among adolescents in Malaysia in particular mostly occurs en masse or in groups. This phenomenon is known as mass hysteria or epidemic hysteria, that is, an outbreak of hysteria involving a group of individuals who share a state of stress and whose emotions are interconnected. Accordingly, in view of this scenario, this article examines the issue of mass hysteria and reviews the phenomenon of hysteria among Muslim adolescents in Malaysia. The examination uses content analysis, reviewing related documents and articles to identify the symptoms of mass hysteria among adolescents. The discussion concludes that mass hysteria among adolescents in Malaysia mostly takes the form of mass motor hysteria with a dissociative behavioural orientation, that is, extreme physical behaviour, in contrast to the West, where it often takes the form of mass anxiety hysteria

    PCROD: Context Aware Role based Offensive Detection using NLP/ DL Approaches

    With the increased use of social media, many people misuse online platforms by uploading offensive content and sharing it with a vast audience, which creates the need to control such content. In this work we concentrate on the issue of finding offensive text in social media. Existing offensive text detection systems treat weak pejoratives like 'idiot' and extremely indecent pejoratives like 'f***' as equally offensive, irrespective of formal and informal contexts. In fact, weak pejoratives in informal discussions among friends are casual and common and are not offensive, but the same words can be offensive when expressed in formal discussions. The crucial challenges in accomplishing role-based offensive detection in text are i) considering the roles while classifying the text as offensive or not and ii) creating contextual datasets that include both formal and informal roles. To tackle these challenges we develop a deep neural network based model known as context aware role based offensive detection (CROD). We examine CROD on a manually created dataset collected from social networking sites. Results show that CROD performs best with RoBERTa, reaching an accuracy of 94% when the context and role in the data are considered
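    The labeling idea behind this abstract, that the same weak pejorative can be acceptable in an informal exchange yet offensive in a formal one, can be illustrated with a toy rule. This is not the CROD model, which is a deep neural network: the two word lists are hypothetical placeholders, and treating strong pejoratives as offensive in every context is an assumption made for the sketch.

```python
import string

# Hypothetical lexicons for illustration only.
WEAK_PEJORATIVES = {"idiot"}
STRONG_PEJORATIVES = {"strongword"}  # placeholder for an extremely indecent term


def is_offensive(text, context):
    """Toy context-dependent label; context is 'formal' or 'informal'."""
    if context not in ("formal", "informal"):
        raise ValueError("context must be 'formal' or 'informal'")
    tokens = {tok.strip(string.punctuation).lower() for tok in text.split()}
    if tokens & STRONG_PEJORATIVES:
        return True  # assumed offensive regardless of context
    if tokens & WEAK_PEJORATIVES:
        return context == "formal"  # weak terms only count in formal settings
    return False


print(is_offensive("You idiot!", "informal"))
print(is_offensive("You idiot!", "formal"))
```

    A learned model replaces these fixed word lists with representations of the whole sentence and its role context, which is what lets it handle wording that no lexicon anticipates.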