
3 research outputs found

Building a taxonomy from Malay text using a bisecting firefly algorithm (Pembangunan taksonomi dari teks Melayu menggunakan algoritma kunang-kunang pembahagi dua sama)

Author: Abdul Razak Hamdan
Kurniawan Tri Basuki
Mohammed Azlan Mis
Mohd Zakree Ahmad Nazri
Salwani Abdullah
Publication venue: 'Penerbit Universiti Kebangsaan Malaysia (UKM Press)'
Publication date: 01/05/2018

Taxonomy is used to explain that animals can be classified into categories such as mammals, reptiles and crocodiles. This biological taxonomy allows similarities, differences and even relationships between animals to be defined. The concept and function of biological taxonomy have been 'borrowed' by Internet scientists and engineers to develop taxonomies for the Internet. As with biological taxonomy, building a taxonomy for the Internet manually is neither easy nor cheap. The task takes time and requires domain expertise. Computer scientists have therefore used artificial intelligence approaches to build taxonomies automatically from text. Machine learning algorithms are designed to let a machine 'read' text and then 'learn' to build a taxonomy from the context obtained from that text. The main objective of this study is to develop a taxonomy learning algorithm for the Malay language that is more effective than existing algorithms, using a hybridization method. This paper investigates the effectiveness of a hybrid of the Firefly Algorithm (Algoritma Kunang-Kunang, AKK) and the Bisecting K-Means algorithm (Algoritma K-Min Pembahagi Dua Sama, PDS), called the Bisecting Firefly Algorithm (Algoritma Kunang-Kunang Pembahagi Dua Sama, AKK-PD). This empirical study collects data from experiments carried out on three Malay texts from the domains of Fiqh, Biochemistry and Information Technology. A comparison of accuracy based on the F-measure shows that the hybrid AKK-PD algorithm builds more accurate taxonomies than existing algorithms. AKK-PD was found to be more effective and robust than the comparison algorithms when handling the data sparsity problem. However, this exploratory study needs to be extended to a larger Malay corpus to test the robustness of the algorithm on a corpus that is more general in nature than technical, single-domain text corpora.
The feature extraction technique based on syntactic dependencies also needs to be improved, because the technique clearly produced contexts that suffer from a serious data sparsity problem. This poses a new challenge for research on taxonomy learning from Malay text.
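
The bisecting k-means (PDS) component named in the abstract can be illustrated with a minimal sketch. It is not the authors' AKK-PD implementation: the firefly step is not described in the abstract and is left out, and the point tuples, the deterministic seeding, and the "split the largest cluster" rule are all assumptions made for illustration.

```python
import math


def _dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def two_means(points, iterations=20):
    """Split points into two groups with plain 2-means.

    Seeding is deterministic (an assumption for this sketch): the first
    point and the point farthest from it are the initial centroids.
    """
    c0 = points[0]
    c1 = max(points, key=lambda p: _dist(p, c0))
    for _ in range(iterations):
        left = [p for p in points if _dist(p, c0) <= _dist(p, c1)]
        right = [p for p in points if _dist(p, c0) > _dist(p, c1)]
        if not left or not right:
            break
        new_c0, new_c1 = _mean(left), _mean(right)
        if (new_c0, new_c1) == (c0, c1):
            break  # assignments have stabilised
        c0, c1 = new_c0, new_c1
    return left, right


def bisecting_kmeans(points, k):
    """Repeatedly bisect the largest cluster until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=len, reverse=True)
        biggest = clusters.pop(0)
        left, right = two_means(biggest)
        if not left or not right:
            clusters.append(biggest)  # cannot split further
            break
        clusters.extend([left, right])
    return clusters


# Hypothetical 2-D context vectors for six terms.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(bisecting_kmeans(points, 2))
```

Recording each bisection as a parent with two children would give a binary hierarchy, which is one way a clustering of this kind can be read as a taxonomy.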

UKM Journal Article Repository

Hybrid fuzzy multi-objective particle swarm optimization for taxonomy extraction

Author: Syafrullah Mohammad
Publication date: 01/09/2015

Ontology learning refers to the automatic extraction of an ontology, producing the ontology learning layer cake, which consists of five kinds of output: terms, concepts, taxonomy relations, non-taxonomy relations and axioms. Term extraction, the automatic mining of complete terms from the input document, is a prerequisite for all aspects of ontology learning. Another important part of an ontology is the taxonomy, or hierarchy of concepts, which presents a tree view of the ontology and shows the inheritance between subconcepts and superconcepts. In this research, two methods are proposed for improving extraction performance. The first method uses particle swarm optimization to optimize the feature weights. The advantage of particle swarm optimization is that it can calculate and adjust the weight of each feature toward an appropriate value; here it is used to improve the performance of both term and taxonomy extraction. The second method is a hybrid technique that combines multi-objective particle swarm optimization with fuzzy systems so that the membership functions and fuzzy rule sets are optimized. The advantage of using a fuzzy system is that imprecise and uncertain feature weight values can be tolerated during the extraction process. This method is used to improve the performance of taxonomy extraction. In the term extraction experiment, five features were extracted for each term in the document and represented as a feature vector consisting of domain relevance, domain consensus, term cohesion, first occurrence and length of noun phrase. For taxonomy extraction, the features representing each pair of terms from the texts were matches of Hearst lexico-syntactic patterns in documents and on the web, and hypernym information from WordNet. The two proposed methods are evaluated using a dataset of documents about tourism.
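
As a rough illustration of the Hearst lexico-syntactic patterns mentioned above, the sketch below matches two classic patterns ("X such as Y, Z and W" and "Y, Z and other X") with regular expressions. The abstract does not say which patterns were used or how matches were turned into features, so the pattern set, the `extract_pairs` helper, and the tourism-style sentences are assumptions. Only single-word terms are handled.

```python
import re

# Two Hearst-style patterns (an illustrative subset, not the source's list).
PATTERNS = [
    # "X such as Y, Z and W": Y, Z, W are hyponyms of X
    re.compile(
        r"(?P<hyper>\w+) such as "
        r"(?P<hypos>\w+(?:, \w+)*(?:,? (?:and|or) \w+)?)"
    ),
    # "Y, Z and other X": Y, Z are hyponyms of X
    re.compile(
        r"(?P<hypos>\w+(?:, \w+)*),? (?:and|or) other (?P<hyper>\w+)"
    ),
]

# Separators inside a hyponym list: ", and ", " or ", or a plain comma.
_SEPARATOR = re.compile(r",?\s+(?:and|or)\s+|,\s*")


def extract_pairs(sentence):
    """Return (hyponym, hypernym) candidate pairs found in one sentence."""
    pairs = []
    for pattern in PATTERNS:
        for match in pattern.finditer(sentence):
            hypernym = match.group("hyper")
            for hyponym in _SEPARATOR.split(match.group("hypos")):
                if hyponym:
                    pairs.append((hyponym, hypernym))
    return pairs


print(extract_pairs("destinations such as beaches, museums and temples"))
print(extract_pairs("temples, mosques and other monuments"))
```

In a feature-based setup, whether a term pair matches such a pattern could serve as one feature of that pair; how features are weighted is a separate step.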
For term extraction, the proposed method is compared with benchmark algorithms such as Term Frequency-Inverse Document Frequency, Weirdness, Glossary Extraction and Term Extractor, using precision as the evaluation measure. For taxonomy extraction, the proposed methods are compared with the benchmark methods of Feature-based and weighting by Support Vector Machine, using f-measure, precision and recall. For the first method, the experimental results show that using particle swarm optimization to optimize the feature weights in term and taxonomy extraction improves the accuracy of the extraction results compared to the benchmark algorithms. For the second method, the results show that the hybrid technique combining multi-objective particle swarm optimization and fuzzy systems improves taxonomy extraction performance compared to the benchmark methods, while adjusting the fuzzy membership functions and keeping the number of fuzzy rules to a minimum with a high degree of accuracy.
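
The precision, recall and f-measure used in the comparison can be computed over sets of extracted relations, as in the sketch below. It uses the standard definitions with the balanced F1 variant; the abstract does not state which f-measure weighting was used, and the term pairs in the usage example are made up for illustration.

```python
def precision_recall_f1(predicted, reference):
    """Score extracted items against a reference (gold) set.

    precision = |predicted ∩ reference| / |predicted|
    recall    = |predicted ∩ reference| / |reference|
    F1        = harmonic mean of precision and recall
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (hyponym, hypernym) pairs.
predicted = [("beach", "destination"), ("museum", "destination"),
             ("temple", "destination"), ("hotel", "destination")]
reference = [("beach", "destination"), ("museum", "destination"),
             ("temple", "destination"), ("hotel", "accommodation"),
             ("hostel", "accommodation"), ("resort", "accommodation")]
print(precision_recall_f1(predicted, reference))
```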

Universiti Teknologi Malaysia Institutional Repository

A hybrid approach for learning concept hierarchy from Malay text using GAHC and immune network

Author: Abdullah Salwani
Abu Bakar Azuraliza
Ahmad Nazri Mohd. Zakree
Shamsuddin Siti Mariyam
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

The human immune system provides inspiration for tackling the knowledge acquisition bottleneck in developing ontologies for semantic web applications. In this paper, we propose an extension to the Guided Agglomerative Hierarchical Clustering (GAHC) method that uses an Artificial Immune Network (AIN) algorithm to improve the process of automatically building and expanding the concept hierarchy. A small collection of Malay text from three different domains (IT, Biochemistry and Fiqh) is used to test the effectiveness of the proposed approach and to compare it with GAHC. The proposed approach consists of three stages: pre-processing, concept hierarchy induction using GAHC, and concept hierarchy learning using AIN. To validate our approach, the automatically learned concept hierarchy is compared to a reference ontology developed by human experts. We conclude that the proposed approach has a greater ability to learn concept hierarchies.
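
The hierarchy induction stage builds on agglomerative hierarchical clustering, which can be sketched as below. This is plain average-linkage clustering over cosine similarity, not GAHC itself: the guidance mechanism and the AIN learning stage are not described in the abstract and are not reproduced. The term vectors and the `agglomerate` helper are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length numeric tuples."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerate(vectors):
    """Merge the two most similar clusters until one tree remains.

    vectors maps each term to its context vector. The result is a
    nested tuple whose leaves are the terms.
    """
    # Each cluster is (tree, member_terms).
    clusters = [(term, [term]) for term in vectors]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [cosine(vectors[a], vectors[b])
                        for a in clusters[i][1] for b in clusters[j][1]]
                score = sum(sims) / len(sims)  # average linkage
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical context vectors for terms from two domains.
vectors = {
    "router": (3, 1, 0),
    "modem": (2, 1, 0),
    "enzyme": (0, 1, 3),
    "protein": (0, 2, 3),
}
print(agglomerate(vectors))
```

The nested tuple is a binary tree over the terms; comparing such a tree with a reference ontology, as the abstract describes, requires a further evaluation step that is not shown here.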

Universiti Teknologi Malaysia Institutional Repository