3 research outputs found
Taxonomy development from Malay text using the bisecting firefly algorithm
A taxonomy is used to describe how animals can be classified into categories such as mammals, reptiles and crocodiles. Such a biological taxonomy allows the similarities, differences and even relationships between animals to be defined. The concept and function of biological taxonomy have been 'borrowed' by Internet scientists and engineers to develop taxonomies for the Internet. As with biological taxonomy, building a taxonomy for the Internet manually is neither easy nor cheap. The task is time-consuming and requires domain expertise. Computer scientists have therefore used artificial intelligence approaches to build taxonomies automatically from text. Machine learning algorithms are devised to let a machine 'read' text and then 'learn' to build a taxonomy from the contexts obtained from that text. The main objective of this study is to develop a taxonomy learning algorithm for Malay that is more effective than existing algorithms, using a hybridization method. This paper investigates the effectiveness of a hybrid of the Firefly Algorithm (Algoritma Kunang-Kunang, AKK) and the Bisecting K-Means algorithm (Algoritma K-Min Pembahagi Dua Sama, PDS), called the Bisecting Firefly Algorithm (Algoritma Kunang-Kunang Pembahagi Dua Sama, AKK-PD). This empirical study collects data from experiments on three Malay texts from the domains of Fiqh, Biochemistry and Information Technology. A comparison of accuracy based on the F-measure shows that the hybrid AKK-PD algorithm builds more accurate taxonomies than the existing algorithms. AKK-PD was also found to be more effective and robust than the comparison algorithms when handling data sparsity. However, this exploratory study needs to be extended to a larger Malay corpus to test the algorithm's robustness on a more general corpus, as opposed to technical texts focused on a single domain. The feature extraction technique based on syntactic dependencies also needs to be improved, because it clearly produced contexts that suffer from serious data sparsity. This poses a new challenge for research on taxonomy learning from Malay text.
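For readers unfamiliar with the bisecting K-means component named in this abstract, the following is a minimal sketch of plain bisecting K-means only. It does not include the firefly hybridization, and it is not the authors' implementation. The function names, the deterministic seeding rule and the toy points are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sse(points: List[Point]) -> float:
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points: List[Point], iters: int = 20) -> Tuple[list, list]:
    """Split one cluster in two with ordinary 2-means.

    Assumption: seeds are chosen deterministically (the point farthest
    from the centroid, then the point farthest from that seed).
    """
    c = centroid(points)
    a = max(points, key=lambda p: sq_dist(p, c))
    b = max(points, key=lambda p: sq_dist(p, a))
    centers = [list(a), list(b)]
    left, right = list(points), []
    for _ in range(iters):
        left = [p for p in points
                if sq_dist(p, centers[0]) <= sq_dist(p, centers[1])]
        right = [p for p in points
                 if sq_dist(p, centers[0]) > sq_dist(p, centers[1])]
        if not left or not right:
            break
        new_centers = [centroid(left), centroid(right)]
        if new_centers == centers:
            break
        centers = new_centers
    return left, right


def bisecting_kmeans(points: List[Point], k: int) -> List[list]:
    """Repeatedly bisect the cluster with the largest SSE until k clusters."""
    clusters = [list(points)]
    while len(clusters) < k:
        i = max(range(len(clusters)), key=lambda j: sse(clusters[j]))
        if len(clusters[i]) < 2:
            break  # nothing left to split
        left, right = two_means(clusters[i])
        if not left or not right:
            break  # the chosen cluster cannot be separated
        clusters[i:i + 1] = [left, right]
    return clusters


# Toy 2-D points standing in for term context vectors (made-up data).
toy_points = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (20, 1)]
toy_clusters = bisecting_kmeans(toy_points, 3)
```

Recording each split as a parent with two children yields a binary hierarchy, which is one way such a procedure can be read as a taxonomy-building step.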
Hybrid fuzzy multi-objective particle swarm optimization for taxonomy extraction
Ontology learning refers to the automatic extraction of an ontology from text. Its outputs form the ontology learning layer cake, which consists of five kinds of output: terms, concepts, taxonomy relations, non-taxonomy relations and axioms. Term extraction, the automatic mining of complete terms from the input document, is a prerequisite for all aspects of ontology learning. Another important part of an ontology is the taxonomy, or hierarchy of concepts, which presents a tree view of the ontology and shows the inheritance between subconcepts and superconcepts. In this research, two methods are proposed for improving extraction performance. The first method uses particle swarm optimization to optimize the feature weights. Its advantage is that it can calculate and adjust the weight of each feature toward an appropriate value; here it is used to improve the performance of both term and taxonomy extraction. The second method is a hybrid of multi-objective particle swarm optimization and fuzzy systems that optimizes the membership functions and the fuzzy rule sets. The advantage of the fuzzy system is that imprecise and uncertain feature weights can be tolerated during extraction. This method is used to improve the performance of taxonomy extraction. In the term extraction experiment, five features were extracted for each term in the document and represented as a feature vector: domain relevance, domain consensus, term cohesion, first occurrence and length of noun phrase. For taxonomy extraction, each pair of terms from the texts was represented by matches of Hearst lexico-syntactic patterns in the documents and on the web, together with hypernym information from WordNet. The two proposed methods are evaluated on a dataset of documents about tourism.
For term extraction, the proposed method is compared with benchmark algorithms such as Term Frequency Inverse Document Frequency, Weirdness, Glossary Extraction and Term Extractor, using precision as the evaluation measure. For taxonomy extraction, the proposed methods are compared with the benchmark methods of Feature-based and weighting by Support Vector Machine, using the F-measure, precision and recall. For the first method, the experimental results show that optimizing the feature weights with particle swarm optimization improves the accuracy of term and taxonomy extraction compared with the benchmark algorithms. For the second method, the results show that the hybrid of multi-objective particle swarm optimization and fuzzy systems improves taxonomy extraction compared with the benchmark methods, while adjusting the fuzzy membership functions and keeping the number of fuzzy rules to a minimum with a high degree of accuracy.
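The idea of using particle swarm optimization to tune feature weights can be sketched as follows. This is a generic single-objective PSO, not the method evaluated in this work. The three features mirror those named in the abstract (pattern match in documents, pattern match on the web, WordNet hypernym), but the sample values, the squared-error loss, the parameter settings and all function names are assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(loss: Callable[[Sequence[float]], float], dim: int,
                 n_particles: int = 20, iters: int = 100,
                 inertia: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float, List[float]]:
    """Minimize `loss` over weight vectors in [0, 1]^dim with basic PSO."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal best positions
    best_val = [loss(p) for p in pos]         # personal best losses
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best so far
    history = [g_val]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Clip so every weight stays inside [0, 1].
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            val = loss(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
        history.append(g_val)
    return g_pos, g_val, history


# Made-up samples: (docs pattern, web pattern, WordNet hypernym) -> is-a label.
samples = [
    ((1.0, 1.0, 1.0), 1.0),
    ((1.0, 0.0, 0.0), 1.0),
    ((0.0, 1.0, 0.0), 0.0),
    ((0.0, 0.0, 1.0), 1.0),
    ((0.0, 0.0, 0.0), 0.0),
]


def weight_loss(w: Sequence[float]) -> float:
    """Mean squared error between the weighted feature score and the label."""
    total = 0.0
    for x, y in samples:
        score = sum(wi * xi for wi, xi in zip(w, x))
        total += (score - y) ** 2
    return total / len(samples)


weights, error, history = pso_minimize(weight_loss, dim=3)
```

The global best is only replaced by a strictly better position, so `history` never increases. A multi-objective variant would track a set of non-dominated solutions instead of a single global best.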
A hybrid approach for learning concept hierarchy from Malay text using GAHC and immune network
The human immune system provides inspiration for addressing the knowledge acquisition bottleneck in developing ontologies for semantic web applications. In this paper, we propose an extension to the Guided Agglomerative Hierarchical Clustering (GAHC) method that uses an Artificial Immune Network (AIN) algorithm to improve the process of automatically building and expanding the concept hierarchy. A small collection of Malay texts from three domains (IT, Biochemistry and Fiqh) is used to test the effectiveness of the proposed approach and to compare it with GAHC. The proposed approach consists of three stages: pre-processing, concept hierarchy induction using GAHC and concept hierarchy learning using AIN. To validate the approach, the automatically learned concept hierarchy is compared with a reference ontology developed by human experts. It is concluded that the proposed approach has greater ability to learn concept hierarchies.
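The abstract does not state which measure is used to compare the learned hierarchy with the reference ontology. As one possible illustration, the sketch below scores a learned hierarchy by precision, recall and F-measure over (concept, ancestor) pairs. The pair-based measure, the function names and the toy IT hierarchies are all assumptions.

```python
from typing import Dict, Optional, Set, Tuple

Hierarchy = Dict[str, Optional[str]]  # child concept -> parent (None for a root)


def ancestor_pairs(parent: Hierarchy) -> Set[Tuple[str, str]]:
    """All (concept, ancestor) pairs implied by a child-to-parent mapping."""
    pairs = set()
    for child in parent:
        node = parent[child]
        seen = {child}  # guards against accidental cycles
        while node is not None and node not in seen:
            pairs.add((child, node))
            seen.add(node)
            node = parent.get(node)
    return pairs


def precision_recall_f1(learned: Hierarchy,
                        reference: Hierarchy) -> Tuple[float, float, float]:
    """Compare two hierarchies on their implied (concept, ancestor) pairs."""
    got, want = ancestor_pairs(learned), ancestor_pairs(reference)
    hits = len(got & want)
    precision = hits / len(got) if got else 0.0
    recall = hits / len(want) if want else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up reference and learned hierarchies for a tiny IT vocabulary.
reference = {"laptop": "computer", "desktop": "computer",
             "computer": "device", "printer": "device"}
learned = {"laptop": "computer", "desktop": "device",
           "computer": "device", "printer": "computer"}

p, r, f = precision_recall_f1(learned, reference)
```

Counting all ancestor pairs rather than only direct parent links gives partial credit when a concept is attached one level too high, as with "desktop" here.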