20 research outputs found

    The study of probability model for compound similarity searching

    The main task of an information retrieval (IR) system is to retrieve documents relevant to a user's query. One of the most popular IR retrieval models is the Vector Space Model, which assumes that relevance is based on similarity, defined as the distance between the query and a document in the concept space. All existing chemical compound database systems have adapted the vector space model to calculate the similarity of a database entry to a query compound. However, the model assumes that the fragments represented by the bits are independent of one another, which is not necessarily true. This study therefore explores applying another IR model, the Probabilistic Model (PM), to chemical compound searching. The model estimates the probability that a chemical structure has the same bioactivity as a target compound. Ranking chemical structures in decreasing order of their probability of relevance to the query structure is expected to increase the effectiveness of a molecular similarity searching system. Both fragment dependence and fragment independence assumptions are considered in seeking to improve compound similarity searching. A series of simulated similarity searches showed that the PM approaches performed better than the existing similarity searching method on all evaluation criteria. Of the two probability models, the BD model showed an improvement over the BIR model
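
The probabilistic ranking idea in this abstract can be illustrated with a minimal sketch. It assumes a textbook binary-independence weighting (per-bit log-odds estimated from labelled actives and inactives); the abstract does not give the authors' formulas, so the function names, the 0.5 smoothing constant and the toy fingerprints below are all hypothetical.

```python
import math

def bir_weights(actives, inactives, n_bits, smooth=0.5):
    """Per-bit log-odds weights under an independence assumption.

    p = P(bit set | active), q = P(bit set | inactive). `smooth` is an
    assumed additive correction that keeps the logarithm finite.
    """
    weights = []
    for i in range(n_bits):
        a = sum(fp[i] for fp in actives)
        b = sum(fp[i] for fp in inactives)
        p = (a + smooth) / (len(actives) + 2 * smooth)
        q = (b + smooth) / (len(inactives) + 2 * smooth)
        weights.append(math.log(p * (1 - q) / (q * (1 - p))))
    return weights

def score(query, candidate, weights):
    """Sum the weights of the bits set in both query and candidate."""
    return sum(w for qb, cb, w in zip(query, candidate, weights) if qb and cb)

def rank(query, database, weights):
    """Return compound names in decreasing order of score."""
    return sorted(database,
                  key=lambda name: score(query, database[name], weights),
                  reverse=True)

# Hypothetical 4-bit fragment fingerprints, for illustration only.
actives = [(1, 1, 0, 0), (1, 1, 1, 0)]
inactives = [(0, 0, 1, 1), (0, 1, 0, 1)]
weights = bir_weights(actives, inactives, n_bits=4)

query = (1, 1, 0, 0)
database = {"cpd_a": (1, 1, 0, 1), "cpd_b": (0, 0, 1, 1), "cpd_c": (1, 0, 0, 0)}
ranking = rank(query, database, weights)
print(ranking)
```

A dependence-aware model such as the BD model mentioned above would replace the per-bit independence assumption; that variant is not sketched here because the abstract gives no detail on it.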

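    Clustering Meteorological Data Using the Agglomerative Hierarchical Clustering Technique for Rainfall Distribution Forecasting [Pengelompokan Data Kaji Cuaca Menggunakan Teknik Pengelompokan Hierarki Agglomerative Bagi Peramalan Taburan Hujan]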

    This paper reports the use of the agglomerative hierarchical clustering technique for rainfall distribution forecasting. The main aim of the study is to examine the effectiveness and performance of the algorithms available within hierarchical clustering. The paper begins with an explanation of hierarchical clustering, focusing on the Single Link, Average Link and Complete Link algorithms. With these algorithms, clusters are produced by building a sequence of clustering schemes in which the number of clusters decreases at each step. Each new cluster is obtained by merging the closest (most similar) clusters into one. The clusters produced by the three algorithms are used as input for rainfall distribution forecasting. The steps involved in the clustering process are explained in more detail in the methodology section. The paper then describes the experiments carried out on the clusters produced by the three algorithms. Clustering performance is measured by the root mean square (RMS) error and the correlation coefficient obtained in each experiment. The results show that the best rainfall distribution forecast is obtained with the Complete-Link algorithm
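
The merge procedure described in this abstract (repeatedly joining the two closest clusters, with Single Link, Average Link or Complete Link deciding what "closest" means) can be sketched as follows. This is a minimal illustration on one-dimensional values, not the authors' implementation; the function names and the toy readings are hypothetical.

```python
from itertools import combinations

def linkage_distance(a, b, points, method):
    """Distance between clusters a and b (lists of point indices)."""
    dists = [abs(points[i] - points[j]) for i in a for j in b]
    if method == "single":      # closest pair of members
        return min(dists)
    if method == "complete":    # farthest pair of members
        return max(dists)
    if method == "average":     # mean over all member pairs
        return sum(dists) / len(dists)
    raise ValueError(f"unknown method: {method}")

def agglomerate(points, n_clusters, method="complete"):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage_distance(
                clusters[pair[0]], clusters[pair[1]], points, method),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # a < b, so index a is unaffected
    return [sorted(c) for c in clusters]

# Hypothetical one-dimensional readings, for illustration only.
readings = [0.0, 2.0, 5.0, 9.5]
single = agglomerate(readings, 2, "single")
average = agglomerate(readings, 2, "average")
complete = agglomerate(readings, 2, "complete")
print(single, average, complete)
```

On this toy input the complete-link criterion groups the points differently from the other two, which shows why the choice of linkage can change the clusters passed on to a forecasting step.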

    Web document classification using the support vector machine (SVM) technique [Pengkelasan dokumen web menggunakan teknik vector machine (SVM)]

    Most Internet search engines today use subject indexing rather than document classification. In subject indexing, a controlled vocabulary or set of keywords is used to assign index terms to web documents. Document classification, by contrast, places web documents in a hierarchical structure based on subject categories. Keyword-based indexing can find documents that contain specific keywords, but it has difficulty identifying documents that belong to the same category. Automatic text classification is therefore needed to sort documents into different categories based on their textual content. This paper discusses a study of text classification using the Support Vector Machine (SVM) method. The data set used in the study was obtained from Bank Search Information Consultancy Ltd. and the Department of Computer Science at the University of Reading. It is divided into four categories: banking and finance, programming languages, science, and sport. The results show that the web document classification accuracy for this data set is low and unsatisfactory

    Rubber-Tree Leaf Diseases Mapping Using Close Range Remote Sensing Images

    Close-range remote sensing using a drone-based platform carrying a compact sensor payload is currently used for monitoring and mapping large areas in the agriculture sector. This study deployed a drone with a compact sensor to identify rubber tree leaf diseases based on two groups of spectral wavelengths: visible (RGB: 0.4 µm – 0.7 µm) and near infrared (NIR: 0.7 µm – 2.0 µm). Spectra obtained from the drone-based platform are validated against ground observations from a handheld spectroradiometer. Leaves from eight rubber tree clones in three conditions (healthy, unhealthy and severe) were randomly selected within the 9.4-hectare Experimental Rubber Plot, Rubber Research Institute of Malaysia (RRIM), Kota Tinggi, Johor; the clones comprise the RRIM 2000 series, RRIM 3000 series and PB series. Quantitative analysis shows that the F-value is smaller than the one-tail critical value for the healthy and unhealthy conditions, but larger for the severe condition: 2.887 < 4.283 (healthy), 0.002 < 0.264 (unhealthy) and 1.008 > 0.0526 (severe). The study concludes that the spectral and estimated values are equal at the 0.05 significance level. Qualitative analysis shows that the diseases of each rubber tree clone can be distinguished in the near-infrared band for the healthy, unhealthy and severe conditions

    Classifying biomedical text abstracts based on hierarchical 'concept' structure

    Classifying biomedical literature is a challenging task, especially when a large number of biomedical articles must be organized into a hierarchical structure. In this paper, we present an approach for classifying a collection of biomedical text abstracts downloaded from the Medline database with the help of ontology alignment. To accomplish this, we construct two types of hierarchies: the OHSUMED disease hierarchy, built from the OHSUMED dataset, and the Medline abstract disease hierarchies, built from the Medline abstracts. We then enrich the OHSUMED disease hierarchy before adapting it to the ontology alignment process to find probable concepts or categories. Next, we compute the cosine similarity between the vector of each probable concept (in the "enriched" OHSUMED disease hierarchy) and the vector in the Medline abstract disease hierarchies. Finally, we assign a category to each new Medline abstract based on the similarity score. Experimental results show that the proposed hierarchical classification approach performs slightly better than multi-class flat classification
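
The final two steps of this abstract (computing cosine similarity between concept vectors and abstract vectors, then assigning the best-scoring category) can be sketched as follows. The sparse term-weight dictionaries, category names and function names are hypothetical; the abstract does not say how the vectors are built or weighted.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)

def assign_category(abstract_vec, concept_vecs):
    """Pick the concept whose vector is most similar to the abstract."""
    return max(concept_vecs, key=lambda c: cosine(abstract_vec, concept_vecs[c]))

# Hypothetical concept vectors and abstract vector, for illustration only.
concept_vecs = {
    "Neoplasms": {"tumor": 2.0, "carcinoma": 1.0},
    "Heart Diseases": {"cardiac": 2.0, "infarction": 1.0},
}
abstract_vec = {"tumor": 1.0, "biopsy": 1.0}
label = assign_category(abstract_vec, concept_vecs)
print(label)
```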

    Feasibility study of fuzzy clustering techniques in chemical database for compound classification

    Compound selection methods are important in drug discovery, especially in the lead identification process. Finding the best compound selection method has become a necessity for the pharmaceutical industry because of the increasing number of chemical compounds to be screened. One of the best and most widely used methods is cluster-based selection, in which the compound dataset is grouped into clusters and representative compounds are selected from each cluster. Non-overlapping methods such as Ward’s clustering method have been widely used, and Ward’s method is generally regarded as the most efficient clustering method for compound selection. However, little attention has been given to overlapping methods in compound selection or in lead identification. This research focuses on fuzzy c-means clustering: the effectiveness of the clusters it produces for compound selection is analyzed and compared with other conventional cluster-based compound selection methods. Fuzzy c-means was chosen because it produces clusters by identifying cluster centroids and each compound's degree of membership in every cluster, so a compound may belong to more than one cluster. The results of the fuzzy c-means method are compared with those of Ward’s clustering method and with those from the fuzzification of Ward’s clusters. The analysis shows that fuzzy c-means clustering gives the best result for intermolecular dissimilarity; however, it gives poor results for the separation of active and inactive structures
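
The overlapping-cluster idea behind fuzzy c-means (each item has a degree of membership in every cluster, and centroids are membership-weighted means) can be sketched with the standard update rules. This is a minimal illustration on one-dimensional values with fixed starting centroids; the function names, the fuzzifier value m = 2 and the toy data are assumptions, not details taken from the study.

```python
def memberships_for(points, centers, m):
    """Degree of membership of every point in every cluster."""
    power = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        d = [abs(x - c) for c in centers]
        if 0.0 in d:
            # The point coincides with a centroid: full membership there.
            hit = d.index(0.0)
            rows.append([1.0 if i == hit else 0.0 for i in range(len(d))])
        else:
            rows.append([1.0 / sum((di / dj) ** power for dj in d) for di in d])
    return rows

def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Alternate membership and centroid updates a fixed number of times."""
    for _ in range(iters):
        u = memberships_for(points, centers, m)
        centers = [
            sum(row[i] ** m * x for row, x in zip(u, points))
            / sum(row[i] ** m for row in u)
            for i in range(len(centers))
        ]
    return centers, memberships_for(points, centers, m)

# Hypothetical one-dimensional descriptor values, for illustration only.
values = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8, 3.0]
centers, u = fuzzy_c_means(values, centers=[0.0, 6.0])
print(centers)
print(u[-1])  # the in-between value 3.0 belongs partly to both clusters
```

A hard clustering method such as Ward's would place the value 3.0 in exactly one cluster, whereas here its membership is split between the two.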

    Clustering meteorological data for rainfall distribution forecasting [Pengelompokan data kajicuaca bagi peramalan taburan hujan]

    Analysing meteorological data is a very important but difficult and highly challenging task for the Malaysian Meteorological Service Department (Jabatan Perkhidmatan Kajicuaca Malaysia, JPKM), owing to the continually growing volume of meteorological data. The results of the analysis are used for decision making and for forecasting future rainfall distribution. In rainfall distribution forecasting, the forecaster must identify which meteorological parameters have a large influence on forecast accuracy or performance. One way to identify these parameters is to cluster the meteorological data. This paper therefore compares two clustering techniques, the association rule technique ("teknik peraturan kesatuan") and a statistical method, for clustering meteorological data for rainfall distribution forecasting. The study found that the association rule technique is more suitable for clustering meteorological data than the statistical method. In addition, using meteorological parameters drawn from different clusters gave better rainfall forecasting performance than using parameters from the same cluster

    Employing ontology enrichment algorithm in classifying biomedical text abstracts

    Text classification systems are applied to biomedical literature to select articles relevant to a specific issue from large corpora. As the amount of online biomedical literature grows, finding relevant information becomes very complicated because of the difficulty of browsing and searching for it on the web. An ontology is useful for organizing and navigating Web sites and for improving the accuracy of Web searches. It provides a shared understanding of a domain, which helps overcome differences in terminology such as synonyms, term variants and term ambiguity. However, one problem with ontologies is the maintenance of these bases of concepts. We therefore investigate and propose an ontology enrichment algorithm as one method of modifying an existing ontology. In this research, we present a new ontology enrichment algorithm that associates each concept in the training ontology with relevant and informative features from biomedical information sources