4 research outputs found

    Feature determination for Malay-language newspaper headline extraction

    Headline summarization is an automatic text summarization technique that can reduce information overload in retrieval systems. It can lighten users' cognitive load when they examine and select relevant documents from large collections. Its effectiveness depends on the characteristics of the natural-language system that represents the information in the documents. This study discusses the process of determining the characteristics of the Malay language system in news-genre documents. The methodology begins with an analysis of a corpus of Malay news documents. The corpus contains 140 core news documents selected from two mainstream Malaysian news databases, Berita Harian and Utusan Malaysia. The selection criteria were: core news category, a length of 50 to 250 words, publication between 2007 and 2012, and one of four genres (economy, crime, education and sports). Three Malay linguistics experts manually produced one headline summary for each news document. All three experts had to comply with three conditions: the summary is produced by extraction, words are chosen with the select-words-in-order technique, and morphological modification of words. The experimental results identified three features: first, the first two sentences are suitable candidates for the most important sentence; second, a sentence containing an acronym definition is a likely candidate for the most important sentence; and third, the ideal headline length is six words. Taking these features into account allows headlines to be generated automatically in a way that more closely resembles those written by humans.
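
A minimal sketch of how the three reported features could drive an extractive headline generator. The acronym regex, the additive scoring (one point for being among the first two sentences, one for defining an acronym), the naive sentence splitter and the optional stopword filter are all assumptions for illustration, not the authors' procedure; the morphological modification condition is not modelled.

```python
import re

# Assumed acronym-definition pattern: two or more capitalised words followed
# by an all-caps acronym in parentheses, e.g. "Universiti Kebangsaan Malaysia (UKM)".
ACRONYM_DEF = re.compile(r"(?:[A-Z][\w-]*\s+){2,}\(([A-Z]{2,})\)")

HEADLINE_SIZE = 6  # ideal headline length reported in the abstract


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_score(position, sentence):
    """One point for being among the first two sentences (feature 1),
    one point for defining an acronym (feature 2)."""
    return int(position < 2) + int(bool(ACRONYM_DEF.search(sentence)))


def headline(text, size=HEADLINE_SIZE, stopwords=frozenset()):
    """Extract a headline of at most `size` words from the key sentence."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    # max() keeps the first of equally scored sentences, so ties go to the
    # earlier sentence.
    _, key_sentence = max(enumerate(sentences), key=lambda item: sentence_score(*item))
    # Select words in their original order; feature 3 caps the length.
    words = [w.strip(".,;:!?\"'()") for w in key_sentence.split()]
    kept = [w for w in words if w and w.lower() not in stopwords]
    return " ".join(kept[:size])


text = ("Menteri Pendidikan berucap di Putrajaya semalam. "
        "Universiti Kebangsaan Malaysia (UKM) menerima peruntukan tambahan RM50 juta. "
        "Peruntukan itu akan digunakan untuk penyelidikan.")
print(headline(text))  # Universiti Kebangsaan Malaysia UKM menerima peruntukan
```

Here the second sentence wins because it satisfies both the position feature and the acronym feature, and the six-word cap then trims it.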

    Bilingual Extractive Text Summarization Model using Textual Pattern Constraints

    In the era of digital information, an auto-generated summary can help readers easily find important and relevant information. Most studies and benchmark data sets in the field of text summarization are in English, so there is a need to study the potential of the Malay language in this field. This study also highlights the problems of identifying and generating important information in extractive summaries, because existing text representation models have weaknesses: BOW (bag-of-words) gives an inaccurate semantic representation, while the N-gram model produces very high-dimensional word vectors. In this study, a bilingual text summarization model named MYTextSumBASIC was developed to generate extractive summaries automatically in Malay and English. The MYTextSumBASIC summarizer applies a text representation model known as FASP using three Textual Pattern Constraints, namely word item constraints, adjacent word constraints and sequence size constraints. The MYTextSumBASIC framework has three main phases: development of the Malay language corpus, development of the MYTextSumBASIC model using FASP, and summary evaluation. In the summary evaluation phase, on a Malay data set of 100 news articles, the summaries produced by MYTextSumBASIC outperformed those generated by the Baseline (Lead) and OTS summarizers, with the highest averages of recall (R) 0.5849, precision (P) 0.5736 and F-score (Fm) 0.5772. In a manual evaluation by linguists, MYTextSumBASIC yielded a readability score of 4.1 and a content score of 3.87 for summaries generated from a random data set. Further experiments on the 2002 DUC English benchmark data set of 102 news articles also showed that MYTextSumBASIC outperformed both the best and the lowest-scoring systems in the comparison, with mean ROUGE-1 recall of 0.43896 and ROUGE-2 recall of 0.19918. These findings indicate that the FASP text representation, together with the textual pattern constraints used by the model, can be applied to bilingual text with performance competitive with other text summarization models.
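
The abstract does not spell out how FASP works internally. The sketch below is one hypothetical reading of the three Textual Pattern Constraints: a stopword filter stands in for the word item constraint, contiguity for the adjacent word constraint, and a maximum pattern length for the sequence size constraint. Sentences are then scored by the support of the frequent patterns they contain, weighted by pattern length (also an assumption). It is not the MYTextSumBASIC implementation.

```python
import re
from collections import Counter


def tokenize(sentence, stopwords):
    """Word item constraint (assumed): keep lower-cased alphabetic words
    that are not in the stopword list."""
    return [w for w in re.findall(r"\w+", sentence.lower())
            if w.isalpha() and w not in stopwords]


def adjacent_patterns(tokens, max_size):
    """Adjacent word constraint: only contiguous word sequences.
    Sequence size constraint: lengths 1..max_size."""
    return {tuple(tokens[i:i + n])
            for n in range(1, max_size + 1)
            for i in range(len(tokens) - n + 1)}


def mine_patterns(sentences, stopwords, max_size=3, min_support=2):
    """Support of a pattern = number of sentences that contain it."""
    support = Counter()
    for s in sentences:
        support.update(adjacent_patterns(tokenize(s, stopwords), max_size))
    return {p: c for p, c in support.items() if c >= min_support}


def summarize(text, k=2, stopwords=frozenset(), max_size=3, min_support=2):
    """Return the k highest-scoring sentences in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    patterns = mine_patterns(sentences, stopwords, max_size, min_support)

    def score(i):
        grams = adjacent_patterns(tokenize(sentences[i], stopwords), max_size)
        # Assumed weighting: pattern support times pattern length.
        return sum(patterns[g] * len(g) for g in grams if g in patterns)

    # Highest score first; earlier sentences win ties.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(i), i))
    return [sentences[i] for i in sorted(ranked[:k])]


text = ("Harga minyak dunia naik. Harga minyak dunia dijangka stabil. "
        "Pasukan bola sepak menang. Kerajaan memantau harga minyak.")
print(summarize(text))
```

Restricting patterns to contiguous sequences of bounded length keeps the pattern space far smaller than all possible word combinations, which is one plausible way to address the high-dimensionality problem the abstract attributes to N-gram models.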

    The evaluation of occupational accident with sequential pattern mining

    Accidents in manufacturing systems greatly affect productivity and efficiency, which are well-known performance indicators in practice. It is therefore very important to know the sequential patterns among accidents in order to avoid losses that decrease the performance of manufacturing systems. To reduce accidents, the patterns that cause them must first be determined. The associations among the causes of accidents are rarely investigated in the literature. To fill this gap, this study uses sequential pattern mining to reveal the patterns of causes among accidents in a manufacturing system. Its most important contribution, unlike traditional accident investigation methods, is the discovery of sequential patterns formed by the characteristics of the pre-accident, moment-of-accident and post-accident stages. Additionally, knowing the patterns of causes among accidents can help decision makers prepare more proactive safety programs in practice. The CloFast algorithm is used to examine the details of accidents in manufacturing systems, and accident records covering 2013 to 2019 are used to discover the sequential patterns. The results show that each accident has its own sequential patterns, and that taking these patterns into account in practice makes it possible to prevent potential accidents and reduce accident-related losses. Safety engineers and occupational safety specialists should take the sequential patterns among accidents into account to avoid similar accidents in the near future.
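
CloFast mines closed sequential patterns: frequent ordered event patterns that have no longer super-pattern with the same support. The brute-force sketch below only illustrates that definition on hypothetical accident records (the event labels are invented); it does not reproduce CloFast's search strategy and treats each event as a single item rather than an itemset.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if the events of `pattern` occur in `sequence` in the same order
    (gaps allowed). `event in it` consumes the iterator up to the match."""
    it = iter(sequence)
    return all(event in it for event in pattern)


def frequent_patterns(sequences, min_support, max_len=3):
    """Brute force: enumerate every ordered subsequence up to max_len and
    count how many records contain it."""
    candidates = set()
    for seq in sequences:
        for n in range(1, min(max_len, len(seq)) + 1):
            for idxs in combinations(range(len(seq)), n):
                candidates.add(tuple(seq[i] for i in idxs))
    support = {p: sum(is_subsequence(p, s) for s in sequences) for p in candidates}
    return {p: c for p, c in support.items() if c >= min_support}


def closed_patterns(frequent):
    """Keep patterns with no longer super-pattern of equal support."""
    return {
        p: c for p, c in frequent.items()
        if not any(len(q) > len(p) and frequent[q] == c and is_subsequence(p, q)
                   for q in frequent)
    }


# Hypothetical accident records: pre-accident, moment-of-accident and
# post-accident events in chronological order.
records = [
    ["no_ppe", "slip", "hand_injury", "lost_workday"],
    ["no_ppe", "slip", "hand_injury", "first_aid"],
    ["fatigue", "slip", "hand_injury", "lost_workday"],
    ["no_ppe", "machine_contact", "hand_injury", "lost_workday"],
]

for pattern, count in sorted(closed_patterns(frequent_patterns(records, 3)).items()):
    print(" -> ".join(pattern), count)
# hand_injury 4
# hand_injury -> lost_workday 3
# no_ppe -> hand_injury 3
# slip -> hand_injury 3
```

Single events such as `no_ppe` (support 3) are dropped because `no_ppe -> hand_injury` has the same support and says strictly more; this compression is what makes closed patterns easier for safety specialists to review.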

    Closing the Affordable Housing Gap: Identifying the Barriers Hindering the Sustainable Design and Construction of Affordable Homes

    Despite the commitment of the United Nations (UN) to provide everyone with equal access to basic services, the construction sector still fails to reach the production capacity and quality standards which are needed to meet the fast-growing demand for affordable homes. Whilst innovation measures are urgently needed to address the existing inefficiencies, the identification and development of the most appropriate solutions require a comprehensive understanding of the barriers obstructing the design and construction phase of affordable housing. To identify such barriers, an exploratory data mining analysis was conducted in which agglomerative hierarchical clustering made it possible to gather latent knowledge from 3566 text-based research outputs sourced from the Web of Science and Scopus. The analysis captured 83 supply-side barriers which impact the efficiency of the value chain for affordable housing provision. Of these barriers, 18 affected the design and construction phase, and after grouping them by thematic area, seven key matters of concern were identified: (1) design (not) for all, (2) homogeneity of provision, (3) unhealthy living environment, (4) inadequate construction project management, (5) environmental unsustainability, (6) placemaking, and (7) inadequate technical knowledge and skillsets. The insights which resulted from the analysis were seen to support evidence-informed decision making across the affordable housing sector. The findings suggest that fixing the inefficiencies of the affordable housing provision system will require UN Member States to accelerate the transition towards a fully sustainable design and construction process. This transition should prioritize a more inclusive and socially sensitive approach to the design and construction of affordable homes, capitalizing on the benefits of greater user involvement. In addition, transformative actions which seek to deliver more resource-efficient and environmentally friendly homes should be promoted, as well as new investments in the training and upskilling of construction professionals.
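
The abstract does not state which text features, similarity measure or linkage criterion were used. As an illustration only, the sketch below clusters short texts bottom-up with bag-of-words cosine similarity and average linkage, repeatedly merging the most similar pair of clusters until a chosen number remains; the toy documents are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (lower-cased alphabetic tokens)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def agglomerative(texts, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of text indices."""
    vecs = [vectorize(t) for t in texts]
    clusters = [[i] for i in range(len(texts))]

    def linkage(c1, c2):
        # Average linkage: mean pairwise cosine similarity between clusters.
        return sum(cosine(vecs[i], vecs[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        a, b = max(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]


docs = [
    "poor construction project management delays",
    "construction project management lacks skilled workers",
    "homes lack accessible design for elderly",
    "accessible design for disabled residents",
]
print(agglomerative(docs, 2))  # [[0, 1], [2, 3]]
```

This naive loop recomputes every pairwise linkage at each merge, which is fine for a handful of texts; for thousands of documents a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would be the practical choice.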