12 research outputs found

    Single Document Automatic Text Summarization Using Term Frequency-Inverse Document Frequency (TF-IDF)

    The increasing availability of online information has triggered intensive research in automatic text summarization within Natural Language Processing (NLP). Text summarization shortens a text by removing less useful information, which helps the reader find the required information quickly. Many algorithms can be used to summarize text; one of them is TF-IDF (Term Frequency-Inverse Document Frequency). This research aimed to produce an automatic text summarizer implemented with the TF-IDF algorithm and to compare it with various other online automatic text summarizers. To evaluate the summary produced by each summarizer, the F-measure was used as the standard comparison value. The summarizer achieved 67% accuracy on three data samples, which is higher than the other online summarizers.
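
As a rough illustration of TF-IDF sentence scoring for a single document, the sketch below treats each sentence as its own "document" when computing inverse document frequency and keeps the highest-scoring sentences in their original order. The function names, the naive sentence splitter, and the length normalisation are assumptions made for this example; the abstract does not describe the paper's actual implementation.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase alphanumeric tokens; no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_summarize(text, n_sentences=2):
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Document frequency: number of sentences in which each term appears.
    df = Counter(term for toks in tokens for term in set(toks))
    scores = []
    for toks in tokens:
        if not toks:
            scores.append(0.0)
            continue
        tf = Counter(toks)
        # Sum of (relative term frequency * inverse document frequency).
        scores.append(
            sum((count / len(toks)) * math.log(n / df[term])
                for term, count in tf.items())
        )
    # Take the best-scoring sentences, then restore document order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:n_sentences]
    return [sentences[i] for i in sorted(top)]
```

Calling `tfidf_summarize(text, n_sentences=3)` returns a list of three sentences taken verbatim from `text`. Terms that occur in every sentence receive an IDF of zero and therefore do not contribute to a sentence's score.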

    Extractive Text Summarization of Student Essay Assignment Using Sentence Weight Features and Fuzzy C-Means

    One of the main tasks of a lecturer is to give students an academic assessment during the learning process. The assessment begins with reading or checking student assignments, such as essays or reports, that contain many very long sentences. Extracting the primary information from them takes a lot of time. The answers therefore need to be summarized so that the lecturer does not have to read the whole document but can still grasp the essence of the response to the task. This study proposes automatically summarizing the text of student essay assignments using the Fuzzy C-Means method with a sentence weighting feature. The sentence weighting feature selects the sentence with the highest weight in each cluster, helping the system to get the primary information from a document quickly. The results indicate that the system succeeds in summarizing text, with average precision, recall, accuracy, and F-measure of 0.52, 0.54, 0.70, and 0.52, respectively.
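
The four metrics reported above can be computed for an extractive summary by comparing the sentences a system selects with those in a reference summary. The sketch below assumes a sentence-level evaluation, in which each sentence of the source document counts as one binary decision; the abstract does not state how the study actually computed its scores, and the function name `extractive_metrics` is hypothetical.

```python
def extractive_metrics(selected, reference, n_sentences):
    """Compare selected sentence indices against a reference selection.

    selected, reference: iterables of sentence indices (assumed 0-based).
    n_sentences: total number of sentences in the source document.
    """
    selected, reference = set(selected), set(reference)
    tp = len(selected & reference)   # chosen by both system and reference
    fp = len(selected - reference)   # chosen by the system only
    fn = len(reference - selected)   # chosen by the reference only
    tn = n_sentences - tp - fp - fn  # chosen by neither
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / n_sentences if n_sentences else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall else 0.0
    )
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }
```

Under this definition, accuracy also counts the sentences that both the system and the reference leave out, so it can exceed the F-measure when summaries are short relative to the document.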

    Automatic Text Summarization of Legal Cases: A Hybrid Approach

    Manual summarization of large bodies of text involves a lot of human effort and time, especially in the legal domain. Lawyers spend a lot of time preparing legal briefs of their clients' case files. Automatic text summarization is a constantly evolving field of Natural Language Processing (NLP), which is a subdiscipline of Artificial Intelligence. In this paper, a hybrid method for automatic text summarization of legal cases is proposed, using the k-means clustering technique and a tf-idf (term frequency-inverse document frequency) word vectorizer. The summary generated by the proposed method is compared, using ROUGE evaluation parameters, with the case summary prepared by the lawyer for appeal in court. Further, suggestions for improving the proposed method are also presented. Comment: Part of the 5th International Conference on Natural Language Processing (NATP 2019) Proceedings
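
One way to combine tf-idf vectors with k-means for extractive summarization is to cluster the sentence vectors and keep the sentence closest to each centroid. The sketch below is a minimal pure-Python version under several assumptions not taken from the paper: a smoothed IDF, squared Euclidean distance, deterministic initialisation from the first `k` sentences, and hypothetical function names. It omits the ROUGE evaluation step entirely.

```python
import math
import re
from collections import Counter


def tfidf_vectors(sentences):
    tokens = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    vocab = sorted({t for toks in tokens for t in toks})
    n = len(sentences)
    df = Counter(t for toks in tokens for t in set(toks))
    vectors = []
    for toks in tokens:
        tf = Counter(toks)
        # Smoothed IDF (assumption) so shared terms keep a nonzero weight.
        vectors.append(
            [tf[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in vocab]
        )
    return vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    # Deterministic initialisation: the first k vectors are the starting centroids.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def cluster_summary(sentences, k):
    vectors = tfidf_vectors(sentences)
    labels, centroids = kmeans(vectors, k)
    chosen = set()
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            # Representative sentence: the member nearest to the centroid.
            chosen.add(min(members, key=lambda i: sq_dist(vectors[i], centroids[c])))
    return [sentences[i] for i in sorted(chosen)]
```

`cluster_summary(sentences, k)` returns at most `k` sentences in document order. Initialising from the first `k` sentences keeps the result reproducible, but it can give poor clusters when those sentences are similar; random or k-means++ style seeding would be a common alternative.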

    Automatic Cover Letter Generator System from CVs

    The proposed system addresses the problem of writing a cover letter to accompany a C.V., a task that requires linguistic skill and considerable experience and is costly in time and money. The ACLGS solves the problem by automatically generating a cover letter from the user's C.V., regardless of its format. The ACLGS takes as input the user's C.V. and the career announcement that contains the job requirements and the skills needed. The system builds a template as a frame of slots, where each slot holds a skill required for the job; it then extracts the required information from the user's C.V. and fills the slots automatically. The ACLGS applies information retrieval methodologies with intelligent techniques to mine the user's C.V., using part-of-speech tags and indicator words to recognize the proper data and required information. In addition, the system specifies a set of features for each slot in the form. The user's C.V. is clustered into a number of categories (e.g. Personal Information, Qualifications, Experience, Skills, Rewards, and Publications). These categories are used as additional features for the extracted information and data. The system takes sentence coherence into account and improves the output document by inserting pre-specified sentences based on the information extracted from the user's C.V.

    Automatic Arabic Text Summarization System (AATSS) Based on Semantic Feature Extraction

    The growing amount of information available on the web has increased the need for effective and powerful tools that summarize text automatically. For English and other European languages, intensive work has been done with high performance, and research now looks toward multi-document and multi-language summarization. The Arabic language, however, still suffers from the little attention and research done in this field. In our research, we propose a model to automatically summarize Arabic text using text extraction. The approach involves several steps: preprocessing the text, extracting a set of features from sentences, classifying sentences with a scoring method, ranking sentences, and finally generating an extractive summary. The main difference between our proposed system and other Arabic summarization systems is its consideration of semantics, entity objects such as names and places, and similarity factors. The proposed system has been applied to the news domain using a dataset obtained from the Falesteen newspaper. Manual evaluation techniques are used to evaluate and test the system. The proposed method achieves 86.5% similarity between the system and human summarization. A comparative study between our proposed system and the Sakhr Arabic online summarization system has been conducted. The results show that our proposed system outperforms the Sakhr system.

    Comparison of the K-Means and K-Nearest Neighbors Algorithms in an Automatic Summarization System

    Summarizing a document extracts the main information it contains. However, obtaining the important information in an article takes a long time. This has given rise to various studies on automatic summarization systems. With such a system, readers are expected to find information relevant to their needs more easily. K-Means and K-NN are two methods that have been used to summarize text automatically. The two earlier studies that each used one of these methods reported good performance (accuracy above 50%). However, before either can be used widely, it must be determined which method has the higher accuracy. On that basis, this study compares the K-Means and K-NN methods for automatic text summarization. The test documents are the background sections of undergraduate thesis (Skripsi) reports. The comparison used 100 data items. In the tests, summarization with K-NN produced an average accuracy of 49%, while K-Means produced 51%. This shows that although K-Means has the higher accuracy, the difference between the two is not striking in general. For some documents, K-NN in fact produced significantly higher accuracy.

    Semantic Selection of Internet Sources through SWRL Enabled OWL Ontologies

    This research examines the problem of Information Overload (IO) and gives an overview of various attempts to resolve it. It further argues that, instead of fighting IO, it is advisable to start learning how to live with it. It is unlikely that in the modern information age, where users are both producers and consumers of information, the amount of data and information generated will decrease. Furthermore, when managing IO, users are confined to the algorithms and policies of commercial Search Engines and Recommender Systems (RSs), which create results that also add to IO. This research calls for a change in thinking: giving greater power to users when addressing the relevance and accuracy of internet searches, which helps with IO. However powerful search engines are, they do not process enough semantics at the moment when search queries are formulated. This research proposes a semantic selection of internet sources through SWRL enabled OWL ontologies. The research focuses on SWT and its Stack because they (a) secure the semantic interpretation of the environments where internet searches take place and (b) guarantee reasoning that results in the selection of suitable internet sources at a particular moment of internet searches. Therefore, it is important to model the behaviour of users through OWL concepts and reason upon them in order to address IO when searching the internet. Thus, user behaviour is itemized through user preferences, perceptions and expectations from internet searches. The approach proposed in this research is a Software Engineering (SE) solution which provides computations based on the semantics of the environment stored in the ontological model.