272 research outputs found

    Applications of Mining Arabic Text: A Review

    Get PDF
    Since the appearance of text mining, the Arabic language gained some interest in applying several text mining tasks over a text written in the Arabic language. There are several challenges faced by the researchers. These tasks include Arabic text summarization, which is one of the challenging open areas for research in natural language processing (NLP) and text mining fields, Arabic text categorization, and Arabic sentiment analysis. This chapter reviews some of the past and current researches and trends in these areas and some future challenges that need to be tackled. It also presents some case studies for two of the reviewed approaches

    MultiGBS: A multi-layer graph approach to biomedical summarization

    Full text link
    Automatic text summarization methods generate a shorter version of the input text to assist the reader in gaining a quick yet informative gist. Existing text summarization methods generally focus on a single aspect of text when selecting sentences, causing the potential loss of essential information. In this study, we propose a domain-specific method that models a document as a multi-layer graph to enable multiple features of the text to be processed at the same time. The features we used in this paper are word similarity, semantic similarity, and co-reference similarity, which are modelled as three different layers. The unsupervised method selects sentences from the multi-layer graph based on the MultiRank algorithm and the number of concepts. The proposed MultiGBS algorithm employs UMLS and extracts the concepts and relationships using different tools such as SemRep, MetaMap, and OGER. Extensive evaluation by ROUGE and BERTScore shows increased F-measure values

    SemPCA-Summarizer: Exploiting Semantic Principal Component Analysis for Automatic Summary Generation

    Get PDF
    Text summarization is the task of condensing a document keeping the relevant information. This task integrated in wider information systems can help users to access key information without having to read everything, allowing for a higher efficiency. In this research work, we have developed and evaluated a single-document extractive summarization approach, named SemPCA-Summarizer, which reduces the dimension of a document using Principal Component Analysis technique enriched with semantic information. A concept-sentence matrix is built from the textual input document, and then, PCA is used to identify and rank the relevant concepts, which are used for selecting the most important sentences through different heuristics, thus leading to various types of summaries. The results obtained show that the generated summaries are very competitive, both from a quantitative and a qualitative viewpoint, thus indicating that our proposed approach is appropriate for briefly providing key information, and thus helping to cope with a huge amount of information available in a quicker and efficient manner

    Automatic Text Summarization for Hindi Using Real Coded Genetic Algorithm

    Get PDF
    In the present scenario, Automatic Text Summarization (ATS) is in great demand to address the ever-growing volume of text data available online to discover relevant information faster. In this research, the ATS methodology is proposed for the Hindi language using Real Coded Genetic Algorithm (RCGA) over the health corpus, available in the Kaggle dataset. The methodology comprises five phases: preprocessing, feature extraction, processing, sentence ranking, and summary generation. Rigorous experimentation on varied feature sets is performed where distinguishing features, namely- sentence similarity and named entity features are combined with others for computing the evaluation metrics. The top 14 feature combinations are evaluated through Recall-Oriented Understudy for Gisting Evaluation (ROUGE) measure. RCGA computes appropriate feature weights through strings of features, chromosomes selection, and reproduction operators: Simulating Binary Crossover and Polynomial Mutation. To extract the highest scored sentences as the corpus summary, different compression rates are tested. In comparison with existing summarization tools, the ATS extractive method gives a summary reduction of 65%

    Abstract Creation of Research Paper Using Feature Specific Sentence Extraction based Summarization

    Get PDF
    Several techniques for identifying essential content for text summarization have been created to date. Subject representation techniques is primary infer a midway reflection of the content that that grabs the styles discussed in the data. Considering these representations of topics, phrases in the details records are obtained for each and every relevance. In our suggested system sentence relevance detection is applied determines a score for each sentence based on its significance. Then an overview is produced by selecting most calculated sentences. The produced overview is use for producing subjective by Enhanced summation technique, choosing the sentences from the overview one by one and create word chart. In our system enhance edge weighting strategy is applied for high connection throughout words of produced chart. For discovering few shortest path sentences suggested method use dijkstras algorithm. Before choosing the best quickest path sentences, system examine framework of phrase grammatically. Outcomes demonstrate that extractive and abstractive-oriented overviews produced by Improve COPMENDIUM outshine current system of summation system. We used feature specific sentence extraction techniques which enhance the effectiveness of the summarization strategy. DOI: 10.17762/ijritcc2321-8169.15074
    • …
    corecore