Applications of Mining Arabic Text: A Review
Since the emergence of text mining, there has been growing interest in applying text mining tasks to text written in Arabic, and researchers face several challenges in doing so. These tasks include Arabic text summarization, which is one of the challenging open research areas in natural language processing (NLP) and text mining; Arabic text categorization; and Arabic sentiment analysis. This chapter reviews past and current research and trends in these areas, along with future challenges that need to be tackled. It also presents case studies for two of the reviewed approaches.
MultiGBS: A multi-layer graph approach to biomedical summarization
Automatic text summarization methods generate a shorter version of the input
text to assist the reader in gaining a quick yet informative gist. Existing
text summarization methods generally focus on a single aspect of text when
selecting sentences, causing the potential loss of essential information. In
this study, we propose a domain-specific method that models a document as a
multi-layer graph to enable multiple features of the text to be processed at
the same time. The features we used in this paper are word similarity, semantic
similarity, and co-reference similarity, which are modelled as three different
layers. The unsupervised method selects sentences from the multi-layer graph
based on the MultiRank algorithm and the number of concepts. The proposed
MultiGBS algorithm employs UMLS and extracts the concepts and relationships
using different tools such as SemRep, MetaMap, and OGER. Extensive evaluation
by ROUGE and BERTScore shows increased F-measure values.
SemPCA-Summarizer: Exploiting Semantic Principal Component Analysis for Automatic Summary Generation
Text summarization is the task of condensing a document while keeping the relevant information. Integrated into wider information systems, it can help users access key information without having to read everything, which improves efficiency. In this research work, we have developed and evaluated a single-document extractive summarization approach, named SemPCA-Summarizer, which reduces the dimension of a document using Principal Component Analysis (PCA) enriched with semantic information. A concept-sentence matrix is built from the input document, and PCA is then used to identify and rank the relevant concepts. These concepts are used to select the most important sentences through different heuristics, leading to various types of summaries. The results show that the generated summaries are very competitive from both a quantitative and a qualitative viewpoint, indicating that the proposed approach is appropriate for providing key information briefly and for helping users cope with large amounts of information more quickly and efficiently.
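The concept-sentence matrix and PCA ranking described in the abstract above can be illustrated with a minimal sketch. It is not the SemPCA-Summarizer implementation: lower-cased words stand in for semantic concepts, the first principal component is approximated by power iteration, and the sentence-scoring heuristic (sum of concept loadings weighted by occurrence) is an assumption. The names `concept_sentence_matrix`, `first_component`, and `summarize` are hypothetical.

```python
import math
import re


def concept_sentence_matrix(sentences):
    """Rows are concepts, columns are sentences.

    Assumption: lower-cased words stand in for semantic concepts.
    """
    tokens = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({t for toks in tokens for t in toks})
    matrix = [[float(toks.count(term)) for toks in tokens] for term in vocab]
    return vocab, matrix


def first_component(matrix, iterations=200):
    """Approximate the first principal component by power iteration
    on the concept-by-concept covariance matrix."""
    n = len(matrix[0])
    centered = []
    for row in matrix:
        mean = sum(row) / n
        centered.append([v - mean for v in row])
    cov = [[sum(a * b for a, b in zip(ri, rj)) / max(n - 1, 1)
            for rj in centered] for ri in centered]
    vec = [1.0 + i for i in range(len(matrix))]  # deterministic start vector
    for _ in range(iterations):
        nxt = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # no variance at all, keep the current vector
            break
        vec = [x / norm for x in nxt]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def summarize(sentences, k=1):
    """Pick the k sentences with the highest loading-weighted score,
    returned in their original order (an assumed heuristic)."""
    vocab, matrix = concept_sentence_matrix(sentences)
    if not matrix:
        return sentences[:k]
    loadings = [abs(x) for x in first_component(matrix)]
    scores = [sum(w * row[j] for w, row in zip(loadings, matrix))
              for j in range(len(sentences))]
    top = sorted(range(len(sentences)), key=lambda j: scores[j], reverse=True)[:k]
    return [sentences[j] for j in sorted(top)]
```

Calling `summarize(sentences, k=2)` returns two sentences from the input; a real system would replace the word-level rows with concepts from a semantic resource.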
Automatic Text Summarization for Hindi Using Real Coded Genetic Algorithm
Automatic Text Summarization (ATS) is currently in great demand as a way to find relevant information faster in the ever-growing volume of text data available online. In this research, an ATS methodology is proposed for the Hindi language using a Real Coded Genetic Algorithm (RCGA) over a health corpus available as a Kaggle dataset. The methodology comprises five phases: preprocessing, feature extraction, processing, sentence ranking, and summary generation. Rigorous experimentation is performed on varied feature sets, in which the distinguishing features, namely sentence similarity and named entity features, are combined with others to compute the evaluation metrics. The top 14 feature combinations are evaluated with the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) measure. RCGA computes appropriate feature weights through strings of features, chromosome selection, and the reproduction operators Simulated Binary Crossover and Polynomial Mutation. Different compression rates are tested when extracting the highest-scored sentences as the corpus summary. In comparison with existing summarization tools, the ATS extractive method gives a summary reduction of 65%.
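The two reproduction operators named in the abstract above, commonly known as simulated binary crossover (SBX) and polynomial mutation, can be sketched as follows for a chromosome of real-valued feature weights. This is a textbook-style sketch, not the paper's code: the distribution indices `eta`, the mutation `rate`, the bounds `low` and `high`, and the weighted-sum `sentence_score` are assumptions chosen for illustration.

```python
import random


def sbx_crossover(p1, p2, rng, eta=15.0):
    """Simulated binary crossover: each child gene is a blend of the
    parent genes, with the spread factor beta drawn from u ~ U(0, 1)."""
    c1, c2 = [], []
    for x1, x2 in zip(p1, p2):
        u = rng.random()
        if u <= 0.5:
            beta = (2.0 * u) ** (1.0 / (eta + 1.0))
        else:
            beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
        c1.append(0.5 * ((1.0 + beta) * x1 + (1.0 - beta) * x2))
        c2.append(0.5 * ((1.0 - beta) * x1 + (1.0 + beta) * x2))
    return c1, c2


def polynomial_mutation(chrom, rng, eta=20.0, rate=0.1, low=0.0, high=1.0):
    """Basic polynomial mutation: perturb a gene with probability `rate`,
    then clip it back into [low, high]."""
    out = []
    for x in chrom:
        if rng.random() < rate:
            u = rng.random()
            if u < 0.5:
                delta = (2.0 * u) ** (1.0 / (eta + 1.0)) - 1.0
            else:
                delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta + 1.0))
            x = min(high, max(low, x + delta * (high - low)))
        out.append(x)
    return out


def sentence_score(features, weights):
    """Assumed scoring rule: weighted sum of a sentence's feature values."""
    return sum(f * w for f, w in zip(features, weights))
```

SBX preserves the per-gene sum of the two parents, so children stay centred on their parents; they can still fall outside the bounds, which is why the mutation step clips.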
Abstract Creation of Research Paper Using Feature Specific Sentence Extraction based Summarization
Several techniques for identifying essential content for text summarization have been developed to date. Topic representation techniques first derive an intermediate representation of the text that captures the topics discussed in the input. Based on these topic representations, sentences in the input documents are scored for relevance. In our proposed system, sentence relevance detection assigns each sentence a score based on its significance. A summary is then produced by selecting the highest-scoring sentences. This summary is used to produce an abstract with an enhanced summarization technique, which takes the sentences from the summary one by one and builds a word graph. An enhanced edge-weighting strategy is applied to strengthen the connections between words in the resulting graph. To find a few shortest-path sentences, the proposed method uses Dijkstra's algorithm. Before choosing the best shortest-path sentences, the system checks the grammatical structure of each sentence. Results show that the extractive and abstractive summaries produced by Improved COMPENDIUM outperform the existing summarization system. We used feature-specific sentence extraction techniques, which enhance the effectiveness of the summarization strategy.
DOI: 10.17762/ijritcc2321-8169.15074
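The word-graph step in the abstract above can be illustrated with a small sketch that builds a directed graph over adjacent words and runs Dijkstra's algorithm from a start marker to an end marker. The abstract does not specify its edge-weighting strategy, so the inverse bigram frequency used here is an assumption, as are the names `build_word_graph` and `shortest_path`. The sketch finds a single shortest path, whereas the abstract describes selecting among several.

```python
import heapq
from collections import defaultdict

START, END = "<s>", "</s>"


def build_word_graph(sentences):
    """Directed graph over adjacent words.

    Assumed weighting: 1 / bigram frequency, so frequent transitions are cheap.
    """
    counts = defaultdict(int)
    for sentence in sentences:
        words = [START] + sentence.lower().split() + [END]
        for a, b in zip(words, words[1:]):
            counts[(a, b)] += 1
    graph = defaultdict(dict)
    for (a, b), c in counts.items():
        graph[a][b] = 1.0 / c
    return graph


def shortest_path(graph, source=START, target=END):
    """Dijkstra's algorithm; returns the words on the cheapest path,
    without the start and end markers."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if target not in dist:
        return []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1][1:-1]
```

Joining the returned words with spaces gives a compressed candidate sentence; a grammaticality check such as the one the abstract mentions would then filter the candidates.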