
    Feature Selection Using Hybrid Binary Grey Wolf Optimizer for Arabic Text Classification

    Feature selection in Arabic text is a challenging task due to the complex and rich nature of the Arabic language. A feature selection method requires solution quality, stability, fast convergence, and the ability to find the global optimum. This study proposes a feature selection method using a Hybrid Binary Grey Wolf Optimizer (HBGWO) for Arabic text classification. The HBGWO method combines the local, exploratory search capabilities of BGWO with PSO's exploitative search around the best solutions, and adds SCA's ability to find global solutions. The dataset consists of Arabic text from islambook.com, comprising five Hadith books from which five classes were selected: Tauhid, Prayer, Zakat, Fasting, and Hajj. The results showed that the BGWO-PSO-SCA feature selection method, with a fitness-function-guided search and an SVM classifier, performs better on Arabic text classification problems. BGWO-PSO with the fitness function and an SVM classifier (C=1.0) yields an accuracy of 76.37%, higher than without feature selection, while BGWO-PSO-SCA reaches 88.08%, exceeding both BGWO-PSO and other feature selection methods.
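    As a concrete illustration, the following is a minimal sketch of wrapper-style binary grey wolf optimisation with an SVM fitness, the BGWO core of the hybrid described above. The PSO velocity term and SCA update of full HBGWO are omitted, and the population size, iteration count, and sigmoid transfer function are illustrative assumptions rather than the paper's exact settings.

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC

        def fitness(mask, X, y):
            # Wrapper fitness: SVM cross-validation accuracy on the selected subset.
            if mask.sum() == 0:
                return 0.0
            return cross_val_score(SVC(C=1.0), X[:, mask.astype(bool)], y, cv=3).mean()

        def bgwo(X, y, n_wolves=10, n_iter=20, seed=0):
            rng = np.random.default_rng(seed)
            d = X.shape[1]
            wolves = rng.integers(0, 2, size=(n_wolves, d)).astype(float)
            scores = np.array([fitness(w, X, y) for w in wolves])
            for t in range(n_iter):
                a = 2 - 2 * t / n_iter  # exploration factor decays towards exploitation
                alpha, beta, delta = wolves[np.argsort(scores)[::-1][:3]]  # three best wolves
                for i in range(n_wolves):
                    pos = np.zeros(d)
                    for leader in (alpha, beta, delta):
                        A = a * (2 * rng.random(d) - 1)
                        C = 2 * rng.random(d)
                        pos += leader - A * np.abs(C * leader - wolves[i])
                    pos /= 3.0
                    # Sigmoid transfer function maps continuous positions to a 0/1 mask.
                    wolves[i] = (rng.random(d) < 1.0 / (1.0 + np.exp(-pos))).astype(float)
                    scores[i] = fitness(wolves[i], X, y)
            best = int(np.argmax(scores))
            return wolves[best].astype(bool), scores[best]

    Hybridisation would replace the plain position update with a PSO-style velocity step and SCA's sine-cosine update, which is what gives HBGWO its added exploitation and global-search capability.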

    Efficient Feature Subset Selection Algorithm for High Dimensional Data

    Feature selection solves the dimensionality problem by removing irrelevant and redundant features. Existing feature selection algorithms take considerable time to obtain a feature subset for high-dimensional data. This paper proposes a feature selection algorithm based on information gain measures for high-dimensional data, termed IFSA (Information gain based Feature Selection Algorithm), to produce an optimal feature subset in efficient time and improve the computational performance of learning algorithms. The IFSA algorithm works in two stages: first, a filter is applied to the dataset; second, a small feature subset is produced using the information gain measure. Extensive experiments compare the proposed algorithm with other methods using two different classifiers (Naive Bayes and IBk) on microarray and text datasets. The results demonstrate that IFSA not only produces the best feature subset in efficient time but also improves classifier performance.
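    The second stage maps naturally onto a top-k ranking by information gain. A minimal sketch follows; it collapses the paper's two-stage procedure into a single selection step, and the cutoff k is an illustrative assumption.

        import numpy as np
        from sklearn.feature_selection import mutual_info_classif

        def select_top_k(X, y, k=100):
            # Information gain of a feature about the class label is its mutual
            # information with y, estimated here with scikit-learn's estimator.
            gains = mutual_info_classif(X, y, discrete_features=True)
            top = np.argsort(gains)[::-1][:k]  # indices of the k most informative features
            return X[:, top], top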

    Penalty Decoding: Well Suppress the Self-Reinforcement Effect in Open-Ended Text Generation

    The decoding algorithm is critical for open-ended text generation, transforming latent representations into coherent and meaningful outputs. This paper investigates the self-reinforcement effect in text generation and the effectiveness of a repetition penalty in mitigating it. However, determining the optimal repetition penalty value is challenging. To tackle this, we propose a forgetting mechanism that disregards distant tokens, reducing the burden of penalty selection. In addition, we introduce a length penalty to address overly short sentences caused by excessive penalties. Our penalty decoding approach, incorporating these three strategies, helps resolve issues with sampling methods deviating from factual information. Experimental results demonstrate the efficacy of our approach in generating high-quality sentences resembling human output. Comment: Accepted by EMNLP202
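    In logit space the three strategies compose straightforwardly. The sketch below applies them at a single decoding step; the window size, penalty strengths, and greedy final pick are illustrative assumptions, not the paper's exact formulation.

        import numpy as np

        def penalty_decode_step(logits, generated, eos_id, rep_penalty=1.2,
                                window=128, min_len=20, len_penalty=5.0):
            logits = logits.copy()
            # Forgetting mechanism: only tokens in the recent window are penalised,
            # so distant tokens no longer influence the repetition penalty.
            for tok in set(generated[-window:]):
                # Standard repetition penalty: shrink positive logits, amplify negative.
                logits[tok] = logits[tok] / rep_penalty if logits[tok] > 0 else logits[tok] * rep_penalty
            # Length penalty: discourage EOS while the output is still short, to
            # counteract the overly short sentences excessive penalties can cause.
            if len(generated) < min_len:
                logits[eos_id] -= len_penalty
            return int(np.argmax(logits))  # greedy pick; the paper pairs this with sampling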

    Enhancing extractive text summarization using natural language processing with an optimal deep learning model

    Natural language processing (NLP) plays a vital role in text summarization, a task aimed at distilling the crucial information from massive quantities of textual data. NLP methods allow computers to comprehend and process human language, permitting the development of advanced summarization methods. Text summarization is the automatic generation of a concise and coherent summary of a given document or collection of documents. Extracting significant insights from text data is crucial, as it provides advanced solutions to end-users and business organizations. Automatic text summarization (ATS) automates the task by reducing the initial size of the text without losing its main features. Deep learning (DL) approaches have shown significant performance in both abstractive and extractive summarization tasks. This research designed an extractive text summarization model using NLP with an optimal DL (ETS-NLPODL) model. The major goal of the ETS-NLPODL technique was to exploit feature selection with a hyperparameter-tuned DL model for summarizing the text. In the ETS-NLPODL technique, an initial data preprocessing step converts the input text into a compatible format. Next, features are extracted and the optimal set of features is chosen by the hunger games search optimization (HGSO) algorithm. For text summarization, the ETS-NLPODL model uses an attention-based convolutional neural network with a gated recurrent unit (ACNN-GRU). Finally, the mountain gazelle optimization (MGO) algorithm is employed for optimal hyperparameter selection of the ACNN-GRU model. The ETS-NLPODL system was evaluated on a benchmark dataset, and the results showed that it outperformed other methods across diverse performance measures.
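    The learned components (HGSO feature selection, ACNN-GRU scoring, MGO tuning) are specific to the paper and are not reproduced here; the sketch below shows only the shared extractive skeleton, preprocess, score each sentence, keep the top-ranked ones, with a TF-IDF score as a hypothetical stand-in for the model's learned relevance.

        import numpy as np
        from sklearn.feature_extraction.text import TfidfVectorizer

        def extractive_summary(sentences, n_keep=3):
            # Score each sentence by the mean TF-IDF weight of its terms.
            tfidf = TfidfVectorizer(stop_words="english").fit_transform(sentences)
            scores = np.asarray(tfidf.mean(axis=1)).ravel()
            keep = sorted(np.argsort(scores)[::-1][:n_keep])  # preserve document order
            return " ".join(sentences[i] for i in keep)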

    Information gain based dimensionality selection for classifying text documents

    Selecting the optimal dimensions for various knowledge extraction applications is an essential component of data mining. Dimensionality selection techniques are utilized in classification applications to increase classification accuracy and reduce computational complexity. In text classification, where the dimensionality of the dataset is extremely high, dimensionality selection is even more important. This paper presents a novel genetic-algorithm-based methodology for dimensionality selection in text mining applications that utilizes information gain. The presented methodology uses the information gain of each dimension to dynamically change the mutation probability of chromosomes. Since the information gain is calculated a priori, the computational complexity is not affected. The presented method was tested on a specific text classification problem and compared with conventional genetic-algorithm-based dimensionality selection. The results show an improvement of 3% in true positives and 1.6% in true negatives over conventional dimensionality selection methods.
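    The key operator is the mutation step. A minimal sketch follows, under one plausible weighting (high-gain genes mutate less often, so informative dimensions tend to be preserved); the paper's exact mapping from gain to mutation probability may differ, and the base rate is an illustrative assumption.

        import numpy as np

        def ig_weighted_mutation(population, gains, base_rate=0.05, seed=0):
            # population: (n_chromosomes, n_dims) 0/1 feature masks.
            # gains: per-dimension information gain, computed once a priori,
            # so this weighting adds no per-generation cost.
            rng = np.random.default_rng(seed)
            probs = base_rate * (1.0 - gains / gains.max())
            flips = rng.random(population.shape) < probs  # probs broadcasts per gene
            return np.where(flips, 1 - population, population)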

    Q(D)O-ES: Population-based Quality (Diversity) Optimisation for Post Hoc Ensemble Selection in AutoML

    Automated machine learning (AutoML) systems commonly ensemble models post hoc to improve predictive performance, typically via greedy ensemble selection (GES). However, we believe that GES may not always be optimal, as it performs a simple deterministic greedy search. In this work, we introduce two novel population-based ensemble selection methods, QO-ES and QDO-ES, and compare them to GES. While QO-ES optimises solely for predictive performance, QDO-ES also considers the diversity of ensembles within the population, maintaining a diverse set of well-performing ensembles during optimisation based on ideas from quality diversity optimisation. The methods are evaluated using 71 classification datasets from the AutoML benchmark, demonstrating that QO-ES and QDO-ES often outperform GES, albeit statistically significantly only on validation data. Our results further suggest that diversity can be beneficial for post hoc ensembling but also increases the risk of overfitting. Comment: 10 pages main paper, 24 pages references and appendix, 4 figures, 16 subfigures, 13 tables, to be published in: International Conference on Automated Machine Learning 2023; affiliations corrected. arXiv admin note: text overlap with arXiv:2307.0028
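    For reference, the GES baseline being improved upon is the classic Caruana-style greedy search with replacement. A minimal sketch for binary classification follows; the population-based QO-ES/QDO-ES variants are not reproduced here, and the round count is an illustrative assumption.

        import numpy as np

        def greedy_ensemble_selection(preds, y_val, n_rounds=25):
            # preds: list of per-model validation probability vectors for class 1.
            chosen, total = [], np.zeros_like(preds[0])
            for _ in range(n_rounds):
                # Greedily add, with replacement, the model whose inclusion
                # minimises validation error of the averaged prediction.
                errs = [np.mean(((total + p) / (len(chosen) + 1) > 0.5) != y_val)
                        for p in preds]
                best = int(np.argmin(errs))
                chosen.append(best)
                total += preds[best]
            return chosen, total / len(chosen)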

    Binary Classification Models of the Semantic Coloring of Texts

    Introduction: The purpose of the research is to compare different types of recurrent neural network architectures, namely the long short-term memory (LSTM) and gated recurrent unit (GRU) architectures, with the convolutional neural network, and to explore their performance on the example of binary text classification. Material and Methods: To achieve this, the research evaluates the performance of these popular deep-learning approaches on a dataset of film reviews labelled with both positive and negative opinions. This real-world dataset was used to train neural network models via software implementations. Results and Discussion: The research focuses on the implementation of a recurrent neural network for binary classification of the dataset and explores different architectures, approaches, and hyperparameters to determine the model that achieves optimal performance. The software implementation allowed various quality metrics to be evaluated, enabling a comparison of the proposed approaches. In addition, the research explores hyperparameters such as the learning rate, batch sizes, and regularization methods to determine their impact on model performance. Conclusion: Overall, the research provides valuable insights into the performance of neural networks in binary text classification and highlights the importance of careful architecture selection and hyperparameter tuning to achieve optimal performance.
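    As a concrete point of reference, here is a minimal Keras sketch of the kind of recurrent binary classifier being compared, assuming an already-tokenised integer input; the vocabulary size, layer widths, dropout, and learning rate stand in for the hyperparameters the study tunes and are illustrative, not the paper's values.

        import tensorflow as tf

        def build_gru_classifier(vocab_size=20000, embed_dim=64):
            model = tf.keras.Sequential([
                tf.keras.layers.Embedding(vocab_size, embed_dim),
                tf.keras.layers.GRU(64),                        # swap for LSTM(64) to compare architectures
                tf.keras.layers.Dropout(0.3),                   # one of the regularization knobs studied
                tf.keras.layers.Dense(1, activation="sigmoid")  # positive vs. negative review
            ])
            model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),  # learning rate is a tuned hyperparameter
                          loss="binary_crossentropy", metrics=["accuracy"])
            return model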