20 research outputs found

    Feature Selection Based on Multi-Filters for Classification of Mammogram Images to Look for Signs of Breast Cancer

    The accuracy of classification on mammogram images plays a significant role in breast cancer diagnosis. Many stages therefore matter in finding a model that is highly accurate while minimizing the computing load, one of which is selecting the best features. This needs to be prioritized because a mammogram image yields many features from the extraction process. Our research has four stages: feature extraction, multi-filter feature selection, classification, and performance evaluation. We propose an algorithm, feature selection based on multi-filters (FSbMF), that selects features by applying multiple filters simultaneously within the filter model. Six filter-based feature selection algorithms are used in this research: information gain, rule, relief, correlation, Gini index, and chi-square. Based on testing with 10-fold cross-validation, the features produced by the FSbMF algorithm perform best, raising accuracy, recall, and precision from 72.63%, 70.38%, and 75.01% to 100%. Furthermore, the number of resulting features is minimal because it comes from an intersection operation over the feature subsets produced by the multiple filters.
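The intersection step described in this abstract can be illustrated with a minimal sketch. It assumes each filter has already produced a score per feature and that each filter keeps its top `k` features. The function names, the top-`k` rule, and the example scores are all hypothetical, since the abstract does not say how each filter's subset is chosen.

```python
def top_k(scores, k):
    """Return the set of the k highest-scoring feature names for one filter."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def intersect_filters(filter_scores, k):
    """Keep only the features that every filter places in its top k.

    filter_scores maps a filter name to a {feature: score} dict.
    The top-k cutoff is an assumption made for this sketch.
    """
    subsets = [top_k(scores, k) for scores in filter_scores.values()]
    return set.intersection(*subsets) if subsets else set()


# Hypothetical scores from three filters over four features.
example_scores = {
    "information_gain": {"area": 0.9, "contrast": 0.7, "entropy": 0.2, "mean": 0.1},
    "gini_index": {"area": 0.8, "contrast": 0.3, "entropy": 0.6, "mean": 0.1},
    "chi_square": {"area": 5.0, "contrast": 4.0, "entropy": 3.0, "mean": 0.5},
}
selected = intersect_filters(example_scores, k=2)
```

Because a feature must survive every filter, the selected set can only shrink as filters are added, which matches the abstract's remark that the intersection yields the smallest feature subset.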

    Effective Feature Selection Methods for User Sentiment Analysis using Machine Learning

    Text classification assigns a piece of text to one or more predetermined categories or labels. A machine learning model is trained on a labeled dataset of texts and their corresponding labels, and then learns to predict the labels of new, unseen texts. Feature selection is a significant step in text classification because it identifies the features or words in the text that are most useful for predicting the label, such as specific keywords or phrases, or the frequency or placement of certain words. Focusing on the most informative features can improve model performance. Feature selection also reduces the dimensionality of the dataset, making the model more efficient and easier to interpret. This paper presents a method for extracting aspect terms from product reviews. The proposed method, referred to as wRMR, uses the Gini index and information gain for feature selection, after which machine learning classifiers extract aspect terms from the reviews. The method is evaluated on a set of customer testimonials, and the findings indicate that it extracts aspect terms better than the traditional method. It is also compared with current state-of-the-art methods, and the comparison shows that the proposed method performs better. In general, the method provides a promising solution for aspect term extraction, and it can also be used for other natural language processing tasks.
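For readers unfamiliar with the two criteria named in this abstract, the sketch below computes information gain and Gini impurity decrease for a single discrete feature, such as the presence or absence of a term. It is not the wRMR method, whose details the abstract does not give. The function names and the toy data are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(feature, labels, impurity):
    """Impurity of the labels minus the weighted impurity after
    splitting on each distinct value of the feature.

    With impurity=entropy this is information gain; with
    impurity=gini it is the Gini impurity decrease.
    """
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / n * impurity(subset)
    return impurity(labels) - remainder


# Toy data: 1 means the term occurs in the review, 0 means it does not.
labels = ["pos", "pos", "neg", "neg"]
informative_term = [1, 1, 0, 0]
uninformative_term = [1, 0, 1, 0]

ig_informative = impurity_decrease(informative_term, labels, entropy)
gini_informative = impurity_decrease(informative_term, labels, gini)
ig_uninformative = impurity_decrease(uninformative_term, labels, entropy)
```

A term that perfectly separates the classes gets the maximum score under both criteria, while a term distributed evenly across classes scores zero, so ranking terms by either score and keeping the top ones is a simple filter-style selection.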

    Bibliometric of Feature Selection Using Optimization Techniques in Healthcare using Scopus and Web of Science Databases

    Feature selection is an important step in prediction and classification, particularly in data mining and in the medical field. It is concerned with choosing a subset of relevant features that can be used to build a model. Medical datasets are large, so effective optimization techniques are required to produce accurate results. Optimization algorithms play a critical role in medical data mining, particularly in identifying diseases, because they are effective at low computational cost and time. Classification algorithms also produce better outcomes when an objective function is built using the feature selection algorithm. The sole purpose of this analysis is to understand the reach and utility of optimization algorithms such as the Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and Ant Colony Optimization (ACO) in health care. The aim is to bring efficiency and optimization to the health care sector using the vast information already available in these fields. Using the datasets available for health care analysis, our focus is to extract the most important features with optimization techniques and to work with different algorithms to get the most optimized result. Precision depends largely on the usefulness of the features considered and on finding useful patterns in those features to characterize the main problem. The optimized algorithm finds the overall optimum with fewer function evaluations. The principal goal of this study is to optimize the feature selection technique to produce an efficient model that caters to various health issues. The bibliometric analysis in this paper uses the Scopus and Web of Science databases. It considers the important keywords, datasets, and significance of the research papers examined. It also gives details about publication types and sources, yearly publication trends, and significant countries from Scopus and Web of Science, and it captures co-appearing keywords, authors, and source titles through network diagrams. This paper can be useful to researchers who want to contribute to feature selection and optimization in healthcare. It observes that there is a lot of scope for research in this area. This kind of research will also be helpful for analyzing pandemic scenarios like COVID-19.

    Multimodal forecasting methodology applied to industrial process monitoring

    IEEE. Industrial process modelling is a key factor in enabling the future generation of industrial manufacturing plants. Accurate models of critical signals need to be designed in order to forecast process deviations. This work proposes a novel multimodal forecasting methodology based on adaptive dynamics packaging and codification of the process operation. First, a target signal is decomposed by means of Empirical Mode Decomposition in order to identify its characteristic intrinsic mode functions. Second, these dynamics are packaged according to their significance and modelling complexity. Third, the operating condition of the process, reflected by the available auxiliary signals, is codified by means of a Self-Organizing Map and presented to the modelling structure. The forecasting structure is supported by a set of ensemble ANFIS-based models, each focused on a different set of signal dynamics. The performance and effectiveness of the proposed method are validated experimentally with industrial data from a copper rod manufacturing plant and by comparison with classical approaches. The proposed method improves performance and generalization over classical single-model approaches. Peer reviewed. Postprint (author's final draft).

    Industrial time series modelling by means of the neo-fuzzy neuron

    Industrial process monitoring and modelling is a critical step towards achieving the paradigm of Zero Defect Manufacturing. The aim of this paper is to introduce the Neo-Fuzzy Neuron method for industrial time series modelling. Its open structure and input independence provide fast learning and convergence, while ensuring adequate accuracy and generalization in the modelled output. First, the auxiliary signals in the database are analyzed in order to find correlations with the target signal. Second, the Neo-Fuzzy Neuron is configured and trained by means of the auxiliary signal, past instants, and dynamics information of the target signal. The proposed method is validated with real data from a Spanish copper rod industrial plant, in which a critical signal of the copper refrigeration process is modelled. The results indicate the suitability of the Neo-Fuzzy Neuron method for industrial process modelling. Postprint (published version).
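As background for this abstract, here is a minimal sketch of a neo-fuzzy neuron in its usual textbook form: each input passes through its own bank of overlapping triangular membership functions, the output is the sum of the weighted memberships, and the weights are adjusted by a gradient step on the prediction error. The class name, the number of membership functions, the input range, the learning rate, and the toy target are all assumptions. The paper's actual configuration and input signals are not given in the abstract.

```python
class NeoFuzzyNeuron:
    """Additive model y = sum_i sum_j w[i][j] * mu_j(x_i) with
    evenly spaced triangular membership functions on [lo, hi]."""

    def __init__(self, n_inputs, n_mfs=5, lo=0.0, hi=1.0, lr=0.2):
        self.width = (hi - lo) / (n_mfs - 1)
        self.centers = [lo + j * self.width for j in range(n_mfs)]
        self.weights = [[0.0] * n_mfs for _ in range(n_inputs)]
        self.lr = lr  # small values keep the update stable for many inputs

    def memberships(self, x):
        # Clamp to the covered range, then evaluate each triangle.
        x = min(max(x, self.centers[0]), self.centers[-1])
        return [max(0.0, 1.0 - abs(x - c) / self.width) for c in self.centers]

    def predict(self, xs):
        return sum(
            w * m
            for ws, x in zip(self.weights, xs)
            for w, m in zip(ws, self.memberships(x))
        )

    def update(self, xs, target):
        """One gradient step on the squared error; returns the error
        measured before the step."""
        error = target - self.predict(xs)
        for ws, x in zip(self.weights, xs):
            for j, m in enumerate(self.memberships(x)):
                ws[j] += self.lr * error * m
        return error


# Toy usage: learn y = 0.5 * x1 + 0.3 * x2 on a grid over [0, 1] x [0, 1].
grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
nfn = NeoFuzzyNeuron(n_inputs=2, lr=0.5)
for _ in range(200):
    for x1, x2 in grid:
        nfn.update((x1, x2), 0.5 * x1 + 0.3 * x2)
max_error = max(abs(nfn.predict(p) - (0.5 * p[0] + 0.3 * p[1])) for p in grid)
```

Because each input has its own independent bank of weights, an update touches at most two weights per input, which is one reason this structure is usually described as fast to train.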

    Migrating Birds Optimization-Based Feature Selection for Text Classification

    This research introduces a novel approach, MBO-NB, that couples Migrating Birds Optimization (MBO) with Naive Bayes as an internal classifier to address feature selection in text classification problems with a large number of features. To keep computation efficient, we preprocess the raw data using the Information Gain algorithm, reducing the feature count from an average of 62221 to 2089. Our experiments demonstrate that MBO-NB is more effective at feature reduction than other existing techniques, with increased classification accuracy. The successful integration of Naive Bayes within MBO presents a well-rounded solution. In individual comparisons with Particle Swarm Optimization (PSO), MBO-NB consistently outperforms it by an average of 6.9% across four setups. This research offers valuable insights into enhancing feature selection methods, providing a scalable and effective solution for text classification.

    Enhancing feature selection with a novel hybrid approach incorporating genetic algorithms and swarm intelligence techniques

    Advances in computing and data storage are leading to rapid growth in large-scale datasets. Using all features increases temporal and spatial complexity and degrades performance. Feature selection is a fundamental stage in data preprocessing: it removes redundant and irrelevant features to minimize the number of features and improve classification accuracy. Numerous optimization algorithms have been employed to handle feature selection (FS) problems, and they outperform conventional FS techniques. However, no metaheuristic FS method outperforms the other optimization algorithms on many datasets. This motivated our study to combine the advantages of various optimization techniques into a powerful technique that outperforms other methods on many datasets from different domains. In this article, a novel combined method, GASI, is developed from swarm intelligence (SI) based feature selection techniques and genetic algorithms (GA); it uses a multi-objective fitness function to seek the optimal subset of features. To assess the proposed approach, seven datasets were collected from the UCI repository and used to test the new feature selection technique. The experimental results demonstrate that GASI outperforms many of the powerful SI-based feature selection techniques studied. GASI obtains a better average fitness value and improves classification performance.
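The abstract does not define GASI's multi-objective fitness function. A form widely used in metaheuristic feature selection, shown here purely as an assumed illustration, is a weighted sum of the classifier's error rate and the fraction of features kept, with lower values being better. The function name, the default weight `alpha`, and the example numbers are hypothetical.

```python
def weighted_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Weighted-sum fitness to minimize (an assumed form, not GASI's).

    alpha weights the classification error; (1 - alpha) weights the
    fraction of features selected, so smaller subsets score better
    when the error is equal.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)


# Two hypothetical candidate subsets with the same error rate:
# the smaller subset receives the lower (better) fitness.
fitness_small = weighted_fitness(0.10, n_selected=5, n_total=20)
fitness_large = weighted_fitness(0.10, n_selected=15, n_total=20)
```

With `alpha` close to 1, accuracy dominates and subset size mainly breaks ties, which is the usual reason for choosing a large `alpha` in this kind of formulation.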

    Attribute Selection Using Non-Binary Decision Trees (Seleção de atributos usando árvores de decisão não-binárias)

    Master's in Electronics and Computer Engineering. Public examination held on 22 May 2018. Machine learning, an area within artificial intelligence, has as its main objective the creation and development of methods and algorithms with abilities commonly associated with humans, such as the acquisition and discovery of new facts or knowledge. Compared with humans, the main advantages of implementing these methods are usually savings in time and money. To this end, there are several models and algorithms, such as decision trees, neural networks, and support vector machines, which perform tasks that can also differ, such as classification and attribute selection. To overcome limitations inherent to ID3 decision trees in handling continuous variables and in viability testing, this work developed and implemented an adaptation of the original algorithm that uses the same metrics but can be applied to datasets with continuous variables. This work presents a study of attribute selection and prediction/classification capacity applied to the monitoring of cutting tool conditions (tool wear) and to the classification of potential new clients for banking services (bank telemarketing), using ID3 decision trees with the ability to handle continuous variables. The results show that, for small datasets, this algorithm performs best in comparison with conventional decision trees, namely C4.5, CART, and Random Forest, with an improvement of 12.5% to 25%. For large datasets, although it has the lowest score, the difference is not significant (-2%). The developed algorithm stands out because it allows a detailed analysis, whereas C4.5 and CART allow only a general analysis. This is due to the way the algorithms handle and perform splits when working with continuous variables. The attribute selection performed by the adapted algorithm proved to be an asset, whether for later classification with the developed algorithm or with other reference machine learning algorithms. The results for the tool wear and bank telemarketing datasets show a reduction from 15 to 5 and from 19 to 15 attributes, respectively. The applicability of decision trees was demonstrated both in the monitoring of multisensor processes and in the classification of new clients with continuous variables. This approach also showed that decision trees can be used to select attributes in a simple and transparent way, even in the presence of noisy data.
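The abstract does not describe how the adapted ID3 splits continuous variables, and the thesis title refers to non-binary trees. As a point of comparison only, the sketch below shows the familiar binary-threshold approach (in the style of C4.5): try the midpoint between each pair of adjacent distinct values and keep the one with the highest information gain. The function names and the toy data are assumptions, and this is not claimed to be the thesis's algorithm.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, information_gain) for the best binary split
    of a continuous attribute, or (None, 0.0) if no split helps."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, gain)
    return best


# Toy data: a hypothetical sensor reading and a wear label.
readings = [1.0, 2.0, 3.0, 4.0]
wear = ["ok", "ok", "worn", "worn"]
threshold, gain = best_threshold(readings, wear)
```

Attributes whose best split yields little or no gain never get chosen as tree nodes, which is how a fitted tree can double as an attribute selector: the attributes that appear in the tree form the selected subset.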