219 research outputs found

    Effective Feature Selection Methods for User Sentiment Analysis using Machine Learning

    Get PDF
    Text classification is the task of assigning a piece of text to one or more predetermined categories or labels. A machine learning model is trained on a labeled dataset of texts and their corresponding labels, and then learns to predict the labels of new, unseen texts. Feature selection is a significant step in text classification because it identifies the features or words in the text that are most useful for predicting the label, such as specific keywords or phrases, or the frequency or placement of certain words. Focusing on the most informative features improves the performance of the model, and feature selection also reduces the dimensionality of the dataset, making the model more efficient and easier to interpret. The research paper presents a method for extracting aspect terms from product reviews that combines feature selection based on the Gini index and information gain with machine learning classifiers. In the proposed method, referred to as wRMR, the Gini index and information gain are used for feature selection, and machine learning classifiers are then applied to extract aspect terms from the reviews. The proposed method is evaluated on a set of customer reviews, and the results indicate that it outperforms the traditional approach to aspect term extraction. It is also compared with current state-of-the-art methods and achieves superior performance. Overall, the proposed method offers a promising solution for aspect term extraction and can also be applied to other natural language processing tasks
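
    As a rough illustration of the kind of pipeline the abstract describes (not the authors' wRMR implementation), the sketch below ranks n-gram features by information gain, one of the two selection criteria named, before handing the reduced feature set to a machine learning classifier. The toy reviews, labels, and the value of k are assumptions made for the example.

```python
# Illustrative sketch only: information-gain-based feature selection
# in front of a classifier, not the paper's wRMR method.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

reviews = [
    "battery life is great and the screen is sharp",
    "terrible battery, the phone died within hours",
    "camera quality is excellent in daylight",
    "the camera struggles badly in low light",
]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative (assumed encoding)

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),      # unigram and bigram features
    SelectKBest(mutual_info_classif, k=10),   # keep the most informative terms
    LinearSVC(),                              # downstream classifier
)
model.fit(reviews, labels)
```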

    Simple but Effective Unsupervised Classification for Specified Domain Images: A Case Study on Fungi Images

    Full text link
    High-quality labeled datasets are essential for deep learning. Traditional manual annotation methods are not only costly and inefficient but also pose challenges in specialized domains where expert knowledge is needed. Self-supervised methods, despite leveraging unlabeled data for feature extraction, still require hundreds or thousands of labeled instances to guide the model for effective specialized image classification. Current unsupervised learning methods offer automatic classification without prior annotation but often compromise on accuracy. As a result, efficiently procuring high-quality labeled datasets remains a pressing challenge for specialized domain images devoid of annotated data. Addressing this, an unsupervised classification method with three key ideas is introduced: 1) dual-step feature dimensionality reduction using a pre-trained model and manifold learning, 2) a voting mechanism across multiple clustering algorithms, and 3) post-hoc instead of prior manual annotation. This approach outperforms supervised methods in classification accuracy, as demonstrated with fungal image data, achieving 94.1% and 96.7% on public and private datasets respectively. The proposed unsupervised classification method reduces dependency on pre-annotated datasets, enabling a closed loop for data classification. The simplicity and ease of use of this method also make it convenient for researchers in various fields to build datasets, promoting AI applications for images in specialized domains
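
    A minimal sketch of the three ideas under assumed components: features from any pre-trained backbone, Isomap standing in for the manifold-learning step, and a co-association ("voting") matrix built from several clustering algorithms. The authors' actual models, algorithms, and parameters are not specified here.

```python
# Hedged sketch of the dual-step reduction plus clustering-vote idea.
import numpy as np
from sklearn.manifold import Isomap
from sklearn.cluster import KMeans, AgglomerativeClustering, SpectralClustering

def consensus_cluster(deep_features, n_clusters=5):
    # Step 1b: manifold learning on top of pre-trained-model features
    embedded = Isomap(n_components=10).fit_transform(deep_features)

    # Step 2: several clustering algorithms "vote" via a co-association matrix
    labelings = [
        KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embedded),
        AgglomerativeClustering(n_clusters=n_clusters).fit_predict(embedded),
        SpectralClustering(n_clusters=n_clusters).fit_predict(embedded),
    ]
    n = len(embedded)
    co_assoc = np.zeros((n, n))
    for labels in labelings:
        co_assoc += (labels[:, None] == labels[None, :])
    co_assoc /= len(labelings)

    # Final grouping on the consensus matrix; groups are named post hoc
    return AgglomerativeClustering(
        n_clusters=n_clusters, metric="precomputed", linkage="average"
    ).fit_predict(1.0 - co_assoc)

# Placeholder features standing in for a pre-trained CNN's output
features = np.random.default_rng(0).normal(size=(60, 512))
print(consensus_cluster(features, n_clusters=3))
```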

    Harnessing Deep Learning Techniques for Text Clustering and Document Categorization

    Get PDF
    This research paper delves into the realm of deep text clustering algorithms with the aim of enhancing the accuracy of document classification. In recent years, the fusion of deep learning techniques and text clustering has shown promise in extracting meaningful patterns and representations from textual data. This paper provides an in-depth exploration of various deep text clustering methodologies, assessing their efficacy in improving document classification accuracy. Delving into the core of deep text clustering, the paper investigates various feature representation techniques, ranging from conventional word embeddings to contextual embeddings furnished by BERT and GPT models. By critically reviewing and comparing these algorithms, we shed light on their strengths, limitations, and potential applications. Through this comprehensive study, we offer insights into the evolving landscape of document analysis and classification, driven by the power of deep text clustering algorithms. Through an original synthesis of existing literature, this research serves as a beacon for researchers and practitioners in harnessing the prowess of deep learning to enhance the accuracy of document classification endeavors
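
    For concreteness, the following sketch shows the basic pattern the paper surveys: contextual BERT-family embeddings of documents followed by a clustering step. The encoder name, the example documents, and the cluster count are illustrative assumptions.

```python
# Minimal sketch: contextual embeddings + clustering for document grouping.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

documents = [
    "Quarterly earnings beat expectations.",
    "The new GPU doubles training throughput.",
    "Central bank raises interest rates.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # BERT-style encoder (assumed choice)
embeddings = encoder.encode(documents)              # dense contextual vectors

clusters = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)
print(list(zip(clusters, documents)))
```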

    Enhancing Intrusion Detection Systems with a Hybrid Deep Learning Model and Optimized Feature Composition

    Get PDF
    Intrusion detection systems (IDS) are essential for protecting network infrastructures from hostile activity. Advanced methods are required because traditional IDS techniques frequently fail to properly identify sophisticated and evolving attacks. In this article, we propose a novel method for improving IDS performance through the use of a hybrid deep learning model and feature composition optimization. The proposed hybrid deep learning model leverages the strengths of CNNs and RNNs to efficiently capture both spatial and temporal correlations in network traffic data. The model can extract useful features from unprocessed network packets using CNNs and RNNs, giving a thorough picture of network behaviour. To increase the IDS's discriminative ability, we also introduce feature optimization strategies. Through a methodical feature selection and engineering process, we uncover the most pertinent and informative features that support precise intrusion detection. To reduce the computational load and improve the model's efficiency without compromising detection accuracy, we also use dimensionality reduction approaches. We carried out extensive experiments using a benchmark dataset that is frequently utilized in intrusion detection research to assess the proposed approach. The results show that the hybrid deep learning model performs better than conventional IDS methods, obtaining noticeably higher detection rates and lower false positive rates. The performance of the model is further improved by the optimized feature composition, which offers a more accurate depiction of network traffic patterns
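
    A hedged sketch of a CNN+RNN hybrid of the kind described: a 1D convolution extracts local (spatial) patterns from each window of traffic features and an LSTM models the temporal sequence, ending in a softmax over traffic classes. The input shape, layer sizes, and class count are assumptions, not the paper's configuration.

```python
# Illustrative CNN+LSTM hybrid for intrusion detection (assumed architecture).
from tensorflow.keras import layers, models

def build_hybrid_ids(timesteps=10, n_features=78, n_classes=2):
    model = models.Sequential([
        layers.Input(shape=(timesteps, n_features)),
        layers.Conv1D(64, kernel_size=3, padding="same", activation="relu"),  # spatial patterns
        layers.MaxPooling1D(pool_size=2),
        layers.LSTM(64),                                 # temporal dependencies
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation="softmax"),   # benign vs. attack classes
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_hybrid_ids()
model.summary()
```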

    A Review of Feature Selection and Classification Approaches for Heart Disease Prediction

    Get PDF
    Cardiovascular disease has been the leading cause of death in the world for years. As information technology develops, many researchers have conducted studies on computer-assisted diagnosis for heart disease. Predicting heart disease with a computer-assisted system can reduce time and costs. Feature selection can be used to choose the variables most relevant to heart disease; approaches include filter, wrapper, embedded, and hybrid methods. The filter method excels in computation speed, the wrapper and embedded methods consider feature dependencies and interact with classifiers, and the hybrid method combines the advantages of several methods. Classification is a data mining technique to predict heart disease; approaches include traditional machine learning, ensemble learning, hybrid, and deep learning. Traditional machine learning uses a specific algorithm. Ensemble learning combines the predictions of multiple classifiers to improve the performance of a single classifier. The hybrid approach combines several techniques and takes advantage of each method. Deep learning does not require predetermined feature engineering. This research provides an overview of feature selection and classification methods for the prediction of heart disease over the last ten years, so that it can be used as a reference in choosing a method for heart disease prediction in future research
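
    The contrast between a filter method and a wrapper method can be made concrete with a small sketch; the synthetic feature matrix below stands in for a clinical heart-disease dataset, and the chosen selectors and classifier are illustrative assumptions.

```python
# Filter (chi-squared ranking) vs. wrapper (recursive feature elimination).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2, RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X = np.abs(X)  # chi2 requires non-negative features

# Filter: fast, classifier-independent ranking
filter_sel = SelectKBest(chi2, k=6).fit(X, y)

# Wrapper: interacts with a classifier while searching feature subsets
wrapper_sel = RFE(LogisticRegression(max_iter=1000), n_features_to_select=6).fit(X, y)

print("filter keeps:", np.flatnonzero(filter_sel.get_support()))
print("wrapper keeps:", np.flatnonzero(wrapper_sel.get_support()))
```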

    Face Emotion Recognition Based on Machine Learning: A Review

    Get PDF
    Computers can now detect, understand, and evaluate emotions thanks to recent developments in machine learning and information fusion. Researchers across various sectors are increasingly intrigued by emotion identification, utilizing facial expressions, words, body language, and posture as means of discerning an individual's emotions. Nevertheless, the effectiveness of the first three methods may be limited, as individuals can consciously or unconsciously suppress their true feelings. This article explores various feature extraction techniques, encompassing the development of machine learning classifiers such as k-nearest neighbour, naive Bayes, support vector machine, and random forest, in accordance with the established standard for emotion recognition. The paper has three primary objectives: firstly, to offer a comprehensive overview of affective computing by outlining essential theoretical concepts; secondly, to describe the current state of the art in emotion recognition in detail; and thirdly, to highlight key findings and conclusions from the literature, with an emphasis on major obstacles and possible future directions, especially in the creation of advanced machine learning algorithms for the identification of emotions
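
    As a concrete illustration of the classifier families the review covers, the sketch below cross-validates k-nearest neighbour, naive Bayes, SVM, and random forest models on placeholder feature vectors; a real system would use extracted facial features (for example landmarks or HOG descriptors) and emotion labels.

```python
# Comparing the four classifier families on synthetic stand-in features.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=68, n_classes=4,
                           n_informative=20, random_state=0)

for name, clf in [("kNN", KNeighborsClassifier()),
                  ("Naive Bayes", GaussianNB()),
                  ("SVM", SVC()),
                  ("Random Forest", RandomForestClassifier())]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```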

    Supporting feature-level software maintenance

    Get PDF
    Software maintenance is the process of modifying a software system to fix defects, improve performance, add new functionality, or adapt the system to a new environment. A maintenance task is often initiated by a bug report or a request for new functionality. Bug reports typically describe problems with incorrect behaviors or functionalities. These behaviors or functionalities are known as features. Even in very well-designed systems, the source code that implements features is often not completely modularized. The delocalized nature of features makes maintaining them challenging. Since maintenance tasks are expressed in terms of features, the goal of this dissertation is to support software maintenance at the feature level. We focus on two tasks in particular: feature location and impact analysis via feature coupling. Feature location is the process of identifying the source code that implements a feature, and it is an essential first step to any maintenance task. There are many existing techniques for feature location that incorporate various types of analyses such as static, dynamic, and textual. In this dissertation, we recognize the advantages of leveraging several types of analyses and introduce a new approach to feature location based on combining dynamic analysis, textual analysis, and web mining algorithms applied to software. The use of web mining for feature location is a novel contribution, and we show that our new techniques based on web mining are significantly more effective than the current state of the art. After using feature location to identify a feature's source code, maintenance can be completed on that feature. Impact analysis should then be performed to revalidate the system and determine which other features may have been affected by the modifications. We define three feature coupling metrics that capture the relationship between features based on structural information, textual information, and their combination. Our novel feature coupling metrics can be used for impact analysis to quantify the strength of coupling between pairs of features. We performed three empirical studies on open-source software systems to assess the feature coupling metrics and established three major results. First, there is a moderate to strong statistically significant correlation between feature coupling and faults. Second, feature coupling can be used to correctly determine about half of the other features that would be affected by a change to a given feature. Finally, we found that the metrics align with developers' opinions about pairs of features that are actually coupled
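
    As a simplified illustration only (not the dissertation's metric definitions), a textual coupling score between two features can be computed as the cosine similarity of TF-IDF vectors built from the source text of the methods implementing each feature:

```python
# Hypothetical textual feature-coupling score based on identifier text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def textual_feature_coupling(feature_a_snippets, feature_b_snippets):
    """Cosine similarity between the combined source text of two features."""
    docs = [" ".join(feature_a_snippets), " ".join(feature_b_snippets)]
    tfidf = TfidfVectorizer(token_pattern=r"[A-Za-z_]+").fit_transform(docs)
    return cosine_similarity(tfidf[0], tfidf[1])[0, 0]

# Made-up snippets standing in for the methods that implement each feature
print(textual_feature_coupling(
    ["void printReport(Report r)", "String formatReportHeader()"],
    ["void exportReportToPdf()", "Report buildReport()"],
))
```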

    A systematic review on artifact removal and classification techniques for enhanced MEG-based BCI systems

    Get PDF
    Victims of neurological disease may be completely paralyzed and unable to move, yet still able to think. Their brain activity is then the only means by which they can interact with their environment. Brain-Computer Interface (BCI) research attempts to create tools that support subjects with disabilities, and it has expanded rapidly over the past few decades as a result of the interest in creating a new kind of human-to-machine communication. As magnetoencephalography (MEG) has better spatial and temporal resolution than other approaches, it is used to measure brain activity non-invasively. The recorded signal includes signals related to brain activity as well as noise and artifacts from numerous sources. MEG can have a low signal-to-noise ratio because the magnetic fields generated by cortical activity are small compared to other artifacts and noise. By using the right techniques for noise and artifact detection and removal, the signal-to-noise ratio can be increased. This article analyses various methods for removing artifacts as well as classification strategies. Additionally, it offers a study of the influence of deep learning models on the BCI system. Finally, the various challenges in collecting and analyzing MEG signals, as well as possible study fields in MEG-based BCI, are examined
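
    One common artifact-removal strategy in this literature is ICA-based separation of ocular components, sketched below with MNE-Python. The file name is a placeholder, and the example assumes the recording includes an EOG channel for automatic component detection.

```python
# Hedged sketch: ICA-based removal of ocular artifacts from an MEG recording.
import mne

raw = mne.io.read_raw_fif("sample_meg_raw.fif", preload=True)  # placeholder file
raw.filter(l_freq=1.0, h_freq=40.0)          # band-pass to stabilise ICA

ica = mne.preprocessing.ICA(n_components=20, random_state=0)
ica.fit(raw)

# Flag components correlated with EOG (blink) activity, then remove them;
# assumes the recording contains an EOG channel.
eog_indices, _ = ica.find_bads_eog(raw)
ica.exclude = eog_indices
cleaned = ica.apply(raw.copy())
```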

    Automated extraction of seed morphological traits from images

    Get PDF
    The description of biological objects, such as seeds, mainly relies on manual measurements of a few characteristics and on visual classification of structures, both of which can be subjective, error prone and time-consuming. Image analysis tools offer means to address these shortcomings, but we currently lack a method capable of automatically handling seeds from different taxa with varying morphological attributes and obtaining interpretable results. Here, we provide a simple image acquisition and processing protocol and introduce Traitor, an open-source software tool available as a command-line interface (CLI), which automates the extraction of seed morphological traits from images. The workflow for trait extraction consists of scanning seeds against a high-contrast background, correcting image colours, and analysing the images with the software. Traitor is capable of processing hundreds of images of varied taxa simultaneously with just three commands, and without any need for training, manual fine-tuning or thresholding. The software automatically detects each object in the image and extracts size measurements, traditional morphometric descriptors widely used by scientists and practitioners, standardised shape coordinates, and colorimetric measurements. The method was tested on a dataset comprising 91,667 images of seeds from 1228 taxa. Traitor's extracted average length and width values closely matched the average manual measurements obtained from the same collection (concordance correlation coefficient of 0.98). Further, we used a large image dataset to demonstrate how Traitor's output can be used to obtain representative seed colours for taxa, determine the phylogenetic signal of seed colour, and build objective classification categories for shape with high levels of visual interpretability. Our approach increases productivity and allows for large-scale analyses that would otherwise be unfeasible. Traitor enables the acquisition of data that are readily comparable across different taxa, opening new avenues to explore the functional relevance of morphological traits and to advance new tools for seed identification
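
    The core segmentation-and-measurement step described (seeds on a high-contrast background, per-object size and colour) can be approximated with scikit-image as sketched below; this is not Traitor's own code, and the file name and area threshold are assumptions.

```python
# Illustrative per-seed size and colour extraction from a scanned image.
from skimage import io, color, filters, measure

image = io.imread("seed_scan.jpg")                 # placeholder scan filename
gray = color.rgb2gray(image)
mask = gray < filters.threshold_otsu(gray)         # assumes seeds darker than background

labels = measure.label(mask)
for region in measure.regionprops(labels, intensity_image=image):
    if region.area < 100:                          # skip dust and noise
        continue
    length = region.major_axis_length
    width = region.minor_axis_length
    mean_rgb = region.intensity_image[region.image].mean(axis=0)
    print(f"seed {region.label}: length={length:.1f}px, "
          f"width={width:.1f}px, mean RGB={mean_rgb}")
```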

    Data Mining in Internet of Things Systems: A Literature Review

    Get PDF
    The Internet of Things (IoT) and cloud technologies have been the main focus of recent research, allowing for the accumulation of a vast amount of data generated from this diverse environment. These data undoubtedly contain valuable knowledge, if it can be correctly discovered and correlated in an efficient manner. Data mining algorithms can be applied to the Internet of Things (IoT) to extract hidden information from the massive amounts of data generated by IoT devices, information that is thought to have high business value. In this paper, the most important data mining approaches, covering classification, clustering, association analysis, time series analysis, and outlier analysis, are reviewed, and a survey of recent work in this direction is included. Other significant challenges in the field are collecting, storing, and managing the large number of devices along with their associated features. This paper takes a deep look at data mining for IoT platforms, concentrating on real applications found in the literature
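
    As a small example of one of the mining tasks surveyed, outlier analysis on IoT telemetry can be sketched with an Isolation Forest; the simulated temperature readings and contamination rate below are stand-ins for real device data.

```python
# Outlier analysis on simulated IoT sensor readings with Isolation Forest.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
readings = rng.normal(loc=22.0, scale=0.5, size=(500, 1))   # temperature, degrees C
readings[::100] += 8.0                                       # injected faults

detector = IsolationForest(contamination=0.01, random_state=0).fit(readings)
flags = detector.predict(readings)          # -1 marks suspected outliers
print("anomalous samples:", np.flatnonzero(flags == -1))
```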