15 research outputs found

    Naive Bayes vs. Decision Trees vs. Neural Networks in the Classification of Training Web Pages

    Web classification has been attempted through many different technologies. In this study we concentrate on the comparison of Neural Network (NN), Naïve Bayes (NB) and Decision Tree (DT) classifiers for the automatic analysis and classification of attribute data from training course web pages. We introduce an enhanced NB classifier and run the same data sample through the DT and NN classifiers to determine the success rate of our classifier in the training courses domain. This research shows that our enhanced NB classifier not only outperforms the traditional NB classifier, but also performs as well as, if not better than, some more popular rival techniques. This paper also shows that, overall, our NB classifier is the best choice for the training courses domain, achieving an impressive F-Measure value of over 97%, despite being trained with fewer samples than any of the classification systems we have encountered.
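To make the terms in the abstract above concrete, here is a minimal sketch of a traditional multinomial Naïve Bayes text classifier with add-one smoothing, plus the F-Measure computed from confusion counts. It is not the enhanced classifier the paper describes; the function names, toy documents and labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Count documents per class and word occurrences per class.

    docs is a list of (tokens, label) pairs (toy data, not from the paper).
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


toy_docs = [
    ("course training price date".split(), "training"),
    ("training course location enrol".split(), "training"),
    ("football match score".split(), "other"),
    ("recipe cooking dinner".split(), "other"),
]
model = train_nb(toy_docs)
label_a = predict_nb(model, ["course", "price"])
label_b = predict_nb(model, ["football", "score"])
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together.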

    Distributed Generation with Improved Remuneration

    There are currently efforts to implement the concept of smart grids throughout the electric sector. This will bring radical changes to the entire management of the sector, and the energy market is no exception. Virtual power players will therefore need to update their business models to incorporate the concepts that smart grids impose. This article proposes a method that aggregates distributed generation and consumers who belong to demand response programs. The phases of the proposed methodology are optimized scheduling, resource aggregation and classification of possible new resources, rescheduling, and remuneration. The focus is on the classification phase, and the main objective is to create rules, through a previously trained model, that classify new resources and help with the challenges that virtual power players may face. Five classification methods were tested and compared: neural networks, naïve Bayes classification, decision trees, the k-nearest neighbor method, and the support vector machine method. The present work was done and funded in the scope of the following projects: CONTEST Project (P2020-23575), and UID/EEA/00760/2013 funded by FEDER Funds through COMPETE program and by National Funds through FCT.

    Sentiment Analysis of Public Opinion Towards Tourism in Bangkalan Regency Using Naïve Bayes Method

    Sentiment analysis is a natural language processing (NLP) technique that uses text analysis to recognize and extract opinions in text. It converts unstructured information into more structured information, determines whether an object has a positive, negative, or neutral tendency, and supports decision making by tourism managers as a recommendation for developing tourist attractions. In this study, sentiment analysis was conducted on tourism reviews in Bangkalan using the Naïve Bayes method. This method is a machine learning algorithm that classifies text into concepts that are easy to understand and provides accurate results with high efficiency. It has been shown to give excellent results with a high level of accuracy, especially for large data, but it has a drawback: it is sensitive to feature selection. A feature selection process is therefore needed to improve classification efficiency by reducing the amount of data analyzed; this study uses the Information Gain feature selection method. Word weighting uses TF-IDF, and the data come from Google Maps reviews collected through web scraping, where visitors provide reviews and ratings of places they have visited. However, the large number of reviews can make them difficult for tourist attraction managers to manage. Labeling the sentiment class of the review data yielded 3649 reviews, with 2583 positive, 275 negative, and 457 neutral.
The test results show that applying Information Gain thresholds of 0.0001, 0.0003, and 0.0007 can improve the accuracy of the Naïve Bayes model. The best test, at threshold 0.0007, achieved an accuracy of 78.68%, precision of 80.44%, recall of 82.59%, and f1-score of 82.53%. These results show that Information Gain feature selection combined with the SMOTE technique performs fairly well in classifying public opinion sentiment data on tourism in Bangkalan Regency, and the visitor satisfaction sentiment suggests that tourism management is good.
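As an illustration of the feature selection step described above, the following sketch computes the Information Gain of a term from its presence or absence in labeled documents and keeps the terms that score above a threshold. The toy reviews, function names, and the rule "keep terms strictly above the threshold" are assumptions; the abstract does not state how the study applied its thresholds.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(docs, term):
    """Entropy reduction obtained by splitting docs on presence of term."""
    labels = [label for _, label in docs]
    with_term = [label for tokens, label in docs if term in tokens]
    without_term = [label for tokens, label in docs if term not in tokens]
    conditional = 0.0
    for subset in (with_term, without_term):
        if subset:  # skip empty splits to avoid dividing by zero
            conditional += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - conditional


def select_features(docs, threshold):
    """Keep terms whose Information Gain exceeds the threshold (assumed rule)."""
    vocab = {tok for tokens, _ in docs for tok in tokens}
    return {tok for tok in vocab if information_gain(docs, tok) > threshold}


# Hypothetical toy reviews, not data from the study.
toy_reviews = [
    ({"beautiful", "beach"}, "positive"),
    ({"beautiful", "view"}, "positive"),
    ({"dirty", "beach"}, "negative"),
    ({"dirty", "toilet"}, "negative"),
]
selected = select_features(toy_reviews, threshold=0.0007)
```

In this toy set "beach" appears equally often in both classes, so it carries no information about sentiment and is dropped, while "beautiful" and "dirty" separate the classes perfectly and are kept.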

    Mixture-Based Probabilistic Graphical Models for the Label Ranking Problem

    The goal of the Label Ranking (LR) problem is to learn preference models that predict the preferred ranking of class labels for a given unlabeled instance. Different well-known machine learning algorithms have been adapted to deal with the LR problem. In particular, fine-tuned instance-based algorithms (e.g., k-nearest neighbors) and model-based algorithms (e.g., decision trees) have performed remarkably well in tackling the LR problem. Probabilistic Graphical Models (PGMs, e.g., Bayesian networks) have not been considered to deal with this problem because of the difficulty of modeling permutations in that framework. In this paper, we propose a Hidden Naive Bayes classifier (HNB) to cope with the LR problem. By introducing a hidden variable, we can design a hybrid Bayesian network in which several types of distributions can be combined: multinomial for discrete variables, Gaussian for numerical variables, and Mallows for permutations. We consider two kinds of probabilistic models: one based on a Naive Bayes graphical structure (where only univariate probability distributions are estimated for each state of the hidden variable) and another where we allow interactions among the predictive attributes (using a multivariate Gaussian distribution for the parameter estimation). The experimental evaluation shows that our proposals are competitive with the state-of-the-art algorithms in both accuracy and CPU time requirements.
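The Mallows distribution mentioned above assigns a ranking a probability that decays exponentially with its distance from a central ranking. The sketch below assumes the Kendall tau distance (a common choice that the abstract does not confirm) and normalizes by brute force over all permutations, which is only practical for a handful of labels. Function names and the example labels are hypothetical.

```python
import math
from itertools import combinations, permutations


def kendall_tau(ranking, reference):
    """Number of label pairs ordered differently in the two rankings."""
    position = {label: i for i, label in enumerate(reference)}
    return sum(
        1
        for first, second in combinations(ranking, 2)
        if position[first] > position[second]
    )


def mallows_pmf(ranking, center, theta):
    """P(ranking) = exp(-theta * d(ranking, center)) / Z, brute-force Z."""
    normalizer = sum(
        math.exp(-theta * kendall_tau(perm, center))
        for perm in permutations(center)
    )
    return math.exp(-theta * kendall_tau(ranking, center)) / normalizer


center = ("a", "b", "c")
p_center = mallows_pmf(center, center, theta=1.0)
p_reversed = mallows_pmf(("c", "b", "a"), center, theta=1.0)
total_mass = sum(mallows_pmf(p, center, 1.0) for p in permutations(center))
```

With `theta = 0` every ranking is equally likely; larger values of `theta` concentrate the probability mass around the central ranking.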

    Predicting customer responses to direct marketing : a Bayesian approach

    Direct marketing problems, such as purchase frequency and time, sales profit, and brand choices, have been intensively reviewed in the marketing literature recently. However, modeling the customer response, which is an important issue in direct marketing research, remains a significant challenge. This thesis is an empirical study of predicting customer response to direct marketing and applies a Bayesian approach, including the Bayesian Binary Regression (BBR) and the Hierarchical Bayes (HB). Other classical methods, such as Logistic Regression and Latent Class Analysis (LCA), were applied for the purpose of comparison. The results of comparing the performance of all these techniques suggest that the Bayesian methods are more appropriate in predicting direct marketing customer responses. Specifically, when customers are analyzed as a whole group, the Bayesian Binary Regression (BBR) has greater predictive accuracy than Logistic Regression. When we consider customer heterogeneity, the Hierarchical Bayes (HB) models, which use demographic and geographic variables for clustering, do not match the performance of Latent Class Analysis (LCA). Further analyses indicate that when latent variables are used for clustering, the Hierarchical Bayes (HB) approach has the highest predictive accuracy.

    Automated retrieval and extraction of training course information from unstructured web pages

    Web Information Extraction (WIE) is the discipline dealing with the discovery, processing and extraction of specific pieces of information from semi-structured or unstructured web pages. The World Wide Web comprises billions of web pages and there is much need for systems that will locate, extract and integrate the acquired knowledge into organisations' practices. There are some commercial, automated web extraction software packages; however, their success comes from heavily involving their users in the process of finding the relevant web pages, preparing the system to recognise items of interest on these pages and manually dealing with the evaluation and storage of the extracted results. This research has explored WIE, specifically with regard to the automation of the extraction and validation of online training information. The work also includes research and development in the area of automated Web Information Retrieval (WIR), more specifically in Web Searching (or Crawling) and Web Classification. Different technologies were considered; however, after much consideration, Naïve Bayes Networks were chosen as the most suitable for the development of the classification system. The extraction part of the system used Genetic Programming (GP) for the generation of web extraction solutions. Specifically, GP was used to evolve Regular Expressions, which were then used to extract specific training course information from the web, such as course names, prices, dates and locations. The experimental results indicate that all three aspects of this research perform very well, with the Web Crawler outperforming existing crawling systems, the Web Classifier performing with an accuracy of over 95% and a precision of over 98%, and the Web Extractor achieving an accuracy of over 94% for the extraction of course titles and an accuracy of just under 67% for the extraction of other course attributes such as dates, prices and locations.
Furthermore, the overall work is of great significance to the sponsoring company, as it simplifies and improves the existing time-consuming, labour-intensive and error-prone manual techniques, as will be discussed in this thesis. The prototype developed in this research works in the background and requires very little, often no, human assistance.
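To show the kind of output the extraction stage above produces, here is a small sketch that applies regular expressions to pull prices and dates out of course text. The patterns are hand-written for illustration and are not the expressions evolved by Genetic Programming in the thesis; the pattern names, the function name, and the sample text are all hypothetical.

```python
import re

# Hand-written, illustrative patterns (assumed formats, not from the thesis).
PRICE_RE = re.compile(r"[£$€]\s?\d+(?:,\d{3})*(?:\.\d{2})?")
DATE_RE = re.compile(
    r"\b\d{1,2}\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\s\d{4}\b"
)


def extract_course_info(text):
    """Return every price and date found in a block of course text."""
    return {
        "prices": PRICE_RE.findall(text),
        "dates": DATE_RE.findall(text),
    }


sample = "Python Basics, London, 12 March 2024, £1,250.00"
info = extract_course_info(sample)
```

A GP-based system would search over candidate expressions like these automatically and score each one against labeled examples, instead of relying on a person to write them.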