7 research outputs found

    The influence of the keywords selection method on the effectiveness of the web pages classification using the boosting algorithm

    The paper concerns the process of analysing web pages. Classification is performed based on an analysis of both the structure and the content of pages. Various characteristics are taken into account, including, inter alia, structural, visual, text, web and link features. The AdaBoost algorithm was applied to construct the classifiers. This paper focuses on the impact of keyword selection methods on the effectiveness of the classification process.
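    As a rough illustration of how boosting combines weak classifiers over keyword features, here is a minimal sketch of discrete AdaBoost with one-keyword decision stumps. The toy data, the function names `train_adaboost` and `predict`, and the choice of stumps as weak learners are assumptions made for illustration; the abstract does not describe the paper's actual features or weak learners.

```python
import math

def train_adaboost(rows, labels, rounds):
    """Discrete AdaBoost over one-feature decision stumps; labels are +1 or -1."""
    n = len(rows)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, feature index, polarity)
    for _ in range(rounds):
        best = None
        for j in range(len(rows[0])):
            for polarity in (1, -1):
                # Stump predicts `polarity` when keyword j is present, else -polarity.
                preds = [polarity if r[j] else -polarity for r in rows]
                err = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
                if best is None or err < best[0]:
                    best = (err, j, polarity, preds)
        err, j, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, polarity))
        # Raise the weight of misclassified pages, lower the rest, then renormalise.
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, labels, preds)]
        norm = sum(weights)
        weights = [w / norm for w in weights]
    return ensemble

def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * (pol if row[j] else -pol) for alpha, j, pol in ensemble)
    return 1 if score >= 0 else -1

# Made-up data: each row marks presence (1) or absence (0) of three keywords.
rows = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 0]]
labels = [1, 1, -1, -1]
model = train_adaboost(rows, labels, rounds=3)
```

    In a setting like the one the abstract describes, the keyword selection method would decide which columns appear in `rows` before boosting starts.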

    Distributed learning automata-based scheme for classification using novel pursuit scheme

    Author's accepted manuscript. Available from 03/03/2021. This is a post-peer-review, pre-copyedit version of an article published in Applied Intelligence. The final authenticated version is available online at: http://dx.doi.org/10.1007/s10489-019-01627-w.

    An Ant Colony Optimization Based Feature Selection for Web Page Classification

    The increased popularity of the web has added a huge amount of information to it, and as a result of this explosive growth, automated web page classification systems are needed to improve search engines’ performance. Web pages have a large number of features, such as HTML/XML tags, URLs, hyperlinks, and text contents, that should be considered during an automated classification process. The aim of this study is to reduce the number of features used, in order to improve the runtime and accuracy of web page classification. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k-nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using ACO for feature selection improves both the accuracy and the runtime performance of classification. We also showed that the proposed ACO-based algorithm can select better features than the well-known information gain and chi-square feature selection methods.
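    For context on the baseline mentioned above, the following is a minimal sketch of information gain feature ranking on binary term features. The toy rows, the class names, and the helper names (`entropy`, `information_gain`, `top_k_features`) are illustrative assumptions and are not taken from the study; it does not implement the ACO method itself.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting the labels on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def top_k_features(rows, labels, k):
    """Rank feature columns by information gain and return the k best indices."""
    n_features = len(rows[0])
    scores = [information_gain([r[j] for r in rows], labels)
              for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]

# Made-up data: each row marks presence (1) or absence (0) of three terms.
rows = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
labels = ["course", "course", "faculty", "faculty"]
best = top_k_features(rows, labels, k=1)
```

    Here only the first term separates the two classes, so it is the one selected. A filter like this scores each feature independently, whereas a search-based selector such as ACO evaluates subsets of features.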

    An Investigation Into a Hybrid Genetic Programming and Ant Colony Optimization Method for Credit Scoring

    This thesis proposes and investigates a new hybrid technique based on Genetic Programming (GP) and Ant Colony Optimization (ACO) for inducing data classification rules. The proposed hybrid approach aims to improve on the accuracy of data classification rules produced by the original GP technique, which uses randomly generated initial populations. The hybrid technique relies on ACO to produce the initial populations for GP. To evaluate and compare their effectiveness in producing good data classification rules, the GP, ACO, and hybrid techniques were implemented in the C programming language. The data classification rules were created and evaluated by running these implementations on two datasets for credit scoring problems, widely known as the Australian and German datasets, available from the Machine Learning Repository at the University of California, Irvine. The experimental results demonstrate that although all three techniques yield similar accuracy during testing, on average, the hybrid ACO-GP approach performs better than either GP or ACO during training.

    Web page classification with an ant colony algorithm

    This paper utilizes Ant-Miner - the first Ant Colony algorithm for discovering classification rules - in the field of web content mining, and shows that it is more effective than C5.0 on two sets of BBC and Yahoo web pages used in our experiments. It also investigates the benefits and dangers of several linguistics-based text preprocessing techniques to reduce the large number of attributes associated with web content mining.

    Anomaly detection in unknown environments using wireless sensor networks

    This dissertation addresses the problem of distributed anomaly detection in Wireless Sensor Networks (WSNs). A challenge in designing such systems is that the sensor nodes are battery powered, often have different capabilities, and generally operate in dynamic environments. Programming such sensor nodes at a large scale can be a tedious job if the system is not carefully designed. Data modeling in distributed systems is important for determining the normal operation mode of the system. Being able to model the expected sensor signatures for typical operations greatly simplifies the human designer’s job by enabling the system to autonomously characterize the expected sensor data streams. This, in turn, allows the system to perform autonomous anomaly detection to recognize when unexpected sensor signals are detected. This type of distributed sensor modeling can be used in a wide variety of sensor networks, for tasks such as detecting the presence of intruders, detecting sensor failures, and so forth. The advantage of this approach is that the human designer does not have to characterize the anomalous signatures in advance. The contributions of this approach include: (1) providing a way for a WSN to autonomously model sensor data with no prior knowledge of the environment; (2) enabling a distributed system to detect anomalies in both sensor signals and temporal events online; (3) providing a way to automatically extract semantic labels from temporal sequences; (4) providing a way for WSNs to save communication power by transmitting compressed temporal sequences; (5) enabling the system to detect time-related anomalies without prior knowledge of abnormal events; and (6) providing a novel missing data estimation method that utilizes temporal and spatial information to replace missing values. The algorithms have been designed, developed, evaluated, and validated experimentally on synthesized data and in real-world sensor network applications.

    Web Page Classification with an Ant Colony Algorithm

    As the amount of information available on the internet grows, so does the need for more effective data analysis methods. This paper utilizes the Ant Miner algorithm in the field of web content classification and shows that it is more effective than C5.0. It also investigates the benefits and dangers of methods to reduce the large number of attributes associated with web content mining, such as a naïve WordNet preprocessing stage.