
7 research outputs found

The influence of the keywords selection method on the effectiveness of the web pages classification using the boosting algorithm

Author: Czajkowski Krzysztof
Gąciarz Tomasz
Publication venue: Czasopismo Techniczne
Publication date: 07/05/2015

The paper concerns the process of web page analysis. Classification is based on analysis of both the structure and the content of pages. Various characteristics are taken into account, including, inter alia, structural, visual, text, web, and link features. The AdaBoost algorithm was applied during the construction of the classifiers. This paper focuses on the impact of keyword selection methods on the effectiveness of the classification process.

Portal Czasopism Naukowych (E-Journals)

Distributed learning automata-based scheme for classification using novel pursuit scheme

Author: Goodwin Morten
Yazidi Anis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Author's accepted manuscript. Available from 03/03/2021. This is a post-peer-review, pre-copyedit version of an article published in Applied Intelligence. The final authenticated version is available online at: http://dx.doi.org/10.1007/s10489-019-01627-w

Agder University Research Archive

An Ant Colony Optimization Based Feature Selection for Web Page Classification

Author: Esra Saraç
Selma Ayşe Özel
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2014

The increased popularity of the web has added a huge amount of information to it, and as a result of this explosive growth, automated web page classification systems are needed to improve search engines’ performance. Web pages have a large number of features, such as HTML/XML tags, URLs, hyperlinks, and text contents, that should be considered during an automated classification process. The aim of this study is to reduce the number of features used in order to improve the runtime and accuracy of web page classification. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k-nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using ACO for feature selection improves both the accuracy and the runtime performance of classification. We also showed that the proposed ACO-based algorithm can select better features than the well-known information gain and chi-square feature selection methods.

Çukurova University Institutional Repository

Crossref

Directory of Open Access Journals

PubMed Central

An Investigation Into a Hybrid Genetic Programming and Ant Colony Optimization Method for Credit Scoring

Author: Aliehyaei Rojin
Publication venue: CSU ePress
Publication date: 01/01/2012

This thesis proposes and investigates a new hybrid technique based on Genetic Programming (GP) and Ant Colony Optimization (ACO) for inducing data classification rules. The proposed hybrid approach aims to improve on the accuracy of data classification rules produced by the original GP technique, which uses randomly generated initial populations. The hybrid technique relies on ACO to produce the initial populations for GP. To evaluate and compare their effectiveness in producing good data classification rules, the GP, ACO, and hybrid techniques were implemented in the C programming language. The data classification rules were created and evaluated by running these implementations on two datasets for credit scoring problems, widely known as the Australian and German datasets, available from the Machine Learning Repository at the University of California, Irvine. The experimental results demonstrate that although all three techniques yield similar accuracy during testing, on average, the hybrid ACO-GP approach performs better than either GP or ACO during training.

Columbus State University

Web page classification with an ant colony algorithm

Author: A. Abraham
I.H. Witten
K.M. Hoe
M. Cutler
M. Dorigo
R.S. Parpinelli
S. Chakrabarti
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004

This paper utilizes Ant-Miner, the first ant colony algorithm for discovering classification rules, in the field of web content mining, and shows that it is more effective than C5.0 on two sets of BBC and Yahoo web pages used in our experiments. It also investigates the benefits and dangers of several linguistics-based text preprocessing techniques to reduce the large number of attributes associated with web content mining.

CiteSeerX

Crossref

Kent Academic Repository

Anomaly detection in unknown environments using wireless sensor networks

Author: Li YuanYuan
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/05/2010

This dissertation addresses the problem of distributed anomaly detection in Wireless Sensor Networks (WSNs). A challenge in designing such systems is that the sensor nodes are battery powered, often have different capabilities, and generally operate in dynamic environments. Programming such sensor nodes at a large scale can be a tedious job if the system is not carefully designed. Data modeling in distributed systems is important for determining the normal operation mode of the system. Being able to model the expected sensor signatures for typical operations greatly simplifies the human designer’s job by enabling the system to autonomously characterize the expected sensor data streams. This, in turn, allows the system to perform autonomous anomaly detection to recognize when unexpected sensor signals are detected. This type of distributed sensor modeling can be used in a wide variety of sensor networks, for tasks such as detecting the presence of intruders, detecting sensor failures, and so forth. The advantage of this approach is that the human designer does not have to characterize the anomalous signatures in advance. The contributions of this approach include: (1) providing a way for a WSN to autonomously model sensor data with no prior knowledge of the environment; (2) enabling a distributed system to detect anomalies in both sensor signals and temporal events online; (3) providing a way to automatically extract semantic labels from temporal sequences; (4) providing a way for WSNs to save communication power by transmitting compressed temporal sequences; (5) enabling the system to detect time-related anomalies without prior knowledge of abnormal events; and (6) providing a novel missing data estimation method that utilizes temporal and spatial information to replace missing values. The algorithms have been designed, developed, evaluated, and validated experimentally on synthesized data and in real-world sensor network applications.

University of Tennessee, Knoxville: Trace

Web Page Classification with an Ant Colony Algorithm

As the amount of information available on the internet grows, so does the need for more effective data analysis methods. This paper utilizes the Ant-Miner algorithm in the field of web content classification and shows that it is more effective than C5.0. It also investigates the benefits and dangers of methods to reduce the large number of attributes associated with web content mining, such as a naïve WordNet preprocessing stage.

CiteSeerX