12,308 research outputs found
Automated retrieval and extraction of training course information from unstructured web pages
Web Information Extraction (WIE) is the discipline dealing with the discovery, processing and extraction of specific pieces of information from semi-structured or unstructured web pages. The World Wide Web comprises billions of web pages and there is much need for systems that will locate, extract and integrate the acquired knowledge into organisations practices. There are some commercial, automated web extraction software packages, however their success comes from heavily involving their users in the process of finding the relevant web pages, preparing the system to recognise items of interest on these pages and manually dealing with the evaluation and storage of the extracted results.
This research has explored WIE, specifically with regard to the automation of the extraction and validation of online training information. The work also includes research and development in the area of automated Web Information Retrieval (WIR), more specifically in Web Searching (or Crawling) and Web Classification. Different technologies were considered, however after much consideration, Naïve Bayes Networks were chosen as the most suitable for the development of the classification system. The extraction part of the system used Genetic Programming (GP) for the generation of web extraction solutions. Specifically, GP was used to evolve Regular Expressions, which were then used to extract specific training course information from the web such as: course names, prices, dates and locations.
The experimental results indicate that all three aspects of this research perform very well, with the Web Crawler outperforming existing crawling systems, the Web Classifier performing with an accuracy of over 95% and a precision of over 98%, and the Web Extractor achieving an accuracy of over 94% for the extraction of course titles and an accuracy of just under 67% for the extraction of other course attributes such as dates, prices and locations. Furthermore, the overall work is of great significance to the sponsoring company, as it simplifies and improves the existing time-consuming, labour-intensive and error-prone manual techniques, as will be discussed in this thesis. The prototype developed in this research works in the background and requires very little, often no, human assistance
Genetic programming for the automatic design of controllers for a surface ship
In this paper, the implementation of genetic programming (GP) to design a contoller structure is assessed. GP is used to evolve control strategies that, given the current and desired state of the propulsion and heading dynamics of a supply ship as inputs, generate the command forces required to maneuver the ship. The controllers created using GP are evaluated through computer simulations and real maneuverability tests in a laboratory water basin facility. The robustness of each controller is analyzed through the simulation of environmental disturbances. In addition, GP runs in the presence of disturbances are carried out so that the different controllers obtained can be compared. The particular vessel used in this paper is a scale model of a supply ship called CyberShip II. The results obtained illustrate the benefits of using GP for the automatic design of propulsion and navigation controllers for surface ships
Feature selection for modular GA-based classification
Genetic algorithms (GAs) have been used as conventional methods for classifiers to adaptively evolve solutions for classification problems. Feature selection plays an important role in finding relevant features in classification. In this paper, feature selection is explored with modular GA-based classification. A new feature selection technique, Relative Importance Factor (RIF), is proposed to find less relevant features in the input domain of each class module. By removing these features, it is aimed to reduce the classification error and dimensionality of classification problems. Benchmark classification data sets are used to evaluate the proposed approach. The experiment results show that RIF can be used to find less relevant features and help achieve lower classification error with the feature space dimension reduced
Recommended from our members
An intelligent system for risk classification of stock investment projects
The proposed paper demonstrates that a hybrid fuzzy neural network can serve as a risk classifier of stock investment projects. The training algorithm for the regular part of the network is based on bidirectional incremental evolution proving more efficient than direct evolution. The approach is compared with other crisp and soft investment appraisal and trading techniques, while building a multimodel domain representation for an intelligent decision support system. Thus the advantages of each model are utilised while looking at the investment problem from different perspectives. The empirical results are based on UK companies traded on the London Stock Exchange
Limited Evaluation Cooperative Co-evolutionary Differential Evolution for Large-scale Neuroevolution
Many real-world control and classification tasks involve a large number of
features. When artificial neural networks (ANNs) are used for modeling these
tasks, the network architectures tend to be large. Neuroevolution is an
effective approach for optimizing ANNs; however, there are two bottlenecks that
make their application challenging in case of high-dimensional networks using
direct encoding. First, classic evolutionary algorithms tend not to scale well
for searching large parameter spaces; second, the network evaluation over a
large number of training instances is in general time-consuming. In this work,
we propose an approach called the Limited Evaluation Cooperative
Co-evolutionary Differential Evolution algorithm (LECCDE) to optimize
high-dimensional ANNs.
The proposed method aims to optimize the pre-synaptic weights of each
post-synaptic neuron in different subpopulations using a Cooperative
Co-evolutionary Differential Evolution algorithm, and employs a limited
evaluation scheme where fitness evaluation is performed on a relatively small
number of training instances based on fitness inheritance. We test LECCDE on
three datasets with various sizes, and our results show that cooperative
co-evolution significantly improves the test error comparing to standard
Differential Evolution, while the limited evaluation scheme facilitates a
significant reduction in computing time
- …