4 research outputs found

    Naive Bayes vs. Decision Trees vs. Neural Networks in the Classification of Training Web Pages

    Web classification has been attempted through many different technologies. In this study we concentrate on the comparison of Neural Network (NN), Naïve Bayes (NB) and Decision Tree (DT) classifiers for the automatic analysis and classification of attribute data from training course web pages. We introduce an enhanced NB classifier and run the same data sample through the DT and NN classifiers to determine the success rate of our classifier in the training courses domain. This research shows that our enhanced NB classifier not only outperforms the traditional NB classifier, but also performs as well as, if not better than, some more popular rival techniques. This paper also shows that, overall, our NB classifier is the best choice for the training courses domain, achieving an impressive F-Measure value of over 97%, despite being trained with fewer samples than any of the classification systems we have encountered.
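
For orientation, the traditional NB baseline and the F-Measure mentioned in this abstract can be sketched as follows. This is a minimal multinomial Naïve Bayes with add-one smoothing, not the authors' enhanced classifier, whose details are not given here. The function names (`train_nb`, `predict_nb`, `f_measure`) and the toy pages are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Count documents per class and tokens per class.

    docs is a list of (tokens, label) pairs.
    """
    class_counts = Counter(label for _, label in docs)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log-posterior for the tokens."""
    class_counts, token_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_tokens = sum(token_counts[label].values())
        for tok in tokens:
            # Add-one (Laplace) smoothing so unseen tokens do not zero the product.
            count = token_counts[label][tok] + 1
            score += math.log(count / (total_tokens + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the balanced F-Measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy pages, already tokenised.
pages = [
    (["course", "training", "price", "date"], "training"),
    (["training", "course", "location", "book"], "training"),
    (["news", "football", "score"], "other"),
    (["weather", "news", "forecast"], "other"),
]
model = train_nb(pages)
label = predict_nb(model, ["course", "price", "location"])
```

With precision and recall computed on a held-out set, `f_measure(precision, recall)` gives the kind of figure quoted in the abstract.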

    A study of the design and analysis of feed-forward neural networks

    This thesis shows that a design and analysis system for feed-forward neural networks is desirable, and that the currently available techniques do not work. Methods are presented that solve the problem of analysis, showing that analysis is possible and desirable for classification networks. The biggest limitations are the size of the network and the fact that the analysis tools are only applicable to properly designed classification systems. A method of reducing the size of classification networks is presented, along with a design methodology for non-classification systems.

    Automated retrieval and extraction of training course information from unstructured web pages

    Web Information Extraction (WIE) is the discipline dealing with the discovery, processing and extraction of specific pieces of information from semi-structured or unstructured web pages. The World Wide Web comprises billions of web pages, and there is much need for systems that will locate, extract and integrate the acquired knowledge into organisations' practices. There are some commercial, automated web extraction software packages; however, their success comes from heavily involving their users in the process of finding the relevant web pages, preparing the system to recognise items of interest on these pages, and manually dealing with the evaluation and storage of the extracted results. This research has explored WIE, specifically with regard to the automation of the extraction and validation of online training information. The work also includes research and development in the area of automated Web Information Retrieval (WIR), more specifically in Web Searching (or Crawling) and Web Classification. Different technologies were considered; however, after much consideration, Naïve Bayes Networks were chosen as the most suitable for the development of the classification system. The extraction part of the system used Genetic Programming (GP) for the generation of web extraction solutions. Specifically, GP was used to evolve Regular Expressions, which were then used to extract specific training course information from the web, such as course names, prices, dates and locations. The experimental results indicate that all three aspects of this research perform very well, with the Web Crawler outperforming existing crawling systems, the Web Classifier performing with an accuracy of over 95% and a precision of over 98%, and the Web Extractor achieving an accuracy of over 94% for the extraction of course titles and an accuracy of just under 67% for the extraction of other course attributes such as dates, prices and locations.
Furthermore, the overall work is of great significance to the sponsoring company, as it simplifies and improves the existing time-consuming, labour-intensive and error-prone manual techniques, as will be discussed in this thesis. The prototype developed in this research works in the background and requires very little, often no, human assistance.
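
The idea of evolving Regular Expressions can be pictured through the scoring step that such a search needs: each candidate pattern is rated against labelled examples, and better-scoring candidates survive. The sketch below shows one plausible fitness function (mean F-measure of the extracted strings). The thesis's actual fitness function, GP operators and data are not described in this abstract, so the `fitness` function, the candidate patterns and the example strings are all assumptions made for illustration.

```python
import re


def fitness(pattern, examples):
    """Mean F-measure of a candidate pattern over (text, expected) pairs.

    Assumed scoring scheme for illustration; invalid patterns score 0.
    """
    try:
        compiled = re.compile(pattern)
    except re.error:
        return 0.0  # a malformed candidate gets the worst score
    if not examples:
        return 0.0
    total = 0.0
    for text, expected in examples:
        found = {m.group(0) for m in compiled.finditer(text)}
        wanted = set(expected)
        true_pos = len(found & wanted)
        if true_pos == 0:
            continue  # contributes 0 to the mean
        precision = true_pos / len(found)
        recall = true_pos / len(wanted)
        total += 2 * precision * recall / (precision + recall)
    return total / len(examples)


# Hypothetical labelled snippets: the target attribute is the course price.
examples = [
    ("Project Management Basics, London, 12/03/2010, £450", ["£450"]),
    ("Intro to Java - Leeds - 01/06/2010 - £1,200", ["£1,200"]),
]

# Hand-written stand-ins for patterns a GP run might produce.
candidates = [r"\d+", r"£\d+", r"£\d[\d,]*"]
scores = {c: fitness(c, examples) for c in candidates}
best = max(candidates, key=scores.get)
```

In a full GP loop, the candidates would be generated by mutation and crossover rather than written by hand; only the scoring step is shown here.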

    A case-based reasoning methodology to formulating polyurethanes

    Formulation of polyurethanes is a complex and poorly understood problem, as it has developed more as an art than a science. Only a few experts have mastered polyurethane (PU) formulation after years of experience, and the major raw material manufacturers largely hold such expertise. Understanding of PU formulation is at present insufficient to be developed from first principles. The first-principles approach requires time, a detailed understanding of the underlying principles that govern the formulation process (e.g. PU chemistry, kinetics) and a number of measurements of process conditions. Even in the simplest formulations, there are more than 20 variables, often interacting with each other in very intricate ways. In this doctoral thesis the use of the Case-Based Reasoning and Artificial Neural Network paradigms is proposed to support PU formulation tasks by providing a framework for the collection, structure, and representation of real formulating knowledge. The framework is also aimed at facilitating the sharing and deployment of solutions in a consistent and referable way, when appropriate, for future problem solving. Two basic problems in the development of a Case-Based Reasoning tool that uses past flexible PU foam formulation recipes, or cases, to solve new problems were studied. A PU case was divided into a problem description (i.e. the measured mechanical properties of the PU) and a solution description (i.e. the ingredients and their quantities to produce a PU). The problems investigated are the retrieval of former PU cases that are similar to a new problem description, and the adaptation of the retrieved case to meet the problem constraints. For retrieval, an alternative similarity measure based on the moment description of a case represented as a two-dimensional image was studied.
The retrieval using geometric, central and Legendre moments was also studied and compared with a standard nearest neighbour algorithm using nine different distance functions (e.g. Euclidean, Canberra and City Block, among others). It was concluded that when cases are represented as 2D images and matching is performed using moment functions, in a similar fashion to the approaches studied in image analysis in pattern recognition, low-order geometric and Legendre moments and central moments of any order retrieve the same case as the Euclidean distance does when used in a nearest neighbour algorithm. This means that the Euclidean distance acts as a low-order moment function that represents gross-level case features. Higher-order (moment order > 3) geometric and Legendre moments, while enabling finer details of an image to be represented, had no standard distance function counterpart. For the adaptation of retrieved cases, a feed-forward back-propagation artificial neural network was proposed to reduce the adaptation knowledge acquisition effort that has prevented the building of complete CBR systems, and to generate a mapping between changes in mechanical properties and formulation ingredients. The proposed network was trained with the differences between problem descriptions (i.e. mechanical properties of a pair of foams) as input patterns and the differences between solution descriptions (i.e. formulation ingredients) as the output patterns. A complete data set based on 34 initial formulations was used, and a network trained for 16,950 epochs with 1,102 training exemplars, produced from the case differences, gave only 4% error. However, further work with a data set consisting of a training set and a small validation set failed to generalise, returning a high percentage of errors. Further tests on different training/test splits of the data also failed to generalise. The conclusion reached is that the data as such has insufficient common structure to form any general conclusions.
Other evidence to suggest that the data does not contain generalisable structure includes the large number of hidden nodes necessary to achieve convergence on the complete data set.
EThOS - Electronic Theses Online Service, United Kingdom
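
The retrieval baseline and the adaptation training patterns described in this abstract can be sketched roughly as follows. The code shows nearest neighbour retrieval under three of the named distance functions, plus the construction of difference patterns from pairs of cases. The case values, the dictionary layout and the function names are hypothetical. Whether the thesis used ordered or unordered case pairs is not stated; ordered pairs are an assumption here. The moment-based similarity measures are not reproduced.

```python
import math
from itertools import permutations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def canberra(a, b):
    # Terms where both coordinates are zero are skipped to avoid 0/0.
    return sum(
        abs(x - y) / (abs(x) + abs(y))
        for x, y in zip(a, b)
        if abs(x) + abs(y) > 0
    )


def retrieve(case_base, query, distance=euclidean):
    """Return the stored case whose problem description is nearest to the query."""
    return min(case_base, key=lambda case: distance(case["problem"], query))


def difference_pairs(case_base):
    """Build (input, output) patterns from ordered pairs of cases.

    Input: difference between the two problem descriptions.
    Output: difference between the two solution descriptions.
    """
    patterns = []
    for a, b in permutations(case_base, 2):
        d_problem = [x - y for x, y in zip(a["problem"], b["problem"])]
        d_solution = [x - y for x, y in zip(a["solution"], b["solution"])]
        patterns.append((d_problem, d_solution))
    return patterns


# Hypothetical cases: problem = mechanical properties, solution = ingredient amounts.
case_base = [
    {"name": "A", "problem": [30.0, 120.0], "solution": [100.0, 3.5]},
    {"name": "B", "problem": [45.0, 150.0], "solution": [100.0, 4.2]},
    {"name": "C", "problem": [60.0, 200.0], "solution": [100.0, 5.0]},
]
query = [42.0, 155.0]

nearest = {
    fn.__name__: retrieve(case_base, query, fn)["name"]
    for fn in (euclidean, city_block, canberra)
}
patterns = difference_pairs(case_base)
```

The resulting `patterns` list is the kind of input/output set a feed-forward network could be trained on; the network itself is left out of this sketch.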