682 research outputs found

    A review of associative classification mining

    Associative classification mining is a promising approach in data mining that uses association rule discovery techniques to construct classification systems, also known as associative classifiers. In the last few years, a number of associative classification algorithms have been proposed, e.g. CPAR, CMAR, MCAR, MMAC and others. These algorithms employ several different rule discovery, rule ranking, rule pruning, rule prediction and rule evaluation methods. This paper surveys and compares the state-of-the-art associative classification techniques with regard to the above criteria. Finally, future directions in associative classification, such as incremental learning and mining low-quality data sets, are also highlighted.
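    To make the terms in this abstract concrete, the following is a minimal sketch of three of the stages it names (rule discovery, rule ranking and rule prediction). It is not CPAR, CMAR, MCAR or MMAC. It assumes a brute-force enumeration of small itemsets, a ranking by confidence, then support, then rule length, and a first-match prediction. The toy data, the thresholds and the names mine_rules and predict are made up for illustration, and rule pruning is left out.

```python
from collections import Counter
from itertools import combinations

# Toy training data (invented for illustration): (attribute=value items, class label).
ROWS = [
    ({"outlook=sunny", "windy=no"}, "play"),
    ({"outlook=sunny", "windy=yes"}, "stay"),
    ({"outlook=rainy", "windy=yes"}, "stay"),
    ({"outlook=rainy", "windy=no"}, "play"),
    ({"outlook=sunny", "windy=no"}, "play"),
]


def mine_rules(rows, min_support=0.3, min_confidence=0.6, max_len=2):
    """Rule discovery and ranking for rules of the form itemset -> class."""
    n = len(rows)
    body_counts = Counter()  # how often each itemset occurs
    rule_counts = Counter()  # how often each itemset occurs together with a class
    for items, label in rows:
        for k in range(1, max_len + 1):
            for body in combinations(sorted(items), k):
                body_counts[body] += 1
                rule_counts[(body, label)] += 1

    rules = []
    for (body, label), count in rule_counts.items():
        support = count / n
        confidence = count / body_counts[body]
        if support >= min_support and confidence >= min_confidence:
            rules.append((body, label, support, confidence))

    # Assumed ranking: higher confidence, then higher support, then shorter body.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0])))
    return rules


def predict(rules, items, default):
    """Rule prediction: class of the highest-ranked rule whose body matches."""
    for body, label, _support, _confidence in rules:
        if set(body) <= items:
            return label
    return default


rules = mine_rules(ROWS)
default_class = Counter(label for _, label in ROWS).most_common(1)[0][0]
prediction = predict(rules, {"outlook=rainy", "windy=no"}, default_class)
print(rules[0], prediction)
```

    The enumeration above is exponential in max_len. Methods such as Apriori or FP-Growth exist precisely to avoid generating every candidate itemset, so this sketch is only suitable for very small examples.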

    Associative classifier coupled with unsupervised feature reduction for dengue fever classification using gene expression data

    Recent studies have established the potential of classifiers designed using association rule mining methods. The current study proposes such an associative classifier to efficiently detect dengue fever using gene expression data. Labelled gene expression data has been preprocessed and discretized to mine association rules using well-established rule mining methods. Thereafter, unsupervised clustering methods have been applied to the discretized gene expression data to reduce and select the most promising features. The final feature-reduced, discretized gene expression data is subsequently used to mine rules in order to classify subjects as 'Dengue Fever' or 'Healthy'. Two well-known association rule mining methods, viz., Apriori and FP-Growth, have been used here along with various well-established clustering methods. Extensive analysis has been reported, with performance measured in terms of accuracy, precision, recall and false positive rate using 5-fold cross-validation. Furthermore, a separate investigation has been conducted to find the most suitable number of features and the most suitable confidence threshold for the association rule mining methods. The experimental results indicate accurate detection of dengue fever patients at an early stage using the proposed associative classification method.

    Evolutionary Decomposition of Complex Design Spaces

    This dissertation investigates the support of conceptual engineering design through the decomposition of multi-dimensional search spaces into regions of high performance. Such decomposition helps the designer identify optimal design directions by eliminating infeasible or undesirable regions within the search space. Moreover, high levels of interaction between the designer and the model increase overall domain knowledge and significantly reduce uncertainty relating to the design task at hand. The aim of the research is to develop the archetypal Cluster Oriented Genetic Algorithm (COGA), which achieves search space decomposition by using variable mutation (vmCOGA) to promote diverse search and an Adaptive Filter (AF) to extract solutions of high performance [Parmee 1996a, 1996b]. Since COGAs are primarily used to decompose design domains of unknown nature within a real-time environment, the elimination of a priori knowledge, speed and robustness are paramount. Furthermore, COGA should promote the in-depth exploration of the entire search space, sampling all optima and the surrounding areas. Finally, any proposed system should allow for trouble-free integration within a Graphical User Interface environment. The replacement of the variable mutation strategy with a number of algorithms that increase search space sampling is investigated. Utility is then increased by incorporating a control mechanism that maintains optimal performance by adapting each algorithm throughout the search by means of a feedback measure based upon population convergence. Robustness is greatly improved by modifying the Adaptive Filter through the introduction of a process that ensures more accurate modelling of the evolving population. The performance of each prospective algorithm is assessed upon a suite of two-dimensional test functions using a set of novel performance metrics. A six-dimensional test function is also developed where the areas of high performance are explicitly known, thus allowing for evaluation under conditions of increased dimensionality. Further complexity is introduced by two real-world models described by both continuous and discrete parameters. These relate to the design of conceptual airframes and cooling hole geometries within a gas turbine. Results are promising and indicate significant improvement over the vmCOGA in terms of all desired criteria. This further supports the utilisation of COGA as a decision support tool during the conceptual phase of design.
    British Aerospace plc, Warton and Rolls Royce plc, Filto

    Annales Mathematicae et Informaticae (48.)


    Machine Learning Techniques for Screening and Diagnosis of Diabetes: a Survey

    Diabetes has become one of the major causes of disease and death in most countries. By 2015, diabetes had affected more than 415 million people worldwide. According to the International Diabetes Federation report, this figure is expected to rise to more than 642 million in 2040, so early screening and diagnosis are of great significance for detecting and treating diabetes in time. Diabetes is a multifactorial metabolic disease, and its diagnostic criteria can hardly cover all of its etiology, degree of damage, pathogenesis and other factors, so several aspects of the medical diagnosis process involve uncertainty and imprecision. With the development of data mining, researchers have found that machine learning plays an increasingly important role in diabetes research. Machine learning techniques can identify the risk factors of diabetes and reasonable thresholds for physiological parameters, unearthing hidden knowledge from huge amounts of diabetes-related data, which is of great significance for the diagnosis and treatment of diabetes. This paper therefore provides a survey of machine learning techniques that have been applied to diabetes screening and diagnosis. Conventional machine learning techniques for the early screening and diagnosis of diabetes are described, and deep learning techniques of biomedical significance are also described.

    Dimensionality reduction methods for microarray cancer data using prior knowledge

    Microarray studies are currently a very popular source of biological information. They allow the simultaneous measurement of hundreds of thousands of genes, drastically increasing the amount of data that can be gathered in a small amount of time and also decreasing the cost of producing such results. Large numbers of high-dimensional data sets are currently being generated and there is an ongoing need to find ways to analyse them to obtain meaningful interpretations. Many microarray experiments are concerned with answering specific biological or medical questions regarding diseases and treatments. Cancer is one of the most popular research areas and there is a plethora of data available requiring in-depth analysis. Although the analysis of microarray data has been thoroughly researched over the past ten years, new approaches still appear regularly, and may lead to a better understanding of the available information. The size of modern data sets presents considerable difficulties to traditional methodologies based on hypothesis testing, and there is a new move towards the use of machine learning in microarray data analysis. Two new methods of using prior genetic knowledge in machine learning algorithms have been developed and their results are compared with existing methods. The prior knowledge consists of biological pathway data that can be found in on-line databases, and gene ontology (GO) terms. The first method, called "a priori manifold learning", uses the prior knowledge when constructing a manifold for non-linear feature extraction. It was found to perform better than both linear principal components analysis (PCA) and the non-linear Isomap algorithm (without prior knowledge) in both classification accuracy and quality of the clusters. Both pathway and GO terms were used as prior knowledge, and results showed that using GO terms can make the models over-fit the data. In the cases where the use of GO terms does not over-fit, the results are better than PCA, Isomap and a priori manifold learning using pathways. The second method, called "the feature selection over pathway segmentation algorithm", uses the pathway information to split a big data set into smaller ones. Then, using AdaBoost, decision trees are constructed for each of the smaller sets and the sets that achieve higher classification accuracy are identified. The individual genes in these subsets are assessed to determine their role in the classification process. Using data sets concerning chronic myeloid leukaemia (CML), two subsets based on pathways were found to be strongly associated with the response to treatment. Using a different data set from measurements on lower grade glioma (LGG) tumours, four informative gene sets were discovered. Further analysis based on the Gini importance measure identified a set of genes for each cancer type (CML, LGG) that could predict the response to treatment very accurately (> 90%). Moreover, a single gene that can predict the response to CML treatment accurately was identified.