6 research outputs found
Feature subset selection problem on microarray data
Recent advance of technology gave birth to tools such as microarray chips. The use of microarray chips enabled the scientists to measure the amount of protein production from their genes in a cell, known as the gene expression data. The classification of cell samples by means of their gene expression data is a hot research area. The data used for the analysis is massive and therefore the features, i.e., the genes, must be reduced to a reasonable level due to the computational cost of experiments and the possibility of misleading irrelevant genes. Therefore, usually, the analysis based on the classification of cell samples includes a feature subset selection phase. This thesis aims to develop a tool that can be used during the feature subset selection phase of such analyses. Three novel algorithms are proposed for the gene selection problem based on basic association rule mining. The first algorithm starts with fuzzy partitioning of the gene expression data and discovers highly confident IF-THEN rules that enable the classification of sample tissues. The second algorithm search the possible IFTHEN rules based on a heuristic pruning approach which is based on the beam search algorithm. Finally, the third algorithm focuses on the hierarchical information carried through gene expressions by constructing decision trees based on different performance measures. We found satisfactory results in Leukemia Dataset. In addition, in colon cancer dataset, algorithm that is based on construction of decision trees showed good performance
Recommended from our members
Enhancing association rules algorithms for mining distributed databases. Integration of fast BitTable and multi-agent association rules mining in distributed medical databases for decision support.
Over the past few years, mining data located in heterogeneous and geographically distributed sites have been designated as one of the key important issues. Loading distributed data into centralized location for mining interesting rules is not a good approach. This is because it violates common issues such as data privacy and it imposes network overheads. The situation becomes worse when the network has limited bandwidth which is the case in most of the real time systems. This has prompted the need for intelligent data analysis to discover the hidden information in these huge amounts of distributed databases.
In this research, we present an incremental approach for building an efficient Multi-Agent based algorithm for mining real world databases in geographically distributed sites. First, we propose the Distributed Multi-Agent Association Rules algorithm (DMAAR) to minimize the all-to-all broadcasting between distributed sites. Analytical calculations show that DMAAR reduces the algorithm complexity and minimizes the message communication cost. The proposed Multi-Agent based algorithm complies with the Foundation for Intelligent Physical Agents (FIPA), which is considered as the global standards in communication between agents, thus, enabling the proposed algorithm agents to cooperate with other standard agents.
Second, the BitTable Multi-Agent Association Rules algorithm (BMAAR) is proposed. BMAAR includes an efficient BitTable data structure which helps in compressing the database thus can easily fit into the memory of the local sites. It also includes two BitWise AND/OR operations for quick candidate itemsets generation and support counting. Moreover, the algorithm includes three transaction trimming techniques to reduce the size of the mined data.
Third, we propose the Pruning Multi-Agent Association Rules algorithm (PMAAR) which includes three candidate itemsets pruning techniques for reducing the large number of generated candidate itemsets, consequently, reducing the total time for the mining process.
The proposed PMAAR algorithm has been compared with existing Association Rules algorithms against different benchmark datasets and has proved to have better performance and execution time. Moreover, PMAAR has been implemented on real world distributed medical databases obtained from more than one hospital in Egypt to discover the hidden Association Rules in patients¿ records to demonstrate the merits and capabilities of the proposed model further. Medical data was anonymously obtained without the patients¿ personal details. The analysis helped to identify the existence or the absence of the disease based on minimum number of effective examinations and tests. Thus, the proposed algorithm can help in providing accurate medical decisions based on cost effective treatments, improving the medical service for the patients, reducing the real time response for the health system and improving the quality of clinical decision making
Recommended from our members
Enhancing Fuzzy Associative Rule Mining Approaches for Improving Prediction Accuracy. Integration of Fuzzy Clustering, Apriori and Multiple Support Approaches to Develop an Associative Classification Rule Base
Building an accurate and reliable model for prediction for different application domains, is one of the most significant challenges in knowledge discovery and data mining. This thesis focuses on building and enhancing a generic predictive model for estimating a future value by extracting association rules (knowledge) from a quantitative database. This model is applied to several data sets obtained from different benchmark problems, and the results are evaluated through extensive experimental tests.
The thesis presents an incremental development process for the prediction model with three stages. Firstly, a Knowledge Discovery (KD) model is proposed by integrating Fuzzy C-Means (FCM) with Apriori approach to extract Fuzzy Association Rules (FARs) from a database for building a Knowledge Base (KB) to predict a future value. The KD model has been tested with two road-traffic data sets.
Secondly, the initial model has been further developed by including a diversification method in order to improve a reliable FARs to find out the best and representative rules. The resulting Diverse Fuzzy Rule Base (DFRB) maintains high quality and diverse FARs offering a more reliable and generic model. The model uses FCM to transform quantitative data into fuzzy ones, while a Multiple Support Apriori (MSapriori) algorithm is adapted to extract the FARs from fuzzy data. The correlation values for these FARs are calculated, and an efficient orientation for filtering FARs is performed as a post-processing method. The FARs diversity is maintained through the clustering of FARs, based on the concept of the sharing function technique used in multi-objectives optimization. The best and the most diverse FARs are obtained as the DFRB to utilise within the Fuzzy Inference System (FIS) for prediction.
The third stage of development proposes a hybrid prediction model called Fuzzy Associative Classification Rule Mining (FACRM) model. This model integrates the
ii
improved Gustafson-Kessel (G-K) algorithm, the proposed Fuzzy Associative Classification Rules (FACR) algorithm and the proposed diversification method. The improved G-K algorithm transforms quantitative data into fuzzy data, while the FACR generate significant rules (Fuzzy Classification Association Rules (FCARs)) by employing the improved multiple support threshold, associative classification and vertical scanning format approaches. These FCARs are then filtered by calculating the correlation value and the distance between them. The advantage of the proposed FACRM model is to build a generalized prediction model, able to deal with different application domains. The validation of the FACRM model is conducted using different benchmark data sets from the University of California, Irvine (UCI) of machine learning and KEEL (Knowledge Extraction based on Evolutionary Learning) repositories, and the results of the proposed FACRM are also compared with other existing prediction models. The experimental results show that the error rate and generalization performance of the proposed model is better in the majority of data sets with respect to the commonly used models.
A new method for feature selection entitled Weighting Feature Selection (WFS) is also proposed. The WFS method aims to improve the performance of FACRM model. The prediction performance is improved by minimizing the prediction error and reducing the number of generated rules. The prediction results of FACRM by employing WFS have been compared with that of FACRM and Stepwise Regression (SR) models for different data sets. The performance analysis and comparative study show that the proposed prediction model provides an effective approach that can be used within a decision support system.Applied Science University (ASU) of Jorda
Design for Customer - Methodik für nachhaltige Kundenlösungen unter Zuhilfenahme eines bedürfnisorientierten Leistungskonfigurators
Zunehmender Wettbewerbsdruck sowie die Internationalisierung der Märkte stellen Unternehmen vor neue Herausforderungen. Eine Möglichkeit sich diesen Herausforderungen zu stellen ist die Vermarktung von Lösungen als Kombination aus Produkten und Dienstleistungen. Die entwickelte Methodik stellt einen Ansatz zur Integration des Kunden in die Prozesse von Lösungsanbietern bereit. Neben der Differenzierung vom Wettbewerb fällt darunter auch die Steigerung von Umsatz und Gewinnmarge