20 research outputs found

    A hybrid decision tree/genetic algorithm method for data mining


    Optimization techniques to detect early ventilation extubation in intensive care units

    Decision support models in intensive care units are developed to support medical staff in their decision-making process. However, optimizing these models is particularly difficult because of their dynamic, complex and multidisciplinary nature. Thus, there is constant research and development of new algorithms capable of extracting knowledge from large volumes of data in order to obtain better predictive results than current algorithms. To test the optimization techniques, a case study with real data provided by the INTCare project was explored. These data concern extubation cases. On this dataset, several models, such as Evolutionary Fuzzy Rule Learning, Lazy Learning, Decision Trees and many others, were analysed in order to detect early extubation. The hybrid Decision Trees Genetic Algorithm, Supervised Classifier System and KNNAdaptive achieved the highest accuracy rates (93.2%, 93.1% and 92.97%, respectively), thus showing their feasibility for use in a real environment. This work has been supported by FCT-Fundação para a Ciência e Tecnologia within the Project Scope UID/CEC/00319/2013. The authors would like to thank FCT for the financial support through the contract PTDC/EEI - SII/1302/2012 (INTCare II).

    Have the Major U.S. Air Carriers Finally Turned the Corner? A Financial Condition Assessment

    Rare prior to the deregulation of the airline industry, air carrier bankruptcies became rather endemic in the period 1982-2005. Since 1982, over 175 airlines have filed under the bankruptcy codes. This number includes eight of the carriers that were formerly referred to as “trunk carriers,” now known as “Majors.” Major carriers are defined as those with annual revenues exceeding $1.0 billion. The purpose of this paper is to analyze the recent performance of these carriers using a statistical model specifically designed to predict the likelihood of financial stress for airlines. The paper will also update past research in this important industry to demonstrate the very precarious nature of profitability. The major reasons for the improvement of the industry’s profitability will be briefly discussed. The analysis will show that the current financial condition of the industry has improved significantly due to increased concentration and the market domination of some carriers, very low fuel costs facing the carriers, and the record low interest rates resulting from the Federal Reserve’s easy monetary policy. However, the industry may still be fragile or vulnerable to changes in these input factors.

    Data analytics in SDN and NFV: Techniques and Challenges

    Software defined networking (SDN) and network function virtualization (NFV) are drawing huge attention from researchers in both industry and academia. NFV reduces an organization's capital and operational expenditure by decoupling network functions from the physical hardware on which they run, which poses new network management challenges such as data management, resource management and performance analysis. Consequently, novel techniques and strategies are required to address these challenges efficiently. This paper discusses the most widely used data analytics techniques, such as machine learning and time series data analysis. Further, it reviews data mining tools and frameworks. Machine learning helps to overcome the challenges of network management by providing intelligence in the network. Hence, in this paper we give an overview of the high-level architecture of a machine learning analysis framework, the challenges of applying machine learning algorithms in virtual environments, and some of the interesting network management problems that can be solved using machine learning.

    Understanding Black Boxes: Knowledge Induction From Models

    Due to regulations and laws prohibiting the use of private data on customers and their transactions in customer databases, most customer data sets are not easily accessible, even within the same organization. A solution to these regulatory problems can be to provide statistical summaries of the data, or models induced from the data, instead of the raw data sets. The models, however, carry limited information about the original raw data set. This study explores possible solutions to these problems. The study uses prediction models built from data on credit information of customers provided by a local bank in Seoul, S. Korea. It suggests approaches for figuring out what is inside non-rule-based models such as regression models or neural network models. The study proposes several algorithms, such as a rule accumulation algorithm (RAA) and a GA-based rule refinement algorithm (GA-RRA), as possible solutions to the problems. The experiments show the performance of the random dataset, RAA, elimination of redundant rules (ERR), and GA-RRA.

    Application of data mining techniques in bioinformatics

    With the widespread use of databases and the explosive growth in their sizes, there is a need to utilize these massive volumes of data effectively. This is where data mining comes in handy, as it scours databases to extract hidden patterns, find hidden information, and support decision making and hypothesis testing. Bioinformatics, an emerging field that involves the use of large databases, can be searched effectively with data mining techniques to derive useful rules. Based on the type of knowledge that is mined, data mining techniques [1] can be mainly classified into association rules, decision trees and clustering. Until recently, biology lacked the tools to analyze massive repositories of information such as the human genome database [3]. Data mining techniques are used effectively to extract meaningful relationships from these data. Data mining is especially used in microarray analysis, which studies the activity of different cells under different conditions. Two algorithms under each mining technique were implemented for a large database and compared with each other: (1) association rule mining: (a) Apriori, (b) partition; (2) clustering: (a) k-means, (b) k-medoids; (3) classification rule mining: decision tree generation using (a) the Gini index, (b) the entropy value. Genetic algorithms were applied to the association and classification techniques. Further, the k-means and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) clustering techniques [1] were applied to a microarray dataset and compared. The microarray dataset was downloaded from the internet using the Gene Array Analyzer Software (GAAS). The clustering was done on the basis of the signal color intensity of the genes in the microarray experiment. The following results were obtained: (1) Association: for smaller databases, the Apriori algorithm works better than the partition algorithm, and for larger databases partition works better. (2) Clustering: with respect to the number of interchanges, the k-medoids algorithm works better than the k-means algorithm. (3) Classification: the results were similar for both indices (Gini index and entropy value). The application of genetic algorithms improved the efficiency of the association and classification techniques. For the microarray dataset, it was found that DBSCAN is less efficient than k-means when the database is small, but for larger databases DBSCAN is more accurate and efficient in terms of number of clusters and execution time. DBSCAN execution time increases linearly with database size and was much lower than that of k-means for larger databases. Owing to the involvement of large datasets and the need to derive results from them, data mining techniques can be put to effective use in the field of bioinformatics [2]. The techniques can be applied to find associations among genes, cluster similar gene and protein sequences, and draw decision trees to classify genes. Further, data mining techniques can be made more efficient by applying genetic algorithms, which greatly improve the search procedure and reduce the execution time.
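    The abstract above compares decision tree generation using the Gini index and the entropy value. As a rough illustration of what the two split criteria measure, here is a minimal sketch in Python. The function names and the toy labels are hypothetical and are not taken from the paper; the sketch only shows the standard definitions of the two impurity measures and assumes non-empty label lists.

```python
import math
from collections import Counter


def gini(labels):
    """Gini index of a non-empty label list: 1 - sum of squared class shares."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty label list."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())


def split_impurity(left, right, measure):
    """Size-weighted impurity of a binary split under the given measure."""
    n = len(left) + len(right)
    return len(left) / n * measure(left) + len(right) / n * measure(right)


# Hypothetical toy data: two candidate splits of the same eight samples.
split_a = (["up", "up", "up", "down"], ["down", "down", "down", "up"])
split_b = (["up", "up", "up", "up"], ["down", "down", "down", "down"])

for name, (left, right) in (("A", split_a), ("B", split_b)):
    print(name,
          split_impurity(left, right, gini),
          split_impurity(left, right, entropy))
```

    On this toy data both criteria prefer the same split (the pure split B has lower impurity under either measure), which is in line with the abstract's observation that the two indices gave similar results, although a toy case is of course no substitute for the paper's experiments.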

    Yet Another Representation of Binary Decision Trees: A Mathematical Demonstration

    A decision tree looks like a simple computational graph without cycles, where only the leaf nodes specify the output values and the non-terminal nodes specify the tests or split conditions. From a numerical perspective, we express decision trees in the language of computational graphs. We explicitly parameterize the test phase, traversal phase and prediction phase of decision trees based on the bitvectors of non-terminal nodes. As shown later, the decision tree is, in some sense, a shallow binary network. In particular, we introduce the bitvector matrix to implement tree traversal numerically, where the core idea is to convert the logical `AND' operation into arithmetic operations. We apply this numerical representation to extend and conceptually unify diverse decision trees.
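    To make the test, traversal and prediction phases described above more concrete, the following is a minimal sketch in Python. It assumes one common bitvector convention that the abstract does not spell out: each non-terminal node stores a 0/1 vector over the leaves, with 0 marking the leaves that become unreachable when the node's test is false. The tree, thresholds and leaf values are hypothetical, and the logical AND over bitvectors is replaced by an elementwise product.

```python
# Hypothetical depth-2 tree with 3 non-terminal nodes and 4 leaves.
# Node i tests x[feature] <= threshold; a true test goes to the left child.
THRESHOLDS = [(0, 0.5), (1, 0.3), (1, 0.7)]  # (feature index, threshold)

# Assumed convention: row i zeroes the leaves in node i's LEFT subtree,
# i.e. the leaves that are ruled out when test i is false.
BITVECTORS = [
    [0, 0, 1, 1],  # root: a false test rules out leaves 0 and 1
    [0, 1, 1, 1],  # left child: a false test rules out leaf 0
    [1, 1, 0, 1],  # right child: a false test rules out leaf 2
]

LEAF_VALUES = [10.0, 20.0, 30.0, 40.0]


def predict(x):
    # Test phase: 1 if the node's test is false, 0 if it is true.
    false_flags = [0 if x[f] <= t else 1 for f, t in THRESHOLDS]

    # Traversal phase: AND of the bitvectors of all false nodes, written
    # arithmetically. A true node contributes an all-ones mask.
    reachable = [1] * len(LEAF_VALUES)
    for flag, bits in zip(false_flags, BITVECTORS):
        reachable = [r * (1 - flag * (1 - b)) for r, b in zip(reachable, bits)]

    # Prediction phase: the leftmost surviving leaf is the exit leaf.
    exit_leaf = reachable.index(1)
    return LEAF_VALUES[exit_leaf]


print(predict((0.2, 0.9)))  # root true, left child false
```

    Note that every node's test is evaluated regardless of the path taken, which is what allows the traversal to be expressed as fixed arithmetic over a matrix of bitvectors instead of data-dependent branching.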