    Ensemble of a subset of kNN classifiers

    Combining multiple classifiers, known as ensemble methods, can give substantial improvement in the prediction performance of learning algorithms, especially in the presence of non-informative features in the data sets. We propose an ensemble of a subset of kNN classifiers, ESkNN, for classification tasks, built in two steps. First, we choose classifiers based on their individual performance, measured by out-of-sample accuracy. The selected classifiers are then combined sequentially, starting from the best model, and assessed for collective performance on a validation data set. We evaluate our method on benchmark data sets, both with their original features and with some added non-informative features. The results are compared with usual kNN, bagged kNN, random kNN, the multiple feature subset method, random forest and support vector machines. Our experimental comparisons on benchmark classification problems and simulated data sets reveal that the proposed ensemble gives better classification performance than the usual kNN and its ensembles, and performs comparably to random forest and support vector machines.
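
The two-step selection described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the candidate kNN models are generated, so the sketch assumes each candidate is a kNN classifier restricted to a random feature subset, and it assumes a candidate is kept only if it strictly improves validation accuracy. All function names (`knn_predict`, `select_ensemble`, `make_data`) and parameters are hypothetical.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, features, k=3):
    """Majority vote among the k nearest training points, using only `features`."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]

def accuracy(preds, y):
    return sum(p == t for p, t in zip(preds, y)) / len(y)

def ensemble_predict(train_X, train_y, x, subsets, k=3):
    """Majority vote over the kNN models defined by each feature subset."""
    votes = Counter(knn_predict(train_X, train_y, x, fs, k) for fs in subsets)
    return votes.most_common(1)[0][0]

def select_ensemble(train, assess, valid, n_features,
                    n_models=20, subset_size=2, k=3, seed=0):
    rng = random.Random(seed)
    (tX, ty), (aX, ay), (vX, vy) = train, assess, valid
    # Assumption: candidates are kNN models on random feature subsets.
    candidates = [tuple(sorted(rng.sample(range(n_features), subset_size)))
                  for _ in range(n_models)]

    # Step 1: rank candidates by individual out-of-sample accuracy.
    def solo_acc(fs):
        return accuracy([knn_predict(tX, ty, x, fs, k) for x in aX], ay)
    ranked = sorted(candidates, key=solo_acc, reverse=True)

    # Step 2: add models one at a time, best first, and keep a model
    # only if the ensemble's validation accuracy improves (assumed rule).
    chosen = [ranked[0]]
    best = accuracy([ensemble_predict(tX, ty, x, chosen, k) for x in vX], vy)
    for fs in ranked[1:]:
        trial = chosen + [fs]
        acc = accuracy([ensemble_predict(tX, ty, x, trial, k) for x in vX], vy)
        if acc > best:
            chosen, best = trial, acc
    return chosen, best

def make_data(n, n_noise, rng):
    """Toy data: feature 0 is informative, the rest are pure noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randrange(2)
        X.append([rng.gauss(2.0 * label, 0.5)] +
                 [rng.gauss(0.0, 1.0) for _ in range(n_noise)])
        y.append(label)
    return X, y
```

Calling `select_ensemble(make_data(40, 4, rng), make_data(30, 4, rng), make_data(30, 4, rng), n_features=5)` with `rng = random.Random(1)` returns the retained feature subsets and the validation accuracy of the resulting ensemble. Using separate assessment and validation splits keeps the ranking in step 1 from leaking into the check in step 2.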

    Learning differential diagnosis of erythemato-squamous diseases using voting feature intervals

    A new classification algorithm, called VFI5 (for Voting Feature Intervals), is developed and applied to the problem of differential diagnosis of erythemato-squamous diseases. The domain contains records of patients with known diagnoses. Given a training set of such records, the VFI5 classifier learns how to differentiate a new case in the domain. VFI5 represents a concept in the form of feature intervals on each feature dimension separately. Classification in the VFI5 algorithm is based on real-valued voting. Each feature participates equally in the voting process, and the class that receives the largest number of votes is declared to be the predicted class. The performance of the VFI5 classifier is evaluated empirically in terms of classification accuracy and running time. (C) 1998 Elsevier Science B.V. All rights reserved.

    Text categorization using feature projections


    A classification learning algorithm robust to irrelevant features

    The presence of irrelevant features is a fact of life in many real-world applications of classification learning. Although nearest-neighbor classification algorithms have emerged as a promising approach to machine learning tasks because of their high predictive accuracy, they are adversely affected by such irrelevant features. In this paper, we describe a recently proposed classification algorithm called VFI5, which achieves accuracy comparable to nearest-neighbor classifiers while being robust to irrelevant features. The paper compares the nearest-neighbor classifier and the VFI5 algorithm in the presence of irrelevant features, on both artificially generated data sets and real-world data sets selected from the UCI repository.

    Classification by voting feature intervals

    A new classification algorithm called VFI (for Voting Feature Intervals) is proposed. A concept is represented by a set of feature intervals on each feature dimension separately. Each feature participates in the classification by distributing real-valued votes among classes. The class receiving the highest vote is declared to be the predicted class. VFI is compared with the Naive Bayesian Classifier (NBC), which also considers each feature separately. Experiments on real-world datasets show that VFI achieves classification accuracy comparable to, and in some cases better than, NBC. Moreover, VFI is faster than NBC on all datasets. © Springer-Verlag Berlin Heidelberg 1997
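
The voting scheme described in this abstract can be illustrated with a simplified sketch. It is not the published VFI algorithm. It assumes that interval boundaries on each feature are the per-class minimum and maximum values, that every interval is half-open, and that each interval's votes are class counts normalized first by class size and then to sum to one. The published algorithm may treat interval end points differently. The names `train_vfi` and `predict_vfi` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

def train_vfi(X, y):
    """Build, for every feature, a list of intervals with per-class votes."""
    classes = sorted(set(y))
    class_size = {c: y.count(c) for c in classes}
    model = []
    for f in range(len(X[0])):
        # Assumed boundaries: lowest and highest value of the feature per class.
        ends = set()
        for c in classes:
            vals = [row[f] for row, label in zip(X, y) if label == c]
            ends.update((min(vals), max(vals)))
        ends = sorted(ends)
        # One bucket per half-open interval, plus one below the lowest boundary.
        counts = [defaultdict(float) for _ in range(len(ends) + 1)]
        for row, label in zip(X, y):
            # Weight by 1/class size so large classes do not dominate.
            counts[bisect_right(ends, row[f])][label] += 1.0 / class_size[label]
        votes = []
        for interval in counts:
            total = sum(interval.values())
            # Normalize so each feature hands out at most one vote in total.
            votes.append({c: (interval[c] / total if total else 0.0)
                          for c in classes})
        model.append((ends, votes))
    return classes, model

def predict_vfi(classes, model, x):
    """Sum the real-valued votes of all features and return the top class."""
    totals = {c: 0.0 for c in classes}
    for f, (ends, votes) in enumerate(model):
        for c, v in votes[bisect_right(ends, x[f])].items():
            totals[c] += v
    return max(classes, key=lambda c: totals[c])

# Tiny illustrative data set (made up for this sketch).
X = [[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.3, 6.0]]
y = ["a", "a", "b", "b"]
classes, model = train_vfi(X, y)
print(predict_vfi(classes, model, [1.1, 3.5]))  # both features vote for "a"
print(predict_vfi(classes, model, [3.1, 4.2]))  # both features vote for "b"
```

Because each feature votes independently, a feature whose intervals carry roughly equal votes for every class adds little to any class total, which is one intuition for the robustness to irrelevant features claimed for this family of algorithms.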

    Modélisation multi-agent dans un processus de gestion multi acteur, application au maintien à domicile

    Existing home-care and home-monitoring systems try to meet the needs of this field but still have limitations, one being that they are centred on a single person and cannot monitor several people at once. Our objective is to build behaviour patterns from information coming from the homes of the monitored people, gathered through motion sensors, physiological sensors, liaison notebooks and other sources, in order to obtain a macroscopic view of the people monitored. To this end we deploy a classification architecture that is usable at large scale and based on multi-agent technologies. We chose a multi-agent classification method because classical centralized methods (statistical, neural, concept formation...) cannot be applied when the data needed for classification are distributed. Such methods do not scale, since scaling requires handling many people located in different environments and monitored through many indicators whose number and value domains may change over time. Scaling of this kind is possible with multi-agent methods, in which each agent manages part of the data on a subset of the monitored population. A change in the number or domain of the indicators may lead to removing or adding an agent without having to redo the whole computation.
    This research can be seen as a macroscopic approach to large-scale distributed data gathering. We propose a software architecture to monitor elderly or dependent people in their own homes. Many studies have addressed hardware aspects, resulting in operational products. But there is a lack of adaptive algorithms to handle all the data generated by these products, because such data is distributed and heterogeneous in a large-scale environment. We propose a multi-agent classification method to collect and aggregate data about the activity, movements and physiological information of the monitored people: each agent's know-how consists of a simple classification algorithm. Data generated at this local level are communicated and adjusted between agents to obtain a set of patterns. This data is dynamic; the system has to store the built patterns and create new patterns when new data is available. Therefore, the system is adaptive and can be deployed on a large scale. The generated data is used at a local level, for example to raise an alert, but also to evaluate global risks. We present the specification choices and the massively multi-agent architecture we developed.

    BUILDING DSS USING KNOWLEDGE DISCOVERY IN DATABASE APPLIED TO ADMISSION & REGISTRATION FUNCTIONS

    This research investigates the practical issues surrounding the development and implementation of Decision Support Systems (DSS). The research describes the traditional development approaches, analyzes their drawbacks and introduces a new DSS development methodology. The proposed DSS methodology is based upon four modules: needs analysis, data warehouse (DW), knowledge discovery in database (KDD), and a DSS module. The proposed DSS methodology is applied to and evaluated using the admission and registration functions in Egyptian Universities. The research investigates the organizational requirements needed to underpin these functions in Egyptian Universities. These requirements have been identified following an in-depth survey of the recruitment process in the Egyptian Universities. This survey employed a multi-part admission and registration DSS questionnaire (ARDSSQ) to identify the required data sources together with the likely users and their information needs. The questionnaire was sent to senior managers within the Egyptian Universities (both private and government) with responsibility for student recruitment, in particular admission and registration. Further, access to a large database has allowed the evaluation of the practical suitability of using a data warehouse structure and knowledge management tools within the decision making framework. 1600 student records have been analyzed to explore the KDD process, and another 2000 records have been used to build and test the data mining techniques within the KDD process. Moreover, the research has analyzed the key characteristics of data warehouses and explored the advantages and disadvantages of such data structures. This evaluation has been used to build a data warehouse for the Egyptian Universities that handles their admission and registration related archival data. The potential benefits of the data warehouse for decision makers within the student recruitment process will be explored. The design of the proposed admission and registration DSS (ARDSS) will be developed and tested using COOL:Gen (5.0) CASE tools by Computer Associates (CA), connected to a MSSQL Server (6.5), in a Windows NT (4.0) environment. Crystal Reports (4.6) by Seagate will be used as a report generation tool. CLUSTAN Graphics (5.0) by CLUSTAN software will also be used as a clustering package. Finally, the contribution of this research is found in the following areas: a new DSS development methodology; the development and validation of a new research questionnaire (i.e. ARDSSQ); the development of the admission and registration data warehouse; the evaluation and use of cluster analysis proximities and techniques in the KDD process to find knowledge in the students' records; and the development of the ARDSS software, which combines the advantages of KDD and DW and delivers them to the senior admission and registration managers in the Egyptian Universities. The ARDSS software could be adjusted for use in different countries for the same purpose; it is also scalable to handle new decision situations and can be integrated with other systems.