7 research outputs found

    Clustering Algorithms For High Dimensional Data – A Survey Of Issues And Existing Approaches

    Clustering is the most prominent data mining technique used for grouping data into clusters based on distance measures. With the rapid growth of high dimensional data such as microarray gene expression data, grouping such data into clusters runs into a problem: similarity between objects in the full dimensional space is often invalid because the space contains different types of data. As a result, clustering is not accurate, and often falls short of expectations, when the dimension of the dataset is high. The problem is now attracting tremendous attention in research and development. To improve the performance of clustering on high dimensional data, issues such as dimensionality reduction, redundancy elimination, subspace clustering, co-clustering and data labeling for clusters need to be analyzed and improved. In this paper, we present a brief comparison of existing algorithms that focus mainly on clustering high dimensional data.

    Auto Insurance Business Analytics Approach for Customer Segmentation Using Multiple Mixed-Type Data Clustering Algorithms

    Customer segmentation is critical for auto insurance companies to gain competitive advantage by mining useful customer-related information. While some efforts have been made to use customer segmentation in support of auto insurance decision making, the segmentation results tend to be affected by the characteristics of the algorithm used and lack validation across multiple algorithms. To this end, we propose an auto insurance business analytics approach that segments customers using three mixed-type data clustering algorithms: k-prototypes, improved k-prototypes and similarity-based agglomerative clustering. The segmentation results of these algorithms complement and reinforce each other and reveal as much information as possible to support decision making. To confirm its practical value, the proposed approach extracts seven rules for an auto insurance company that may help the company make customer-related decisions and develop insurance products.
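
    The abstract names k-prototypes without describing it. As a rough illustration only, the sketch below shows the kind of mixed-type dissimilarity usually associated with k-prototypes: squared Euclidean distance on the numeric attributes plus a weight times the number of mismatched categorical attributes. The function name, the weight `gamma`, and the customer attributes are assumptions made for this example and are not taken from the paper.

```python
def mixed_dissimilarity(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Dissimilarity between two records with numeric and categorical parts.

    Numeric part: squared Euclidean distance.
    Categorical part: count of attributes whose values differ,
    scaled by gamma (an assumed, user-chosen weight).
    """
    numeric_part = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    categorical_part = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric_part + gamma * categorical_part


# Hypothetical customers: numeric = (age, annual premium in thousands),
# categorical = (vehicle type, region).
d = mixed_dissimilarity(
    [35.0, 1.2], ["sedan", "north"],
    [40.0, 1.0], ["suv", "north"],
    gamma=0.5,
)
print(d)
```

    In practice the numeric attributes would normally be rescaled first, since otherwise an attribute with a large range (age here) dominates the categorical term.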

    Rule Extraction on Numeric Datasets Using Hyper-rectangles

    When there is a need to understand the data stored in a database, one of the main requirements is being able to extract knowledge in the form of rules. Classification strategies allow rules to be extracted almost naturally. In this paper, a new classification strategy is presented that uses hyper-rectangles as data descriptors to build a model from which knowledge can be extracted in the form of classification rules. The participation of an expert in training the model is discussed. Finally, the results obtained using databases from the UCI repository are presented and compared with other existing classification models, showing that the algorithm presented requires fewer computational resources and achieves the same accuracy level and number of extracted rules.
    Affiliation: Hasperué, Waldo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina. Universidad Nacional de La Plata. Facultad de Informática. Instituto de Investigación en Informática Lidi; Argentina
    Affiliation: Lanzarini, Laura Cristina. Universidad Nacional de La Plata. Facultad de Informática. Instituto de Investigación en Informática Lidi; Argentina
    Affiliation: de Giusti, Armando Eduardo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina. Universidad Nacional de La Plata. Facultad de Informática. Instituto de Investigación en Informática Lidi; Argentina
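
    To make the idea of hyper-rectangles as data descriptors concrete, here is a minimal sketch of how one axis-aligned hyper-rectangle can be read as a classification rule: each dimension contributes an interval condition, and the conditions are joined with AND. This is not the authors' algorithm; the function names, attribute names, bounds, and class label are all hypothetical.

```python
def rectangle_to_rule(lower, upper, names, label):
    """Render an axis-aligned hyper-rectangle as an IF-THEN rule string."""
    conditions = [
        f"{lo} <= {name} <= {hi}"
        for name, lo, hi in zip(names, lower, upper)
    ]
    return "IF " + " AND ".join(conditions) + f" THEN class = {label}"


def covers(lower, upper, point):
    """Return True if the point lies inside the hyper-rectangle (bounds inclusive)."""
    return all(lo <= x <= hi for lo, x, hi in zip(lower, point, upper))


# Hypothetical rectangle over two numeric attributes.
lower, upper = [4.3, 2.0], [5.8, 4.4]
rule = rectangle_to_rule(lower, upper, ["sepal_length", "sepal_width"], "setosa")
print(rule)
print(covers(lower, upper, [5.0, 3.5]))
```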

    CLUIN – A new method for extracting rules for large databases

    When there is a need to understand the data stored in a database, one of the main requirements is being able to extract knowledge in the form of rules. Classification strategies allow rules to be extracted almost naturally. In this paper, the CLUHR classification strategy is extended to work with databases that have nominal attributes. Finally, the results obtained using databases from the UCI repository are presented and compared with other existing classification models, showing that the algorithm presented requires fewer computational resources and achieves the same accuracy level and number of extracted rules.
    Track: Workshop Agentes y sistemas inteligentes (WASI)
    Red de Universidades con Carreras en Informática (RedUNCI)

    Computer Science & Technology Series : XVIII Argentine Congress of Computer Science. Selected papers

    CACIC’12 was the eighteenth Congress in the CACIC series. It was organized by the School of Computer Science and Engineering at the Universidad Nacional del Sur. The Congress included 13 Workshops with 178 accepted papers, 5 Conferences, 2 invited tutorials, several meetings related to Computer Science Education (Professors, PhD students, Curricula) and an International School with 5 courses. CACIC 2012 followed the traditional Congress format, with 13 Workshops covering a range of areas of Computer Science research. Each topic was supervised by a committee of 3-5 chairs from different Universities. The call for papers attracted a total of 302 submissions. An average of 2.5 review reports were collected for each paper, for a grand total of 752 review reports involving about 410 different reviewers. A total of 178 full papers, involving 496 authors and 83 Universities, were accepted, and 27 of them were selected for this book.
    Red de Universidades con Carreras en Informática (RedUNCI)

    On Data Labeling for Clustering Categorical Data

    On Data Labeling for Clustering Categorical Data
