16 research outputs found

    Integration of Data Mining and Data Warehousing: a practical methodology

    The ever-growing repository of data in all fields poses new challenges to modern analytical systems. Real-world datasets, with mixed numeric and nominal variables, are difficult to analyze and require effective visual exploration that conveys the semantic relationships in the data. Traditional data mining techniques such as clustering handle only numeric data. Little research has addressed the problem of clustering high-cardinality nominal variables to gain better insight into the underlying dataset. Several works in the literature have demonstrated the possibility of integrating data mining with data warehousing to discover knowledge from data. For seamless integration, the mined data has to be modeled in the form of a data warehouse schema. Schema generation is a complex manual task that requires familiarity with both the domain and warehousing. Automated techniques are required to generate the warehouse schema and remove these dependencies. To meet growing analytical needs and overcome the existing limitations, we propose a novel methodology that permits efficient analysis of mixed numeric and nominal data, effective visual data exploration, automatic warehouse schema generation, and integration of data mining and warehousing. The proposed methodology is evaluated through a case study on a real-world dataset. Results show that multidimensional analysis can be performed in an easier and more flexible way to discover meaningful knowledge from large datasets.

    Clutter Reduction in Parallel Coordinates using Binning Approach for Improved Visualization

    As the volume of data and the number of information sources keep growing, mining the necessary information and presenting it in a human-friendly form becomes a great challenge. Visualization helps us to represent data pictorially, evaluate it, and uncover knowledge from it. Data visualization offers immense opportunities in fields such as trade, banking, finance, insurance, and energy. With the data explosion in various fields, visualization techniques are of great importance. But when the quantity of data becomes large, visualization methods may lose their effectiveness. Parallel coordinates is a prominent and frequently used method for data visualization. However, the efficiency of this method is reduced if the dataset contains a large number of instances, which makes the visualization cluttered and data retrieval very inefficient. Here we introduce a data summarization approach as a preprocessing step to the existing parallel coordinates method to make the visualization more effective.
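
    The abstract does not say how the summarization is computed, so the following is only a minimal sketch of one plausible binning scheme, not the authors' method. It assumes equal-width bins per axis and collapses all rows that share a bin combination into a single weighted polyline; the function name `bin_rows` and the parameter `n_bins` are hypothetical.

```python
from collections import Counter


def bin_rows(rows, n_bins=4):
    """Summarize numeric rows for a parallel-coordinates plot.

    Assumption: equal-width bins on every axis. Returns a list of
    (bin_centers, count) pairs, one per distinct bin combination, so each
    pair can be drawn as one polyline whose weight is its count.
    """
    n_dims = len(rows[0])
    lows = [min(r[d] for r in rows) for d in range(n_dims)]
    highs = [max(r[d] for r in rows) for d in range(n_dims)]

    def bin_index(value, d):
        span = highs[d] - lows[d]
        if span == 0:
            return 0  # constant axis: everything falls in one bin
        idx = int((value - lows[d]) / span * n_bins)
        return min(idx, n_bins - 1)  # the axis maximum goes in the last bin

    def center(idx, d):
        width = (highs[d] - lows[d]) / n_bins
        return lows[d] + (idx + 0.5) * width

    # Count how many rows share each combination of per-axis bins.
    counts = Counter(
        tuple(bin_index(r[d], d) for d in range(n_dims)) for r in rows
    )
    return [
        (tuple(center(i, d) for d, i in enumerate(key)), n)
        for key, n in counts.items()
    ]


# Five two-dimensional rows collapse into two weighted polylines.
rows = [(0.0, 10.0), (0.1, 10.5), (1.0, 20.0), (0.9, 19.0), (0.05, 10.2)]
summary = bin_rows(rows, n_bins=2)
```

    With this kind of summary, a plotting layer would draw one line per entry and map the count to line width or opacity, so the number of drawn lines is bounded by the number of occupied bin combinations rather than by the number of instances.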

    Visual Scalability in Parallel Coordinates (Escalabilidad visual en coordenadas paralelas)

    In systems from areas as diverse as simulation, health, the Web, meteorology, and product testing, data volumes are ever larger and growing constantly. When analyzing and interpreting these data, human limitations and the lack of adequate software to compensate for them are the main problems. This lack of software is mainly due to the computational complexity involved in processing such datasets, and also to the fact that visualization techniques that are effective for small data volumes are not applicable in these cases. Developing visually scalable techniques is essential for producing tools that can adapt to very large sets of information. The main objective of the ongoing research is to analyze the feasibility of visually scaling parallel coordinates, one of the most powerful n-dimensional visualization techniques. This would extend its use to visualizations of large datasets. Track: Computer graphics, visualization and images. Red de Universidades con Carreras en Informática (RedUNCI)

    A Visual Approach To Exploratory Data Mining

    As the first step upon commencing an in-depth data mining analysis, students should become intimately acquainted with the data under study. In this paper, we present a methodology and a set of custom tools that we have designed and developed for use in our data mining courses and that allow students to accomplish this task efficiently and effectively. The tools create interactive visual presentations of the data, encouraging students to explore the data in search of patterns or relationships that would then be investigated in subsequent steps using sophisticated statistical and machine learning tools.

    Discovering Interpretable Machine Learning Models in Parallel Coordinates

    This paper contributes to interpretable machine learning via visual knowledge discovery in parallel coordinates. The concepts of hypercubes and hyper-blocks are used because end users can easily understand them in visual form in parallel coordinates. The Hyper algorithm for classification with mixed and pure hyper-blocks (HBs) is proposed to discover hyper-blocks interactively and automatically in individual, multiple, overlapping, and non-overlapping settings. The combination of hyper-blocks with linguistic descriptions of visual patterns is also presented. It is shown that Hyper models generalize decision trees. The Hyper algorithm was tested on benchmark data from the UCI ML repository. It allowed discovering pure and mixed HBs with all data and then with 10-fold cross validation. The links between hyper-blocks, dimension reduction, and visualization are established. The major benefits of hyper-block technology and the Hyper algorithm lie in enabling end users to discover and observe hyper-blocks, including through side-by-side visualizations that make patterns visible for all classes. Another advantage of sets of HBs relative to decision trees is the ability to avoid both data overgeneralization and overfitting. Comment: 8 pages, 18 figures
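
    To make the pure versus mixed distinction concrete, here is a minimal sketch that is not the Hyper algorithm itself. It assumes a hyper-block is an axis-aligned box given by one closed interval per attribute, and calls a block pure when every sample it contains has the same class label. The helper names `bounding_block`, `contains`, and `is_pure` are hypothetical.

```python
def bounding_block(points):
    """Return the smallest axis-aligned box containing all points.

    Assumption: a hyper-block is a list of (low, high) intervals,
    one per attribute.
    """
    dims = range(len(points[0]))
    return [(min(p[d] for p in points), max(p[d] for p in points)) for d in dims]


def contains(block, point):
    """True if the point lies inside the block on every attribute."""
    return all(lo <= x <= hi for (lo, hi), x in zip(block, point))


def is_pure(block, samples, label):
    """True if every labeled sample inside the block carries `label`."""
    return all(lbl == label for point, lbl in samples if contains(block, point))


# Toy labeled data: the last "b" point falls inside the box spanned by class "a".
samples = [
    ((1.0, 2.0), "a"),
    ((2.0, 3.0), "a"),
    ((5.0, 5.0), "b"),
    ((1.5, 2.5), "b"),
]
block_a = bounding_block([p for p, lbl in samples if lbl == "a"])
mixed = not is_pure(block_a, samples, "a")  # the box around class "a" is mixed
```

    In parallel coordinates, such a block appears as a band between the low and high values on each axis, which is one reason the interval form is easy to read visually.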

    Plant Physiology Data Analysis Supported by Information Visualization Techniques (ANÁLISE DE DADOS DE FISIOLOGIA DE PLANTAS APOIADA POR TÉCNICAS DE VISUALIZAÇÃO DE INFORMAÇÕES)

    This article presents an alternative proposal for the analysis of plant physiology data, supported by information visualization techniques. These techniques can generate visualizations directly from high-dimensional data; the visualizations aid understanding of the data and help identify the attributes most relevant for discriminating between the species studied. Identifying the most relevant parameters is fundamental to building a classification model, whether using a Bayesian classifier or an Artificial Neural Network.