
    Incorporación de técnicas multivariantes en un sistema gestor de bases de datos

    The principal objective of this master's thesis is to incorporate multivariate linear and logistic regression techniques into a database management system, in order to facilitate knowledge discovery in databases and to promote business intelligence through a tool intelligent enough to interpret the results and present them in a user-friendly way that supports decision making. A conceptual model for incorporating multivariate techniques into a database management system is proposed. Additionally, a model for the visualization of results is presented, and a Web application prototype was developed to verify the technical feasibility of the proposed model. The visualization model is shown to have the expressive power to facilitate the assimilation of the new knowledge generated by regression analysis. The prototype was shown to facilitate the selection of data for a regression analysis and to interpret the results by itself, making knowledge discovery in databases easier for non-expert users. Finally, the conceptual model for incorporating multivariate regression techniques into a database management system and the model for the visualization of results have the appropriate characteristics to support knowledge discovery in databases projects.

    On the role of pre and post-processing in environmental data mining

    The quality of discovered knowledge depends heavily on data quality. Unfortunately, real data often contain noise, uncertainty, errors, redundancies, or even irrelevant information. The more complex the reality being analyzed, the higher the risk of obtaining low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework for preparing data in the right form to perform correct analyses. On the other hand, the quality of decisions based on KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details on how this can be achieved and discusses the role of pre- and post-processing in the whole process of knowledge discovery in environmental systems.

    Using Visualization to Support Data Mining of Large Existing Databases

    In this paper, we present ideas on how visualization technology can be used to improve the difficult process of querying very large databases. With our VisDB system, we try to provide visual support not only for the query specification process, but also for evaluating query results and, thereafter, refining the query accordingly. The main idea of our system is to represent as many data items as possible by the pixels of the display device. By arranging and coloring the pixels according to their relevance for the query, the user gets a visual impression of the resulting data set and of its relevance for the query. Using an interactive query interface, the user may change the query dynamically and receive immediate feedback through the visual representation of the resulting data set. By using multiple windows for different parts of the query, the user gets visual feedback for each part of the query and, therefore, may more easily understand the overall result. To support complex queries, we introduce the notion of approximate joins, which allow the user to find data items that only approximately fulfill join conditions. We also present ideas on how our technique may be extended to support the interoperation of heterogeneous databases. Finally, we discuss the performance problems caused by interfacing to existing database systems and present ideas for solving these problems by using data structures that support a multidimensional search of the database.
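
    The pixel-per-item idea in the abstract above can be illustrated with a minimal sketch. It is not VisDB's actual implementation: the abstract does not specify the distance measure, the arrangement, or the color scale, so the range-query distance, the square spiral starting at the center, the grey-level shading, and all function names below are assumptions made purely for illustration.

```python
from typing import List, Tuple


def relevance_distance(value: float, low: float, high: float) -> float:
    """Assumed relevance measure: distance of a value from the query
    range [low, high]; 0.0 means the item fulfills the query exactly."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0


def spiral_coords(n: int) -> List[Tuple[int, int]]:
    """Return the first n cells of a square spiral starting at (0, 0)."""
    coords: List[Tuple[int, int]] = []
    x = y = 0
    dx, dy = 1, 0
    step_len = 1
    while len(coords) < n:
        for _ in range(2):            # two legs share each step length
            for _ in range(step_len):
                if len(coords) == n:
                    return coords
                coords.append((x, y))
                x, y = x + dx, y + dy
            dx, dy = -dy, dx          # turn 90 degrees
        step_len += 1
    return coords


def layout(values: List[float], low: float, high: float):
    """Place the most relevant items at the center of the spiral and
    shade each pixel from 255 (exact match) down to 0 (least relevant)."""
    ranked = sorted(values, key=lambda v: relevance_distance(v, low, high))
    max_d = max((relevance_distance(v, low, high) for v in ranked), default=0.0)
    pixels = []
    for value, cell in zip(ranked, spiral_coords(len(ranked))):
        d = relevance_distance(value, low, high)
        shade = 255 if max_d == 0 else round(255 * (1 - d / max_d))
        pixels.append((cell, value, shade))
    return pixels


if __name__ == "__main__":
    # Hypothetical data: one attribute, query range [15, 20].
    for cell, value, shade in layout([10, 25, 18, 40], 15, 20):
        print(cell, value, shade)
```

    In this sketch an item that satisfies the query sits at the center with the brightest shade, and items that miss the range by more are pushed outward and drawn darker, which is one possible way to give the "visual impression" of relevance the abstract describes.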

    Visual Integration of Data and Model Space in Ensemble Learning

    Ensembles of classifier models typically deliver superior performance and can outperform single classifier models for a given dataset and classification task. However, the gain in performance comes with a loss of comprehensibility, making it challenging to understand how each model affects the classification outputs and where the errors come from. We propose a tight visual integration of the data and the model space for exploring and combining classifier models. We introduce a workflow that builds upon this visual integration and enables the effective exploration of classification outputs and models. We then present a use case in which we start with an ensemble automatically selected by a standard ensemble selection algorithm, and show how we can manipulate models and alternative combinations. Comment: 8 pages, 7 picture

    Data mining as a tool for environmental scientists

    In recent years, a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than classical modelling approaches and could be usefully applied to data-rich environmental problems. Certain techniques, such as Artificial Neural Networks, Clustering, Case-Based Reasoning and, more recently, Bayesian Decision Networks, have found application in environmental modelling, while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete, or heterogeneous.