7 research outputs found

    Object-oriented data mining

    EThOS - Electronic Theses Online Service, GB, United Kingdom

    A knowledge discovery approach to urban analysis

    Enhancing our knowledge of the complexities of cities in order to empower ourselves to make more informed decisions has always been a challenge for urban research. Recent developments in large-scale computing, together with new techniques and automated tools for data collection and analysis, are opening up promising opportunities for addressing this problem. The main motivation behind this research is the question of how these developments may contribute to urban data analysis. On this basis, the thesis focuses on urban data analysis in order to search for findings that can enhance our knowledge of urban environments, by applying the generic process of knowledge discovery through data mining. A knowledge discovery process based on data mining is a fully automated or semi-automated process which involves the application of computational tools and techniques to explore the “previously unknown, and potentially useful information” (Witten & Frank, 2005) hidden in large and often complex and multi-dimensional databases. This information can be obtained in the form of correlations amongst variables, data groupings (classes and clusters) or more complex hypotheses (probabilistic rules of co-occurrence, performance vectors of prediction models etc.). This research targets researchers and practitioners working in the field of urban studies who are interested in quantitative/computational approaches to urban data analysis, and specifically aims to engage the interest of architects, urban designers and planners who do not have a background in statistics or in using data mining methods in their work. Accordingly, the overall aim of the thesis is the development of a knowledge discovery approach to urban analysis: a domain-specific adaptation of the generic process of knowledge discovery using data mining that enables the analyst to discover ‘relational urban knowledge’. 
‘Relational urban knowledge’ is a term employed in this thesis to refer to the potentially ‘useful’ and/or ‘valuable’ information patterns and relationships that can be discovered in urban databases by applying data mining algorithms. A knowledge discovery approach to urban analysis through data mining can help us to understand site-specific characteristics of urban environments in a more profound and useful way. On a more specific level, the thesis aims at ‘knowledge discovery’ in traditional thematic maps published in 2008 by the Istanbul Metropolitan Municipality as a basis of the Master Plan for the Beyoğlu Preservation Area. These thematic maps, which represent urban components (buildings, streets and neighbourhoods) and their various attributes, such as floor space use of the buildings, land price, population density or historical importance, do not really extend our knowledge of the Beyoğlu Preservation Area beyond documenting its current state and do not contribute to the interventions presented in the master plan. However, it is likely that ‘useful’ and ‘valuable’ information patterns discoverable using data mining algorithms are hidden in them. In accordance with the stated aims, the three research questions of the thesis concern (1) the development of a general process model to adapt the generic process of knowledge discovery using data mining for urban data analysis, (2) the investigation of information patterns and relationships that can be extracted from the traditional thematic maps of the Beyoğlu Preservation Area by further developing and implementing this model and (3) the investigation of how this ‘relational urban knowledge’ could support architects, urban designers or urban planners whilst developing intervention proposals for urban regeneration. A Knowledge Discovery Process Model (KDPM) for urban analysis was developed as an answer to the first research question. 
The KDPM for urban analysis is a domain-specific adaptation of the widely accepted process of knowledge discovery in databases defined by Fayyad, Piatetsky-Shapiro, and Smyth (1996b). The model describes a semi-automated process of database formulation, analysis and evaluation for extracting information patterns and relationships from raw data by combining both GIS and data mining functionalities in a complementary way. The KDPM for urban analysis suggests that GIS functionalities can be used to formulate a database, and GIS and data mining can complement each other in analyzing the database and evaluating the outcomes. The model illustrates that the output of a GIS platform can become the input for a data mining platform and vice versa, resulting in an interlinked analytical process which allows for a more sophisticated analysis of urban data. To investigate the second and third research questions, the KDPM for urban analysis was first further developed to construct a GIS database of the Beyoğlu Preservation Area from the thematic maps. Then, three implementations were performed using this GIS database, the Beyoğlu Preservation Area Building Features Database, which consists of multiple features attributed to the buildings. In Implementation (1), the KDPM for urban analysis was used to investigate a variety of patterns and relationships that can be extracted from the database using three different data mining methods. In Implementations (2) and (3), the KDPM for urban analysis was implemented to test how the knowledge discovery approach through data mining proposed in this thesis can assist in developing draft plans for the regeneration of a run-down neighbourhood in the Beyoğlu Preservation Area (Tarlabaşı). 
In Implementation (2), the KDPM for urban analysis was implemented in combination with an evolutionary process to apply a regeneration approach developed by the author: a computational process which generates draft plans for ground floor use, user-profile and tenure-type allocation was developed. In Implementation (3), students applied the KDPM for urban analysis during the course of an international workshop. The model enabled them to explore site-specific particularities of Tarlabaşı that would support their urban intervention proposals. Three outputs of the thesis are considered utilizable outputs that distinguish it from previous studies. The first is the KDPM for urban analysis. Although there have been other studies which make use of data mining methods and techniques combined with GIS technology, to the best of our knowledge no previous research has implemented a process model to depict this process and used the model to extract ‘knowledge’ from traditional thematic maps. Researchers and practitioners can re-use this process model to analyze other urban environments. The KDPM for urban analysis is, therefore, one of the main utilizable outputs of the thesis and an important scientific contribution of this study. The second is the Beyoğlu Preservation Area Building Features Database. A large and quite comprehensive GIS database which consists of 45 spatial and non-spatial features attributed to the 11,984 buildings located in the Beyoğlu Preservation Area was constructed. This database is one of the original features of this study. To the best of our knowledge, there are no other examples of applications of data mining using such a comprehensive GIS database, constructed from a range of actual micro-scale data representing such a variety of features attributed to the buildings. This database can be re-used by analysts interested in studying the Beyoğlu Preservation Area. 
The Beyoğlu Preservation Area Building Features Database is therefore one of the main utilizable outputs of the thesis and represents a scientific contribution to the research material on the Beyoğlu Preservation Area. The third output is a computational process which generates draft plans for ground floor use, user-profile and tenure-type allocation, using GIS and data mining functionalities with evolutionary computation. This output of the thesis was generated by Implementation (2), which aimed to investigate Research Question (3). The overall process involved the successive application of Naïve Bayes Classification, Association Rule Analysis and an Evolutionary Algorithm to a subset of the Beyoğlu Preservation Area Building Features Database representing the Tarlabaşı neighbourhood. Briefly, the findings of the data mining analysis were used to formulate a set of rules for assigning ground floor use information to the buildings. These rules were then used for the fitness measurements of an Evolutionary Algorithm, together with other fitness measurements for assigning user-profile and tenure-type information (defined by the author according to the regeneration approach that the author developed). As a result, the algorithm transformed the existing allocation of ground floor use in the buildings located in Tarlabaşı in accordance with the given rules and assigned user-profile and tenure-type information to each building. This computational process demonstrated one way to use the data mining analysis findings in developing intervention proposals for urban regeneration. A similar computational process can be implemented in other urban contexts by researchers and practitioners. To the best of our knowledge, no prior research has used data mining analysis findings for the fitness measurements of an Evolutionary Algorithm in order to produce draft plans for ground floor use, user-profile and tenure-type allocation. 
This is, therefore, the most original scientific contribution and utilizable output of the thesis. As a result of the research, on the basis of the data available in the thematic maps of the Beyoğlu Preservation Area, the potential of a knowledge discovery approach to urban analysis in revealing the relationships between various components of urban environments and their various attributes is demonstrated. It is also demonstrated that these relationships can reveal site-specific characteristics of urban environments and that, if found ‘valuable’ by the targeted researchers and practitioners, they can lead to the development of more informed intervention proposals. Thereby the knowledge discovery approach to urban analysis developed in this thesis may help to improve the quality of urban intervention proposals and consequently the quality of built environments. On the other hand, the implementations carried out in the thesis also exposed the major limitation of the knowledge discovery approach to urban analysis through data mining: the findings discoverable by this approach are limited by the relevant data that is collectable and accessible.
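
The idea in the abstract above of using mined rules as fitness measurements for an evolutionary algorithm can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the use categories, building attributes, rules, selection scheme and parameters are all hypothetical, and the user-profile and tenure-type objectives are left out. Fitness here simply counts how many rule preferences a candidate ground floor use plan satisfies.

```python
import random

# Hypothetical ground floor use categories (not taken from the thesis).
USES = ["retail", "cafe", "workshop", "residential"]

# Hypothetical rules of the kind association rule analysis might yield:
# (building attribute, attribute value, preferred ground floor use).
RULES = [
    ("on_main_street", True, "retail"),
    ("historic", True, "cafe"),
    ("on_main_street", False, "residential"),
]


def make_buildings(n, rng):
    """Create n synthetic buildings with two made-up boolean attributes."""
    return [
        {"on_main_street": rng.random() < 0.4, "historic": rng.random() < 0.3}
        for _ in range(n)
    ]


def fitness(plan, buildings):
    """Count the (building, rule) pairs that the plan satisfies."""
    score = 0
    for use, building in zip(plan, buildings):
        for attr, value, preferred in RULES:
            if building[attr] == value and use == preferred:
                score += 1
    return score


def evolve(buildings, generations=60, pop_size=30, mutation_rate=0.05, seed=0):
    """Evolve a ground floor use plan; return the best plan and its history."""
    rng = random.Random(seed)
    n = len(buildings)
    population = [[rng.choice(USES) for _ in range(n)] for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda p: fitness(p, buildings), reverse=True)
        history.append(fitness(population[0], buildings))
        # Truncation selection: the better half survives unchanged (elitism).
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: occasionally reassign a building's use at random.
            child = [
                rng.choice(USES) if rng.random() < mutation_rate else gene
                for gene in child
            ]
            children.append(child)
        population = elite + children
    best = max(population, key=lambda p: fitness(p, buildings))
    return best, history


if __name__ == "__main__":
    buildings = make_buildings(20, random.Random(1))
    best, history = evolve(buildings)
    print(fitness(best, buildings), history[0], history[-1])
```

Because the better half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. A fuller version would add separate fitness terms for the other allocation objectives.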

    Framework de integração para o modelo estratégico de colaboração e mineração de dados espaciais na WEB

    Thesis (doctorate) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia Civil, Florianópolis, 2011. A survey of how some Brazilian municipalities produce and process spatial data revealed a lack of infrastructure and information and, consequently, a lack of collaborative mechanisms that support data mining for spatial analysis. The difficulties grow with the spread of different spatial data structures, such as CAD/GIS standards, produced by the rapid advance of information technologies; implementing an interoperable infrastructure poses real challenges and is the focus of several discussions. However, access to these data via the Internet and the problems that arise when exchanging them are directly related to the particular nature of each adopted standard, so the standards must be analyzed and adapted for collaboration. Initially, the hypothesis of the work aims to strengthen interoperability between spatial data and systems integration, making it possible to establish communication channels for a collaborative environment oriented towards potential and cooperative actions. On this basis, the research presents an investigation of the relevant aspects that influence project engineering, leading to the development of a prototype named OpenCGFW (Collaborative Geospatial Framework Web), aimed at structure recognition, integration, manipulation and collaboration, in line with the efforts of GSDI-INDE, OGC and W3C. First, studies and reviews of the subjects directly related to interoperability are carried out. Topics related to storage, processing and computational collaboration are also addressed, specifically for geographic data produced by different public institutions. To build the framework, the MCDA-C method (Multicriteria Decision Aid - Constructivist) was applied to identify the fundamental and elementary aspects. The work also describes the results obtained in implementing the stages of a design pattern to support the activities and the evaluation of free geo-solutions. During the discussion, results are presented through experiments and applications for digital maps on the web, aiming to integrate several distributed databases with the multipurpose technical cadastre so that the main spatial data mining techniques can be used. Finally, the work discusses the hypothesis and the contribution of the research, which seeks mainly to meet regional characteristics and to contribute to the technological advancement of the country by intensifying the use of open standards and free geotechnologies in collaboration and knowledge management.

    On the Design of a Parallel Object-Oriented Data Mining Toolkit

    No full text
    As data mining techniques are applied to ever larger data sets, it is becoming clear that parallel processors will play an important role in reducing the turn-around time for data analysis. In this paper, we describe the design of a parallel object-oriented toolkit for mining scientific data sets. After a brief discussion of our design goals, we describe our overall system design that uses data mining to find useful information in raw data in an iterative and interactive manner. Using decision trees as an example, we illustrate how the need to support flexibility and extensibility can make the parallel implementation of our algorithms very challenging. As this is work in progress, we also describe the solution approaches we are considering to address these challenges.
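
The tension the abstract describes between extensibility and parallelism in decision trees can be illustrated with a small sketch. It is not the toolkit's design: the function names, the pluggable `impurity` parameter and the feature-parallel scheme are assumptions made for illustration. The split search for each feature runs as a separate task, and the impurity criterion is passed in as an ordinary function so that it can be swapped. Note that threads in CPython do not give true CPU parallelism for pure Python code, so this shows only the structure of the approach.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split_for_feature(rows, labels, feature, impurity):
    """Return (weighted impurity, feature, threshold) for one feature."""
    best = (float("inf"), feature, None)
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        if not left or not right:
            continue  # a split must put samples on both sides
        score = (
            len(left) * impurity(left) + len(right) * impurity(right)
        ) / len(labels)
        if score < best[0]:
            best = (score, feature, threshold)
    return best


def best_split(rows, labels, impurity=gini, workers=4):
    """Search all features concurrently and return the best split found."""
    features = range(len(rows[0]))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(
            pool.map(
                lambda f: best_split_for_feature(rows, labels, f, impurity),
                features,
            )
        )
    return min(results, key=lambda r: r[0])


if __name__ == "__main__":
    rows = [(1.0, 5.0), (2.0, 3.0), (3.0, 5.0), (4.0, 3.0)]
    labels = ["a", "a", "b", "b"]
    print(best_split(rows, labels))
```

In this toy data set the first feature separates the two classes perfectly at threshold 2.0, so that split is selected. Every worker needs read access to the full data set and to the user-supplied criterion, which hints at why a flexible design complicates a distributed implementation.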

    On the Design of a Parallel Object-Oriented Data Mining Toolkit

    No full text
    As data mining techniques are applied to ever larger data sets, it is becoming clear that parallel processors will play an important role in reducing the turn-around time for data analysis. In this paper, we describe the design of a parallel object-oriented toolkit for mining scientific data sets. After a brief discussion of our design goals, we describe our overall system design that uses data mining to find useful information in raw data in an iterative and interactive manner. Using decision trees as an example, we illustrate how the need to support flexibility and extensibility can make the parallel implementation of our algorithms very challenging. We describe the solution approaches we are considering to address these challenges. As this is work in progress, we also present some preliminary results using an astronomy data set. 1. INTRODUCTION Parallel data mining is the exploitation of fine-grained parallelism in data mining, using tightly-coupled processors connected by a high-band..

    A Framework for Object-Oriented Data Mining Based on Higher-Order Logic Programming

    No full text