
    Phenotyping Risk Profiles of Substance Use and Exploring the Dynamic Transitions in Use Patterns: Machine Learning Models using the COMPASS Data

    Background: Polysubstance use is on the rise among Canadian youth. Examining risk profiles and understanding how transitions in use patterns occur can inform the design and implementation of polysubstance risk-reduction interventions. The COMPASS study is a longitudinal study of health-related behaviours among Canadian secondary school students that captures data from multiple sources. Machine learning (ML) techniques can reveal non-linear relationships and multivariate couplings in population-level longitudinal data and thereby inform public health policy.
    Objectives: The overarching goal of this thesis is to identify phenotypes of risk profiles for youth polysubstance use and to examine the dynamic transitions of use patterns over time, using both unsupervised ML methods and a latent variable modelling approach. The thesis also aims to understand how ML techniques are best used to model transitions and to discover “hidden” patterns in large, complex, population-based health survey data, using the COMPASS dataset as a showcase.
    Methods: A linked sample (N = 8,824) from three annual waves of COMPASS data, beginning with the 2016-17 school year, was used. Multiple imputation was used to handle missing values. Substance use indicators, including cigarette smoking, e-cigarette use, alcohol drinking, and marijuana consumption, were categorized as “never use,” “occasional use,” and “current use.” To identify phenotypes of risk profiles, hierarchical clustering, partitioning around medoids (PAM), and fuzzy clustering algorithms were applied, with the Boruta algorithm used to select a subset of features for cluster analysis. Both internal and external indices were used to evaluate clustering validity. A multivariate latent Markov model (LMM) was implemented to explore the dynamic transitions of use patterns over time, and the least absolute shrinkage and selection operator (LASSO) was applied to select the covariates entered into the LMM. Model selection was based on the Bayesian information criterion (BIC) and a goodness-of-fit test.
    Results: The top factors affecting youth polysubstance use included the number of smoking friends, the number of skipped classes, and the weekly money available to spend or save, among others. Four risk profiles of polysubstance use were identified across the three waves: low, medium-low, medium-high, and high risk. Heterogeneity in prevalence and phenotype across these four risk profiles was confirmed. The average silhouette width, used as the internal measure of clustering performance, ranged from 0.51 to 0.55 across the three waves and the different clustering algorithms. The algorithms achieved a relatively high degree of agreement on cluster membership: comparing fuzzy (FANNY) clustering with PAM clustering, the adjusted Rand indices were 0.9698, 0.7676, and 0.6452 for the three waves. Four distinct use patterns were identified: no use (S1), occasional single use of alcohol (S2), dual use of e-cigarettes and alcohol (S3), and current multi-use (S4), with initial probabilities of 0.5887, 0.2156, 0.1487, and 0.0470, respectively. The marginal distribution of S1 decreased while those of S3 and S4 increased over time, indicating a tendency towards greater substance use as students grew older. Most students generally remained in the same subgroup across time, particularly those in S4, which had the highest transition probability of remaining in the same subgroup (0.8668); those who did transition typically moved towards a more severe use pattern (e.g., from S3 to S4). The factors affecting initial membership in the use patterns and the dynamic transitions between them were multifaceted and complex, varying across the four use patterns and the three waves. Not only do use patterns change over time, but so does the evidence associated with them.
    Conclusion: As the first study of its kind to ascertain risk profiles and the dynamics of use patterns in youth polysubstance use by applying ML approaches to the COMPASS dataset, this thesis provides insights into the opportunities and possibilities ahead for ML in public health. Its findings can help practitioners in the field, such as school program managers and policymakers, develop interventions to prevent or remedy polysubstance use among youth.
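The adjusted Rand indices reported above measure how closely two clusterings (for example FANNY and PAM) of the same students agree, corrected for chance. As a minimal sketch, and not the thesis's own code (which is not given here), the Hubert-Arabie adjusted Rand index can be computed from the contingency table of two label vectors with the Python standard library alone; the function name `adjusted_rand_index` and the toy labels are our own assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two flat partitions."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two observations")

    # Pairs of observations placed together in both partitions (contingency cells).
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately (row and column sums).
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / comb(n, 2)  # expected index under random labelling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions a single cluster
        return 1.0
    return (index - expected) / (max_index - expected)


# Usage: two clusterings of the same six observations (label values are arbitrary).
pam_labels = [0, 0, 0, 1, 1, 2]
fuzzy_labels = [1, 1, 1, 0, 0, 0]
print(round(adjusted_rand_index(pam_labels, fuzzy_labels), 4))
```

Because only co-membership of pairs is counted, the index is invariant to how clusters are numbered: it equals 1 for identical partitions, is near 0 for chance-level agreement, and can be negative.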

    Methodology for high resolution spatial analysis of the physical flood susceptibility of buildings in large river floodplains

    The impacts of floods on buildings in urban areas are increasing due to the intensification of extreme weather events, unplanned or uncontrolled settlements and the rising vulnerability of assets. Several approaches are available for assessing flood damage to buildings and critical infrastructure. To date, however, these methods have been extremely difficult to apply widely because high-resolution approaches for classifying and characterising built structures are lacking. To overcome this obstacle, this work presents, first, a conceptual framework for understanding the physical flood vulnerability and physical flood susceptibility of buildings; second, a methodological framework that combines methods and tools for large-scale, high-resolution analysis; and third, the testing of the methodology at three pilot sites with different development conditions. The conceptual framework delineates the concepts of flood vulnerability, physical flood vulnerability and physical flood susceptibility, and their relation to social and economic vulnerabilities. It describes the key features that cause the physical flood susceptibility of buildings as a component of vulnerability. The methodological framework comprises three modules: (i) methods for setting up a building typology, (ii) methods for assessing the susceptibility of representative buildings of each building type and (iii) the integration of the two modules with technological tools. The first module, on building typology, is based on a classification of remote sensing data and GIS analysis involving seven building parameters that appeared relevant for classifying buildings with regard to potential flood impacts. The outcome is a building taxonomic approach. Representative buildings are subsequently identified using statistical analyses and membership functions.
The second module, on the susceptibility of representative buildings, is based on the derivation of depth-physical impact functions. It relates the principal building components, including their heights, dimensions and materials, to the damage caused by different water levels. Material susceptibility is estimated from international studies on the resistance of building materials and a fuzzy expert analysis. Depth-physical impact functions are then calculated for the principal building components that can be affected by different water levels; these functions serve to relate the water level to the physical impacts. The third module provides the tools for implementing the methodology. It comprises the architecture for feeding in the required building data, linked to the building typology and the building-type-specific depth-physical impact functions, to support automatic processing. The methodology is tested at three floodplain pilot sites: (i) the settlement of Barrio Sur in Magangué and (ii) the settlement of La Peña in Cicuco, both on the floodplain of the Magdalena River, Colombia, and (iii) a settlement in the city of Dresden on the Elbe River, Germany. The testing covers the description of data availability and accuracy, the steps for deriving the depth-physical impact functions of representative buildings and the final display of the spatial distribution of physical flood susceptibility. The discussion analyses the contributions of this work by evaluating the findings of the methodology testing against the dissertation goals.
The conclusions of the work set out the contributions and limitations of the research in terms of methodological and empirical advancements and its general applicability to flood risk management.
    Table of contents:
    1 INTRODUCTION
      1.1 Background
      1.2 State of the art
      1.3 Problem statement
      1.4 Objectives
      1.5 Approach and outline
    2 CONCEPTUAL FRAMEWORK
      2.1 Flood vulnerability
      2.2 Physical flood vulnerability
      2.3 Physical flood susceptibility
    3 METHODOLOGICAL FRAMEWORK
      3.1 Module 1: Building taxonomy for settlements
        3.1.1 Extraction of building features
        3.1.2 Derivation of building parameters for setting up a building taxonomy
        3.1.3 Selection of representative buildings for a building susceptibility assessment
      3.2 Module 2: Physical susceptibility of representative buildings
        3.2.1 Identification of building components
        3.2.2 Qualification of building material susceptibility
        3.2.3 Derivation of a depth-physical impact function
      3.3 Module 3: Technological integration
        3.3.1 Combination of the depth-physical impact function with the building taxonomic code
        3.3.2 Tools supporting the physical susceptibility analysis
        3.3.3 The users and their requirements
    4 RESULTS OF THE METHODOLOGY TESTING
      4.1 Pilot site “Kleinzschachwitz” – Dresden, Germany – Elbe River
        4.1.1 Module 1: Building taxonomy – “Kleinzschachwitz”
        4.1.2 Module 2: Physical susceptibility of representative buildings – “Kleinzschachwitz”
        4.1.3 Module 3: Technological integration – “Kleinzschachwitz”
      4.2 Pilot site “La Peña” – Cicuco, Colombia – Magdalena River
        4.2.1 Module 1: Building taxonomy – “La Peña”
        4.2.2 Module 2: Physical susceptibility of representative buildings – “La Peña”
        4.2.3 Module 3: Technological integration – “La Peña”
      4.3 Pilot site “Barrio Sur” – Magangué, Colombia – Magdalena River
        4.3.1 Module 1: Building taxonomy – “Barrio Sur”
        4.3.2 Module 2: Physical susceptibility of representative buildings – “Barrio Sur”
        4.3.3 Module 3: Technological integration – “Barrio Sur”
      4.4 Empirical findings
        4.4.1 Empirical findings of Module 1
        4.4.2 Empirical findings of Module 2
        4.4.3 Empirical findings of Module 3
        4.4.4 Guidance of the methodology
    5 DISCUSSION
      5.1 Discussion on the conceptual framework
      5.2 Discussion on the methodological framework
        5.2.1 Discussion on Module 1: the building taxonomic approach
        5.2.2 Discussion on Module 2: the depth-physical impact function
    6 CONCLUSIONS AND OUTLOOK
      6.1 Conclusions
      6.2 Outlook
    REFERENCES
    INDEX OF FIGURES
    INDEX OF TABLES
    APPENDICES
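The second module relates water depth to physical damage through the heights and materials of a representative building's components. A minimal sketch of that idea, under loudly stated assumptions: each component gets a hypothetical vertical extent, a material susceptibility score in [0, 1] and a weight, and the impact at a given depth is the weighted, susceptibility-scaled fraction of each component lying below the water line. The `Component` class, the linear submersion rule and all numbers are illustrative and are not taken from the dissertation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical building component (all fields are illustrative assumptions)."""
    name: str
    bottom_m: float        # lower edge above ground level, in metres
    top_m: float           # upper edge above ground level, in metres
    susceptibility: float  # material susceptibility score in [0, 1]
    weight: float          # share of the whole building; weights should sum to 1


def submerged_fraction(component: Component, depth_m: float) -> float:
    """Fraction of the component's vertical extent lying below the water level."""
    if depth_m <= component.bottom_m:
        return 0.0
    if depth_m >= component.top_m:
        return 1.0
    return (depth_m - component.bottom_m) / (component.top_m - component.bottom_m)


def depth_impact(components: list[Component], depth_m: float) -> float:
    """Physical impact index in [0, 1] for one water depth (linear submersion rule)."""
    return sum(c.weight * c.susceptibility * submerged_fraction(c, depth_m)
               for c in components)


def impact_curve(components: list[Component], depths: list[float]) -> list[tuple[float, float]]:
    """Tabulate the assumed depth-impact relation for one representative building."""
    return [(d, depth_impact(components, d)) for d in depths]


# Usage with made-up values for one representative building type.
building = [
    Component("floor", 0.0, 0.2, 0.5, 0.2),
    Component("walls", 0.0, 2.8, 0.8, 0.6),
    Component("windows", 0.9, 2.1, 1.0, 0.2),
]
for depth, impact in impact_curve(building, [0.0, 0.5, 1.0, 2.0, 3.0]):
    print(f"{depth:.1f} m -> impact {impact:.3f}")
```

Keying the curve to a building type rather than to individual buildings mirrors the idea of attaching one function per taxonomic code, so that every building of that type can be scored automatically once its code is known.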

    Qluster: An easy-to-implement generic workflow for robust clustering of health data

    The exploration of health data by clustering algorithms makes it possible to better describe the populations of interest by seeking the sub-profiles that compose them. This reinforces medical knowledge, whether about a disease or a targeted population in real life. Nevertheless, in contrast to conventional biostatistical methods, for which numerous guidelines exist, the standardization of data science approaches in clinical research remains a little-discussed subject. This results in significant variability in the execution of data science projects, in terms of both the algorithms used and the reliability and credibility of the designed approach. Taking the path of a parsimonious and judicious choice of both algorithms and implementations at each stage, this article proposes Qluster, a practical workflow for performing clustering tasks. The workflow strikes a compromise between (1) genericity of application (e.g. usable on small or big data, on continuous, categorical or mixed variables, and on high-dimensional or lower-dimensional databases), (2) ease of implementation (need for few packages, few algorithms, few parameters, ...), and (3) robustness (e.g. use of proven algorithms and robust packages, evaluation of cluster stability, management of noise and multicollinearity). This workflow can be easily automated and/or routinely applied to a wide range of clustering projects. It can be useful both for data scientists with little experience in the field, to make data clustering easier and more robust, and for more experienced data scientists looking for a straightforward and reliable solution for routine preliminary data mining. A synthesis of the literature on data clustering and the scientific rationale supporting the proposed workflow are also provided. Finally, a detailed application of the workflow to a concrete use case is presented, along with a practical discussion for data scientists.
An implementation on the Dataiku platform is available from the authors upon request.
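One of the robustness criteria listed above is evaluating the stability of clusters. A minimal sketch of one common way to do this, bootstrap resampling scored with the Jaccard similarity in the spirit of Hennig's `clusterboot` function (R package fpc), is shown below in plain Python. It is not Qluster's implementation; the simple alternating k-medoids routine stands in for full PAM, and all function names are hypothetical.

```python
import math
import random
from statistics import mean


def k_medoids(points, k, max_iter=100):
    """Alternating (Voronoi-iteration) k-medoids returning one label per point.

    A simpler heuristic than the full PAM BUILD/SWAP search, used here for brevity.
    """
    n = len(points)

    def dist(i, j):
        return math.dist(points[i], points[j])

    # Deterministic start: the most central point, then farthest-first picks.
    medoids = [min(range(n), key=lambda i: sum(dist(i, j) for j in range(n)))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist(i, m) for m in medoids)))
    labels = []
    for _ in range(max_iter):
        # Assign every point to its nearest medoid.
        labels = [min(range(k), key=lambda c: dist(i, medoids[c])) for i in range(n)]
        # Move each medoid to the member with the smallest total distance to its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(members, key=lambda i: sum(dist(i, j) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels


def clusters_as_sets(labels, original_index):
    """Group original point indices by cluster label."""
    groups = {}
    for label, idx in zip(labels, original_index):
        groups.setdefault(label, set()).add(idx)
    return list(groups.values())


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def bootstrap_stability(points, k, n_boot=50, seed=0):
    """Mean best-match Jaccard similarity of each original cluster over bootstrap refits."""
    rng = random.Random(seed)
    n = len(points)
    base = clusters_as_sets(k_medoids(points, k), range(n))
    scores = [[] for _ in base]
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        boot = clusters_as_sets(k_medoids([points[i] for i in idx], k), idx)
        present = set(idx)
        for c, cluster in enumerate(base):
            restricted = cluster & present  # compare only on points drawn this round
            if restricted:
                scores[c].append(max(jaccard(restricted, b) for b in boot))
    return [mean(s) if s else float("nan") for s in scores]


# Usage: two well-separated toy groups of ten 2-D points each.
data = ([(i * 0.1, (i % 3) * 0.1) for i in range(10)]
        + [(10 + i * 0.1, 10 + (i % 3) * 0.1) for i in range(10)])
print(bootstrap_stability(data, k=2, n_boot=30, seed=1))
```

Mean Jaccard values close to 1 indicate clusters that are recovered consistently across resamples; where to draw the line between "stable" and "dissolved" is a judgment call that should be fixed before looking at the results.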

    Pertanika Journal of Science & Technology

    Sustainable Agriculture and Advances of Remote Sensing (Volume 2)

    Agriculture, as the main source of food and the most important economic activity globally, is being affected by the impacts of climate change. To maintain and increase global food production, reduce biodiversity loss and preserve natural ecosystems, new practices and technologies are required. This book focuses on the latest advances in remote sensing technology and agricultural engineering that support sustainable agricultural practices. Earth observation data, in situ data and proxy remote sensing data are the main sources of information for monitoring and analyzing agricultural activities. Particular attention is given to Earth observation satellites and the Internet of Things for data collection, to multispectral and hyperspectral data analysis using machine learning and deep learning, and to WebGIS and the Internet of Things for sharing and publishing results, among other topics.

    A survey on generative adversarial networks for imbalance problems in computer vision tasks

    Developing any computer vision application starts with acquiring images and data, followed by preprocessing and pattern recognition steps to perform a task. When the acquired images are highly imbalanced or inadequate, the desired task may not be achievable. Unfortunately, imbalance in acquired image datasets is inevitable in certain complex real-world problems such as anomaly detection, emotion recognition, medical image analysis, fraud detection, metallic surface defect detection and disaster prediction. The performance of computer vision algorithms can deteriorate significantly when the training dataset is imbalanced. In recent years, Generative Adversarial Networks (GANs) have attracted considerable attention from researchers across a variety of application domains because of their capability to model complex real-world image data. Importantly, GANs can not only generate synthetic images; their adversarial learning idea has also shown good potential for restoring balance to imbalanced datasets. In this paper, we examine the most recent developments in GAN-based techniques for addressing imbalance problems in image data. The real-world challenges and implementations of GAN-based synthetic image generation are extensively covered in this survey. Our survey first introduces various imbalance problems in computer vision tasks and their existing solutions, and then examines key concepts such as deep generative image models and GANs. We then propose a taxonomy that groups GAN-based techniques for addressing imbalance problems in computer vision tasks into three major categories: (1) image-level imbalances in classification, (2) object-level imbalances in object detection and (3) pixel-level imbalances in segmentation. We elaborate on the imbalance problems of each group and present GAN-based solutions for each.
Readers will understand how GAN-based techniques can handle imbalance problems and boost the performance of computer vision algorithms.
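Using a GAN to restore balance at the image level starts from deciding how many synthetic images each class needs. As a minimal sketch, assuming simple oversampling of every class up to the size of the largest class (one possible target among several), the imbalance ratio and per-class generation quotas can be computed as follows. The GAN that would actually produce the images is out of scope, and the function names and label counts are hypothetical.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the largest to the smallest class count (1.0 means balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def synthetic_quota(labels, target=None):
    """Number of synthetic samples to generate per class to reach `target` each.

    By default every class is topped up to the size of the majority class;
    classes already at or above the target get zero.
    """
    counts = Counter(labels)
    if target is None:
        target = max(counts.values())
    return {cls: max(0, target - n) for cls, n in counts.items()}


# Usage with an invented surface-defect label set (950 normal, 50 defective images).
labels = ["normal"] * 950 + ["defect"] * 50
print(imbalance_ratio(labels))   # 19.0
print(synthetic_quota(labels))   # {'normal': 0, 'defect': 900}
```

Passing a smaller `target` trades full balance for fewer generated images, which can matter when the synthetic samples are of uncertain quality.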