1,324 research outputs found

    Deep learning models for road passability detection during flood events using social media data

    During natural disasters, responders need situational awareness to understand conditions on the ground and act accordingly. A key need is identifying open roads for transporting emergency support to victims. This can be done by analyzing photos from affected areas with known locations. This paper studies the problem of detecting blocked or open roads from photos during floods using a two-step, classifier-based approach: does the image show evidence of a road? If it does, is the road passable or not? We propose a single double-ended neural network (NN) architecture that addresses both tasks at the same time. Both problems are treated as single-class classification problems through the use of a compactness loss. The study is performed on a set of tweets, posted during flooding events, that contain (i) metadata and (ii) visual information. We study the usefulness of each source of data and of their combination. Finally, we study the performance gain from ensembling different networks. The experimental results show that the proposed double-ended NN makes the model almost twice as fast and lighter in memory while improving the results relative to training two separate networks to solve each problem independently.
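The two ingredients named in this abstract, a compactness loss and a two-step road/passability decision, can be illustrated with a minimal sketch. The loss below follows one common formulation from the one-class deep-feature literature (mean squared distance of each feature vector to the mean of the other vectors in the batch); whether the paper uses exactly this form is an assumption. The function names and the 0.5 threshold are hypothetical.

```python
from typing import Optional, Sequence


def compactness_loss(batch: Sequence[Sequence[float]]) -> float:
    """Mean squared distance of each feature vector to the mean of the others.

    Assumed formulation: l_C = 1/(n*k) * sum_i ||x_i - m_i||^2, where m_i is
    the mean of the other n-1 vectors in the batch and k is the feature size.
    """
    n = len(batch)
    if n < 2:
        raise ValueError("need at least two feature vectors")
    k = len(batch[0])
    # Per-dimension totals let us get the leave-one-out mean cheaply.
    totals = [sum(vec[d] for vec in batch) for d in range(k)]
    loss = 0.0
    for vec in batch:
        for d in range(k):
            mean_of_others = (totals[d] - vec[d]) / (n - 1)
            loss += (vec[d] - mean_of_others) ** 2
    return loss / (n * k)


def road_status(p_road: float, p_passable: float,
                threshold: float = 0.5) -> Optional[bool]:
    """Two-step decision: None if no road is visible, else passable or not.

    p_road and p_passable stand in for the two outputs of a double-ended
    network; the threshold is an illustrative choice.
    """
    if p_road < threshold:
        return None
    return p_passable >= threshold
```

A tightly clustered batch gives a loss near zero, which is what a single-class objective rewards; the second function only shows how the two outputs would be chained at prediction time.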

    Generalized weighting for bagged ensembles

    Ensemble learning is a popular classification method in which many individual simple learners contribute to a final prediction. Constructing an ensemble of learners has been shown to consistently improve prediction accuracy over a single learner. The most common types of ensembles are bootstrap aggregated (bagged), boosted, and stacked. Each is different, yet all share the same foundation of combining multiple learners. In this dissertation, we focus on bagged ensembles; namely, we propose a generalization by way of model weighting. The new method is motivated by the potential instability of averaging predictions of trees that may be of highly variable quality. To alleviate this, we replace the usual arithmetic average with a Cesàro average for weighted trees in the random forest. We provide both a theoretical analysis that gives exact conditions under which we would expect this weighted ensemble approach to do well, and a numerical analysis that shows the new approach is competitive with other bagged ensembles when training a classification model on numerous realistic data sets. Going a step further, we generalize our weights to allow simultaneous control over bias and variance. In particular, we introduce a regularization term that controls the variance reduction for bagged ensembles. The result is a new tunable weighted bagged ensemble framework that provides a very flexible method for classification. Using this methodology, we explore the impact tunable weighting has on the votes of each learner in an ensemble. To aid the applicability of this body of work, the author discusses an R package, titled wbensembleR, that allows users to apply the proposed weighting scheme to arbitrary bagged ensembles by providing tools for constructing tunable bagged ensembles in the form of weights.
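To make the idea of replacing the arithmetic average with a Cesàro average concrete, here is a minimal sketch under one stated assumption: the trees are sorted from best to worst (for example by out-of-bag accuracy), and the ensemble prediction is the average of the running means of their predictions. Expanding that average gives tree i the weight (1/n) Σ_{k≥i} 1/k, so better-ranked trees count more. The function names are hypothetical, and this is not the wbensembleR API.

```python
from typing import List, Sequence


def cesaro_weights(n: int) -> List[float]:
    """Weights implied by averaging the running means of n ordered predictions.

    w_i = (1/n) * sum_{k=i}^{n} 1/k for i = 1..n; the weights sum to 1 and
    decrease with i, so earlier (better-ranked) trees get more weight.
    """
    if n < 1:
        raise ValueError("n must be positive")
    weights = [0.0] * n
    tail = 0.0
    for i in range(n, 0, -1):  # accumulate the harmonic tail from the end
        tail += 1.0 / i
        weights[i - 1] = tail / n
    return weights


def weighted_prediction(sorted_preds: Sequence[float]) -> float:
    """Combine per-tree predictions (best tree first) with Cesàro weights."""
    weights = cesaro_weights(len(sorted_preds))
    return sum(w * p for w, p in zip(weights, sorted_preds))
```

For four trees the weights are roughly 0.52, 0.27, 0.15, and 0.06, compared with 0.25 each under plain bagging.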

    Multi-tier framework for the inferential measurement and data-driven modeling

    A framework for inferential measurement and data-driven modeling has been proposed and assessed in several real-world application domains. The architecture of the framework has been structured in multiple tiers to facilitate extensibility and the integration of new components. Each of the four proposed tiers has been assessed in an uncoupled way to verify its suitability. The first tier, dealing with exploratory data analysis, has been assessed through the characterization of the chemical space related to the biodegradation of organic chemicals. This analysis has established relationships between physicochemical variables and biodegradation rates that have been used for model development. At the preprocessing level, a novel method for feature selection based on dissimilarity measures between Self-Organizing Maps (SOM) has been developed and assessed. The proposed method selected more features than other methods published in the literature but led to models with improved predictive power. Single and multiple data imputation techniques based on the SOM have also been used to recover missing data in a wastewater treatment plant benchmark. A new dynamic method to adjust the centers and widths in radial basis function networks has been proposed to predict water quality. The proposed method outperformed other neural networks. The proposed modeling components have also been assessed in the development of prediction and classification models for biodegradation rates in different media. The results obtained proved the suitability of this approach for developing data-driven models when the complex dynamics of the process prevent the formulation of mechanistic models. The use of rule generation algorithms and Bayesian dependency models has been preliminarily screened to provide the framework with interpretation capabilities. 
Preliminary results obtained from the classification of Modes of Toxic Action (MOA) indicate that using MOAs as proxy indicators of the human health effects of chemicals could be a promising approach. Finally, the complete framework has been applied to three different modeling scenarios. A virtual sensor system, capable of inferring product quality indices from primary process variables, has been developed and assessed. The system was integrated with the control system in a real chemical plant, outperforming the multi-linear correlation models usually adopted by chemical manufacturers. A model to predict carcinogenicity from molecular structure for a set of aromatic compounds has been developed and tested. Applying the SOM-dissimilarity feature selection method yielded better results than models published in the literature. Finally, the framework has been used to facilitate a new approach for environmental modeling and risk management within geographical information systems (GIS). The SOM has been successfully used to characterize exposure scenarios and to provide estimations of missing data through geographic interpolation. The combination of SOM and Gaussian mixture models facilitated the formulation of a new probabilistic risk assessment approach. This thesis proposes, and evaluates in several real applications, a general framework for developing inferential measurement and data-driven modeling systems. The architecture of the framework is organized in several layers that facilitate its extensibility and the integration of new components. Each of the four levels into which the proposed framework is structured has been evaluated independently to verify its functionality. The first level, which deals with exploratory data analysis, has been evaluated through the characterization of the chemical space corresponding to the biodegradation of certain organic compounds. 
As a result of this analysis, relationships have been established between several physicochemical variables, which were subsequently used to develop biodegradation models. At the data preprocessing level, a new methodology for variable selection based on Self-Organizing Maps (SOM) has been developed and evaluated. Although the proposed method generally selects a larger number of variables than other methods proposed in the literature, the resulting models show better predictive capability. A set of SOM-based data imputation techniques has also been evaluated on a standard data set corresponding to the operating parameters of a wastewater treatment plant. A new dynamic model for adjusting the center and spread in radial basis function networks is proposed and evaluated on a water quality prediction problem. The proposed method improves on the results obtained with other neural architectures. The proposed modeling components have also been applied to the development of predictive and classification models of the biodegradation rates of organic compounds in different media. The results obtained demonstrate the viability of this approach for developing data-driven models in cases where the complexity of the process dynamics prevents the formulation of mechanistic models. A preliminary study has been carried out on the use of rule generation algorithms and Bayesian dependency graphs to introduce a new layer that facilitates the interpretation of the models. Preliminary results obtained from the classification of Modes of Toxic Action (MOA) suggest that using MOAs as intermediate indicators of the health effects of chemical compounds is a feasible approach. Finally, the proposed framework has been applied in three different modeling scenarios. 
First, a virtual sensor capable of inferring quality indices from primary process variables has been developed and evaluated. The resulting sensor has been implemented in a real chemical plant, improving on the results of the multilinear correlations commonly used. A model to predict the carcinogenic effects of a group of aromatic compounds from their molecular structure has been developed and evaluated. The results obtained after applying the SOM-based variable selection method improve on previously published results. This framework has also been used to provide a new approach to environmental modeling and risk analysis with geographic information systems (GIS). The SOM has been used to characterize exposure scenarios and to develop a new geographic interpolation method. The combination of the SOM with Gaussian mixture models gives a new formulation of the risk analysis problem from a probabilistic point of view.
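The SOM-based recovery of missing data mentioned in this abstract can be sketched as follows. This is a minimal single-imputation illustration, assuming an already trained codebook: the best matching unit is found using only the observed variables, and the missing entries are filled from that unit's prototype vector. The codebook values and function names are hypothetical, and the thesis may use a different variant.

```python
import math
from typing import List, Optional, Sequence


def best_matching_unit(codebook: Sequence[Sequence[float]],
                       sample: Sequence[Optional[float]]) -> int:
    """Index of the prototype closest to the sample on its observed entries."""
    best_index, best_dist = -1, math.inf
    for index, unit in enumerate(codebook):
        # Squared Euclidean distance, skipping missing (None) components.
        dist = sum((u - s) ** 2 for u, s in zip(unit, sample) if s is not None)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def impute(codebook: Sequence[Sequence[float]],
           sample: Sequence[Optional[float]]) -> List[float]:
    """Fill missing entries with the values of the best matching prototype."""
    unit = codebook[best_matching_unit(codebook, sample)]
    return [u if s is None else s for u, s in zip(unit, sample)]
```

With a toy codebook `[[0, 0, 0], [1, 1, 1], [5, 5, 5]]`, the sample `[0.9, None, 1.2]` matches the second prototype, so the missing value is filled with 1.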

    Ensemble of classifiers based data fusion of EEG and MRI for diagnosis of neurodegenerative disorders

    The prevalence of Alzheimer's disease (AD), Parkinson's disease (PD), and mild cognitive impairment (MCI) is rising at an alarming rate as the average age of the population increases, especially in developing nations. The efficacy of new medical treatments critically depends on the ability to diagnose these diseases at the earliest stages. To facilitate the availability of early diagnosis in community hospitals, an accurate, inexpensive, and noninvasive diagnostic tool must be made available. As biomarkers, the event related potentials (ERP) of the electroencephalogram (EEG), which have previously shown promise in automated diagnosis, together with volumetric magnetic resonance imaging (MRI), are relatively low-cost and readily available tools that can be used for automated diagnosis. 16-electrode EEG data were collected from 175 subjects afflicted with Alzheimer's disease, Parkinson's disease, or mild cognitive impairment, as well as from non-disease (normal control) subjects. T2 weighted MRI volumetric data were also collected from 161 of these subjects. Feature extraction methods were used to separate diagnostic information from the raw data. The EEG signals were decomposed using the discrete wavelet transform in order to isolate informative frequency bands. The MR images were processed through segmentation software to provide volumetric data of various brain regions in order to quantify potential brain tissue atrophy. Both of these data sources were utilized in a pattern recognition based classification algorithm to serve as a diagnostic tool for Alzheimer's and Parkinson's disease. Support vector machine and multilayer perceptron classifiers were used to create a classification algorithm trained with the EEG and MRI data. Extracted features were used to train individual classifiers, each learning a particular subset of the training data, whose decisions were combined using decision level fusion. 
Additionally, a severity analysis was performed to distinguish between various stages of AD as well as a cognitively normal state. The study found that EEG and MRI data hold complementary information for the diagnosis of AD as well as PD. Using both data types with decision level fusion improves diagnostic accuracy over that of each individual data source. For AD-only diagnosis, ERP data alone provided a 78% diagnostic performance, MRI alone 89%, and ERP and MRI combined 94%. For PD-only diagnosis, ERP-only performance was 67%, MRI-only was 70%, and combined performance was 78%. MCI-only diagnosis exhibited a similar effect, with a 71% ERP performance, 82% MRI performance, and 85% combined performance. Diagnosis among three subject groups showed the same trend. For PD, AD, and normal diagnosis, ERP-only performance was 43%, MRI-only was 66%, and combined performance was 71%. The severity analysis for mild AD, severe AD, and normal subjects showed the same combined effect.
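The wavelet decomposition step described in this abstract can be illustrated with a minimal multi-level discrete wavelet transform. The abstract does not name the wavelet, so the Haar wavelet here is an assumption chosen for brevity, and the function names are hypothetical. For a signal sampled at rate fs, the level-j detail coefficients cover roughly the band from fs/2^(j+1) to fs/2^j, which is how the transform isolates frequency bands.

```python
import math
from typing import List, Sequence, Tuple


def haar_step(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One Haar DWT level: scaled pairwise sums (approximation) and differences (detail)."""
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_dwt(signal: Sequence[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Multi-level Haar DWT.

    Returns the final approximation and a list of detail bands, finest
    (highest-frequency) band first.
    """
    approx = list(signal)
    details: List[List[float]] = []
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details
```

Because the Haar transform is orthonormal, the total energy of the coefficients equals the energy of the input signal, which gives a quick sanity check on the decomposition.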

    Tiger sharks support the characterization of the world’s largest seagrass ecosystem

    Seagrass conservation is critical for mitigating climate change due to the large stocks of carbon they sequester in the seafloor. However, effective conservation and its potential to provide nature-based solutions to climate change is hindered by major uncertainties regarding seagrass extent and distribution. Here, we describe the characterization of the world’s largest seagrass ecosystem, located in The Bahamas. We integrate existing spatial estimates with an updated empirical remote sensing product and perform extensive ground-truthing of seafloor with 2,542 diver surveys across remote sensing tiles. We also leverage seafloor assessments and movement data obtained from instrument-equipped tiger sharks, which have strong fidelity to seagrass ecosystems, to augment and further validate predictions. We report a consensus area of at least 66,000 km² and up to 92,000 km² of seagrass habitat across The Bahamas Banks. Sediment core analysis of stored organic carbon further confirmed the global relevance of the blue carbon stock in this ecosystem. Data from tiger sharks proved important in supporting mapping and ground-truthing remote sensing estimates. This work provides evidence of major knowledge gaps in the ocean ecosystem, the benefits in partnering with marine animals to address these gaps, and underscores support for rapid protection of oceanic carbon sinks.

    Tiger sharks support the characterization of the world’s largest seagrass ecosystem

    Seagrass conservation is critical for mitigating climate change due to the large stocks of carbon they sequester in the seafloor. However, effective conservation and its potential to provide nature-based solutions to climate change is hindered by major uncertainties regarding seagrass extent and distribution. Here, we describe the characterization of the world’s largest seagrass ecosystem, located in The Bahamas. We integrate existing spatial estimates with an updated empirical remote sensing product and perform extensive groundtruthing of seafloor with 2,542 diver surveys across remote sensing tiles. We also leverage seafloor assessments and movement data obtained from instrument-equipped tiger sharks, which have strong fidelity to seagrass ecosystems, to augment and further validate predictions. We report a consensus area of at least 66,000 km² and up to 92,000 km² of seagrass habitat across The Bahamas Banks. Sediment core analysis of stored organic carbon further confirmed the global relevance of the blue carbon stock in this ecosystem. Data from tiger sharks proved important in supporting mapping and groundtruthing remote sensing estimates. This work provides evidence of major knowledge gaps in the ocean ecosystem, the benefits in partnering with marine animals to address these gaps, and underscores support for rapid protection of oceanic carbon sinks. Beneath The Waves; Disney Conservation Fund; Barry and Mimi Sternlicht Foundation; Sant Family; Pictet Foundation; Pacific Treasure Foundation; King Family; D. and J. Harris, B. Coughlin and Family; P. Nicholson and WCPD Foundation; Southern Tide; Hillsdale; Thayer Academy; Discovery Communications; Mary O'Malley and Lupo Dion Trust; National Geographic Society; J. Lake and JDL, Inc.; Towle Family Ocean Foundation; Karo Family Trust; Science Foundation Ireland 18/SIRG/5549; King Abdullah University of Science & Technology; 36Pi

    Decision-based data fusion of complementary features for the early diagnosis of Alzheimer's disease

    As the average life expectancy increases, particularly in developing countries, the prevalence of Alzheimer's disease (AD), which is the most common form of dementia worldwide, has increased dramatically. As there is no cure to stop or reverse the effects of AD, early diagnosis and detection are of utmost concern. Recent pharmacological advances have shown the ability to slow the progression of AD; however, the efficacy of these treatments depends on the ability to detect the disease at the earliest stage possible. Many patients are limited to small community clinics by geographic and/or financial constraints. Making diagnosis possible at these clinics through an accurate, inexpensive, and noninvasive tool is of great interest. Many tools have been shown to be effective at the early diagnosis of AD. This study focuses on three in particular: event-related potentials (ERPs) in electroencephalogram (EEG) recordings, magnetic resonance imaging (MRI), and positron emission tomography (PET). These biomarkers have been shown to contain diagnostically useful information regarding the development of AD in an individual. The combination of these biomarkers, if they provide complementary information, can boost the overall diagnostic accuracy of an automated system. EEG data acquired from an auditory oddball paradigm, along with volumetric T2 weighted MRI data and PET imagery representative of metabolic glucose activity in the brain, were collected from a cohort of 447 patients, along with other biomarkers and metrics relating to neurodegenerative disease. This study in particular focuses on AD versus control diagnostic ability from the cohort, in addition to AD severity analysis. An assortment of feature extraction methods was employed to extract diagnostically relevant information from raw data. EEG signals were decomposed into frequency bands of interest through the discrete wavelet transform (DWT). 
MRI images were reprocessed to provide volumetric representations of specific regions of interest in the cranium. The PET imagery was segmented into regions of interest representing glucose metabolic rates within the brain. Multi-layer perceptron neural networks were used as the base classifier for the augmented stacked generalization algorithm, creating three overall biomarker experts for AD diagnosis. The features extracted from each biomarker were used to train classifiers on various subsets of the cohort data; the decisions from these classifiers were then combined to achieve decision-based data fusion. This study found that EEG, MRI, and PET data each hold complementary information for the diagnosis of AD. The use of all three in tandem provides greater diagnostic accuracy than using any single biomarker alone. The highest accuracy obtained through the EEG expert was 86.1 ±3.2%, with MRI and PET reaching 91.1 ±3.2% and 91.2 ±3.9%, respectively. The maximum diagnostic accuracy of these systems averaged 95.0 ±3.1% when all three biomarkers were combined through the decision fusion algorithm described in this study. The severity analysis for AD showed similar results, with combination performance exceeding that of any biomarker expert alone.
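Decision-based fusion of several biomarker experts can be sketched in a few lines. The study itself uses an augmented stacked generalization algorithm; the example below swaps in a simpler weighted sum rule over per-expert class probabilities, purely to show how separate EEG, MRI, and PET decisions might be combined. The probabilities, weights, and function name are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def fuse_decisions(expert_probs: Dict[str, List[float]],
                   weights: Optional[Dict[str, float]] = None) -> Tuple[int, List[float]]:
    """Weighted sum-rule fusion of per-expert class probabilities.

    Returns the index of the winning class and the fused probability vector.
    With no weights given, every expert contributes equally.
    """
    if not expert_probs:
        raise ValueError("at least one expert is required")
    if weights is None:
        weights = {name: 1.0 for name in expert_probs}
    total = sum(weights[name] for name in expert_probs)
    n_classes = len(next(iter(expert_probs.values())))
    fused = [0.0] * n_classes
    for name, probs in expert_probs.items():
        for cls, prob in enumerate(probs):
            fused[cls] += weights[name] * prob / total
    label = max(range(n_classes), key=fused.__getitem__)
    return label, fused
```

In practice the weights could reflect each expert's validation accuracy; a stacked approach would instead train a second-level classifier on the experts' outputs.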