Machine Learning for Fluid Mechanics
The field of fluid mechanics is rapidly advancing, driven by unprecedented
volumes of data from field measurements, experiments and large-scale
simulations at multiple spatiotemporal scales. Machine learning offers a wealth
of techniques to extract information from data that could be translated into
knowledge about the underlying fluid mechanics. Moreover, machine learning
algorithms can augment domain knowledge and automate tasks related to flow
control and optimization. This article presents an overview of past history,
current developments, and emerging opportunities of machine learning for fluid
mechanics. It outlines fundamental machine learning methodologies and discusses
their uses for understanding, modeling, optimizing, and controlling fluid
flows. The strengths and limitations of these methods are addressed from the
perspective of scientific inquiry that considers data as an inherent part of
modeling, experimentation, and simulation. Machine learning provides a powerful
information processing framework that can enrich, and possibly even transform,
current lines of fluid mechanics research and industrial applications.
Comment: To appear in the Annual Reviews of Fluid Mechanics, 202
Financial time series analysis with competitive neural networks
The main objective of this Master’s thesis is the modelling of non-stationary time series data. While classical statistical models attempt to correct non-stationary data through differencing and de-trending, I attempt to create localized clusters of stationary time series data using the self-organizing map (SOM) algorithm. While numerous techniques have been developed that model time series using the self-organizing map, I attempt to build a mathematical framework that justifies its use in forecasting financial time series. Additionally, I compare existing SOM-based forecasting methods with those for which a mathematical framework has been developed but which have not been applied in a forecasting context. I then compare these methods with the well-known ARIMA method of time series forecasting. The second objective of this thesis is to demonstrate the self-organizing map’s ability to cluster data vectors, as it was originally developed as a neural network approach to clustering. Specifically, I demonstrate its clustering abilities on limit order book data and present various methods for visualizing its output.
Nonlinear data driven techniques for process monitoring
The goal of this research is to develop process monitoring technology capable of taking advantage of the large stores of data accumulating in modern chemical plants. There is demand for new techniques for monitoring non-linear topology and behavior, and this research presents a topology-preserving method for process monitoring using Self Organizing Maps (SOM). The novel architecture adapts SOM to a full spectrum of process monitoring tasks, including fault detection, fault identification, fault diagnosis, and soft sensing. The key innovation of the new technique is its use of multiple SOMs (MSOM) in the data modeling process, together with a Gaussian Mixture Model (GMM) to model the probability density function of classes of data. For comparison, a linear process monitoring technique based on Principal Component Analysis (PCA) is also used to demonstrate the improvements SOM offers. Data for the computational experiments were generated using a simulation of the Tennessee Eastman process (TEP) created in Simulink by Ricker (1996). Previous studies focus on step changes from normal operations, but this work adds operating regimes with time-dependent dynamics not previously considered with a SOM. Results show that, for fault diagnosis, MSOM improves upon both linear PCA and the standard single-map SOM technique, and that it is better at isolating which variables in the data are responsible for the faulty condition. With respect to soft sensing, SOM and MSOM modeled the compositions equally well, showing that no information was lost in dividing the map representation of process data. Future research will attempt to validate the technique on a real chemical process.
Complexity and Uncertainty in Human and Ecological Risk Assessment
Multiple interacting stressors in the environment present increasingly complex risks to human health. Too often, however, the data required for traditional risk assessment are either lacking or unavailable at the necessary spatial or temporal scale. In addition, assessment practices and management policies need to move away from single factor approaches in order to accommodate the reality of complex chemical mixtures and environmental stressors. Recent literature suggests that a paradigm shift is under way. This points to a need for the development of new techniques both for rapid data collection and flexible risk assessment strategies that can adapt to make use of readily available data. This dissertation presents two types of methods for improving the risk assessment process given these evolving challenges: predictive analytics and integrated effect-directed toxicity screening.
The first technique addresses the characterization of environmental health using toxicological screening tools. Environmental influences on ecological and human health are often studied using indicators that represent important risk components such as chemical contamination, hazards, exposures, and biological stress. Unfortunately, studies are frequently constrained by the lack of calibrated indicators constructed from standardized metrics.
The second technique is a novel method for population-level risk assessment that uses self-organizing feature maps (SOM) to generate multivariate clusters of cause-of-death and birth outcome metrics, in combination with supervised learning risk-propagation modelling to evaluate the predictability of available indicators. I apply this method to identify exposure-outcome linkages at the county level for Wisconsin, USA and for civil divisions in Dobrogea, Romania, thereby providing a dynamic visualization of public health risk relationships with behavioral risk factors (e.g. smoking, heavy drinking) and environmental factors (e.g. land cover, nitrates and faecal coliform in drinking water). These risk relationships do not demonstrate cause and effect, but they provide guidance for targeted investigations and for risk-management prioritization.
To investigate a unique way of measuring environmental health, a sediment contact assay using zebrafish (Danio rerio) embryos was adapted from Hollert et al. (2003) as an indicator of teratogenic stress within river sediments. Sediment samples were collected from Lake Michigan tributary watersheds. Sediment contact assay responses were then compared to the prevalence of congenital heart disease (CHD) and vital statistic birth indicators aggregated from civil divisions associated with these same watersheds. Significant risk relationships were detected between variation in early life-stage (ELS) endpoints of zebrafish embryos 72 hours post-fertilization and the birth prevalence of human congenital heart disease and infant mortality. Examination of principal components of ELS endpoints suggests that variance related to zebrafish embryonic heart and circulatory malformations is most closely associated with human CHD prevalence.
This study demonstrates a novel application of effect-based toxicity testing for ecological and human health risk assessments. These results support the hypothesis that bioassays normally used for ecological screening can be useful as indicators of environmental stress to humans, expanding our understanding of environmental-human health linkages. Finally, next steps and new directions for these lines of thinking are discussed.
A survey of machine learning techniques applied to self organizing cellular networks
This paper surveys the literature of the past fifteen years on Machine Learning (ML) algorithms applied to self organizing cellular networks. For future networks to overcome the limitations and address the issues of current cellular systems, more intelligence needs to be deployed so that a fully autonomous and flexible network can be enabled. This paper focuses on the learning perspective of Self Organizing Networks (SON) solutions. It provides an overview of the most common ML techniques encountered in cellular networks and classifies each paper in terms of its learning solution, giving some examples. The authors also classify each paper in terms of its self-organizing use-case and discuss how each proposed solution performed. In addition, the most commonly found ML algorithms are compared in terms of certain SON metrics, and general guidelines are proposed on when to choose each ML algorithm for each SON function. Lastly, this work provides future research directions and new paradigms that the use of more robust and intelligent algorithms, together with data gathered by operators, can bring to the cellular networks domain and fully enable the concept of SON in the near future.
Predictive modeling of clinical outcomes for hospitalized COVID-19 patients utilizing CyTOF and clinical data.
In December 2019, an outbreak of a novel coronavirus initiated a global pandemic. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the virus that causes coronavirus disease 2019 (COVID-19). Symptoms of COVID-19 vary widely between individuals. While some infected individuals are asymptomatic, others need more extensive care and require hospitalization. Indeed, the COVID-19 pandemic was characterized by a shortage of hospital beds, which presented additional complications in providing adequate care for patients. In this study, we used a combination of T cell population data collected from mass cytometry analysis and clinical markers to form a predictive model of clinical outcomes for hospitalized COVID-19 patients. This thesis details the steps and analysis towards the design of the final model, including data acquisition and preprocessing, missing data handling via multiple imputation, and repeated-imputation inferences.
Modeling of soil weathering on hillslopes: coping with nonlinearity and coupled processes using a data-driven approach
Advisors: Carlos Roberto de Souza Filho, Michael James Friedel
Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Geociências
Abstract: This doctoral thesis aims to explore the relationship between soil physical-chemical properties and relief morphometry, and to quantify these relationships in order to build conceptual and predictive models. Self-organizing maps and Geographic Information Systems modeling are used to investigate nonlinear correlations associated with chemical and physical denudation, which are factors connected with hydrological phenomena and soil evolution. Three case studies are presented: soil chemical weathering within the limits of Paraná State, southern Brazil (22 variables and 304 samples); physical transport of sediments in the alkaline intrusive complex of Poços de Caldas, southeastern Brazil (9 variables and 29 samples); and hydrochemistry of the Serra Geral aquifers, also in Paraná State (27 variables and 976 samples). The method, which combines stochastic simulation and data mining, allows the relationships between topography, soil texture and soil geochemistry to be explored. In Paraná State, higher regions and areas with convex morphometry show, respectively, higher and lower denudation rates of mobile (e.g., Ca) and less mobile (e.g., Al) elements. The same pattern is observed for soil particle size: a high proportion of sand is found in highlands and convex areas inside the basin, and high clay content, with low hydraulic conductivity, occurs in convex regions near drainage channels. The spatial behavior of the Serra Geral aquifer's hydrochemistry points to areas with potential connectivity with the Guarani aquifer system, recent recharge areas, and waters with long residence times. Predictive, unbiased models are built for subsurface soil properties on the premise that weathering and morphology are related through a two-way dependent process, where physical and chemical denudation delineates the elevations of the land surface, and terrain morphometry is a factor that characterizes the physical-chemical conditions of the soil.
Doctorate. Geology and Natural Resources. Doctor of Science