9 research outputs found

    Information Theory for Nonparametric Learning and Probabilistic Prediction: Applications in Earth Science and Geostatistics

    Interesting but challenging: Earth systems are often complex and their problems underdetermined. Incomplete understanding of the relevant subsystems (the complexity issue) and the impossibility of observing everything, everywhere, at all times (the underdetermination issue) lead to considerable inferential and predictive uncertainty. Indeed, this uncertainty is one of the central problems of Earth system research, and its quantification is consequently an essential aspect of geoscientific analysis and prediction. In addition, ignoring uncertainty through deterministic models or strong parametric assumptions increases model rigidity (the counterpart of generality). As a result, rigid models can lead to both overly constrained and overconfident solutions, and thus to suboptimal use of the available data. Against this background, probabilistic inference and uncertainty quantification play a central role in modeling and analyzing such complex and underdetermined systems, in order to cope with the uncertainty arising from lack of knowledge or data. Uncertainty and information can be quantified objectively with measures from information theory, which, combined with nonparametric probabilistic modeling, provides a suitable framework for assessing the information content of data and models. It also helps to overcome the problem of using rigid models, which to some degree ignore uncertainty, add information not present in the data, or lose available information. This doctoral thesis addresses the question outlined above: to propose and validate a nonparametric and probabilistic framework for geoscientific problems, built on the concepts of information theory.
Predictive relationships are expressed through multivariate, empirical probability distributions derived directly from data. Information theory is used to explicitly calculate and compare the information content from different sources in a universal unit. Three typical geoscientific problems are revisited through the lens of information theory. The test settings cover descriptive and inferential problems and deal with different data types (continuous or categorical), domains (spatial or temporal data), sample sizes, and spatial dependence properties. First, a nonparametric approach for identifying rainfall-runoff events is developed, tested on a real-world dataset, and compared with a physically based model (Chapter 2). The results of this study (Chapter 3) form the basis for developing a distribution-free approach to geostatistical problems, whose properties are tested on a synthetic dataset and compared with ordinary kriging. Finally, in Chapter 4, the proposed method is adapted to handle categorical data and to simulate field properties. It is tested on a real-world dataset for classifying the risk of soil contamination by lead, and its properties are compared with indicator kriging. Each test application addresses specific topics of long-standing geoscientific interest while involving the overarching problems of underdetermination and complexity. Several insights emerge from the three applications presented in this thesis.
The proposed nonparametric framework based on information theory (i) avoids introducing undesired side information or losing existing information; (ii) enables direct quantification of the uncertainty and information content of datasets, as well as the analysis of patterns and data relationships; (iii) describes the factors influencing a system; (iv) enables selecting the most informative model depending on dataset availability; (v) reduces the need for assumptions and minimizes uncertainty; (vi) can handle categorical or continuous data; and (vii) is applicable to any kind of data relationship. Owing to advances in computing power and the sophisticated instrumentation available today, the links between the geosciences and related disciplines are growing markedly. Integrating probability and information theory in a nonparametric context guarantees, on the one hand, the generality and flexibility needed to handle any kind of data relationship and any limitation on data volume, and, on the other hand, provides a tool for interpretation in terms of information content or its counterpart, uncertainty. This inherent interdisciplinarity also allows greater flexibility in modeling with respect to the target quantity and the degrees of freedom. Given sufficient data, the potential of data-driven modeling approaches lies in the fact that they operate largely unconstrained by functional or parametric assumptions and choices. The application examples presented in this thesis are only a few of the many possible uses of the proposed framework.
Overall, with the framework it proposes, this doctoral thesis helps to avoid the conceptualization and compression of data relationships during model building, thereby preserving the information content of the data. At the same time, it allows a more realistic account of the associated uncertainties. In a broader context, it offers a change of perspective on how geoscientific knowledge is represented and used, seen through the lens of information theory.
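The central quantity throughout this thesis is uncertainty measured in bits via Shannon entropy of empirical (histogram-based) distributions. As a minimal sketch of that idea (the data here are synthetic, and the binning choice is an assumption for illustration), entropy can be computed directly from an empirical histogram:

```python
import numpy as np

def shannon_entropy_bits(samples, bins=10):
    """Shannon entropy (in bits) of an empirical histogram built from samples."""
    counts, _ = np.histogram(samples, bins=bins)
    p = counts / counts.sum()   # empirical bin probabilities
    p = p[p > 0]                # convention: 0 * log(0) = 0
    return float(-np.sum(p * np.log2(p)))

rng = np.random.default_rng(0)
uniform = rng.uniform(size=10_000)              # high uncertainty
peaked = rng.normal(0.5, 0.05, size=10_000)     # low uncertainty
print(shannon_entropy_bits(uniform))  # near log2(10) ≈ 3.32 bits
print(shannon_entropy_bits(peaked))   # noticeably fewer bits
```

A flat histogram approaches the maximum of log2(bins) bits, while a concentrated one carries less uncertainty; this is the universal unit in which the thesis compares information from different sources.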

    Assessing local and spatial uncertainty with nonparametric geostatistics

    Uncertainty quantification is an important topic for many environmental studies, such as identifying zones where potentially toxic materials exist in the soil. In this work, the nonparametric geostatistical framework of histogram via entropy reduction (HER) is adapted to address local and spatial uncertainty in the context of soil-contamination risk. HER works with empirical probability distributions, coupling information theory and probability aggregation methods to estimate conditional distributions, which gives it the flexibility to be tailored to different data and application purposes. To explore how HER can be used for estimating threshold-exceeding probabilities, it is applied to map the risk of soil contamination by lead in the well-known dataset of the Swiss Jura region. Its results are compared to indicator kriging (IK) and to an ordinary kriging (OK) model available in the literature. For the analyzed dataset, IK and HER predictions achieve the best performance and exhibit comparable accuracy and precision. Compared to IK, the advantages of HER for uncertainty estimation at a fine resolution are that it does not require modeling multiple indicator variograms, correcting order-relation violations, or defining interpolation/extrapolation of distributions. Finally, to avoid the well-known smoothing effect of point estimation (as is the case with both kriging and HER), and to provide maps that reflect the spatial fluctuation of the observed reality, we demonstrate how HER can be used in combination with sequential simulation to assess spatial uncertainty (uncertainty jointly over several locations).
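Once a method such as HER delivers an empirical conditional distribution at an unsampled location, a threshold-exceeding probability reduces to summing probability mass above the threshold. The sketch below is illustrative only (bin edges, probabilities, and the uniform-within-bin assumption are all made up, not the study's values):

```python
import numpy as np

# Hypothetical binned conditional distribution at one location
# (e.g., lead content in ppm); values are invented for illustration.
bin_edges = np.array([0.0, 20.0, 40.0, 60.0, 80.0, 100.0])
probs = np.array([0.10, 0.25, 0.30, 0.20, 0.15])  # sums to 1

def exceedance_probability(bin_edges, probs, threshold):
    """P(Z > threshold) from a binned empirical distribution.
    Bins entirely above the threshold count fully; the bin containing
    the threshold contributes its fraction above it (uniform-in-bin)."""
    p = 0.0
    for lo, hi, pb in zip(bin_edges[:-1], bin_edges[1:], probs):
        if lo >= threshold:
            p += pb
        elif hi > threshold:
            p += pb * (hi - threshold) / (hi - lo)
    return p

print(exceedance_probability(bin_edges, probs, 50.0))  # → 0.5 for these numbers
```

Because the whole distribution is carried along rather than a single kriged value, no indicator transformation or order-relation correction is needed at this step.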

    Applying GIS tools for geotechnical mapping and foundation suitability maps from SPT tests: a case study in Blumenau/SC

    Master's thesis - Universidade Federal de Santa Catarina, Centro Tecnológico, Graduate Program in Civil Engineering, Florianópolis, 2016. The possibility of performing analyses that combine quantitative and qualitative information with geographic features gives Geographic Information Systems (GIS) a distinctive advantage for planning at many levels. Through a case study, this work describes the procedures for treating and processing georeferenced geotechnical data to develop the geotechnical map of the city of Blumenau/SC, based on the methodology of Davison Dias (1995), together with foundation suitability maps derived from 537 SPT (standard penetration test) boreholes. For the study, the 27 micro-watersheds of Blumenau/SC were analyzed. Among the information generated with geoprocessing resources, and as partial results of the case study, the Digital Terrain Model (DTM) and hydrographic, slope, geological, lithological, and pedological maps were produced. Finally, beyond the geotechnical map itself, the mapping was combined with geomechanical information from SPT reports, yielding stratigraphic reference profiles for the geotechnical units estimated for the region: Cambisol with gneiss substrate (Cgn); Cambisol with siltstone, shale, and sandstone substrate (Cs,f,a); Gleysol with Quaternary sediment substrate (GHsq); and Red-Yellow Podzolic soil with gneiss substrate (PVgn). Maps were also produced of the depth to the layer impenetrable to percussion drilling, the groundwater table depth, the allowable stress for shallow foundations, and NSPT contour lines oriented to deep foundations, from which the maximum length of some common pile types was derived for the urbanized area of Blumenau/SC.
The results proved valuable not only for demonstrating the practical manipulation and modeling of the database in a GIS environment, but also for providing strategic results for decision-making: in the public sphere, supporting land-use and occupation policies, and in the private sector, guiding and informing, in technical language, professionals working with foundations as well as geologists.

    Geological-Geotechnical Database from Standard Penetration Test Investigations Using Geographic Information Systems

    The study describes applications of Geographic Information Systems (GIS) associated with Standard Penetration Test (SPT) reports as a support tool for planning and decision-making in the public and private spheres. The chapter begins with a bibliography compilation showing recent applications carried out around the world. Following this, the geological-geotechnical method using SPT information is described and applied in two case studies. To this end, the treatment of SPT reports is detailed extensively to enable the composition of a geological-geotechnical database. Two cases exemplify the method and its results, including characterization of the topographic relief through a Digital Elevation Model (DEM), slope and hydrographic maps, and the development of soil, groundwater, and foundation maps from a geological-geotechnical database composed essentially of SPT data. The cases cover a larger scale, using 507 SPT boreholes to analyze a university campus of 1 km², and a smaller scale, using 537 SPT boreholes to analyze a city with 207.2 km² of urban area. Different possibilities of applications for information management are discussed throughout the chapter.
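The core of such a workflow is turning each SPT report into a structured database record that maps can then be queried from. The sketch below is a hypothetical minimal schema (the field names and the refusal criterion of N ≥ 50 are assumptions for illustration, not the authors' actual database design):

```python
from dataclasses import dataclass, field

@dataclass
class SptBorehole:
    """One georeferenced SPT report, reduced to the fields the maps need."""
    borehole_id: str
    x: float                  # easting (m)
    y: float                  # northing (m)
    water_table_depth: float  # groundwater level (m below surface)
    n_spt: list = field(default_factory=list)  # blow counts, one per metre

    def impenetrable_depth(self, refusal=50):
        """Depth (m) at which blow counts first reach refusal,
        else the full drilled depth."""
        for depth, n in enumerate(self.n_spt, start=1):
            if n >= refusal:
                return depth
        return len(self.n_spt)

bh = SptBorehole("SM-01", 680500.0, 7025300.0, 2.5, [4, 7, 12, 23, 38, 55])
print(bh.impenetrable_depth())  # → 6 (N reaches 55 at 6 m)
```

A collection of such records, interpolated over the study area, yields the groundwater, impenetrable-depth, and foundation maps the chapter describes.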

    Histogram via entropy reduction (HER): an information-theoretic alternative for geostatistics

    Interpolation of spatial data has been approached in many different forms, varying from deterministic to stochastic, parametric to nonparametric, and purely data-driven to geostatistical methods. In this study, we propose a nonparametric interpolator, which combines information theory with probability aggregation methods in a geostatistical framework for the stochastic estimation of unsampled points. Histogram via entropy reduction (HER) predicts conditional distributions based on empirical probabilities, relaxing parameterizations and, therefore, avoiding the risk of adding information not present in the data. By construction, it provides a proper framework for uncertainty estimation, since it accounts for both spatial configuration and data values, while allowing one to introduce or infer properties of the field through the aggregation method. We investigate the framework using synthetically generated datasets and demonstrate its efficacy in ascertaining the underlying field with varying sample densities and data properties. HER shows a performance comparable to popular benchmark models, with the additional advantage of higher generality. The novel method brings a new perspective on spatial interpolation and uncertainty analysis to geostatistics and statistical learning, using the lens of information theory.
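Probability aggregation, the step that fuses conditional distributions from several neighboring observations into one, can take several standard forms. A minimal sketch of one common choice, log-linear (multiplicative) pooling, is shown below; the two input distributions and their weights are invented for illustration, and this is not HER's specific aggregation rule:

```python
import numpy as np

def log_linear_pool(distributions, weights):
    """Log-linear pooling of discrete distributions over the same bins:
    p(z) ∝ prod_i p_i(z)**w_i, renormalized to sum to 1."""
    logp = sum(w * np.log(np.asarray(p, dtype=float))
               for p, w in zip(distributions, weights))
    pooled = np.exp(logp - logp.max())  # subtract max for numerical stability
    return pooled / pooled.sum()

# Two hypothetical conditional distributions from two neighboring observations
p1 = [0.1, 0.6, 0.3]
p2 = [0.2, 0.5, 0.3]
pooled = log_linear_pool([p1, p2], weights=[1.0, 1.0])
print(pooled)  # sharper than either input: aggregation reduces entropy
```

Because multiplicative pooling rewards bins on which the sources agree, the pooled distribution is typically more concentrated than its inputs, which is the "entropy reduction" intuition behind the method's name.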

    Identifying rainfall-runoff events in discharge time series: a data-driven method based on information theory

    In this study, we propose a data-driven approach for automatically identifying rainfall-runoff events in discharge time series. The core of the concept is to construct and apply discrete multivariate probability distributions to obtain probabilistic predictions of whether each time step is part of an event. The approach permits any data to serve as predictors, and it is nonparametric in the sense that it can handle any kind of relation between the predictor(s) and the target. Each choice of a particular predictor dataset is equivalent to formulating a model hypothesis. Among competing models, the best is found by comparing their predictive power on a training dataset with user-classified events. For evaluation, we use measures from information theory such as Shannon entropy and conditional entropy to select the best predictors and models and, additionally, measure the risk of overfitting via cross entropy and Kullback-Leibler divergence. As all these measures are expressed in "bit", we can combine them to identify models with the best tradeoff between predictive power and robustness given the available data. We applied the method to data from the Dornbirner Ach catchment in Austria, distinguishing three different model types: models relying on discharge data, models using both discharge and precipitation data, and recursive models, i.e., models using their own predictions of a previous time step as an additional predictor. In the case study, the additional use of precipitation reduced predictive uncertainty only by a small amount, likely because the information provided by precipitation is already contained in the discharge data.
More generally, we found that the robustness of a model quickly dropped as the number of predictors increased (an effect well known as the curse of dimensionality), such that, in the end, the best model was a recursive one applying four predictors (three standard and one recursive): discharge from two distinct time steps, the relative magnitude of discharge compared with all discharge values in a surrounding 65 h time window, and the event prediction from the previous time step. Applying the model reduced the uncertainty in event classification by 77.8 %, decreasing conditional entropy from 0.516 to 0.114 bits. To assess the quality of the proposed method, its results were binarized and validated through a holdout method and then compared to a physically based approach. The comparison showed similar behavior of both models (both with accuracy near 90 %), and the cross-validation reinforced the quality of the proposed model. Given enough data to build data-driven models, their potential lies in the way they learn and exploit relations between data, unconstrained by functional or parametric assumptions and choices. And, beyond that, using these models to reproduce a hydrologist's way of identifying rainfall-runoff events is just one of many potential applications.
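The uncertainty reduction quoted above is the drop from the marginal entropy H(Y) of the event/nonevent target to the conditional entropy H(Y|X) given the predictors. A minimal sketch of that computation (the contingency counts below are invented, not the study's data):

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def conditional_entropy_bits(joint_counts):
    """H(Y|X) in bits from a contingency table
    (rows: predictor state X, columns: target Y)."""
    joint = np.asarray(joint_counts, dtype=float)
    joint /= joint.sum()
    h = 0.0
    for row in joint:
        px = row.sum()
        if px > 0:
            h += px * entropy_bits(row / px)  # weight each H(Y|X=x) by p(x)
    return h

# Hypothetical counts: rows = predictor state, columns = (no event, event)
counts = [[90, 10],
          [15, 85]]
h_y = entropy_bits(np.sum(counts, axis=0) / np.sum(counts))
h_y_given_x = conditional_entropy_bits(counts)
print(h_y, h_y_given_x)  # the predictor lowers uncertainty: H(Y|X) < H(Y)
```

The same unit (bits) lets predictive power, overfitting risk (via cross entropy), and model comparison all be expressed on one scale, which is what makes the tradeoff analysis in the study possible.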

    Analysis of driven nanorod transport through a biopolymer matrix

    Applying magnetic fields to guide and retain drug-loaded magnetic particles in vivo has been proposed as a way of treating illnesses. Largely, these efforts have been targeted at tumors. One significant barrier to long-range transport within tumors is the extracellular matrix (ECM). We perform single-particle measurements of 18 nm diameter nanorods undergoing magnetophoresis through ECM and analyze the motion of these nanorods in two dimensions. We observe intra-particle magnetophoresis in this viscoelastic environment and measure the fraction of time these nanorods spend effectively hindered versus effectively translating.
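The hindered-versus-translating fraction from a 2-D trajectory can be estimated in several ways; the abstract does not specify the authors' procedure, so the sketch below uses a simple frame-to-frame displacement threshold on a synthetic trajectory (both the threshold and the trajectory are assumptions for illustration):

```python
import numpy as np

def hindered_fraction(xy, threshold):
    """Fraction of time steps whose frame-to-frame displacement stays below
    `threshold`, i.e., the particle is effectively hindered rather than
    translating. `xy` is an (N, 2) array of 2-D positions."""
    steps = np.linalg.norm(np.diff(np.asarray(xy, dtype=float), axis=0), axis=1)
    return float(np.mean(steps < threshold))

# Synthetic trajectory: 50 frames of confined jitter, then 50 frames of drift
rng = np.random.default_rng(1)
jitter = rng.normal(0, 0.01, size=(50, 2)).cumsum(axis=0)
drift = jitter[-1] + np.cumsum(np.full((50, 2), 0.1), axis=0)
traj = np.vstack([jitter, drift])
print(hindered_fraction(traj, threshold=0.05))  # roughly half the steps
```

In practice the threshold would be calibrated against the expected diffusive step length in the viscoelastic medium, and more robust classifiers (e.g., windowed mean-squared displacement) are common; this sketch only illustrates the quantity being reported.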