    Semisupervised Tangent Space Discriminant Analysis

    A novel semisupervised dimensionality reduction method named Semisupervised Tangent Space Discriminant Analysis (STSD) is presented, in which we assume that data can be well characterized by a linear function on the underlying manifold. For this purpose, a new regularizer using tangent spaces is developed. It not only captures the local manifold structure from both labeled and unlabeled data but is also complementary to the Laplacian regularizer. Furthermore, STSD has an analytic form of the globally optimal solution, which can be computed by solving a generalized eigenvalue problem. To perform nonlinear dimensionality reduction and process structured data, a kernel extension of our method is also presented. Experimental results on multiple real-world data sets demonstrate the effectiveness of the proposed method.
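The abstract says that STSD's global optimum comes from a generalized eigenvalue problem. The Python sketch below illustrates only that step, using a Fisher-style objective: maximize w^T S_b w / w^T (S_w + γR) w. It reduces A w = λ B w to a standard eigenproblem with a Cholesky factor of B, then applies power iteration. The matrix `reg` is a hypothetical stand-in for the tangent-space regularizer, which is not reproduced here. All function names, `gamma` and the small `ridge` term are assumptions for this example, not parts of the published method.

```python
import math


def cholesky(b):
    """Lower-triangular L with B = L L^T (B must be symmetric positive definite)."""
    n = len(b)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(b[i][i] - s)
            else:
                low[i][j] = (b[i][j] - s) / low[j][j]
    return low


def solve_lower(low, v):
    """Forward substitution: solve L y = v."""
    y = []
    for i in range(len(v)):
        y.append((v[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    return y


def solve_lower_transpose(low, v):
    """Back substitution: solve L^T x = v."""
    n = len(v)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (v[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def quad(m, w):
    """Quadratic form w^T M w."""
    n = len(w)
    return sum(w[i] * sum(m[i][j] * w[j] for j in range(n)) for i in range(n))


def leading_generalized_eig(a, b, iters=500):
    """Largest eigenpair of A w = lam B w (A symmetric PSD, B symmetric PD).

    With B = L L^T this becomes C z = lam z for C = L^-1 A L^-T and w = L^-T z,
    solved here by power iteration on C.
    """
    low = cholesky(b)
    n = len(a)
    z = [1.0] * n
    for _ in range(iters):
        x = solve_lower_transpose(low, z)                               # L^-T z
        ax = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]  # A L^-T z
        y = solve_lower(low, ax)                                        # C z
        norm = math.sqrt(sum(t * t for t in y))
        if norm == 0.0:
            break  # start vector lies in the null space of A
        z = [t / norm for t in y]
    w = solve_lower_transpose(low, z)
    scale = math.sqrt(sum(t * t for t in w))
    w = [t / scale for t in w]
    return quad(a, w) / quad(b, w), w


def regularized_discriminant(s_between, s_within, reg, gamma=0.1, ridge=1e-6):
    """Maximise w^T S_b w / w^T (S_w + gamma * R + ridge * I) w.

    `reg` is a hypothetical placeholder for a manifold regularizer R.
    """
    n = len(s_within)
    b = [[s_within[i][j] + gamma * reg[i][j] + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return leading_generalized_eig(s_between, b)
```

Take A = [[2, 0], [0, 1]] and B = I. The leading direction is then the first axis, with λ = 2. A production implementation would more likely call a dense generalized eigensolver such as `scipy.linalg.eigh(a, b)`. That call also returns the further eigenvectors needed for a multi-dimensional projection.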

    Visual Techniques for Geological Fieldwork Using Mobile Devices

    Visual techniques in general, and 3D visualisation in particular, have seen considerable adoption in the geosciences and geology over the last 30 years. Geology was one of the first branches of science to adopt techniques such as volume visualisation, for analysing subsurface processes, and photo-coloured LiDAR point-based rendering, for digitally exploring rock exposures at the earth’s surface. A large amount of digital geological surface and volume data is now available to desktop-based workflows. These serve geological applications such as hydrocarbon reservoir exploration, groundwater modelling, CO2 sequestration and, in the future, geothermal energy planning. In contrast, analysis and data collection during fieldwork have yet to embrace this "digital revolution". Sedimentary logs, geological maps and stratigraphic sketches are still captured in each geologist’s individual fieldbook, and physical rock samples are still transported to the lab for subsequent analysis. Is this still necessary, or are there more extensive digital means of data collection and exploration in the field? Are modern digital interpretation techniques accurate and intuitive enough to meaningfully support fieldwork in geology and other geoscience disciplines? This dissertation aims to address these questions and, by doing so, close the technological gap between geological fieldwork and office workflows in geology. Mobile devices offer a vast array of physical sensors combined with touch-based user interfaces, high-resolution screens and digital cameras. Their emergence therefore provides a possible digital platform for field geologists. Their ubiquitous availability increases the chances of adopting digital workflows in the field without additional, expensive equipment. The use of 3D data on mobile devices in the field is furthered by the availability of 3D digital outcrop models and the increasing ease of their acquisition.
This dissertation assesses the prospects of adopting 3D visual techniques and mobile devices within field geology. The research uses previously acquired and processed digital outcrop models in the form of textured surfaces from optical remote sensing and photogrammetry. The scientific papers in this thesis present visual techniques and algorithms to map outcrop photographs in the field directly onto the surface models. Automatic mapping allows photo interpretations of stratigraphy and sedimentary facies to be projected onto the 3D textured surface. At the same time, it provides the domain expert with simple-to-use, intuitive tools for the photo interpretation itself. The developed visual approach combines insights from across the fields of computer science that deal with visual information. It culminates in the mobile-device app Geological Registration and Interpretation Toolset (GRIT), which is assessed on an outcrop analogue study of the Saltwick Formation exposed at Whitby, North Yorkshire, UK. The visual techniques are applicable to a variety of study scenarios within petroleum geology and the geosciences. Their particular target application, however, is the easy provision of field-based outcrop interpretations for the subsequent construction of training images for multiple-point statistics reservoir modelling, as envisaged within the VOM2MPS project. Despite the success and applicability of the visual approach, the thesis discusses numerous drawbacks and probable future extensions based on the conducted studies. Some limitations are obvious, arising from the limited computing capabilities and sensor accuracies of mobile devices. Beyond these, a major contribution of this thesis is the careful analysis of conceptual drawbacks in established procedures for modelling, representing, constructing and disseminating the available surface geometry.
A more mathematically accurate geometric description of the underlying algebraic surfaces yields improvements and enables future applications not yet addressed in the literature of geology and the computational geosciences. Future extensions to the visual techniques proposed in this thesis would also allow for expanded analysis, 3D exploration and improved geological subsurface modelling in general.
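Mapping a field photograph onto a textured outcrop model ultimately requires projecting the model's 3D vertices into the image. The sketch below shows that step under a simple pinhole-camera assumption. The camera rotation, translation and focal length (in pixels) are taken as already known from registration, and the principal point is assumed to be the image centre. Lens distortion and occlusion are ignored, and all names are hypothetical. This is not the registration procedure used in GRIT, which the abstract does not detail.

```python
def project_vertex(vertex, rotation, translation, focal_px, cx, cy):
    """Pinhole projection of a world-space vertex to pixel coordinates (u, v).

    Camera coordinates are X_c = R X_w + t; returns None for points behind the camera.
    """
    xc = [sum(rotation[i][j] * vertex[j] for j in range(3)) + translation[i]
          for i in range(3)]
    if xc[2] <= 0.0:
        return None
    return (focal_px * xc[0] / xc[2] + cx, focal_px * xc[1] / xc[2] + cy)


def texture_coordinates(vertices, rotation, translation, focal_px, width, height):
    """Normalised (s, t) photo coordinates per vertex, or None if outside the image.

    Assumes the principal point is the image centre and ignores lens distortion
    and occlusion (a real pipeline would also need a visibility test).
    """
    cx, cy = width / 2.0, height / 2.0
    coords = []
    for vertex in vertices:
        pixel = project_vertex(vertex, rotation, translation, focal_px, cx, cy)
        if pixel is None or not (0.0 <= pixel[0] < width and 0.0 <= pixel[1] < height):
            coords.append(None)
        else:
            # t is measured from the top image row; flip it if the renderer expects otherwise.
            coords.append((pixel[0] / width, pixel[1] / height))
    return coords
```

Each vertex that receives coordinates can then sample the photograph, and by extension any interpretation drawn on it, when the surface is rendered.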

    Nonlinear Dimensionality Reduction Methods in Climate Data Analysis

    Linear dimensionality reduction techniques, notably principal component analysis, are widely used in climate data analysis to aid the interpretation of high-dimensional datasets. These linear methods may not be appropriate for analysing data arising from nonlinear processes in the climate system. Numerous techniques for nonlinear dimensionality reduction have been developed recently. They may prove useful for identifying low-dimensional manifolds in climate data sets that arise from nonlinear dynamics. In this thesis I apply three such techniques to the study of El Niño/Southern Oscillation variability in tropical Pacific sea surface temperatures and thermocline depth. I compare observational data with simulations from coupled atmosphere-ocean general circulation models from the CMIP3 multi-model ensemble. The three methods used here are a nonlinear principal component analysis (NLPCA) approach based on neural networks, the Isomap isometric mapping algorithm, and Hessian locally linear embedding. I use these three methods to examine El Niño variability in the different data sets and to assess the suitability of these nonlinear dimensionality reduction approaches for climate data analysis. For the application presented here, analysis using NLPCA, Isomap and Hessian locally linear embedding does not provide additional information beyond that already provided by principal component analysis. I nevertheless conclude that these methods are effective tools for exploratory data analysis.
Comment: 273 pages, 76 figures; University of Bristol Ph.D. thesis; version with high-resolution figures available from http://www.skybluetrades.net/thesis/ian-ross-thesis.pdf (52Mb download)
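Isomap is one of the three methods compared in the thesis. It replaces Euclidean distances with shortest-path distances on a k-nearest-neighbour graph and then applies classical multidimensional scaling. The pure-Python sketch below implements these steps for a one-dimensional embedding. It assumes the neighbourhood graph is connected and uses Floyd-Warshall for clarity rather than speed. The function names are hypothetical, and the code illustrates the standard algorithm rather than reproducing the thesis's code.

```python
import math


def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph as a matrix; missing edges are math.inf."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    graph = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for j in nearest:
            graph[i][j] = graph[j][i] = dist[i][j]
    return graph


def geodesic_distances(graph):
    """All-pairs shortest paths (Floyd-Warshall), approximating geodesic distances."""
    n = len(graph)
    d = [row[:] for row in graph]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    return d


def classical_mds_1d(d, iters=2000):
    """Leading coordinate of classical MDS on a distance matrix, via power iteration."""
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    mean = [sum(row) / n for row in sq]  # row means equal column means (symmetric)
    grand = sum(mean) / n
    # Double centring: B = -1/2 J D^2 J
    b = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
         for i in range(n)]
    v = [i - (n - 1) / 2 for i in range(n)]  # start vector orthogonal to constants
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return [math.sqrt(max(lam, 0.0)) * x for x in v]


def isomap_1d(points, k=2):
    """Isomap: kNN graph -> geodesic distances -> classical MDS."""
    return classical_mds_1d(geodesic_distances(knn_graph(points, k)))
```

For 15 points on a half circle with k = 2, the graph distance between the two ends is about 3.13, close to π, whereas the straight-line distance is 2. The resulting 1-D embedding orders the points along the arc.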

    Information Theory for Nonparametric Learning and Probabilistic Prediction: Applications in Earth Science and Geostatistics

    Interesting, but challenging: Earth systems are often complex, and their problems are underdetermined. Two factors lead to considerable inferential and predictive uncertainty: an incomplete understanding of relevant subsystems (the complexity issue) and the impossibility of observing everything, everywhere and at all times (the underdetermination issue). Indeed, this uncertainty is one of the problems of Earth system research, and its quantification is consequently an essential aspect of geoscientific analysis and prediction. In addition, ignoring uncertainty through deterministic models or strong parametric assumptions increases the rigidity of the model (as the counterpart to generality). As a result, rigid models can lead to solutions that are both overly constrained and overconfident, and thus to suboptimal use of the available data. Against this background, probabilistic inference and uncertainty quantification play a central role in modelling or analysing such complex and underdetermined systems. They provide a way of dealing with the uncertainty that arises from a lack of knowledge or data. Uncertainty and information can be quantified objectively using measures from information theory. Combined with nonparametric probabilistic modelling, these measures provide a suitable framework for assessing the information content of data and models. They also help to overcome the problem of rigid models, which to some degree ignore uncertainties, add information not present in the data, or lose available information. This dissertation addresses the question outlined above: it proposes and validates a nonparametric, probabilistic framework for geoscientific problems that builds on the concepts of information theory.
Predictive relationships are expressed through multivariate, empirical probability distributions derived directly from data. Information theory is used to explicitly compute and compare the information content from different sources in a universal unit. Three typical geoscientific problems are revisited from the perspective of information theory. The test cases cover descriptive and inferential problems. They deal with different data types (continuous or categorical), domains (spatial or temporal data), sample sizes and spatial dependence properties. First, a nonparametric approach to identifying rainfall-runoff events is developed, tested on a real dataset and compared with a physically based model (Chapter 2). The results of this study form the basis for the development of a distribution-free approach to geostatistical problems (Chapter 3), whose properties are tested on a synthetic dataset and compared with ordinary kriging. Finally, in Chapter 4, the proposed method is adapted to handle categorical data and to simulate field properties. It is tested on a real dataset for classifying the risk of soil contamination by lead, and its properties are compared with those of indicator kriging. Each test application addresses specific topics that have long been of geoscientific interest, while also involving the overarching problems of underdetermination and complexity. Several insights emerge from the three applications presented in this thesis.
The proposed nonparametric framework based on information theory:
(i) avoids introducing unwanted side information or losing existing information;
(ii) allows direct quantification of the uncertainty and information content of datasets, as well as analysis of patterns and data relationships;
(iii) describes the influencing factors of a system;
(iv) allows selection of the most informative model depending on the availability of data;
(v) reduces the need for assumptions and minimizes uncertainties;
(vi) can handle categorical or continuous data; and
(vii) is applicable to any kind of data relationship.
Owing to advances in computing power and the sophisticated instrumentation available today, the links between the geosciences and related disciplines are growing markedly. Integrating probability theory and information theory in a nonparametric context has two benefits. It ensures the generality and flexibility needed to handle any kind of data relationship and any limitation in data volume. It also offers a tool for interpretation in terms of information content or its counterpart, uncertainty. This inherent interdisciplinarity also allows greater modelling flexibility with respect to the target variable and the degrees of freedom. Given sufficient data, data-driven modelling approaches can operate without strong restrictions from functional or parametric assumptions and choices, and this is where their potential lies. The applications of the proposed framework presented in this thesis are only a few of many possible applications.
Overall, the framework proposed in this dissertation helps to avoid conceptualizing and compressing data relationships during model building, thereby preserving the information content of the data. At the same time, it enables a more realistic treatment of the associated uncertainties. In a broader context, it offers a change of perspective on how geoscientific knowledge is represented and used, from the viewpoint of information theory.
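The framework measures information in a universal unit, the bit, from empirical distributions. A minimal plug-in version of the basic quantities is sketched below in Python. It estimates Shannon entropy, conditional entropy H(Y | X) = H(X, Y) - H(X) and mutual information I(X; Y) = H(X) + H(Y) - H(X, Y), all from observed frequencies. The fixed-edge binning helper for continuous data is an assumption made for illustration. The thesis's own estimators and binning choices may differ, and the function names are hypothetical.

```python
import bisect
import math
from collections import Counter


def entropy_bits(samples):
    """Plug-in Shannon entropy H(X) in bits from observed (categorical) samples."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def joint_entropy_bits(xs, ys):
    """H(X, Y) from paired samples."""
    return entropy_bits(list(zip(xs, ys)))


def conditional_entropy_bits(ys, xs):
    """H(Y | X) = H(X, Y) - H(X): uncertainty about Y left after observing X."""
    return joint_entropy_bits(xs, ys) - entropy_bits(xs)


def mutual_information_bits(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y): information X carries about Y."""
    return entropy_bits(xs) + entropy_bits(ys) - joint_entropy_bits(xs, ys)


def discretise(values, edges):
    """Map continuous values to bin indices given sorted bin edges (hypothetical binning)."""
    return [bisect.bisect_right(edges, v) for v in values]
```

To quantify how much a binned predictor tells about a binned target, one would call `mutual_information_bits(discretise(x, edges), discretise(y, edges))`. Plug-in estimates of mutual information are biased upward for small samples, so the choice of bins matters.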

    Low-Dimensional Representations of Earth System Processes

    In times of global change, we must closely monitor the state of our planet in order to understand gradual or abrupt changes early on. In fact, each of the Earth's subsystems (i.e. the biosphere, atmosphere, hydrosphere, cryosphere and anthroposphere) can be analyzed from a multitude of data streams. However, since it is very hard to jointly interpret multiple monitoring data streams in parallel, one often aims for some summarizing indicator. Climate indices, for example, summarize the state of atmospheric circulation in a region, e.g. the Multivariate ENSO (El Niño-Southern Oscillation) Index. Indicator approaches have been used extensively to describe socioeconomic data too, and a range of indices have been proposed to synthesize and interpret this information. For instance, the "Human Development Index" (HDI) by the United Nations Development Programme was designed to capture specific aspects of development. "Dimensionality reduction" (DR) is a widely used approach for finding low-dimensional and interpretable representations of data that are natively embedded in high-dimensional spaces. Here, we propose a robust method to create indicators using dimensionality reduction to better represent the terrestrial biosphere and the global socioeconomic system. We aim to explore the performance of the approach and to interpret the resulting indicators. For biosphere indicators, the concept was tested using 12 explanatory variables representing the biophysical states of ecosystems and land-atmosphere water, energy, and carbon fluxes. We find that two indicators account for 73% of the variance of the state of the biosphere in space and time. The first indicator summarizes productivity patterns, while the second summarizes variables representing water and energy availability.
Anomalies in the indicators clearly identify extreme events, such as the Amazon droughts (2005 and 2010) and the Russian heatwave (2010), and they also allow us to interpret the impacts of these events. The indicators also reveal changes in the seasonal cycle, e.g. increasing seasonal amplitudes of productivity in agricultural areas and in Arctic regions. We also apply the method to the "World Development Indicators" (WDIs), a database with more than 1500 variables, to track socioeconomic development at the country level. The aim was to extract the core dimensions of development in a highly efficient way, using a method of nonlinear dimensionality reduction. We find that over 90% of the variance in the WDIs can be represented by five uncorrelated, nonlinear dimensions. The first dimension (explaining 74%) represents the state of education, health, income, infrastructure, trade, population, and pollution. The second dimension (explaining 10%) differentiates countries by gender ratios, labor market, and energy production patterns. Overall, we find that the data contained in the WDIs are highly nonlinear and therefore require nonlinear methods to extract the main patterns of development. Globally, most countries show rather consistent temporal trends towards wealthier and aging societies. Our approach detects deviations from these long-term trajectories during warfare, environmental disasters, or fundamental political changes. In general, we find that the indicator approach is able to extract general patterns from complex databases and that it can be applied to databases of varying characteristics. We also find that the indicators can capture different kinds of changes occurring in the system, such as extreme events, permanent changes or trends. The approach is therefore a useful tool for general monitoring and exploratory data analysis.
The approach is flexible and can be applied to complex datasets, such as large data, nonlinear data, as well as data with many missing values.
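A statement such as "two indicators account for 73% of the variance" refers to the fraction of total variance captured by the leading components. For a linear method such as PCA, this fraction equals the sum of the leading eigenvalues of the covariance matrix divided by its trace. The stdlib-only sketch below computes it with power iteration and deflation. It covers the linear case only; the WDI analysis above used a nonlinear method. The function names are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_eigenvalues(matrix, k, iters=1000):
    """Top-k eigenvalues of a symmetric PSD matrix by power iteration with deflation."""
    d = len(matrix)
    a = [row[:] for row in matrix]
    values = []
    for _component in range(k):
        v = [1.0 + 0.1 * i for i in range(d)]  # arbitrary, non-degenerate start
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # remaining matrix annihilates v; eigenvalue is 0
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        values.append(lam)
        # Deflation: remove the found component so the next one dominates.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return values


def explained_variance_ratio(rows, k):
    """Fraction of total variance captured by each of the first k principal components."""
    cov = covariance_matrix(rows)
    total = sum(cov[i][i] for i in range(len(cov)))
    return [lam / total for lam in leading_eigenvalues(cov, k)]
```

For the four points (1, 0), (-1, 0), (0, 2) and (0, -2), the first component explains 80% of the variance and the second explains the remaining 20%.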

    Visual Analysis of Variability and Features of Climate Simulation Ensembles

    This PhD thesis is concerned with the visual analysis of time-dependent scalar field ensembles such as those that arise in climate simulations. Modern climate projections consist of multiple simulation runs (ensemble members) that vary in parameter settings and/or initial values, which leads to variations in the resulting simulation data. The goal of ensemble simulations is to sample the space of possible futures under the given climate model and to provide quantitative information about uncertainty in the results. The analysis of such data is challenging because, in addition to the spatiotemporal data, the variability must also be analyzed and communicated. This thesis presents novel techniques to analyze climate simulation ensembles visually. A central question is how the data can be aggregated with minimal information loss. To address this question, a key technique applied in several places in this work is clustering. The first part of the thesis addresses the challenge of finding clusters in the ensemble simulation data. Various distance metrics lend themselves to comparing scalar fields; these are explored both theoretically and practically. A visual analytics interface allows the user to interactively explore and compare multiple parameter settings for the clustering and to investigate the resulting clusters, i.e. prototypical climate phenomena. A central contribution here is the development of design principles for analyzing variability in decadal climate simulations. These principles have led to a visualization system centered around the new Clustering Timeline. This is a variant of a Sankey diagram that uses clustering results to communicate climatic states over time, coupled with ensemble member agreement.
It can reveal several interesting properties of the dataset:
- how many inherently similar groups the ensemble can be divided into at any given time;
- whether the ensemble diverges overall;
- whether there are distinct phases over time;
- possible periodicity;
- outliers.
The Clustering Timeline is also used to compare multiple climate simulation models and assess their performance. The Hierarchical Clustering Timeline is an advanced version of the above. It introduces a cluster hierarchy that groups the dataset into clusters of various sizes and densities, from the whole ensemble down to the individual static scalar fields, recording the nesting relationships between them. A further contribution of this work to visualization research is an investigation of how a hierarchical clustering of time-dependent scalar fields can be used in practice to analyze the data. To this end, a system of multiple views, linked through various interaction possibilities, is proposed. The main advantage of the system is that a dataset can now be inspected at an arbitrary level of detail without recomputing a clustering with different parameters. Interesting branches of the simulation can be expanded to reveal smaller differences in critical clusters. Less interesting parts of the dataset can be folded to show only a coarse representation. The last building block of the suite of visual analysis methods developed for this thesis aims at robust, (largely) automatic detection and tracking of certain features in a scalar field ensemble. I present techniques that can identify and track super- and sub-levelsets. From these sets I derive “centers of action”, which mark the locations of extremal climate phenomena that govern the weather (e.g. the Icelandic Low and the Azores High).
The thesis also presents visual and quantitative techniques to evaluate temporal changes in the positions of these centers; such a displacement would likely manifest itself in changes in the weather. In a preliminary analysis with my collaborators, we indeed observed changes in the loci of the centers of action in a simulation with increased greenhouse gas concentrations, compared with pre-industrial concentration levels.
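The Clustering Timeline builds on clustering the ensemble members at each time step and following how members move between clusters. The sketch below shows one simple way to obtain those ingredients. It first runs k-means on flattened scalar fields, using a root-mean-square distance, which is only one of many possible metrics. It then counts member transitions between consecutive time steps; a Sankey-style timeline would draw these counts as bands. The deterministic farthest-point initialisation, the choice of k and all names are assumptions. Cluster labels are not matched across time steps, and this is not the system described in the thesis.

```python
import math
from collections import Counter


def rms_distance(f, g):
    """Root-mean-square difference between two flattened scalar fields."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)) / len(f))


def kmeans(fields, k, iters=50):
    """Plain k-means on flattened fields with deterministic farthest-point init."""
    centres = [list(fields[0])]
    while len(centres) < k:
        farthest = max(fields, key=lambda f: min(rms_distance(f, c) for c in centres))
        centres.append(list(farthest))
    labels = [0] * len(fields)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: rms_distance(f, centres[c])) for f in fields]
        for c in range(k):
            members = [f for f, label in zip(fields, labels) if label == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels


def timeline_flows(ensemble, k):
    """Cluster every time step and count member transitions between steps.

    ensemble[t][m] is the flattened field of member m at time step t. Returns the
    per-step labels and, for each pair of consecutive steps, a Counter mapping
    (cluster at t, cluster at t + 1) to the number of members making that move.
    """
    labels = [kmeans(step, k) for step in ensemble]
    flows = [Counter(zip(labels[t], labels[t + 1])) for t in range(len(ensemble) - 1)]
    return labels, flows
```

Cluster sizes per step indicate ensemble member agreement. A transition count that splits one cluster across two successors marks a point where the ensemble diverges.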
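For the feature-detection step, the sketch below works on a regular 2-D grid. It extracts connected components of a sub-levelset (values at or below a threshold, e.g. low-pressure regions) or a super-levelset (values at or above it). Each component is summarized by the grid cell that holds its extreme value. Treating that cell as a "center of action" is an assumption made here for illustration, and all names are hypothetical. The thesis's own definition, its tracking across time and any handling of longitude wrap-around are not reproduced.

```python
from collections import deque


def centres_of_action(field, threshold, kind="low"):
    """Extreme cell (row, col, value) of each 4-connected component of a levelset.

    kind="low": sub-levelset (values <= threshold), centre = minimum.
    kind="high": super-levelset (values >= threshold), centre = maximum.
    """
    if kind not in ("low", "high"):
        raise ValueError("kind must be 'low' or 'high'")
    sign = 1.0 if kind == "low" else -1.0  # flip highs so both cases look for minima
    rows, cols = len(field), len(field[0])
    seen = [[False] * cols for _ in range(rows)]

    def inside(r, c):
        return sign * field[r][c] <= sign * threshold

    centres = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or not inside(r0, c0):
                continue
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            best = (r0, c0)
            while queue:  # breadth-first flood fill with 4-connectivity
                r, c = queue.popleft()
                if sign * field[r][c] < sign * field[best[0]][best[1]]:
                    best = (r, c)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and inside(nr, nc)):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            centres.append((best[0], best[1], field[best[0]][best[1]]))
    return centres
```

Tracking the returned positions from one time step to the next would give the kind of displacement of centers that the thesis evaluates.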