
668 research outputs found

Analyzing complex data using domain constraints

Author: Mauder Markus
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 19/06/2017

Data-driven research approaches are becoming increasingly popular in a growing number of scientific disciplines. While a data-driven approach can yield superior results, generating the required data can be very costly. This frequently leads to small and complex data sets, for which volume alone cannot compensate for all shortcomings of the data. To counter this problem, other reliable sources of information must be incorporated. In this work, domain knowledge, as a particularly reliable type of additional information, is used to inform data-driven analysis methods. This domain knowledge is represented as constraints on the possible solutions, which the presented methods use to guide their analysis. The thesis focuses on spatial constraints as a particularly common type of constraint, but the proposed techniques are general enough to be applied to other types of constraints. New methods using domain constraints for data-driven science applications are discussed, with applications in feature evaluation, route database repair, and Gaussian mixture modeling of spatial data. The first application is feature evaluation. The presented method receives two representations of the same data: one as the intended target and the other for investigation. It calculates a score indicating how much the two representations agree. One application uses this technique to compare a reference attribute set with different subsets to determine the importance and relevance of individual attributes. A second technique analyzes route data for constraint compliance. The presented framework allows the user to specify constraints and possible actions to modify the data. The method then uses these inputs to generate a version of the data that satisfies the constraints while keeping the impact of the modifications as small as possible.
Two extensions of this scheme are presented: an extension to continuously valued costs, which are minimized, and an extension to constraints involving more than one moving object. Another application area is the modeling of multivariate measurement data recorded at spatially distributed locations. The spatial information recorded with the data can be used as the basis for constraints. This thesis presents multiple approaches to building a model of this kind of data while complying with spatial constraints. The first approach is an interactive tool that allows domain scientists to generate a model of the data that complies with their knowledge about the data. The second is a Monte Carlo approach, which generates a large number of possible models, tests them for compliance with the constraints, and returns the best one. The final two approaches are based on the EM algorithm and differ in how they incorporate the constraint information into their models. At the end of the thesis, two applications of the generated models are presented. The first is the prediction of the origin of samples, and the second is the visual representation of the extracted models on a map. These tools can be used by domain scientists to augment their tried and tested tools. The developed techniques are applied to a real-world data set collected in the archaeobiological research project FOR 1670 (Transalpine mobility and cultural transfer) of the German Science Foundation. The data set contains isotope ratio measurements of samples discovered at archaeological sites in the Alpine region of central Europe. Using the presented data analysis methods, the data is analyzed to answer relevant domain questions. In a first application, the attributes of the measurements are analyzed for their relative importance and their ability to predict the spatial location of samples.
Another presented application is the reconstruction of potential migration routes between the investigated sites. Spatial models are then built using the presented modeling approaches. Univariate outliers are determined, and their likely origins are predicted from the generated models. These predictions are cross-referenced with the recorded origins. Finally, maps of the isotope distribution in the investigated regions are presented. The described methods and demonstrated analyses show that domain knowledge can be used to formulate constraints that inform the data analysis process, yield valid models from relatively small data sets, and support domain scientists in their analyses.
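The Monte Carlo approach summarized in the abstract above (generate many candidate models, test each against the constraints, return the best) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the function names, the one-dimensional measurements, the hard assignment of samples to components, and the hypothetical spatial constraint that all sites sharing a component must lie within `max_diameter` of each other.

```python
import math
import random

def fit_gaussian(values, min_var=1e-3):
    """Return (mean, variance) of a list of floats; the variance is floored at min_var."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(var, min_var)

def log_likelihood(values, labels, k):
    """Log-likelihood of a hard-assignment mixture with one Gaussian per label."""
    total = 0.0
    for c in range(k):
        members = [v for v, l in zip(values, labels) if l == c]
        if not members:
            continue  # empty components contribute nothing
        mean, var = fit_gaussian(members)
        weight = len(members) / len(values)
        for v in members:
            total += (math.log(weight)
                      - 0.5 * math.log(2 * math.pi * var)
                      - (v - mean) ** 2 / (2 * var))
    return total

def satisfies_constraint(sites, labels, k, max_diameter):
    """Hypothetical spatial constraint: sites in one component are at most max_diameter apart."""
    for c in range(k):
        members = [s for s, l in zip(sites, labels) if l == c]
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                if math.dist(a, b) > max_diameter:
                    return False
    return True

def monte_carlo_search(sites, values, k, max_diameter, n_samples=2000, seed=0):
    """Sample random assignments, discard constraint violations, keep the best-scoring one."""
    rng = random.Random(seed)
    best_labels, best_score = None, -math.inf
    for _ in range(n_samples):
        labels = [rng.randrange(k) for _ in sites]
        if min(labels.count(c) for c in range(k)) < 2:
            continue  # require at least two samples per component
        if not satisfies_constraint(sites, labels, k, max_diameter):
            continue
        score = log_likelihood(values, labels, k)
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score

# Illustrative toy data (not from the thesis): two groups of sites with distinct values.
sites = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0), (10.0, 10.0), (11.0, 10.5), (10.5, 11.0)]
values = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]
labels, score = monte_carlo_search(sites, values, k=2, max_diameter=3.0)
print(labels, score)
```

With this toy input, only assignments that separate the two groups of sites pass the constraint check, so the search is driven by the constraint rather than by data volume. A real model of this kind would use multivariate components and domain-specific constraints.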

‘And They Read in That Night Books of History’: Consuming, Discussing, and Producing Texts about the Past in al-Ghawrī’s Majālis as Social Practices

Author: Mauder Christian
Publication venue: Brill
Publication date: 01/01/2021

publishedVersion

University of Bergen

NORA - Norwegian Open Research Archives

Capturing all relevant scales of biosphere-atmosphere exchange - the enigmatic energy balance closure problem

Author: Mauder M.
Publication date: 28/06/2012

KITopen

Documentation and Instruction Manual of the Eddy Covariance Software Package TK2

Author: Foken Thomas
Mauder Matthias
Publication date: 01/01/2004

EPub Bayreuth

Documentation and Instruction Manual of the Eddy-Covariance Software Package TK3

Author: Foken Thomas
Mauder Matthias
Publication date: 01/01/2011

EPub Bayreuth

Documentation and Instruction Manual of the Eddy-Covariance Software Package TK3 (update)

Author: Foken Thomas
Mauder Matthias
Publication venue: Eigenverlag (self-published)
Publication date: 15/07/2015

EPub Bayreuth

Towards a consistent eddy-covariance processing: An intercomparison of EddyPro and TK3

Author: Fratini G.
Mauder M.
Publication venue: Copernicus Publications
Publication date: 01/07/2014

A comparison of two popular eddy-covariance software packages, EddyPro and TK3, is presented. Two test data sets, each approximately one month long, were processed, representing typical instrumental setups (i.e., CSAT3/LI-7500 above grassland and Solent R3/LI-6262 above a forest). The resulting fluxes and quality flags were compared. Achieving satisfactory agreement and understanding residual discrepancies required several iterations and interventions of different kinds, ranging from simple software reconfiguration to actual code changes. In this paper, we document our comparison exercise and show that the two software packages can provide fully satisfactory agreement when properly configured. Our main aim, however, is to stress the complexity of performing a rigorous comparison of eddy-covariance software. We show that distinguishing actual discrepancies in the results from inconsistencies in the software configuration requires deep knowledge of both software packages and of the eddy-covariance method. In some instances, it may even be beyond the reach of an investigator who does not have access to, and full knowledge of, the source code. As the developers of EddyPro and TK3, we could discuss the comparison at all levels of detail, and this proved necessary to achieve a full understanding. As a result, we suggest that researchers are more likely to get comparable results when using EddyPro (v5.1.1) and TK3 (v3.11), at least with the settings presented in this paper, than when using any other pair of EC software packages that has not undergone a similar cross-validation. As a further consequence, we suggest that, to ensure the consistency and comparability of centralized flux databases, and for a confident use of eddy fluxes in synthesis studies on the regional, continental, and global scales, researchers rely only on software that has been extensively validated in documented intercomparisons.

KITopen

Directory of Open Access Journals

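Influence of the chemical composition of steel on the numerical simulation of continuous billet casting (Vliv chemického složení oceli na numerickou simulaci plynulého odlévání sochorů)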

Author: Kavička František
Mauder Tomáš
Štětina Josef
Publication venue: Vysoká škola báňská - Technická univerzita Ostrava
Publication date: 01/01/2010

The chemical composition of steel has a significant influence on the actual concasting (continuous casting) process and on the accuracy of its numerical simulation and optimization. The chemical composition affects the thermophysical properties (thermal conductivity, specific heat capacity, and density in the solid and liquid states) and, through them, the calculated temperature field of a continuously cast steel billet; determining these properties often requires more time than the numerical calculation itself. Therefore, an analysis of these thermophysical properties was conducted. Their order of importance for the actual process and for the accuracy of the simulation was also determined. The significance of the chemical composition for the thermophysical properties was assessed with respect to the metallurgical length. The analysis was performed by means of a so-called calculation experiment, i.e., using the original numerical concasting model developed by the authors of this paper. Such an analysis facilitates the simulation of each individual case of concasting and thus improves the optimization process.

DSpace at VSB Technical University of Ostrava

Theoretical considerations on the energy balance closure

Author: de Roo F.
Mauder M.
Publication date: 04/02/2014

KITopen

The influence of idealized surface heterogeneity on virtual turbulent flux measurements

Author: De Roo Frederik
Mauder Matthias
Publication venue: European Geosciences Union
Publication date: 30/04/2018

The imbalance of the surface energy budget in eddy-covariance measurements is still an unsolved problem. A possible cause is the presence of land surface heterogeneity, which affects the boundary-layer turbulence. To investigate the impact of surface variables on the partitioning of the energy budget of flux measurements in the surface layer under convective conditions, we set up a systematic parameter study by means of large-eddy simulation. For the study we use a virtual control volume approach, which allows the determination of advection by the mean flow, flux divergence, and storage terms of the energy budget at the virtual measurement site, in addition to the standard turbulent flux. We focus on the heterogeneity of the surface fluxes and keep the topography flat. The surface fluxes vary locally in intensity, and these patches have different length scales. Intensity and length scales can vary for the two horizontal dimensions but follow an idealized chessboard pattern. Our main focus lies on surface heterogeneity at the kilometer scale and at one order of magnitude smaller. For these two length scales, we investigate the average response of the fluxes at a number of virtual towers when varying the heterogeneity length within the length scale and when varying the contrast between the different patches. For each simulation, virtual measurement towers were placed at functionally different positions (e.g., downdraft region, updraft region, at the border between domains). As the storage term is always small, the non-closure is given by the sum of the advection by the mean flow and the flux divergence. Remarkably, the missing flux can be described by either the advection by the mean flow or the flux divergence separately, because the two are highly correlated with each other. For kilometer-scale heterogeneity, we notice a clear dependence of the updrafts and downdrafts on the surface heterogeneity, and likewise a dependence of the energy partitioning on the tower location. For the hectometer scale, we do not notice such a clear dependence. Finally, we seek correlators for the energy balance ratio in the simulations. The correlation with the friction velocity is less pronounced than previously found, but this is likely because we concentrated on strongly to freely convective conditions.
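The budget statement in the abstract above (non-closure equals advection by the mean flow plus flux divergence when storage is negligible) can be written out as a short sketch. The function names, the sign convention (all terms counted as positive losses from the surface flux), the definition of the energy balance ratio as measured turbulent flux divided by surface flux, and the numbers are illustrative assumptions, not values or definitions from the paper.

```python
def missing_flux(advection, flux_divergence, storage=0.0):
    """Non-closure of the control-volume budget: sum of the non-turbulent terms (W m^-2)."""
    return advection + flux_divergence + storage

def energy_balance_ratio(turbulent_flux, surface_flux):
    """Assumed definition: fraction of the surface flux captured by the turbulent flux."""
    return turbulent_flux / surface_flux

# Hypothetical values in W m^-2, chosen only to illustrate the arithmetic.
surface_flux = 200.0
advection = 15.0
flux_divergence = 10.0

residual = missing_flux(advection, flux_divergence)  # storage assumed negligible
turbulent_flux = surface_flux - residual             # flux seen by the virtual tower
ratio = energy_balance_ratio(turbulent_flux, surface_flux)
print(residual, ratio)
```

Under these assumptions, a ratio below one indicates that part of the surface flux is carried by advection and flux divergence rather than by the turbulent flux at the tower.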

KITopen