27 research outputs found

    The transitional-probability Markov chain versus traditional indicator methods for modeling the geotechnical categories in a test site.

    Get PDF
    Given the abundance of geotechnical records and measurements in the Göttingen area, a three-dimensional subsurface model of the unconsolidated sediment classes was required. To avoid repeating long expressions, these unconsolidated materials, which range from loose sediments to hard rock, are referred to from this point on as “soil”, “category”, “soil class” or “soil category”. These sediments, which are intermediate between hard bedrock and loose sediments (soils), were categorized according to the geotechnical norms of DIN 18196. The aim of this study was to evaluate the capability of geostatistical estimation and simulation methods to model subsurface heterogeneities, in particular the geotechnical soil classes. Such heterogeneity modeling is a crucial step in a variety of applications such as geotechnics, mining, petroleum engineering, and hydrogeology. For an accurate modeling of essential continuous parameters, such as ore grades, porosity, permeability, and the hydraulic conductivity of a porous medium, the facies or soil-category boundaries must be delineated precisely before any further modeling step. The focus of this study is therefore the three-dimensional modeling and delineation of the unconsolidated materials of the subsurface using geostatistical methods, namely conventional pixel-based methods and transition-probability Markov chain methods. After a general statistical evaluation of the parameters, the presence or absence of each category along the sampled boreholes was coded by new variables called indicators. The indicator of a category at a sampling point is one (1) when the category is present and zero (0) when it is absent. Intermediate states can also be defined; for instance, an indicator of 0.5 can be assigned to two categories when both probably exist at a location but it is unclear which one is actually present. To improve the stationarity of the indicator variables, the initial coordinates were transformed, as a first modeling step, into a new system proportional to the top and bottom of the modeled layer. In the new space, the indicator variograms were calculated and modeled for each category in a variety of directions; for easier reference, semi-variograms are simply called variograms in this text. Using indicator kriging, the probability of occurrence of each category at each modeling node was estimated, and the most probable category was then assigned to each node. The employed indicator variogram models and indicator kriging parameters were validated and improved, and the use of a smaller number of samples was tested and is suggested for similar cases, as it yields comparable precision in the results.
To better reflect the fine-scale variations of the categories, geostatistical simulation methods were applied, evaluated, and compared. The employed simulation methods were sequential indicator simulation (SISIM) and the transition probability Markov chain (TP/MC). The study suggests that the TP/MC method generates satisfactory results, especially in comparison with the SISIM method, and reasons for the inefficiency of other facies-modeling alternatives for this application (and similar cases) are discussed. Improvements to the TP/MC method were also implemented, and results and suggestions for further research are summarized. Based on these results, the application of the TP/MC method is recommended for similar problems, and simulation selection, testing, and assessment frameworks are proposed for analogous applications, together with directions for future studies. The proposed framework, or an improved version of it, could be further developed into a guided computer code covering all of the proposed steps. The results of this study and possible follow-up surveys could be of importance in a variety of applications such as geotechnics, hydrogeology, mining, and hydrocarbon reservoirs.
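The indicator coding and the proportional coordinate transform described in this abstract can be illustrated with a short, self-contained Python sketch. The borehole data, column names, and the rule for ambiguous samples below are hypothetical and chosen only for illustration; they are not taken from the thesis.

    # Minimal sketch (under the assumptions stated above) of indicator coding of
    # borehole samples and of the proportional depth transform used to improve
    # stationarity before computing indicator variograms and kriging.
    import pandas as pd

    # Hypothetical borehole samples: location, elevation, layer geometry, category.
    samples = pd.DataFrame({
        "x": [10.0, 10.0, 42.5, 42.5],
        "y": [5.0, 5.0, 7.5, 7.5],
        "z": [112.0, 108.5, 121.0, 117.0],
        "layer_top": [115.0, 115.0, 124.0, 124.0],
        "layer_bottom": [105.0, 105.0, 114.0, 114.0],
        "category": ["clay", "sand/clay?", "sand", "clay"],
    })

    categories = ["clay", "sand"]

    # Indicator coding: 1 if the category is present, 0 if absent, and 0.5 if two
    # categories may both be present but their exact share is unknown.
    for cat in categories:
        def code(obs, cat=cat):
            if obs == cat:
                return 1.0
            if "?" in obs and cat in obs:  # ambiguous record such as "sand/clay?"
                return 0.5
            return 0.0
        samples[f"I_{cat}"] = samples["category"].apply(code)

    # Proportional transform: express elevation relative to the top and bottom of
    # the modeled layer, so each sample gets a value between 0 and 1.
    samples["z_rel"] = (samples["z"] - samples["layer_bottom"]) / (
        samples["layer_top"] - samples["layer_bottom"]
    )

    print(samples[["x", "y", "z_rel", "I_clay", "I_sand"]])

Indicator variograms and indicator kriging would then be computed on these indicator columns in the transformed coordinates.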

    Outlier Identification in Spatio-Temporal Processes

    Full text link
    This dissertation addresses some of the statistical challenges arising in spatio-temporal data from Internet traffic, electricity grids, and climate models. It begins with methodological contributions to the problem of anomaly detection in communication networks. Using electricity consumption patterns for the University of Michigan campus, the well-known spatial prediction method kriging has been adapted to identify false data injections into the system. Events like Distributed Denial of Service (DDoS), botnet/malware attacks, and port scanning call for methods which can identify unusual activity in Internet traffic patterns. Storing information on the entire network, though feasible, cannot be done at the time scale at which the data arrive. In this work, hashing techniques which can produce summary statistics for the network have been used. The hashed data so obtained preserve the heavy-tailed nature of traffic payloads, thereby providing a platform for the application of extreme value theory (EVT) to identify heavy hitters in volumetric attacks. These EVT-based methods require the estimation of the tail index of a heavy-tailed distribution. The traditional estimator (Hill, 1975) for the tail index tends to be biased in the presence of outliers. To circumvent this issue, a trimmed version of the classic Hill estimator has been proposed and studied from a theoretical perspective. For the Pareto domain of attraction, the optimality and asymptotic normality of the estimator have been established. Additionally, a data-driven strategy to detect the number of extreme outliers in heavy-tailed data is also presented. The dissertation concludes with the statistical formulation of m-year return levels of extreme climatic events (heat/cold waves). The Generalized Pareto distribution (GPD) serves as a good fit for modeling peaks over threshold of a distribution. Allowing the parameters of the GPD to vary as a function of covariates such as time of the year, El Niño, and location in the US, extremes of the areal impact of heat waves have been modeled and inferred. PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/145789/1/shrijita_1.pd
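For reference, the classic Hill estimator mentioned above can be written as follows (a standard textbook formulation; the trimmed variant proposed in the dissertation is not reproduced here). With the upper order statistics X_(1) >= X_(2) >= ... >= X_(n) of an i.i.d. sample and k upper extremes, the Hill estimate of the extreme value index gamma = 1/alpha is, in LaTeX:

    \[
      \hat{\gamma}_{k,n}^{\mathrm{Hill}}
        = \frac{1}{k} \sum_{i=1}^{k} \ln \frac{X_{(i)}}{X_{(k+1)}} ,
      \qquad
      \hat{\alpha}_{k,n} = 1 / \hat{\gamma}_{k,n}^{\mathrm{Hill}} ,
    \]

where alpha is the tail index of a heavy-tailed distribution with P(X > x) ~ c x^{-alpha}. Because the largest order statistics enter the sum directly, a few aberrant extreme observations can bias the estimate, which is the issue the trimmed estimator is designed to address.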

    IDEAS-1997-2021-Final-Programs

    Get PDF
    This document records the final program for each of the 26 meetings of the International Database Engineering and Applications Symposium (IDEAS) held from 1997 through 2021. These meetings were organized in various locations on three continents. Most of the papers published during these years are in the digital libraries of the IEEE (1997-2007) or the ACM (2008-2021).

    Causality and independence in systems of equations

    Get PDF
    The technique of causal ordering is used to study causal and probabilistic aspects implied by model equations. Causal discovery algorithms are used to learn causal and dependence structure from data. In this thesis, 'Causality and independence in systems of equations', we explore the relationship between causal ordering and the output of causal discovery algorithms. By combining these techniques, we bridge the gap between the world of dynamical systems at equilibrium and the literature on causal methods for static systems. In a nutshell, this gives new insights into models with feedback and an improved understanding of observed phenomena in certain (biological) systems. Based on our ideas, we outline a novel approach towards causal discovery for dynamical systems at equilibrium. This work was inspired by a desire to understand why the output of causal discovery algorithms sometimes appears to be at odds with expert knowledge. We were particularly interested in explaining apparent reversals of causal directions when causal discovery methods are applied to protein expression data. We propose the presence of a perfectly adapting feedback mechanism or unknown measurement error as possible explanations for these apparent reversals. We develop conditions for the detection of perfect adaptation from model equations or from data and background knowledge. This can be used to reason about the existence of feedback mechanisms using only partial observations of a system, resulting in additional criteria for data-driven selection of causal models. This line of research was made possible by novel interpretations and extensions of the causal ordering algorithm. Additionally, we challenge a key assumption in many causal discovery algorithms: that the underlying system can be modelled by the well-known class of structural causal models. To overcome the limitations of these models in capturing the causal semantics of dynamical systems at equilibrium, we propose a generalization that we call causal constraints models. Looking beyond standard causal modelling frameworks allows us to further explore the relationship between dynamical models at equilibrium and methods for causal discovery on equilibrium data.
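To make the causal ordering technique referred to above concrete, the following simplified Python sketch derives an ordering from a small, hypothetical system of equations: each equation is matched to the variable it is taken to determine, and that variable then depends on the remaining variables appearing in the equation. The example system, the variable names, and this particular rendering are illustrative assumptions, not the algorithm as developed in the thesis.

    # Simplified sketch of causal ordering via bipartite matching (illustrative only).
    import networkx as nx
    from networkx.algorithms import bipartite

    # Hypothetical system: which variables occur in which equations.
    equations = {
        "f1": {"x1"},            # e.g. x1 = c1 (exogenous)
        "f2": {"x1", "x2"},      # e.g. x2 = g(x1)
        "f3": {"x2", "x3"},      # e.g. x3 = h(x2)
    }

    # Bipartite graph between equations and the variables they contain.
    B = nx.Graph()
    B.add_nodes_from(equations, bipartite=0)
    for eq, variables in equations.items():
        for v in variables:
            B.add_edge(eq, v)

    # A maximum matching; for a uniquely solvable square system it is perfect.
    matching = bipartite.maximum_matching(B, top_nodes=set(equations))

    # Orient edges: the matched variable of each equation is determined by the
    # other variables appearing in that equation.
    ordering = nx.DiGraph()
    for eq, variables in equations.items():
        determined = matching[eq]
        ordering.add_node(determined)
        for v in variables - {determined}:
            ordering.add_edge(v, determined)

    print(list(nx.topological_sort(ordering)))  # ['x1', 'x2', 'x3']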

    Hypergraph Partitioning in the Cloud

    Get PDF
    The thesis investigates the partitioning and load balancing problem, which has many applications in High Performance Computing (HPC). The application to be partitioned is described by a graph or hypergraph. The latter is of greater interest because hypergraphs, compared to graphs, have a more general structure and can model more complex relationships between groups of objects, such as non-symmetric dependencies. Optimal graph and hypergraph partitioning is known to be NP-hard, but good polynomial-time heuristic algorithms have been proposed. In this thesis, we propose two multi-level hypergraph partitioning algorithms based on rough set clustering techniques. The first, a serial algorithm, obtains high-quality partitionings and improves the partitioning cut by up to 71% compared to state-of-the-art serial hypergraph partitioning algorithms. Furthermore, the capacity of serial algorithms is limited due to the rapid growth of the problem sizes of distributed applications. Consequently, we also propose a parallel hypergraph partitioning algorithm. Given the generality of the hypergraph model, designing a parallel algorithm is difficult, and the available parallel hypergraph algorithms offer less scalability than their graph counterparts. The issue is twofold: the parallel algorithm and the complexity of the hypergraph structure. Our parallel algorithm provides a trade-off between global and local vertex clustering decisions. By employing novel techniques and approaches, it achieves better scalability than the state-of-the-art parallel hypergraph partitioner in the Zoltan tool on a set of benchmarks, especially those with irregular structure. Furthermore, recent advances in cloud computing and the services it provides have led to a trend of moving HPC and large-scale distributed applications into the cloud. Despite its advantages, some aspects of the cloud, such as limited network resources, present a challenge to running communication-intensive applications and make them non-scalable in the cloud. While hypergraph partitioning is proposed as a solution for decreasing the communication overhead within parallel distributed applications, it can also offer advantages for running these applications in the cloud. The partitioning is usually done as a pre-processing step before running the parallel application. As parallel hypergraph partitioning is itself a communication-intensive operation, running it in the cloud is hard and suffers from poor scalability. The thesis therefore also investigates the scalability of parallel hypergraph partitioning algorithms in the cloud and the challenges they face, and proposes solutions to improve the cost/performance ratio of running the partitioning problem in the cloud. Our algorithms are implemented as a new hypergraph partitioning package within Zoltan, an open-source, Linux-based toolkit for parallel partitioning, load balancing, and data management designed at Sandia National Labs. The serial and parallel algorithms are known as FEHG and PFEHG, respectively.
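The "partitioning cut" that the improvements above are measured against can be made concrete with the connectivity-1 (lambda minus 1) metric commonly used for hypergraph partitioning. The small hypergraph and the 2-way partition below are hypothetical, and the sketch is a generic illustration of the metric rather than part of the FEHG/PFEHG implementation.

    # Generic sketch of the connectivity-1 (lambda - 1) cut metric for a
    # hypergraph partitioning (illustrative only).

    def connectivity_cut(hyperedges, part_of, weights=None):
        """Sum over hyperedges of weight * (number of parts spanned - 1)."""
        cut = 0
        for i, edge in enumerate(hyperedges):
            spanned = {part_of[v] for v in edge}
            weight = 1 if weights is None else weights[i]
            cut += weight * (len(spanned) - 1)
        return cut

    # Hypothetical hypergraph on vertices 0..5 with three hyperedges.
    hyperedges = [{0, 1, 2}, {2, 3}, {3, 4, 5}]
    # A 2-way partition assigning each vertex to part 0 or part 1.
    part_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

    print(connectivity_cut(hyperedges, part_of))  # 1: only hyperedge {2, 3} is cut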

    Eighth Biennial Report: April 2005 – March 2007

    No full text

    Measurement of the Triple-Differential Cross-Section for the Production of Multijet Events using 139 fb⁻¹ of Proton-Proton Collision Data at √s = 13 TeV with the ATLAS Detector to Disentangle Quarks and Gluons at the Large Hadron Collider

    Get PDF
    At hadron-hadron colliders, it is almost impossible to obtain pure samples of either quark- or gluon-initialized hadronic showers, as one always deals with a mixture of particle jets. The analysis presented in this dissertation aims to break the aforementioned degeneracy by extracting the underlying fractions of (light) quarks and gluons through a measurement of the relative production rates of multijet events. A measurement of the triple-differential multijet cross section at a centre-of-mass energy of 13 TeV, using an integrated luminosity of 139 fb⁻¹ of data collected with the ATLAS detector in proton-proton collisions at the Large Hadron Collider (LHC), is presented. The cross section is measured as a function of the transverse momentum p_T, two categories of pseudorapidity η_rel defined by the relative orientation between the jets, and a Jet Sub-Structure (JSS) observable O_JSS sensitive to the quark- or gluon-like nature of the hadronic shower, for the two leading-p_T jets with 250 GeV < p_T < 4.5 TeV and |η| < 2.1 in the event. The JSS variables studied within the context of this thesis can broadly be divided into two categories: one set of JSS observables is constructed by iteratively declustering and counting the jet's charged constituents; the second set is based on the output of Deep Neural Networks (DNNs) derived from the “deep sets” paradigm to implement permutation-invariant functions over sets, which are trained to discriminate between quark- and gluon-initialized showers in a supervised fashion. All JSS observables are measured based on Inner Detector tracks with p_T > 500 MeV and |η| < 2.5 to maintain strong correlations between detector- and particle-level objects. The reconstructed spectra are fully corrected for acceptance and detector effects, and the unfolded cross section is compared to various state-of-the-art parton shower Monte Carlo models. Several sources of systematic and statistical uncertainty are taken into account and fully propagated through the entire unfolding procedure onto the final cross section. The total uncertainty on the cross section varies between 5 % and 20 %, depending on the region of phase space. The unfolded multi-differential cross sections are used to extract the underlying fractions and probability distributions of quark- and gluon-initialized jets in a solely data-driven, model-independent manner using a statistical demixing procedure (“jet topics”), which was originally developed as a tool for extracting emergent themes in an extensive corpus of text-based documents. The obtained fractions are model-independent and are based on an operational definition of quark and gluon jets that does not seek to assign a binary label on a jet-to-jet basis, but rather identifies quark- and gluon-related features at the level of individual distributions, avoiding common theoretical and conceptual pitfalls regarding the definition of quark and gluon jets. The total fraction of gluon-initialized jets in the multijet sample is (IRC-safely) measured to be 60.5 ± 0.4 (Stat) ⊕ 2.4 (Syst) % and 52.3 ± 0.4 (Stat) ⊕ 2.6 (Syst) % in the central and forward regions, respectively. Furthermore, the gluon fractions are extracted in several exclusive regions of transverse momentum.
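The fraction extraction described above rests on treating the two measured samples as mixtures of the same two underlying distributions. A schematic LaTeX rendering of this picture, with symbol names chosen here for illustration rather than taken from the thesis, is:

    \[
      p_{\text{central}}(\mathcal{O}) =
        f_g^{\text{c}}\, p_g(\mathcal{O}) + \bigl(1 - f_g^{\text{c}}\bigr)\, p_q(\mathcal{O}),
      \qquad
      p_{\text{forward}}(\mathcal{O}) =
        f_g^{\text{f}}\, p_g(\mathcal{O}) + \bigl(1 - f_g^{\text{f}}\bigr)\, p_q(\mathcal{O}),
    \]

where p_g and p_q are the gluon- and quark-jet distributions of the substructure observable, assumed common to both samples, and the “jet topics” demixing recovers p_g, p_q, and the gluon fractions f_g from the two measured mixtures.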

    Greater Cairo Earthquake Loss Assessment and its Implications on the Egyptian Economy

    Get PDF
    This study develops a loss estimation model for assessing the seismic economic implications of damage to Greater Cairo’s building stock, as well as to its natural gas and electricity lifelines. The model estimates both the direct and indirect economic losses resulting from seismic events and is composed of three modules. The first is the ground-shaking module, which estimates ground motion throughout Greater Cairo. This is done by investigating the seismicity of Egypt and its surroundings in order to develop recurrence relationships; in addition, a seismic geological classification is conducted using geological and geotechnical data. These investigations, together with three attenuation relationships, are used to estimate ground motion throughout Greater Cairo. The second module evaluates the damage to the building stock as well as to the natural gas and electricity lifelines. This is done by developing a building inventory database and classifying the structures in this database into various classes. Moreover, data on the components of the natural gas and electricity networks are collected, and the networks’ behaviour is assessed through the use of minimum cut sets. Finally, the vulnerability of structures and components is evaluated through the use of fragility curves. The final module estimates the direct economic losses associated with repairing damaged components, as well as the indirect costs associated with business interruption resulting from disruption to elements of the built environment. This study will pave the way for developing countries to recognize the impacts of earthquakes on their economies. Moreover, it will be useful for countries with a centralized economy that is dependent on major cities. Furthermore, it provides a step forward in earthquake loss estimation by modelling multiple lifelines together, in contrast to past research, which modelled each lifeline separately.
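The fragility curves used in the second module typically take a lognormal form relating a ground-motion intensity measure to the probability of reaching a damage state. The Python sketch below shows this standard form; the median capacity and dispersion values are illustrative assumptions, not parameters from the study.

    # Minimal sketch of a lognormal fragility curve (illustrative parameter values).
    from math import erf, log, sqrt

    def fragility(im, median, beta):
        """P(damage state is reached | intensity measure = im) as a lognormal CDF."""
        z = log(im / median) / beta
        return 0.5 * (1.0 + erf(z / sqrt(2.0)))

    # Example: probability of damage at PGA = 0.25 g for a hypothetical building
    # class with median capacity 0.30 g and dispersion 0.6.
    print(f"{fragility(0.25, median=0.30, beta=0.6):.2f}")  # about 0.38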