Iterative seismic data interpolation using plane-wave shaping
Geophysical applications often require finding an appropriate solution to an ill-posed inverse problem. An example is interpolating irregular or sparse data onto a regular grid. This data-regularization problem must be addressed before many data-processing techniques can be applied. In this thesis, I investigate plane-wave shaping in two and three dimensions as a data-regularization algorithm for the interpolation of seismic data and images. I use plane-wave shaping to interpolate several synthetic and field datasets and test its accuracy in image reconstruction. Because plane-wave shaping adheres to the direction of the local slopes of an image, the image-guided interpolation scheme attempts to preserve information about geologic structures. I also apply several alternative interpolation schemes, each formulated as an inverse problem with a convolutional operator to constrain the model space: plane-wave destruction, plane-wave construction, and prediction-error filters. Investigating their iterative convergence rates, I find that plane-wave shaping converges to a solution in fewer iterations than the alternative techniques. I find that the only required parameter for this method, the smoothing radius, is best chosen to be approximately the same size as the holes in missing-data problems. The optional edge-padding parameter is best selected as approximately half of the smoothing radius. Potential applications of this research include well-log interpolation, seismic tomography, and 5-D seismic data interpolation.
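The shaping idea at the heart of this approach can be illustrated with a minimal 1-D missing-data sketch: alternately smooth the model and re-impose the known samples until the holes are filled. Note that this toy uses a plain box filter as the shaping operator, not the plane-wave shaping of the thesis, and all names are illustrative.

```python
import numpy as np

def smooth(x, radius):
    # Simple shaper: two passes of a moving-average (box) filter.
    # `radius` here is just the filter length of this toy shaper.
    kernel = np.ones(radius) / radius
    for _ in range(2):
        x = np.convolve(x, kernel, mode="same")
    return x

def shaping_interpolate(data, mask, radius, niter=200):
    """Fill missing samples (mask == False) by iteratively applying
    the smoothing shaper and then re-imposing the observed samples."""
    model = np.where(mask, data, 0.0)
    for _ in range(niter):
        model = smooth(model, radius)
        model[mask] = data[mask]   # honor observed samples
    return model
```

For an interior hole surrounded by known samples, the iteration converges to a smooth (here, locally linear) infill; in line with the thesis's finding, the filter length works best when comparable to the hole size.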
Topological tools for understanding complex systems
The behavior of complex systems is often influenced by their structure. In mathematics, the field of algebraic topology has been especially useful for characterizing mathematical structures. Topological data analysis (TDA) is a growing field in which methods from algebraic topology are applied to studying the structure of data. TDA has been used in a variety of applications, including biological data, granular materials, and demography. Social interactions are heavily informed by space and have complex structure due to patterns in the way humans arrange themselves geographically. Consequently, social applications can benefit from the application of TDA.

In this dissertation, I develop topological methods for studying spatial networks and apply them to a wide variety of data sets. In particular, I study methods for building topological spaces (specifically, simplicial complexes) based on data. I present two novel simplicial-complex constructions, the adjacency complex and the level-set complex, for spatial data. I apply both constructions to random networks, cities, voting, and scientific images, gaining insights into the structure of these systems. I also propose a novel simplicial-complex construction for studying patterns of neighborhood formation, based on combining demographic and spatial data, and present case studies in neighborhood segregation for two U.S. cities. In addition to my topological research, I discuss two projects in the study of social systems using methods from network analysis. I present an extension to multilayer networks of the Hegselmann--Krause model for opinion dynamics and discuss preliminary findings on its convergence properties. I also present a framework for estimating homelessness underreporting in California Local Education Agencies (LEAs).
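As a concrete illustration of building a simplicial complex from spatial data, here is a minimal flag (clique) complex up to dimension 2, built by connecting points within a distance threshold. This is the standard Vietoris–Rips-style construction, used only as a stand-in for the adjacency and level-set complexes developed in the dissertation; all names are illustrative.

```python
import itertools
import math

def clique_complex(points, eps):
    """Flag (clique) complex up to dimension 2: vertices, edges for
    pairs within eps, and triangles for mutually close triples."""
    n = len(points)
    edges = [(i, j) for i, j in itertools.combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= eps]
    edge_set = set(edges)
    triangles = [(i, j, k)
                 for i, j, k in itertools.combinations(range(n), 3)
                 if (i, j) in edge_set and (i, k) in edge_set
                 and (j, k) in edge_set]
    return list(range(n)), edges, triangles

def euler_characteristic(V, E, T):
    # chi = #vertices - #edges + #triangles (for a 2-dimensional complex)
    return len(V) - len(E) + len(T)
```

For four points at the corners of a unit square, a threshold of 1.1 captures only the four sides (a topological circle, Euler characteristic 0), while 1.5 also captures the diagonals and all four triangles.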
Gay Data
Since its launch in 2009, the geosocial networking service Grindr has become an increasingly mainstream and prominent part of gay culture, both in the United States and globally. Mobile applications like Grindr give users the ability to quickly and easily share information about themselves (in the form of text, numbers, and pictures), and connect with each other in real time on the basis of geographic proximity. I argue that these services constitute an important site for examining how bodies, identities, and communities are translated into data, as well as how data becomes a tool for forming, understanding, and managing personal relationships. Throughout this work, I articulate a model of networked interactivity that conceptualizes self-expression as an act determined by three sometimes overlapping, sometimes conflicting sets of affordances and constraints: (1) technocommercial structures of software and business; (2) cultural and subcultural norms, mores, histories, and standards of acceptable and expected conduct; and (3) sociopolitical tendencies that appear to be (but in fact are not) fixed technocommercial structures. In these discussions, Grindr serves both as a model of processes that apply to social networking more generally, as well as a particular study into how networked interactivity is complicated by the histories and particularities of Western gay culture. Over the course of this dissertation, I suggest ways in which users, policymakers, and developers can productively recognize the liveness, vitality, and durability of personal information in the design, implementation, and use of gay-targeted social networking services. 
Specifically, I argue that through a focus on (1) open-ended structures of interface design, (2) clear and transparent articulations of service policies and the rationales behind them, and (3) approaches to user information that promote data sovereignty, designers, developers, and advocates can work to make social networking services, including Grindr, safer and more representative of their users throughout their data’s lifecycle.
Accelerated Structure Prediction of Halide Perovskites with Machine Learning
Halide perovskites are a promising materials class for solar energy production. The photovoltaic efficiency of halide perovskites is remarkable, but their toxicity and instability have prevented commercialization. These problems could be addressed through compositional engineering in the halide perovskite materials space, but the number of different materials that would need to be considered is too large for conventional experimental and computational methods. Machine learning can be used to accelerate the computations to the level required for this task.
In this thesis, I present a machine learning approach for compositional exploration and apply it to the composite halide perovskite CsPb(Cl, Br)3. I used data from density functional theory (DFT) calculations to train a machine learning model based on kernel ridge regression with the many-body tensor representation of the atomic structure. The trained model was then applied to predict the decomposition energies of CsPb(Cl, Br)3 materials from their atomic structure. The main part of my work was to derive and implement gradients for the machine learning model to facilitate efficient structure optimization.
I tested the machine learning model by comparing its decomposition energy predictions to DFT calculations. The prediction error was under 0.12 meV per atom, and prediction was five orders of magnitude faster than DFT. I also used the model to optimize CsPb(Cl, Br)3 structures. Reasonable structures were obtained, but the accuracy was only qualitative. Analysis of the results of the structural optimizations exposed shortcomings in the approach, providing important insight for future improvements. Overall, this project makes a successful step towards the discovery of novel perovskite materials with designer properties for future solar cell applications.
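The core machinery can be sketched in a few lines: kernel ridge regression with an RBF kernel, together with the analytic gradient of the prediction with respect to the input, which is the ingredient needed for gradient-based structure optimization. This sketch uses a plain RBF kernel on coordinate vectors rather than the many-body tensor representation; names and hyperparameters are illustrative.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    # k(x, y) = exp(-gamma * ||x - y||^2), computed pairwise
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit(X, y, gamma, lam):
    # Solve (K + lam*I) alpha = y for the dual weights
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(Xtrain, alpha, x, gamma):
    k = rbf_kernel(x[None, :], Xtrain, gamma)[0]
    return k @ alpha

def krr_gradient(Xtrain, alpha, x, gamma):
    # Analytic gradient of the prediction w.r.t. the input x:
    # d/dx exp(-gamma*||x - xi||^2) = -2*gamma*(x - xi)*k_i
    k = rbf_kernel(x[None, :], Xtrain, gamma)[0]
    return (-2.0 * gamma * (x[None, :] - Xtrain)
            * k[:, None] * alpha[:, None]).sum(axis=0)
```

The analytic gradient can be validated against a central finite difference of the prediction, the usual sanity check before handing it to a structure optimizer.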
Reasoning with Uncertainty in Deep Learning for Safer Medical Image Computing
Deep learning is now ubiquitous in the research field of medical image computing. As such technologies progress towards clinical translation, the question of safety becomes critical. Once deployed, machine learning systems unavoidably face situations where the correct decision or prediction is ambiguous. However, the current methods disproportionately rely on deterministic algorithms, lacking a mechanism to represent and manipulate uncertainty. In safety-critical applications such as medical imaging, reasoning under uncertainty is crucial for developing a reliable decision making system. Probabilistic machine learning provides a natural framework to quantify the degree of uncertainty over different variables of interest, be it the prediction, the model parameters and structures, or the underlying data (images and labels). Probability distributions are used to represent all the uncertain unobserved quantities in a model and how they relate to the data, and probability theory is used as a language to compute and manipulate these distributions. In this thesis, we explore probabilistic modelling as a framework to integrate uncertainty information into deep learning models, and demonstrate its utility in various high-dimensional medical imaging applications. In the process, we make several fundamental enhancements to current methods. We categorise our contributions into three groups according to the types of uncertainties being modelled: (i) predictive; (ii) structural and (iii) human uncertainty. Firstly, we discuss the importance of quantifying predictive uncertainty and understanding its sources for developing a risk-averse and transparent medical image enhancement application. We demonstrate how a measure of predictive uncertainty can be used as a proxy for the predictive accuracy in the absence of ground-truths. 
Furthermore, assuming the structure of the model is flexible enough for the task, we introduce a way to decompose the predictive uncertainty into its orthogonal sources, i.e., aleatoric and parameter uncertainty. We show the potential utility of such decoupling in providing quantitative “explanations” of model performance. Secondly, we introduce our recent attempts at learning model structures directly from data. One work proposes a method based on variational inference to learn a posterior distribution over connectivity structures within a neural network architecture for multi-task learning, and shares some preliminary results in the MR-only radiotherapy planning application. Another work explores how the training algorithm of decision trees can be extended to grow the architecture of a neural network, adapting to the given availability of data and the complexity of the task. Lastly, we develop methods to model the “measurement noise” (e.g., biases and skill levels) of human annotators, and integrate this information into the learning process of the neural network classifier. In particular, we show that explicitly modelling the uncertainty involved in the annotation process not only leads to an improvement in robustness to label noise, but also yields useful insights into the patterns of errors that characterise individual experts.
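An aleatoric/parameter split of this kind is commonly computed via the law of total variance over an ensemble (or over Monte Carlo samples of the model parameters). A minimal sketch, where the only assumption is the standard law-of-total-variance decomposition itself:

```python
import numpy as np

def decompose_uncertainty(means, variances):
    """Law-of-total-variance split for an ensemble of probabilistic
    predictors. Member m predicts a Gaussian N(means[m], variances[m]).
      aleatoric = E_m[variances]  (noise no amount of data removes)
      epistemic = Var_m[means]    (disagreement between members,
                                   i.e., parameter uncertainty)
    Arrays have shape (n_members, n_test_points)."""
    aleatoric = variances.mean(axis=0)
    epistemic = means.var(axis=0)
    return aleatoric, epistemic, aleatoric + epistemic
```

The epistemic term shrinks as members agree (more data, better-identified parameters), while the aleatoric term persists, which is exactly the distinction exploited when using uncertainty as a proxy for accuracy.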
Index Structures for Data Warehouses
0. Title and Table of Contents
1. Introduction
2. State of the Art of Data Warehouse Research
3. Data Storage and Index Structures
4. Mixed Integer Problems for Finding Optimal Tree-based Index Structures
5. Aggregated Data in Tree-Based Index Structures
6. Performance Models for Tree-Based Index Structures
7. Techniques for Comparing Index Structures
8. Conclusion and Outlook
Bibliography
Appendix

This thesis investigates which index structures support query processing in
typical data warehouse environments most efficiently. Data warehouse
applications differ significantly from traditional transaction-oriented
operational applications. Therefore, the techniques applied in transaction-
oriented systems cannot be used in the context of data warehouses and new
techniques must be developed. The thesis shows that the time complexity of
computing optimal tree-based index structures prohibits their use in
real-world applications. Therefore, we improve heuristic techniques (e.g.,
the R*-tree) to process range queries on aggregated data more efficiently.
Experiments show the benefits of this approach for different kinds of typical
data warehouse queries. Performance models estimate the behavior of standard
index structures and the behavior of the extended index structures. We
introduce a new model that considers the distribution of data. We show
experimentally that the new model is more precise than other models known from
literature. Two techniques compare two tree-based index structures with two
bitmap indexing techniques. The performance of these index structures depends
on a set of different parameters. Our results show which index structure
performs most efficiently depending on the parameters.

This thesis investigates which index structures efficiently support queries in typical data warehouse systems. Index structures, a subject of database research for more than twenty years, have in the past been optimized for transaction-oriented systems. A hallmark of such systems is the efficient support of insert, update, and delete operations on individual records. Typical operations in data warehouse systems, by contrast, are complex queries on large, relatively static data sets. Because of these changed requirements, database management systems used for data warehouses must employ other techniques to support complex queries efficiently. First, an approach is examined that computes an optimal index structure by means of a mixed-integer optimization problem. Since the cost of computing this optimal index structure grows exponentially with the number of records to be indexed, the subsequent parts of the thesis pursue heuristic approaches that scale with the size of the data to be indexed. One approach extends tree-based index structures with aggregated data in their inner nodes. It is shown experimentally that these materialized intermediate results in the inner nodes allow range queries on aggregated data to be processed considerably faster. To study the performance of index structures with and without materialized intermediate results, the PISA model (Performance of Index Structures with and without Aggregated Data) is developed. This model takes the distribution of the data and the distribution of the queries into account. The PISA model is fitted to uniformly, skewed, and normally distributed data sets. Experiments show that the PISA model works with higher precision than the models previously known from the literature. The performance of index structures depends on various parameters. This thesis presents two techniques that compare index structures as a function of a given set of parameters. Using classification trees, it is shown, for example, that the block size influences relative performance less than other parameters. A further result is that bitmap index structures profit more from the improvements in newer secondary storage than the tree-based index structures common today; bitmap indexing techniques thus still offer great potential for further performance gains in the database field.
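The benefit of materializing aggregates in inner index nodes can be sketched with a 1-D analogue: a sum segment tree, where each internal node stores the sum of its subtree, so a range-sum query touches O(log n) nodes instead of scanning all the leaves. (The thesis extends R*-trees; this simpler structure only illustrates the principle.)

```python
class SumSegmentTree:
    """Each internal node materializes the sum of its subtree,
    the 1-D analogue of aggregated data in inner index nodes."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0.0] * (2 * self.n)
        self.tree[self.n:] = values              # leaves hold the data
        for i in range(self.n - 1, 0, -1):       # build aggregates bottom-up
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def range_sum(self, lo, hi):
        """Sum of values[lo:hi] using O(log n) stored aggregates."""
        lo += self.n
        hi += self.n
        total = 0.0
        while lo < hi:
            if lo & 1:                            # lo is a right child:
                total += self.tree[lo]            # take its aggregate
                lo += 1
            if hi & 1:                            # hi is a right boundary
                hi -= 1
                total += self.tree[hi]
            lo //= 2
            hi //= 2
        return total
```

Without the stored aggregates, every range query would have to visit all leaves in the range; with them, query cost is logarithmic at a modest storage overhead, the same trade-off the thesis quantifies for tree-based warehouse indexes.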
Integer Sparse Distributed Memory and Modular Composite Representation
Challenging AI applications, such as cognitive architectures, natural language understanding, and visual object recognition, share some basic operations, including pattern recognition, sequence learning, clustering, and association of related data. Both the representations used and the structure of a system significantly influence which tasks and problems are most readily supported. A memory model and a representation that facilitate these basic tasks would greatly improve the performance of these challenging AI applications.

Sparse Distributed Memory (SDM), based on large binary vectors, has several desirable properties: auto-associativity, content addressability, distributed storage, and robustness to noisy inputs, all of which would facilitate the implementation of challenging AI applications. Here I introduce two variations on the original SDM, the Extended SDM and the Integer SDM, that significantly improve these desirable properties, as well as a new form of reduced-description representation named Modular Composite Representation (MCR).

Extended SDM, which uses word vectors larger than its address vectors, enhances hetero-associativity, improving the storage of sequences of vectors as well as of other data structures. A novel sequence-learning mechanism is introduced, and several experiments demonstrate the capacity and sequence-learning capability of this memory.

Integer SDM uses modular integer vectors rather than binary vectors, improving the representational capabilities of the memory and its noise robustness. Several experiments show its capacity and noise robustness, and theoretical analyses of its capacity and fidelity are also presented.

A reduced description represents a whole hierarchy using a single high-dimensional vector, which can recover individual items and be used directly in complex calculations and procedures, such as making analogies. Furthermore, the hierarchy can be reconstructed from the single vector.
Modular Composite Representation (MCR), a new reduced-description model for the representations used in challenging AI applications, provides an attractive tradeoff between expressiveness and simplicity of operations. A theoretical analysis of its noise robustness, several experiments, and comparisons with similar models are presented.

My implementations of these memories include an object-oriented version using a RAM cache, a version for distributed and multi-threaded execution, and a GPU version for fast vector processing.
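The basic SDM mechanics that both variants build on can be sketched in the original binary setting: random hard locations, counter-based writes to all locations within a Hamming radius, and thresholded reads. The Integer SDM replaces the binary vectors with modular integer ones; this sketch shows only the classic binary case, with illustrative sizes.

```python
import numpy as np

rng = np.random.default_rng(0)

class BinarySDM:
    """Minimal Kanerva-style sparse distributed memory (binary variant).
    Hard locations have random binary addresses; a pattern is written
    into the counters of every location within `radius` Hamming
    distance of it, and read back by summing and thresholding the
    counters of the locations activated by the cue."""

    def __init__(self, n_locations, dim, radius):
        self.addresses = rng.integers(0, 2, size=(n_locations, dim))
        self.counters = np.zeros((n_locations, dim), dtype=int)
        self.radius = radius

    def _active(self, address):
        # Boolean mask of hard locations within the Hamming radius
        return (self.addresses != address).sum(axis=1) <= self.radius

    def write(self, pattern):
        act = self._active(pattern)
        self.counters[act] += 2 * pattern - 1   # +1 for bit 1, -1 for bit 0

    def read(self, cue):
        act = self._active(cue)
        return (self.counters[act].sum(axis=0) > 0).astype(int)
```

Because storage is distributed over many counters, a stored pattern can be recovered from a noisy cue: any overlap between the cue's active set and the written set votes for the original bits, which is the auto-associative cleanup behavior the abstract describes.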
UADB: Unsupervised Anomaly Detection Booster
Unsupervised Anomaly Detection (UAD) is a key data mining problem owing to its wide real-world applications. Due to the complete absence of supervision signals, UAD methods rely on implicit assumptions about anomalous patterns (e.g., that anomalies are scattered, sparse, or densely clustered) to detect anomalies. However, real-world data are complex and vary significantly across domains; no single assumption can describe such complexity and be valid in all scenarios. This is also confirmed by recent research showing that no UAD method is omnipotent. Based on these observations, instead of searching for a magic universal winner assumption, we seek to design a general UAD Booster (UADB) that empowers any UAD model with adaptability to different data. This is a challenging task given the heterogeneous model structures and assumptions adopted by existing UAD methods. To achieve this, we dive deep into the UAD problem and find that, compared to normal data, anomalies (i) lack clear structure or patterns in feature space, and are thus (ii) harder for a model to learn without a suitable assumption, which finally (iii) leads to high variance between different learners. In light of these findings, we propose to (i) distill the knowledge of the source UAD model into an imitation learner (the booster) that holds no data assumption, then (ii) exploit the variance between them to perform automatic correction, and thus (iii) improve the booster over the original UAD model. We use a neural network as the booster for its strong expressive power as a universal approximator and its ability to perform flexible post-hoc tuning. Note that UADB is a model-agnostic framework that can enhance heterogeneous UAD models in a unified way. Extensive experiments on over 80 tabular datasets demonstrate the effectiveness of UADB.
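The imitation-then-correction loop can be sketched end to end with stand-ins: a k-NN distance scorer plays the source UAD model and a ridge regressor plays the booster (the paper uses a neural network), while the pseudo-label update here is a simplified placeholder for UADB's variance-based correction; every name and parameter is illustrative.

```python
import numpy as np

def knn_score(X, k=5):
    """Toy source UAD model: mean distance to the k nearest neighbours."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d.sort(axis=1)
    return d[:, 1:k + 1].mean(axis=1)   # skip column 0 (self-distance)

def uadb_style_boost(X, source_scores, rounds=5, lam=1e-3):
    """Imitation-then-correction in the spirit of UADB: the booster
    imitates the source scores, then targets are nudged toward the
    booster's own predictions, exploiting teacher/student variance."""
    Phi = np.hstack([X, np.ones((len(X), 1))])   # linear features + bias
    target = source_scores.copy()
    for _ in range(rounds):
        # ridge-regression fit of the booster to the current targets
        w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]),
                            Phi.T @ target)
        pred = Phi @ w
        target = 0.5 * (target + pred)           # pseudo-label correction
    return pred
```

The framework is model-agnostic in exactly this sense: only the source scores cross the interface, so any UAD model whose outputs can be imitated can be boosted.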
Surface Wave Analysis in Laterally Varying Media
This work studies the possibility of using surface wave analysis as a tool for robust estimation of the S-wave velocity behaviour in laterally varying media. The surface wave method can be effectively adopted for different purposes and at different scales, but I focused on geo-engineering and geotechnical applications of surface wave analysis and on the production of near-surface models for deep exploration: in both cases the aim is to retrieve the trend of the S-wave velocity in the first tens to hundreds of meters of the subsoil. The surface wave method exploits the geometric dispersion of surface waves: in a non-homogeneous medium, every frequency is characterized by a different phase velocity, as every frequency component travels through a portion of the medium whose thickness is proportional to its wavelength. The curve associating every frequency component with its phase velocity is called the dispersion curve, and it constitutes the experimental datum used in the solution of an inverse problem to estimate the behaviour of the S-wave velocity in the subsurface. The inversion is performed assuming a 1D forward model and suffers from equivalence problems, leading to non-uniqueness of the solution. Despite its great versatility, the main limitation of the surface wave method is its 1D approach, which has proved unsatisfactory or even misleading in the presence of lateral variations in the subsoil. The aim of the present work is to provide data processing tools able to mitigate this limitation, so that the surface wave method can be effectively applied in laterally varying media. To address the inadequacy of the surface wave method in the case of 2D structures in the subsoil, I developed two separate strategies to handle smooth and gradual lateral variations and abrupt subsurface heterogeneities.
In the case of smooth variations, the approach I adopted aims at "following" the gradual changes in subsoil material properties. I therefore propose a procedure to extract a set of neighbouring dispersion curves from a single multichannel seismic record by applying a suitable spatial windowing of the traces. Each curve corresponds to a different subsoil portion, so that gradual changes in subsoil seismic parameters can be reconstructed through the inversion of the dispersion curves. The method was tested on synthetic and real datasets, and proved its reliability in processing the data from a small-scale seismic experiment as well. In the context of characterizing smooth 2D structures in the subsurface via the surface wave method, I also developed a procedure to quantitatively estimate the (gradual) lateral variability of model parameters by comparing the shapes of local dispersion curves, without the need to solve a formal inverse problem. The method is based on a sensitivity analysis and on the application of the scale properties of surface waves. The procedure can be devoted to different applications: I exploited it to extend a priori local information to subsoil portions for which an experimental dispersion curve is available, and to estimate the lateral variability of model parameters for a set of neighbouring dispersion curves. The method was successfully applied to synthetic and real datasets. To characterize sudden and abrupt lateral variations in the subsurface, I adopted another strategy: the aim is to estimate the location and embedment depth of sharp heterogeneities, so as to process separately the seismic traces belonging to quasi-1D subsoil portions. I adapted several methods, already available in the literature but developed for different purposes and scales, to the detection of sudden changes in subsoil seismic properties via the analysis of anomalies in surface wave propagation.
I obtained the most promising results when adapting these methods, originally developed for single-shot configurations, to multifold seismic lines, exploiting their data redundancy to enhance the robustness of the analyses. The outcome of the thesis is therefore a series of processing tools that improve the reliability and robustness of the surface wave method when applied to the near-surface characterization of laterally varying media.
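The spatial-windowing idea for extracting local dispersion information can be sketched as follows: slide a window of adjacent traces along the line and, within each window, estimate the phase velocity at a given frequency from the slope of the Fourier phase across offset. This phase-slope estimate is a simplified stand-in for the thesis's full multichannel procedure; the geometry and parameters are illustrative.

```python
import numpy as np

def local_phase_velocity(traces, dx, dt, freq, win=8, step=4):
    """Slide a window of `win` adjacent traces along the line and,
    inside each window, estimate the phase velocity at `freq` from
    the phase slope across offset: dphi/dx = -2*pi*f / c."""
    ntr, ns = traces.shape
    fidx = int(round(freq * ns * dt))            # Fourier bin of `freq`
    spec = np.fft.rfft(traces, axis=1)[:, fidx]  # one coefficient per trace
    phase = np.unwrap(np.angle(spec))            # unwrap along the line
    velocities = []
    for start in range(0, ntr - win + 1, step):
        x = np.arange(win) * dx
        slope = np.polyfit(x, phase[start:start + win], 1)[0]
        velocities.append(-2 * np.pi * freq / slope)
    return np.array(velocities)
```

For a laterally homogeneous synthetic record the local estimates all recover the true phase velocity; in laterally varying media, each window would instead track the local value, which is exactly the information the neighbouring dispersion curves carry.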
System identification of linear vibrating structures
Methods of dynamic modelling and analysis of structures, for example the finite element method, are well developed. However, it is generally agreed that accurate modelling of complex structures is difficult, and for critical applications it is necessary to validate or update the theoretical models using data measured from actual structures. Techniques for identifying the parameters of linear dynamic models from vibration test data have attracted considerable interest recently. However, no method has received general acceptance, owing to a number of difficulties. These difficulties are mainly due to (i) the incomplete number of vibration modes that can be excited and measured, (ii) the incomplete number of coordinates that can be measured, (iii) inaccuracy in the experimental data, and (iv) inaccuracy in the model structure. This thesis reports on a new approach to updating the parameters of a finite element model as well as a lumped-parameter model with a diagonal mass matrix. The structure and its theoretical model are equally perturbed by adding mass or stiffness, and an incomplete set of eigen-data is measured. The parameters are then identified by iterative updating of the initial estimates, by sensitivity analysis, using eigenvalues or both eigenvalues and eigenvectors of the structure before and after perturbation. It is shown that, with a suitable choice of the perturbing coordinates, exact parameters can be identified if the data and the model structure are exact. The theoretical basis of the technique is presented. To cope with measurement errors and possible inaccuracies in the model structure, a well-known Bayesian approach is used to minimize the least-squares difference between the updated and the initial parameters. The eigen-data of the structure with added mass or stiffness is also determined from the frequency response data of the unmodified structure by a structural modification technique; thus, mass or stiffness does not have to be added physically.
The mass-stiffness addition technique is demonstrated by simulation examples and laboratory experiments on beams and an H-frame.
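The perturbed-eigenvalue updating scheme can be sketched on a 2-DOF spring-mass chain: "measure" the eigenvalues of the true system with and without a known added mass, then iteratively correct the model stiffnesses through a sensitivity (Gauss-Newton) step. All values are illustrative, and the sensitivities are taken by finite differences rather than derived analytically as in the thesis.

```python
import numpy as np

def eigvals(M, K):
    # Eigenvalues of the generalized problem K*phi = lambda*M*phi
    return np.sort(np.linalg.eigvals(np.linalg.solve(M, K)).real)

def update_stiffness(k_init, k_true, m, dm, niter=20):
    """Identify the two stiffnesses of a 2-DOF chain from eigenvalues
    measured before and after adding a known mass dm at coordinate 1."""
    def build(k, extra_mass=0.0):
        M = np.diag([m[0] + extra_mass, m[1]])
        K = np.array([[k[0] + k[1], -k[1]],
                      [-k[1],        k[1]]])
        return M, K

    def observe(k):
        # Stack eigenvalues of the unperturbed and mass-perturbed system
        return np.concatenate([eigvals(*build(k)),
                               eigvals(*build(k, extra_mass=dm))])

    measured = observe(k_true)      # stands in for the test data
    k = np.array(k_init, dtype=float)
    for _ in range(niter):
        r = measured - observe(k)
        # finite-difference sensitivity matrix d(lambda)/d(k)
        J = np.empty((4, 2))
        for j in range(2):
            kp = k.copy()
            kp[j] += 1e-6
            J[:, j] = (observe(kp) - observe(k)) / 1e-6
        k += np.linalg.lstsq(J, r, rcond=None)[0]   # Gauss-Newton step
    return k
```

With exact data and an exact model structure, the iteration recovers the true parameters, mirroring the exact-identifiability result stated in the abstract; with noisy data, the least-squares step would be replaced by the Bayesian weighting the thesis describes.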