8 research outputs found

Towards geostatistical learning for the geosciences: A case study in improving the spatial awareness of spectral clustering

Author: Mueller Ute
Peeters L. J. M.
Talebi Hassan
Tolosana-Delgado R.
van den Boogaart K. G.
Publication venue: Edith Cowan University, Research Online, Perth, Western Australia
Publication date: 01/01/2020

The particularities of geosystems and geoscience data must be understood before any development or implementation of statistical learning algorithms. Without such knowledge, the predictions and inferences may not be accurate and physically consistent. Accuracy, transparency and interpretability, credibility, and physical realism are minimum criteria for statistical learning algorithms when applied to the geosciences. This study briefly reviews several characteristics of geoscience data and challenges for novel statistical learning algorithms. A novel spatial spectral clustering approach is introduced to illustrate how statistical learners can be adapted for modelling geoscience data. The spatial awareness and physical realism of the spectral clustering are improved by utilising a dissimilarity matrix based on nonparametric higher-order spatial statistics. The proposed model-free technique can identify meaningful spatial clusters (i.e. meaningful geographical subregions) from multivariate spatial data at different scales without the need to define a model of co-dependence. Several mixed (e.g. continuous and categorical) variables can be used as inputs to the proposed clustering technique. The proposed technique is illustrated using synthetic and real mining datasets. The results of the case studies confirm the usefulness of the proposed method for modelling spatial data.

Research Online @ ECU

Mining Novel Multivariate Relationships in Time Series Data Using Correlation Networks

Author: Agrawal Saurabh
Atluri Gowtham
Boley Daniel
Chatterjee Snigdhansu
Dang Anh The
Kumar Vipin
Liess Stefan
Steinbach Michael
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2019

In many domains, there is significant interest in capturing novel relationships between time series that represent activities recorded at different nodes of a highly complex system. In this paper, we introduce multipoles, a novel class of linear relationships between more than two time series. A multipole is a set of time series that have strong linear dependence among themselves, with the requirement that each time series makes a significant contribution to the linear dependence. We demonstrate that most interesting multipoles can be identified as cliques of negative correlations in a correlation network. Such cliques are typically rare in a real-world correlation network, which allows us to find almost all multipoles efficiently using a clique-enumeration approach. Using our proposed framework, we demonstrate the utility of multipoles in discovering new physical phenomena in two scientific domains: climate science and neuroscience. In particular, we discovered several multipole relationships that are reproducible in multiple other independent datasets and lead to novel domain insights.
Comment: This is the accepted version of the article submitted to IEEE Transactions on Knowledge and Data Engineering 201
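The clique idea summarised in this abstract can be illustrated with a rough sketch. This is not the authors' implementation: the synthetic data, the threshold of -0.3, the helper names, and the use of the basic Bron-Kerbosch recursion for clique enumeration are all assumptions made for illustration, and the sketch does not check the abstract's further requirement that every series contribute significantly to the linear dependence.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def negative_correlation_graph(series, threshold):
    """Adjacency sets: i and j are linked when corr(i, j) < threshold."""
    adj = {i: set() for i in range(len(series))}
    for i, j in combinations(range(len(series)), 2):
        if pearson(series[i], series[j]) < threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical data: three series that sum to zero at every time step
# (pairwise correlation close to -0.5) plus one unrelated noise series.
rng = random.Random(0)
n = 2000
z = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(3)]
means = [(a + b + c) / 3 for a, b, c in zip(*z)]
series = [[v - m for v, m in zip(zi, means)] for zi in z]
series.append([rng.gauss(0, 1) for _ in range(n)])

# Assumed threshold; a real analysis would need to choose it with care.
adj = negative_correlation_graph(series, threshold=-0.3)

# Keep only cliques of more than two series, matching the abstract's wording.
candidates = [c for c in maximal_cliques(adj) if len(c) >= 3]
print(candidates)
```

In this toy setup the three zero-sum series form the only clique of size three, while the unrelated series stays isolated. Each candidate clique would still need the contribution check described in the abstract before being called a multipole.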

arXiv.org e-Print Archive

Crossref

Predictive Learning with Heterogeneity in Populations

Author: Karpatne Anuj
Publication date: 01/10/2017

University of Minnesota Ph.D. dissertation. October 2017. Major: Computer Science. Advisor: Vipin Kumar. 1 computer file (PDF); x, 119 pages.
Predictive learning forms the backbone of several data-driven systems powering scientific as well as commercial applications, e.g., filtering spam messages, detecting faces in images, forecasting health risks, and mapping ecological resources. However, one of the major challenges in applying standard predictive learning methods in real-world applications is the heterogeneity in populations of data instances, i.e., different groups (or populations) of data instances show predictive relationships of a different nature. For example, different populations of human subjects may show different risks for a disease even if they have similar diagnosis reports, depending on their ethnic profiles, medical history, and lifestyle choices. In the presence of population heterogeneity, a central challenge is that the training data comprises instances belonging to multiple populations, and the instances in the test set may be from a different population than that of the training instances. This limits the effectiveness of standard predictive learning frameworks that are based on the assumption that the instances are independent and identically distributed (i.i.d.), which holds only in simplistic settings. This thesis introduces several ways of learning predictive models with heterogeneity in populations, by incorporating information about the context of every data instance, which is available in varying types and formats in different application settings. It introduces a novel multi-task learning framework for problems where we have access to some ancillary variables that can be grouped to produce homogeneous partitions of data instances, thus addressing the heterogeneity in populations.
This thesis also introduces a novel strategy for constructing mode-specific ensembles in binary classification settings, where each class shows a multi-modal distribution due to the heterogeneity in its populations. When the context of data instances is implicitly defined such that the test data is known to comprise contextually similar groups, this thesis presents a novel framework for adapting classification decisions using the group-level properties of test instances. This thesis also builds the foundations of a novel paradigm of scientific discovery, termed theory-guided data science, that seeks to explore the full potential of data science methods without ignoring the treasure of knowledge contained in scientific theories and principles.

University of Minnesota Digital Conservancy

A graph-based approach to find teleconnections in climate data

Author: Bridgman
Cattiaux
Dommenget
Donges
Elsner
Engle
Gadgil
García-Serrano
Granger
Hines
Intergovernmental Panel on Climate Change
Onogi
Pekeris
Steinhaeuser
Steinhaeuser
Tsonis
Tsonis
Tsonis
Uppala
Vecchi
Von
Walker
Wallace
White
Publication venue: Wiley

Crossref