Automatic Bayesian Density Analysis
Making sense of a dataset in an automatic and unsupervised fashion is a
challenging problem in statistics and AI. Classical approaches for exploratory
data analysis are usually not flexible enough to deal with the uncertainty
inherent to real-world data: they are often restricted to fixed latent
interaction models and homogeneous likelihoods; they are sensitive to missing,
corrupt and anomalous data; moreover, their expressiveness generally comes at
the price of intractable inference. As a result, supervision from statisticians
is usually needed to find the right model for the data. However, since domain
experts are not necessarily also experts in statistics, we propose Automatic
Bayesian Density Analysis (ABDA) to make exploratory data analysis accessible
at large. Specifically, ABDA allows for automatic and efficient missing value
estimation, statistical data type and likelihood discovery, anomaly detection
and dependency structure mining, on top of providing accurate density
estimation. Extensive empirical evidence shows that ABDA is a suitable tool for
automatic exploratory analysis of mixed continuous and discrete tabular data.
Comment: In proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19).
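The statistical data type and likelihood discovery that ABDA automates can be illustrated in miniature. The sketch below is not ABDA's actual inference (which performs this discovery jointly inside a Bayesian latent-variable model); it simply compares a couple of candidate likelihoods per column via BIC, with all names and candidates chosen for illustration:

```python
import numpy as np
from scipy import stats

def discover_likelihood(column):
    """Pick the best-fitting likelihood for one data column via BIC.

    Illustrative sketch only: a real system like ABDA infers data types
    and likelihoods jointly; here each column is scored independently.
    """
    x = np.asarray(column, dtype=float)
    n = len(x)
    bic = {}

    # Continuous candidate: Gaussian (2 free parameters).
    mu, sigma = x.mean(), x.std(ddof=0) + 1e-9
    ll = stats.norm.logpdf(x, mu, sigma).sum()
    bic["gaussian"] = 2 * np.log(n) - 2 * ll

    # Count candidate: Poisson (1 parameter), only for non-negative integers.
    if np.all(x >= 0) and np.allclose(x, np.round(x)):
        lam = x.mean() + 1e-9
        ll = stats.poisson.logpmf(x.astype(int), lam).sum()
        bic["poisson"] = 1 * np.log(n) - 2 * ll

    return min(bic, key=bic.get)  # lowest BIC wins

rng = np.random.default_rng(0)
print(discover_likelihood(rng.normal(5.0, 2.0, 500)))  # continuous column
print(discover_likelihood(rng.poisson(3.0, 500)))      # count-valued column
```

A full system would extend the candidate set (categorical, ordinal, heavy-tailed, ...) and propagate uncertainty over the choice rather than committing to a single winner.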
Inductive Graph Neural Networks for Spatiotemporal Kriging
Time series forecasting and spatiotemporal kriging are the two most important
tasks in spatiotemporal data analysis. Recent research on graph neural networks
has made substantial progress in time series forecasting, while little
attention has been paid to the kriging problem -- recovering signals for
unsampled locations/sensors. Most existing scalable kriging methods (e.g.,
matrix/tensor completion) are transductive, and thus full retraining is
required when we have a new sensor to interpolate. In this paper, we develop an
Inductive Graph Neural Network Kriging (IGNNK) model to recover data for
unsampled sensors on a network/graph structure. To generalize the effect of
distance and reachability, we generate random subgraphs as samples and
reconstruct the corresponding adjacency matrix for each sample. By
reconstructing all signals on each sample subgraph, IGNNK can effectively learn
the spatial message passing mechanism. Empirical results on several real-world
spatiotemporal datasets demonstrate the effectiveness of our model. In
addition, we also find that the learned model can be successfully transferred
to the same type of kriging tasks on an unseen dataset. Our results show that:
1) GNN is an efficient and effective tool for spatial kriging; 2) inductive
GNNs can be trained using dynamic adjacency matrices; 3) a trained model can be
transferred to new graph structures and 4) IGNNK can be used to generate
virtual sensors.
Comment: AAAI 202
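The random-subgraph training scheme described above can be sketched as follows. This is an assumed reading of the procedure, not the authors' code: sample a subset of sensors, rebuild the sub-adjacency matrix, hide a few sensors as if they were unsampled, and task the model with reconstructing all signals:

```python
import numpy as np

def sample_training_batch(adj, signals, n_sub, n_masked, rng):
    """Generate one IGNNK-style training sample (illustrative sketch).

    adj:     (n, n) adjacency/weight matrix of the full sensor network
    signals: (n, n_timesteps) observed signals for all sensors
    Returns the subgraph adjacency, the masked input, the full target,
    and the observed-sensor mask. The exact masking scheme is assumed.
    """
    n = adj.shape[0]
    nodes = rng.choice(n, size=n_sub, replace=False)
    sub_adj = adj[np.ix_(nodes, nodes)]   # adjacency restricted to the subgraph
    x = signals[nodes]                    # (n_sub, n_timesteps)

    mask = np.ones(n_sub, dtype=bool)
    mask[rng.choice(n_sub, size=n_masked, replace=False)] = False
    x_in = x * mask[:, None]              # "unsampled" sensors fed as zeros
    return sub_adj, x_in, x, mask         # model reconstructs x from (sub_adj, x_in)

rng = np.random.default_rng(42)
adj = rng.random((20, 20)); adj = (adj + adj.T) / 2   # toy symmetric weights
signals = rng.random((20, 12))                        # 20 sensors, 12 timesteps
sub_adj, x_in, x_target, mask = sample_training_batch(adj, signals, 8, 2, rng)
print(sub_adj.shape, x_in.shape, mask.sum())          # (8, 8) (8, 12) 6
```

Because every batch sees a freshly sampled subgraph and adjacency, the learned message-passing weights are not tied to one fixed sensor set, which is what makes the model inductive.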
Multitask Learning of Vegetation Biochemistry from Hyperspectral Data
Statistical models have been successful in accurately estimating the biochemical contents of vegetation from reflectance spectra. However, their performance deteriorates when sizable ground-truth data are scarce for modeling the complex non-linear relationship between the spectrum and the biochemical quantity. We propose a novel Gaussian-process-based multitask learning method that improves the prediction of a biochemical by transferring knowledge from models learned for predicting related biochemicals. This method is most advantageous when there are few ground-truth data for the biochemical of interest but plenty for related biochemicals. The proposed multitask Gaussian process hypothesizes that the inter-relationship between the biochemical quantities is better modeled by a combination of two or more covariance functions and inter-task correlation matrices. In our experiments, the method outperformed current approaches on two real-world datasets.
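The combination of covariance functions with inter-task correlation matrices mentioned in the abstract has the general sum-of-separable form K = Σ_q B_q ⊗ k_q(X, X), where each B_q is a (tasks × tasks) positive semi-definite matrix and each k_q a base kernel over inputs. A minimal sketch, with made-up kernels and correlation matrices rather than the paper's fitted ones:

```python
import numpy as np

def rbf(x1, x2, lengthscale):
    """Squared-exponential base kernel on 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def multitask_kernel(x, Bs, base_kernels):
    """Sum-of-separable multitask covariance: K = sum_q kron(B_q, k_q(x, x)).

    Each B_q encodes how strongly the tasks (biochemicals) co-vary under
    base kernel k_q. Illustrative values only.
    """
    return sum(np.kron(B, k(x, x)) for B, k in zip(Bs, base_kernels))

x = np.linspace(0.0, 1.0, 5)                 # shared input locations
B1 = np.array([[1.0, 0.8], [0.8, 1.0]])      # strong inter-task correlation
B2 = np.array([[0.5, 0.1], [0.1, 0.5]])      # weaker, longer-range component
K = multitask_kernel(x, [B1, B2],
                     [lambda a, b: rbf(a, b, 0.3),
                      lambda a, b: rbf(a, b, 1.0)])
print(K.shape)                               # (10, 10): 2 tasks x 5 inputs
```

Since each B_q and each k_q Gram matrix is PSD, their Kronecker products and sum are PSD as well, so K is a valid GP prior covariance over all task-input pairs; transfer happens through the off-diagonal entries of the B_q.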
A Bayesian Perspective of Statistical Machine Learning for Big Data
Statistical Machine Learning (SML) refers to a body of algorithms and methods
by which computers discover important features of input data sets that are
often very large in size. This task of feature discovery from data is
essentially what the keyword `learning' means in SML. Theoretical
justifications for the effectiveness of SML algorithms are underpinned by
sound principles from different disciplines, such as Computer Science and
Statistics. The theoretical underpinnings justified in particular by
statistical inference methods are collectively termed statistical learning
theory.
This paper provides a review of SML from a Bayesian decision-theoretic point
of view -- where we argue that many SML techniques are closely connected to
making inference using the so-called Bayesian paradigm. We discuss many
important SML techniques such as supervised and unsupervised learning, deep
learning, online learning and Gaussian processes, especially in the context of
the very large data sets where these are often employed. We present a
dictionary that maps the key concepts of SML between Computer Science and
Statistics. We illustrate the SML techniques with three moderately large data
sets, where we also discuss many practical implementation issues. The review
is thus especially targeted at statisticians and computer scientists aspiring
to understand and apply SML to moderately large and big data sets.
Comment: 26 pages, 3 figures, Review paper
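The Bayesian paradigm the review builds on can be shown in its simplest conjugate form: a prior over a parameter is updated by data into a posterior. A generic textbook illustration (a Beta-Binomial update, not an example from the paper):

```python
from fractions import Fraction

def beta_binomial_update(alpha, beta, successes, failures):
    """Conjugate Bayesian update for a Bernoulli success probability.

    Prior Beta(alpha, beta) plus Binomial data yields the posterior
    Beta(alpha + successes, beta + failures) in closed form.
    """
    return alpha + successes, beta + failures

# Start from a uniform prior Beta(1, 1); observe 7 successes in 10 trials.
a, b = beta_binomial_update(1, 1, 7, 3)
posterior_mean = Fraction(a, a + b)
print(a, b, posterior_mean)   # 8 4 2/3
```

For the large data sets the review targets, such closed-form updates are the exception; the connection it draws is that many SML procedures can be read as approximations to exactly this kind of posterior inference.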
The Emerging Trends of Multi-Label Learning
Exabytes of data are generated daily by humans, leading to the growing need
for new efforts in dealing with the grand challenges for multi-label learning
brought by big data. For example, extreme multi-label classification is an
active and rapidly growing research area that deals with classification tasks
with an extremely large number of classes or labels; utilizing massive data
with limited supervision to build a multi-label classification model becomes
valuable for practical applications, etc. Besides these, tremendous effort has
gone into harvesting the strong learning capability of deep learning to better
capture label dependencies in multi-label learning, which is key for deep
learning to address real-world classification tasks. However, there has been a
lack of systematic studies that focus explicitly on analyzing the emerging
trends and new challenges of multi-label learning in the era of big data. It
is imperative to call for a comprehensive survey to fulfill this mission and
delineate future research directions and new applications.
Comment: Accepted to TPAMI 202
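In the extreme multi-label setting mentioned above, evaluation typically uses ranking metrics rather than exact set match, because only a handful of the huge label space can be predicted per instance. Precision@k is the standard choice; a toy sketch with made-up scores, unrelated to any specific method in the survey:

```python
import numpy as np

def precision_at_k(scores, true_labels, k):
    """Precision@k for multi-label predictions.

    scores:      (n_samples, n_labels) model confidences
    true_labels: binary relevance matrix of the same shape
    For each sample, take the k highest-scoring labels and count how
    many of them are actually relevant.
    """
    topk = np.argsort(scores, axis=1)[:, -k:]             # indices of the k best labels
    hits = np.take_along_axis(true_labels, topk, axis=1)  # 1 where a top-k label is relevant
    return hits.mean()

scores = np.array([[0.9, 0.1, 0.8, 0.3],
                   [0.2, 0.7, 0.4, 0.6]])
true_labels = np.array([[1, 0, 1, 0],
                        [0, 1, 0, 0]])
print(precision_at_k(scores, true_labels, 2))   # 0.75: 3 of the 4 top-2 picks are relevant
```

With millions of labels, computing the exact top-k over dense scores becomes the bottleneck, which is why extreme multi-label methods pair this metric with label trees or sparse scoring.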