74 research outputs found
Spatial Classification With Limited Observations Based On Physics-Aware Structural Constraint
Spatial classification with limited feature observations has been a
challenging problem in machine learning. The problem exists in applications
where only a subset of sensors are deployed at certain spots or partial
responses are collected in field surveys. Existing research mostly focuses on
addressing incomplete or missing data, e.g., data cleaning and imputation,
classification models that allow for missing feature values or model missing
features as hidden variables in the EM algorithm. These methods, however,
assume that incomplete feature observations only happen on a small subset of
samples, and thus cannot solve problems where the vast majority of samples have
missing feature observations. To address this issue, we recently proposed a new
approach that incorporates a physics-aware structural constraint into the
model representation. Our approach assumes that a spatial contextual feature is
observed at all sample locations and establishes a spatial structural
constraint from the underlying spatial contextual feature map. We design efficient
algorithms for model parameter learning and class inference. This paper extends
our recent approach by allowing feature values of samples in each class to
follow a multi-modal distribution. We propose learning algorithms for the
extended model with multi-modal distribution. Evaluations on real-world
hydrological applications show that our approach significantly outperforms
baseline methods in classification accuracy, and the multi-modal extension is
more robust than our earlier single-modal version, especially when the feature
distribution in training samples is multi-modal. Computational experiments show
that the proposed solution is computationally efficient on large datasets.
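To make the multi-modal extension concrete: letting each class's feature distribution be a Gaussian mixture, a sample with partially observed features can still be scored by marginalising the missing dimensions out of every component. The sketch below is my own illustration of that one idea, not the paper's algorithm (in particular, it ignores the spatial structural constraint that is central to the proposed approach):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_class_posterior(x, classes):
    """Posterior P(class | observed features) with missing features.

    `classes` maps a label to (prior, [(weight, mean, cov), ...]),
    i.e. a per-class Gaussian mixture. Missing entries of `x` are
    np.nan and are marginalised out by slicing each component's
    mean and covariance down to the observed dimensions.
    """
    obs = ~np.isnan(x)
    scores = {}
    for label, (prior, components) in classes.items():
        lik = 0.0
        for w, mu, cov in components:
            # The marginal of a Gaussian over the observed dimensions
            # is again a Gaussian with the sub-mean / sub-covariance.
            lik += w * multivariate_normal.pdf(
                x[obs], mean=mu[obs], cov=cov[np.ix_(obs, obs)])
        scores[label] = prior * lik
    z = sum(scores.values())
    return {k: v / z for k, v in scores.items()}
```

With two well-separated single-component classes and only the first feature observed, the posterior concentrates on the nearer class, which is exactly the behaviour a mixture-based class-conditional model provides when most samples have missing features.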
MMCPP: A Multi-Modal Contrastive Pre-Training Model for Place Representation Based on the Spatio-Temporal Framework
The concept of "place" is crucial for understanding geographical environments from a human perspective. Place representation learning converts places into low-dimensional dense numerical vectors and is a fundamental procedure for artificial intelligence in geography (GeoAI). However, most studies ignore the multi-level distance constraints and spatial proximity interactions that enable behavioral interactions between places. Furthermore, representing the temporal characteristics of these interactions in trajectory sequences poses a challenge for techniques from natural language processing and other fields. In addition, most existing methods rely on all modalities being present in the input, as they use joint training to integrate multiple modalities. To address these issues, we propose a Multi-Modal Contrastive Pre-training model for Place representation (MMCPP). Our model consists of three encoders that capture place attributes across different modalities, including points of interest (POIs), images, and trajectories. The trajectory encoder, named RodtFormer, takes fine-grained spatio-temporal trajectories as input and leverages self-attention with rotary temporal interval position embedding to simulate dynamic spatial and behavioral proximity interactions between places. By using a coordinated pre-training framework, MMCPP independently encodes place representations across different modalities and improves model reusability. We verify the effectiveness of our model on a taxi trajectory dataset using the next-location prediction task at horizons of 30, 180, and 300 seconds. Our results demonstrate that, compared to existing embedding methods, our model learns higher-quality place representations during pre-training, leading to improved performance on downstream tasks.
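The defining property behind RodtFormer's rotary temporal interval position embedding, as described above, is that the attention score between two trajectory points depends only on the time interval between them, not on their absolute timestamps. The following NumPy sketch is my own reconstruction of that rotary idea (the frequency base 10000 is the conventional choice and an assumption here, not a detail taken from the MMCPP paper):

```python
import numpy as np

def rotary_embed(x, t):
    """Rotate feature pairs of x by angles proportional to timestamps t.

    x: (seq, dim) with even dim; t: (seq,) timestamps in seconds.
    Each consecutive feature pair is treated as a 2-D plane and rotated
    by t * frequency, so dot products between two embedded vectors
    depend only on the timestamp difference.
    """
    dim = x.shape[1]
    inv_freq = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))
    ang = np.outer(t, inv_freq)                    # (seq, dim/2)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin             # standard 2-D rotation
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because each pair is rotated rigidly, the inner product of two embedded vectors at times 5 s and 2 s equals the one at 10 s and 7 s: only the 3-second interval matters, which is what lets the encoder model interval-dependent proximity between places.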
Generative adversarial networks for sequential learning
Generative modelling aims to learn the data-generating mechanism from observations without supervision. It is a desirable and natural approach for learning from unlabelled data, which is easily accessible. Deep generative models are a class of generative models combined with deep learning techniques, taking advantage of the intuitive principles of generative models as well as the expressiveness and flexibility of neural networks. Applications of generative modelling include image, audio, and video synthesis, text summarisation and translation, and so on. The methods developed in this thesis particularly emphasise domains involving data of a sequential nature, such as video generation and prediction, weather forecasting, and dynamic 3D reconstruction. Firstly, we introduce a new adversarial algorithm for training generative models suitable for sequential data. This algorithm is built on the theory of Causal Optimal Transport (COT), which constrains the transport plans to respect the temporal dependencies exhibited in the data. Secondly, the algorithm is extended to learn conditional sequences, that is, how a sequence is likely to evolve given the observation of its past evolution. Meanwhile, we work with modified empirical measures to guarantee the convergence of the COT distance when the sequences do not overlap at any time step. Thirdly, we show that state-of-the-art results in complex spatio-temporal modelling using GANs can be further improved by leveraging prior knowledge of the spatio-temporal correlation in the domain of weather forecasting. Finally, we demonstrate how deep generative models can be adopted to address the classical statistical problem of conditional independence testing. A class of classic approaches for this task requires computing a test statistic using samples drawn from two unknown conditional distributions.
We therefore present a double-GAN framework to learn two generative models that approximate both conditional distributions. The success of this approach sheds light on how certain challenging statistical problems can benefit from the learning capacity as well as the efficient sampling procedure of deep generative models.
Unifying Inter-Region Autocorrelation and Intra-Region Structures for Spatial Embedding via Collective Adversarial Learning
Unsupervised spatial representation learning aims to automatically identify effective features of geographic entities (i.e., regions) from unlabeled yet structured geographical data. Existing network embedding methods can partially address the problem by: (1) regarding a region as a node, reformulating the problem as node embedding; or (2) regarding a region as a graph, reformulating the problem as graph embedding. However, these studies can be improved by preserving (1) intra-region geographic structures, which are represented by multiple spatial graphs, leading to a reformulation as collective learning from relational graphs; and (2) inter-region spatial autocorrelations, which are represented by pairwise graph regularization, leading to a reformulation as adversarial learning. Moreover, field data in real systems usually lack labels, so an unsupervised approach aids practical deployment. Along these lines, we develop an unsupervised Collective Graph-regularized dual-Adversarial Learning (CGAL) framework for multi-view graph representation learning, and also a Graph-regularized dual-Adversarial Learning (GAL) framework for single-view graph representation learning. Finally, our experimental results demonstrate the enhanced effectiveness of our method.
Learning with Attributed Networks: Algorithms and Applications
Attributes, which delineate the properties of data, and connections, which describe the dependencies of data, are two essential components for characterizing most real-world phenomena. The synergy between these two principal elements yields a unique data representation: the attributed network. In many cases, people are inundated with vast amounts of data that can be structured into attributed networks, and their use has been attractive to researchers and practitioners in different disciplines. For example, in social media, users interact with each other and also post personalized content; in scientific collaboration, researchers cooperate and are distinguished from peers by their unique research interests; in complex disease studies, rich gene expression data complements gene-regulatory networks. Clearly, attributed networks are ubiquitous and form a critical component of modern information infrastructure. Gaining deep insights from such networks requires a fundamental understanding of their unique characteristics and an awareness of the related computational challenges.
My dissertation research aims to develop a suite of novel learning algorithms to understand, characterize, and gain actionable insights from attributed networks, to benefit high-impact real-world applications. In the first part of this dissertation, I focus on developing learning algorithms for attributed networks in a static environment at two different levels: (i) the attribute level, by designing feature selection algorithms to find high-quality features that are tightly correlated with the network topology; and (ii) the node level, by presenting network embedding algorithms to learn discriminative node embeddings that preserve node proximity w.r.t. network topology and node attribute similarity. As change is an essential property of attributed networks and the results of learning algorithms become stale over time, in the second part of this dissertation I propose a family of online algorithms for attributed networks in a dynamic environment that continuously update the learning results on the fly. Moreover, developing application-aware learning algorithms is more desirable given a clear understanding of the application domains and their unique intents. As such, in the third part of this dissertation, I am also committed to advancing real-world applications on attributed networks by incorporating the objectives of external tasks into the learning process. Doctoral Dissertation, Computer Science, 201
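For the attribute-level part described above, selecting features that are tightly correlated with the network topology, one classical building block is a Laplacian smoothness score, which prefers features whose values vary little across connected nodes. The sketch below is my own illustration of that general idea, not the dissertation's algorithm:

```python
import numpy as np

def topology_feature_scores(X, A):
    """Score each feature by its smoothness over the network.

    X: (n_nodes, n_features) attribute matrix; A: (n, n) symmetric
    adjacency matrix. A low score x^T L x / ||x||^2 means connected
    nodes share similar values for that feature, i.e. the feature is
    well aligned with the network topology.
    """
    deg = A.sum(axis=1)
    L = np.diag(deg) - A                     # combinatorial Laplacian
    Xc = X - X.mean(axis=0)                  # remove the trivial constant fit
    num = np.einsum('nf,nm,mf->f', Xc, L, Xc)  # diag(Xc^T L Xc)
    var = (Xc ** 2).sum(axis=0)
    return num / var
```

On a graph of two cliques, a feature that is constant within each clique scores zero (perfectly smooth), while a feature that alternates across edges scores high, so ranking features by ascending score surfaces those most consistent with the topology.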
Learning visual representations with neural networks for video captioning and image generation
The past decade has been marked as a golden era of neural network research. Not only have neural networks been successfully applied to solve more and more challenging real-world problems, but they have also become the dominant approach in many of the areas where they have been tested, including language understanding, game playing, and computer vision, thanks to their superiority in computational efficiency and statistical capacity. This thesis applies neural networks to problems in computer vision where high-level, semantically meaningful representations play a fundamental role. It demonstrates, both in theory and in experiment, the ability to learn such representations from data with and without supervision.
The main content of the thesis is divided into two parts. The first part studies neural networks in the context of learning visual representations for the task of video captioning. Models are developed to dynamically focus on different frames while generating a natural language description of a short video. Such a model is further improved by recurrent convolutional operations. The end of this part identifies fundamental challenges in video captioning and proposes a new type of evaluation metric that may be used experimentally as an oracle to benchmark performance.
The second part studies the family of models that generate images. While the first part is supervised, this part is unsupervised. Its focus is the popular family of Neural Autoregressive Density Estimators (NADEs), a tractable probabilistic model for natural images.
This work first establishes a connection between NADEs and Generative Stochastic Networks (GSNs). The standard NADE is then improved by introducing multiple iterations in its inference procedure without increasing the number of parameters; this variant is dubbed the iterative NADE. Opening with a historical review, this work closes with a summary of recent developments related to the contributions of the first two parts, around the central topic of learning visual representations for images and videos, and a bright future is envisioned at the end.
Machine learning and the physical sciences
Machine learning encompasses a broad range of algorithms and modeling tools
used for a vast array of data processing tasks, and it has entered most
scientific disciplines in recent years. We selectively review the recent
research on the interface between machine learning and physical sciences. This
includes conceptual developments in machine learning (ML) motivated by physical
insights, applications of machine learning techniques to several domains in
physics, and cross-fertilization between the two fields. After giving basic
notion of machine learning methods and principles, we describe examples of how
statistical physics is used to understand methods in ML. We then move to
describe applications of ML methods in particle physics and cosmology, quantum
many-body physics, quantum computing, and chemical and materials physics. We
also highlight research and development into novel computing architectures
aimed at accelerating ML. In each of the sections we describe recent successes
as well as domain-specific methodology and challenges.
- …