Continual Invariant Risk Minimization
Empirical risk minimization can lead to poor generalization behavior on
unseen environments if the learned model does not capture invariant feature
representations. Invariant risk minimization (IRM) is a recent proposal for
discovering environment-invariant representations. IRM was introduced by
Arjovsky et al. (2019) and extended by Ahuja et al. (2020). IRM assumes that
all environments are available to the learning system at the same time. With
this work, we generalize the concept of IRM to scenarios where environments are
observed sequentially. We show that existing approaches, including those
designed for continual learning, fail to identify the invariant features and
models across sequentially presented environments. We extend IRM under a
variational Bayesian and bilevel framework, creating a general approach to
continual invariant risk minimization. We also describe a strategy to solve the
optimization problems using a variant of the alternating direction method of
multipliers (ADMM). We show empirically, using multiple datasets and multiple
sequential environments, that the proposed methods outperform or are
competitive with prior approaches.
Comment: A shorter version of this paper was presented at the RobustML workshop of ICLR 202
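For concreteness, the sketch below (PyTorch, with illustrative names) spells out the standard, static IRMv1 objective of Arjovsky et al. (2019) that this work starts from, in which all environments are available jointly; the continual, variational-Bayesian and bilevel extension proposed in the paper is not reproduced here.

    # Minimal sketch of the static IRMv1 objective that the paper generalizes.
    # Names are illustrative; this is not the continual variant proposed above.
    import torch
    import torch.nn.functional as F

    def irm_penalty(logits, labels):
        # The squared gradient of the per-environment risk w.r.t. a dummy
        # scale w = 1 is the IRMv1 invariance penalty.
        w = torch.ones(1, requires_grad=True)
        risk = F.binary_cross_entropy_with_logits(logits * w, labels)
        grad = torch.autograd.grad(risk, [w], create_graph=True)[0]
        return (grad ** 2).sum()

    def irm_objective(model, envs, lam=1.0):
        # Sum of empirical risks plus invariance penalties over environments
        # observed jointly; the paper relaxes this to sequential environments.
        total = 0.0
        for x, y in envs:
            logits = model(x).squeeze(-1)
            total = total + F.binary_cross_entropy_with_logits(logits, y)
            total = total + lam * irm_penalty(logits, y)
        return total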
Self-Tuning Hamiltonian Monte Carlo for Accelerated Sampling
The performance of Hamiltonian Monte Carlo crucially depends on its
parameters, in particular the integration timestep and the number of
integration steps. We present an adaptive general-purpose framework to
automatically tune these parameters based on a loss function which promotes the
fast exploration of phase-space. For this, we make use of a
fully-differentiable set-up and use backpropagation for optimization. An
attention-like loss is defined which allows for the gradient driven learning of
the distribution of integration steps. We also highlight the importance of
jittering for a smooth loss-surface. Our approach is demonstrated for the
one-dimensional harmonic oscillator and for alanine dipeptide, a small peptide
commonly used as a test case for simulation methods. We find a good
correspondence between our loss and the autocorrelation times, resulting in
well-tuned parameters for Hamiltonian Monte Carlo.
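As a reading aid, the NumPy sketch below shows a single HMC transition for the one-dimensional harmonic oscillator with a jittered timestep. It only illustrates the two parameters being tuned (the integration timestep and the number of leapfrog steps) and the role of jittering; the differentiable, backpropagation-based tuning itself is not shown.

    # One HMC transition for U(q) = q^2 / 2 with a jittered timestep.
    # Illustrative only; not the self-tuning framework described above.
    import numpy as np

    def hmc_step(q, eps, n_steps, rng, jitter=0.1):
        U = lambda x: 0.5 * x ** 2
        grad_U = lambda x: x
        eps = eps * (1.0 + jitter * rng.uniform(-1.0, 1.0))  # jittered timestep
        p = rng.normal()                                      # resample momentum
        q_new, p_new = q, p
        # Leapfrog integration with n_steps position updates.
        p_new -= 0.5 * eps * grad_U(q_new)
        for _ in range(n_steps - 1):
            q_new += eps * p_new
            p_new -= eps * grad_U(q_new)
        q_new += eps * p_new
        p_new -= 0.5 * eps * grad_U(q_new)
        # Metropolis accept/reject on the change in total energy.
        dH = (U(q_new) + 0.5 * p_new ** 2) - (U(q) + 0.5 * p ** 2)
        return q_new if np.log(rng.uniform()) < -dH else q

    rng = np.random.default_rng(0)
    samples, q = [], 0.0
    for _ in range(1000):
        q = hmc_step(q, eps=0.3, n_steps=10, rng=rng)
        samples.append(q)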
Learning Neural PDE Solvers with Parameter-Guided Channel Attention
Scientific Machine Learning (SciML) is concerned with the development of
learned emulators of physical systems governed by partial differential
equations (PDE). In application domains such as weather forecasting, molecular
dynamics, and inverse design, ML-based surrogate models are increasingly used
to augment or replace inefficient and often non-differentiable numerical
simulation algorithms. While a number of ML-based methods for approximating the
solutions of PDEs have been proposed in recent years, they typically do not
adapt to the parameters of the PDEs, making it difficult to generalize to PDE
parameters not seen during training. We propose CAPE, a Channel Attention
mechanism guided by PDE Parameter Embeddings, for neural surrogate models,
together with a simple yet effective curriculum learning strategy. The CAPE
module can be combined with neural PDE solvers, allowing them to adapt to unseen PDE
parameters. The curriculum learning strategy provides a seamless transition
between teacher-forcing and fully auto-regressive training. We compare CAPE in
conjunction with the curriculum learning strategy using a popular PDE benchmark
and obtain consistent and significant improvements over the baseline models.
The experiments also show several advantages of CAPE, such as its improved
ability to generalize to unseen PDE parameters without large increases in
inference time or parameter count.
Comment: accepted for publication in ICML202
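The PyTorch sketch below illustrates the general idea of a channel-attention gate conditioned on an embedding of the PDE parameters. Layer sizes, names, and the exact gating form are assumptions made for illustration; this is not the paper's CAPE architecture.

    # Illustrative channel attention conditioned on PDE parameters; not the
    # exact CAPE module. Shapes and layer sizes are placeholder choices.
    import torch
    import torch.nn as nn

    class ParamGuidedChannelAttention(nn.Module):
        def __init__(self, n_params, n_channels, hidden=64):
            super().__init__()
            # Embed the scalar PDE parameters into per-channel gate logits.
            self.embed = nn.Sequential(
                nn.Linear(n_params, hidden), nn.GELU(),
                nn.Linear(hidden, n_channels),
            )

        def forward(self, feats, pde_params):
            # feats: (batch, channels, x); pde_params: (batch, n_params)
            gate = torch.sigmoid(self.embed(pde_params))  # (batch, channels)
            return feats * gate.unsqueeze(-1)             # channel-wise reweighting

    # Usage: reweight a 1-D solver's hidden state by a diffusion coefficient.
    attn = ParamGuidedChannelAttention(n_params=1, n_channels=32)
    h = attn(torch.randn(8, 32, 128), torch.rand(8, 1))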
Efficient and Scalable Multi-task Regression on Massive Number of Tasks
Many real-world large-scale regression problems can be formulated as
Multi-task Learning (MTL) problems with a massive number of tasks, as in retail
and transportation domains. However, existing MTL methods still fail to offer
both good generalization performance and scalability for such problems.
Scaling up MTL methods to problems with a tremendous number of tasks is a big
challenge. Here, we propose a novel algorithm, named Convex Clustering
Multi-Task regression Learning (CCMTL), which integrates convex clustering
on the k-nearest neighbor graph of the prediction models. Further, CCMTL
efficiently solves the underlying convex problem with a newly proposed
optimization method. CCMTL is accurate, efficient to train, and empirically
scales linearly in the number of tasks. On both synthetic and real-world
datasets, the proposed CCMTL outperforms seven state-of-the-art (SoA)
multi-task learning methods in terms of prediction accuracy as well as
computational efficiency. On a real-world retail dataset with 23,812 tasks,
CCMTL requires only around 30 seconds to train on a single thread, while the
SoA methods need hours or even days.
Comment: Accepted at AAAI 201
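The NumPy sketch below writes out the kind of objective described here: per-task squared losses plus a convex-clustering penalty tying together task models that are neighbors in the k-nearest-neighbor graph. The exact penalty form is an assumption, and the paper's dedicated solver is not reproduced.

    # Illustrative CCMTL-style objective: per-task losses plus a convex
    # clustering penalty over the k-NN graph of task weight vectors.
    import numpy as np

    def ccmtl_objective(W, tasks, knn_edges, lam=1.0):
        # W: (n_tasks, d); tasks: list of (X_t, y_t); knn_edges: (i, j) pairs.
        loss = sum(0.5 * np.sum((X @ W[t] - y) ** 2)
                   for t, (X, y) in enumerate(tasks))
        penalty = sum(np.linalg.norm(W[i] - W[j]) for i, j in knn_edges)
        return loss + lam * penalty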
Measuring the Discrepancy between Conditional Distributions: Methods, Properties and Applications
We propose a simple yet powerful test statistic to quantify the discrepancy
between two conditional distributions. The new statistic avoids the explicit
estimation of the underlying distributions in high-dimensional space and
operates on the cone of symmetric positive semidefinite (SPS) matrices using the
Bregman matrix divergence. Moreover, it inherits the merits of the correntropy
function to explicitly incorporate high-order statistics in the data. We
present the properties of our new statistic and illustrate its connections to
prior art. We finally show the applications of our new statistic on three
different machine learning problems, namely the multi-task learning over
graphs, the concept drift detection, and the information-theoretic feature
selection, to demonstrate its utility and advantage. Code for our statistic is
available at https://bit.ly/BregmanCorrentropy.
Comment: manuscript accepted at IJCAI 20; added additional notes on computational complexity and the auto-differentiable property; code is available at https://github.com/SJYuCNEL/Bregman-Correntropy-Conditional-Divergenc
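As an illustration of a Bregman divergence on the cone of SPS matrices, the sketch below computes the von Neumann matrix divergence between two symmetric positive (semi)definite matrices. Whether this is the exact divergence and correntropy combination used by the proposed statistic is not asserted here; the linked repository contains the authors' implementation.

    # Von Neumann Bregman matrix divergence, one standard choice on the SPS cone.
    # A small ridge keeps the matrix logarithm well defined for semidefinite inputs.
    import numpy as np
    from scipy.linalg import logm

    def von_neumann_divergence(A, B, eps=1e-10):
        # D(A || B) = tr(A log A - A log B - A + B)
        A = A + eps * np.eye(A.shape[0])
        B = B + eps * np.eye(B.shape[0])
        return np.trace(A @ logm(A) - A @ logm(B) - A + B).real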
PDEBENCH: An Extensive Benchmark for Scientific Machine Learning
Machine learning-based modeling of physical systems has experienced increased
interest in recent years. Despite some impressive progress, there is still a
lack of benchmarks for Scientific ML that are easy to use but still challenging
and representative of a wide range of problems. We introduce PDEBench, a
benchmark suite of time-dependent simulation tasks based on Partial
Differential Equations (PDEs). PDEBench comprises both code and data to
benchmark the performance of novel machine learning models against both
classical numerical simulations and machine learning baselines. Our proposed
set of benchmark problems contributes the following unique features: (1) a much
wider range of PDEs compared to existing benchmarks, ranging from relatively
common examples to more realistic and difficult problems; (2) much larger
ready-to-use datasets compared to prior work, comprising multiple simulation
runs across a larger number of initial and boundary conditions and PDE
parameters; (3) more extensible source codes with user-friendly APIs for data
generation and baseline results with popular machine learning models (FNO,
U-Net, PINN, Gradient-Based Inverse Method). PDEBench allows researchers to
extend the benchmark freely for their own purposes using a standardized API and
to compare the performance of new models to existing baseline methods. We also
propose new evaluation metrics with the aim to provide a more holistic
understanding of learning methods in the context of Scientific ML. With those
metrics we identify tasks which are challenging for recent ML methods and
propose these tasks as future challenges for the community. The code is
available at https://github.com/pdebench/PDEBench.
Comment: 16 pages (main body) + 34 pages (supplemental material), accepted for publication in NeurIPS 2022 Track Datasets and Benchmark
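For readers who want to inspect the released data, the sketch below opens one of the PDEBench HDF5 files with h5py and lists its contents. The file name and dataset key are placeholders; the repository's own data loaders and documentation describe the actual layout.

    # Inspect a PDEBench HDF5 data file; file name and key are placeholders.
    import h5py

    with h5py.File("pdebench_dataset.hdf5", "r") as f:  # placeholder file name
        f.visit(print)                 # list the groups/datasets stored in the file
        # data = f["tensor"][...]      # key name is an assumption; check the listing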
SNPs Array Karyotyping Reveals a Novel Recurrent 20p13 Amplification in Primary Myelofibrosis
The molecular pathogenesis of primary myelofibrosis (PMF) is still largely unknown. Recently, single-nucleotide polymorphism arrays (SNP-A) have allowed genome-wide profiling of copy-number alterations and acquired uniparental disomy (aUPD) at high resolution. In this study, we analyzed 20 PMF patients using the Genome-Wide Human SNP Array 6.0 in order to identify novel recurrent genomic abnormalities. We observed a complex karyotype in all cases, detecting all the previously reported lesions (del(5q), del(20q), del(13q), +8, aUPD at 9p24, and abnormalities on chromosome 1). In addition, we identified several novel cryptic lesions. In particular, we found a recurrent alteration involving cytoband 20p13 in 55% of patients. We defined a minimal affected region (MAR), an amplification of 9,911 base pairs (bp) overlapping the SIRPB1 gene locus. Notably, by extending the analysis to the adjacent areas, the cytoband was affected in 95% of cases overall. Remarkably, these results were confirmed by real-time PCR and validated in silico in a large independent series of myeloproliferative diseases. Finally, by immunohistochemistry we found that SIRPB1 was over-expressed in the bone marrow of PMF patients carrying the 20p13 amplification. In conclusion, we identified a novel, highly recurrent genomic lesion in PMF patients, which definitely warrants further functional and clinical characterization.