Learning the Roots of Visual Domain Shift
In this paper we focus on the spatial nature of visual domain shift,
attempting to learn where domain adaptation originates in each given image of
the source and target set. We borrow concepts and techniques from the CNN
visualization literature, and learn domainness maps able to localize the degree
of domain specificity in images. We derive from these maps features related to
different domainness levels, and we show that by considering them as a
preprocessing step for a domain adaptation algorithm, the final classification
performance is strongly improved. Combined with the whole image representation,
these features provide state-of-the-art results on the Office dataset. Comment: Extended Abstrac
A Weaker Faithfulness Assumption based on Triple Interactions
One of the core assumptions in causal discovery is the faithfulness assumption, i.e., assuming that independencies found in the data are due to separations in the true causal graph. This assumption can, however, be violated in many ways, including XOR connections, deterministic functions or cancelling paths. In this work, we propose a weaker assumption that we call 2-adjacency faithfulness. In contrast to adjacency faithfulness, which assumes that there is no conditional independence between each pair of variables that are connected in the causal graph, we only require no conditional independence between a node and a subset of its Markov blanket that can contain up to two nodes. Equivalently, we adapt orientation faithfulness to this setting. We further propose a sound orientation rule for causal discovery that applies under weaker assumptions. As a proof of concept, we derive a modified Grow and Shrink algorithm that recovers the Markov blanket of a target node and prove its correctness under strictly weaker assumptions than the standard faithfulness assumption.
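For orientation, the following is a minimal sketch of the standard Grow and Shrink procedure for Markov blanket recovery, not the modified algorithm the abstract refers to. The conditional independence oracle `chain_oracle` and the toy graph (a chain A -> B -> C plus an isolated node D) are assumptions made purely for illustration; in practice the oracle would be a statistical test.

```python
def grow_shrink(target, variables, independent):
    """Estimate the Markov blanket of `target` with standard Grow and Shrink.

    `independent(x, y, cond)` is a caller-supplied conditional independence
    oracle (hypothetical here) returning True when x and y are independent
    given the list of variables `cond`.
    """
    blanket = []
    # Grow phase: add every variable still dependent on the target
    # given the current blanket, until nothing changes.
    changed = True
    while changed:
        changed = False
        for v in variables:
            if v == target or v in blanket:
                continue
            if not independent(target, v, blanket):
                blanket.append(v)
                changed = True
    # Shrink phase: remove variables that the rest of the blanket
    # renders independent of the target.
    for v in list(blanket):
        rest = [u for u in blanket if u != v]
        if independent(target, v, rest):
            blanket.remove(v)
    return set(blanket)


# Toy oracle (an assumption): chain A -> B -> C with an isolated node D.
CHAIN_POSITION = {"A": 0, "B": 1, "C": 2}


def chain_oracle(x, y, cond):
    if "D" in (x, y):
        return True  # D is independent of everything
    lo, hi = sorted((CHAIN_POSITION[x], CHAIN_POSITION[y]))
    # Two chain nodes are separated exactly when a conditioning node lies between them.
    return any(lo < CHAIN_POSITION[c] < hi for c in cond if c in CHAIN_POSITION)


if __name__ == "__main__":
    for node in "ABCD":
        print(node, grow_shrink(node, list("ABCD"), chain_oracle))
```

For target C, the grow phase first admits A (dependent on C given the empty set) and then B; the shrink phase removes A again because B separates it from C.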
Conditional BRUNO: A neural process for exchangeable labelled data
We present a neural process that models exchangeable sequences of high-dimensional complex observations conditionally on a set of labels or tags. Our model combines the expressiveness of deep neural networks with the data-efficiency of Gaussian processes, resulting in a probabilistic model for which the posterior distribution is easy to evaluate and sample from, and the computational complexity scales linearly with the number of observations. The advantages of the proposed architecture are demonstrated on a challenging few-shot view reconstruction task which requires generalisation from short sequences of viewpoints.
Decision-Theoretic Planning with non-Markovian Rewards
A decision process in which rewards depend on history rather than merely on
the current state is called a decision process with non-Markovian rewards
(NMRDP). In decision-theoretic planning, where many desirable behaviours are
more naturally expressed as properties of execution sequences rather than as
properties of states, NMRDPs form a more natural model than the commonly
adopted fully Markovian decision process (MDP) model. While the more tractable
solution methods developed for MDPs do not directly apply in the presence of
non-Markovian rewards, a number of solution methods for NMRDPs have been
proposed in the literature. These all exploit a compact specification of the
non-Markovian reward function in temporal logic, to automatically translate the
NMRDP into an equivalent MDP which is solved using efficient MDP solution
methods. This paper presents NMRDPP (Non-Markovian Reward Decision Process
Planner), a software platform for the development and experimentation of
methods for decision-theoretic planning with non-Markovian rewards. The current
version of NMRDPP implements, under a single interface, a family of methods
based on existing as well as new approaches which we describe in detail. These
include dynamic programming, heuristic search, and structured methods. Using
NMRDPP, we compare the methods and identify certain problem features that
affect their performance. NMRDPP's treatment of non-Markovian rewards is
inspired by the treatment of domain-specific search control knowledge in the
TLPlan planner, which it incorporates as a special case. In the First
International Probabilistic Planning Competition, NMRDPP was able to compete
and perform well in both the domain-independent and hand-coded tracks, using
search control knowledge in the latter.
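The translation idea mentioned in the abstract (turning an NMRDP into an equivalent MDP) can be illustrated with a toy sketch. This is not NMRDPP's method or interface: the four-cell corridor, the reward rule ("entering the last cell pays 1 only if cell 0 has been visited before"), the hand-written history flag, and all names below are assumptions for illustration. The history-dependent reward becomes Markovian once the state is expanded with one bit of history, after which ordinary value iteration applies.

```python
GAMMA = 0.9                       # discount factor (assumed)
MOVES = {"left": -1, "right": 1}  # deterministic moves along a corridor
LAST = 3                          # index of the last cell


def step(expanded_state, action):
    """Transition and reward on the expanded state (position, seen_start)."""
    position, seen_start = expanded_state
    new_position = min(max(position + MOVES[action], 0), LAST)
    # The extra bit records the part of the history the reward depends on.
    new_seen = seen_start or new_position == 0
    entering_last = new_position == LAST and position != LAST
    reward = 1.0 if (entering_last and new_seen) else 0.0
    return (new_position, new_seen), reward


def value_iteration(sweeps=200):
    """Plain value iteration over the expanded (now Markovian) state space."""
    states = [(p, seen) for p in range(LAST + 1) for seen in (False, True)]
    values = {s: 0.0 for s in states}
    for _ in range(sweeps):
        new_values = {}
        for s in states:
            outcomes = [step(s, a) for a in MOVES]
            new_values[s] = max(r + GAMMA * values[n] for n, r in outcomes)
        values = new_values
    return values


def greedy_policy(values):
    """Pick, in each expanded state, the action with the best one-step lookahead."""
    policy = {}
    for s in values:
        def lookahead(action, state=s):
            next_state, reward = step(state, action)
            return reward + GAMMA * values[next_state]
        policy[s] = max(MOVES, key=lookahead)
    return policy


if __name__ == "__main__":
    policy = greedy_policy(value_iteration())
    for state in sorted(policy):
        print(state, policy[state])
```

In cell 1 the greedy action depends on the history bit: without a prior visit to cell 0 the agent heads left first, whereas with the bit set it heads right, which a policy over positions alone could not express.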
Practical Kernel Tests of Conditional Independence
We describe a data-efficient, kernel-based approach to statistical testing of
conditional independence. A major challenge of conditional independence
testing, absent in tests of unconditional independence, is to obtain the
correct test level (the specified upper bound on the rate of false positives),
while still attaining competitive test power. Excess false positives arise due
to bias in the test statistic, which is obtained using nonparametric kernel
ridge regression. We propose three methods for bias control to correct the test
level, based on data splitting, auxiliary data, and (where possible) simpler
function classes. We show these combined strategies are effective both for
synthetic and real-world data.
A kernel method for the two-sample-problem
We propose two statistical tests to determine if two samples are from different distributions. Our test statistic is in both cases the distance between the means of the two samples mapped into a reproducing kernel Hilbert space (RKHS). The first test is based on a large deviation bound for the test statistic, while the second is based on the asymptotic distribution of this statistic. The test statistic can be computed in O(m^2) time. We apply our approach to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where our test performs strongly. We also demonstrate excellent performance when comparing distributions over graphs, for which no alternative tests currently exist.
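The statistic described above, the squared RKHS distance between the two sample means, can be sketched in a few lines. The Gaussian kernel, the bandwidth, and the function names are assumptions chosen for illustration. The threshold here comes from a permutation test, which is swapped in for simplicity and is not one of the two tests (large deviation bound, asymptotic distribution) the abstract proposes.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared distance between RKHS mean embeddings.

    Expands ||mean phi(x) - mean phi(y)||^2 into three kernel averages,
    which costs a quadratic number of kernel evaluations.
    """
    m, n = len(xs), len(ys)
    k_xx = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in xs) / (m * m)
    k_yy = sum(gaussian_kernel(a, b, bandwidth) for a in ys for b in ys) / (n * n)
    k_xy = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


def permutation_p_value(xs, ys, num_permutations=200, bandwidth=1.0, seed=0):
    """Permutation p-value: how often a random relabelling matches or beats the observed statistic."""
    rng = random.Random(seed)
    observed = mmd2_biased(xs, ys, bandwidth)
    pooled = list(xs) + list(ys)
    exceed = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)
        stat = mmd2_biased(pooled[:len(xs)], pooled[len(xs):], bandwidth)
        if stat >= observed:
            exceed += 1
    return (1 + exceed) / (1 + num_permutations)


if __name__ == "__main__":
    sample_a = [(i / 10,) for i in range(10)]
    sample_b = [(5 + i / 10,) for i in range(10)]
    print(mmd2_biased(sample_a, sample_b))
    print(permutation_p_value(sample_a, sample_b))
```

Points are passed as tuples so the same code handles multivariate samples; the bandwidth strongly affects power and would normally be chosen from the data (for example by a median heuristic) rather than fixed at 1.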
Efficient Conditionally Invariant Representation Learning
We introduce the Conditional Independence Regression CovariancE (CIRCE),
a measure of conditional independence for multivariate continuous-valued variables. CIRCE applies as a regularizer in settings where we wish to learn neural
features φ(X) of data X to estimate a target Y, while being conditionally independent of a distractor Z given Y. Both Z and Y are assumed to be continuous-valued
but relatively low dimensional, whereas X and its features may be complex and
high dimensional. Relevant settings include domain-invariant learning, fairness,
and causal learning. The procedure requires just a single ridge regression from Y
to kernelized features of Z, which can be done in advance. It is then only necessary to enforce independence of φ(X) from residuals of this regression, which
is possible with attractive estimation properties and consistency guarantees. By
contrast, earlier measures of conditional feature dependence require multiple regressions for each step of feature learning, resulting in more severe bias and variance, and greater computational cost. When sufficiently rich features are used,
we establish that CIRCE is zero if and only if φ(X) ⊥⊥ Z | Y. In experiments,
we show superior performance to previous methods on challenging benchmarks,
including learning conditionally invariant image features.