Interpretable statistics for complex modelling: quantile and topological learning
As the complexity of our data has increased exponentially over the last decades, so has our
need for interpretable features. This thesis revolves around two paradigms for approaching
this quest for insights.
In the first part we focus on parametric models, where the problem of interpretability
can be seen as a “parametrization selection”. We introduce a quantile-centric
parametrization and show the advantages of our proposal in the context of regression,
where it allows us to bridge the gap between classical generalized linear (mixed)
models and increasingly popular quantile methods.
The second part of the thesis, concerned with topological learning, tackles the
problem from a non-parametric perspective. As topology can be thought of as a way
of characterizing data in terms of their connectivity structure, it allows us to represent
complex and possibly high-dimensional data through a few features, such as the number of
connected components, loops and voids. We illustrate how the emerging branch of
statistics devoted to recovering topological structures in the data, Topological Data
Analysis, can be exploited both for exploratory and inferential purposes with a special
emphasis on kernels that preserve the topological information in the data.
Finally, we show with an application how these two approaches can borrow strength
from one another in the identification and description of brain activity through fMRI
data from the ABIDE project.
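The abstract does not spell out the quantile-centric parametrization itself, but the standard bridge between quantiles and regression that it alludes to is the pinball (check) loss, whose minimizer over a constant is an empirical quantile. A minimal sketch of that connection (function names are our own, not the thesis's):

```python
def pinball_loss(y, q, tau):
    """Check (pinball) loss of proposing q as the tau-quantile of y."""
    return tau * (y - q) if y >= q else (1 - tau) * (q - y)

def empirical_quantile(data, tau):
    """The data point minimizing total pinball loss is an empirical
    tau-quantile -- the basic fact linking quantiles to regression."""
    return min(data, key=lambda q: sum(pinball_loss(y, q, tau) for y in data))

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
median = empirical_quantile(data, 0.5)
upper = empirical_quantile(data, 0.9)
```

Replacing the constant `q` by a linear predictor turns this minimization into quantile regression, the family of methods the thesis connects to generalized linear (mixed) models.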
kLog: A Language for Logical and Relational Learning with Kernels
We introduce kLog, a novel approach to statistical relational learning.
Unlike standard approaches, kLog does not represent a probability distribution
directly. It is rather a language to perform kernel-based learning on
expressive logical and relational representations. kLog allows users to specify
learning problems declaratively. It builds on simple but powerful concepts:
learning from interpretations, entity/relationship data modeling, logic
programming, and deductive databases. Access by the kernel to the rich
representation is mediated by a technique we call graphicalization: the
relational representation is first transformed into a graph --- in particular,
a grounded entity/relationship diagram. Subsequently, a choice of graph kernel
defines the feature space. kLog supports mixed numerical and symbolic data, as
well as background knowledge in the form of Prolog or Datalog programs as in
inductive logic programming systems. The kLog framework can be applied to
tackle the same range of tasks that has made statistical relational learning so
popular, including classification, regression, multitask learning, and
collective classification. We also report on empirical comparisons, showing
that kLog can be either more accurate, or much faster at the same level of
accuracy, than Tilde and Alchemy. kLog is GPLv3 licensed and is available at
http://klog.dinfo.unifi.it along with tutorials.
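The graphicalization step described above can be illustrated with a toy example. kLog itself operates on Prolog/Datalog representations; the following Python sketch (all identifiers hypothetical) only mimics the idea of grounding an entity/relationship representation into a graph on which a graph kernel could then operate:

```python
# Toy "graphicalization": entities and ground relationship tuples become
# nodes of an undirected graph; each relation node is linked to the
# entities it mentions, yielding a grounded E/R diagram.
entities = {"a1": "author", "a2": "author", "p1": "paper"}
relations = [("authored", "a1", "p1"), ("authored", "a2", "p1")]

def graphicalize(entities, relations):
    """Return the grounded E/R graph as an adjacency dict of node -> set."""
    adj = {node: set() for node in entities}
    for i, (name, *args) in enumerate(relations):
        rnode = f"{name}#{i}"          # one node per ground relation tuple
        adj[rnode] = set()
        for arg in args:
            adj[rnode].add(arg)
            adj[arg].add(rnode)
    return adj

g = graphicalize(entities, relations)
```

A graph kernel applied to `g` (and to the graphs of other interpretations) would then implicitly define the feature space, as the abstract describes.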
The Importance of Forgetting: Limiting Memory Improves Recovery of Topological Characteristics from Neural Data
We develop a line of work initiated by Curto and Itskov towards
understanding the amount of information contained in the spike trains of
hippocampal place cells via topological considerations. Previously, it was
established that simply knowing which groups of place cells fire together in an
animal's hippocampus is sufficient to extract the global topology of the
animal's physical environment. We model a system where collections of place
cells group and ungroup according to short-term plasticity rules. In
particular, we obtain the surprising result that in experiments with spurious
firing, the accuracy of the extracted topological information decreases with
the persistence (beyond a certain regime) of the cell groups. This suggests
that synaptic transience, or forgetting, is a mechanism by which the brain
counteracts the effects of spurious place cell activity.
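The forgetting mechanism can be caricatured in a few lines. This is not the paper's model (which uses short-term plasticity rules); it is only a sketch, under our own simplifying assumption that a co-firing group survives only if it was observed within a fixed memory window:

```python
def surviving_groups(events, memory):
    """events: list of (time, frozenset_of_cells) co-firing observations.
    A group survives to the final time T only if it was last seen within
    `memory` steps of T -- a toy stand-in for synaptic transience.
    """
    last_seen = {}
    for t, group in events:
        last_seen[group] = t
    T = max(t for t, _ in events)
    return {g for g, t in last_seen.items() if T - t < memory}

true_group = frozenset({"c1", "c2"})    # repeatedly co-firing cells
spurious = frozenset({"c1", "c9"})      # a one-off spurious coincidence
events = [(0, spurious), (1, true_group), (2, true_group), (3, true_group)]
short_memory = surviving_groups(events, memory=2)
long_memory = surviving_groups(events, memory=100)
```

With short memory only the repeatedly observed group survives, while unlimited persistence also retains the spurious one; a topological reconstruction built from the surviving groups would then inherit that noise, in line with the abstract's conclusion.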
Metaphoric coherence: Distinguishing verbal metaphor from 'anomaly'
Theories and computational models of metaphor comprehension generally circumvent the question of metaphor versus “anomaly” in favor of a treatment of metaphor versus literal language. Making the distinction between metaphoric and “anomalous” expressions is subject to wide variation in judgment, yet humans agree that some potentially metaphoric expressions are much more comprehensible than others. In the context of a program which interprets simple isolated sentences that are potential instances of cross‐modal and other verbal metaphor, I consider some possible coherence criteria which must be satisfied for an expression to be “conceivable” metaphorically. Metaphoric constraints on object nominals are represented as abstracted or extended along with the invariant structural components of the verb meaning in a metaphor. This approach distinguishes what is preserved in metaphoric extension from that which is “violated”, thus referring to both “similarity” and “dissimilarity” views of metaphor. The role and potential limits of represented abstracted properties and constraints are discussed as they relate to the recognition of incoherent semantic combinations and the rejection or adjustment of metaphoric interpretations.
Statistical Analysis and Parameter Selection for Mapper
In this article, we study the question of the statistical convergence of the
1-dimensional Mapper to its continuous analogue, the Reeb graph. We show that
the Mapper is an optimal estimator of the Reeb graph, which gives, as a
byproduct, a method to automatically tune its parameters and compute confidence
regions on its topological features, such as its loops and flares. This allows one
to circumvent the issue of testing a large grid of parameters and keeping the
most stable ones in the brute-force setting, which is widely used in
visualization, clustering and feature selection with the Mapper.
Persistent Homology in Sparse Regression and its Application to Brain Morphometry
Sparse systems are usually parameterized by a tuning parameter that
determines the sparsity of the system. How to choose the right tuning parameter
is a fundamental and difficult problem in learning the sparse system. In this
paper, by treating the tuning parameter as an additional dimension,
persistent homological structures over the parameter space are introduced and
explored. The structures are then further exploited to speed up the
computation using the proposed soft-thresholding technique. The topological
structures are further used as multivariate features in the tensor-based
morphometry (TBM) in characterizing white matter alterations in children who
have experienced severe early life stress and maltreatment. These analyses
reveal that stress-exposed children exhibit more diffuse anatomical
organization across the whole white matter region. (Comment: submitted to IEEE
Transactions on Medical Imaging.)
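The idea of sweeping the tuning parameter as an extra dimension can be sketched with the soft-thresholding operator of lasso-type sparse estimation: as the threshold grows, coefficients die one by one, and the parameter values at which they vanish form a 0-dimensional "barcode" over the tuning parameter. This toy sketch is our own illustration, not the paper's algorithm:

```python
def soft_threshold(x, lam):
    """Soft-thresholding operator used in lasso-type sparse estimation."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def barcode_over_lambda(coefs):
    """Each coefficient stays active on the interval [0, |coef|) of the
    tuning parameter; the sorted death times form a 0-dimensional
    barcode over lambda, treated here as an extra dimension."""
    return sorted(abs(c) for c in coefs)

coefs = [3.0, -1.5, 0.2]
deaths = barcode_over_lambda(coefs)
```

Persistence of such features across the whole parameter path is what removes the need to commit to a single tuning value up front.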