Why do networks have inhibitory/negative connections?
Why do brains have inhibitory connections? Why do deep networks have negative
weights? We propose an answer from the perspective of representation capacity.
We believe representing functions is the primary role of both (i) the brain in
natural intelligence, and (ii) deep networks in artificial intelligence. Our
answer to why there are inhibitory/negative weights is: to learn more
functions. We prove that, in the absence of negative weights, neural networks
with non-decreasing activation functions are not universal approximators. While
this may be an intuitive result to some, to the best of our knowledge, there is
no formal theory, in either machine learning or neuroscience, that demonstrates
why negative weights are crucial in the context of representation capacity.
Further, we provide insights on the geometric properties of the representation
space that non-negative deep networks cannot represent. We expect these
insights will yield a deeper understanding of more sophisticated inductive
priors imposed on the distribution of weights that lead to more efficient
biological and machine learning. Comment: ICCV 2023 camera-ready.
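To see why monotone activations and non-negative weights jointly limit capacity, consider the following minimal numerical sketch (our illustration, not code from the paper): every layer of such a network is a coordinate-wise non-decreasing map, so the composed network is non-decreasing in each input and cannot represent a decreasing target such as f(x) = -x.

```python
import numpy as np

rng = np.random.default_rng(0)

def nonneg_relu_net(x, weights, biases):
    """Forward pass of an MLP whose weights are constrained to be >= 0."""
    h = x
    for W, b in zip(weights, biases):
        # ReLU is non-decreasing, and W >= 0 preserves coordinate-wise
        # monotonicity, so each layer is a monotone map.
        h = np.maximum(W @ h + b, 0.0)
    return h

# Random non-negative network with layout 1 -> 8 -> 8 -> 1.
weights = [np.abs(rng.normal(size=(8, 1))),
           np.abs(rng.normal(size=(8, 8))),
           np.abs(rng.normal(size=(1, 8)))]
biases = [rng.normal(size=8), rng.normal(size=8), rng.normal(size=1)]

xs = np.linspace(-3.0, 3.0, 201)
ys = [nonneg_relu_net(np.array([x]), weights, biases)[0] for x in xs]

# The output never decreases as the input increases, for any draw of the
# non-negative weights, so targets like f(x) = -x are unrepresentable.
assert np.all(np.diff(ys) >= -1e-12)
```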
Omnidirectional Transfer for Quasilinear Lifelong Learning
In biological learning, data are used to improve performance not only on the
current task, but also on previously encountered and as yet unencountered
tasks. In contrast, classical machine learning starts from a blank slate, or
tabula rasa, using data only for the single task at hand. While typical
transfer learning algorithms can improve performance on future tasks, their
performance on prior tasks degrades upon learning new tasks (called
catastrophic forgetting). Many recent approaches for continual or lifelong
learning have attempted to maintain performance given new tasks. But striving
to avoid forgetting sets the goal unnecessarily low: the goal of lifelong
learning, whether biological or artificial, should be to improve performance on
all tasks (including past and future) with any new data. We propose
omnidirectional transfer learning algorithms, which include two special cases
of interest: decision forests and deep networks. Our key insight is the
development of the omni-voter layer, which ensembles representations learned
independently on all tasks to jointly decide how to proceed on any given new
data point, thereby improving performance on both past and future tasks. Our
algorithms demonstrate omnidirectional transfer in a variety of simulated and
real data scenarios, including tabular data, image data, spoken data, and
adversarial tasks. Moreover, they do so with quasilinear space and time
complexity.
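As a rough illustration of the voting idea (the class name and details below are our simplification, not the paper's implementation), one can keep one learner per task and have all of them vote on every query, so that data from any task can improve decisions on any other; for simplicity this sketch assumes all tasks share a common label space.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class OmniVoter:
    """Hypothetical sketch: one representer per task, all of which vote
    on every query, so new-task data can improve old-task decisions."""

    def __init__(self):
        self.task_models = {}  # task_id -> fitted per-task learner

    def add_task(self, task_id, X, y):
        # Learn a representation for this task only; storage grows
        # linearly in the number of tasks (one model per task).
        self.task_models[task_id] = DecisionTreeClassifier().fit(X, y)

    def predict_proba(self, X):
        # The "omni-voter" step: average the posteriors from every
        # task's learner (assumes a shared label space across tasks).
        votes = [m.predict_proba(X) for m in self.task_models.values()]
        return np.mean(votes, axis=0)
```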
Deep discriminative to kernel generative modeling
The debate between discriminative and generative approaches runs deep, in
the study of both artificial and natural intelligence. In our view, the two
camps have complementary value, so we sought to combine them
synergistically. Here, we
propose a methodology to convert deep discriminative networks to kernel
generative networks. We leveraged the fact that deep models, including both
random forests and deep networks, learn internal representations that are
unions of polytopes with affine activation functions, allowing us to conceptualize both
as generalized partitioning rules. From that perspective, we used foundational
results on the relationship between histogram rules and kernel density
estimators to obtain class conditional kernel density estimators from the deep
models. We then studied the trade-offs of this strategy in low-dimensional
settings, both theoretically and empirically, as a first step towards
understanding it more generally. Theoretically, we show conditions under which
our generative models are more efficient than the corresponding discriminative
approaches. Empirically, when sample sizes are relatively high, the
discriminative models tend to perform as well or better on discriminative
metrics, such as classification rates and posterior calibration. However, when
sample sizes are relatively low, the generative models outperform the
discriminative ones even on discriminative metrics. Moreover, the generative
ones can also sample from the distribution, obtain smoother posteriors, and
extrapolate beyond the convex hull of the training data to handle out-of-distribution (OOD) inputs
more reasonably. Via human experiments we illustrate that our kernel generative
networks (Kragen) behave more like humans than deep discriminative networks. We
believe this approach may be an important step in unifying the thinking and
approaches across the discriminative and generative divide.
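A highly simplified sketch of the conversion idea (ours; the paper derives kernels from the deep model's polytope partition, which we skip here) is to fit one kernel density estimator per class and classify via class-conditional densities and priors:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def fit_class_kdes(X, y, bandwidth=0.5):
    """Fit one Gaussian KDE per class, plus empirical class priors."""
    classes = np.unique(y)
    kdes = {c: KernelDensity(bandwidth=bandwidth).fit(X[y == c])
            for c in classes}
    priors = {c: float(np.mean(y == c)) for c in classes}
    return kdes, priors

def predict(X, kdes, priors):
    # Bayes rule over class-conditional densities:
    # argmax_c  log p(x | c) + log p(c).
    classes = sorted(kdes)
    log_post = np.stack([kdes[c].score_samples(X) + np.log(priors[c])
                         for c in classes], axis=1)
    return np.array(classes)[np.argmax(log_post, axis=1)]
```

Unlike a purely discriminative classifier, this generative form can also sample new points (via KernelDensity.sample) and yields densities that decay away from the training data, which is one way to behave more sensibly on OOD inputs.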
Prospective Learning: Back to the Future
Research on both natural intelligence (NI) and artificial intelligence (AI)
generally assumes that the future resembles the past: intelligent agents or
systems (what we call 'intelligence') observe and act on the world, then use
this experience to act on future experiences of the same kind. We call this
'retrospective learning'. For example, an intelligence may see a set of
pictures of objects, along with their names, and learn to name them. A
retrospective learning intelligence would merely be able to name more pictures
of the same objects. We argue that this is not what true intelligence is about.
In many real world problems, both NIs and AIs will have to learn for an
uncertain future. Both must update their internal models to be useful for
future tasks, such as naming fundamentally new objects and using these objects
effectively in a new context or to achieve previously unencountered goals. This
ability to learn for the future we call 'prospective learning'. We articulate
four relevant factors that jointly define prospective learning. Continual
learning enables intelligences to remember those aspects of the past which they
believe will be most useful in the future. Prospective constraints (including
biases and priors) help an intelligence find general solutions that
will be applicable to future problems. Curiosity motivates taking actions that
inform future decision making, including in previously unmet situations. Causal
estimation enables learning the structure of relations that guide choosing
actions for specific outcomes, even when the specific action-outcome
contingencies have never been observed before. We argue that a paradigm shift
from retrospective to prospective learning will enable the communities that
study intelligence to unite and overcome existing bottlenecks to more
effectively explain, augment, and engineer intelligences.
Guidance on mucositis assessment from the MASCC Mucositis Study Group and ISOO: an international Delphi study
Background: Mucositis is a common and highly impactful side effect of conventional and emerging cancer therapy, and thus the subject of intense investigation. Although its assessment is common practice, mucositis assessment is heterogeneously adopted and poorly guided, impeding evidence synthesis and translation. The Multinational Association of Supportive Care in Cancer (MASCC) Mucositis Study Group (MSG) therefore aimed to establish expert recommendations for how existing mucositis assessment tools should be used, in clinical care and trial contexts, to improve the consistency of mucositis assessment.

Methods: This study was conducted in two stages (January 2022–July 2023). The first phase involved a survey of MASCC-MSG members (January 2022–May 2022), capturing current practices, challenges, and preferences. These findings informed the second phase, in which a set of initial recommendations was prepared and refined using the Delphi method (February 2023–May 2023). Consensus was defined as agreement on a parameter by >80% of respondents.

Findings: Seventy-two MASCC-MSG members completed the first phase of the study (37 females, 34 males, mainly oral care specialists). High variability was noted in the use of mucositis assessment tools, with a heavy reliance on clinician assessment compared to patient-reported outcome measures (PROMs; 47% vs 3%, with 37% using a combination). The World Health Organization (WHO) and Common Terminology Criteria for Adverse Events (CTCAE) scales were most commonly used to assess mucositis across multiple settings. Initial recommendations were reviewed by experienced MSG members and, following two rounds of Delphi surveys, consensus was achieved for 91 of 100 recommendations. For example, in patients receiving chemotherapy, the recommended tool for clinician assessment in clinical practice is WHO for oral mucositis (89.5% consensus), and WHO or CTCAE for gastrointestinal mucositis (85.7% consensus). The recommended PROM in clinical trials is OMD/WQ for oral mucositis (93.3% consensus), and PRO-CTCAE for gastrointestinal mucositis (83.3% consensus).

Interpretation: These new recommendations provide much-needed guidance on mucositis assessment and may be applied in both clinical practice and research to streamline comparison and synthesis of global data sets, thus accelerating translation of new knowledge into clinical practice.

Funding: No funding was received.