PredNet and Predictive Coding: A Critical Review
PredNet, a deep predictive coding network developed by Lotter et al.,
combines a biologically inspired architecture based on the propagation of
prediction error with self-supervised representation learning in video. While
the architecture has drawn a lot of attention and various extensions of the
model exist, there is a lack of a critical analysis. We fill in the gap by
evaluating PredNet both as an implementation of the predictive coding theory
and as a self-supervised video prediction model using a challenging video
action classification dataset. We design an extended model to test if
conditioning future frame predictions on the action class of the video improves
the model performance. We show that PredNet does not yet completely follow the
principles of predictive coding. The proposed top-down conditioning leads to a
performance gain on synthetic data, but does not scale up to the more complex
real-world action classification dataset. Our analysis is aimed at guiding
future research on similar architectures based on the predictive coding theory.
Planning Interdisciplinary Artificial Intelligence Courses For Engineering Students
As Artificial Intelligence (AI) becomes increasingly important in engineering, instructors need to incorporate AI concepts into their subject-specific courses. However, many teachers may lack the expertise to do so effectively or don’t know where to start. To address this challenge, we have developed the AI Course Design Planning Framework to help instructors structure their teaching of domain-specific AI skills. This workshop aimed to equip participants with an understanding of the framework and its application to their courses. The workshop was designed for instructors in engineering education who are interested in interdisciplinary teaching and teaching about AI in the context of their domain. Throughout the workshop, participants worked hands-on in groups with the framework, applied it to their intended courses and reflected on the use. The workshop revealed challenges in defining domain-specific AI use cases and assessing learners' skills and instructors' competencies. At the same time, participants found the framework effective in early course development. Overall, the results of the workshop highlight the need for AI integration in engineering education and equipping educators with effective tools and training. It is clear that further efforts are needed to fully embrace AI in engineering education.
PredProp: Bidirectional Stochastic Optimization with Precision Weighted Predictive Coding
We present PredProp, a method for bidirectional, parallel and local
optimisation of weights, activities and precision in neural networks. PredProp
jointly addresses inference and learning, scales learning rates dynamically and
weights gradients by the curvature of the loss function by optimizing
prediction error precision. PredProp optimizes network parameters with
Stochastic Gradient Descent and error forward propagation based strictly on
prediction errors and variables locally available to each layer. Neighboring
layers optimise shared activity variables so that prediction errors can
propagate forward in the network, while predictions propagate backwards. This
process minimises the negative Free Energy, or evidence lower bound of the
entire network. We show that networks trained with PredProp resemble gradient
based predictive coding when the number of weights between neighboring activity
variables is one. In contrast to related work, PredProp generalizes towards
backward connections of arbitrary depth and optimizes precision for any deep
network architecture. Due to the analogy between prediction error precision and
the Fisher information for each layer, PredProp implements a form of Natural
Gradient Descent. When optimizing DNN models, layer-wise PredProp renders the
model a bidirectional predictive coding network. Alternatively DNNs can
parameterize the weights between two activity variables. We evaluate PredProp
for dense DNNs on simple inference, learning and combined tasks. We show that,
without an explicit sampling step in the network, PredProp implements a form of
variational inference that allows learning disentangled embeddings from small
amounts of data. We leave evaluation on more complex tasks and datasets to
future work.
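The idea of weighting prediction errors by a learned precision can be illustrated with a toy scalar model. The sketch below is not the PredProp algorithm; it assumes a single weight `w`, a Gaussian error model, and the per-sample energy F = 0.5 * (pi * e^2 - log pi) with e = x - w * a. The function names, learning rates and synthetic data are hypothetical choices made for illustration only.

```python
import math
import random


def fit_precision_weighted(samples, lr_w=0.002, lr_log_pi=0.01):
    """Fit x ~ w * a while also learning the precision of the prediction error.

    Runs stochastic gradient descent on the per-sample Gaussian energy
        F = 0.5 * (pi * e**2 - log(pi)),  with e = x - w * a,
    updating w and log(pi) using only the local error e.
    """
    w = 0.0       # weight that generates the prediction w * a
    log_pi = 0.0  # optimising log(pi) keeps pi = exp(log_pi) positive
    for a, x in samples:
        pi = math.exp(log_pi)
        e = x - w * a                                    # prediction error
        w += lr_w * pi * e * a                           # -dF/dw = pi * e * a
        log_pi -= lr_log_pi * 0.5 * (pi * e * e - 1.0)   # dF/dlog(pi)
    return w, math.exp(log_pi)


def make_samples(n, true_w=2.0, noise_std=0.1, seed=0):
    """Synthetic (a, x) pairs with x = true_w * a + Gaussian noise (assumed data)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        a = rng.uniform(-1.0, 1.0)
        samples.append((a, true_w * a + rng.gauss(0.0, noise_std)))
    return samples


if __name__ == "__main__":
    w, pi = fit_precision_weighted(make_samples(20000))
    print(f"w = {w:.3f}, precision = {pi:.1f}")
```

In this toy setting the precision acts as a data-dependent learning-rate scale for `w`: as the error variance shrinks, the learned precision grows and the weight update becomes more confident.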
Transfer Learning for Speech Recognition on a Budget
End-to-end training of automated speech recognition (ASR) systems requires
massive data and compute resources. We explore transfer learning based on model
adaptation as an approach for training ASR models under constrained GPU memory,
throughput and training data. We conduct several systematic experiments
adapting a Wav2Letter convolutional neural network originally trained for
English ASR to the German language. We show that this technique allows faster
training on consumer-grade resources while requiring less training data in
order to achieve the same accuracy, thereby lowering the cost of training ASR
models in other languages. Model introspection revealed that small adaptations
to the network's weights were sufficient for good performance, especially for
inner layers.
Comment: Accepted for 2nd ACL Workshop on Representation Learning for NL
PI-RADS v2 Compliant Automated Segmentation of Prostate Zones Using co-training Motivated Multi-task Dual-Path CNN
The detailed images produced by Magnetic Resonance Imaging (MRI) provide
life-critical information for the diagnosis and treatment of prostate cancer.
To provide standardized acquisition, interpretation and usage of the complex
MRI images, the PI-RADS v2 guideline was proposed. An automated segmentation
following the guideline facilitates consistent and precise lesion detection,
staging and treatment. The guideline recommends a division of the prostate into
four zones, PZ (peripheral zone), TZ (transition zone), DPU (distal prostatic
urethra) and AFS (anterior fibromuscular stroma). Not every zone shares a
boundary with the others, and not every zone is present in every slice. Further, the
representations captured by a single model might not suffice for all zones.
This motivated us to design a dual-branch convolutional neural network (CNN),
where each branch captures the representations of the connected zones
separately. Further, the representations from different branches
complement each other in the second stage of training, where they are
fine-tuned through an unsupervised loss. The loss penalises the difference in
predictions from the two branches for the same class. We also incorporate
multi-task learning in our framework to further improve the segmentation
accuracy. The proposed approach improves the segmentation accuracy of the
baseline (mean absolute symmetric distance) by 7.56%, 11.00%, 58.43% and 19.67%
for PZ, TZ, DPU and AFS zones respectively.
Comment: Authors Arnab Das and Suhita Ghosh contributed equally. Submitted in
ISBI 202
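The unsupervised loss described above, which penalises disagreement between the two branches for the same class, could take many forms; the abstract does not specify one. As a minimal sketch, assuming each branch outputs per-pixel probabilities for the shared class and that disagreement is measured by a mean squared difference (both assumptions, as is the function name), it might look like this:

```python
def branch_consistency_loss(probs_a, probs_b):
    """Mean squared difference between two branches' probabilities for one class.

    probs_a and probs_b are flat sequences of per-pixel probabilities that the
    two branches assign to the same class. No labels are used.
    """
    if len(probs_a) != len(probs_b) or not probs_a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - q) ** 2 for p, q in zip(probs_a, probs_b)) / len(probs_a)


if __name__ == "__main__":
    # Two branches that disagree by 0.2 on every pixel.
    print(branch_consistency_loss([0.2, 0.8], [0.4, 0.6]))
```

The loss is zero only when both branches agree on every pixel, so minimising it pushes the branches toward consistent predictions without requiring annotated data.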
A Systematic Comparison of Music Similarity Adaptation Approaches
In order to support individual user perspectives and different retrieval tasks, music similarity can no longer be considered as a static element of Music Information Retrieval (MIR) systems. Various approaches have been proposed recently that allow dynamic adaptation of music similarity measures. This paper provides a systematic comparison of algorithms for metric learning and higher-level facet distance weighting on the MagnaTagATune dataset. A cross-validation variant taking into account clip availability is presented. Applied to user-generated similarity data, its effect on adaptation performance is analyzed. Special attention is paid to the amount of training data necessary for making similarity predictions on unknown data, the number of model parameters and the amount of information available about the music itself.
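Facet distance weighting, as mentioned above, can be pictured as a weighted sum of per-facet distances whose weights are adapted to a user or task. The sketch below assumes Euclidean distance within each facet and non-negative weights; the facet names, the data layout and the function name are hypothetical and not taken from the paper.

```python
import math


def facet_weighted_distance(clip_a, clip_b, weights):
    """Weighted sum of per-facet Euclidean distances between two clips.

    clip_a and clip_b map facet names to feature vectors of equal length;
    weights maps the same facet names to non-negative importance values.
    """
    total = 0.0
    for facet, weight in weights.items():
        if weight < 0:
            raise ValueError("facet weights must be non-negative")
        total += weight * math.dist(clip_a[facet], clip_b[facet])
    return total


if __name__ == "__main__":
    a = {"timbre": [0.0, 0.0], "rhythm": [1.0]}
    b = {"timbre": [3.0, 4.0], "rhythm": [1.0]}
    print(facet_weighted_distance(a, b, {"timbre": 0.5, "rhythm": 2.0}))
```

Adapting similarity then amounts to learning the entries of `weights` from similarity judgements, which keeps the number of model parameters equal to the number of facets.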
Improving Voice Conversion for Dissimilar Speakers Using Perceptual Losses
The rising trend of using voice as a means of interacting with smart devices
has sparked worries over the protection of users' privacy and data security.
These concerns have become more pressing, especially after the European Union's
adoption of the General Data Protection Regulation (GDPR). The information
contained in an utterance encompasses critical personal details about the
speaker, such as their age, gender, socio-cultural origins and more. If there
is a security breach and the data is compromised, attackers may utilise the
speech data to circumvent the speaker verification systems or imitate
authorised users. Therefore, it is pertinent to anonymise the speech data
before it is shared across devices, such that the source speaker of the
utterance cannot be traced. Voice conversion (VC) can be used to achieve speech
anonymisation, which involves altering the speaker's characteristics while
preserving the linguistic content.
Comment: Accepted in The German Annual Conference on Acoustics 2023 (DAGA