Predictive Coding for Dynamic Visual Processing: Development of Functional Hierarchy in a Multiple Spatio-Temporal Scales RNN Model
The current paper proposes a novel predictive coding type neural network
model, the predictive multiple spatio-temporal scales recurrent neural network
(P-MSTRNN). The P-MSTRNN learns to predict visually perceived human whole-body
cyclic movement patterns by exploiting multiscale spatio-temporal constraints
imposed on network dynamics by using differently sized receptive fields as well
as different time constant values for each layer. After learning, the network
becomes able to proactively imitate target movement patterns by inferring or
recognizing corresponding intentions by means of the regression of prediction
error. Results show that the network can develop a functional hierarchy by
developing a different type of dynamic structure at each layer. The paper
examines how model performance during pattern generation as well as predictive
imitation varies depending on the stage of learning. The number of limit cycle
attractors corresponding to target movement patterns increases as learning
proceeds. Moreover, transient dynamics that develop early in the learning process
successfully perform pattern generation and predictive imitation tasks. The
paper concludes that exploitation of transient dynamics facilitates successful
task performance during early learning periods.
Comment: Accepted in Neural Computation (MIT Press).
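The multiple-timescales mechanism the abstract refers to can be illustrated, in a hedged way, with the standard continuous-time (MTRNN-style) leaky-integrator update, in which each layer's internal state is mixed with its new drive according to a layer-specific time constant tau: larger tau means slower dynamics. The layer sizes, weights, and tau values below are illustrative assumptions, not the P-MSTRNN's actual architecture or parameters.

```python
import numpy as np

def leaky_integrator_step(u_prev, x_in, W_in, W_rec, tau):
    """One MTRNN-style leaky-integrator update for a single layer."""
    h_prev = np.tanh(u_prev)                          # previous activation of this layer
    drive = W_in @ x_in + W_rec @ h_prev              # bottom-up plus recurrent drive
    return (1.0 - 1.0 / tau) * u_prev + (1.0 / tau) * drive

rng = np.random.default_rng(0)
u_fast, u_slow = np.zeros(32), np.zeros(16)
W_in_fast, W_rec_fast = 0.1 * rng.normal(size=(32, 64)), 0.1 * rng.normal(size=(32, 32))
W_in_slow, W_rec_slow = 0.1 * rng.normal(size=(16, 32)), 0.1 * rng.normal(size=(16, 16))

frame_features = rng.normal(size=64)                  # stand-in for one visual frame's features
u_fast = leaky_integrator_step(u_fast, frame_features, W_in_fast, W_rec_fast, tau=2.0)    # fast layer
u_slow = leaky_integrator_step(u_slow, np.tanh(u_fast), W_in_slow, W_rec_slow, tau=16.0)  # slow layer
```

Stacking such layers with increasingly large tau is what lets the higher levels abstract over longer stretches of the movement pattern.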
Higher-order Neural Additive Models: An Interpretable Machine Learning Model with Feature Interactions
Black-box models, such as deep neural networks, exhibit superior predictive
performances, but understanding their behavior is notoriously difficult. Many
explainable artificial intelligence methods have been proposed to reveal the
decision-making processes of black box models. However, their applications in
high-stakes domains remain limited. Recently proposed neural additive models
(NAM) have achieved state-of-the-art performance among interpretable machine learning models. NAM provides straightforward interpretations with only a slight performance sacrifice compared with multi-layer perceptrons. However, NAM can only model first-order feature effects and thus cannot capture relationships among input features. To overcome this problem, we propose a
novel interpretable machine learning method called higher-order neural additive
models (HONAM) and a feature interaction method for high interpretability.
HONAM can model arbitrary orders of feature interactions. Therefore, it can
provide the high predictive performance and interpretability that high-stakes
domains need. In addition, we propose a novel hidden unit to effectively learn
sharp-shape functions. We conducted experiments using various real-world
datasets to examine the effectiveness of HONAM. Furthermore, we demonstrate
that HONAM can achieve fair AI with a slight performance sacrifice. The source
code for HONAM is publicly available.
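HONAM's exact formulation is given in the paper and its released code; as a hedged sketch of the general idea only, a NAM sums independent per-feature subnetworks, and higher-order interactions can be exposed by additionally combining the per-feature representations, here via elementwise products over feature pairs. Class names, dimensions, and the pairwise-product choice are illustrative assumptions, not HONAM's actual design.

```python
import torch
import torch.nn as nn

class FeatureNet(nn.Module):
    """Small per-feature subnetwork mapping a scalar feature to a d-dim representation."""
    def __init__(self, hidden=32, dim=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, x):                     # x: (batch, 1)
        return self.net(x)

class ToyHigherOrderAdditiveModel(nn.Module):
    """Illustrative additive model with explicit first- and second-order terms."""
    def __init__(self, n_features, dim=8):
        super().__init__()
        self.feature_nets = nn.ModuleList([FeatureNet(dim=dim) for _ in range(n_features)])
        self.first_order = nn.Linear(dim * n_features, 1)
        n_pairs = n_features * (n_features - 1) // 2
        self.second_order = nn.Linear(dim * n_pairs, 1)

    def forward(self, x):                     # x: (batch, n_features)
        reps = [net(x[:, i:i + 1]) for i, net in enumerate(self.feature_nets)]
        pairwise = [reps[i] * reps[j]         # elementwise product models a pairwise interaction
                    for i in range(len(reps)) for j in range(i + 1, len(reps))]
        return self.first_order(torch.cat(reps, dim=1)) + self.second_order(torch.cat(pairwise, dim=1))

model = ToyHigherOrderAdditiveModel(n_features=4)
y_hat = model(torch.randn(16, 4))             # (16, 1) predictions
```

Because each term depends on a small, named subset of features, the learned shape functions and interaction terms can be plotted and inspected directly, which is the interpretability argument the abstract makes.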
Generating Goal-directed Visuomotor Plans with Supervised Learning using a Predictive Coding Deep Visuomotor Recurrent Neural Network
The ability to plan and visualize object manipulation in advance is vital for both humans and robots to smoothly reach a desired goal state. In this work, we demonstrate how our predictive-coding-based deep visuomotor recurrent neural network (PDVMRNN) can generate plans for a robot to manipulate objects based on a visual goal. A Tokyo Robotics Torobo Arm robot and a basic USB camera were used to record visuo-proprioceptive sequences of object manipulation. Although limitations in resolution resulted in lower success rates when plans were executed with the robot, our model is able to generate long predictions from novel start and goal states based on the learned patterns.
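The abstract does not detail the plan-generation procedure, so the following is only a generic, hedged sketch of predictive-coding-style planning: a latent "intention" sequence is regressed by gradient descent until a frozen forward model's predicted trajectory links the current state to the visual goal. The GRU forward model, dimensions, and loss below are illustrative assumptions, not the PDVMRNN implementation.

```python
import torch

torch.manual_seed(0)
# Hypothetical frozen forward model: latent intention sequence -> predicted state trajectory.
forward_model = torch.nn.GRU(input_size=8, hidden_size=16, batch_first=True)
readout = torch.nn.Linear(16, 4)                        # predicted (visuo-proprioceptive) state
for p in list(forward_model.parameters()) + list(readout.parameters()):
    p.requires_grad_(False)                             # freeze the forward model

start_state = torch.randn(4)                            # current observed state (illustrative)
goal_state = torch.randn(4)                             # desired visual goal state (illustrative)

latent = torch.zeros(1, 10, 8, requires_grad=True)      # plan "intention" over 10 time steps
optimizer = torch.optim.Adam([latent], lr=0.05)

for _ in range(200):                                    # regress prediction error w.r.t. the latent
    optimizer.zero_grad()
    traj = readout(forward_model(latent)[0])            # (1, 10, 4) predicted trajectory
    loss = (traj[0, 0] - start_state).pow(2).sum() + (traj[0, -1] - goal_state).pow(2).sum()
    loss.backward()
    optimizer.step()

plan = readout(forward_model(latent)[0]).detach()       # trajectory linking start to goal
```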
A Dual-Stream Neural Network Explains the Functional Segregation of Dorsal and Ventral Visual Pathways in Human Brains
The human visual system uses two parallel pathways for spatial processing and
object recognition. In contrast, computer vision systems tend to use a single
feedforward pathway, rendering them less robust, adaptive, or efficient than
human vision. To bridge this gap, we developed a dual-stream vision model
inspired by the human eyes and brain. At the input level, the model samples two
complementary visual patterns to mimic how the human eyes use magnocellular and
parvocellular retinal ganglion cells to separate retinal inputs to the brain.
At the backend, the model processes the separate input patterns through two
branches of convolutional neural networks (CNN) to mimic how the human brain
uses the dorsal and ventral cortical pathways for parallel visual processing.
The first branch (WhereCNN) samples a global view to learn spatial attention
and control eye movements. The second branch (WhatCNN) samples a local view to
represent the object around the fixation. Over time, the two branches interact
recurrently to build a scene representation from moving fixations. We compared
this model with human brains processing the same movie and evaluated their
functional alignment by linear transformation. The WhereCNN and WhatCNN
branches were found to differentially match the dorsal and ventral pathways of
the visual cortex, respectively, primarily due to their different learning
objectives. These model-based results lead us to speculate that the distinct
responses and representations of the ventral and dorsal streams are more
influenced by their distinct goals in visual attention and object recognition
than by their specific bias or selectivity in retinal inputs. This dual-stream
model takes a further step in brain-inspired computer vision, enabling parallel
neural networks to actively explore and understand the visual surroundings.
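As a hedged structural sketch only (not the authors' released model), the two-branch idea can be approximated by one small CNN over a downsampled global view and another over a local crop, with their features merged by a recurrent cell that proposes the next fixation. Class names, sizes, and the fixed crop below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def small_cnn(out_dim=64):
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, out_dim),
    )

class ToyDualStream(nn.Module):
    def __init__(self, feat=64):
        super().__init__()
        self.where_cnn = small_cnn(feat)            # global low-resolution view -> spatial/attention features
        self.what_cnn = small_cnn(feat)             # local foveal crop -> object features
        self.rnn = nn.GRUCell(2 * feat, feat)       # recurrent scene representation across fixations
        self.next_fixation = nn.Linear(feat, 2)     # proposes the next fixation location (x, y)

    def forward(self, frame, hidden):
        global_view = F.interpolate(frame, size=32)          # coarse whole-frame sample
        foveal_crop = frame[..., 48:80, 48:80]               # toy fixed crop standing in for the fovea
        feats = torch.cat([self.where_cnn(global_view), self.what_cnn(foveal_crop)], dim=-1)
        hidden = self.rnn(feats, hidden)
        return torch.tanh(self.next_fixation(hidden)), hidden

model = ToyDualStream()
hidden = torch.zeros(1, 64)
fixation, hidden = model(torch.randn(1, 3, 128, 128), hidden)  # one fixation step
```

In this framing, the "where" branch is trained toward spatial/attention objectives and the "what" branch toward recognition, which is the difference in learning objectives the abstract credits for the dorsal/ventral alignment.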
Achieving Synergy in Cognitive Behavior of Humanoids via Deep Learning of Dynamic Visuo-Motor-Attentional Coordination
The current study examines how adequate coordination among different
cognitive processes including visual recognition, attention switching, action
preparation, and generation can be developed in robots via learning, by
introducing a novel model, the Visuo-Motor Deep Dynamic Neural Network (VMDNN).
The proposed model is built on the coupling of a dynamic vision network, a motor
generation network, and a higher level network allocated on top of these two.
Simulation experiments using the iCub simulator were conducted on
cognitive tasks including visual object manipulation in response to human
gestures. The results showed that synergetic coordination can be developed via
iterative learning through the whole network when a spatio-temporal hierarchy and a temporal hierarchy self-organize in the visual pathway and the motor pathway, respectively, such that the higher level can manipulate them with abstraction.
Comment: submitted to the 2015 IEEE-RAS International Conference on Humanoid Robots.
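The VMDNN's internals are not reproduced here; as a loose, hedged sketch of the coupling the abstract describes, the snippet below wires a vision recurrent branch, a motor recurrent branch, and a higher-level recurrent unit that reads from both and feeds context back into each. Dimensions, cell types, and the 7-joint output are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ToyVMDNN(nn.Module):
    """Illustrative coupling of a vision branch, a motor branch, and a higher level."""
    def __init__(self, vis_dim=64, mot_dim=32, top_dim=32):
        super().__init__()
        self.vision = nn.GRUCell(vis_dim + top_dim, vis_dim)   # dynamic vision pathway
        self.motor = nn.GRUCell(mot_dim + top_dim, mot_dim)    # motor generation pathway
        self.top = nn.GRUCell(vis_dim + mot_dim, top_dim)      # higher level coupling both pathways
        self.joint_out = nn.Linear(mot_dim, 7)                 # e.g. 7 arm joint targets (illustrative)

    def step(self, vis_feat, proprio, h_vis, h_mot, h_top):
        h_vis = self.vision(torch.cat([vis_feat, h_top], -1), h_vis)  # top-down context into vision
        h_mot = self.motor(torch.cat([proprio, h_top], -1), h_mot)    # top-down context into motor
        h_top = self.top(torch.cat([h_vis, h_mot], -1), h_top)        # higher level reads both pathways
        return self.joint_out(h_mot), h_vis, h_mot, h_top

model = ToyVMDNN()
h_vis, h_mot, h_top = torch.zeros(1, 64), torch.zeros(1, 32), torch.zeros(1, 32)
joints, h_vis, h_mot, h_top = model.step(torch.randn(1, 64), torch.randn(1, 32), h_vis, h_mot, h_top)
```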
A novel approach for holographic 3D content generation without depth map
To observe holographic 3D content, a set of RGB color and depth map images per scene is needed to generate computer-generated holograms (CGHs) with the fast Fourier transform (FFT) algorithm. However, in real-world situations, these paired RGB color
and depth map images are not always fully available. We propose a deep
learning-based method to synthesize the volumetric digital holograms using only
the given RGB image, so that we can overcome environments where RGB color and
depth map images are partially provided. The proposed method uses only the
input RGB image to estimate its depth map and then generates its CGH sequentially. Through experiments, we demonstrate that the volumetric hologram generated by our proposed model is more accurate than those of competing models when only RGB color data is available.
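The paper's network details are not given here; the sketch below only illustrates the generic pipeline the abstract describes, with a placeholder depth estimator followed by a simple layer-based angular-spectrum CGH step. The `estimate_depth` stub, wavelength, pixel pitch, and layer count are assumptions, not the proposed model.

```python
import numpy as np

def angular_spectrum_propagate(field, z, wavelength=532e-9, pitch=8e-6):
    """Propagate a complex field by distance z using the angular spectrum method."""
    fx = np.fft.fftfreq(field.shape[0], d=pitch)
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    H = np.where(arg > 0, np.exp(2j * np.pi * z * np.sqrt(np.maximum(arg, 0.0))), 0)  # drop evanescent waves
    return np.fft.ifft2(np.fft.fft2(field) * H)

def estimate_depth(rgb):
    """Placeholder for the learned RGB -> depth network (assumption, not the paper's model)."""
    return rgb.mean(axis=-1)                                   # toy proxy depth in [0, 1]

def rgb_to_cgh(rgb, n_layers=8, max_depth=5e-3):
    """Layer-based CGH: slice the scene by estimated depth, propagate each slice, sum at the hologram plane."""
    amplitude = rgb.mean(axis=-1)
    depth = estimate_depth(rgb)
    hologram = np.zeros(amplitude.shape, dtype=complex)
    edges = np.linspace(0.0, 1.0, n_layers + 1)
    for k in range(n_layers):
        mask = (depth >= edges[k]) & (depth < edges[k + 1])
        z = max_depth * (k + 0.5) / n_layers                   # distance of this depth layer
        hologram += angular_spectrum_propagate(amplitude * mask, z)
    return np.angle(hologram)                                  # phase-only CGH (illustrative)

cgh = rgb_to_cgh(np.random.rand(256, 256, 3))
```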
The cross crypto scheme cipher integration for securing SCADA component communication
Critical infrastructures have become more vulnerable to attacks from adversaries as SCADA systems are connected to the Internet. Open standards for SCADA communications make it easy for attackers to gain in-depth knowledge about the workings and operations of SCADA networks. A number of Internet SCADA security issues have been raised that compromise the authenticity, confidentiality, integrity, and non-repudiation of information transferred between SCADA components. This paper presents an integration of the Cross Crypto Scheme Cipher to secure communications for SCADA components. The proposed scheme integrates the best features of both symmetric and asymmetric encryption techniques. It also utilizes the MD5 hashing algorithm to ensure the integrity of the information being transmitted.
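The paper's exact cipher construction is not reproduced here; the snippet below is only a generic hybrid-encryption sketch in the same spirit, assuming Python's cryptography and hashlib libraries: a symmetric cipher protects the payload, an asymmetric cipher wraps the session key, and an MD5 digest provides the integrity check (MD5 is used here only because the paper specifies it). Key sizes and the sample message are illustrative.

```python
import hashlib
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# Receiver's asymmetric key pair (key size is an illustrative assumption).
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# Sender: symmetric cipher for the SCADA payload, MD5 digest for integrity,
# asymmetric cipher to wrap the symmetric session key.
payload = b"RTU-12 pressure=4.2bar valve=OPEN"          # illustrative SCADA message
session_key = Fernet.generate_key()
ciphertext = Fernet(session_key).encrypt(payload)
digest = hashlib.md5(payload).hexdigest()
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
wrapped_key = public_key.encrypt(session_key, oaep)

# Receiver: unwrap the session key, decrypt, and verify the digest.
recovered_key = private_key.decrypt(wrapped_key, oaep)
recovered = Fernet(recovered_key).decrypt(ciphertext)
assert hashlib.md5(recovered).hexdigest() == digest
```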
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
We introduce QUICK, a group of novel optimized CUDA kernels for the efficient
inference of quantized Large Language Models (LLMs). QUICK addresses the shared
memory bank-conflict problem of state-of-the-art mixed precision matrix
multiplication kernels. Our method interleaves the quantized weight matrices of
LLMs offline to skip the shared memory write-back after the dequantization. We
demonstrate up to 1.91x speedup over existing kernels of AutoAWQ on larger
batches and up to 1.94x throughput gain on representative LLM models on various
NVIDIA GPU devices.
Comment: 9 pages, 8 figures.
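QUICK's actual interleaving is tied to the tensor-core fragment layout of specific mixed-precision kernels and is not reproduced here; the numpy toy below only illustrates the general idea of permuting quantized weight columns once, offline, so that dequantized values already come out in the order the compute consumes them, removing the shared-memory write-back/reshuffle step. The group size and permutation pattern are illustrative assumptions.

```python
import numpy as np

def make_interleave_permutation(n_cols, group=8):
    """Illustrative column permutation: within each group, even-indexed columns
    come first, standing in for a fragment-layout-matching reorder."""
    perm = []
    for start in range(0, n_cols, group):
        cols = list(range(start, start + group))
        perm.extend(cols[0::2] + cols[1::2])           # e.g. 0,2,4,6,1,3,5,7 inside the group
    return np.array(perm)

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
q_weight = rng.integers(0, 16, size=(4, 16), dtype=np.uint8)  # toy 4-bit weights stored one per byte
scale, zero_point = 0.05, 8.0

perm = make_interleave_permutation(q_weight.shape[1])
q_interleaved = q_weight[:, perm]                      # done once, offline

# At runtime the dequantized columns already appear in the order the compute
# consumes them; the values are unchanged, only their layout differs.
assert np.array_equal(q_interleaved[:, np.argsort(perm)], q_weight)
```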