1,020 research outputs found

    Complex Neural Networks for Audio

    Audio is represented in two mathematically equivalent ways: the real-valued time domain (i.e., waveform) and the complex-valued frequency domain (i.e., spectrum). There are advantages to the frequency-domain representation, e.g., the human auditory system is known to process sound in the frequency domain. Furthermore, linear time-invariant systems are convolved with sources in the time domain, whereas they may be factorized in the frequency domain. Neural networks have become rather useful when applied to audio tasks such as machine listening and audio synthesis, which are related by their dependencies on high-quality acoustic models. They ideally encapsulate fine-scale temporal structure, such as that encoded in the phase of frequency-domain audio, yet there are no authoritative deep learning methods for complex audio. This manuscript is dedicated to addressing the shortcoming. Chapter 2 motivates complex networks by their affinity with complex-domain audio, while Chapter 3 contributes methods for building and optimizing complex networks. We show that the naive implementation of Adam optimization is incorrect for complex random variables and show that selection of input and output representation has a significant impact on the performance of a complex network. Experimental results with novel complex neural architectures are provided in the second half of this manuscript. Chapter 4 introduces a complex model for binaural audio source localization. We show that, like humans, the complex model can generalize to different anatomical filters, which is important in the context of machine listening. The complex model's performance is better than that of the real-valued models, as well as real- and complex-valued baselines. Chapter 5 proposes a two-stage method for speech enhancement. In the first stage, a complex-valued stochastic autoencoder projects complex vectors to a discrete space. In the second stage, long-term temporal dependencies are modeled in the discrete space. The autoencoder raises the performance ceiling for state-of-the-art speech enhancement, but the dynamic enhancement model does not outperform other baselines. We discuss areas for improvement and note that the complex Adam optimizer improves training convergence over the naive implementation.
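The abstract above does not spell out how its complex Adam optimizer differs from the naive one, so the following is only a minimal sketch under a common assumption: that the second-moment estimate should use the squared magnitude `g * conj(g)` rather than the complex square `g * g`. The function names, the toy quadratic loss, and the hyperparameters are illustrative and are not taken from the manuscript.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999,
              eps=1e-8, complex_aware=True):
    """One Adam update for a (possibly complex) parameter array."""
    m = beta1 * m + (1 - beta1) * grad
    if complex_aware:
        # Assumed fix: |g|^2 = g * conj(g) is real and non-negative.
        sq = (grad * np.conj(grad)).real
    else:
        # Naive form: g * g is itself complex when g is complex,
        # so v stops being a variance-like quantity.
        sq = grad * grad
    v = beta2 * v + (1 - beta2) * sq
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Hypothetical toy problem: minimise sum |z - TARGET|^2 over complex z.
TARGET = np.array([1.0 + 2.0j, -0.5 + 0.3j])

def run(complex_aware, steps=3000):
    z = np.zeros_like(TARGET)          # complex parameters
    m = np.zeros_like(TARGET)          # complex first moment
    v = np.zeros(TARGET.shape)         # second moment starts real
    for t in range(1, steps + 1):
        grad = z - TARGET              # gradient w.r.t. conj(z), up to a constant
        z, m, v = adam_step(z, grad, m, v, t, complex_aware=complex_aware)
    return z, v

z_final, v_final = run(complex_aware=True)
print("distance to target:", np.max(np.abs(z_final - TARGET)))
print("second moment is real:", not np.iscomplexobj(v_final))
```

Switching `complex_aware` to `False` makes the second-moment buffer complex-valued after a single step, which is one concrete way a naive implementation can go wrong.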

    Applying the Free-Energy Principle to Complex Adaptive Systems

    The free energy principle is a mathematical theory of the behaviour of self-organising systems that originally gained prominence as a unified model of the brain. Since then, the theory has been applied to a plethora of biological phenomena, extending from single-celled and multicellular organisms through to niche construction and human culture, and even the emergence of life itself. The free energy principle tells us that perception and action operate synergistically to minimize an organism’s exposure to surprising biological states, which are more likely to lead to decay. A key corollary of this hypothesis is active inference—the idea that all behavior involves the selective sampling of sensory data so that we experience what we expect to (in order to avoid surprises). Simply put, we act upon the world to fulfill our expectations. It is now widely recognized that the implications of the free energy principle for our understanding of the human mind and behavior are far-reaching and profound. To date, however, its capacity to extend beyond our brain—to more generally explain living and other complex adaptive systems—has only just begun to be explored. The aim of this collection is to showcase the breadth of the free energy principle as a unified theory of complex adaptive systems—conscious, social, living, or not

    Spatio-temporally efficient coding: A computational principle of biological neural networks

    Department of Biomedical Engineering (Human Factors Engineering). One of the major goals of neuroscience is to understand how the external world is represented in the brain. This is a neural coding problem: the coding from the external world to its neural representations. There are two different kinds of problems in neural coding. One is to study the types of neuronal activity that represent the external world; representative examples are rate coding and temporal coding. In this study, we present the spike distance method, which reads temporal-coding-related information from neural data. The other is to study what principles make such neural representations possible. This is the computational-principle approach and the main topic of the present study. The brain's sensory system has hierarchical structures, and it is important to find the principles that assign functions to them. These hierarchical structures contain both bottom-up and top-down pathways. In such a bidirectional hierarchical structure, two types of neuronal noise are generated. The first is noise generated as neural information fluctuates across the hierarchy according to the initial condition of the neural response, even if the external sensory input is static. The second is noise, or more precisely error, caused by coding different information at each level of the hierarchy because of the transmission delay of information when the external sensory input is dynamic. Despite these noise problems, sensory information processing in the real brain appears to proceed without major difficulty. Therefore, a neural coding principle that can overcome these noise problems is needed. How can the brain overcome these noise problems? Efficient coding is one of the representative neural coding principles; however, existing efficient coding does not take these noise problems into account. To treat them, we devised spatio-temporal efficient coding as an efficient coding principle. It was inspired by the efficient use of given space and time resources and optimizes bidirectional information transmission on the hierarchical structures. This optimization amounts to learning neural responses that are smooth in the time domain. In simulations, we showed that spatio-temporal efficient coding was able to solve the two noise problems above. We expect that spatio-temporal efficient coding will help us understand how the brain computes.

    Analysis and Control of Mobile Robots in Various Environmental Conditions

    New inventions appear every day to make human life easier and more comfortable. Among them, robots have proved to be of great importance and are now used in almost every field of human activity. They are studied continuously to make them simpler and easier to work with, and to let them operate in human environments without human intervention. We focus on the navigation of mobile robots. The aim of this thesis is to find the controller that produces the optimal path for the robot to reach its destination without colliding with, or damaging, itself or the environment. Fuzzy logic, type-2 fuzzy logic, neural networks, and artificial bee colony techniques are discussed and tested to find the controller that yields the best path to the goal position. Both simulations and experiments were carried out to determine the optimal path for the robot.

    ForeNet: fourier recurrent neural networks for time series prediction.

    Ying-Qian Zhang. Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. Includes bibliographical references (leaves 115-124). Abstracts in English and Chinese.

    Contents:
    Abstract
    Acknowledgement
    Chapter 1: Introduction
        1.1 Background
        1.2 Objective
        1.3 Contributions
        1.4 Thesis Overview
    Chapter 2: Literature Review
        2.1 Takens' Theorem
        2.2 Linear Models for Prediction
            2.2.1 Autoregressive Model
            2.2.2 Moving Average Model
            2.2.3 Autoregressive-moving Average Model
            2.2.4 Fitting a Linear Model to a Given Time Series
            2.2.5 State-space Reconstruction
        2.3 Neural Network Models for Time Series Processing
            2.3.1 Feed-forward Neural Networks
            2.3.2 Recurrent Neural Networks
            2.3.3 Training Algorithms for Recurrent Networks
        2.4 Combining Neural Networks and other approximation techniques
    Chapter 3: ForeNet: Model and Representation
        3.1 Fourier Recursive Prediction Equation
            3.1.1 Fourier Analysis of Time Series
            3.1.2 Recursive Form
        3.2 Fourier Recurrent Neural Network Model (ForeNet)
            3.2.1 Neural Networks Representation
            3.2.2 Architecture of ForeNet
    Chapter 4: ForeNet: Implementation
        4.1 Improvement on ForeNet
            4.1.1 Number of Hidden Neurons
            4.1.2 Real-valued Outputs
        4.2 Parameters Initialization
        4.3 Application of ForeNet: the Process of Time Series Prediction
        4.4 Some Implications
    Chapter 5: ForeNet: Initialization
        5.1 Unfolded Form of ForeNet
        5.2 Coefficients Analysis
            5.2.1 Analysis of the Coefficients Set, vn
            5.2.2 Analysis of the Coefficients Set, μn(d)
        5.3 Experiments of ForeNet Initialization
            5.3.1 Objective and Experiment Setting
            5.3.2 Prediction of Sunspot Series
            5.3.3 Prediction of Mackey-Glass Series
            5.3.4 Prediction of Laser Data
            5.3.5 Three More Series
        5.4 Some Implications on the Proposed Initialization Method
    Chapter 6: ForeNet: Learning Algorithms
        6.1 Complex Real Time Recurrent Learning (CRTRL)
        6.2 Batch-mode Learning
        6.3 Time Complexity
        6.4 Property Analysis and Experimental Results
            6.4.1 Efficient initialization: compared with random initialization
            6.4.2 Complex-valued network: compared with real-valued network
            6.4.3 Simple architecture: compared with ring-structure RNN
            6.4.4 Linear model: compared with nonlinear ForeNet
            6.4.5 Small number of hidden units
        6.5 Comparison with Some Other Models
            6.5.1 Comparison with AR model
            6.5.2 Comparison with TDNN Networks and FIR Networks
            6.5.3 Comparison to a few more results
        6.6 Summarization
    Chapter 7: Learning and Prediction: On-Line Training
        7.1 On-Line Learning Algorithm
            7.1.1 Advantages and Disadvantages
            7.1.2 Training Process
        7.2 Experiments
        7.3 Predicting Stock Time Series
    Chapter 8: Discussions and Conclusions
        8.1 Limitations of ForeNet
        8.2 Advantages of ForeNet
        8.3 Future Works
    Bibliography

    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Deployed image classification pipelines typically depend on images captured in real-world environments. This means that images might be affected by different sources of perturbation (e.g. sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification tasks. This challenge has therefore attracted wide interest within the computer vision community. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. This operation transforms an input image into a space that is more robust to noise before it is processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. The proposed CORF-augmented pipeline achieved results on noise-free images comparable to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.
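The evaluation described above perturbs test images with Gaussian and uniform noise at several levels. The abstract gives neither the noise levels nor the exact noise model, so the sketch below is only one plausible way to set this up: images are assumed to be scaled to [0, 1], `level` is interpreted as the standard deviation for Gaussian noise and as the half-width for uniform noise, and `perturb` is a hypothetical helper name. The CORF operator itself is not reproduced here.

```python
import numpy as np

def perturb(images, kind, level, rng):
    """Add zero-mean noise to images in [0, 1] and clip back into range."""
    if kind == "gaussian":
        # Assumption: level is the standard deviation of the noise.
        noise = rng.normal(0.0, level, size=images.shape)
    elif kind == "uniform":
        # Assumption: level is the half-width of a symmetric interval.
        noise = rng.uniform(-level, level, size=images.shape)
    else:
        raise ValueError(f"unknown noise kind: {kind!r}")
    return np.clip(images + noise, 0.0, 1.0)

# Stand-in data with Fashion MNIST's 28x28 shape; real images would be loaded instead.
rng = np.random.default_rng(0)
test_images = rng.random((8, 28, 28))

# Illustrative noise levels only; the paper's levels are not given in the abstract.
for kind in ("gaussian", "uniform"):
    for level in (0.1, 0.2, 0.4):
        noisy = perturb(test_images, kind, level, rng)
        mean_shift = np.mean(np.abs(noisy - test_images))
        print(f"{kind:8s} level={level:.1f} mean abs change={mean_shift:.3f}")
```

In a full comparison, each noisy batch would be passed both to the plain classifier and to the classifier that first computes delineation maps, so that accuracy can be compared at each noise level.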

    Development of New Models for Vision-Based Human Activity Recognition

    Action recognition methods enable intelligent systems to recognize human actions in videos of daily life. However, many action recognition methods give noticeable misclassification rates due to the large variations within videos of the same class and to changes in viewpoint, scale, and background. To reduce the misclassification rate, we propose a new video representation method that captures the temporal evolution of the action happening in the whole video, a new method for human hand segmentation, and a new method for human activity recognition in still images.