Learning audio and image representations with bio-inspired trainable feature extractors
Recent advancements in pattern recognition and signal processing concern the
automatic learning of data representations from labeled training samples.
Typical approaches are based on deep learning and convolutional neural
networks, which require large amounts of labeled training samples. In this work,
we propose novel feature extractors that can be used to learn the
representation of single prototype samples in an automatic configuration
process. We employ the proposed feature extractors in applications of audio and
image processing, and show their effectiveness on benchmark data sets. Comment: Accepted for publication in the journal "Electronic Letters on Computer Vision and Image Understanding".
Learning sound representations using trainable COPE feature extractors
Sound analysis research has mainly been focused on speech and music
processing. The deployed methodologies are not suitable for analysis of sounds
with varying background noise, in many cases with very low signal-to-noise
ratio (SNR). In this paper, we present a method for the detection of patterns
of interest in audio signals. We propose novel trainable feature extractors,
which we call COPE (Combination of Peaks of Energy). The structure of a COPE
feature extractor is determined using a single prototype sound pattern in an
automatic configuration process, which is a type of representation learning. We
construct a set of COPE feature extractors, configured on a number of training
patterns. Then we take their responses to build feature vectors that we use in
combination with a classifier to detect and classify patterns of interest in
audio signals. We carried out experiments on four public data sets: MIVIA audio
events, MIVIA road events, ESC-10 and TU Dortmund data sets. The results that
we achieved (recognition rate equal to 91.71% on the MIVIA audio events, 94% on
the MIVIA road events, 81.25% on the ESC-10 and 94.27% on the TU Dortmund)
demonstrate the effectiveness of the proposed method and are higher than the
ones obtained by other existing approaches. The COPE feature extractors have
high robustness to variations of SNR. Real-time performance is achieved even when a large number of feature values is computed. Comment: Accepted for publication in Pattern Recognition.
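The core idea of a COPE feature extractor — learning a set of energy-peak positions from a single prototype and scoring new signals by how well their energy matches those positions — can be illustrated with a deliberately simplified sketch. The real method operates on Gammatone-filtered energy maps and includes tolerance to peak displacement, which this toy version omits; all names and thresholds here are illustrative, not the paper's implementation:

```python
import numpy as np

def configure_cope(energy, threshold=0.5):
    """Configure a simplified COPE-like extractor from one prototype
    energy map (channels x time): keep the positions of strong energy
    peaks as offsets relative to the strongest peak."""
    peaks = np.argwhere(energy > threshold * energy.max())
    ref = peaks[np.argmax(energy[tuple(peaks.T)])]
    return peaks - ref  # offsets (d_channel, d_time) from reference peak

def cope_response(energy, offsets, ref):
    """Response at a candidate reference point: geometric mean of the
    energies found at the configured offsets (no displacement tolerance)."""
    vals = []
    for dc, dt in offsets:
        c, t = ref[0] + dc, ref[1] + dt
        if 0 <= c < energy.shape[0] and 0 <= t < energy.shape[1]:
            vals.append(energy[c, t])
        else:
            vals.append(0.0)
    vals = np.asarray(vals)
    return float(np.prod(vals) ** (1.0 / len(vals)))
```

Configured on a prototype, the extractor responds strongly when the same peak constellation reappears and weakly elsewhere, which is what makes the responses usable as feature-vector entries for a downstream classifier.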
Learning skeleton representations for human action recognition
Automatic interpretation of human actions has gained strong interest among researchers in pattern recognition and computer vision because of its wide range of applications, such as social and home robotics, health care for the elderly, and surveillance. In this paper, we propose a method for the recognition of human actions by analysis of skeleton poses. The method that we propose is based on novel trainable feature extractors, which can learn the representation of prototype skeleton examples and can be employed to recognize skeleton poses of interest. We combine the proposed feature extractors with an approach for the classification of pose sequences based on string kernels. We carried out experiments on three benchmark data sets (MIVIA-S, MSRSDA and MHAD), and the results that we achieved are comparable to or higher than those obtained by other existing methods. A further important contribution of this work is the MIVIA-S dataset, which we collected and made publicly available.
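The string-kernel step can be illustrated with a standard p-spectrum kernel, assuming each skeleton pose in a sequence has already been quantized to a discrete symbol, so that an action becomes a string of pose labels (a simplification of the paper's pipeline; the function name and parameters are illustrative):

```python
from collections import Counter

def spectrum_kernel(s, t, p=2):
    """p-spectrum string kernel: inner product of p-gram count vectors.
    Here s and t are pose-label strings; two action sequences are similar
    when they share many length-p subsequences of poses."""
    cs = Counter(s[i:i + p] for i in range(len(s) - p + 1))
    ct = Counter(t[i:i + p] for i in range(len(t) - p + 1))
    return sum(cs[g] * ct[g] for g in cs)
```

Such a kernel can be plugged directly into a kernel classifier (e.g. an SVM) over pose sequences of different lengths, since it compares shared subsequences rather than aligned frames.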
Machine vision based system for flower counting in strawberry plants
Background: For strawberry production, accurate yield prediction is very important to help growers increase their profit by efficiently managing their harvesting operations and setting their contracts with buyers. Strawberry plants produce flowers and fruits simultaneously throughout the season. Strawberry flowers are white with yellow pollen at the center, which later becomes a fruit. Strawberry yield can be estimated by counting the number of flowers in a field in advance of harvesting. The objective of this project is to count the number of flowers using image processing techniques, create a map of flower counts using geo-tagging, and provide farmers with an estimate of the yield in a given area.
Methods: Strawberry flowers can be at different stages of maturation during imaging. We pre-process images using an edge-preserving smoothing filter to remove noise without removing fine features. The next stage involves segmentation of the flowers from the background. Since flowers are brighter than most other components of the plants, a simple thresholding-based segmentation algorithm will produce candidate pixels. Flower detection will then be conducted using traditional feature engineering combined with a classifier, using features such as Histogram of Oriented Gradients, Wavelet Transform and Local Binary Patterns, as well as deep learning based techniques.
Results: Once flowers are detected, the number of flowers is counted to provide farmers with an estimate of yield and variability at different locations in the field.
Discussions: One of the biggest challenges with outdoor imaging is the variable lighting conditions. We propose a camera mounted autonomous system to go over rows of strawberry plants to capture images with geo-tags. Cameras are positioned to capture images from different angles to capture occluded flowers.
Conclusion: A novel image processing method for accurate strawberry yield prediction is proposed, which counts the number of flowers in images for efficient crop management.
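The segmentation-and-counting stage described in the Methods section can be sketched as brightness thresholding followed by connected-component counting. This toy version uses plain NumPy on a grayscale image and stands in for the full pipeline (no edge-preserving smoothing, engineered features, or learned detector); all names and thresholds are illustrative:

```python
import numpy as np
from collections import deque

def count_bright_blobs(gray, thresh=0.8, min_area=3):
    """Count bright connected regions in a grayscale image in [0, 1].
    A stand-in for the flower-counting stage: candidate flowers are the
    brightest blobs, and tiny blobs are discarded as noise."""
    mask = gray > thresh
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y, x] and not seen[y, x]:
                # flood-fill one 4-connected component and measure its area
                area, q = 0, deque([(y, x)])
                seen[y, x] = True
                while q:
                    cy, cx = q.popleft()
                    area += 1
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny, nx] and not seen[ny, nx]):
                            seen[ny, nx] = True
                            q.append((ny, nx))
                if area >= min_area:
                    count += 1
    return count
```

In the full system each counted blob would additionally be verified by the feature-based or deep-learning classifier before contributing to the geo-tagged yield map.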
Feature learning for information-extreme classifier
A feature learning algorithm for an information-extreme classifier is considered, based on clustering of Fast Retina Keypoint (FREAK) binary descriptors computed for local features, and on the use of a spatial pyramid kernel to increase the noise immunity and informativeness of the feature representation. A method is proposed for optimizing the parameters of the feature extractor and the decision rules, based on multi-level coarse feature coding using an information criterion and a population-based search algorithm.
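The spatial-pyramid step can be sketched as follows, assuming the FREAK descriptors have already been clustered and each local feature replaced by its codebook index (the function and its parameters are illustrative, not the authors' implementation):

```python
import numpy as np

def spatial_pyramid_histogram(words, coords, n_words, levels=2):
    """Build a spatial-pyramid representation from quantized local
    descriptors: words[i] is the codebook index of the descriptor at
    normalized image coordinates coords[i] in [0, 1)^2. At level l the
    image is split into 2^l x 2^l cells, a word histogram is taken per
    cell, and all histograms are concatenated."""
    feats = []
    for level in range(levels):
        g = 2 ** level
        cells = np.zeros((g, g, n_words))
        for word, (x, y) in zip(words, coords):
            cells[min(int(y * g), g - 1), min(int(x * g), g - 1), word] += 1
        feats.append(cells.ravel())
    return np.concatenate(feats)
```

The concatenated histogram preserves coarse spatial layout on top of the bag-of-words counts, which is what gives the representation its extra informativeness over a single global histogram.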
A data-reuse aware accelerator for large-scale convolutional networks
This paper presents a clustered SIMD accelerator template for Convolutional Networks. These networks significantly outperform other methods in detection and classification tasks in the vision domain. Due to their excessive compute and data transfer requirements, these applications benefit greatly from a dedicated accelerator. The proposed accelerator reduces memory traffic by loop transformations such as tiling and fusion to merge successive layers. Although fusion can introduce redundant computations, it often reduces data transfer and can therefore remove performance bottlenecks. The SIMD cluster is mapped to a Xilinx Zynq FPGA, where it achieves 6.4 Gops of performance with a small amount of resources. The performance can be scaled by using multiple clusters.
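The trade-off behind layer fusion can be made concrete with a small back-of-the-envelope helper: when successive valid convolutions are fused, each output tile must be computed from a larger input tile (a halo), and the overlap between neighboring tiles is the source of the redundant computation mentioned above. A sketch with illustrative names, not taken from the paper:

```python
def fused_tile_input(tile, kernels):
    """For a chain of 'valid' convolutions with square kernels, return
    the input tile edge needed to produce a tile x tile output tile when
    all layers are fused, working back-to-front: each valid k x k
    convolution shrinks the edge by k - 1."""
    edge = tile
    for k in reversed(kernels):
        edge += k - 1
    return edge
```

For example, fusing two 3x3 layers means an 8x8 output tile needs a 12x12 input tile; adjacent tiles then recompute their 4-pixel-wide halos, which is the redundant work traded for fewer off-chip transfers.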
Scaling Deep Learning on GPU and Knights Landing clusters
The speed of deep neural network training has become a big bottleneck of
deep learning research and development. For example, training GoogLeNet on the
ImageNet dataset with one Nvidia K20 GPU takes 21 days. To speed up the training
process, the current deep learning systems heavily rely on the hardware
accelerators. However, these accelerators have limited on-chip memory compared
with CPUs. To handle large datasets, they need to fetch data from either CPU
memory or remote processors. We use both self-hosted Intel Knights Landing
(KNL) clusters and multi-GPU clusters as our target platforms. From an
algorithm aspect, current distributed machine learning systems are mainly
designed for cloud systems. These methods are asynchronous because of the slow
network and high fault-tolerance requirement on cloud systems. We focus on
Elastic Averaging SGD (EASGD) to design algorithms for HPC clusters. Original
EASGD used round-robin method for communication and updating. The communication
is ordered by the machine rank ID, which is inefficient on HPC clusters.
First, we redesign four efficient algorithms for HPC systems to improve
EASGD's poor scaling on clusters. Async EASGD, Async MEASGD, and Hogwild EASGD
are faster than their existing counterparts (Async SGD,
Async MSGD, and Hogwild SGD, resp.) in all the comparisons. Finally, we design
Sync EASGD, which ties for the best performance among all the methods while
being deterministic. In addition to the algorithmic improvements, we use some
system-algorithm codesign techniques to scale up the algorithms. By reducing
the percentage of communication from 87% to 14%, our Sync EASGD achieves 5.3x
speedup over original EASGD on the same platform. We get 91.5% weak scaling
efficiency on 4253 KNL cores, which is higher than the state-of-the-art
implementation.
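The elastic averaging update that these algorithms build on can be sketched for the synchronous case: each worker takes a gradient step that is pulled elastically toward a center variable, while the center drifts toward the workers. A minimal single-process NumPy simulation of this dynamic (hypothetical names; the real systems distribute the workers across GPU or KNL nodes):

```python
import numpy as np

def sync_easgd(grad, x0, workers=4, rho=0.1, lr=0.05, steps=200):
    """Synchronous elastic averaging SGD on a shared objective.
    grad(x) returns the gradient at x. Each worker update is
    x_i <- x_i - lr * (grad(x_i) + rho * (x_i - center)),
    and the center moves as center <- center + lr * rho * sum_i (x_i - center)."""
    xs = [np.array(x0, dtype=float) for _ in range(workers)]
    center = np.array(x0, dtype=float)
    for _ in range(steps):
        for i in range(workers):
            xs[i] -= lr * (grad(xs[i]) + rho * (xs[i] - center))
        center += lr * rho * sum(x - center for x in xs)
    return center
```

The elastic term is what lets workers explore locally while still agreeing on a consensus solution; the synchronous variant shown here corresponds to the deterministic Sync EASGD design in the abstract.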
New Transfer Learning Approach Based on a CNN for Fault Diagnosis
Induction motors operate in difficult environments in industry. Monitoring the performance of motors in such circumstances is important, as it helps provide a reliable operating system. This paper develops a new model for fault diagnosis based on transfer learning using the ImageNet dataset. The development of this framework provides a novel technique for the diagnosis of single and multiple induction motor faults. A transfer learning model based on a VGG-19 convolutional neural network (CNN) was implemented, which provided a quick training process with higher accuracy. Thermal images of different induction motor conditions were captured with an FLIR camera and applied as inputs to investigate the proposed model. The implementation of this task involved the use of a VGG-19 CNN-based pre-trained network, which provides autonomous feature learning with minimal human intervention. Next, a densely connected classifier was applied to predict the true class. The experimental results confirmed the robustness and reliability of the developed technique, which successfully classified the induction motor faults, achieving a classification accuracy of 99.8%. The use of a VGG-19 network allowed the attributes to be automatically extracted and associated with the decision-making part. Furthermore, this model was compared with other applications on related topics and successfully proved its superiority and robustness.
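The frozen-backbone-plus-dense-head pattern described here can be illustrated without a deep learning framework by standing in a fixed random projection for the pretrained VGG-19 features and training only a logistic-regression head. Everything below is a conceptual NumPy sketch under that substitution, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_features(x, W):
    """Stand-in for a frozen pretrained backbone: a fixed projection W
    with ReLU. In the paper, this role is played by VGG-19 conv features
    whose weights are not updated during fine-tuning."""
    return np.maximum(x @ W, 0.0)

def train_head(feats, labels, lr=0.5, steps=300):
    """Train only the dense classification head (logistic regression)
    on top of the frozen features, mirroring transfer learning with a
    frozen base and a trainable densely connected classifier."""
    w = np.zeros(feats.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # sigmoid outputs
        g = p - labels                               # logistic-loss gradient
        w -= lr * feats.T @ g / len(labels)
        b -= lr * g.mean()
    return w, b
```

Because only the head's parameters are updated, training is fast and needs relatively few labeled samples, which is the practical appeal of the transfer learning setup the abstract describes.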