Chipmunk: A Systolically Scalable 0.9 mm², 3.08 Gop/s/mW @ 1.2 mW Accelerator for Near-Sensor Recurrent Neural Network Inference
Recurrent neural networks (RNNs) are state-of-the-art in voice
awareness/understanding and speech recognition. On-device computation of RNNs
on low-power mobile and wearable devices would be key to applications such as
zero-latency voice-based human-machine interfaces. Here we present Chipmunk, a
small (<1 mm²) hardware accelerator for Long Short-Term Memory (LSTM) RNNs in UMC
65 nm technology, capable of operating at a measured peak efficiency of up to 3.08
Gop/s/mW at 1.24 mW peak power. To implement large RNN models without incurring
huge memory transfer overhead, multiple Chipmunk engines can cooperate to
form a single systolic array. In this way, a 75-tile Chipmunk configuration
can achieve real-time phoneme extraction on a demanding RNN
topology proposed by Graves et al., consuming less than 13 mW of average power.
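The workload such an accelerator parallelizes is the per-step gate arithmetic of an LSTM cell. As a minimal sketch (random placeholder weights, not Chipmunk's architecture or data layout), the matrix-vector products below are the operations the systolic tiles would carry out:

```python
import numpy as np

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """One LSTM inference step; the two matrix-vector products
    dominate the compute and are what a tiled accelerator maps out."""
    n = h_prev.shape[0]
    # Fused pre-activations for the four gates: input, forget, cell, output.
    z = W @ x + U @ h_prev + b                  # shape (4n,)
    i = 1.0 / (1.0 + np.exp(-z[0:n]))           # input gate
    f = 1.0 / (1.0 + np.exp(-z[n:2 * n]))       # forget gate
    g = np.tanh(z[2 * n:3 * n])                 # candidate cell state
    o = 1.0 / (1.0 + np.exp(-z[3 * n:4 * n]))   # output gate
    c = f * c_prev + i * g                      # new cell state
    h = o * np.tanh(c)                          # new hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 8, 4
W = rng.standard_normal((4 * n_hid, n_in)) * 0.1
U = rng.standard_normal((4 * n_hid, n_hid)) * 0.1
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(5):  # run a short random input sequence
    h, c = lstm_cell(rng.standard_normal(n_in), h, c, W, U, b)
```

Splitting a larger model across cooperating engines, as the abstract describes, amounts to partitioning these matrix-vector products across tiles so weights stay resident and only activations travel.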
Energy-Efficient Inference Accelerator for Memory-Augmented Neural Networks on an FPGA
Memory-augmented neural networks (MANNs) are designed for question-answering
tasks. It is difficult to run a MANN effectively on accelerators designed for
other neural networks (NNs), in particular on mobile devices, because MANNs
require recurrent data paths and various types of operations related to
external memory access. We implement an accelerator for MANNs on a
field-programmable gate array (FPGA) based on a data flow architecture.
Inference times are also reduced by inference thresholding, which is a
data-based maximum inner-product search specialized for natural language tasks.
Measurements on the bAbI data show that the energy efficiency of the
accelerator (FLOPS/kJ) was higher than that of an NVIDIA TITAN V GPU by a
factor of about 125, increasing to 140 with inference thresholding.
Comment: Accepted to DATE 201
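The abstract describes inference thresholding as a data-based maximum inner-product search. A simplified sketch of the idea (the early-exit threshold here is an assumed constant, not the paper's data-derived statistic):

```python
import numpy as np

def thresholded_mips(query, keys, threshold):
    """Maximum inner-product search with early exit: scan candidates
    and stop as soon as a score clears the confidence threshold,
    trading an exact argmax for fewer dot products."""
    best_idx, best_score = -1, -np.inf
    for i, k in enumerate(keys):
        score = float(query @ k)
        if score > best_score:
            best_idx, best_score = i, score
            if best_score >= threshold:  # confident enough: skip the rest
                break
    return best_idx, best_score

# Toy candidate memory: index 2 is the clear winner.
keys = np.array([[1.0, 0.0], [0.0, 1.0], [3.0, 4.0], [2.0, 2.0]])
query = np.array([3.0, 4.0])
idx, score = thresholded_mips(query, keys, threshold=20.0)
# Scores seen: 3.0, 4.0, then 25.0 >= 20.0 -> exit before the last key.
```

On hardware, skipping the remaining candidates translates directly into fewer external-memory reads, which is where the reported efficiency gain comes from.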
Artificial Intelligence in Invoice Recognition: a Systematic Literature Review
In the era marked by a flourishing economy and rapid advancements in information
technology, the proliferation of invoice data has accentuated the urgent need for
automated invoice recognition. Traditional manual methods, long relied upon for this
task, have proven to be inefficient, error-prone, and incapable of coping with the rising
volume of invoices. This research endeavours to address the imperative of automating
invoice recognition by exploring, assessing, and advancing cutting-edge algorithms,
techniques, and methods within the domain of Artificial Intelligence (AI).
This research conducts a comprehensive Systematic Literature Review (SLR) to
investigate Computer Vision (CV) approaches, encompassing image preprocessing,
Layout Analysis (LA), Optical Character Recognition (OCR), and Information Extraction
(IE). The objective is to provide valuable insights into these fundamental components of
invoice recognition, emphasizing their significance in achieving accuracy and efficiency.
This exploration aims to contribute to the development of more effective automated
systems for extracting information from invoices, addressing the challenges posed by
diverse formats and content.
The results indicate that in LA, the combination of Mask Region-based Convolutional
Neural Networks (M-RCNN) and Feature Pyramid Network (FPN) achieves good
results. In OCR, algorithms like Convolutional Recurrent Neural Network (CRNN), You
Only Look Once version 4 (YOLOv4) and models inspired by M-RCNN and Faster
Region-based Convolutional Neural Network (F-RCNN) with ResNetXt-101 as the
backbone demonstrate remarkable performance. When it comes to IE, algorithms inspired
by F-RCNN and Region Proposal Network (RPN), Grid Convolutional Neural Network
(G-CNN) and Layer Graph Convolutional Networks (LGCN), and Gated Graph
Convolutional Network (GatedGCN) consistently deliver the best results.
Sequence Classification Restricted Boltzmann Machines With Gated Units
For the classification of sequential data, dynamic Bayesian networks and recurrent neural networks (RNNs) are the preferred models: the former can explicitly model the temporal dependencies between the variables, while the latter have the capability of learning representations. The recurrent temporal restricted Boltzmann machine (RTRBM) is a model that combines these two features. However, learning and inference in RTRBMs can be difficult because of the exponential nature of its gradient computations when maximizing log-likelihoods. In this article, first, we address this intractability by optimizing a conditional rather than a joint probability distribution when performing sequence classification. This results in the ``sequence classification restricted Boltzmann machine'' (SCRBM). Second, we introduce gated SCRBMs (gSCRBMs), which use an information-processing gate, as an integration of SCRBMs with long short-term memory (LSTM) models. In the experiments reported in this article, we evaluate the proposed models on optical character recognition, chunking, and multi-resident activity recognition in smart homes. The experimental results show that gSCRBMs achieve performance comparable to that of the state of the art in all three tasks. gSCRBMs require far fewer parameters than other recurrent networks with memory gates, in particular LSTMs and gated recurrent units (GRUs).
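The key move in the abstract is replacing an intractable joint distribution with a tractable conditional one. In a standard classification RBM (a related but simpler model than the SCRBM, shown here only to illustrate why the conditional is tractable), the class posterior p(y|x) can be computed in closed form from the free energies, with random placeholder weights:

```python
import numpy as np

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)

def class_probs(x, W, U, c, d):
    """Tractable class posterior of a classification RBM:
    log p(y|x) = d[y] + sum_j softplus(c[j] + U[j, y] + (W @ x)[j]) + const.
    No partition function over x is needed, unlike the joint model."""
    pre = c[:, None] + U + (W @ x)[:, None]      # (hidden, classes)
    log_unnorm = d + softplus(pre).sum(axis=0)   # (classes,)
    log_unnorm -= log_unnorm.max()               # stabilize before exp
    p = np.exp(log_unnorm)
    return p / p.sum()

rng = np.random.default_rng(2)
n_vis, n_hid, n_cls = 10, 6, 3
W = rng.standard_normal((n_hid, n_vis)) * 0.1   # visible-to-hidden weights
U = rng.standard_normal((n_hid, n_cls)) * 0.1   # class-to-hidden weights
c, d = np.zeros(n_hid), np.zeros(n_cls)         # hidden and class biases
p = class_probs(rng.standard_normal(n_vis), W, U, c, d)
```

Because the sum over hidden units factorizes, the gradient of this conditional objective is exact, which is the same tractability argument the SCRBM exploits for sequences.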
Enhancing Automation with Label Defect Detection and Content Parsing Algorithms
The stable operation of power transmission and distribution is closely related to the overall performance and construction quality of circuit breakers. Focusing on circuit breakers as the research subject, we propose a machine vision method for automated defect detection, which can be applied in intelligent robots to improve detection efficiency, reduce costs, and address the issues related to performance and assembly quality. Based on the LeNet-5 convolutional neural network, a method for the detection of character defects on labels is proposed. This method is then combined with squeeze-and-excitation networks to achieve more precise classification with a feature-graph mechanism. The experimental results show the accuracy of the LeNet-CB model can reach up to 99.75%, while the average time for single-character detection is 17.9 milliseconds. Although the LeNet-SE model demonstrates certain limitations in handling some easily confused characters, it maintains an average accuracy of 98.95%. Through further optimization, a label content detection method based on the LSTM framework is constructed, with an average accuracy of 99.57% and an average detection time of 84 milliseconds. Overall, the system meets the detection accuracy requirements and delivers a rapid response, making the results of this research a meaningful contribution to the practical foundation for ongoing improvements in robot intelligence and machine vision.
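The squeeze-and-excitation mechanism the abstract combines with LeNet-5 recalibrates channels of a feature map: global-average "squeeze", a small two-layer gate ("excitation"), then channel-wise rescaling. A minimal sketch with random placeholder weights (not the trained LeNet-SE model):

```python
import numpy as np

def se_block(fmap, W1, W2):
    """Squeeze-and-excitation on a feature map of shape (C, H, W):
    pool each channel to a scalar, pass through a small bottleneck
    gate, and rescale each channel by its gate value in (0, 1)."""
    s = fmap.mean(axis=(1, 2))            # squeeze: per-channel average, (C,)
    e = np.maximum(W1 @ s, 0.0)           # excitation bottleneck, ReLU
    g = 1.0 / (1.0 + np.exp(-(W2 @ e)))   # per-channel gates in (0, 1)
    return fmap * g[:, None, None]        # rescale each channel

rng = np.random.default_rng(3)
C, H, Wd, r = 8, 5, 5, 2                  # r is the bottleneck reduction ratio
W1 = rng.standard_normal((C // r, C)) * 0.1
W2 = rng.standard_normal((C, C // r)) * 0.1
fmap = rng.standard_normal((C, H, Wd))
out = se_block(fmap, W1, W2)
```

Because the gates are sigmoids, each channel is only attenuated, never amplified, which is how the block lets the classifier emphasize discriminative channels, e.g. for easily confused characters.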