15 research outputs found
Acylated 2-(N-arylaminomethylene)benzo[b]thiophene-3(2H)-Ones: Molecular Switches with Varying Migrants and Substituents
Synthesis and properties of photochromic acylated 2-(N-arylaminomethylene)benzo[b]thiophene-3(2H)-ones are described. Their structure largely depends on the nature of acyl migrant and in a less degree on N-aryl substituent
Efficient Hardware Architectures for 1D- and MD-LSTM Networks
Recurrent Neural Networks, in particular One-dimensional and Multidimensional Long Short-Term Memory (1D-LSTM and MD-LSTM) have achieved state-of-the-art classification accuracy in many applications such as machine translation, image caption generation, handwritten text recognition, medical imaging and many more. However, high classification accuracy comes at high compute, storage, and memory bandwidth requirements, which make their deployment challenging, especially for energy-constrained platforms such as portable devices. In comparison to CNNs, not so many investigations exist on efficient hardware implementations for 1D-LSTM especially under energy constraints, and there is no research publication on hardware architecture for MD-LSTM. In this article, we present two novel architectures for LSTM inference: a hardware architecture for MD-LSTM, and a DRAM-based Processing-in-Memory (DRAM-PIM) hardware architecture for 1D-LSTM. We present for the first time a hardware architecture for MD-LSTM, and show a trade-off analysis for accuracy and hardware cost for various precisions. We implement the new architecture as an FPGA-based accelerator that outperforms NVIDIA K80 GPU implementation in terms of runtime by up to 84× and energy efficiency by up to 1238× for a challenging dataset for historical document image binarization from DIBCO 2017 contest, and a well known MNIST dataset for handwritten digits recognition. Our accelerator demonstrates highest accuracy and comparable throughput in comparison to state-of-the-art FPGA-based implementations of multilayer perceptron for MNIST dataset. Furthermore, we present a new DRAM-PIM architecture for 1D-LSTM targeting energy efficient compute platforms such as portable devices. The DRAM-PIM architecture integrates the computation units in a close proximity to the DRAM cells in order to maximize the data parallelism and energy efficiency. The proposed DRAM-PIM design is 16.19 × more energy efficient as compared to FPGA implementation. The total chip area overhead of this design is 18 % compared to a commodity 8 Gb DRAM chip. Our experiments show that the DRAM-PIM implementation delivers a throughput of 1309.16 GOp/s for an optical character recognition application
Embedded Face Recognition for Personalized Services in the Assistive Robotics
Recently, the field of assistive robotics has drawn much attention in the health care sector. In combination with modern machine learning-supported person recognition systems, they can deliver highly personalized services. However, common algorithms for person recognition such as convolutional neural networks (CNNs) consume high amounts of power and show low energy efficiency when executed on general-purpose computing platforms.
In this paper, we present our hardware architecture and field programmable gate array (FPGA) accelerator to enable on-device person recognition in the context of assistive robotics. Therefore, we optimize a neural network based on the SqueezeNet topology and implement it on an FPGA for a high degree of flexibility and reconfigurability. By pruning redundant filters and quantization of weights and activations, we are able to find a well-fitting neural network that achieves a high identification accuracy of 84%. On a Xilinx Zynq Ultra96v2, we achieve a power consumption of 4.8 W, a latency of 31 ms and an efficiency of 6.738 FPS/W. Our results outperform the latency by 1.6x compared to recent person recognition systems in assistive robots and energy efficiency by 1.7x for embedded face recognition, respectively
iDocChip: A Configurable Hardware Architecture for Historical Document Image Processing
In recent years, ◂...▸optical character recognition (OCR) systems have been used to digitally preserve historical archives. To transcribe historical archives into a machine-readable form, first, the documents are scanned, then an OCR is applied. In order to digitize documents without the need to remove them from where they are archived, it is valuable to have a portable device that combines scanning and OCR capabilities. Nowadays, there exist many commercial and open-source document digitization techniques, which are optimized for contemporary documents. However, they fail to give sufficient text recognition accuracy for transcribing historical documents due to the severe quality degradation of such documents. On the contrary, the anyOCR system, which is designed to mainly digitize historical documents, provides high accuracy. However, this comes at a cost of high computational complexity resulting in long runtime and high power consumption. To tackle these challenges, we propose a low power energy-efficient accelerator with real-time capabilities called iDocChip, which is a configurable hybrid hardware-software programmable ◂...▸System-on-Chip (SoC) based on anyOCR for digitizing historical documents. In this paper, we focus on one of the most crucial processing steps in the anyOCR system: Text and Image Segmentation, which makes use of a multi-resolution morphology-based algorithm. Moreover, an optimized FPGA-based hybrid architecture of this anyOCR step along with its optimized software implementations are presented. We demonstrate our results on multiple embedded and general-purpose platforms with respect to runtime and power consumption. The resulting hardware accelerator outperforms the existing anyOCR by 6.2×, while achieving 207× higher energy-efficiency and maintaining its high accuracy
iDocChip: A Configurable Hardware Accelerator for an End-to-End Historical Document Image Processing
In recent years, there has been an increasing demand to digitize and electronically access historical records. Optical character recognition (OCR) is typically applied to scanned historical archives to transcribe them from document images into machine-readable texts. Many libraries offer special stationary equipment for scanning historical documents. However, to digitize these records without removing them from where they are archived, portable devices that combine scanning and OCR capabilities are required. An existing end-to-end OCR software called anyOCR achieves high recognition accuracy for historical documents. However, it is unsuitable for portable devices, as it exhibits high computational complexity resulting in long runtime and high power consumption. Therefore, we have designed and implemented a configurable hardware-software programmable SoC called iDocChip that makes use of anyOCR techniques to achieve high accuracy. As a low-power and energy-efficient system with real-time capabilities, the iDocChip delivers the required portability. In this paper, we present the hybrid CPU-FPGA architecture of iDocChip along with the optimized software implementations of the anyOCR. We demonstrate our results on multiple platforms with respect to runtime and power consumption. The iDocChip system outperforms the existing anyOCR by 44× while achieving 2201× higher energy efficiency and a 3.8% increase in recognition accuracy
An In-DRAM Neural Network Processing Engine
Many advanced neural network inference engines are bounded by the available memory bandwidth. The conventional approach to address this issue is to employ high bandwidth memory devices or to adapt data compression techniques (reduced precision, sparse weight matrices). Alternatively, an emerging approach to bridge the memory-computation gap and to exploit extreme data parallelism is Processing in Memory (PIM). The close proximity of the computation units to the memory cells reduces the amount of external data transactions and it increases the overall energy efficiency of the memory system. In this work, we present a novel PIM based Binary Weighted Network (BWN) inference accelerator design that is inline with the commodity Dynamic Random Access Memory (DRAM) design and process. In order to exploit data parallelism and minimize energy, the proposed architecture integrates the basic BWN computation units at the output of the Primary Sense Amplifiers (PSAs) and the rest of the substantial logic near the Secondary Sense Amplifiers (SSAs). The power and area values are obtained at sub-array (SA) level using exhaustive circuit level simulations and full-custom layout. The proposed architecture results in an area overhead of 25 % compared to a commodity 8 Gb DRAM and delivers a throughput of 63.59 FPS (Frames per Second) for AlexNet. We also demonstrate that our architecture is extremely energy efficient, 7.25× higher FPS/W, as compared to previous works
Synthesis of photo- and ionochromic N-acylated 2-(aminomethylene)benzo[b]thiophene-3(2Н)-ones with a terminal phenanthroline group
A series of novel photo- and ionochromic N-acylated 2-(aminomethylene)benzo[b]thiophene-3(2Н)-ones with a terminal phenanthroline receptor substituent was synthesized. Upon irradiation in acetonitrile or DMSO with light of 436 nm, they underwent Z–E isomerization of the C=C bond, followed by very fast N→O migration of the acyl group and the formation of nonemissive O-acylated isomers. These isomers were isolated preparatively and fully characterized by IR, 1H, and 13C NMR spectroscopy as well as HRMS and XRD methods. The reverse thermal reaction was catalyzed by protonic acids. N-Acylated compounds exclusively with Fe2+ formed nonfluorescent complexes with a contrast naked-eye effect: a color change of the solutions from yellow to dark orange. Subsequent selective interaction with AcO− led to the restoration of the initial absorption and emission properties. Thus, the obtained compounds represent dual-mode “on–off–on” switches of optical and fluorescent properties under sequential exposure to light and H+ or sequential addition of Fe2+ and AcO− ions