
    Dense semantic labeling of sub-decimeter resolution images with convolutional neural networks

    Semantic labeling (or pixel-level land-cover classification) in ultra-high-resolution imagery (< 10 cm) requires statistical models able to learn high-level concepts from spatial data with large appearance variations. Convolutional Neural Networks (CNNs) achieve this goal by discriminatively learning a hierarchy of representations of increasing abstraction. In this paper we present a CNN-based system relying on a downsample-then-upsample architecture: it first learns a rough spatial map of high-level representations by means of convolutions and then learns to upsample this map back to the original resolution by deconvolutions. By doing so, the CNN learns to densely label every pixel at the original resolution of the image. This yields several advantages, including (i) state-of-the-art numerical accuracy, (ii) improved geometric accuracy of the predictions and (iii) high efficiency at inference time. We test the proposed system on the Vaihingen and Potsdam sub-decimeter resolution datasets, involving semantic labeling of aerial images at 9 cm and 5 cm resolution, respectively. These datasets are composed of many large, fully annotated tiles, allowing an unbiased evaluation of models that make use of spatial information. We do so by comparing two standard CNN architectures to the proposed one: standard patch classification and prediction of local label patches using only convolutions, versus full patch labeling using deconvolutions (the proposed approach). All systems compare favorably with or outperform a state-of-the-art baseline relying on superpixels and powerful appearance descriptors. The proposed full patch labeling CNN outperforms these models by a large margin while also offering very appealing inference times. Comment: Accepted in IEEE Transactions on Geoscience and Remote Sensing, 201
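
    For illustration, below is a minimal sketch of a generic downsample-then-upsample (encoder-decoder) labeling network in PyTorch. The layer widths, depth, class count and names are assumptions chosen for brevity, not the paper's exact architecture.

```python
# Minimal encoder-decoder sketch for dense semantic labeling (illustrative only;
# layer widths, depth and class count are assumptions, not the paper's architecture).
import torch
import torch.nn as nn

class DownUpNet(nn.Module):
    def __init__(self, in_channels=3, num_classes=6):
        super().__init__()
        # Downsampling path: strided convolutions learn a rough map of high-level features.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Upsampling path: transposed convolutions ("deconvolutions") recover full resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, num_classes, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x):
        # Per-pixel class scores at the input resolution.
        return self.decoder(self.encoder(x))

if __name__ == "__main__":
    net = DownUpNet()
    scores = net(torch.randn(1, 3, 256, 256))  # e.g. a 256x256 RGB tile
    print(scores.shape)  # torch.Size([1, 6, 256, 256]) -> one score map per class
```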

    A Construction Kit for Efficient Low Power Neural Network Accelerator Designs

    Implementing embedded neural network processing at the edge requires efficient hardware acceleration that couples high computational performance with low power consumption. Driven by the rapid evolution of network architectures and their algorithmic features, accelerator designs are constantly updated and improved. To evaluate and compare hardware design choices, designers can refer to a myriad of accelerator implementations in the literature. Surveys provide an overview of these works but are often limited to system-level and benchmark-specific performance metrics, making it difficult to quantitatively compare the individual effect of each optimization technique. This complicates the evaluation of optimizations for new accelerator designs and slows down research progress. This work surveys the neural network accelerator optimization approaches used in recent works and reports their individual effects on edge processing performance. It presents the list of optimizations and their quantitative effects as a construction kit, allowing designers to assess each building block separately. Reported optimizations range from up to 10,000x memory savings to 33x energy reductions, giving chip designers an overview of the design choices available for implementing efficient low-power neural network accelerators.

    Deep Learning Based Abnormal Gait Classification System Study with Heterogeneous Sensor Network

    Gait is one of the important biological characteristics of the human body. Abnormal gait is mostly related to the lesion site and has been demonstrated to play a guiding role in clinical research such as medical diagnosis and disease prevention. To promote research on automatic gait pattern recognition, this paper reviews the state of abnormal gait recognition and systematically analyzes common gait recognition technologies. On this basis, two gait information extraction methods, sensor-based and vision-based, are studied, covering wearable system design and deep neural network-based algorithm design. In the sensor-based study, we propose a lower limb data acquisition system and design an experiment to collect acceleration and sEMG signals under normal and pathological gaits. Specifically, wearable hardware based on the MSP430 and host software based on LabVIEW are designed; the hardware system consists of an sEMG foot ring, a high-precision IMU and a pressure-sensitive smart insole, while the host software receives and unpacks the Bluetooth data and computes common gait parameters such as cadence and step length. Walking data from 15 healthy subjects and 15 hemiplegic patients were collected. Gait classification based on the sEMG signals with a CNN reaches an average accuracy of 92.8%. For the IMU signals, five kinds of abnormal gait are classified with three models: a BPNN, an LSTM and a CNN. The experimental results show that the acquisition system combined with these neural networks classifies different pathological gaits well, with an average accuracy of 93% on the six-class task. In the vision-based study, human keypoint detection is used: one or more persons are first detected in the image, each bounding box is segmented, a fully convolutional ResNet produces heatmaps and offsets for the main joints, and precise keypoint locations are obtained by fusing the heatmaps and offsets, from which the spatio-temporal information of the keypoints is extracted. However, the results show that even state-of-the-art keypoint detection is not yet accurate enough to replace IMUs for gait analysis and classification. Encouragingly, the gait rhythm can be observed within 2 m, which shows that the extracted spatio-temporal keypoint information is highly correlated with the acceleration information collected by the IMU and paves the way for a vision-based abnormal gait classification algorithm.
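
    As a rough illustration of the sensor-based branch, below is a minimal sketch of a 1D CNN that classifies fixed-length IMU windows into gait classes. The channel count, window length, layer sizes and class count are assumptions for the example, not the exact network described above.

```python
# Minimal 1D-CNN sketch for classifying fixed-length IMU windows into gait classes.
# Channel count, window length and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class GaitCNN(nn.Module):
    def __init__(self, in_channels=6, num_classes=6, window=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool1d(2),
        )
        # Two pooling stages shrink the window by a factor of 4 before the linear classifier.
        self.classifier = nn.Linear(64 * (window // 4), num_classes)

    def forward(self, x):                    # x: (batch, channels, window)
        return self.classifier(self.features(x).flatten(1))

if __name__ == "__main__":
    model = GaitCNN()
    # e.g. a batch of 8 windows of 128 samples from a 6-axis IMU (3-axis accel + 3-axis gyro)
    logits = model(torch.randn(8, 6, 128))
    print(logits.shape)                      # torch.Size([8, 6]) -> scores for six gait classes
```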

    Multimodal machine learning for intelligent mobility

    Scientific problems are solved by finding the optimal solution for a specific task. Some problems can be solved analytically, while others are solved using data-driven methods. The use of digital technologies to improve the transportation of people and goods, referred to as intelligent mobility, is one of the principal beneficiaries of data-driven solutions, and autonomous vehicles are at the heart of the developments that propel it. Due to the high dimensionality and complexity of real-world environments, data-driven solutions need to become commonplace in intelligent mobility, as it is near impossible to manually program decision-making logic for every eventuality. While recent developments in data-driven methods such as deep learning allow machines to learn effectively from large datasets, applications of these techniques within safety-critical systems such as driverless cars remain scarce.

    Autonomous vehicles need to make context-driven decisions autonomously in the different environments in which they operate. The recent literature on driverless vehicle research focuses heavily on road and highway environments and has largely discounted pedestrianized areas and indoor environments. These unstructured environments tend to be more cluttered and change rapidly over time. Therefore, for intelligent mobility to make a significant impact on human life, it is vital to extend its application beyond structured environments. To further advance intelligent mobility, researchers need to take cues from multiple sensor streams and multiple machine learning algorithms so that decisions can be robust and reliable; only then will machines be able to operate safely in unstructured and dynamic environments. Towards addressing these limitations, this thesis investigates data-driven solutions for crucial building blocks of intelligent mobility: multimodal sensor data fusion, machine learning, multimodal deep representation learning, and their application to intelligent mobility. This work demonstrates that mobile robots can use multimodal machine learning to derive driving policies and therefore make autonomous decisions.

    To facilitate the autonomous decisions necessary for safe driving algorithms, we present algorithms for free-space detection and human activity recognition. Driving these decision-making algorithms are datasets collected throughout this study: the Loughborough London Autonomous Vehicle dataset and the Loughborough London Human Activity Recognition dataset, collected using an autonomous platform designed and developed in house as part of this research activity. The proposed framework for free-space detection is based on an active learning paradigm that leverages the relative uncertainty of multimodal sensor data streams (ultrasound and camera) and uses an online learning methodology to continuously update the learnt model whenever the vehicle experiences new environments. The proposed free-space detection algorithm enables an autonomous vehicle to self-learn, evolve and adapt to environments never encountered before. The results illustrate that the online learning mechanism is superior to one-off training of deep neural networks, which requires large datasets to generalize to unfamiliar surroundings. The thesis takes the view that humans should be at the centre of any technological development related to artificial intelligence.
    It is imperative within the spectrum of intelligent mobility that an autonomous vehicle be aware of what humans are doing in its vicinity. Towards improving the robustness of human activity recognition, this thesis proposes a novel algorithm that classifies point-cloud data originating from Light Detection and Ranging (LiDAR) sensors. The proposed algorithm leverages multimodality by using the camera data to identify humans and segment the region of interest in the point cloud. The corresponding 3-dimensional data are converted to a Fisher Vector representation before being classified by a deep Convolutional Neural Network. The proposed algorithm classifies the indoor activities performed by a human subject with an average precision of 90.3% and, compared to an alternative point cloud classifier, PointNet [1], [2], outperforms it on all classes. The developed autonomous testbed for data collection and algorithm validation, together with the multimodal data-driven solutions for driverless cars, constitutes the major contribution of this thesis. It is anticipated that these results and the testbed will have significant implications for the future of intelligent mobility by amplifying the development of intelligent driverless vehicles.
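
    As a generic illustration of the Fisher Vector step, below is a minimal sketch that encodes a 3-D point cloud with a diagonal-covariance Gaussian mixture model. The GMM size, the normalization choices and the random stand-in data are assumptions, not the thesis's exact pipeline.

```python
# Minimal Fisher Vector sketch: encode a 3-D point cloud with a diagonal-covariance GMM.
# GMM size, normalization and the random data are illustrative assumptions only.
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(points, gmm):
    """Gradients of the GMM log-likelihood w.r.t. means and variances, normalized."""
    n, _ = points.shape
    gamma = gmm.predict_proba(points)                                        # (n, K) soft assignments
    mu, var, w = gmm.means_, gmm.covariances_, gmm.weights_
    diff = (points[:, None, :] - mu[None, :, :]) / np.sqrt(var)[None, :, :]  # (n, K, d)
    g_mu = (gamma[:, :, None] * diff).sum(0) / (n * np.sqrt(w)[:, None])
    g_var = (gamma[:, :, None] * (diff ** 2 - 1)).sum(0) / (n * np.sqrt(2 * w)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_var.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                                   # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)                                 # L2 normalization

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.normal(size=(2048, 3))         # stand-in for a segmented human point cloud
    gmm = GaussianMixture(n_components=16, covariance_type="diag", random_state=0).fit(cloud)
    print(fisher_vector(cloud, gmm).shape)     # (2 * 16 * 3,) = (96,) fixed-length descriptor
```

    A fixed-length descriptor of this kind can then be fed to a conventional classifier regardless of how many points the original cloud contained, which is the practical reason for the encoding step.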

    A Decade of Neural Networks: Practical Applications and Prospects

    The Jet Propulsion Laboratory Neural Network Workshop, sponsored by NASA and DOD, brings together sponsoring agencies, active researchers, and the user community to formulate a vision for the next decade of neural network research and application prospects. While the speed and computing power of microprocessors continue to grow at an ever-increasing pace, the demand to intelligently and adaptively deal with the complex, fuzzy, and often ill-defined world around us remains to a large extent unaddressed. Powerful, highly parallel computing paradigms such as neural networks promise to have a major impact in addressing these needs. Papers in the workshop proceedings highlight benefits of neural networks in real-world applications compared to conventional computing techniques. Topics include fault diagnosis, pattern recognition, and multiparameter optimization.

    Mixed-mode cellular array processor realization for analyzing brain electrical activity in epilepsy

    This thesis deals with the realization of hardware capable of computing algorithms that can be described using the theory of polynomial cellular neural/nonlinear networks (CNNs). The goal is to meet the requirements of an algorithm for predicting the onset of an epileptic seizure. The analysis associated with this application requires extensive computation on data consisting of segments of brain electrical activity. Different types of computer architectures are reviewed. Since the algorithm requires operations in which data is manipulated locally, special emphasis is put on assessing different parallel architectures; an array computer is potentially able to perform local computational tasks effectively and rapidly. Based on the requirements of the algorithm, a mixed-mode CNN is proposed. A mixed-mode CNN combines analog and digital processing: the couplings and the polynomial terms are implemented with analog blocks, whereas the integrator is digital, with A/D and D/A converters interfacing between the analog blocks and the integrator. Based on the mixed-mode CNN architecture, a cellular array processor is realized in which the processing units are coupled with programmable polynomial (linear, quadratic and cubic) first-neighborhood feedback terms. A 10 mm², 1.027-million-transistor cellular array processor with 2×72 processing units and 36 layers of memory in each unit is manufactured in a 0.25 μm digital CMOS process. The array processor can perform gray-scale Heun's integration of spatial convolutions with linear, quadratic and cubic activation functions on 72×72 data while keeping all I/O operations local during processing. One complete Heun iteration takes 166.4 μs, and the power consumption during processing is 192 mW. Experimental results on statistical variations in the multipliers and polynomial circuits are presented, and possible design improvements are described. The results of this thesis can be used to assess the suitability of the mixed-mode approach for implementing an implantable system for predicting epileptic seizures, as well as for implementing other applications.
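
    For readers unfamiliar with the numerical scheme, below is a minimal software sketch of one Heun (predictor-corrector) iteration of a polynomial CNN state equation on a 72×72 grid. The templates, step size and piecewise-linear output function are illustrative assumptions, not the chip's actual circuit parameters.

```python
# Minimal sketch of Heun's (predictor-corrector) integration of a polynomial CNN
# (cellular neural network) on a 2-D grid. Templates, step size and activation
# are illustrative assumptions, not the realized chip's design values.
import numpy as np
from scipy.signal import convolve2d

def output(x):
    # Standard CNN piecewise-linear output function, limited to [-1, 1].
    return 0.5 * (np.abs(x + 1) - np.abs(x - 1))

def state_derivative(x, u, A1, A2, A3, B, z):
    """dx/dt with linear, quadratic and cubic first-neighborhood feedback terms."""
    y = output(x)
    feedback = (convolve2d(y,      A1, mode="same", boundary="fill")
              + convolve2d(y ** 2, A2, mode="same", boundary="fill")
              + convolve2d(y ** 3, A3, mode="same", boundary="fill"))
    control = convolve2d(u, B, mode="same", boundary="fill")
    return -x + feedback + control + z

def heun_step(x, u, templates, h=0.1):
    """One Heun iteration: explicit Euler predictor followed by trapezoidal corrector."""
    k1 = state_derivative(x, u, *templates)
    k2 = state_derivative(x + h * k1, u, *templates)
    return x + 0.5 * h * (k1 + k2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u = rng.uniform(-1, 1, size=(72, 72))                          # stand-in input array
    x = np.zeros_like(u)                                           # initial state
    A1 = np.array([[0, 0.1, 0], [0.1, 2.0, 0.1], [0, 0.1, 0]])     # linear feedback template
    A2 = np.zeros((3, 3))                                          # quadratic feedback (off here)
    A3 = np.zeros((3, 3))                                          # cubic feedback (off here)
    B  = np.array([[0, 0, 0], [0, 1.0, 0], [0, 0, 0]])             # control template
    z  = -0.5                                                      # bias
    for _ in range(50):
        x = heun_step(x, u, (A1, A2, A3, B, z), h=0.1)
    print(output(x).shape)                                         # (72, 72) output array
```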