521 research outputs found

    Computer Vision Applications for Autonomous Aerial Vehicles

    Get PDF
    Undoubtedly, unmanned aerial vehicles (UAVs) have experienced a great leap forward over the last decade. It is not surprising anymore to see a UAV being used to accomplish a certain task, which was previously carried out by humans or a former technology. The proliferation of special vision sensors, such as depth cameras, lidar sensors and thermal cameras, and major breakthroughs in computer vision and machine learning fields accelerated the advance of UAV research and technology. However, due to certain unique challenges imposed by UAVs, such as limited payload capacity, unreliable communication link with the ground stations and data safety, UAVs are compelled to perform many tasks on their onboard embedded processing units, which makes it difficult to readily implement the most advanced algorithms on UAVs. This thesis focuses on computer vision and machine learning applications for UAVs equipped with onboard embedded platforms, and presents algorithms that utilize data from multiple modalities. The presented work covers a broad spectrum of algorithms and applications for UAVs, such as indoor UAV perception, 3D understanding with deep learning, UAV localization, and structural inspection with UAVs. Visual guidance and scene understanding without relying on pre-installed tags or markers is the desired approach for fully autonomous navigation of UAVs in conjunction with the global positioning systems (GPS), or especially when GPS information is either unavailable or unreliable. Thus, semantic and geometric understanding of the surroundings become vital to utilize vision as guidance in the autonomous navigation pipelines. In this context, first, robust altitude measurement, safe landing zone detection and doorway detection methods are presented for autonomous UAVs operating indoors. These approaches are implemented on Google Project Tango platform, which is an embedded platform equipped with various sensors including a depth camera. Next, a modified capsule network for 3D object classification is presented with weight optimization so that the network can be fit and run on memory-constrained platforms. Then, a semantic segmentation method for 3D point clouds is developed for a more general visual perception on a UAV equipped with a 3D vision sensor. Next, this thesis presents algorithms for structural health monitoring applications involving UAVs. First, a 3D point cloud-based, drift-free and lightweight localization method is presented for depth camera-equipped UAVs that perform bridge inspection, where GPS signal is unreliable. Next, a thermal leakage detection algorithm is presented for detecting thermal anomalies on building envelopes using aerial thermography from UAVs. Then, building on our thermal anomaly identification expertise gained on the previous task, a novel performance anomaly identification metric (AIM) is presented for more reliable performance evaluation of thermal anomaly identification methods

    An Adaptive Locally Connected Neuron Model: Focusing Neuron

    Full text link
    This paper presents a new artificial neuron model capable of learning its receptive field in the topological domain of inputs. The model provides adaptive and differentiable local connectivity (plasticity) applicable to any domain. It requires no other tool than the backpropagation algorithm to learn its parameters which control the receptive field locations and apertures. This research explores whether this ability makes the neuron focus on informative inputs and yields any advantage over fully connected neurons. The experiments include tests of focusing neuron networks of one or two hidden layers on synthetic and well-known image recognition data sets. The results demonstrated that the focusing neurons can move their receptive fields towards more informative inputs. In the simple two-hidden layer networks, the focusing layers outperformed the dense layers in the classification of the 2D spatial data sets. Moreover, the focusing networks performed better than the dense networks even when 70%\% of the weights were pruned. The tests on convolutional networks revealed that using focusing layers instead of dense layers for the classification of convolutional features may work better in some data sets.Comment: 45 pages, a national patent filed, submitted to Turkish Patent Office, No: -2017/17601, Date: 09.11.201

    Shifting capsule networks from the cloud to the deep edge

    Get PDF
    Capsule networks (CapsNets) are an emerging trend in image processing. In contrast to a convolutional neural network, CapsNets are not vulnerable to object deformation, as the relative spatial information of the objects is preserved across the network. However, their complexity is mainly related to the capsule structure and the dynamic routing mechanism, which makes it almost unreasonable to deploy a CapsNet, in its original form, in a resource-constrained device powered by a small microcontroller (MCU). In an era where intelligence is rapidly shifting from the cloud to the edge, this high complexity imposes serious challenges to the adoption of CapsNets at the very edge. To tackle this issue, we present an API for the execution of quantized CapsNets in Arm Cortex-M and RISC-V MCUs. Our software kernels extend the Arm CMSIS-NN and RISC-V PULP-NN to support capsule operations with 8-bit integers as operands. Along with it, we propose a framework to perform post-training quantization of a CapsNet. Results show a reduction in memory footprint of almost 75%, with accuracy loss ranging from 0.07% to 0.18%. In terms of throughput, our Arm Cortex-M API enables the execution of primary capsule and capsule layers with medium-sized kernels in just 119.94 and 90.60 milliseconds (ms), respectively (STM32H755ZIT6U, Cortex-M7 @ 480 MHz). For the GAP-8 SoC (RISC-V RV32IMCXpulp @ 170 MHz), the latency drops to 7.02 and 38.03 ms, respectively

    New Descriptor for Glomerulus Detection in Kidney Microscopy Image

    Get PDF
    Glomerulus detection is a key step in histopathological evaluation of microscopy images of kidneys. However, the task of automatic detection of glomeruli poses challenges due to the disparity in sizes and shapes of glomeruli in renal sections. Moreover, extensive variations of their intensities due to heterogeneity in immunohistochemistry staining are also encountered. Despite being widely recognized as a powerful descriptor for general object detection, the rectangular histogram of oriented gradients (Rectangular HOG) suffers from many false positives due to the aforementioned difficulties in the context of glomerulus detection. A new descriptor referred to as Segmental HOG is developed to perform a comprehensive detection of hundreds of glomeruli in images of whole kidney sections. The new descriptor possesses flexible blocks that can be adaptively fitted to input images to acquire robustness to deformations of glomeruli. Moreover, the novel segmentation technique employed herewith generates high quality segmentation outputs and the algorithm is assured to converge to an optimal solution. Consequently, experiments using real world image data reveal that Segmental HOG achieves significant improvements in detection performance compared to Rectangular HOG. The proposed descriptor and method for glomeruli detection present promising results and is expected to be useful in pathological evaluation

    Enhanced Capsule-based Networks and Their Applications

    Get PDF
    Current deep models have achieved human-like accuracy in many computer vision tasks, even defeating humans sometimes. However, these deep models still suffer from significant weaknesses. To name a few, it is hard to interpret how they reach decisions, and it is easy to attack them with tiny perturbations. A capsule, usually implemented as a vector, represents an object or object part. Capsule networks and GLOM consist of classic and generalized capsules respectively, where the difference is whether the capsule is limited to representing a fixed thing. Both models are designed to parse their input into a part-whole hierarchy as humans do, where each capsule corresponds to an entity of the hierarchy. That is, the first layer finds the lowest-level vision patterns, and the following layers assemble the larger patterns till the entire object, e.g., from nostril to nose, face, and person. This design enables capsule networks and GLOM the potential of solving the above problems of current deep models, by mimicking how humans overcome these problems with the part-whole hierarchy. However, their current implementations are not perfect on fulfilling their potentials and require further improvements, including intrinsic interpretability, guaranteed equivariance, robustness to adversarial attacks, a more efficient routing algorithm, compatibility with other models, etc. In this dissertation, I first briefly introduce the motivations, essential ideas, and existing implementations of capsule networks and GLOM, then focus on addressing some limitations of these implementations. The improvements are briefly summarized as follows. First, a fast non-iterative routing algorithm is proposed for capsule networks, which facilitates their applications in many tasks such as image classification and segmentation. Second, a new architecture, named Twin-Islands, is proposed based on GLOM, which achieves the many desired properties such as equivariance, model interpretability, and adversarial robustness. Lastly, the essential idea of capsule networks and GLOM is re-implemented in a small group ensemble block, which can also be used along with other types of neural networks, e.g., CNNs, on various tasks such as image classification, segmentation, and retrieval

    Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead

    Get PDF
    Currently, Machine Learning (ML) is becoming ubiquitous in everyday life. Deep Learning (DL) is already present in many applications ranging from computer vision for medicine to autonomous driving of modern cars as well as other sectors in security, healthcare, and finance. However, to achieve impressive performance, these algorithms employ very deep networks, requiring a significant computational power, both during the training and inference time. A single inference of a DL model may require billions of multiply-and-accumulated operations, making the DL extremely compute-and energy-hungry. In a scenario where several sophisticated algorithms need to be executed with limited energy and low latency, the need for cost-effective hardware platforms capable of implementing energy-efficient DL execution arises. This paper first introduces the key properties of two brain-inspired models like Deep Neural Network (DNN), and Spiking Neural Network (SNN), and then analyzes techniques to produce efficient and high-performance designs. This work summarizes and compares the works for four leading platforms for the execution of algorithms such as CPU, GPU, FPGA and ASIC describing the main solutions of the state-of-the-art, giving much prominence to the last two solutions since they offer greater design flexibility and bear the potential of high energy-efficiency, especially for the inference process. In addition to hardware solutions, this paper discusses some of the important security issues that these DNN and SNN models may have during their execution, and offers a comprehensive section on benchmarking, explaining how to assess the quality of different networks and hardware systems designed for them

    Capsule Networks for Object Detection in UAV Imagery

    Get PDF
    Recent advances in Convolutional Neural Networks (CNNs) have attracted great attention in remote sensing due to their high capability to model high-level semantic content of Remote Sensing (RS) images. However, CNNs do not explicitly retain the relative position of objects in an image and, thus, the effectiveness of the obtained features is limited in the framework of the complex object detection problems. To address this problem, in this paper we introduce Capsule Networks (CapsNets) for object detection in Unmanned Aerial Vehicle-acquired images. Unlike CNNs, CapsNets extract and exploit the information content about objects’ relative position across several layers, which enables parsing crowded scenes with overlapping objects. Experimental results obtained on two datasets for car and solar panel detection problems show that CapsNets provide similar object detection accuracies when compared to state-of-the-art deep models with significantly reduced computational time. This is due to the fact that CapsNets emphasize dynamic routine instead of the depth.EC/H2020/759764/EU/Accurate and Scalable Processing of Big Data in Earth Observation/BigEart
    corecore