
    Deep saliency detection-based pedestrian detection with multispectral multi-scale features fusion network

    In recent years, there has been increased interest in multispectral pedestrian detection using visible and infrared image pairs, because the complementary visual information provided by these modalities enhances the robustness and reliability of pedestrian detection systems. However, current research in multispectral pedestrian detection faces the challenge of effectively integrating the different modalities to reduce the system's miss rate. This article presents an improved method for multispectral pedestrian detection. The method utilises a saliency detection technique to modify the infrared image and obtain an infrared-enhanced map with clear pedestrian features. Subsequently, a multiscale image feature fusion network is designed to efficiently fuse the visible and infrared-enhanced maps. Finally, the fusion network is supervised, in conjunction with the light perception sub-network, by three loss functions covering illumination perception, light intensity, and texture information. The experimental results demonstrate that, at the “reasonable” setting, the proposed method achieves logarithmic mean miss rates of 3.12%, 3.06%, and 4.13% for the three main subgroups (all-day, day, and night), compared with 3.11%, 2.77%, and 2.56% for the traditional method, demonstrating the effectiveness of the proposed method.
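As a purely illustrative sketch of the supervision scheme described above, the three loss terms might be combined into a single training objective as below. The function name, weights, and the idea of a plain weighted sum are assumptions for illustration, not the paper's actual implementation.

```python
def total_fusion_loss(l_illum, l_intensity, l_texture,
                      w_illum=1.0, w_intensity=0.5, w_texture=0.5):
    """Hypothetical weighted sum of the three supervision terms used to
    train the fusion network: illumination perception, light intensity,
    and texture information. Weights are illustrative defaults."""
    return (w_illum * l_illum
            + w_intensity * l_intensity
            + w_texture * l_texture)
```

In practice each term would itself be a differentiable loss computed from the network outputs; the weighted-sum form simply shows how multiple supervision signals are typically merged into one scalar objective.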

    Improved YOLOv7-based steel surface defect detection algorithm

    In response to the limited small-target detection ability and low generalization ability of the YOLOv7 algorithm, this paper proposes a steel surface defect detection algorithm based on an improved YOLOv7. First, the Transformer-InceptionDWConvolution (TI) module is designed, which combines the Transformer module and InceptionDWConvolution to increase the network's ability to detect small objects. Second, the spatial pyramid pooling fast cross-stage partial channel (SPPFCSPC) structure is introduced to enhance network training performance. Third, a global attention mechanism (GAM) is designed to optimize the network structure, suppress irrelevant information in the defect image, and increase the algorithm's ability to detect small defects. Meanwhile, the Mish function is used as the activation function of the feature extraction network to improve the model's generalization and feature extraction ability. Finally, a minimum point distance intersection over union (MPDIoU) loss function is designed to improve bounding-box localization and resolve the direction mismatch between the predicted box and the ground-truth box left unaddressed by the complete intersection over union (CIoU) loss. The experimental results show that on the Northeastern University Defect Detection (NEU-DET) dataset, the improved YOLOv7 network model improves mean average precision (mAP) by 6% compared with the original algorithm, while on the VOC2012 dataset, mAP improves by 2.6%. These results indicate that the proposed algorithm can effectively improve small defect detection performance on steel surfaces.
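The MPDIoU idea can be sketched as follows: the standard IoU is penalized by the squared distances between the top-left and bottom-right corners of the two boxes, normalized by the squared image diagonal. This is a minimal sketch following the published MPDIoU formulation, not the authors' training code; the `(x1, y1, x2, y2)` box layout is an assumption.

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """MPDIoU between two axis-aligned boxes given as (x1, y1, x2, y2):
    IoU minus the squared top-left and bottom-right corner distances,
    each normalized by the squared image diagonal."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # plain IoU
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # corner-distance penalties
    d1 = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2  # top-left corners
    d2 = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2  # bottom-right corners
    diag = img_w ** 2 + img_h ** 2
    return iou - d1 / diag - d2 / diag

def mpdiou_loss(pred, target, img_w, img_h):
    """Loss form used for training: 1 - MPDIoU."""
    return 1.0 - mpdiou(pred, target, img_w, img_h)
```

Because both corner distances vanish only when the boxes coincide, the penalty supplies a useful gradient even when the predicted box has the same aspect ratio as the ground truth but the wrong size, which is where CIoU degenerates.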

    RD-VIO: Robust Visual-Inertial Odometry for Mobile Augmented Reality in Dynamic Environments

    It is typically challenging for visual or visual-inertial odometry systems to handle dynamic scenes and pure rotation. In this work, we design a novel visual-inertial odometry (VIO) system called RD-VIO to handle both of these problems. Firstly, we propose an IMU-PARSAC algorithm which can robustly detect and match keypoints in a two-stage process. In the first stage, landmarks are matched with new keypoints using visual and IMU measurements. We collect statistical information from this matching and use it to guide the intra-keypoint matching in the second stage. Secondly, to handle pure rotation, we detect the motion type and adapt the deferred-triangulation technique during the data-association process. Pure-rotational frames are turned into special subframes which, when solving the visual-inertial bundle adjustment, provide additional constraints on the pure-rotational motion. We evaluate the proposed VIO system on public datasets. Experiments show that RD-VIO has clear advantages over other methods in dynamic environments.

    Multi-scenario pear tree inflorescence detection based on improved YOLOv7 object detection algorithm

    Efficient and precise thinning during the orchard blossom period is a crucial factor in enhancing both fruit yield and quality, and accurate recognition of inflorescence is the cornerstone of intelligent blossom-thinning equipment. To advance intelligent blossom thinning, this paper addresses the suboptimal performance of current inflorescence recognition algorithms on dense inflorescence at long distances, and introduces an inflorescence recognition algorithm, YOLOv7-E, based on the YOLOv7 neural network model. YOLOv7-E incorporates an efficient multi-scale attention mechanism (EMA) to enable cross-channel feature interaction through parallel processing strategies, thereby maximizing the retention of pixel-level features and positional information in the feature maps. Additionally, the SPPCSPC module is optimized to preserve target-area features as much as possible under different receptive fields, and the Soft-NMS algorithm is employed to reduce the likelihood of missed detections in overlapping regions. The model is trained on a diverse dataset collected in real-world field settings. Upon validation, the improved YOLOv7-E object detection algorithm achieves an average precision and recall of 91.4% and 89.8%, respectively, in inflorescence detection across various time periods, distances, and weather conditions. The detection time for a single image is 80.9 ms, and the model size is 37.6 MB. Compared with the original YOLOv7 algorithm, it achieves a 4.9% increase in detection accuracy and a 5.3% improvement in recall, with a mere 1.8% increase in model parameters. The YOLOv7-E object detection algorithm presented in this study enables precise inflorescence detection and localization across an entire tree at varying distances, offering robust technical support for differentiated and precise blossom thinning by thinning machinery in the future.
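The Soft-NMS step mentioned above (in its Gaussian variant) decays the scores of overlapping boxes instead of discarding them outright, which is what reduces missed detections in dense, overlapping inflorescence clusters. A minimal sketch; `sigma` and the score threshold are illustrative defaults, not the paper's settings:

```python
import math

def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: instead of suppressing boxes that overlap the
    current top detection, decay their scores by exp(-IoU^2 / sigma)."""
    dets = sorted(zip(boxes, scores), key=lambda d: -d[1])
    kept = []
    while dets:
        box, score = dets.pop(0)          # take the current best detection
        kept.append((box, score))
        # decay remaining scores by overlap with the kept box, drop tiny ones
        dets = sorted(((b, s * math.exp(-iou(box, b) ** 2 / sigma))
                       for b, s in dets if s > score_thresh),
                      key=lambda d: -d[1])
    return kept
```

Unlike hard NMS, a heavily overlapped but genuinely distinct inflorescence survives with a reduced score rather than being deleted, so a later confidence threshold can still recover it.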

    Vision-enhanced Peg-in-Hole for automotive body parts using semantic image segmentation and object detection

    Artificial Intelligence (AI) is an enabling technology in the context of Industry 4.0, and the automotive sector is among those that can benefit most from the use of AI in conjunction with advanced vision techniques. The scope of this work is to integrate deep learning algorithms into an industrial scenario involving a robotic Peg-in-Hole task. More specifically, we focus on a scenario where a human operator manually positions a carbon fiber automotive part in the workspace of a 7 Degrees of Freedom (DOF) manipulator. To cope with the uncertainty in the relative position between the robot and the workpiece, we adopt a three-stage strategy. The first stage concerns the Three-Dimensional (3D) reconstruction of the workpiece using a registration algorithm based on the Iterative Closest Point (ICP) paradigm. This procedure is integrated with a semantic image segmentation neural network, which removes the background of the scene to improve the registration; adopting this network reduces the registration time by about 28.8%. In the second stage, the reconstructed surface is compared with a Computer Aided Design (CAD) model of the workpiece to locate the holes and their axes; here, the adoption of a Convolutional Neural Network (CNN) improves the holes' position estimation by about 57.3%. The third stage concerns the insertion of the peg, implementing a search phase to handle the remaining estimation errors; the CNN also reduces the search-phase duration by about 71.3%. Quantitative experiments, including a comparison with a previous approach lacking both the segmentation network and the CNN, have been conducted in a realistic scenario. The results show the effectiveness of the proposed approach and how the integration of AI techniques improves the success rate from 84.5% to 99.0%.
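The ICP paradigm used in the first stage alternates two steps: match each source point to its nearest destination point, then estimate the transform that best aligns the matches. The toy sketch below is deliberately restricted to pure translation (a full ICP would also estimate rotation with an SVD/Kabsch step); all names and the point-cloud data are illustrative, not the paper's pipeline.

```python
def _nearest(p, pts):
    """Brute-force nearest neighbour of point p among pts (squared distance)."""
    return min(pts, key=lambda q: sum((a - b) ** 2 for a, b in zip(p, q)))

def icp_translation(src, dst, iters=10):
    """Toy ICP restricted to translation: repeatedly match each source
    point to its nearest destination point, then shift the source cloud
    by the mean residual. Returns the accumulated 3D translation."""
    cur = [list(p) for p in src]
    t = [0.0, 0.0, 0.0]
    for _ in range(iters):
        matches = [_nearest(p, dst) for p in cur]
        # mean residual between matched pairs = translation update
        delta = [sum(m[k] - p[k] for p, m in zip(cur, matches)) / len(cur)
                 for k in range(3)]
        cur = [[p[k] + delta[k] for k in range(3)] for p in cur]
        t = [t[k] + delta[k] for k in range(3)]
    return t
```

Removing background points before this loop (as the segmentation network does in the paper) shrinks both the nearest-neighbour search and the number of bad correspondences, which is why it speeds up and stabilizes registration.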

    Face Emotion Recognition Based on Machine Learning: A Review

    Computers can now detect, understand, and evaluate emotions thanks to recent developments in machine learning and information fusion. Researchers across various sectors are increasingly intrigued by emotion identification, utilizing facial expressions, words, body language, and posture as means of discerning an individual's emotions. Nevertheless, the effectiveness of the first three methods may be limited, as individuals can consciously or unconsciously suppress their true feelings. This article explores various feature extraction techniques, encompassing the development of machine learning classifiers such as k-nearest neighbour, naive Bayes, support vector machine, and random forest, in accordance with the established standard for emotion recognition. The paper has three primary objectives: firstly, to offer a comprehensive overview of affective computing by outlining essential theoretical concepts; secondly, to describe the current state of the art in emotion recognition in detail; and thirdly, to highlight important findings and conclusions from the literature, with an emphasis on key obstacles and possible future directions, especially in the creation of state-of-the-art machine learning algorithms for emotion identification.
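To give a flavour of the classifier family surveyed, here is a minimal from-scratch k-nearest-neighbour classifier. In a real pipeline the inputs would be feature vectors extracted from face images; the toy 2D data and labels below are purely illustrative assumptions.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training samples
    under Euclidean distance."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

k-NN is the simplest of the four classifiers named above: it has no training phase at all, which makes it a common baseline against which naive Bayes, SVMs, and random forests are compared.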

    Image Memorability Prediction with Vision Transformers

    Behavioral studies have shown that the memorability of images is similar across groups of people, suggesting that memorability is a function of the intrinsic properties of images and is unrelated to people's individual experiences and traits. Deep learning networks can be trained on such properties and used to predict memorability on new data sets. Convolutional neural networks (CNNs) have pioneered image memorability prediction, but more recently developed vision transformer (ViT) models may have the potential to yield even better predictions. In this paper, we present ViTMem, a new memorability model based on ViT, and compare its memorability predictions with those of state-of-the-art CNN-derived models. Results showed that ViTMem performed equal to or better than state-of-the-art models on all data sets. Additional semantic-level analyses revealed that ViTMem is particularly sensitive to the semantic content that drives memorability in images. We conclude that ViTMem provides a new step forward, and propose that ViT-derived models can replace CNNs for computational prediction of image memorability. Researchers, educators, advertisers, visual designers, and other interested parties can leverage the model to improve the memorability of their image material.

    Dissecting structural and biochemical features of DNA methyltransferase 1

    DNA methylation is an epigenetic modification found in every branch of life. An essential enzyme for the maintenance of DNA methylation patterns in mammals is DNA methyltransferase 1 (DNMT1). Its recruitment is regulated through its large N-terminus, which contains six annotated domains. Although most of these have been assigned a function, we still lack a holistic understanding of the enzyme's spatio-temporal regulation. Interestingly, a large segment of the N-terminus is devoid of any known domain and appears to be disordered in its sequence. Over the past years, such disordered sequences have increasingly gained attention due to their role in forming biomolecular condensates through liquid-liquid phase separation (LLPS). These liquid compartments offer specific environmental conditions, distinct from the surroundings, that can enhance protein recruitment and function. In this work, we explore a potential role for the intrinsically disordered region (IDR) in the recruitment of DNMT1. Taking an evolutionary approach, we uncover that structural features of the region that are key for IDR function are highly conserved. Moreover, we find conserved biochemical signatures compatible with a role in LLPS. Using a reconstitution assay and an opto-genetic approach in cells, we show for the first time that the DNMT1 IDR is capable of undergoing LLPS in vitro and in vivo. In addition, we define a novel region of interest (ROI) of about 120 amino acids in the IDR that appears to have been inserted in the ancestor of eutherian mammals. Although the ROI has a distinct biochemical signature, we find no effect on the LLPS behavior of the IDR. Therefore, we discuss other potential roles of the ROI related to DNA methylation, for example imprinting. Finally, we lay the foundation for investigating a biological function of the IDR and establish a system for screening DNMT1 mutant phenotypes in mouse embryonic stem cells.
Swift depletion of the endogenous protein is enabled by degron-mediated degradation, while our optimized construct design and efficient derivation strategy ensure robust expression of the large transgenes. In combination with different methods for DNA methylation read-out, this system can now be used to study the role of the IDR and ROI in maintaining the steady-state level of DNA methylation against mechanisms of passive and active demethylation, and, in the future, to study phenotypes affecting the efficiency of DNMT1 recruitment.

    Reliable Sensor Intelligence in Resource Constrained and Unreliable Environment

    The objective of this research is to design sensor intelligence that is reliable in a resource-constrained, unreliable environment. Various sources of variation and uncertainty are involved in an intelligent sensor system, so it is critical to build reliable sensor intelligence. Many prior works seek to design reliable sensor intelligence by making the task itself robust and reliable. This thesis suggests that, along with improving the task itself, early warning based on task-reliability quantification can further improve sensor intelligence. A DNN-based early warning generator quantifies task reliability based on the spatiotemporal characteristics of the input, and the early warning controls sensor parameters to avoid system failure. This thesis presents an early warning generator that predicts task failure due to sensor-hardware-induced input corruption and controls the sensor operation. Moreover, a lightweight uncertainty estimator is presented to account for DNN model uncertainty in task-reliability quantification without the prohibitive computation of a stochastic DNN. Cross-layer uncertainty estimation is also discussed to consider the effect of PIM variations.
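The early-warning idea above — quantify how reliable the task output is likely to be and flag inputs where failure is probable — can be caricatured with a variance-based uncertainty proxy. This sketch assumes predictions from several stochastic forward passes of a model (e.g. Monte Carlo dropout); the threshold, names, and return format are illustrative, not the thesis design.

```python
import statistics

def early_warning(sample_preds, var_thresh=0.05):
    """Flag an input as unreliable when the variance across stochastic
    forward passes (a cheap proxy for model uncertainty) exceeds a
    threshold, so the system can adjust sensor parameters instead of
    silently failing."""
    mean = statistics.fmean(sample_preds)
    var = statistics.pvariance(sample_preds, mu=mean)
    return {"prediction": mean, "warn": var > var_thresh}
```

When the passes agree, the variance is small and the prediction is trusted; when they disagree, the warning fires and the controller can, for example, re-capture the input with different sensor settings.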