56 research outputs found

    Decoupled Attention Network for Text Recognition

    Text recognition has attracted considerable research interest because of its various applications. The cutting-edge text recognition methods are based on attention mechanisms. However, most attention methods suffer from a serious alignment problem caused by their recurrent alignment operation, in which the alignment relies on historical decoding results. To remedy this issue, we propose a decoupled attention network (DAN), which decouples the alignment operation from the use of historical decoding results. DAN is an effective, flexible and robust end-to-end text recognizer that consists of three components: 1) a feature encoder that extracts visual features from the input image; 2) a convolutional alignment module that performs the alignment operation based on visual features from the encoder; and 3) a decoupled text decoder that makes the final prediction by jointly using the feature map and attention maps. Experimental results show that DAN achieves state-of-the-art performance on multiple text recognition tasks, including offline handwritten text recognition and regular/irregular scene text recognition. Comment: 9 pages, 8 figures, 6 tables, accepted by AAAI-202
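The decoupling described in this abstract can be illustrated with a minimal sketch. It assumes that the alignment module has already produced one attention map per decoding step from visual features alone, and that each context vector passed to the decoder is the attention-weighted sum of the feature map. The function name `context_vectors`, the flattened list layout and the toy values are illustrative assumptions, not details taken from the paper.

```python
def context_vectors(feature_map, attention_maps):
    """Return one context vector per decoding step.

    feature_map:    list of spatial positions, each a list of channel values
    attention_maps: list of maps (one per step), each a list of weights,
                    one weight per spatial position
    Both layouts are assumptions made for this sketch.
    """
    channels = len(feature_map[0])
    contexts = []
    for weights in attention_maps:
        if len(weights) != len(feature_map):
            raise ValueError("attention map and feature map sizes differ")
        context = [0.0] * channels
        # Weighted sum of the feature vectors over all spatial positions.
        for weight, features in zip(weights, feature_map):
            for c in range(channels):
                context[c] += weight * features[c]
        contexts.append(context)
    return contexts


# Toy input: 3 spatial positions, 2 channels, 2 decoding steps.
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
attention = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]
contexts = context_vectors(features, attention)
```

Because no attention map in this sketch depends on an earlier prediction, all context vectors can be computed before decoding starts, which is the property the abstract contrasts with recurrent alignment.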

    Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution

    Visual information extraction (VIE) has attracted considerable attention recently owing to its various advanced applications such as document understanding, automatic marking and intelligent education. Most existing works decouple this problem into several independent sub-tasks of text spotting (text detection and recognition) and information extraction, which completely ignores the high correlation among them during optimization. In this paper, we propose a robust visual information extraction system (VIES) for real-world scenarios, a unified end-to-end trainable framework for simultaneous text detection, recognition and information extraction that takes a single document image as input and outputs the structured information. Specifically, the information extraction branch collects abundant visual and semantic representations from text spotting for multimodal feature fusion and, conversely, provides higher-level semantic clues that contribute to the optimization of text spotting. Moreover, given the shortage of public benchmarks, we construct a fully annotated dataset called EPHOIE (https://github.com/HCIILAB/EPHOIE), which is the first Chinese benchmark for both text spotting and visual information extraction. EPHOIE consists of 1,494 images of examination paper heads with complex layouts and backgrounds, including a total of 15,771 Chinese handwritten or printed text instances. Compared with the state-of-the-art methods, our VIES shows significantly superior performance on the EPHOIE dataset and achieves a 9.01% F-score gain on the widely used SROIE dataset under the end-to-end scenario. Comment: 8 pages, 5 figures, to be published in AAAI 202

    Parameter design of coal pillar in highwall mining under action of dynamic-static load

    When end-slope shearer mining technology is applied to recover large amounts of residual coal, determining a reasonable width for the supporting coal pillar is a key factor in whether the technology can be popularized and applied safely and efficiently, especially given the influence of blasting vibration on the stability of the supporting coal pillar. Taking the southern end slope of the Pingshuo open-pit coal mine as the study site, field vibration tests, theoretical analysis and numerical calculation were used to study web pillar stability in open-pit highwall mining and its parameter design under the action of triangular load and blasting vibration on the side slope. Based on limit equilibrium theory and catastrophe theory, the stress distribution in the coal pillar was analyzed in combination with the Mohr-Coulomb failure criterion. In addition, an expression for the ultimate strength of the coal pillar was established as a function of mining height, mining width, load stress of the overlying strata, and the cohesion and internal friction angle of the coal pillar. Formulas for the maximum allowable plastic zone width and the rational web pillar width under different safety reserve factors were also established. A three-dimensional simple harmonic vibration response model of the supporting coal pillar was established, and the effects of blasting parameters such as the amount of a single shot, the elevation difference and the horizontal distance of the blast center on the maximum instantaneous dynamic stress of the coal pillar were studied. This revealed the mechanism by which the blasting dynamic load affects the plastic zone width and stability of the supporting coal pillar, and a design method for the pillar parameters under blasting dynamic load was proposed.
The results show that blasting vibration has a great influence on the stability of the coal pillar. The instantaneous maximum dynamic stress response of the coal pillar under blasting dynamic load is positively correlated with the amount of a single shot and negatively correlated with the elevation difference and horizontal distance. As the maximum instantaneous dynamic stress response of the coal pillar increases, the width of its plastic zone increases proportionally and its safety factor decays in an approximately linear pattern. The width of the coal pillar under dynamic-static load is determined to be 5 m, and its reasonableness is verified by engineering practice.

    A Lightweight High-Resolution RS Image Road Extraction Method Combining Multi-Scale and Attention Mechanism

    Road information plays an indispensable role in the development of human society. However, owing to the diversity and complexity of roads, it is difficult to obtain satisfactory road-extraction results. Typical problems such as discontinuity, loss of edge details, and long processing time hinder the extraction of accurate road information. These problems are particularly prominent when roads are extracted from high-resolution remote-sensing images. To obtain accurate road information, a novel lightweight deep learning neural network was proposed in this study by integrating a multiscale module and attention mechanisms. As an excellent multiscale segmentation module, atrous spatial pyramid pooling was selected to enhance road extraction from remote-sensing images. In addition, an attention mechanism was employed to address discontinuity and loss of edge details in road extraction, and MobileNet V2 was selected as the backbone of DeepLab V3+ because its lightweight structure helps reduce excessive training time. Experimental verification was carried out on the Ottawa road dataset and the Massachusetts road dataset. The results show that, compared with the U-Net, SegNet and MDeeplab v3+ networks, the proposed algorithm achieves the best IoU, Recall, OA and Kappa. On the Ottawa road dataset, the OA and Kappa of the proposed algorithm are 98.92% and 95.02%, respectively. On the Massachusetts road dataset, the OA and Kappa are 98.29% and 89.87%, respectively. In addition, the training time was significantly shorter than that of the other deep learning networks. The proposed method exhibited good performance in road extraction.
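The four metrics named in this abstract (IoU, Recall, OA and Kappa) can be computed from a pixel-level confusion matrix. The sketch below assumes the standard binary definitions with road as the positive class; the abstract does not state the exact formulas used, and the function name `segmentation_metrics` and the toy counts are illustrative only.

```python
def segmentation_metrics(tp, fp, fn, tn):
    """Compute IoU, Recall, overall accuracy (OA) and Cohen's kappa.

    tp, fp, fn, tn are pixel counts for the road (positive) class.
    Degenerate inputs (for example no road pixels at all) are not
    handled in this sketch and would raise ZeroDivisionError.
    """
    total = tp + fp + fn + tn
    iou = tp / (tp + fp + fn)
    recall = tp / (tp + fn)
    overall_accuracy = (tp + tn) / total
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (total * total)
    kappa = (overall_accuracy - expected) / (1.0 - expected)
    return {"IoU": iou, "Recall": recall, "OA": overall_accuracy, "Kappa": kappa}


# Toy confusion matrix, not data from the paper.
metrics = segmentation_metrics(tp=40, fp=10, fn=10, tn=40)
```

Kappa is typically lower than OA on road datasets because background pixels dominate, so a large share of the raw agreement is expected by chance.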

    Remote Sensing Image Road Segmentation Method Integrating CNN-Transformer and UNet

    Real-time and accurate road information is crucial for updating electronic navigation maps. To address the low precision and poor robustness of current semantic segmentation methods for road extraction from remote sensing imagery, we propose a UNet road semantic segmentation model improved with an attention mechanism. First, we introduce a CNN-Transformer hybrid structure into the encoder to enhance the extraction of global and local detail features. Second, the traditional upsampling module in the decoder is replaced with a dual upsampling module to improve feature extraction and segmentation accuracy. Furthermore, the hard-swish activation function is used instead of the ReLU activation function to smooth the curve, which helps to improve generalization and non-linear feature extraction and to avoid vanishing gradients. Finally, a comprehensive loss function combining cross entropy and dice is used to strengthen the constraints on the segmentation result and further improve segmentation accuracy. Experimental validation is performed on the Ottawa road dataset and the Massachusetts road dataset. The results show that, compared with the U-Net, PSPNet, DeepLab V3 and TransUNet networks, this algorithm is the best in terms of MIoU, MPA and F1 score. On the Ottawa road dataset, the MPA of this algorithm reaches 95.48%. On the Massachusetts road dataset, the MPA is 92.56%. This method shows good performance in road extraction.
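Two of the components mentioned in this abstract have compact standard forms, sketched below. The hard-swish definition is the usual one, x * ReLU6(x + 3) / 6. The combined loss assumes binary road/background labels and an unweighted sum of cross entropy and dice loss; the abstract does not give the weighting, so `dice_weight`, the function names and the toy values are assumptions.

```python
import math


def hard_swish(x):
    """Piecewise-linear approximation of swish: x * ReLU6(x + 3) / 6."""
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0


def cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross entropy; probabilities are clipped for stability."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def dice_loss(probs, targets, eps=1e-7):
    """One minus the soft dice coefficient of predictions and labels."""
    intersection = sum(p * y for p, y in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def combined_loss(probs, targets, dice_weight=1.0):
    """Cross entropy plus weighted dice loss (the weighting is an assumption)."""
    return cross_entropy(probs, targets) + dice_weight * dice_loss(probs, targets)


# Toy per-pixel road probabilities and labels, not data from the paper.
probs = [0.9, 0.8, 0.2, 0.1]
targets = [1, 1, 0, 0]
loss = combined_loss(probs, targets)
```

Cross entropy scores each pixel independently, while the dice term scores the overlap of the predicted and true road regions as a whole, which is why the two are often combined when the positive class is sparse.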

    Roles of NAD+ and Its Metabolites Regulated Calcium Channels in Cancer

    Nicotinamide adenine dinucleotide (NAD+) is an essential cofactor for redox enzymes, but it also moonlights as a regulator of ion channels, as do its metabolites. Ca2+ homeostasis is dysregulated in cancer cells and affects processes such as tumorigenesis, angiogenesis, autophagy, progression, and metastasis. Herein, we summarize the regulation of the most common calcium channels (TRPM2, TPCs, RyRs, and TRPML1) by NAD+ and its metabolites, with a particular focus on their roles in cancers. Although the mechanisms of NAD+ metabolites in these pathological processes are yet to be clearly elucidated, these ion channels are emerging as potential alternative targets for anticancer therapy.

    Evaluating Slope Deformation of Earth Dams Due to Earthquake Shaking Using MARS and GMDH Techniques

    Assessing the behavior of earth dams under dynamic loads is one of the most significant problems in the design of such large structures. The purpose of this study is to provide new models for predicting dam dispersion in real earthquake conditions. In the first phase, 103 real cases of earth dam deformation caused by earthquakes in recent years were collected and analyzed. Using nonlinear and machine learning techniques, i.e., the group method of data handling (GMDH) and multivariate adaptive regression splines (MARS), two models were developed and applied to predict slope deformation in earth dams under various types of earthquakes. The main parameters used in these simulation techniques were earthquake magnitude (Mw), fundamental period ratio (Td/Tp) and yield acceleration ratio (ay/amax) as inputs, and the value of slope deformation (Dave) as output. Finally, to check the accuracy of the new models, their results were compared with previous relations and models for slope deformation in earth dams under seismic conditions. The results showed that the MARS model, which is able to provide a mathematical equation, performs better than the GMDH model. These new models are recommended for future analyses because of their flexible capabilities.
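The remark that MARS "is able to provide a mathematical equation" refers to the general form of a MARS model: an intercept plus a weighted sum of hinge functions of the inputs. The sketch below shows that form only. The coefficients, knots and input values are hypothetical placeholders, not the fitted model from the paper, and the names `hinge`, `mars_predict`, `Td_over_Tp` and `ay_over_amax` are chosen here for illustration.

```python
def hinge(x, knot, direction=1):
    """MARS basis function.

    direction=+1 gives max(0, x - knot); direction=-1 gives max(0, knot - x).
    """
    return max(0.0, direction * (x - knot))


def mars_predict(inputs, intercept, terms):
    """Evaluate an additive MARS-style model.

    inputs: dict mapping variable name to value
    terms:  list of (coefficient, variable name, knot, direction)
    """
    return intercept + sum(
        coef * hinge(inputs[name], knot, direction)
        for coef, name, knot, direction in terms
    )


# Hypothetical placeholder model, NOT the equation reported in the paper.
placeholder_terms = [
    (2.0, "Mw", 6.0, 1),             # active only when Mw exceeds 6.0
    (1.0, "ay_over_amax", 0.5, -1),  # active only when ay/amax is below 0.5
]
example_inputs = {"Mw": 7.0, "Td_over_Tp": 1.0, "ay_over_amax": 0.25}
prediction = mars_predict(example_inputs, intercept=0.0, terms=placeholder_terms)
```

Because each term is a simple piecewise-linear function, a fitted MARS model can be written out and evaluated by hand, unlike most black-box regressors.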