19 research outputs found

    LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

    Self-supervised pre-training techniques have achieved remarkable progress in Document AI. Most multimodal pre-trained models use a masked language modeling objective to learn bidirectional representations on the text modality, but they differ in pre-training objectives for the image modality. This discrepancy adds difficulty to multimodal representation learning. In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking. Additionally, LayoutLMv3 is pre-trained with a word-patch alignment objective to learn cross-modal alignment by predicting whether the corresponding image patch of a text word is masked. The simple unified architecture and training objectives make LayoutLMv3 a general-purpose pre-trained model for both text-centric and image-centric Document AI tasks. Experimental results show that LayoutLMv3 achieves state-of-the-art performance not only in text-centric tasks, including form understanding, receipt understanding, and document visual question answering, but also in image-centric tasks such as document image classification and document layout analysis. The code and models are publicly available at https://aka.ms/layoutlmv3. Comment: Work in Progress
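
    The word-patch alignment objective described above can be illustrated with a minimal sketch of how the binary targets might be built. The function name, the word-to-patch index list, and the set of masked patch indices are hypothetical inputs chosen for illustration; they are not taken from the LayoutLMv3 code base, and the actual implementation may differ.

```python
from typing import List, Set


def word_patch_alignment_targets(word_to_patch: List[int],
                                 masked_patches: Set[int]) -> List[bool]:
    """Build one binary target per text word.

    word_to_patch[i] is the index of the image patch that covers word i
    (a hypothetical precomputed lookup). The target is True when that
    patch has been masked and False when it is still visible.
    """
    return [patch in masked_patches for patch in word_to_patch]


# Four words; patches 3 and 4 are masked, so only the third word,
# which sits on patch 3, gets a "masked" target.
targets = word_patch_alignment_targets([0, 0, 3, 5], {3, 4})
print(targets)
```

    A classifier head trained on such targets would have to relate each word to its image region, which is the cross-modal alignment the abstract refers to.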

    TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering

    Diffusion models have proven to be powerful generative models in recent years, yet generating visual text remains a challenge. Several methods have alleviated this issue by incorporating explicit text position and content as guidance on where and what text to render. However, these methods still suffer from several drawbacks, such as limited flexibility and automation, constrained capability of layout prediction, and restricted style diversity. In this paper, we present TextDiffuser-2, aiming to unleash the power of language models for text rendering. First, we fine-tune a large language model for layout planning. The large language model is capable of automatically generating keywords for text rendering and also supports layout modification through chatting. Second, we utilize the language model within the diffusion model to encode the position and texts at the line level. Unlike previous methods that employed tight character-level guidance, this approach generates more diverse text images. We conduct extensive experiments and incorporate user studies involving human participants as well as GPT-4V, validating TextDiffuser-2's capacity to achieve a more rational text layout and generation with enhanced diversity. The code and model will be available at https://aka.ms/textdiffuser-2.

    Kosmos-2.5: A Multimodal Literate Model

    We present Kosmos-2.5, a multimodal literate model for machine reading of text-intensive images. Pre-trained on large-scale text-intensive images, Kosmos-2.5 excels in two distinct yet cooperative transcription tasks: (1) generating spatially-aware text blocks, where each block of text is assigned its spatial coordinates within the image, and (2) producing structured text output that captures styles and structures into the markdown format. This unified multimodal literate capability is achieved through a shared Transformer architecture, task-specific prompts, and flexible text representations. We evaluate Kosmos-2.5 on end-to-end document-level text recognition and image-to-markdown text generation. Furthermore, the model can be readily adapted for any text-intensive image understanding task with different prompts through supervised fine-tuning, making it a general-purpose tool for real-world applications involving text-rich images. This work also paves the way for the future scaling of multimodal large language models.

    A Novel Method for AI-Assisted INS/GNSS Navigation System Based on CNN-GRU and CKF during GNSS Outage

    In the fields of positioning and navigation, integrated inertial navigation system (INS)/global navigation satellite system (GNSS) solutions are frequently employed. Currently, high-precision INS typically utilizes fiber optic gyroscopes (FOGs) and quartz flexural accelerometers (QFAs) rather than MEMS sensors. However, when GNSS signals are unavailable for a long time, the errors of high-precision INS also diverge rapidly, as they do in MEMS-INS, leading to a serious degradation of the navigation accuracy. This paper presents a new AI-assisted method for the integrated high-precision INS/GNSS navigation system. The position increments during a GNSS outage are predicted by a convolutional neural network-gated recurrent unit (CNN-GRU). In the process, the CNN is used to quickly extract the multi-dimensional sequence features, and the GRU is used to model the time series. In addition, a new real-time training strategy is proposed for practical application scenarios, which takes into account the duration of the GNSS outage and the motion state information of the vehicle. Real road test results verify that the proposed algorithm offers high prediction accuracy and high training efficiency.

    An Improvement of a Mapping Method Based on Ant Colony Algorithm Applied to Smart Cities

    The ant colony algorithm has been widely used in data analysis for smart cities. However, research on the traditional ant colony algorithm has focused mainly on one-to-one scenarios, and many-to-one scenarios remain insufficiently studied. Therefore, for the many-to-one topology mapping problem, this paper proposes a mapping method based on the ant colony algorithm. The mapping algorithm is designed to find the optimal mapping scheme, which can effectively reduce the cost of solving the problem. The core of the mapping algorithm is the design of the objective function to be optimized. The commonly used optimization objective and evaluation index is the average hop count, which is the most important indicator for measuring the entire system. The smaller the average hop count, the less the pulse data needs to be forwarded, which reduces the communication pressure on the system, congestion, the energy consumed by communication, and the delay from the generation of pulse data to the response. Therefore, this paper chooses the average hop count as the optimization objective and reduces it by designing a mapping algorithm. Simulation and verification of the improved ant colony algorithm in the many-to-one topology mapping scenario show that its final convergence result and convergence speed are significantly better than those of the traditional ant colony algorithm.
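
    To make the average-hop-count objective concrete, the following is a minimal sketch of how such a cost could be evaluated for one candidate mapping. It assumes a 2D mesh in which the hop count between two physical nodes is their Manhattan distance, and a traffic table of message counts between logical nodes. Both assumptions, along with all names, are illustrative; the abstract does not specify the topology or the traffic model.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int]  # (x, y) position of a physical node on the assumed mesh


def average_hop_count(mapping: Dict[str, Coord],
                      traffic: Dict[Tuple[str, str], int]) -> float:
    """Traffic-weighted mean hop count for a many-to-one mapping.

    mapping: logical node -> physical node; several logical nodes may
             share one physical node (many-to-one).
    traffic: (source, destination) -> number of messages sent.
    """
    total_hops = 0
    total_messages = 0
    for (src, dst), count in traffic.items():
        (x1, y1), (x2, y2) = mapping[src], mapping[dst]
        # Manhattan distance stands in for the hop count on a mesh.
        total_hops += count * (abs(x1 - x2) + abs(y1 - y2))
        total_messages += count
    return total_hops / total_messages if total_messages else 0.0


# "a" and "b" share a physical node, so their traffic costs zero hops.
mapping = {"a": (0, 0), "b": (0, 0), "c": (1, 2)}
traffic = {("a", "b"): 4, ("a", "c"): 2}
print(average_hop_count(mapping, traffic))
```

    An ant colony search would evaluate a function of this kind for each candidate mapping and reinforce the placements that lower it.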

    Role of Demyelination Efficiency within Acellular Nerve Scaffolds during Nerve Regeneration across Peripheral Defects

    Hudson’s optimized chemical processing method is the most commonly used chemical method to prepare acellular nerve scaffolds for the reconstruction of large peripheral nerve defects. However, residual myelin attached to the basal laminar tube has been observed in acellular nerve scaffolds prepared using Hudson’s method. Here, we describe a novel method of producing acellular nerve scaffolds that eliminates residual myelin more effectively than Hudson’s method through the use of various detergent combinations of sulfobetaine-10, sulfobetaine-16, Triton X-200, sodium deoxycholate, and peracetic acid. In addition, the efficacy of this new scaffold in repairing a 1.5 cm defect in the sciatic nerve of rats was examined. The modified method produced a higher degree of demyelination than Hudson’s method, resulting in a minor host immune response in vivo and providing an improved environment for nerve regeneration and, consequently, better functional recovery. A morphological study showed that the number of regenerated axons in the modified group and Hudson group did not differ. However, the autograft and modified groups were more similar in myelin sheath regeneration than the autograft and Hudson groups. These results suggest that the modified method for producing a demyelinated acellular scaffold may aid functional recovery in general after nerve defects.

    Sinusoidal Phase-Modulated Angle Interferometer for Angular Vibration Measurement

    Primary angular vibration calibration devices based on laser interferometers play a crucial role in evaluating the dynamic performance of inertial sensing devices. Here, we propose a sinusoidal phase-modulated angle interferometer (SPMAI) to realize angular vibration measurements over a frequency range of 1–1000 Hz, in which the sinusoidal measurement retro-reflector (SMR) and the phase generation carrier (PGC) demodulation algorithm are adopted to track the dynamic angle variation. A comprehensive theoretical analysis is presented to reveal the relationship between the demodulation performance of the SPMAI and several factors, such as phase modulation depth, carrier phase delay, and sampling frequency. Both the simulated and experimental results demonstrate that the proposed SPMAI can measure angular vibrations of sub-arcsecond amplitude under the given parameters. Using the proposed SPMAI, the frequency bandwidth of an interferometric fiber-optic gyroscope (IFOG) is successfully determined to be 848 Hz.

    Research on the Frequency-Dependent Halfwave Voltage of a Multifunction Integrated Optical Chip in an Interferometric Fiber Optic Gyroscope

    The multifunction integrated optical chip (MIOC) is one of the most critical parts of the interferometric fiber optic gyroscope (IFOG), and research on the halfwave voltage of the MIOC is meaningful for a high-precision IFOG. In this paper, the correlation between the frequency and halfwave voltage, which affects the interference light intensity of the IFOG, is presented theoretically. A widespread measurement method for the frequency dependence of the halfwave voltage, based on lock-in amplification and sinusoidal modulation, is proposed. Further, the measurement result and the oscillation of the interference light intensity in the Sagnac interferometer are presented, which are in good agreement with the theory. This paper proposes the frequency dependence of the halfwave voltage and provides a new error research direction for the improvement of the MIOC in a high-precision IFOG.

    Fiber Optic All-Polarization Weak Magnetic Field Sensor Based on Sagnac Interferometer

    A novel fiber-optic magnetic field sensor, based on a Sagnac structure, is proposed with the approach of polarization interference detection. The sensor takes advantage of common-path interference, combined with a highly sensitive magnetic field sensing unit composed of a magneto-optical crystal and magnetic field concentrators, to achieve high-resolution, high-stability, and large-dynamic-range measurement of DC magnetic field signals. In this paper, the theoretical model is established and the related theory is derived in detail. The key technologies in the system are thoroughly investigated and verified. Experimental research on the proposed system is demonstrated, and the results show that a DC magnetic field resolution of 5.6 nT and a dynamic range larger than 70 dB are achieved. Furthermore, the linearity of the system is greater than 99.8% and the instability is less than 0.5%.
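
    As a rough reading of these figures, the sketch below converts the reported resolution and dynamic range into an implied upper measurable field. It assumes the dynamic range is defined on field amplitude, that is, 20·log10(maximum/resolution); the abstract does not state which decibel convention is used, so the result is only indicative. The function name is hypothetical.

```python
def max_measurable_field(resolution_tesla: float, dynamic_range_db: float) -> float:
    """Largest field implied by a resolution and a dynamic range in dB.

    Assumes the amplitude convention: DR = 20 * log10(B_max / B_min).
    """
    return resolution_tesla * 10 ** (dynamic_range_db / 20)


# 5.6 nT resolution with a 70 dB dynamic range (values from the abstract).
b_max = max_measurable_field(5.6e-9, 70.0)
print(f"{b_max * 1e6:.1f} microtesla")
```

    Under that assumption the sensor would cover fields up to roughly 18 µT, and since the abstract reports a dynamic range larger than 70 dB, this would be a lower bound on the upper limit.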