70 research outputs found

    PDF-VQA: A New Dataset for Real-World VQA on PDF Documents

    Document-based Visual Question Answering (VQA) examines the understanding of document images through natural language questions. We propose a new document-based VQA dataset, PDF-VQA, to comprehensively examine document understanding from various aspects, including document element recognition, document layout structure understanding, contextual understanding, and key information extraction. PDF-VQA extends the scale of document understanding from a single document page to questions asked over a full document of multiple pages. We also propose a new graph-based VQA model that explicitly integrates the spatial and hierarchical structural relationships between document elements to improve document structure understanding. Performance is compared with several baselines over different question types and tasks. (The full dataset will be released after paper acceptance.)

    Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis

    Recognizing the layout of unstructured digital documents is crucial when parsing documents into a structured, machine-readable format for downstream applications. Recent studies in Document Layout Analysis (DLA) usually rely on computer vision models to understand documents while ignoring other vital information, such as context or the relations between document components. Our Doc-GCN presents an effective way to harmonize and integrate heterogeneous aspects for Document Layout Analysis. We first construct graphs to explicitly describe four main aspects: syntactic, semantic, density, and appearance/visual information. Then, we apply graph convolutional networks to represent each aspect of information and use pooling to integrate them. Finally, we aggregate the aspects and feed them into 2-layer MLPs for document layout component classification. Our Doc-GCN achieves new state-of-the-art results on three widely used DLA datasets. Comment: Accepted by COLING 202

    Training Robust Spiking Neural Networks on Neuromorphic Data with Spatiotemporal Fragments

    Neuromorphic vision sensors (event cameras) are inherently suitable for spiking neural networks (SNNs) and provide novel neuromorphic vision data for this biomimetic model. Due to the spatiotemporal characteristics, novel data augmentations are required to process the unconventional visual signals of these cameras. In this paper, we propose a novel Event SpatioTemporal Fragments (ESTF) augmentation method. It preserves the continuity of neuromorphic data by drifting or inverting fragments of the spatiotemporal event stream to simulate the disturbance of brightness variations, leading to more robust spiking neural networks. Extensive experiments are performed on prevailing neuromorphic datasets. It turns out that ESTF provides substantial improvements over pure geometric transformations and outperforms other event data augmentation methods. It is worth noting that the SNNs with ESTF achieve the state-of-the-art accuracy of 83.9% on the CIFAR10-DVS dataset. Comment: Accepted by ICASSP 202

    D-IF: Uncertainty-aware Human Digitization via Implicit Distribution Field

    Realistic virtual humans play a crucial role in numerous industries, such as the metaverse, intelligent healthcare, and self-driving simulation. But creating them on a large scale with high levels of realism remains a challenge. The use of deep implicit functions has sparked a new era of image-based 3D clothed human reconstruction, enabling pixel-aligned shape recovery with fine details. Subsequently, the vast majority of works locate the surface by regressing a deterministic implicit value for each point. However, should all points be treated equally regardless of their proximity to the surface? In this paper, we propose replacing the implicit value with an adaptive uncertainty distribution, to differentiate between points based on their distance to the surface. This simple "value to distribution" transition yields significant improvements on nearly all the baselines. Furthermore, qualitative results demonstrate that models trained using our uncertainty distribution loss can capture more intricate wrinkles and more realistic limbs. Code and models are available for research purposes at https://github.com/psyai-net/D-IF_release

    Research Progress in the Regulation Mechanisms of White and Brown Adipose Tissue in the Body by Functionally Active Factors

    Brown adipose tissue (BAT) improves the metabolic level of the body by promoting energy expenditure, which can contribute to the prevention and treatment of metabolic diseases such as obesity and diabetes, and BAT has become a new target for the treatment of metabolic diseases. Enhancing BAT activity in the body is both a hot topic and a challenge for researchers, and the study of functionally active factors in foods that regulate BAT can help to develop new nutritional activators. In this paper, we summarize the development and thermogenesis of BAT and thermogenesis-related factors, review active ingredients in foods that regulate brown fat and their mechanisms of action, and briefly introduce the effects of white adipose tissue (WAT) and BAT on the body's health. We also discuss recent developments in understanding the role of BAT in regulating energy metabolic balance and various diseases in the body. We hope that the present review will provide a theoretical basis for the future development of brown adipose nutritional activators and the improvement of individualized healthy dietary management programs in order to prevent and treat various diseases.

    Training Stronger Spiking Neural Networks with Biomimetic Adaptive Internal Association Neurons

    As the third generation of neural networks, spiking neural networks (SNNs) are dedicated to exploring more insightful neural mechanisms to achieve near-biological intelligence. Intuitively, biomimetic mechanisms are crucial to understanding and improving SNNs. For example, the associative long-term potentiation (ALTP) phenomenon suggests that in addition to learning mechanisms between neurons, there are associative effects within neurons. However, most existing methods only focus on the former and lack exploration of the internal association effects. In this paper, we propose a novel Adaptive Internal Association (AIA) neuron model to establish previously ignored influences within neurons. Consistent with the ALTP phenomenon, the AIA neuron model is adaptive to input stimuli, and internal associative learning occurs only when both dendrites are stimulated at the same time. In addition, we employ weighted weights to measure internal associations and introduce intermediate caches to reduce the volatility of associations. Extensive experiments on prevailing neuromorphic datasets show that the proposed method can potentiate or depress the firing of spikes more specifically, resulting in better performance with fewer spikes. It is worth noting that without adding any parameters at inference, the AIA model achieves state-of-the-art performance on the DVS-CIFAR10 (83.9%) and N-CARS (95.64%) datasets. Comment: Accepted by ICASSP 202

    Form-NLU: Dataset for the Form Language Understanding

    Compared to general document analysis tasks, form document structure understanding and retrieval are challenging. Form documents are typically made by two types of authors: a form designer, who develops the form structure and keys, and a form user, who fills out form values based on the provided keys. Hence, the form values may not be aligned with the form designer's intention (structure and keys) if a form user gets confused. In this paper, we introduce Form-NLU, the first dataset for form structure understanding and key and value information extraction, interpreting the form designer's intent and the alignment of user-written values with it. It consists of 857 form images, 6k form keys and values, and 4k table keys and values. Our dataset also includes three form types: digital, printed, and handwritten, which cover diverse form appearances and layouts. We propose a robust positional and logical relation-based form key-value information extraction framework. Using Form-NLU, we first examine strong object detection models for form layout understanding, then evaluate the key information extraction task on the dataset, providing fine-grained results for different types of forms and keys. Furthermore, we examine it with an off-the-shelf PDF layout extraction tool and demonstrate its feasibility in real-world cases. Comment: Accepted by SIGIR 202

    Molecular epidemiology and antimicrobial resistance of outbreaks of Klebsiella pneumoniae clinical mastitis in Chinese dairy farms

    Klebsiella pneumoniae is an opportunistic pathogen that causes serious infections in humans and animals. However, the availability of epidemiological information on clinical mastitis due to K. pneumoniae is limited. To acquire new information regarding K. pneumoniae mastitis, data were mined about K. pneumoniae strains on dairy cattle farms (farms A to H) in 7 Chinese provinces in 2021. Hypermucoviscous strains of K. pneumoniae were identified by the string test. MICs of antimicrobial agents were determined via the broth microdilution method. Ten antimicrobial resistance genes and virulence genes were identified by PCR. The prevalence of K. pneumoniae was 35.91% (65/181), and 100% of the bacteria were sensitive to enrofloxacin. Nine antimicrobial resistance genes and virulence genes were identified and compared among farms. The hypermucoviscous phenotype was present in 94.44% of isolates from farm B, which may be a function of the rmpA virulence gene. Based on these data, the multidrug-resistant strains SD-14 and HB-21 were chosen and sequenced. Genotypes were assayed for K. pneumoniae isolates from different countries and different hosts using multilocus sequence typing (MLST). Ninety-four sequence types (STs) were found, and 6 STs present a risk of spreading in specific regions. Interestingly, ST43 was observed in bovine isolates for the first time. Our study partially reveals the current distribution characteristics of bovine K. pneumoniae in China and may provide a theoretical basis for the prevention and treatment of bovine K. pneumoniae mastitis.