43 research outputs found

    PDF-VQA: A New Dataset for Real-World VQA on PDF Documents

    Document-based Visual Question Answering examines the understanding of document images conditioned on natural language questions. We propose a new document-based VQA dataset, PDF-VQA, to comprehensively examine document understanding from various aspects, including document element recognition, document layout structural understanding, contextual understanding, and key information extraction. Our PDF-VQA dataset extends the current scale of document understanding, which is limited to a single document page, to a new scale that asks questions over full documents of multiple pages. We also propose a new graph-based VQA model that explicitly integrates the spatial and hierarchical structural relationships between different document elements to boost document structural understanding. Performance is compared with several baselines over different question types and tasks. (The full dataset will be released after paper acceptance.)

    Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis

    Recognizing the layout of unstructured digital documents is crucial when parsing the documents into a structured, machine-readable format for downstream applications. Recent studies in Document Layout Analysis (DLA) usually rely on computer vision models to understand documents while ignoring other information, such as context or the relations between document components, which is vital to capture. Our Doc-GCN presents an effective way to harmonize and integrate heterogeneous aspects for Document Layout Analysis. We first construct graphs to explicitly describe four main aspects: syntactic, semantic, density, and appearance/visual information. Then, we apply graph convolutional networks to represent each aspect of information and use pooling to integrate them. Finally, we aggregate the aspects and feed them into 2-layer MLPs for document layout component classification. Our Doc-GCN achieves new state-of-the-art results on three widely used DLA datasets. Comment: Accepted by COLING 202
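
The pipeline described in this abstract (one graph per aspect, a graph convolution per graph, pooling across aspects, then a 2-layer MLP) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the symmetric adjacency normalization, the element-wise max pooling, the ReLU activations, and all function and parameter names are assumptions chosen for brevity.

```python
import numpy as np

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, a common GCN normalization (assumed here)."""
    a_hat = adj + np.eye(adj.shape[0])       # self-loops keep every degree >= 1
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj, features, weight):
    """One graph convolution followed by ReLU."""
    return np.maximum(normalize_adjacency(adj) @ features @ weight, 0.0)

def classify_components(adjs, features, gcn_weights, w1, w2):
    """Hypothetical helper: encode each aspect graph, pool, classify each node."""
    per_aspect = [gcn_layer(a, x, w) for a, x, w in zip(adjs, features, gcn_weights)]
    pooled = np.max(np.stack(per_aspect), axis=0)   # max over aspects, per node
    hidden = np.maximum(pooled @ w1, 0.0)           # MLP layer 1
    logits = hidden @ w2                            # MLP layer 2
    return logits.argmax(axis=1)                    # predicted class per component

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    nodes, dim, hidden_dim, classes = 6, 3, 4, 5
    # Four random undirected graphs stand in for the four aspects.
    adjs = [np.triu(rng.integers(0, 2, (nodes, nodes)), 1).astype(float) for _ in range(4)]
    adjs = [a + a.T for a in adjs]
    feats = [rng.random((nodes, dim)) for _ in range(4)]
    gcn_w = [rng.normal(size=(dim, hidden_dim)) for _ in range(4)]
    print(classify_components(adjs, feats, gcn_w,
                              rng.normal(size=(hidden_dim, 8)),
                              rng.normal(size=(8, classes))))
```

The weights here are random, so the predicted labels are meaningless; the sketch only shows how per-aspect node representations could be combined before classification.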

    Topological triply-degenerate point with double Fermi arcs

    Unconventional chiral particles have recently been predicted to appear in certain three-dimensional (3D) crystal structures containing three- or more-fold linear band degeneracy points (BDPs). These BDPs carry topological charges, but are distinct from the standard twofold Weyl points or fourfold Dirac points, and cannot be described in terms of an emergent relativistic field theory. Here, we report on the experimental observation of a topological threefold BDP in a 3D phononic crystal. Using direct acoustic field mapping, we demonstrate the existence of the threefold BDP in the bulk band structure, as well as doubled Fermi arcs of surface states consistent with a topological charge of 2. Another novel BDP, similar to a Dirac point but carrying nonzero topological charge, is connected to the threefold BDP via the doubled Fermi arcs. These findings pave the way to using these unconventional particles for exploring new emergent physical phenomena.

    Form-NLU: Dataset for the Form Language Understanding

    Compared to general document analysis tasks, form document structure understanding and retrieval are challenging. Form documents are typically produced by two types of authors: a form designer, who develops the form structure and keys, and a form user, who fills out form values based on the provided keys. Hence, the form values may not be aligned with the form designer's intention (structure and keys) if a form user gets confused. In this paper, we introduce Form-NLU, the first dataset for form structure understanding and key and value information extraction, interpreting the form designer's intent and the alignment of user-written values with it. It consists of 857 form images, 6k form keys and values, and 4k table keys and values. Our dataset also includes three form types (digital, printed, and handwritten), which cover diverse form appearances and layouts. We propose a robust positional and logical relation-based form key-value information extraction framework. Using Form-NLU, we first examine strong object detection models for form layout understanding, then evaluate the key information extraction task on the dataset, providing fine-grained results for different types of forms and keys. Furthermore, we examine it with an off-the-shelf PDF layout extraction tool and demonstrate its feasibility in real-world cases. Comment: Accepted by SIGIR 202

    EVD Surgical Guidance with Retro-Reflective Tool Tracking and Spatial Reconstruction using Head-Mounted Augmented Reality Device

    Augmented Reality (AR) has been used to facilitate surgical guidance during External Ventricular Drain (EVD) surgery, reducing the risks of misplacement in manual operations. During this procedure, the pivotal challenge is the accurate estimation of the spatial relationship between pre-operative images and actual patient anatomy in the AR environment. In this research, we propose a novel framework utilizing Time of Flight (ToF) depth sensors integrated in commercially available AR Head Mounted Devices (HMD) for precise EVD surgical guidance. As previous studies have demonstrated depth errors for ToF sensors, we first conducted a comprehensive assessment of the properties of this error on AR-HMDs. Subsequently, a depth error model and a patient-specific model parameter identification method are introduced for accurate surface information. After that, a tracking procedure combining retro-reflective markers and point clouds is proposed for accurate head tracking, where the head surface is reconstructed using ToF sensor data for spatial registration, avoiding fixing tracking targets rigidly on the patient's cranium. First, a ToF sensor depth value error of 7.580 ± 1.488 mm was revealed on human skin, indicating the significance of depth correction. Our results showed that the ToF sensor depth error was reduced by over 85% using the proposed depth correction method on head phantoms in different materials. Meanwhile, the head surface reconstructed with corrected depth data achieved sub-millimeter accuracy. An experiment on a sheep head revealed a 0.79 mm reconstruction error. Furthermore, a user study was conducted on the performance of the proposed framework in simulated EVD surgery, where 5 surgeons performed 9 k-wire injections on a head phantom with virtual guidance. Results of this study revealed 2.09 ± 0.16 mm translational accuracy and 2.97 ± 0.91° orientational accuracy.

    Observation of vortex-string chiral modes in metamaterials

    As a hypothetical topological defect in the geometry of spacetime, vortex strings play a crucial role in shaping the clusters of galaxies that exist today, and their distinct features can provide observable clues about the early universe's evolution. A key feature of vortex strings is that they can interact with Weyl fermionic modes and support topological chiral-anomaly states with massless dispersions at the core of strings. To date, despite many attempts to detect vortex strings in astrophysics or to emulate them in artificially created systems, observation of these topological vortex-string chiral modes remains experimentally elusive. Here we report the experimental observation of such vortex-string chiral modes using a metamaterial system. This is implemented by inhomogeneous perturbation of a Yang-monopole phononic metamaterial. The measured linear dispersion and modal profiles confirm the existence of topological modes bound to and propagating along the vortex string with the chiral anomaly. Our work not only provides a platform for studying diverse cosmic topological defects in astrophysics but also offers intriguing device applications as topological fibres in signal processing and communication techniques. Comment: 3 Figure

    DDI-MuG: Multi-aspect graphs for drug-drug interaction extraction

    Introduction: Drug-drug interaction (DDI) may lead to adverse reactions in patients, so it is important to extract such knowledge from biomedical texts. However, previously proposed approaches typically focus on capturing sentence-aspect information while ignoring valuable knowledge concerning the whole corpus. In this paper, we propose a Multi-aspect Graph-based DDI extraction model, named DDI-MuG. Methods: We first employ a bio-specific pre-trained language model to obtain contextualized token representations. Then we use two graphs to obtain syntactic information from the input instance and word co-occurrence information within the entire corpus, respectively. Finally, we combine the representations of drug entities and verb tokens for the final classification. Results: To validate the effectiveness of the proposed model, we perform extensive experiments on two widely used DDI extraction datasets, DDIExtraction-2013 and TAC 2018. It is encouraging to see that our model outperforms all twelve state-of-the-art models. Discussion: In contrast to the majority of earlier models, which rely on a black-box approach, our model enables visualization of crucial words and their interrelationships by utilizing edge information from the two graphs. To the best of our knowledge, this is the first model that applies multi-aspect graphs to the DDI extraction task, and we hope it can establish a foundation for more robust multi-aspect work in the future.

    Robust estimation of bacterial cell count from optical density

    Optical density (OD) is widely used to estimate the density of cells in liquid culture, but cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres, which produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, also assesses instrument effective linear range, and can be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements: in our study, fluorescence per cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data
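
As a rough illustration of calibrating OD against a serial dilution of particles of known count, the sketch below fits a single conversion factor and applies it to a sample reading. It is not the protocol from the study: the two-fold dilution factor, the least-squares fit through the origin, the assumption that readings are blank-subtracted and within the linear range, and every number and function name are hypothetical.

```python
import numpy as np

def dilution_series(start_count, steps, factor=2.0):
    """Particle counts per well for a serial dilution (factor is an assumption)."""
    return start_count / factor ** np.arange(steps)

def fit_particles_per_od(counts, od):
    """Least-squares slope through the origin: counts ~= k * od."""
    od = np.asarray(od, dtype=float)
    counts = np.asarray(counts, dtype=float)
    return float(np.dot(od, counts) / np.dot(od, od))

def estimate_count(sample_od, particles_per_od):
    """Convert a blank-subtracted OD reading into an estimated particle count."""
    return sample_od * particles_per_od

if __name__ == "__main__":
    counts = dilution_series(3.0e8, steps=6)            # hypothetical starting count
    od = np.array([0.30, 0.152, 0.074, 0.038, 0.019, 0.009])  # made-up readings
    k = fit_particles_per_od(counts, od)
    print(f"particles per OD unit: {k:.3g}")
    print(f"estimated count at OD 0.12: {estimate_count(0.12, k):.3g}")
```

Forcing the fit through the origin reflects the assumption that a blank-subtracted well with no particles reads zero; readings outside the instrument's linear range would need to be excluded before fitting.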

    Roadmap on energy harvesting materials

    Ambient energy harvesting has great potential to contribute to sustainable development and address growing environmental challenges. Converting waste energy from energy-intensive processes and systems (e.g. combustion engines and furnaces) is crucial to reducing their environmental impact and achieving net-zero emissions. Compact energy harvesters will also be key to powering the exponentially growing smart devices ecosystem that is part of the Internet of Things, thus enabling futuristic applications that can improve our quality of life (e.g. smart homes, smart cities, smart manufacturing, and smart healthcare). To achieve these goals, innovative materials are needed to efficiently convert ambient energy into electricity through various physical mechanisms, such as the photovoltaic effect, thermoelectricity, piezoelectricity, triboelectricity, and radiofrequency wireless power transfer. By bringing together the perspectives of experts in various types of energy harvesting materials, this Roadmap provides extensive insights into recent advances and present challenges in the field. Additionally, the Roadmap analyses the key performance metrics of these technologies in relation to their ultimate energy conversion limits. Building on these insights, the Roadmap outlines promising directions for future research to fully harness the potential of energy harvesting materials for green energy anytime, anywhere

    The Effects of De-Capacity Policy on Steel and Coal Firms’ Profitability: Evidence from China’s Listed Companies

    Overcapacity in China's steel and coal industries has been on the rise since 2013, leading to the misallocation of resources and decreases in production efficiency. In 2015, the Chinese central government adopted a series of de-capacity policies to resolve excess capacity and improve corporate profitability. However, there is scant evidence on the impact of de-capacity policies on firm profitability. Based on data from Chinese listed companies in the steel and coal industries, this study applies the difference-in-differences (DID) method to empirically investigate the effects of the de-capacity policy on the profitability of these companies. The results show that the de-capacity policy significantly increases the return on equity (ROE) of the experimental group, which is higher than that of the control group by 12.4%. This is partially because of the improvement in gross profit margin, management efficiency, and return on manpower due to the de-capacity policy. This study offers new evidence on the efficiency of China's de-capacity policy toward the steel and coal industries through data at the enterprise level.
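
The basic two-group, two-period form of a difference-in-differences estimate can be sketched as follows. This is only the textbook 2x2 calculation with made-up ROE values; the study itself presumably uses a regression specification with controls, which is not reproduced here, and the function name is hypothetical.

```python
from statistics import mean

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

if __name__ == "__main__":
    # Hypothetical ROE values (percent), before and after the policy.
    treated_pre, treated_post = [2.0, 4.0], [9.0, 11.0]
    control_pre, control_post = [5.0, 7.0], [6.0, 8.0]
    print(did_estimate(treated_pre, treated_post, control_pre, control_post))
```

Subtracting the control group's change removes trends shared by both groups, under the assumption that the two groups would have moved in parallel without the policy.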