PDF-VQA: A New Dataset for Real-World VQA on PDF Documents
Document-based Visual Question Answering examines how well models understand
document images when queried with natural language questions. We propose a
new document-based VQA dataset, PDF-VQA, to comprehensively examine
document understanding from various aspects, including document element
recognition, document layout structural understanding, contextual
understanding and key information extraction. Our PDF-VQA dataset extends the
scale of document understanding beyond the single document page, to which
current datasets are limited, to questions asked over full multi-page documents.
We also propose a new graph-based VQA model that explicitly integrates the
spatial and hierarchical structural relationships between different document
elements to boost document structural understanding. Performance is
compared with several baselines over different question types and
tasks. (The full dataset will be released after paper acceptance.)
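The abstract does not say how the graph encodes layout. As a hedged illustration only, the sketch below builds a labelled graph over document elements: spatial edges come from comparing bounding boxes on the same page, and hierarchical edges link a parent element (for example a section heading) to its children, which lets relations span pages. `Element`, `spatial_relation`, `build_graph` and the relation labels are hypothetical names; this is not the PDF-VQA model itself.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import permutations
from typing import Optional


@dataclass(frozen=True)
class Element:
    """A document element (title, paragraph, table, ...) on a given page.

    Boxes are (x0, y0, x1, y1) with y growing downwards; `parent` names the
    enclosing element (e.g. a section heading) and is a hypothetical field.
    """
    name: str
    page: int
    x0: float
    y0: float
    x1: float
    y1: float
    parent: Optional[str] = None


def spatial_relation(a: Element, b: Element) -> Optional[str]:
    """Hypothetical rule: where does b lie relative to a on the same page?"""
    if a.page != b.page:
        return None
    if b.y0 >= a.y1:
        return "below"
    if b.y1 <= a.y0:
        return "above"
    if b.x0 >= a.x1:
        return "right"
    if b.x1 <= a.x0:
        return "left"
    return "overlap"


def build_graph(elements: list[Element]) -> list[tuple[str, str, str]]:
    """Return labelled, directed edges (source, relation, target)."""
    edges = []
    # Spatial edges between every ordered pair of elements on the same page.
    for a, b in permutations(elements, 2):
        relation = spatial_relation(a, b)
        if relation is not None:
            edges.append((a.name, relation, b.name))
    # Hierarchical edges from each parent to its children (may cross pages).
    for element in elements:
        if element.parent is not None:
            edges.append((element.parent, "child", element.name))
    return edges
```

A graph model would then attach learned features to each node and relation label; keeping spatial and hierarchical edges as separate labels lets it weight the two kinds of structure differently.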
Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis
Recognizing the layout of unstructured digital documents is crucial when
parsing them into a structured, machine-readable format for
downstream applications. Recent studies in Document Layout Analysis (DLA) usually
rely on computer vision models to understand documents while ignoring other
information, such as context or the relations between document components,
which are vital to capture. Our Doc-GCN presents an effective way to harmonize
and integrate heterogeneous aspects for Document Layout Analysis. We first
construct graphs to explicitly describe four main aspects: syntactic,
semantic, density, and appearance/visual information. Then, we apply graph
convolutional networks to represent each aspect of information and use
pooling to integrate them. Finally, we aggregate the aspects and feed them into
2-layer MLPs for document layout component classification. Doc-GCN achieves
new state-of-the-art results on three widely used DLA datasets. Comment: Accepted by COLING 202
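A minimal, dependency-free sketch of the pipeline described above, assuming one graph and one node-feature matrix per aspect (nodes are layout components). It uses the common symmetric GCN normalisation D^(-1/2) (A + I) D^(-1/2), element-wise mean pooling across aspects and a 2-layer MLP; the pooling choice, weights and function names are illustrative assumptions rather than Doc-GCN's published configuration.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in m]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Symmetric normalisation D^(-1/2) (A + I) D^(-1/2), adding self-loops."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph-convolution step: ReLU(A_norm @ H @ W)."""
    return relu(matmul(matmul(normalize_adjacency(adj), h), w))


def mean_pool(aspects: List[Matrix]) -> Matrix:
    """Element-wise mean of per-aspect node embeddings (assumed pooling choice)."""
    k = len(aspects)
    rows, cols = len(aspects[0]), len(aspects[0][0])
    return [[sum(a[i][j] for a in aspects) / k for j in range(cols)] for i in range(rows)]


def classify(graphs: List[Matrix], features: List[Matrix], gcn_weights: List[Matrix],
             w1: Matrix, w2: Matrix) -> List[int]:
    """Per-aspect GCN, pooled across aspects, then a 2-layer MLP; returns class ids."""
    per_aspect = [gcn_layer(a, h, w) for a, h, w in zip(graphs, features, gcn_weights)]
    logits = matmul(relu(matmul(mean_pool(per_aspect), w1)), w2)
    return [max(range(len(row)), key=row.__getitem__) for row in logits]
```

In practice the weights would be learned by backpropagation in a tensor library; plain lists are used here only to keep the example self-contained.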
Topological triply-degenerate point with double Fermi arcs
Unconventional chiral particles have recently been predicted to appear in
certain three-dimensional (3D) crystal structures containing three- or
more-fold linear band degeneracy points (BDPs). These BDPs carry topological
charges, but are distinct from the standard twofold Weyl points or fourfold
Dirac points, and cannot be described in terms of an emergent relativistic
field theory. Here, we report on the experimental observation of a topological
threefold BDP in a 3D phononic crystal. Using direct acoustic field mapping, we
demonstrate the existence of the threefold BDP in the bulk bandstructure, as
well as doubled Fermi arcs of surface states consistent with a topological
charge of 2. Another novel BDP, similar to a Dirac point but carrying nonzero
topological charge, is connected to the threefold BDP via the doubled Fermi
arcs. These findings pave the way to using these unconventional particles for
exploring new emergent physical phenomena.
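Threefold points carrying charge 2 are commonly modelled, to lowest order near the degeneracy, by a "spin-1" generalisation of the Weyl Hamiltonian. This standard model is offered here only as an illustration and is not taken from the paper; v is a generic velocity and S are the spin-1 matrices.

```latex
H(\mathbf{k}) = \hbar v\,\mathbf{k}\cdot\mathbf{S},\qquad
S_x = \tfrac{1}{\sqrt{2}}\begin{pmatrix}0&1&0\\1&0&1\\0&1&0\end{pmatrix},\quad
S_y = \tfrac{1}{\sqrt{2}}\begin{pmatrix}0&-i&0\\i&0&-i\\0&i&0\end{pmatrix},\quad
S_z = \begin{pmatrix}1&0&0\\0&0&0\\0&0&-1\end{pmatrix}

E_{\pm}(\mathbf{k}) = \pm\hbar v\,|\mathbf{k}|,\qquad E_{0}(\mathbf{k}) = 0,\qquad
|C_{\pm}| = 2,\quad C_{0} = 0
```

Because the eigenvalues of k·S are m|k| with m in {1, 0, -1}, three bands meet linearly at k = 0, and the two dispersive bands carry Chern numbers of magnitude 2 (signs depend on chirality and convention), twice that of an ordinary spin-1/2 Weyl point. A charge of 2 is what requires two surface Fermi arcs.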
Form-NLU: Dataset for the Form Language Understanding
Compared to general document analysis tasks, form document structure
understanding and retrieval are more challenging. Form documents are typically made
by two types of authors: a form designer, who develops the form structure and
keys, and a form user, who fills out form values based on the provided keys.
Hence, the form values may not be aligned with the form designer's intention
(structure and keys) if a form user gets confused. In this paper, we introduce
Form-NLU, the first dataset for form structure understanding and its key
and value information extraction, interpreting the form designer's intent and
the alignment of user-written values with it. It consists of 857 form images, 6k
form keys and values, and 4k table keys and values. Our dataset also includes
three form types: digital, printed, and handwritten, which cover diverse form
appearances and layouts. We propose a robust positional and logical
relation-based form key-value information extraction framework. Using this
dataset, Form-NLU, we first examine strong object detection models for form
layout understanding, then evaluate the key information extraction task on the
dataset, providing fine-grained results for different types of forms and keys.
Furthermore, we examine it with an off-the-shelf PDF layout extraction tool
and demonstrate its feasibility in real-world cases. Comment: Accepted by SIGIR 202
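The abstract does not detail the positional and logical relations the framework uses. Purely as an illustration of the positional half of that idea, the sketch below links each user-written value to a key that lies to its left on the same line or directly above it, choosing the nearest such key by centre distance. `Box`, `is_candidate_key` and `link_values` are hypothetical names, and this heuristic is not the Form-NLU framework.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Box:
    """A text box on a form page; y grows downwards."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self) -> Tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def is_candidate_key(key: Box, value: Box) -> bool:
    """Hypothetical positional rule: the key sits left of the value on the
    same line, or directly above it with some horizontal overlap."""
    same_line = key.y0 < value.y1 and value.y0 < key.y1
    left_of = key.x1 <= value.x0
    above = key.y1 <= value.y0 and key.x0 < value.x1 and value.x0 < key.x1
    return (same_line and left_of) or above


def link_values(keys: List[Box], values: List[Box]) -> Dict[str, str]:
    """Map each value's text to the text of its nearest candidate key, if any."""
    links: Dict[str, str] = {}
    for value in values:
        candidates = [k for k in keys if is_candidate_key(k, value)]
        if candidates:
            best = min(candidates, key=lambda k: math.dist(k.center, value.center))
            links[value.text] = best.text
    return links
```

Values with no plausible key are left unlinked rather than forced onto a distant key; a learned model would replace this fixed rule with relation features fed to a classifier.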
EVD Surgical Guidance with Retro-Reflective Tool Tracking and Spatial Reconstruction using Head-Mounted Augmented Reality Device
Augmented Reality (AR) has been used to facilitate surgical guidance during
External Ventricular Drain (EVD) surgery, reducing the risk of misplacement in
manual operations. During this procedure, the pivotal challenge is accurately
estimating the spatial relationship between pre-operative images and the actual
patient anatomy in the AR environment. In this research, we propose a novel
framework that uses the Time of Flight (ToF) depth sensors integrated into
commercially available AR Head Mounted Devices (HMDs) for precise EVD surgical
guidance. As previous studies have shown that ToF sensors exhibit depth errors, we
first conducted a comprehensive assessment of the properties of this error on
AR-HMDs. Subsequently, a depth error model and a patient-specific model parameter
identification method are introduced to obtain accurate surface information. After
that, a tracking procedure combining retro-reflective markers and point clouds
is proposed for accurate head tracking, in which the head surface is reconstructed
from ToF sensor data for spatial registration, avoiding the need to fix tracking
targets rigidly to the patient's cranium. First, ToF
sensor depth value errors were observed on human skin, indicating the
significance of depth correction. Our results showed that the ToF sensor depth
error was reduced by over using the proposed depth correction method on head
phantoms in different materials. Meanwhile, the head surface reconstructed with
corrected depth data achieved sub-millimeter accuracy. An experiment on a sheep
head revealed reconstruction error. Furthermore, a user study was
conducted to evaluate the performance of the proposed framework in simulated EVD surgery,
in which 5 surgeons performed 9 k-wire injections on a head phantom with virtual
guidance. Results of this study revealed translational
accuracy and orientational accuracy.
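The abstract does not state the form of the depth error model. As an assumption for illustration, the sketch below treats the ToF depth error as linear in the measured depth, identifies its two parameters by ordinary least squares from paired measured/reference depths (standing in for patient-specific parameter identification), and subtracts the modelled error from new readings. `fit_linear_error` and `correct_depth` are hypothetical names; the paper's model may include further terms, for example for surface material.

```python
from typing import Sequence, Tuple


def fit_linear_error(measured: Sequence[float], reference: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of depth error = a * measured + b.

    The linear form is an assumption for illustration; `reference` holds
    ground-truth depths for the same surface points (any consistent unit).
    """
    n = len(measured)
    if n < 2 or n != len(reference):
        raise ValueError("need at least two paired depth samples")
    errors = [m - r for m, r in zip(measured, reference)]
    mean_d = sum(measured) / n
    mean_e = sum(errors) / n
    sxx = sum((d - mean_d) ** 2 for d in measured)
    if sxx == 0:
        raise ValueError("measured depths must not all be equal")
    sxy = sum((d - mean_d) * (e - mean_e) for d, e in zip(measured, errors))
    a = sxy / sxx
    return a, mean_e - a * mean_d


def correct_depth(measured: float, a: float, b: float) -> float:
    """Subtract the modelled error from a raw ToF depth reading."""
    return measured - (a * measured + b)
```

Fitting the error rather than the true depth directly keeps the correction small when the sensor is nearly accurate, and the same two functions could be refitted per patient or per material.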
Observation of vortex-string chiral modes in metamaterials
As hypothetical topological defects in the geometry of spacetime, vortex
strings are thought to play a crucial role in shaping the clusters of galaxies that exist
today, and their distinct features can provide observable clues about the early
universe's evolution. A key feature of vortex strings is that they can interact
with Weyl fermionic modes and support topological chiral-anomaly states with
massless dispersions at the core of strings. To date, despite many attempts to
detect vortex strings in astrophysics or to emulate them in artificially
created systems, observation of these topological vortex-string chiral modes
remains experimentally elusive. Here we report the experimental observation of
such vortex-string chiral modes using a metamaterial system, implemented
through an inhomogeneous perturbation of a Yang-monopole phononic
metamaterial. The measured linear dispersion and modal profiles confirm the
existence of topological modes bound to and propagating along the vortex string
with the chiral anomaly. Our work not only provides a platform for studying
diverse cosmic topological defects in astrophysics but also offers intriguing
device applications as topological fibres in signal processing and
communication techniques. Comment: 3 figures
DDI-MuG: Multi-aspect graphs for drug-drug interaction extraction
Introduction: Drug-drug interactions (DDIs) may lead to adverse reactions in patients, so it is important to extract such knowledge from biomedical texts. However, previously proposed approaches typically focus on capturing sentence-level information while ignoring valuable knowledge from the whole corpus. In this paper, we propose a Multi-aspect Graph-based DDI extraction model, named DDI-MuG.
Methods: We first employ a bio-specific pre-trained language model to obtain contextualized token representations. Then we use two graphs to capture syntactic information from the input instance and word co-occurrence information within the entire corpus, respectively. Finally, we combine the representations of drug entities and verb tokens for the final classification.
Results: To validate the effectiveness of the proposed model, we perform extensive experiments on two widely used DDI extraction datasets, DDIExtraction-2013 and TAC 2018. Our model outperforms all twelve state-of-the-art models.
Discussion: In contrast to the majority of earlier models, which rely on a black-box approach, our model enables visualization of crucial words and their interrelationships by utilizing edge information from the two graphs. To the best of our knowledge, this is the first model to apply multi-aspect graphs to the DDI extraction task, and we hope it can establish a foundation for more robust multi-aspect work in the future.
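The abstract does not specify how the corpus-level co-occurrence graph is weighted. A common recipe, borrowed here from TextGCN as an assumption, slides a fixed-size window over every document and keeps word pairs whose pointwise mutual information (PMI) is positive. `cooccurrence_graph` and its `window` parameter are hypothetical names.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def cooccurrence_graph(corpus: List[List[str]], window: int = 3) -> Dict[Tuple[str, str], float]:
    """PMI-weighted word co-occurrence edges over sliding windows.

    The TextGCN-style weighting is an assumption; DDI-MuG may differ.
    Edge keys are alphabetically sorted word pairs.
    """
    word_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    n_windows = 0
    for tokens in corpus:
        if not tokens:
            continue
        # A document shorter than the window counts as a single window.
        for start in range(max(1, len(tokens) - window + 1)):
            unique = sorted(set(tokens[start:start + window]))
            n_windows += 1
            word_counts.update(unique)
            pair_counts.update(combinations(unique, 2))
    edges: Dict[Tuple[str, str], float] = {}
    for (u, v), count in pair_counts.items():
        pmi = math.log(count * n_windows / (word_counts[u] * word_counts[v]))
        if pmi > 0:  # keep only positively associated word pairs
            edges[(u, v)] = pmi
    return edges
```

Because the counts run over the entire corpus, an edge such as one between two drug names can carry evidence from sentences other than the one being classified, which is the kind of cross-sentence knowledge the abstract contrasts with sentence-level approaches.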
Robust estimation of bacterial cell count from optical density
Optical density (OD) is widely used to estimate the density of cells in liquid culture, but it cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres. This protocol produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, also assesses the instrument's effective linear range, and can be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements: in our study, fluorescence per cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data.
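The arithmetic behind a microsphere calibration can be sketched as follows, assuming the particle count of the first well is known from the microsphere stock and each later well is diluted by a fixed factor. Blank-corrected readings inside an assumed linear range are fitted to a line through the origin, and the fold residual of each point (the larger of predicted/observed and observed/predicted, so 1.0 is a perfect fit) matches the "-fold" residuals quoted above. The function and parameter names are hypothetical, and the published protocol's exact fitting procedure may differ.

```python
from typing import List, Sequence, Tuple


def fit_calibration(initial_particles: float, od_readings: Sequence[float], blank: float,
                    dilution_factor: float = 2.0, od_min: float = 0.0,
                    od_max: float = 1.0) -> Tuple[float, List[float]]:
    """Fit particles per OD unit from a microsphere serial dilution.

    od_readings[i] is the raw OD of dilution step i, which contains
    initial_particles / dilution_factor**i particles. Only blank-corrected
    readings inside (od_min, od_max], an assumed linear range, are used.
    Returns (particles per OD unit, fold residual of each used point).
    """
    points = []
    particles = initial_particles
    for od in od_readings:
        net = od - blank
        if od_min < net <= od_max:
            points.append((particles, net))
        particles /= dilution_factor
    if len(points) < 2:
        raise ValueError("need at least two readings inside the linear range")
    # Least squares for net_od = k * particles (a line through the origin).
    k = sum(p * y for p, y in points) / sum(p * p for p, _ in points)
    residuals = [max(k * p / y, y / (k * p)) for p, y in points]
    return 1.0 / k, residuals


def estimate_cells(od: float, blank: float, particles_per_od: float) -> float:
    """Convert a blank-corrected culture OD into an estimated cell count."""
    return (od - blank) * particles_per_od
```

Readings that fall outside the linear range are simply excluded from the fit, which is one way the same dilution series can also reveal where an instrument stops responding linearly.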
Roadmap on energy harvesting materials
Ambient energy harvesting has great potential to contribute to sustainable development and address growing environmental challenges. Converting waste energy from energy-intensive processes and systems (e.g. combustion engines and furnaces) is crucial to reducing their environmental impact and achieving net-zero emissions. Compact energy harvesters will also be key to powering the exponentially growing smart-device ecosystem that is part of the Internet of Things, thus enabling futuristic applications that can improve our quality of life (e.g. smart homes, smart cities, smart manufacturing, and smart healthcare). To achieve these goals, innovative materials are needed to efficiently convert ambient energy into electricity through various physical mechanisms, such as the photovoltaic effect, thermoelectricity, piezoelectricity, triboelectricity, and radiofrequency wireless power transfer. By bringing together the perspectives of experts in various types of energy harvesting materials, this Roadmap provides extensive insights into recent advances and present challenges in the field. Additionally, the Roadmap analyses the key performance metrics of these technologies in relation to their ultimate energy conversion limits. Building on these insights, the Roadmap outlines promising directions for future research to fully harness the potential of energy harvesting materials for green energy anytime, anywhere.
The Effects of De-Capacity Policy on Steel and Coal Firms’ Profitability: Evidence from China’s Listed Companies
Chinese overcapacity in the steel and coal industries has been rising since 2013, leading to the misallocation of resources and decreased production efficiency. In 2015, the Chinese central government adopted a series of de-capacity policies to resolve excess capacity and improve corporate profitability. However, there is scant evidence on the impact of de-capacity policies on firm profitability. Using data from Chinese listed companies in the steel and coal industries, this study applies a difference-in-differences (DID) method to empirically investigate the effects of the de-capacity policy on the profitability of these companies. The results show that the de-capacity policy significantly increases the return on equity (ROE) of the treatment group, so that it exceeds that of the control group by 12.4%. This is partly due to improvements in gross profit margin, management efficiency, and return on manpower brought about by the policy. This study offers new firm-level evidence on the effectiveness of China's de-capacity policy in the steel and coal industries.
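The abstract does not give the study's regression specification, so the sketch below shows only the textbook 2x2 difference-in-differences contrast: the change in mean outcome (here, ROE) for treated firms before and after the policy, minus the same change for control firms. `did_estimate` and its record layout are hypothetical, and no data from the study are used.

```python
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def did_estimate(records: Iterable[Tuple[bool, bool, float]]) -> float:
    """2x2 difference-in-differences from (treated, post, outcome) records."""
    cells: Dict[Tuple[bool, bool], List[float]] = {
        (t, p): [] for t in (True, False) for p in (True, False)
    }
    for treated, post, outcome in records:
        cells[(bool(treated), bool(post))].append(outcome)
    if any(not values for values in cells.values()):
        raise ValueError("every treated/post cell needs at least one observation")
    m = {cell: mean(values) for cell, values in cells.items()}
    treated_change = m[(True, True)] - m[(True, False)]
    control_change = m[(False, True)] - m[(False, False)]
    # Subtracting the control group's change removes common time trends.
    return treated_change - control_change
```

In the equivalent saturated regression Y = α + β1·Treat + β2·Post + β3·(Treat × Post) + ε, the OLS estimate of β3 equals this difference of group means; firm-level panel versions of the design usually add firm and year fixed effects and control variables.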