GoSum: Extractive Summarization of Long Documents by Reinforcement Learning and Graph Organized discourse state
Extracting summaries from long documents can be regarded as sentence
classification using the structural information of the documents. How to use
such structural information to summarize a document is challenging. In this
paper, we propose GoSum, a novel graph and reinforcement learning based
extractive model for long-paper summarization. In particular, GoSum encodes
sentence states in reinforcement learning by building a heterogeneous graph for
each input document at different discourse levels. Edges in the graph
reflect the discourse hierarchy of the document, which restrains semantic
drift across section boundaries. We evaluate GoSum on two datasets for
scientific article summarization: PubMed and arXiv. The experimental results
demonstrate that GoSum achieves state-of-the-art results compared with
strong baselines of both extractive and abstractive models. The ablation
studies further validate that the performance of GoSum benefits from the
use of discourse information.
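The graph organization described in this abstract can be illustrated with a minimal sketch. It assumes a three-level hierarchy (document, section, sentence) in which a sentence node links only to its own section node; the function name, the node encoding, and the choice of levels are hypothetical and are not taken from the GoSum implementation.

```python
from collections import defaultdict


def build_discourse_graph(sections):
    """Build an undirected graph over document, section, and sentence nodes.

    sections: list of (section_title, [sentence, ...]) pairs.
    Nodes are tuples: ("doc",), ("sec", i), ("sent", i, j).
    This layout is an illustrative assumption, not the GoSum graph itself.
    """
    graph = defaultdict(set)

    def connect(a, b):
        graph[a].add(b)
        graph[b].add(a)

    doc = ("doc",)
    for i, (_title, sentences) in enumerate(sections):
        sec = ("sec", i)
        connect(doc, sec)  # section nodes hang off the document node
        for j, _sentence in enumerate(sentences):
            # A sentence attaches only to its own section, so information
            # crosses section boundaries only via the section/document nodes.
            connect(sec, ("sent", i, j))
    return dict(graph)


graph = build_discourse_graph([
    ("Introduction", ["First sentence.", "Second sentence."]),
    ("Method", ["Third sentence."]),
])
```

In this toy layout, two sentences from different sections are at least four hops apart, which is one simple way to read the abstract's point about limiting drift across section boundaries.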
HELLaMA: LLaMA-based Table to Text Generation by Highlighting the Important Evidence
Large models have demonstrated significant progress across various domains,
particularly in tasks related to text generation. In the domain of Table to
Text, many Large Language Model (LLM)-based methods currently resort to
modifying prompts to invoke public APIs, incurring potential costs and
information leaks. With the advent of open-source large models, fine-tuning
LLMs has become feasible. In this study, we conducted parameter-efficient
fine-tuning on the LLaMA2 model. Distinguishing itself from previous
fine-tuning-based table-to-text methods, our approach involves injecting
reasoning information into the input by emphasizing table-specific row data.
Our model consists of two modules: 1) a table reasoner that identifies relevant
row evidence, and 2) a table summarizer that generates sentences based on the
highlighted table. To facilitate this, we propose a search strategy to
construct reasoning labels for training the table reasoner. On both the FetaQA
and QTSumm datasets, our approach achieved state-of-the-art results.
Additionally, we observed that highlighting input tables significantly enhances
the model's performance and provides valuable interpretability.
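As a rough illustration of highlighting row evidence in the model input, the sketch below linearizes a table and wraps the rows flagged as evidence in a marker. The function name, the marker character, and the pipe-separated layout are assumptions made for this example; the abstract does not specify the input format used by HELLaMA.

```python
def linearize_table(header, rows, evidence_rows, marker="*"):
    """Flatten a table to text, wrapping evidence rows in a marker.

    header: list of column names.
    rows: list of rows, each a list of cell values.
    evidence_rows: set of row indices judged relevant (here supplied by hand;
    in a two-module setup they would come from a separate reasoner).
    """
    lines = [" | ".join(header)]
    for idx, row in enumerate(rows):
        text = " | ".join(str(cell) for cell in row)
        if idx in evidence_rows:
            text = f"{marker} {text} {marker}"  # highlight the evidence row
        lines.append(text)
    return "\n".join(lines)


prompt_table = linearize_table(
    ["team", "wins"],
    [["A", 3], ["B", 5]],
    evidence_rows={1},
)
```

The highlighted string would then be placed in the prompt given to the summarizer, so the emphasis is visible to the model and to a human inspecting the input.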
Biomedical Entity Recognition by Detection and Matching
Biomedical named entity recognition (BNER) serves as the foundation for
numerous biomedical text mining tasks. Unlike general NER, BNER requires a
comprehensive grasp of the domain, and incorporating external knowledge beyond
training data poses a significant challenge. In this study, we propose a novel
BNER framework called DMNER. By leveraging the existing entity representation
model SAPBERT, we tackle BNER as a two-step process: entity boundary detection
and biomedical entity matching. DMNER exhibits applicability across multiple
NER scenarios: 1) In supervised NER, we observe that DMNER effectively
rectifies the output of baseline NER models, thereby further enhancing
performance. 2) In distantly supervised NER, combining MRC and AutoNER as span
boundary detectors enables DMNER to achieve satisfactory results. 3) For
training NER by merging multiple datasets, we adopt a framework similar to
DS-NER but additionally leverage ChatGPT to obtain high-quality phrases in the
training. Through extensive experiments conducted on 10 benchmark datasets, we
demonstrate the versatility and effectiveness of DMNER.
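The two-step idea (detect span boundaries, then match each span against known entities) can be sketched as follows. This is a toy stand-in: character trigram overlap replaces the SAPBERT embedding similarity, the candidate spans are supplied by hand rather than by a boundary detector, and all names, the dictionary, and the threshold are hypothetical.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, padded string."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity(a, b):
    """Jaccard overlap of character trigrams (stand-in for embedding similarity)."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


def match_entities(spans, dictionary, threshold=0.5):
    """Keep spans whose best dictionary match clears the threshold.

    spans: candidate strings from a boundary detector (step 1).
    dictionary: non-empty mapping of entity name -> entity type (step 2).
    """
    results = []
    for span in spans:
        best = max(dictionary, key=lambda name: similarity(span, name))
        if similarity(span, best) >= threshold:
            results.append((span, dictionary[best]))
    return results


dictionary = {"aspirin": "Chemical", "breast cancer": "Disease"}
matched = match_entities(["Aspirin", "the patient"], dictionary)
```

Separating the steps means a span proposed by the detector can still be rejected at matching time, which is one way to read the abstract's claim that matching rectifies the output of a baseline NER model.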
Proteomics study of changes in soybean lines resistant and sensitive to Phytophthora sojae
Background: Phytophthora sojae causes soybean root and stem rot, resulting in an annual loss of 1-2 billion US dollars in soybean production worldwide. A proteomic technique was used to determine the effects on soybean hypocotyls of infection with P. sojae. Results: In the present study, 46 differentially expressed proteins were identified in soybean hypocotyls infected with P. sojae, using two-dimensional electrophoresis and matrix-assisted laser desorption/ionization tandem time of flight (MALDI-TOF/TOF). The expression levels of 26 proteins were significantly affected at various time points in the tolerant soybean line, Yudou25 (12 up-regulated and 14 down-regulated). In contrast, in the sensitive soybean line, NG6255, only 20 proteins were significantly affected (11 up-regulated and 9 down-regulated). Among these proteins, 26% were related to energy regulation, 15% to protein destination and storage, 11% to defense against disease, 11% to metabolism, 9% to protein synthesis, 4% to secondary metabolism, and 24% were of unknown function. Conclusion: Our study provides important information on the use of proteomic methods for studying protein regulation during plant-oomycete interactions.
Self-organized Voids Revisited: Experimental Verification of the Formation Mechanism
In this paper, several experiments were conducted to further clarify the
formation mechanism of self-organized void arrays induced by a single laser
beam, including energy-related experiments, refractive-index-contrast-related
experiments, depth-related experiments and effective-numerical-aperture
experiments. These experiments indicate that the interface spherical aberration
is indeed responsible for the formation of void arrays.
The United States COVID-19 Forecast Hub dataset
Academic researchers, government agencies, industry groups, and individuals have produced forecasts at an unprecedented scale during the COVID-19 pandemic. To leverage these forecasts, the United States Centers for Disease Control and Prevention (CDC) partnered with an academic research lab at the University of Massachusetts Amherst to create the US COVID-19 Forecast Hub. Launched in April 2020, the Forecast Hub is a dataset with point and probabilistic forecasts of incident cases, incident hospitalizations, incident deaths, and cumulative deaths due to COVID-19 at the county, state, and national levels in the United States. Included forecasts represent a variety of modeling approaches, data sources, and assumptions regarding the spread of COVID-19. The goal of this dataset is to establish a standardized and comparable set of short-term forecasts from modeling teams. These data can be used to develop ensemble models, communicate forecasts to the public, create visualizations, compare models, and inform policies regarding COVID-19 mitigation. These open-source data are available via download from GitHub, through an online API, and through R packages.
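To make the ensemble use case concrete, here is a minimal sketch that combines probabilistic forecasts by taking the median across models at each quantile level. The input layout, the model names, and the numbers are invented for illustration, and a quantile-wise median is only one possible combination rule; nothing here reflects the Forecast Hub's actual file format or API.

```python
from statistics import median


def median_ensemble(forecasts):
    """Combine quantile forecasts by a per-quantile median across models.

    forecasts: {model_name: {quantile_level: predicted_value}}.
    Only quantile levels reported by every model are combined.
    """
    shared = set.intersection(*(set(q) for q in forecasts.values()))
    return {
        level: median(model[level] for model in forecasts.values())
        for level in sorted(shared)
    }


# Hypothetical one-week-ahead incident death forecasts from three models.
forecasts = {
    "model_a": {0.025: 10, 0.5: 20, 0.975: 40},
    "model_b": {0.025: 12, 0.5: 26, 0.975: 50},
    "model_c": {0.025: 8, 0.5: 22, 0.975: 45},
}
ensemble = median_ensemble(forecasts)
```

A median is less sensitive to a single outlying model than a mean, which is one reason it is a common default when the contributing models vary widely in approach.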