27 research outputs found

    GoSum: Extractive Summarization of Long Documents by Reinforcement Learning and Graph Organized discourse state

    Extracting summaries from long documents can be regarded as sentence classification using the structural information of the documents. How to use such structural information to summarize a document is challenging. In this paper, we propose GoSum, a novel graph and reinforcement learning based extractive model for long-paper summarization. In particular, GoSum encodes sentence states in reinforcement learning by building a heterogeneous graph for each input document at different discourse levels. An edge in the graph reflects the discourse hierarchy of a document, restraining semantic drift across section boundaries. We evaluate GoSum on two datasets for scientific article summarization: PubMed and arXiv. The experimental results demonstrate that GoSum achieves state-of-the-art results compared with strong baselines of both extractive and abstractive models. The ablation studies further validate that the performance of GoSum benefits from the use of discourse information.

    HELLaMA: LLaMA-based Table to Text Generation by Highlighting the Important Evidence

    Large models have demonstrated significant progress across various domains, particularly in tasks related to text generation. In the domain of table-to-text generation, many Large Language Model (LLM)-based methods currently resort to modifying prompts to invoke public APIs, incurring potential costs and information leaks. With the advent of open-source large models, fine-tuning LLMs has become feasible. In this study, we conducted parameter-efficient fine-tuning on the LLaMA2 model. Unlike previous fine-tuning-based table-to-text methods, our approach injects reasoning information into the input by emphasizing table-specific row data. Our model consists of two modules: 1) a table reasoner that identifies relevant row evidence, and 2) a table summarizer that generates sentences based on the highlighted table. To facilitate this, we propose a search strategy to construct reasoning labels for training the table reasoner. On both the FeTaQA and QTSumm datasets, our approach achieved state-of-the-art results. Additionally, we observed that highlighting input tables significantly enhances the model's performance and provides valuable interpretability.

    Biomedical Entity Recognition by Detection and Matching

    Biomedical named entity recognition (BNER) serves as the foundation for numerous biomedical text mining tasks. Unlike general NER, BNER requires a comprehensive grasp of the domain, and incorporating external knowledge beyond training data poses a significant challenge. In this study, we propose a novel BNER framework called DMNER. By leveraging the existing entity representation model SapBERT, we tackle BNER as a two-step process: entity boundary detection and biomedical entity matching. DMNER exhibits applicability across multiple NER scenarios: 1) In supervised NER, we observe that DMNER effectively rectifies the output of baseline NER models, thereby further enhancing performance. 2) In distantly supervised NER, combining MRC and AutoNER as span boundary detectors enables DMNER to achieve satisfactory results. 3) For training NER by merging multiple datasets, we adopt a framework similar to DS-NER but additionally leverage ChatGPT to obtain high-quality phrases for training. Through extensive experiments conducted on 10 benchmark datasets, we demonstrate the versatility and effectiveness of DMNER.

    Proteomics study of changes in soybean lines resistant and sensitive to Phytophthora sojae

    Background: Phytophthora sojae causes soybean root and stem rot, resulting in an annual loss of 1-2 billion US dollars in soybean production worldwide. A proteomic technique was used to determine the effects on soybean hypocotyls of infection with P. sojae. Results: In the present study, 46 differentially expressed proteins were identified in soybean hypocotyls infected with P. sojae, using two-dimensional electrophoresis and matrix-assisted laser desorption/ionization tandem time of flight (MALDI-TOF/TOF). The expression levels of 26 proteins were significantly affected at various time points in the tolerant soybean line, Yudou25 (12 up-regulated and 14 down-regulated). In contrast, in the sensitive soybean line, NG6255, only 20 proteins were significantly affected (11 up-regulated and 9 down-regulated). Among these proteins, 26% were related to energy regulation, 15% to protein destination and storage, 11% to defense against disease, 11% to metabolism, 9% to protein synthesis, 4% to secondary metabolism, and 24% were of unknown function. Conclusion: Our study provides important information on the use of proteomic methods for studying protein regulation during plant-oomycete interactions.

    Self-organized Voids Revisited: Experimental Verification of the Formation Mechanism

    In this paper, several experiments were conducted to further clarify the formation mechanism of self-organized void arrays induced by a single laser beam, including energy-related experiments, refractive-index-contrast-related experiments, depth-related experiments, and effective-numerical-aperture experiments. These experiments indicate that the interface spherical aberration is indeed responsible for the formation of void arrays.

    The United States COVID-19 Forecast Hub dataset

    Academic researchers, government agencies, industry groups, and individuals have produced forecasts at an unprecedented scale during the COVID-19 pandemic. To leverage these forecasts, the United States Centers for Disease Control and Prevention (CDC) partnered with an academic research lab at the University of Massachusetts Amherst to create the US COVID-19 Forecast Hub. Launched in April 2020, the Forecast Hub is a dataset with point and probabilistic forecasts of incident cases, incident hospitalizations, incident deaths, and cumulative deaths due to COVID-19 at county, state, and national levels in the United States. Included forecasts represent a variety of modeling approaches, data sources, and assumptions regarding the spread of COVID-19. The goal of this dataset is to establish a standardized and comparable set of short-term forecasts from modeling teams. These data can be used to develop ensemble models, communicate forecasts to the public, create visualizations, compare models, and inform policies regarding COVID-19 mitigation. These open-source data are available via download from GitHub, through an online API, and through R packages.