34 research outputs found

MemSum-DQA: Adapting An Efficient Long Document Extractive Summarizer for Document Question Answering

Author: Gao Yingqiang
Gu Nianlong
Hahnloser Richard H. R.
Publication date: 10/10/2023

We introduce MemSum-DQA, an efficient system for document question answering (DQA) that leverages MemSum, a long document extractive summarizer. By prefixing each text block in the parsed document with the provided question and question type, MemSum-DQA selectively extracts text blocks as answers from documents. On full-document answering tasks, this approach yields a 9% improvement in exact match accuracy over prior state-of-the-art baselines. Notably, MemSum-DQA excels in addressing questions related to child-relationship understanding, underscoring the potential of extractive summarization techniques for DQA tasks. Comment: This paper is the technical research paper of CIKM 2023 DocIU challenges. The authors received the CIKM 2023 DocIU Winner Award, sponsored by Google, Microsoft, and the Centre for data-driven geoscienc
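The prefixing step described in this abstract can be illustrated with a minimal sketch. The function name `prefix_blocks`, the ` [SEP] ` separator, the field order, and the example question type `"extractive"` are all assumptions made for illustration; the abstract does not specify the actual input format, and the extraction model itself is not shown.

```python
def prefix_blocks(blocks, question, question_type, sep=" [SEP] "):
    """Prepend the question and its type to every parsed text block.

    The separator and the field order are illustrative assumptions,
    not the format used by the authors.
    """
    return [f"{question}{sep}{question_type}{sep}{block}" for block in blocks]


# Hypothetical parsed document and question.
blocks = ["Table 1 lists revenue by region.", "Figure 2 shows the workflow."]
inputs = prefix_blocks(blocks, "What does Table 1 list?", "extractive")
```

Each element of `inputs` would then be scored by an extractive model that decides which blocks to return as the answer.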

arXiv.org e-Print Archive

MemSum-DQA: Adapting An Efficient Long Document Extractive Summarizer for Document Question Answering

Author: Gao Yingqiang
Gu Nianlong
Hahnloser Richard H R
Publication date: 01/01/2023

ZORA

Local Citation Recommendation with Hierarchical-Attention Text Encoder and SciBERT-based Reranking

Author: Gao Yingqiang
Gu Nianlong
Hahnloser Richard H R
Publication date: 01/01/2021

The goal of local citation recommendation is to recommend a missing reference from the local citation context and optionally also from the global context. To balance the tradeoff between speed and accuracy of citation recommendation in the context of a large-scale paper database, a viable approach is to first prefetch a limited number of relevant documents using efficient ranking methods and then to perform a fine-grained reranking using more sophisticated models. In that vein, BM25 has been found to be a tough-to-beat approach to prefetching, which is why recent work has focused mainly on the reranking step. Even so, we explore prefetching with nearest neighbor search among text embeddings constructed by a hierarchical attention network. When coupled with a SciBERT reranker fine-tuned on local citation recommendation tasks, our hierarchical Attention encoder (HAtten) achieves high prefetch recall for a given number of candidates to be reranked. Consequently, our reranker requires fewer prefetch candidates to rerank, yet still achieves state-of-the-art performance on various local citation recommendation datasets such as ACL-200, FullTextPeerRead, RefSeer, and arXiv
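The two-stage pipeline in this abstract (prefetch a small candidate set cheaply, then rerank it with a stronger model) can be sketched as follows. This is a toy illustration only: the bag-of-words `embed` function stands in for a learned encoder such as HAtten, and the `jaccard` scorer stands in for a fine-tuned reranker such as SciBERT. All function names and the example corpus are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a learned text encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prefetch(query, corpus, k):
    """Stage 1: keep the k documents nearest to the query in embedding space."""
    q = embed(query)
    return sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]


def rerank(query, candidates, score_fn):
    """Stage 2: reorder the prefetched candidates with a costlier scorer."""
    return sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)


def jaccard(query, doc):
    """Toy stand-in for a neural reranker: token-set overlap."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


corpus = [
    "neural machine translation with attention",
    "citation recommendation with neural reranking",
    "photocatalytic water splitting with carbon tubes",
    "local citation recommendation using context",
]
query = "local citation recommendation"
candidates = prefetch(query, corpus, k=2)
ranked = rerank(query, candidates, jaccard)
```

The design point the abstract makes is that a higher prefetch recall at small `k` lets the expensive reranker process fewer candidates without losing the correct reference.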

ZORA

Do Discourse Indicators Reflect the Main Arguments in Scientific Papers?

Author: Gao Yingqiang
Gu Nianlong
Hahnloser Richard H R
Lam Jessica
Publication venue: s.n.
Publication date: 17/10/2022

In scientific papers, arguments are essential for explaining authors' findings. As substrates of the reasoning process, arguments are often decorated with discourse indicators such as "which shows that" or "suggesting that". However, it remains understudied whether discourse indicators by themselves can be used as an effective marker of the local argument components (LACs) in the body text that support the main claim in the abstract, i.e., the global argument. In this work, we investigate whether discourse indicators reflect the global premise and conclusion. We construct a set of regular expressions for over 100 word- and phrase-level discourse indicators and measure the alignment of LACs extracted by discourse indicators with the global arguments. We find a positive correlation between the alignment of local premises and local conclusions. However, compared to a simple textual intersection baseline, discourse indicators achieve lower ROUGE recall and have limited capability of extracting LACs relevant to the global argument; thus their role in scientific reasoning is less salient than expected.
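A regular-expression extractor of the kind this abstract describes might look like the sketch below. Only the two indicators quoted in the abstract are included; the authors' full set of over 100 patterns is not reproduced here. Treating the text before the indicator as the premise and the text after it as the conclusion is a simplifying assumption, and `split_on_indicator` is a hypothetical name.

```python
import re

# Only the two indicators quoted in the abstract; the real list is much longer.
INDICATORS = ["which shows that", "suggesting that"]
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in INDICATORS) + r")\b",
    re.IGNORECASE,
)


def split_on_indicator(sentence):
    """Split a sentence at the first discourse indicator.

    Returns (premise, indicator, conclusion), or None when no indicator
    is found. The premise/conclusion assignment is an assumption.
    """
    match = PATTERN.search(sentence)
    if match is None:
        return None
    premise = sentence[:match.start()].strip(" ,;")
    conclusion = sentence[match.end():].strip(" ,;.")
    return premise, match.group(1).lower(), conclusion


result = split_on_indicator(
    "Recall drops on long inputs, suggesting that the encoder truncates context."
)
```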

ZORA

GreedyCAS: Unsupervised Scientific Abstract Segmentation with Normalized Mutual Information

Author: Gao Yingqiang
Gu Nianlong
Hahnloser Richard H R
Lam Jessica
Publication venue: Association for Computational Linguistics
Publication date: 01/12/2023

The abstracts of scientific papers typically contain both premises (e.g., background and observations) and conclusions. Although conclusion sentences are highlighted in structured abstracts, in non-structured abstracts the concluding information is not explicitly marked, which makes the automatic segmentation of conclusions from scientific abstracts a challenging task. In this work, we explore Normalized Mutual Information (NMI) as a means for abstract segmentation. We consider each abstract as a recurrent cycle of sentences and place two segmentation boundaries by greedily optimizing the NMI score between the two segments, assuming that conclusions are strongly semantically linked with preceding premises. On non-structured abstracts, our proposed unsupervised approach GreedyCAS achieves the best performance across all evaluation metrics; on structured abstracts, GreedyCAS outperforms all baseline methods measured by Pk. The strong correlation of NMI to our evaluation metrics reveals the effectiveness of NMI for abstract segmentation

ZORA

Character-Level Translation with Self-attention

Author: Gao Yingqiang
Hahnloser Richard H. R.
Hu Yuhuang
Nikolov Nikola I.
Publication date: 01/01/2020

We explore the suitability of self-attention models for character-level neural machine translation. We test the standard transformer model, as well as a novel variant in which the encoder block combines information from nearby characters using convolutions. We perform extensive experiments on WMT and UN datasets, testing both bilingual and multilingual translation to English using up to three input languages (French, Spanish, and Chinese). Our transformer variant consistently outperforms the standard transformer at the character-level and converges faster while learning more robust character-level alignments. Comment: ACL 202

arXiv.org e-Print Archive

Repository for Publications and Research Data

Crossref

ZORA

Using a k-means clustering to identify novel phenotypes of acute ischemic stroke and development of its Clinlabomics models

Author: Boyao Yuan
Chongge You
Lina Gao
Qian Wu
Yao Jiang
Yingqiang Dang
Publication venue: Frontiers Media S.A.
Publication date: 01/03/2024

Objective: Acute ischemic stroke (AIS) is a heterogeneous condition. To stratify the heterogeneity, identify novel phenotypes, and develop Clinlabomics models of phenotypes that can conduct more personalized treatments for AIS. Methods: In a retrospective analysis, consecutive AIS and non-AIS inpatients were enrolled. An unsupervised k-means clustering algorithm was used to classify AIS patients into distinct novel phenotypes. Besides, the intergroup comparisons across the phenotypes were performed in clinical and laboratory data. Next, the least absolute shrinkage and selection operator (LASSO) algorithm was used to select essential variables. In addition, Clinlabomics predictive models of phenotypes were established by a support vector machines (SVM) classifier. We used the area under curve (AUC), accuracy, sensitivity, and specificity to evaluate the performance of the models. Results: Of the three derived phenotypes in 909 AIS patients [median age 64 (IQR: 17) years, 69% male], in phenotype 1 (N = 401), patients were relatively young and obese and had significantly elevated levels of lipids. Phenotype 2 (N = 463) was associated with abnormal ion levels. Phenotype 3 (N = 45) was characterized by the highest level of inflammation, accompanied by mild multiple-organ dysfunction. The external validation cohort prospectively collected 507 AIS patients [median age 60 (IQR: 18) years, 70% male]. Phenotype characteristics were similar in the validation cohort. After LASSO analysis, Clinlabomics models of phenotype 1 and 2 were constructed by the SVM algorithm, yielding high AUC (0.977, 95% CI: 0.961–0.993 and 0.984, 95% CI: 0.971–0.997), accuracy (0.936, 95% CI: 0.922–0.956 and 0.952, 95% CI: 0.938–0.972), sensitivity (0.984, 95% CI: 0.968–0.998 and 0.958, 95% CI: 0.939–0.984), and specificity (0.892, 95% CI: 0.874–0.926 and 0.945, 95% CI: 0.923–0.969). Conclusion: In this study, three novel phenotypes that reflected the abnormal variables of AIS patients were identified, and the Clinlabomics models of phenotypes were established, which are conducive to individualized treatments.
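For readers unfamiliar with the evaluation metrics named in this abstract, the standard definitions of accuracy, sensitivity, and specificity for a binary classifier can be sketched as below. The function name and the example labels are hypothetical and unrelated to the study's data; AUC and the confidence intervals are not covered.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # recall on positives
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # recall on negatives
    }


# Made-up labels purely for illustration.
metrics = classification_metrics(
    y_true=[1, 1, 1, 0, 0, 0, 0, 1],
    y_pred=[1, 1, 0, 0, 0, 1, 0, 1],
)
```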

Directory of Open Access Journals

MoS2 Nanosheets Assembled on Three-Way Nitrogen-Doped Carbon Tubes for Photocatalytic Water Splitting

Author: Bo Tang
Guanwei Cui
Hongyu Cui
Ping Chen
Wen Gao
Xifeng Shi
Yan Liu
Yanfei Fan
Yingqiang Zhao
Yujia Zhang
Publication venue: Frontiers Media SA
Publication date: 01/05/2019

In this work, a micron-sized three-way nitrogen-doped carbon tube covered with MoS2 nanosheets (TNCT@MoS2) was synthesized and applied in photocatalytic water splitting without any sacrificial agents for the first time. The micron-sized three-way nitrogen-doped carbon tube (TNCT) was facilely synthesized by the calcination of commercial sponge. The MoS2 nanosheets were assembled on the carbon tubes by a hydrothermal method. Compared with MoS2, the TNCT@MoS2 heterostructures showed higher H2 evolution rate, which was ascribed to the improved charge separation efficiency and the increased active sites afforded by the TNCT

Directory of Open Access Journals