    A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models

    Text summarization research has undergone several significant transformations with the advent of deep neural networks, pre-trained language models (PLMs), and, most recently, large language models (LLMs). This survey therefore provides a comprehensive review of the research progress and evolution of text summarization through the lens of these paradigm shifts. It is organized into two main parts: (1) a detailed overview of datasets, evaluation metrics, and summarization methods before the LLM era, encompassing traditional statistical methods, deep learning approaches, and PLM fine-tuning techniques; and (2) the first detailed examination of recent advances in benchmarking, modeling, and evaluating summarization in the LLM era. By synthesizing existing literature into a cohesive overview, this survey also discusses research trends and open challenges and proposes promising research directions, aiming to guide researchers through the evolving landscape of summarization research. Comment: 30 pages, 8 figures, 6 tables

    Crustal Structure of the Indochina Peninsula From Ambient Noise Tomography

    The collision between the Indian and Eurasian plates promotes the southeastward extrusion of the Indochina Peninsula, while the internal dynamics of its crustal deformation remain enigmatic. Here, we use seismic data from 38 stations and apply ambient noise tomography to construct a 3‐D crustal shear‐wave velocity (Vs) model beneath the Indochina Peninsula. A low‐Vs anomaly is revealed in the mid‐lower crust of the Shan‐Thai Block and probably corresponds to the southern extension of the crustal flow from SE Tibet. Although the Khorat Plateau behaves as a rigid block, the low‐Vs anomalies observed in the lower crust and below the Moho indicate that its crust may have been partially modified by mantle‐derived melts. Strike‐slip shearing along the Red River Fault may have predominantly driven crustal deformation on its western flank, where a low‐Vs anomaly is observed in the upper‐middle crust.
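
The basic measurement behind ambient noise tomography is that cross-correlating long noise records from two stations approximates the wavefield travelling between them, so a correlation peak emerges at a lag reflecting the inter-station travel time. The sketch below illustrates only that step on synthetic data. It assumes a single station pair, a known sample delay, and no spectral whitening, time-domain normalisation, stacking, dispersion measurement, or Vs inversion; the function names are hypothetical and not taken from the study.

```python
import random


def cross_correlate(a, b, max_lag):
    """Return {lag: sum_t a[t] * b[t + lag]} for lags in [-max_lag, max_lag].

    Assumes a and b are equally long records with the same sampling rate.
    """
    n = len(a)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        # Only sum over samples where both a[t] and b[t + lag] exist.
        for t in range(max(0, -lag), min(n, n - lag)):
            total += a[t] * b[t + lag]
        out[lag] = total
    return out


def delay_in_samples(a, b, max_lag):
    """Lag at which b best matches a; a positive lag means b records the signal later."""
    cc = cross_correlate(a, b, max_lag)
    return max(cc, key=cc.get)


# Synthetic example: station B records the same noise 37 samples after station A.
rng = random.Random(0)
n, true_delay = 2000, 37
source = [rng.gauss(0.0, 1.0) for _ in range(n + true_delay)]
station_a = source[true_delay:]  # a[t] = source[t + 37]
station_b = source[:n]           # b[t + 37] = a[t]
print(delay_in_samples(station_a, station_b, max_lag=100))
```

In a real survey the correlations from many days would be stacked per station pair before picking dispersion curves; this toy keeps only the correlation itself.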

    Study on sedimentation stability of silicone oil-based magnetorheological fluids with fumed silica as additive

    To study the sedimentation stability of silicone oil-based magnetorheological (MR) fluids with fumed silica as an additive, MR fluids with different mass fractions of fumed silica, particle sizes of carbonyl iron powder, and silicone oil viscosities were prepared. The sedimentation rate of the MR fluids was determined by the observation method, and their zero-field viscosity was measured with a viscometer. The results show that both the sedimentation rate and the viscosity of the MR fluids increase gradually as the mass fraction of fumed silica increases. The mass fraction of fumed silica should not be fixed, but should be determined according to the silicone oil content of the MR fluid. As the average diameter of the carbonyl iron powder increases, the sedimentation stability of the MR fluids deteriorates. Increasing the viscosity of the silicone oil does not significantly improve sedimentation stability; however, a high silicone oil viscosity causes a wall-hanging phenomenon (fluid adhering to the container wall) and makes MR devices harder to start up. With 2.5 wt% fumed silica relative to the silicone oil, the MR fluids exhibit good sedimentation stability and a suitable zero-field viscosity.

    Unsupervised Multi-document Summarization with Holistic Inference

    Multi-document summarization aims to extract the core information from a collection of documents written on the same topic. This paper proposes a new holistic framework for unsupervised multi-document extractive summarization. Our method combines holistic beam search inference with a holistic measurement named the Subset Representative Index (SRI). SRI balances the importance and diversity of a subset of sentences from the source documents and can be calculated in both unsupervised and adaptive settings. To demonstrate the effectiveness of our method, we conduct extensive experiments on small- and large-scale multi-document summarization datasets under both unsupervised and adaptive settings. The proposed method outperforms strong baselines by a significant margin, as indicated by ROUGE scores and diversity measures. Our findings also suggest that diversity is essential for improving multi-document summarization performance. Comment: Findings of IJCNLP-AACL 202
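
To make the idea of scoring whole sentence subsets during beam search concrete, here is a minimal sketch. It does not reproduce SRI: as an assumption, importance is approximated by each sentence's word overlap (Jaccard similarity) with the cluster vocabulary, and diversity by a penalty on pairwise sentence overlap, in the spirit of maximal marginal relevance. Sentences are pre-tokenized word sets, `k` must not exceed the number of sentences, and `lam`, `beam_width`, and all function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Word-set overlap in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def subset_score(subset, sents, vocab, lam):
    """Hypothetical stand-in for a holistic subset score: importance minus redundancy."""
    importance = sum(jaccard(sents[i], vocab) for i in subset)
    redundancy = sum(jaccard(sents[i], sents[j]) for i, j in combinations(subset, 2))
    return importance - lam * redundancy


def beam_search_summary(sents, k, beam_width=3, lam=1.0):
    """Grow sentence subsets one sentence at a time, keeping the best `beam_width` subsets."""
    vocab = set().union(*sents)
    beams = [()]
    for _ in range(k):
        candidates = {
            tuple(sorted(b + (i,)))
            for b in beams
            for i in range(len(sents))
            if i not in b
        }
        # Sort first for deterministic tie-breaking, then rank by subset score.
        ranked = sorted(sorted(candidates),
                        key=lambda s: subset_score(s, sents, vocab, lam),
                        reverse=True)
        beams = ranked[:beam_width]
    return beams[0]


docs = [
    {"storm", "hits", "coast"},
    {"storm", "hits", "coast", "today"},  # near-duplicate of sentence 0
    {"power", "outages", "reported"},
]
print(beam_search_summary(docs, k=2))
```

With `lam=1.0`, the redundant pair (0, 1) scores 0.25 while (1, 2) scores about 1.0, because the overlap penalty on the near-duplicates outweighs their combined importance.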

    A network‐based variable selection approach for identification of modules and biomarker genes associated with end‐stage kidney disease

    Aims: Intervention for end‐stage kidney disease (ESKD), which is associated with adverse prognoses and major economic burdens, is challenging because of its complex pathogenesis. This study aimed to identify biomarker genes and molecular mechanisms of ESKD using a bioinformatics approach. Methods: Using the Gene Expression Omnibus dataset GSE37171, this study identified pathways and genomic biomarkers associated with ESKD via a multi‐stage knowledge discovery process: identification of gene modules by weighted gene co‐expression network analysis (WGCNA), discovery of the pathways involved by Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses, selection of differentially expressed genes by the empirical Bayes method, and screening of biomarker genes by least absolute shrinkage and selection operator (Lasso) logistic regression. The results were validated on GSE70528, an independent test dataset. Results: Three clinically important gene modules associated with ESKD were identified by WGCNA. Within these modules, GO and KEGG enrichment analyses revealed important biological pathways involved in ESKD, including transforming growth factor‐β and Wnt signalling, RNA splicing, autophagy, and chromatin and histone modification. Lasso logistic regression then identified five final genes, CNOT8, MST4, PPP2CB, PCSK7 and RBBP4, that are differentially expressed and associated with ESKD. The final model distinguished ESKD cases from controls with an accuracy of 96.8% in the training dataset and 91.7% in the validation dataset. Conclusion: Network‐based variable selection approaches can identify biological pathways and biomarker genes associated with ESKD. The findings may inform more in‐depth follow‐up research and effective therapy. Summary at a glance: This gene–gene network analysis to identify genes associated with end‐stage renal disease is an important, albeit early, step towards the discovery of biomarkers using peripheral blood cells. The findings also provide insight into disease pathophysiology at the molecular level, and hence into therapeutic targets for future research.
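
The Lasso screening step can be illustrated with L1-penalized logistic regression fitted by proximal gradient descent (ISTA), where soft-thresholding drives uninformative coefficients to exactly zero. This is a generic pure-Python sketch, not the authors' pipeline: the toy expression matrix, the hypothetical gene columns, the fixed penalty `lam`, the learning rate, and the omission of an intercept are all assumptions. In practice the penalty would usually be chosen by cross-validation.

```python
import math


def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrinks x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def lasso_logistic(X, y, lam, lr=0.5, n_iter=500):
    """L1-penalized logistic regression (no intercept) via proximal gradient descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        # Predicted probabilities under the current coefficients.
        probs = [1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, row))))
                 for row in X]
        # Gradient of the mean logistic loss.
        grad = [0.0] * p
        for row, pi, yi in zip(X, probs, y):
            for j in range(p):
                grad[j] += row[j] * (pi - yi) / n
        # Gradient step followed by soft-thresholding (the L1 proximal step).
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
    return w


# Hypothetical toy data: column 0 separates cases (1) from controls (0);
# column 1 is balanced within each class and carries no signal.
X = [[1, 1], [1, -1], [1, 1], [1, -1],
     [-1, 1], [-1, -1], [-1, 1], [-1, -1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
print(lasso_logistic(X, y, lam=0.05))  # column 1's coefficient stays at zero
```

Raising `lam` far enough (here to 1.0) zeroes every coefficient, which is how the penalty trades model size against fit.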

    Prediction of DNA i-motifs via machine learning

    i-Motifs (iMs) are secondary structures formed in cytosine-rich DNA sequences and are involved in multiple functions in the genome. Although putative iM-forming sequences are widely distributed in the human genome, the folding status and strength of putative iMs vary dramatically. Much previous research on iMs has focused on assessing their folding properties using biophysical experiments; however, there are no dedicated computational tools for predicting the folding status and strength of iM structures. Here, we introduce a machine learning pipeline, iM-Seeker, to predict both the folding status and the structural stability of DNA iMs. iM-Seeker incorporates a Balanced Random Forest classifier, trained on genome-wide iMab antibody-based CUT&Tag sequencing data, to predict folding status, and an Extreme Gradient Boosting regressor to estimate folding strength based on both literature biophysical data and our in-house biophysical experiments. iM-Seeker predicts DNA iM folding status with a classification accuracy of 81% and estimates folding strength with a coefficient of determination (R²) of 0.642 on the test set. Model interpretation confirms that the nucleotide composition of the C-rich sequence significantly affects iM stability, which correlates positively with cytosine and thymine content and negatively with guanine and adenine content.
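
For reference, the coefficient of determination used to report folding-strength accuracy is R² = 1 − SS_res / SS_tot, the fraction of variance in the measured values explained by the predictions. A minimal sketch, assuming plain lists of measured and predicted values (the function name is hypothetical):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1.0 - ss_res / ss_tot


print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # predicting the mean gives 0.0
```

Under this definition, an R² of 0.642 means the regressor leaves roughly 36% of the test-set variance in folding strength unexplained.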

    Genetic characteristics of common variable immunodeficiency patients with autoimmunity

    Background: The pathogenesis of common variable immunodeficiency disorder (CVID) is complex, especially when combined with autoimmunity. Genetic factors may help explain this complexity, and whole genome sequencing (WGS) provides a means to investigate them. Methods: Genetic information from patients with CVID and autoimmunity, together with their first-degree relatives, was collected through WGS. The association between genetic factors and clinical phenotypes was studied using sporadic-case and pedigree-based genetic analysis strategies. Results: We collected 42 blood samples for WGS (16 CVID patients and 26 healthy first-degree relatives as controls). Through pedigree and sporadic screening strategies and low-frequency deleterious-variant screening for rare diseases, we obtained 9,148 mutation sites, including 8,171 single-nucleotide variants (SNVs) and 977 insertion-deletions (InDels). Finally, we obtained a total of 28 candidate genes (32 loci), of which the most frequently mutated was LRBA. The most common autoimmune disease among the 16 patients was systemic lupus erythematosus. Through KEGG pathway enrichment, we identified the top ten signaling pathways, including “primary immunodeficiency”, “JAK-STAT signaling pathway”, and “T-cell receptor signaling pathway”. We used PyMOL to predict and analyse the three-dimensional structures of the proteins encoded by NFKB1, RAG1, TIRAP, NCF2, and MYB. In addition, we constructed a protein-protein interaction (PPI) network via the STRING database by combining the candidate mutant genes with CVID-associated genes from the OMIM database. Conclusion: The genetic background of CVID includes not only monogenic origins but also oligogenic effects. Our study showed that immunodeficiency and autoimmunity may share overlapping genetic backgrounds. Clinical Trial Registration: identifier ChiCTR210004403
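
The variant-level bookkeeping described above (rare-variant filtering, comparison against unaffected relatives, and splitting calls into SNVs and InDels) can be sketched as follows. This is a simplified illustration, not the study's pipeline: the `Variant` record, the population allele-frequency field, the 1% cut-off, and the rule of keeping only variants absent from all unaffected relatives are hypothetical assumptions, and deleteriousness scoring is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    pop_af: float  # hypothetical population allele-frequency annotation


def variant_type(v):
    """Classify a call as an SNV, an InDel, or another substitution type."""
    if len(v.ref) == 1 and len(v.alt) == 1:
        return "SNV"
    if len(v.ref) != len(v.alt):
        return "InDel"
    return "other"  # e.g. multi-nucleotide substitutions


def screen(patient, relatives, max_af=0.01):
    """Keep rare patient variants that no unaffected relative carries."""
    in_relatives = set().union(*relatives)
    kept = [v for v in patient if v.pop_af < max_af and v not in in_relatives]
    counts = {}
    for v in kept:
        kind = variant_type(v)
        counts[kind] = counts.get(kind, 0) + 1
    return kept, counts


patient = [
    Variant("chr4", 100, "A", "G", 0.001),   # rare SNV, absent from relatives
    Variant("chr4", 200, "CT", "C", 0.002),  # rare deletion (InDel)
    Variant("chr7", 300, "G", "T", 0.20),    # common SNV, filtered out
    Variant("chr2", 400, "T", "C", 0.003),   # rare, but shared with a relative
]
relatives = [{Variant("chr2", 400, "T", "C", 0.003)}]
kept, counts = screen(patient, relatives)
print(counts)
```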

    Manufacture of titanium alloy materials with bioactive sandblasted surfaces and evaluation of osseointegration properties

    Titanium alloys are among the most important orthopedic implant materials currently available. However, their lack of bioactivity and osteoinductivity limits osseointegration at the interface between titanium alloy materials and bone. In this study, we used a novel sandblasting surface modification process to manufacture titanium alloy materials with bioactive sandblasted surfaces and systematically characterized their surface morphology and physicochemical properties. We also evaluated osseointegration between these materials and bone through in vitro experiments with co-cultured osteoblasts and in vivo experiments in a rabbit model. In vitro, the proliferation, differentiation, and mineralization of osteoblasts on the bioactive sandblasted surfaces were better than in the control group. In vivo, the titanium alloy materials with bioactive sandblasted surfaces promoted the growth of trabecular bone on their surfaces compared with controls. These results indicate that the novel titanium alloy material with a bioactive sandblasted surface has satisfactory bioactivity and osteoinductivity and exhibits good osseointegration at the material-bone interface. This work lays a foundation for subsequent clinical research on titanium alloy materials with bioactive sandblasted surfaces.

    iM-Seeker: a webserver for DNA i-motifs prediction and scoring via automated machine learning

    Beyond its canonical B-form double helix, DNA adopts various alternative conformations. Among these, the i-motif, which forms in cytosine-rich sequences under acidic conditions, has significant biological implications in transcriptional regulation and telomere biology. Despite the recognized importance of i-motifs, software for predicting i-motif-forming sequences has been limited. To address this gap, we introduce 'iM-Seeker', a computational platform for the prediction and evaluation of i-motifs. iM-Seeker can identify potential i-motifs within DNA segments or entire genomes and calculates a stability score for each predicted i-motif based on parameters such as the number of cytosine tracts, loop lengths, and sequence composition. Furthermore, the webserver uses automated machine learning (AutoML) to fine-tune the i-motif scoring model, incorporating user-supplied experimental data and customised features. As a versatile tool, 'iM-Seeker' promises to advance genomic research and highlight the potential of i-motifs in cell biology and therapeutic applications. The webserver is freely available at https://im-seeker.org
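
The parameters listed above (number of cytosine tracts, loop lengths, sequence composition) are straightforward to extract from a candidate sequence. The sketch below assumes a C-tract is a run of at least three cytosines and flags a sequence with at least four tracts separated by loops of 1 to 12 nt as a putative i-motif; these thresholds are illustrative choices for the example, not iM-Seeker's actual rules, and the function name is hypothetical.

```python
import re


def im_features(seq, min_tract=3, max_loop=12):
    """Extract C-tract count, loop lengths, and base composition from a DNA sequence."""
    seq = seq.upper()
    tracts = [m.span() for m in re.finditer(f"C{{{min_tract},}}", seq)]
    # Loop length = gap between the end of one C-tract and the start of the next.
    loops = [start - prev_end for (_, prev_end), (start, _) in zip(tracts, tracts[1:])]
    composition = {base: seq.count(base) / len(seq) for base in "ACGT"}
    putative = len(tracts) >= 4 and all(1 <= length <= max_loop for length in loops)
    return {
        "n_tracts": len(tracts),
        "loop_lengths": loops,
        "composition": composition,
        "putative_im": putative,
    }


# Telomeric C-rich repeat: four CCC tracts separated by TAA loops.
print(im_features("CCCTAACCCTAACCCTAACCC"))
```

Features like these could then feed a scoring model; how iM-Seeker weights them is not specified here.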

    3D digital modelling and identification of pavement typical internal defects based on GPR measured data

    Three-dimensional ground-penetrating radar (GPR) non-destructively captures the characteristics of internal pavement distress. However, interpreting radar images and analysing the data remain challenging. To improve the accuracy of distress identification, a three-dimensional digital model of internal pavement distress was established. First, the raw electromagnetic signal data were pre-processed to eliminate spurious signals and enhance distress signals. The distress was then located, and GPR images of typical distress were extracted and summarised. Next, a 3D dataset was constructed from the pre-processed electromagnetic echo signals, and a 3D digital model of internal pavement distress was generated using inverse distance weighting and ray-casting with trilinear interpolation. Finally, cores were extracted at an actual engineering project to validate the distress model. The method effectively reflects internal pavement distress and enables interactive visualisation linking the physical pavement and its digital model, contributing to digital twins of pavement systems.
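
The two interpolation steps named above can be sketched as follows: inverse distance weighting (IDW) estimates a value at an arbitrary point from scattered GPR samples, and trilinear interpolation evaluates a value inside one voxel of the resulting regular 3D grid, as a ray-casting renderer does at each sample along a ray. This is a generic illustration under stated assumptions (plain Python tuples, power parameter 2, unit-voxel coordinates), not the paper's implementation; the ray-casting renderer itself is omitted and the sample values are hypothetical.

```python
def idw(points, values, query, power=2.0):
    """Inverse distance weighting: weighted mean with weights 1 / distance**power."""
    num = den = 0.0
    for p, v in zip(points, values):
        d2 = sum((a - b) ** 2 for a, b in zip(p, query))
        if d2 == 0.0:
            return v  # query coincides with a sample point
        w = 1.0 / d2 ** (power / 2.0)  # d2 ** (power / 2) == distance ** power
        num += w * v
        den += w
    return num / den


def trilinear(c, x, y, z):
    """Interpolate inside a voxel; c[i][j][k] is the corner value, x, y, z in [0, 1]."""
    # Interpolate along x on the four voxel edges parallel to x.
    c00 = c[0][0][0] * (1 - x) + c[1][0][0] * x
    c01 = c[0][0][1] * (1 - x) + c[1][0][1] * x
    c10 = c[0][1][0] * (1 - x) + c[1][1][0] * x
    c11 = c[0][1][1] * (1 - x) + c[1][1][1] * x
    # Then along y, then along z.
    c0 = c00 * (1 - y) + c10 * y
    c1 = c01 * (1 - y) + c11 * y
    return c0 * (1 - z) + c1 * z


# Two hypothetical amplitude samples, equidistant from the query point.
print(idw([(0, 0, 0), (2, 0, 0)], [1.0, 3.0], (1, 0, 0)))
# Corner values equal to the x index: the voxel centre interpolates to 0.5.
corners = [[[i for k in range(2)] for j in range(2)] for i in range(2)]
print(trilinear(corners, 0.5, 0.5, 0.5))
```

IDW is a simple choice for gridding irregular radar traces because it needs no fitted model, at the cost of "bull's-eye" artefacts around isolated samples.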