25 research outputs found

    Leveraging Key Information Modeling to Improve Less-Data Constrained News Headline Generation via Duality Fine-Tuning

    Recent generative language models are mostly trained on large-scale datasets, but in some real scenarios training data is expensive to obtain and therefore small-scale. In this paper we investigate the challenging task of less-data constrained generation, especially when the generated news headlines are short yet expected by readers to remain both readable and informative. We highlight the key information modeling task and propose a novel duality fine-tuning method by formally defining the probabilistic duality constraints between the key information prediction and headline generation tasks. The proposed method can capture more information from limited data, build connections between separate tasks, and is suitable for less-data constrained generation tasks. Furthermore, the method can leverage various pre-trained generative regimes, e.g., autoregressive and encoder-decoder models. We conduct extensive experiments to demonstrate that our method is effective and efficient, achieving improved performance in terms of a language modeling metric and an informativeness correctness metric on two public datasets. Comment: Accepted by AACL-IJCNLP 2022 main conference.

    Systematic benchmarking of nanopore Q20+ kit in SARS-CoV-2 whole genome sequencing

    Whole genome sequencing provides rapid insight into key information about the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), such as virus typing and key mutation sites. Combined with the epidemiological information of each case, this information is important for precise prevention, control and tracing of coronavirus disease 2019 (COVID-19) outbreaks. Nanopore sequencing is widely used around the world for its short sample-to-result time, simple experimental operation and long sequencing reads. However, because nanopore sequencing is a relatively new sequencing technology, many researchers still have doubts about its accuracy. The combination of the newly launched nanopore sequencing Q20+ kit (LSK112) and flow cell R10.4 is a qualitative improvement over the accuracy of the previous kits. In this study, we first used the LSK112 kit with flow cell R10.4 to sequence the SARS-CoV-2 whole genome, and summarized the sequencing results of this combination for the 1200 bp amplicons of SARS-CoV-2. We found that the proportion of sequences with an accuracy of more than 99% reached 30.1%, and the average sequence accuracy reached 98.34%, while the results of the original combination of LSK109 kit and flow cell R9.4.1 were 0.61% and 96.52%, respectively. The mutation site analysis was completely consistent with the final consensus sequence from next generation sequencing (NGS). The results showed that the combination of LSK112 kit and flow cell R10.4 allowed rapid whole-genome sequencing of SARS-CoV-2 without the need for verification by NGS.

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

    A Regular k-Shrinkage Thresholding Operator for the Removal of Mixed Gaussian-Impulse Noise

    The removal of mixed Gaussian-impulse noise plays an important role in many areas, such as remote sensing. However, traditional methods may fail to adaptively promote the degree of sparsity after decomposing the data into a low-rank component and a sparse component. In this paper, a new problem formulation with a regular spectral k-support norm and a regular k-support l1 norm is proposed. A unified framework is developed to capture the intrinsic sparsity structure of both components. To address the resulting problem, an efficient minimization scheme within the framework of accelerated proximal gradient is proposed. This scheme is carried out by alternately applying the regular k-shrinkage thresholding operator. Experimental comparison with other state-of-the-art methods demonstrates the efficacy of the proposed method.

    Joint attribute chain prediction for zero‐shot learning

    Zero‐shot learning (ZSL) aims to classify objects without any training samples. Attributes are used to transfer knowledge from the training set to the testing set in ZSL. Most ZSL methods based on Direct Attribute Prediction (DAP) assume that attributes are independent of each other. In this study, the authors explore the relationship between attributes and propose Joint Attribute Chain Prediction (JACP). Attribute chains are introduced to represent the relations. Conditional probabilities of attributes are estimated in order along the chain to calculate the joint posteriors of the testing classes without independence assumptions. To reduce the estimation error, an attribute relation clustering algorithm is presented to split the long chain into several unrelated small chains. When the maximum chain length is one, JACP is essentially identical to DAP. Experiments on three data sets for the zero‐shot problem demonstrate the classification accuracy and efficiency of the authors’ algorithm. The results show that mining attribute relations can greatly improve the performance of ZSL.

    Effects of Low Pressure Injection on Fuel Atomization and Mixture Formation for Heavy Fuel Engines

    The application of direct injection (DI) technology can effectively improve the atomization of heavy fuel and reduce the fuel loss of heavy fuel engines (HFE). The fuel spray characteristics directly affect the combustion performance of the engine. To investigate the in-cylinder atomization process and evaporation characteristics of heavy fuel for an air-assisted direct injection (AADI) engine, a simulation model of an AADI HFE was established with the use of a computational fluid dynamics tool. The air-assisted injector model and the one-dimensional performance calculation model were verified by test data. The influences of injection timing and injection pressure on the spray characteristics and mixture formation in the engine cylinder were discussed. The results show that the mixture concentration distribution is uniform after the injection timing is advanced, and the mass fraction of the fuel evaporation increases. The earlier injection timing can provide the fuel with sufficient time to evaporate, while the later injection timing increases the Sauter mean diameter (SMD) of the fuel droplets, and the unevaporated heavy fuel in the combustion chamber tends to become concentrated. With the increase in air injection pressure, the distribution of the mixture in the cylinder becomes uniform, and the SMD of the fuel droplets in the cylinder decreases. When the injection pressure is 0.65 MPa and 0.75 MPa, the difference between the SMD of the in-cylinder fuel droplets decreases, and a favorable fuel atomization effect can be maintained.

    An Optogenetic‐Controlled Cell Reprogramming System for Driving Cell Fate and Light‐Responsive Chimeric Mice

    Pluripotent stem cells (PSCs) hold great promise for cell‐based therapies, disease modeling, and drug discovery. Classic somatic cell reprogramming to generate induced pluripotent stem cells (iPSCs) is often achieved by overexpression of transcription factors (TFs). However, this process is limited by the side effects of overexpressed TFs and by unpredicted targeting of TFs. Pinpoint control over endogenous TF expression can provide the ability to reprogram cell fate and tissue function. Here, a light‐inducible cell reprogramming (LIRE) system is developed based on a photoreceptor protein cryptochrome system and clustered regularly interspaced short palindromic repeats/nuclease‐deficient CRISPR‐associated protein 9 for induced PSC reprogramming. This system enables remote, non‐invasive optogenetic regulation of endogenous Sox2 and Oct4 loci to reprogram mouse embryonic fibroblasts into iPSCs (iPSCLIRE) under light‐emitting diode‐based illumination. iPSCLIRE cells can be efficiently differentiated into different cells by upregulating a corresponding TF. iPSCLIRE cells are used for blastocyst injection and optogenetic chimeric mice are successfully generated, which enables non‐invasive control of user‐defined endogenous genes in vivo, providing a valuable tool for facile and traceless controlled gene expression studies and genetic screens in mice. This LIRE system offers a remote, traceless, and non‐invasive approach for cellular reprogramming and modeling of complex human diseases in basic biological research and regenerative medicine applications.

    Table_1_Systematic benchmarking of nanopore Q20+ kit in SARS-CoV-2 whole genome sequencing.XLSX
