4,753 research outputs found
Recommended from our members
NAD tagSeq reveals that NAD+-capped RNAs are mostly produced from a large number of protein-coding genes in Arabidopsis.
The 5' end of a eukaryotic mRNA transcript generally has a 7-methylguanosine (m7G) cap that protects mRNA from degradation and mediates almost all other aspects of gene expression. Some RNAs in Escherichia coli, yeast, and mammals were recently found to contain an NAD+ cap. Here, we report the development of the method NAD tagSeq for transcriptome-wide identification and quantification of NAD+-capped RNAs (NAD-RNAs). The method uses an enzymatic reaction and then a click chemistry reaction to label NAD-RNAs with a synthetic RNA tag. The tagged RNA molecules can be enriched and directly sequenced using the Oxford Nanopore sequencing technology. NAD tagSeq can allow more accurate identification and quantification of NAD-RNAs, as well as reveal the sequences of whole NAD-RNA transcripts using single-molecule RNA sequencing. Using NAD tagSeq, we found that NAD-RNAs in Arabidopsis were produced by at least several thousand genes, most of which are protein-coding genes, with the majority of these transcripts coming from <200 genes. For some Arabidopsis genes, over 5% of their transcripts were NAD capped. Gene ontology terms overrepresented in the 2,000 genes that produced the highest numbers of NAD-RNAs are related to photosynthesis, protein synthesis, and responses to cytokinin and stresses. The NAD-RNAs in Arabidopsis generally have the same overall sequence structures as the canonical m7G-capped mRNAs, although most of them appear to have a shorter 5' untranslated region (5' UTR). The identification and quantification of NAD-RNAs and revelation of their sequence features can provide essential steps toward understanding the functions of NAD-RNAs.
Beyond MLE: Convex Learning for Text Generation
Maximum likelihood estimation (MLE) is a statistical method used to estimate
the parameters of a probability distribution that best explain the observed
data. In the context of text generation, MLE is often used to train generative
language models, which can then be used to generate new text. However, we argue
that MLE is not always necessary or optimal, especially for closed-ended text
generation tasks like machine translation. In these tasks, the goal of the model
is to generate the most appropriate response, which does not necessarily require
it to estimate the entire data distribution with MLE. To this end, we propose a
novel class of training objectives based on convex functions, which enables
text generation models to focus on highly probable outputs without having to
estimate the entire data distribution. We investigate the theoretical
properties of the optimal predicted distribution when applying convex functions
to the loss, demonstrating that convex functions can sharpen the optimal
distribution, thereby enabling the model to better capture outputs with high
probabilities. Experiments on various text generation tasks and models show the
effectiveness of our approach. It enables autoregressive models to bridge the
gap between greedy and beam search, and facilitates the learning of
non-autoregressive models with a maximum improvement of 9+ BLEU points.
Moreover, our approach also exhibits significant impact on large language
models (LLMs), substantially enhancing their generative capability on various
tasks. Source code is available at
https://github.com/ictnlp/Convex-Learning. Comment: NeurIPS 202
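The sharpening claim in the abstract above can be illustrated on a toy three-outcome distribution. The sketch below is only an illustration under stated assumptions: the objective form (the expected value of f(q(x)) under the data distribution p), the choice of the identity function as the convex example, and all names are hypothetical and are not taken from the paper's code.

```python
import math

def safe_log(q):
    """Natural log that returns -inf at zero instead of raising."""
    return math.log(q) if q > 0 else float("-inf")

def objective(p, q, f):
    """Expected value of f(q(x)) when x is drawn from the data distribution p."""
    return sum(pi * f(qi) for pi, qi in zip(p, q) if pi > 0)

def simplex_grid(steps):
    """Yield every 3-outcome distribution whose entries are multiples of 1/steps."""
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            yield (i / steps, j / steps, (steps - i - j) / steps)

# Toy data distribution (assumed for illustration only).
p = (0.5, 0.3, 0.2)

# f = log recovers the usual maximum likelihood objective.
best_log = max(simplex_grid(10), key=lambda q: objective(p, q, safe_log))

# f = identity is convex (linear); its maximiser sits on a vertex of the simplex.
best_linear = max(simplex_grid(10), key=lambda q: objective(p, q, lambda v: v))

print("log objective optimum:   ", best_log)
print("linear objective optimum:", best_linear)
```

Under the log objective the best predicted distribution matches p, while under the linear objective all mass moves to the most probable outcome, which is the sense in which a convex choice of f sharpens the optimum in this toy setting.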
Non-autoregressive Streaming Transformer for Simultaneous Translation
Simultaneous machine translation (SiMT) models are trained to strike a
balance between latency and translation quality. However, training these models
to achieve high quality while maintaining low latency often leads to a tendency
for aggressive anticipation. We argue that this issue stems from the
autoregressive architecture upon which most existing SiMT models are built. To
address it, we propose the non-autoregressive streaming Transformer
(NAST) which comprises a unidirectional encoder and a non-autoregressive
decoder with intra-chunk parallelism. We enable NAST to generate the blank
token or repetitive tokens to adjust its READ/WRITE strategy flexibly, and
train it to maximize the non-monotonic latent alignment with an alignment-based
latency loss. Experiments on various SiMT benchmarks demonstrate that NAST
outperforms previous strong autoregressive SiMT baselines. Comment: EMNLP 2023 main conference; source code is available at
https://github.com/ictnlp/NAS
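The abstract above mentions that the decoder may emit blank or repeated tokens. One common way to turn such an output into a final sentence is CTC-style collapsing: merge adjacent repeats, then drop blanks. The sketch below shows that generic rule only; the abstract does not state that NAST uses exactly this procedure, and the `BLANK` symbol and function name are hypothetical.

```python
BLANK = "<blank>"  # hypothetical blank symbol, chosen for illustration

def collapse(tokens, blank=BLANK):
    """CTC-style collapse: merge adjacent repeated tokens, then remove blanks."""
    output = []
    previous = None
    for token in tokens:
        # Keep a token only if it differs from its predecessor and is not blank.
        if token != previous and token != blank:
            output.append(token)
        previous = token
    return output

# A blank between two identical tokens keeps both copies in the output.
raw = [BLANK, "we", "we", BLANK, "we", "agree", "agree"]
print(collapse(raw))
```

Under this rule a model can emit only blanks for a chunk to postpone writing, or repeat a token to fill a chunk without adding output, which is one way to read the flexible READ/WRITE behaviour the abstract describes.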
Butane-1,2,3,4-tetracarboxylic acid–4,4′-bipyridine (1/2)
The hydrothermal reaction of butane-1,2,3,4-tetracarboxylic acid (H4butca), 4,4′-bipyridine (bipy) and Mn(SO4)2·H2O afforded a new co-crystal, C8H10O8·2C10H8N2 or H4butca·2(bipy), in which strong O—H⋯N hydrogen-bonding and weak π–π stacking [centroid–centroid distance = 3.8459 (19) Å] interactions assemble the organic molecules into a three-dimensional supramolecular framework. C—H⋯O interactions are also present. The whole molecule has inversion symmetry.
Discriminating bipartite mixed states by local operations
Unambiguous state discrimination of two mixed bipartite states via local
operations and classical communications (LOCC) is studied and compared with the
result of a scheme realized via global measurement. We show that the success
probability of a global scheme for mixed-state discrimination can be achieved
perfectly by the local scheme. In addition, we simulate this discrimination via
a pair of pure entangled bipartite states. This simulation is perfect for local
rather than global schemes due to the existence of entanglement and global
coherence in the pure states. We also prove that the LOCC protocol and
sequential state discrimination (SSD) can be interpreted in a unified view. We
then hybridize the LOCC protocol with three protocols (SSD, reproducing and
broadcasting) relying on classical communications. Such hybridizations extend
the gaps between the optimal success probability of global and local schemes,
which can be eliminated only for the SSD rather than the other two protocols.
Molecular phylogeny of the antiangiogenic and neurotrophic serpin, pigment epithelium derived factor in vertebrates
BACKGROUND: Pigment epithelium derived factor (PEDF), a member of the serpin family, regulates cell proliferation, promotes survival of neurons, and blocks growth of new blood vessels in mammals. Defining the molecular phylogeny of PEDF by bioinformatic analysis is one approach to understanding the link between its gene structure and its function in these biological processes. RESULTS: From a comprehensive search of available DNA databases we identified a single PEDF gene in all vertebrate species examined. These included four mammalian and six non-mammalian vertebrate species in which PEDF had not previously been described. A five-gene cluster around PEDF was found in a region of approximately 100 kb in mammals, birds, and amphibians. In ray-finned fish these genes are scattered over three chromosomes, although only one PEDF gene was consistently found. The PEDF gene is absent in invertebrates including Drosophila melanogaster (D. melanogaster), Caenorhabditis elegans (C. elegans), and sea squirt (C. intestinalis). The PEDF gene is transcribed in all vertebrate phyla, suggesting it is biologically active throughout vertebrate evolution. The multiple actions of PEDF are likely conserved in evolution since it has the same gene structure across phyla, although the size of the gene ranges from 48.3 kb in X. tropicalis to 2.9 kb in fugu, with human PEDF at a size of 15.6 kb. A strong similarity in the proximal 200 bp of the PEDF promoter in mammals suggests the existence of a possible regulatory region across phyla. Using a non-synonymous/synonymous substitution rate ratio we show that mammalian and fish PEDFs have similar ratios of <0.13, reflecting strong purifying selection on the PEDF gene. A large number of repetitive transposable elements of the SINE and LINE class were found with random distribution in both the promoter and introns of mammalian PEDF.
CONCLUSION: The PEDF gene first appears in vertebrates and our studies suggest that the regulation and biological actions of this gene are preserved across vertebrates. This comprehensive analysis of the PEDF gene across phyla provides new information that will aid further characterization of common functional motifs of this serpin in biological processes.
The therapeutic evaluation and mechanism on treating bronchial hyper-responsiveness cough by ziyinqingre prescription
Objective: To discuss the effects of Ziyinqingre prescription on airway resistance (Rrs), airway response threshold (Dmin), airway conductance (sGrs) and the levels of the inflammatory cytokines interleukin-4 (IL-4) and interferon-γ (IFN-γ) in patients with bronchial hyper-responsiveness (BHR) cough. Method: Eighty-four subjects diagnosed with BHR were randomly divided into a traditional Chinese medicine group (n = 42) and a control group (n = 42). The traditional Chinese medicine group received Ziyinqingre prescription twice a day and the control group received 10 mg Montelukast Sodium tablets once a day for two weeks. Improvement in clinical symptoms and changes in Rrs, Dmin, sGrs, IL-4 and IFN-γ were observed. Results: After treatment, the symptoms of the Chinese medicine group were clearly alleviated, and the outcome was more satisfactory than that of the control group. Compared with the control group, the level of Dmin increased and the sGrs level decreased more obviously (P<0.05); the level of IL-4 decreased and the IFN-γ level increased more obviously in the Chinese medicine group (P<0.05). Conclusion: Ziyinqingre prescription can not only improve the symptoms of BHR patients but also reduce the level of bronchial responsiveness, indicating a better curative effect of Chinese medicine. The mechanism probably involves relieving airway inflammation by maintaining the balance between Th1 and Th2 cells. Keywords: Ziyinqingre prescription; cough; bronchial hyper-responsiveness; therapeutic mechanism
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Instruction tuning large language models (LLMs) on image-text pairs has
achieved unprecedented vision-language multimodal abilities. However, their
vision-language alignment is built only at the image level, and the lack of
region-level alignment limits their progress toward fine-grained multimodal
understanding. In this paper, we propose instruction tuning on
region-of-interest. The key design is to reformulate the bounding box in the
format of a spatial instruction. The interleaved sequences of visual features
extracted by the spatial instruction and the language embedding are input to
the LLM, which is trained on the transformed region-text data in instruction
tuning format. Our region-level vision-language model, termed GPT4RoI, brings a
brand-new conversational and interactive experience beyond image-level understanding.
(1) Controllability: Users can interact with our model by both language and
spatial instructions to flexibly adjust the detail level of the question. (2)
Capacities: Our model supports not only single-region spatial instruction but
also multi-region. This unlocks more region-level multimodal capacities such as
detailed region caption and complex region reasoning. (3) Composition: Any
off-the-shelf object detector can be a spatial instruction provider so as to
mine informative object attributes from our model, like color, shape, material,
action, relation to other objects, etc. The code, data, and demo can be found
at https://github.com/jshilong/GPT4RoI. Comment: Code has been released at https://github.com/jshilong/GPT4Ro
A Robust Gene Expression Prognostic Signature for Overall Survival in High-Grade Serous Ovarian Cancer.
The objective of this research was to develop a robust gene expression-based prognostic signature and scoring system for predicting overall survival (OS) of patients with high-grade serous ovarian cancer (HGSOC). Transcriptomic data of HGSOC patients were obtained from six independent studies in the NCBI GEO database. Genes significantly deregulated and associated with OS in HGSOCs were selected using GEO2R and Kaplan-Meier analysis with log-rank testing, respectively. Enrichment analysis for biological processes and pathways was performed using Gene Ontology analysis. A resampling/cross-validation method with Cox regression analysis was used to identify a novel gene expression-based signature associated with OS, and a prognostic scoring system was developed and further validated in nine independent HGSOC datasets. We first identified 488 significantly deregulated genes in HGSOC patients, of which 232 were found to be significantly associated with their OS. These genes were significantly enriched for cell cycle division, epithelial cell differentiation, p53 signaling pathway, vasculature development, and other processes. A novel 11-gene prognostic signature was identified and a prognostic scoring system was developed, which robustly predicted OS in HGSOC patients in 100 sampling test sets. The scoring system was further validated successfully in nine additional HGSOC public datasets. In conclusion, our integrative bioinformatics study combining transcriptomic and clinical data established an 11-gene prognostic signature for robust and reproducible prediction of OS in HGSOC patients. This signature could be of clinical value for guiding therapeutic selection and individualized treatment.