LISA: Accurate reconstruction of cell trajectory and pseudo-time for massive single cell RNA-seq data.
Cell trajectory reconstruction based on single cell RNA sequencing is important for obtaining the landscape of different cell types and discovering cell fate transitions. Despite intense effort, analyzing massive single cell RNA-seq datasets is still challenging. We propose a new method named Landmark Isomap for Single-cell Analysis (LISA). LISA is an unsupervised approach to build cell trajectory and compute pseudo-time in the isometric embedding based on geodesic distances. The advantages of LISA include: (1) It uses a k-nearest-neighbor graph and hierarchical clustering to identify cell clusters, peaks and valleys in a low-dimensional representation of the data; (2) Based on Landmark Isomap, it constructs the main geometric structure of cell lineages; (3) It projects cells to the edges of the main cell trajectory to generate the global pseudo-time. Assessments on simulated and real datasets demonstrate the advantages of LISA on cell trajectory and pseudo-time reconstruction compared to Monocle2 and TSCAN. LISA is accurate, fast, and requires less memory, allowing its application to massive single cell datasets generated from current experimental platforms.
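The idea of measuring pseudo-time by geodesic distance on a k-nearest-neighbor graph can be illustrated with a small sketch. This is not the LISA implementation: it omits the hierarchical clustering, the Landmark Isomap embedding and the projection onto trajectory edges, and the function names (knn_graph, geodesic_distances, pseudotime), the choice of root cell and the normalization to [0, 1] are assumptions made only for illustration.

```python
import heapq
import math


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbor graph with Euclidean edge weights."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_distances(graph, root):
    """Shortest-path (geodesic) distance from root to every node, via Dijkstra."""
    dist = {node: math.inf for node in graph}
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def pseudotime(points, root=0, k=2):
    """Geodesic distance from the root cell, scaled so the farthest cell is 1."""
    dist = geodesic_distances(knn_graph(points, k), root)
    finite = [d for d in dist.values() if math.isfinite(d)]
    longest = max(finite) or 1.0
    return [dist[i] / longest for i in range(len(points))]
```

For five cells laid out along an L-shaped path, the last cell is at Euclidean distance of about 2.83 from the root, but its geodesic distance along the graph is 4, so the ordering follows the path rather than the straight line.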
LMDA-Net: A lightweight multi-dimensional attention network for general EEG-based brain-computer interface paradigms and interpretability
EEG-based recognition of activities and states involves the use of prior
neuroscience knowledge to generate quantitative EEG features, which may limit
BCI performance. Although neural network-based methods can effectively extract
features, they often encounter issues such as poor generalization across
datasets, high predicting volatility, and low model interpretability. Hence, we
propose a novel lightweight multi-dimensional attention network, called
LMDA-Net. By incorporating two novel attention modules designed specifically
for EEG signals, the channel attention module and the depth attention module,
LMDA-Net can effectively integrate features from multiple dimensions, resulting
in improved classification performance across various BCI tasks. LMDA-Net was
evaluated on four high-impact public datasets, including motor imagery (MI) and
P300-Speller paradigms, and was compared with other representative models. The
experimental results demonstrate that LMDA-Net outperforms other representative
methods in terms of classification accuracy and predicting volatility,
achieving the highest accuracy in all datasets within 300 training epochs.
Ablation experiments further confirm the effectiveness of the channel attention
module and the depth attention module. To facilitate an in-depth understanding
of the features extracted by LMDA-Net, we propose class-specific neural network
feature interpretability algorithms that are suitable for event-related
potentials (ERPs) and event-related desynchronization/synchronization
(ERD/ERS). By mapping the output of the specific layer of LMDA-Net to the time
or spatial domain through class activation maps, the resulting feature
visualizations can provide interpretable analysis and establish connections
with EEG time-spatial analysis in neuroscience. In summary, LMDA-Net shows
great potential as a general online decoding model for various EEG tasks.
Comment: 20 pages, 7 figures
Fortifying Ethical Boundaries in AI: Advanced Strategies for Enhancing Security in Large Language Models
Recent advancements in large language models (LLMs) have significantly
enhanced capabilities in natural language processing and artificial
intelligence. These models, including GPT-3.5 and LLaMA-2, have revolutionized
text generation, translation, and question-answering tasks due to the
transformative Transformer model. Despite their widespread use, LLMs present
challenges such as ethical dilemmas when models are compelled to respond
inappropriately, susceptibility to phishing attacks, and privacy violations.
This paper addresses these challenges by introducing a multi-pronged approach
that includes: 1) filtering sensitive vocabulary from user input to prevent
unethical responses; 2) detecting role-playing to halt interactions that could
lead to 'prison break' scenarios; 3) implementing custom rule engines to
restrict the generation of prohibited content; and 4) extending these
methodologies to various LLM derivatives like Multi-Model Large Language Models
(MLLMs). Our approach not only fortifies models against unethical manipulations
and privacy breaches but also maintains their high performance across tasks. We
demonstrate state-of-the-art performance under various attack prompts, without
compromising the model's core functionalities. Furthermore, the introduction of
differentiated security levels empowers users to control their personal data
disclosure. Our methods contribute to reducing social risks and conflicts
arising from technological abuse, enhance data protection, and promote social
equity. Collectively, this research provides a framework for balancing the
efficiency of question-answering systems with user privacy and ethical
standards, ensuring a safer user experience and fostering trust in AI
technology.
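The first two prongs of the approach (filtering sensitive vocabulary and detecting role-playing) can be pictured as a screening step that runs before a prompt reaches the model. The sketch below is only a toy illustration under stated assumptions: the term list, the regular expressions and the function name screen_prompt are hypothetical and are not taken from the paper, which does not describe its rules at this level of detail.

```python
import re

# Hypothetical examples only; a real deployment would maintain its own lists.
SENSITIVE_TERMS = {"password", "ssn"}
ROLE_PLAY_PATTERNS = [
    re.compile(r"\bpretend (you are|to be)\b", re.IGNORECASE),
    re.compile(r"\bact as\b", re.IGNORECASE),
]


def screen_prompt(prompt):
    """Return ("allow", "") or ("block", reason) for a user prompt."""
    # Prong 1: block prompts that contain a listed sensitive term.
    tokens = set(re.findall(r"[a-z0-9']+", prompt.lower()))
    hits = sorted(tokens & SENSITIVE_TERMS)
    if hits:
        return ("block", "sensitive terms: " + ", ".join(hits))
    # Prong 2: block prompts that match a role-playing pattern.
    for pattern in ROLE_PLAY_PATTERNS:
        if pattern.search(prompt):
            return ("block", "role-play pattern")
    return ("allow", "")
```

Keyword and pattern matching of this kind is easy to evade, so it should be read as a picture of where such checks sit in the pipeline rather than as an effective defense on its own.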
PRAS: Predicting functional targets of RNA binding proteins based on CLIP-seq peaks.
RNA-protein interaction plays important roles in post-transcriptional regulation. Recent advancements in cross-linking and immunoprecipitation followed by sequencing (CLIP-seq) technologies make it possible to detect the binding peaks of a given RNA binding protein (RBP) at transcriptome scale. However, it is still challenging to predict the functional consequences of RBP binding peaks. In this study, we propose the Protein-RNA Association Strength (PRAS), which integrates the intensities and positions of the binding peaks of RBPs for functional mRNA targets prediction. We illustrate the superiority of PRAS over existing approaches on predicting the functional targets of two related but divergent CELF (CUGBP, ELAV-like factor) RBPs in mouse brain and muscle. We also demonstrate the potential of PRAS for wide adoption by applying it to the enhanced CLIP-seq (eCLIP) datasets of 37 RNA decay related RBPs in two human cell lines. PRAS can be utilized to investigate any RBPs with available CLIP-seq peaks. PRAS is freely available at http://ouyanglab.jax.org/pras/
Deciphering the role of RNA structure in translation efficiency.
BACKGROUND: RNA secondary structure has broad impact on the fate of RNA metabolism. The reduced stability of secondary structures near the translation initiation site/start codon of the coding region promotes the efficiency of translation in both prokaryotic and eukaryotic species. However, the inaccuracy of in silico folding and the focus on the coding region limit our understanding of the global relationship between the whole mRNA structure and translation efficiency. Leveraging high-throughput RNA structure probing data in the transcriptome, we aim to systematically investigate the role of RNA structure in regulating translation efficiency.
RESULTS: Here, we analyze the influences of hundreds of sequence and structural features on translation efficiency in mouse embryonic stem cells (mESCs) and zebrafish developmental stages. Our findings reveal that overall in vivo RNA structure has a higher relative importance in predicting translation efficiency than in vitro RNA structure in both mESCs and zebrafish. Also, RNA structures in the 3' untranslated region (UTR) have a much stronger influence on translation efficiency than those in coding regions or the 5' UTR. Furthermore, strong alteration between in vitro and in vivo structures in the 3' UTR is detected in highly translated mRNAs in mESCs but not zebrafish. Instead, moderate alteration between in vitro and in vivo RNA structures in the 5' UTR and proximal coding regions is detected in highly translated mRNAs in zebrafish.
CONCLUSIONS: Our results suggest the openness of the 3' UTR promotes the translation efficiency in both mice and zebrafish, with the in vivo structure in the 3' UTR more important in mice than in zebrafish. This reveals a novel role of RNA secondary structure on translational regulation.
RPN: A Word Vector Level Data Augmentation Algorithm in Deep Learning for Language Understanding
This paper presents a new data augmentation algorithm for natural language
understanding tasks, called RPN (Random Position Noise). Current text
augmentation methods are relatively scarce, and few of the existing methods
apply to all sentence-level natural language understanding tasks. RPN moves
the traditional augmentation of the original text to the word vector level:
the algorithm makes a substitution in one or several dimensions of some word
vectors. As a result, RPN introduces a certain degree of perturbation to the
sample, and the range of perturbation can be adjusted for different tasks. The
augmented samples are then used to train the model, which makes the model more
robust. In subsequent experiments, we found that adding RPN when training or
fine-tuning a model resulted in a stable boost on all 8 natural language
processing tasks, including the TweetEval, CoLA, and SST-2 datasets, and in
more significant improvements than other data augmentation algorithms. The RPN
algorithm applies to all sentence-level tasks for language understanding and
can be used in any deep learning model with a word embedding layer.
Comment: 10 pages, 4 figures
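A minimal sketch of word-vector-level noise in the spirit of the description above follows. The abstract does not specify how words or dimensions are selected or what values are substituted, so the per-word probability (word_prob), the number of dimensions per word (dims_per_word), the uniform noise range (noise_scale) and the function name are all assumptions rather than the authors' algorithm.

```python
import random


def random_position_noise(vectors, word_prob=0.3, dims_per_word=1,
                          noise_scale=0.1, seed=None):
    """Return a perturbed copy of a sentence's word vectors.

    Each word vector is selected with probability word_prob; in a selected
    vector, dims_per_word randomly chosen positions are replaced by values
    drawn uniformly from [-noise_scale, noise_scale] (an assumed noise model).
    """
    rng = random.Random(seed)
    augmented = [list(v) for v in vectors]  # leave the input untouched
    for vec in augmented:
        if rng.random() >= word_prob:
            continue
        n_dims = min(dims_per_word, len(vec))
        for pos in rng.sample(range(len(vec)), n_dims):
            vec[pos] = rng.uniform(-noise_scale, noise_scale)
    return augmented
```

In a training loop, a function like this would be applied to the output of the embedding layer for each batch, with noise_scale and word_prob tuned per task, which corresponds to the adjustable range of perturbation mentioned in the abstract.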
A new graph-based clustering method with application to single-cell RNA-seq data from human pancreatic islets.
Traditional bulk RNA-sequencing of human pancreatic islets mainly reflects transcriptional response of major cell types. Single-cell RNA sequencing technology enables transcriptional characterization of individual cells, and thus makes it possible to detect cell types and subtypes. To tackle the heterogeneity of single-cell RNA-seq data, powerful and appropriate clustering is required to facilitate the discovery of cell types. In this paper, we propose a new clustering framework based on a graph-based model with various types of dissimilarity measures. We take the compositional nature of single-cell RNA-seq data into account and employ log-ratio transformations. The practical merit of the proposed method is demonstrated through the application to the centered log-ratio-transformed single-cell RNA-seq data for human pancreatic islets. The practical merit is also demonstrated through comparisons with existing single-cell clustering methods. The R-package for the proposed method can be found at https://github.com/Zhang-Data-Science-Research-Lab/LrSClust
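The centered log-ratio (CLR) transformation mentioned above maps each value in a cell's expression profile to its logarithm minus the mean logarithm across that cell's genes. The sketch below shows the transform itself; the pseudocount used to handle zero counts is an assumption for illustration and is not necessarily how the LrSClust package treats zeros.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one cell's gene counts.

    clr(x)_i = log(x_i + c) - mean_j log(x_j + c), where c is the pseudocount.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]
```

The transformed values of each cell sum to zero, and with a pseudocount of 0 the result is unchanged when all counts in a cell are multiplied by the same factor, which is why the transform suits compositional data such as sequencing counts.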
KARR-seq reveals cellular higher-order RNA structures and RNA–RNA interactions
RNA fate and function are affected by their structures and interactomes. However, how RNA and RNA-binding proteins (RBPs) assemble into higher-order structures and how RNA molecules may interact with each other to facilitate functions remain largely unknown. Here we present KARR-seq, which uses N3-kethoxal labeling and multifunctional chemical crosslinkers to covalently trap and determine RNA–RNA interactions and higher-order RNA structures inside cells, independent of local protein binding to RNA. KARR-seq depicts higher-order RNA structure and detects widespread intermolecular RNA–RNA interactions with high sensitivity and accuracy. Using KARR-seq, we show that translation represses mRNA compaction under native and stress conditions. We determined the higher-order RNA structures of respiratory syncytial virus (RSV) and vesicular stomatitis virus (VSV) and identified RNA–RNA interactions between the viruses and the host RNAs that potentially regulate viral replication.