237 research outputs found

    LISA: Accurate reconstruction of cell trajectory and pseudo-time for massive single cell RNA-seq data.

    Get PDF
    Cell trajectory reconstruction based on single cell RNA sequencing is important for obtaining the landscape of different cell types and discovering cell fate transitions. Despite intense effort, analyzing massive single cell RNA-seq datasets is still challenging. We propose a new method named Landmark Isomap for Single-cell Analysis (LISA). LISA is an unsupervised approach to build cell trajectory and compute pseudo-time in the isometric embedding based on geodesic distances. The advantages of LISA include: (1) It utilizes k-nearest-neighbor graph and hierarchical clustering to identify cell clusters, peaks and valleys in low-dimension representation of the data; (2) Based on Landmark Isomap, it constructs the main geometric structure of cell lineages; (3) It projects cells to the edges of the main cell trajectory to generate the global pseudo-time. Assessments on simulated and real datasets demonstrate the advantages of LISA on cell trajectory and pseudo-time reconstruction compared to Monocle2 and TSCAN. LISA is accurate, fast, and requires less memory usage, allowing its applications to massive single cell datasets generated from current experimental platforms

    LMDA-Net:A lightweight multi-dimensional attention network for general EEG-based brain-computer interface paradigms and interpretability

    Full text link
    EEG-based recognition of activities and states involves the use of prior neuroscience knowledge to generate quantitative EEG features, which may limit BCI performance. Although neural network-based methods can effectively extract features, they often encounter issues such as poor generalization across datasets, high predicting volatility, and low model interpretability. Hence, we propose a novel lightweight multi-dimensional attention network, called LMDA-Net. By incorporating two novel attention modules designed specifically for EEG signals, the channel attention module and the depth attention module, LMDA-Net can effectively integrate features from multiple dimensions, resulting in improved classification performance across various BCI tasks. LMDA-Net was evaluated on four high-impact public datasets, including motor imagery (MI) and P300-Speller paradigms, and was compared with other representative models. The experimental results demonstrate that LMDA-Net outperforms other representative methods in terms of classification accuracy and predicting volatility, achieving the highest accuracy in all datasets within 300 training epochs. Ablation experiments further confirm the effectiveness of the channel attention module and the depth attention module. To facilitate an in-depth understanding of the features extracted by LMDA-Net, we propose class-specific neural network feature interpretability algorithms that are suitable for event-related potentials (ERPs) and event-related desynchronization/synchronization (ERD/ERS). By mapping the output of the specific layer of LMDA-Net to the time or spatial domain through class activation maps, the resulting feature visualizations can provide interpretable analysis and establish connections with EEG time-spatial analysis in neuroscience. In summary, LMDA-Net shows great potential as a general online decoding model for various EEG tasks.Comment: 20 pages, 7 Figure

    Fortifying Ethical Boundaries in AI: Advanced Strategies for Enhancing Security in Large Language Models

    Full text link
    Recent advancements in large language models (LLMs) have significantly enhanced capabilities in natural language processing and artificial intelligence. These models, including GPT-3.5 and LLaMA-2, have revolutionized text generation, translation, and question-answering tasks due to the transformative Transformer model. Despite their widespread use, LLMs present challenges such as ethical dilemmas when models are compelled to respond inappropriately, susceptibility to phishing attacks, and privacy violations. This paper addresses these challenges by introducing a multi-pronged approach that includes: 1) filtering sensitive vocabulary from user input to prevent unethical responses; 2) detecting role-playing to halt interactions that could lead to 'prison break' scenarios; 3) implementing custom rule engines to restrict the generation of prohibited content; and 4) extending these methodologies to various LLM derivatives like Multi-Model Large Language Models (MLLMs). Our approach not only fortifies models against unethical manipulations and privacy breaches but also maintains their high performance across tasks. We demonstrate state-of-the-art performance under various attack prompts, without compromising the model's core functionalities. Furthermore, the introduction of differentiated security levels empowers users to control their personal data disclosure. Our methods contribute to reducing social risks and conflicts arising from technological abuse, enhance data protection, and promote social equity. Collectively, this research provides a framework for balancing the efficiency of question-answering systems with user privacy and ethical standards, ensuring a safer user experience and fostering trust in AI technology

    PRAS: Predicting functional targets of RNA binding proteins based on CLIP-seq peaks.

    Get PDF
    RNA-protein interaction plays important roles in post-transcriptional regulation. Recent advancements in cross-linking and immunoprecipitation followed by sequencing (CLIP-seq) technologies make it possible to detect the binding peaks of a given RNA binding protein (RBP) at transcriptome scale. However, it is still challenging to predict the functional consequences of RBP binding peaks. In this study, we propose the Protein-RNA Association Strength (PRAS), which integrates the intensities and positions of the binding peaks of RBPs for functional mRNA targets prediction. We illustrate the superiority of PRAS over existing approaches on predicting the functional targets of two related but divergent CELF (CUGBP, ELAV-like factor) RBPs in mouse brain and muscle. We also demonstrate the potential of PRAS for wide adoption by applying it to the enhanced CLIP-seq (eCLIP) datasets of 37 RNA decay related RBPs in two human cell lines. PRAS can be utilized to investigate any RBPs with available CLIP-seq peaks. PRAS is freely available at http://ouyanglab.jax.org/pras/

    Deciphering the role of RNA structure in translation efficiency.

    Get PDF
    BACKGROUND: RNA secondary structure has broad impact on the fate of RNA metabolism. The reduced stability of secondary structures near the translation initiation site/start codon of the coding region promotes the efficiency of translation in both prokaryotic and eukaryotic species. However, the inaccuracy of in silico folding and the focus on the coding region limit our understanding of the global relationship between the whole mRNA structure and translation efficiency. Leveraging high-throughput RNA structure probing data in the transcriptome, we aim to systematically investigate the role of RNA structure in regulating translation efficiency. RESULTS: Here, we analyze the influences of hundreds of sequence and structural features on translation efficiency in the mouse embryonic stem cells (mESCs) and zebrafish developmental stages. Our findings reveal that overall in vivo RNA structure has a higher relative importance in predicting translation efficiency than in vitro RNA structure in both mESCs and zebrafish. Also, RNA structures in 3\u27 untranslated region (UTR) have much stronger influence on translation efficiency compared to those in coding regions or 5\u27 UTR. Furthermore, strong alternation between in vitro and in vivo structures in 3\u27 UTR are detected in highly translated mRNAs in mESCs but not zebrafish. Instead, moderate alteration between in vitro and in vivo RNA structures in the 5\u27 UTR and proximal coding regions are detected in highly translated mRNAs in zebrafish. CONCLUSIONS: Our results suggest the openness of the 3\u27 UTR promotes the translation efficiency in both mice and zebrafish, with the in vivo structure in 3\u27 UTR more important in mice than in zebrafish. This reveals a novel role of RNA secondary structure on translational regulation

    RPN: A Word Vector Level Data Augmentation Algorithm in Deep Learning for Language Understanding

    Full text link
    This paper presents a new data augmentation algorithm for natural understanding tasks, called RPN:Random Position Noise algorithm.Due to the relative paucity of current text augmentation methods. Few of the extant methods apply to natural language understanding tasks for all sentence-level tasks.RPN applies the traditional augmentation on the original text to the word vector level. The RPN algorithm makes a substitution in one or several dimensions of some word vectors. As a result, the RPN can introduce a certain degree of perturbation to the sample and can adjust the range of perturbation on different tasks. The augmented samples are then used to give the model training.This makes the model more robust. In subsequent experiments, we found that adding RPN to the training or fine-tuning model resulted in a stable boost on all 8 natural language processing tasks, including TweetEval, CoLA, and SST-2 datasets, and more significant improvements than other data augmentation algorithms.The RPN algorithm applies to all sentence-level tasks for language understanding and is used in any deep learning model with a word embedding layer.Comment: 10 pages, 4 figure

    A new graph-based clustering method with application to single-cell RNA-seq data from human pancreatic islets.

    Get PDF
    Traditional bulk RNA-sequencing of human pancreatic islets mainly reflects transcriptional response of major cell types. Single-cell RNA sequencing technology enables transcriptional characterization of individual cells, and thus makes it possible to detect cell types and subtypes. To tackle the heterogeneity of single-cell RNA-seq data, powerful and appropriate clustering is required to facilitate the discovery of cell types. In this paper, we propose a new clustering framework based on a graph-based model with various types of dissimilarity measures. We take the compositional nature of single-cell RNA-seq data into account and employ log-ratio transformations. The practical merit of the proposed method is demonstrated through the application to the centered log-ratio-transformed single-cell RNA-seq data for human pancreatic islets. The practical merit is also demonstrated through comparisons with existing single-cell clustering methods. The R-package for the proposed method can be found at https://github.com/Zhang-Data-Science-Research-Lab/LrSClust
    • …
    corecore