Imaging the transverse spin density of light via electromagnetically induced transparency
When a light beam is strongly laterally confined, its field vector spins in a
plane not perpendicular to the propagation direction, leading to the presence
of transverse spin angular momentum, which plays a crucial role in the field of
chiral quantum optics. The existing techniques to measure the transverse spin
density require complex setups and sophisticated time-consuming procedures.
Here, we propose a scheme to measure the transverse spin density of an optical
field in real time using a multi-level atomic medium. The susceptibility of the
medium is spatially modulated by the transverse spin via electromagnetically
induced transparency. The distribution of the transverse spin is then extracted
by measuring the distributions of the Stokes parameters of another collimated
probe field. Comment: 4 pages, 3 figure
New approaches for clustering high dimensional data
Clustering is one of the most effective methods for analyzing datasets that contain a large number of objects with numerous attributes. Clustering seeks to identify groups, or clusters, of similar objects. In low dimensional space, the similarity between objects is often evaluated by summing the difference across all of their attributes. High dimensional data, however, may contain irrelevant attributes which mask the existence of clusters. The discovery of groups of objects that are highly similar within some subsets of relevant attributes becomes an important but challenging task. My thesis focuses on various models and algorithms for this task. We first present a flexible clustering model, namely OP-Cluster (Order Preserving Cluster). Under this model, two objects are similar on a subset of attributes if the values of these two objects induce the same relative ordering of these attributes. The OPClustering algorithm has been shown to be useful for identifying co-regulated genes in gene expression data. We also propose a semi-supervised approach to discover biologically meaningful OP-Clusters by incorporating existing gene function classifications into the clustering process. This semi-supervised algorithm yields only OP-clusters that are significantly enriched by genes from specific functional categories. Real datasets are often noisy. We propose a noise-tolerant clustering algorithm for mining frequently occurring itemsets. This algorithm is called approximate frequent itemsets (AFI). Both the theoretical and experimental results demonstrate that our AFI mining algorithm has higher recoverability of real clusters than any other existing itemset mining approach. Pair-wise dissimilarities are often derived from original data to reduce the complexities of high dimensional data. Traditional clustering algorithms taking pair-wise dissimilarities as input often generate disjoint clusters from pair-wise dissimilarities.
It is well known that the classification model represented by disjoint clusters is inconsistent with many real classifications, such as gene function classifications. We develop a Poclustering algorithm, which generates overlapping clusters from pair-wise dissimilarities. We prove that by allowing overlapping clusters, Poclustering fully preserves the information of any dissimilarity matrix, while traditional partitioning algorithms may cause significant information loss.
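The order-preserving similarity described in this abstract can be illustrated with a minimal sketch. The function names `induced_order` and `order_preserving_similar` are hypothetical, chosen for illustration only, and the handling of tied values (broken by attribute position) is an assumption that the abstract does not address.

```python
from typing import Sequence, Tuple


def induced_order(values: Sequence[float], attrs: Sequence[int]) -> Tuple[int, ...]:
    """Return the attributes in `attrs`, sorted by the object's value on each one.

    Ties keep the order in which the attributes were listed (assumption).
    """
    return tuple(sorted(attrs, key=lambda a: values[a]))


def order_preserving_similar(
    x: Sequence[float], y: Sequence[float], attrs: Sequence[int]
) -> bool:
    """Two objects are similar on `attrs` if they rank those attributes identically."""
    return induced_order(x, attrs) == induced_order(y, attrs)


# Two objects with very different magnitudes but the same ordering on attributes 0, 1, 2.
x = [1.0, 5.0, 3.0, 9.0]
y = [10.0, 40.0, 20.0, 15.0]

print(order_preserving_similar(x, y, [0, 1, 2]))     # True: both rank 0 < 2 < 1
print(order_preserving_similar(x, y, [0, 1, 2, 3]))  # False: attribute 3 ranks differently
```

The example shows why the subset of attributes matters: the two objects agree on three attributes but disagree once the fourth is included.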
iMapSplice: Alleviating Reference Bias Through Personalized RNA-seq Alignment
Genomic variants in both coding and non-coding sequences can have functionally important and sometimes deleterious effects on exon splicing of gene transcripts. For transcriptome profiling using RNA-seq, the accurate alignment of reads across exon junctions is a critical step. Existing algorithms that utilize a standard reference genome as a template sometimes have difficulty in mapping reads that carry genomic variants. These problems can lead to allelic ratio biases and the failure to detect splice variants created by splice site polymorphisms. To improve RNA-seq read alignment, we have developed a novel approach called iMapSplice that enables personalized mRNA transcriptome profiling. The algorithm makes use of personal genomic information and performs an unbiased alignment towards genome indices carrying both reference and alternative bases. Importantly, this breaks the dependency on reference genome splice site dinucleotide motifs and enables iMapSplice to discover personal splice junctions created through splice site polymorphisms. We report comparative analyses using a number of simulated and real datasets. Besides general improvements in read alignment and splice junction discovery, iMapSplice greatly alleviates allelic ratio biases and unravels many previously uncharacterized splice junctions created by splice site polymorphisms, with minimal overhead in computation time and storage. Software download URL: https://github.com/LiuBioinfo/iMapSplice
Piecing the puzzle together: a revisit to transcript reconstruction problem in RNA-seq
The advancement of RNA sequencing (RNA-seq) has provided an unprecedented opportunity to assess both the diversity and quantity of transcript isoforms in an mRNA transcriptome. In this paper, we revisit the computational problem of transcript reconstruction and quantification. Unlike existing methods which focus on how to explain the exons and splice variants detected by the reads with a set of isoforms, we aim at reconstructing transcripts by piecing the reads into individual effective transcript copies. Simultaneously, the quantity of each isoform is explicitly measured by the number of assembled effective copies, instead of estimated solely based on the collective read count. We have developed a novel method named Astroid that solves the problem of effective copy reconstruction on the basis of a flow network. The RNA-seq reads are represented as vertices in the flow network and are connected by weighted edges that evaluate the likelihood of two reads originating from the same effective copy. A maximum likelihood set of transcript copies is then reconstructed by solving a minimum-cost flow problem on the flow network. Simulation studies on the human transcriptome have demonstrated the superior sensitivity and specificity of Astroid in transcript reconstruction as well as improved accuracy in transcript quantification over several existing approaches. The application of Astroid on two real RNA-seq datasets has further demonstrated its accuracy through high correlation between the estimated isoform abundance and the qRT-PCR validations
ATRank: An Attention-Based User Behavior Modeling Framework for Recommendation
A user can be represented by his or her behavior history. A common way
to deal with the user modeling problem is to manually extract all kinds of
aggregated features over the heterogeneous behaviors, which may fail to fully
represent the data itself due to limited human instinct. Recent works usually
use RNN-based methods to give an overall embedding of a behavior sequence,
which then could be exploited by the downstream applications. However, this can
only preserve very limited information, or aggregated memories of a person.
When a downstream application needs to use the modeled user features,
it may lose the integrity of the specific, highly correlated behaviors of the
user and introduce noise derived from unrelated behaviors. This paper
proposes an attention-based user behavior modeling framework called ATRank,
which we mainly use for recommendation tasks. Our model handles heterogeneous
user behaviors by projecting all types of behaviors into multiple
latent semantic spaces, where the behaviors can influence one another via
self-attention. Downstream applications then can use the user behavior vectors
via vanilla attention. Experiments show that ATRank achieves better
performance and faster training. We further explore using ATRank as one
unified model to predict different types of user behaviors at the same time,
showing performance comparable to that of highly optimized individual models. Comment: AAAI 201
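As a rough illustration of how behaviors can influence one another via self-attention, the following is a minimal sketch of generic scaled dot-product self-attention in plain Python. It is not the ATRank model: the learned projections into multiple latent semantic spaces are omitted (queries, keys, and values are all the raw behavior vectors, which is an assumption), and the names `softmax` and `self_attention` are illustrative.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(behaviors: List[Vector]) -> List[Vector]:
    """Re-express each behavior vector as an attention-weighted mix of all behaviors.

    Assumption: no learned projections; each vector serves as query, key, and value.
    """
    dim = len(behaviors[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in behaviors:
        # Scaled dot product of this behavior with every behavior in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in behaviors]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one coordinate at a time.
        mixed = [
            sum(w * value[i] for w, value in zip(weights, behaviors))
            for i in range(dim)
        ]
        outputs.append(mixed)
    return outputs


# Three toy behavior embeddings; each output row blends all three inputs.
toy_behaviors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(toy_behaviors):
    print(row)
```

Each output vector has the same dimension as the inputs, so a downstream component can attend over the outputs in the same way.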
Discerning Novel Splice Junctions Derived from RNA-Seq Alignment: A Deep Learning Approach
Background: Exon splicing is a regulated cellular process in the transcription of protein-coding genes. Technological advancements and cost reductions in RNA sequencing have made quantitative and qualitative assessments of the transcriptome both possible and widely available. RNA-seq provides unprecedented resolution to identify gene structures and resolve the diversity of splicing variants. However, currently available ab initio aligners are vulnerable to spurious alignments due to random sequence matches and sample-reference genome discordance. As a consequence, a significant set of false positive exon junction predictions would be introduced, which will further confuse downstream analyses of splice variant discovery and abundance estimation.
Results: In this work, we present a deep learning based splice junction sequence classifier, named DeepSplice, which employs convolutional neural networks to classify candidate splice junctions. We show (I) DeepSplice outperforms state-of-the-art methods for splice site classification when applied to the popular benchmark dataset HS3D, (II) DeepSplice shows high accuracy for splice junction classification with GENCODE annotation, and (III) the application of DeepSplice to classify putative splice junctions generated by Rail-RNA alignment of 21,504 human RNA-seq data significantly reduces 43 million candidates into around 3 million highly confident novel splice junctions.
Conclusions: A model inferred from the sequences of annotated exon junctions that can then classify splice junctions derived from primary RNA-seq data has been implemented. The performance of the model was evaluated and compared through comprehensive benchmarking and testing, indicating a reliable performance and gross usability for classifying novel splice junctions derived from RNA-seq alignment
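A convolutional classifier needs a numeric representation of each candidate junction sequence. The abstract does not state which encoding DeepSplice uses, so the sketch below shows one-hot encoding only as a commonly used assumption. The function name `one_hot_encode`, the base order `ACGT`, and the all-zero row for unknown bases are illustrative choices.

```python
from typing import List

BASES = "ACGT"  # column order of the encoding (assumption)


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Encode a DNA string as one row of four 0/1 values per base.

    Bases outside A, C, G, T (for example N) become an all-zero row (assumption).
    """
    rows = []
    for base in sequence.upper():
        rows.append([1 if base == b else 0 for b in BASES])
    return rows


# A short toy sequence; real junction windows would be much longer.
encoded = one_hot_encode("AGGTN")
for base, row in zip("AGGTN", encoded):
    print(base, row)
```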
Long-Read Sequencing of the Zebrafish Genome Reorganizes Genomic Architecture
BACKGROUND: Nanopore sequencing technology has revolutionized the field of genome biology with its ability to generate extra-long reads that can resolve regions of the genome that were previously inaccessible to short-read sequencing platforms. Over 50% of the zebrafish genome consists of difficult to map, highly repetitive, low complexity elements that pose inherent problems for short-read sequencers and assemblers.
RESULTS: We used long-read nanopore sequencing to generate a de novo assembly of the zebrafish genome and compared our assembly to the current reference genome, GRCz11. The new assembly identified 1697 novel insertions and deletions over one kilobase in length and placed 106 previously unlocalized scaffolds. We also discovered additional sites of retrotransposon integration previously unreported in GRCz11 and observed the expression of these transposable elements in adult zebrafish under physiologic conditions, implying they have active mobility in the zebrafish genome and contribute to the ever-changing genomic landscape.
CONCLUSIONS: We used nanopore sequencing to improve upon and resolve the issues plaguing the current zebrafish reference assembly, GRCz11. Zebrafish is a prominent model of human disease, and our corrected assembly will be useful for studies relying on interspecies comparisons and precise linkage of genetic events to disease phenotypes
Experience-Learning Inspired Two-Step Reward Method for Efficient Legged Locomotion Learning Towards Natural and Robust Gaits
Multi-legged robots offer enhanced stability in complex terrains, yet
autonomously learning natural and robust motions in such environments remains
challenging. Drawing inspiration from animals' progressive learning patterns,
from simple to complex tasks, we introduce a universal two-stage learning
framework with two-step reward setting based on self-acquired experience, which
efficiently enables legged robots to incrementally learn natural and robust
movements. In the first stage, robots learn through gait-related rewards to
track velocity on flat terrain, acquiring natural, robust movements and
generating effective motion experience data. In the second stage, mirroring
animal learning from existing experiences, robots learn to navigate challenging
terrains with natural and robust movements using adversarial imitation
learning. To demonstrate our method's efficacy, we trained both quadruped
robots and a hexapod robot, and the policies were successfully transferred to a
physical quadruped robot, GO1, which exhibited natural gait patterns and
remarkable robustness on various terrains.