Sequence-based Multiscale Model (SeqMM) for High-throughput chromosome conformation capture (Hi-C) data analysis
In this paper, I introduce a Sequence-based Multiscale Model (SeqMM) for
biomolecular data analysis. In combination with the spectral graph method, I
reveal the essential difference between the global scale models and local scale
ones in structure clustering, i.e., different optimization on Euclidean (or
spatial) distances and sequential (or genomic) distances. More specifically,
clusters from global scale models optimize Euclidean distance relations. Local
scale models, on the other hand, result in clusters that optimize the genomic
distance relations. For biomolecular data, Euclidean distances and sequential
distances are two independent variables, which can never be optimized
simultaneously in data clustering. However, the sequence scale in my SeqMM can work
as a tuning parameter that balances these two variables and delivers different
clusterings depending on the purpose. Further, my SeqMM is used to explore the
hierarchical structures of chromosomes. I find that at the global scale, the
Fiedler vector from my SeqMM bears a great similarity with the principal vector
from principal component analysis, and can be used to study genomic
compartments. In TAD analysis, I find that TADs evaluated from different scales
are not consistent and vary a lot. Particularly when the sequence scale is
small, the calculated TAD boundaries are dramatically different. Even for
regions with high contact frequencies, TAD regions show no obvious consistency.
However, when the scale value increases further, although TADs are still quite
different, TAD boundaries in these high contact frequency regions become more
and more consistent. Finally, I find that for a fixed local scale, my method
can deliver very robust TAD boundaries in different cluster numbers.
Comment: 22 PAGES, 13 FIGURE
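The abstract above ties clustering to the Fiedler vector of a graph built at a chosen sequence scale. The sketch below is one plausible reading rather than the paper's definition. It assumes the sequence scale acts as a hard cutoff on the genomic distance |i - j|, and that the graph is the band-limited contact matrix. The names `fiedler_vector`, `seq_scale` and the toy contact map are hypothetical.

```python
import numpy as np

def fiedler_vector(contacts, seq_scale):
    """Fiedler vector of the Laplacian of a band-limited contact graph.

    contacts  : symmetric (n, n) array of non-negative contact frequencies
    seq_scale : hypothetical cutoff; contacts with |i - j| > seq_scale are dropped
    """
    n = contacts.shape[0]
    idx = np.arange(n)
    # Keep only pairs of bins within the chosen genomic (sequence) distance
    band = np.abs(idx[:, None] - idx[None, :]) <= seq_scale
    weights = np.where(band, contacts, 0.0)
    np.fill_diagonal(weights, 0.0)
    # Unnormalised graph Laplacian L = D - W
    laplacian = np.diag(weights.sum(axis=1)) - weights
    # eigh returns eigenvalues in ascending order; column 1 is the Fiedler vector
    _, eigvecs = np.linalg.eigh(laplacian)
    return eigvecs[:, 1]

# Toy contact map: two blocks of 4 bins, strong contacts inside, weak between
toy = np.full((8, 8), 0.1)
toy[:4, :4] = 1.0
toy[4:, 4:] = 1.0

v = fiedler_vector(toy, seq_scale=8)   # cutoff large enough to keep every pair
labels = v > 0                         # the sign splits the bins into two clusters
```

With a large cutoff the sign of the Fiedler vector separates the two blocks. Lowering `seq_scale` removes long-range contacts, so the split is driven increasingly by genomic adjacency.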
Diffusion Component Analysis: Unraveling Functional Topology in Biological Networks
Complex biological systems have been successfully modeled by biochemical and
genetic interaction networks, typically gathered from high-throughput (HTP)
data. These networks can be used to infer functional relationships between
genes or proteins. Using the intuition that the topological role of a gene in a
network relates to its biological function, local or diffusion based
"guilt-by-association" and graph-theoretic methods have had success in
inferring gene functions. Here we seek to improve function prediction by
integrating diffusion-based methods with a novel dimensionality reduction
technique to overcome the incomplete and noisy nature of network data. In this
paper, we introduce diffusion component analysis (DCA), a framework that plugs
in a diffusion model and learns a low-dimensional vector representation of each
node to encode the topological properties of a network. As a proof of concept,
we demonstrate DCA's substantial improvement over state-of-the-art
diffusion-based approaches in predicting protein function from molecular
interaction networks. Moreover, our DCA framework can integrate multiple
networks from heterogeneous sources, consisting of genomic information,
biochemical experiments and other resources, to even further improve function
prediction. Yet another layer of performance gain is achieved by integrating
the DCA framework with support vector machines that take our node vector
representations as features. Overall, our DCA framework provides a novel
representation of nodes in a network that can be used as a plug-in architecture
to other machine learning algorithms to decipher topological properties of and
obtain novel insights into interactomes.
Comment: RECOMB 201
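The pipeline described above (run a diffusion model on the network, then learn a low-dimensional vector per node) can be sketched roughly as follows. This is a sketch under stated assumptions, not the authors' implementation. The diffusion model is assumed to be a random walk with restart. The dimensionality reduction is swapped for a truncated SVD of log-transformed diffusion states, in place of the learned model the abstract refers to. The function names, the restart probability and the toy graph are hypothetical.

```python
import numpy as np

def rwr_diffusion_states(adjacency, restart=0.5):
    """Stationary random-walk-with-restart distribution for every start node.

    Row i of the result is the diffusion state of node i.
    """
    n = adjacency.shape[0]
    # Row-normalise to a transition matrix (assumes no isolated nodes)
    transition = adjacency / adjacency.sum(axis=1, keepdims=True)
    # Closed form of s_i = (1 - r) * s_i P + r * e_i, stacked over all i
    return restart * np.linalg.inv(np.eye(n) - (1.0 - restart) * transition)

def embed(states, dim):
    """Low-dimensional node vectors from a truncated SVD (assumed stand-in)."""
    n = states.shape[0]
    # Smooth before taking logs so near-zero probabilities stay finite
    log_states = np.log(states + 1.0 / n)
    centered = log_states - log_states.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])

# Toy network: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3
adjacency = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adjacency[a, b] = adjacency[b, a] = 1.0

states = rwr_diffusion_states(adjacency)
vectors = embed(states, dim=2)   # one 2-dimensional feature vector per node
```

The rows of `vectors` could then be passed as features to a downstream classifier such as a support vector machine, which is the kind of plug-in use the abstract mentions.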
Fundamentals of direct limit Lie theory
We show that every countable direct system of finite-dimensional real or
complex Lie groups has a direct limit in the category of Lie groups modelled on
locally convex spaces. This enables us to push all basic constructions of
finite-dimensional Lie theory to the case of direct limit groups. In
particular, we obtain an analogue of Lie's third theorem: Every
countable-dimensional real or complex locally finite Lie algebra is enlargible,
i.e., it is the Lie algebra of some regular Lie group (a suitable direct limit
group).
Comment: 33 pages (v2: Lemma 7.12 and Proposition 7.13 corrected, clearer
distinction between analyticity and convenient analyticity
BRST Formulation of 4-Monopoles
A supersymmetric gauge invariant action is constructed over any 4-dimensional
Riemannian manifold describing Witten's theory of 4-monopoles. The topological
supersymmetric algebra closes off-shell. The multiplets include the auxiliary
fields and the Wess-Zumino fields in an unusual way, arising naturally from
BRST gauge fixing. A new canonical approach over Riemann manifolds is followed,
using a Morse function as a Euclidean time and taking into account the BRST
boundary conditions that come from the BFV formulation. This allows a
construction of the effective action starting from gauge principles.
Comment: 18 pages, Amste
Connectedness of Higgs bundle moduli for complex reductive Lie groups
We carry out an intrinsic approach to the study of the connectedness of the
moduli space of G-Higgs bundles, over a compact Riemann
surface, when G is a complex reductive (not necessarily connected) Lie group.
We prove that the number of connected components of this moduli space is indexed
by the corresponding topological invariants. In particular, this gives an
alternative proof of the counting by J. Li of the number of connected
components of the moduli space of flat G-connections in the case in which
G is connected and semisimple.
Comment: Due to some mistake the authors did not appear in the previous
version. Fixed this. Final version; to appear in the Asian Journal of
Mathematics. 19 page