Sequence-based Multiscale Model (SeqMM) for High-throughput chromosome conformation capture (Hi-C) data analysis
In this paper, I introduce a Sequence-based Multiscale Model (SeqMM) for
biomolecular data analysis. By combining it with the spectral graph method, I
reveal the essential difference between global-scale and local-scale models
in structure clustering: they optimize Euclidean (or spatial) distances and
sequential (or genomic) distances, respectively. More specifically,
clusters from global-scale models optimize Euclidean distance relations.
Local-scale models, on the other hand, produce clusters that optimize genomic
distance relations. For biomolecular data, Euclidean distances and sequential
distances are two independent variables, which can never be optimized
simultaneously in data clustering. However, the sequence scale in my SeqMM can
work as a tuning parameter that balances these two variables and delivers
different clusterings depending on the purpose. Further, I use SeqMM to explore
the hierarchical structures of chromosomes. I find that at the global scale, the
Fiedler vector from my SeqMM closely resembles the principal vector
from principal component analysis and can be used to study genomic
compartments. In TAD analysis, I find that TADs evaluated at different scales
are inconsistent and vary considerably. In particular, when the sequence scale
is small, the calculated TAD boundaries differ dramatically. Even in
regions with high contact frequencies, the TAD regions show no obvious
consistency. However, as the scale value increases, although the TADs still
differ considerably, the TAD boundaries in these high-contact-frequency regions
become increasingly consistent. Finally, I find that for a fixed local scale,
my method delivers very robust TAD boundaries across different cluster numbers.
Comment: 22 pages, 13 figures
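The abstract above does not give the SeqMM construction, so the following is only a minimal sketch of the general idea under stated assumptions. The Fiedler vector is taken as the eigenvector of the second-smallest eigenvalue of the graph Laplacian L = D - W. The weight function `toy_weights`, its Gaussian kernel, the parameter `eta`, and the hard cutoff `seq_scale` on genomic separation are all hypothetical illustrations, not the author's definitions.

```python
import numpy as np


def toy_weights(coords, seq_scale, eta=1.0):
    """Hypothetical edge weights (an assumption, not the SeqMM definition).

    A Gaussian kernel on Euclidean distance is kept only for pairs of loci
    whose index (genomic) separation |i - j| is at most seq_scale.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    weights = np.exp(-((dist / eta) ** 2))
    idx = np.arange(n)
    within_scale = np.abs(idx[:, None] - idx[None, :]) <= seq_scale
    weights = np.where(within_scale, weights, 0.0)
    np.fill_diagonal(weights, 0.0)  # no self-loops
    return weights


def fiedler_vector(weights):
    """Eigenvector of the second-smallest eigenvalue of L = D - W."""
    w = np.asarray(weights, dtype=float)
    laplacian = np.diag(w.sum(axis=1)) - w
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(laplacian)
    return eigvecs[:, 1]
```

In this sketch, a large `seq_scale` keeps all pairs (a global-scale graph), while a small value keeps only sequence neighbours (a local-scale graph). Splitting loci by the sign of the Fiedler vector gives a two-way partition.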
New Image Statistics for Detecting Disturbed Galaxy Morphologies at High Redshift
Testing theories of hierarchical structure formation requires estimating the
distribution of galaxy morphologies and its change with redshift. One aspect of
this investigation involves identifying galaxies with disturbed morphologies
(e.g., merging galaxies). This is often done by summarizing galaxy images
using, e.g., the CAS and Gini-M20 statistics of Conselice (2003) and Lotz et
al. (2004), respectively, and associating particular statistic values with
disturbance. We introduce three statistics that enhance detection of disturbed
morphologies at high-redshift (z ~ 2): the multi-mode (M), intensity (I), and
deviation (D) statistics. We show their effectiveness by training a
machine-learning classifier, random forest, using 1,639 galaxies observed in
the H band by the Hubble Space Telescope WFC3, galaxies that had been
previously classified by eye by the CANDELS collaboration (Grogin et al. 2011,
Koekemoer et al. 2011). We find that the MID statistics (and the A statistic of
Conselice 2003) are the most useful for identifying disturbed morphologies.
We also explore whether human annotators are useful for identifying disturbed
morphologies. We demonstrate that they show limited ability to detect
disturbance at high redshift, and that increasing their number beyond
approximately 10 does not provably yield better classification performance. We
propose a simulation-based model-fitting algorithm that mitigates these issues
by bypassing annotation.
Comment: 15 pages, 14 figures, accepted for publication in MNRAS
Real-time filtering and detection of dynamics for compression of HDTV
The preprocessing of video sequences for data compression is discussed. The end goal is a compression system for HDTV capable of transmitting perceptually lossless sequences at under one bit per pixel. Two subtopics were emphasized to prepare the video signal for more efficient coding: (1) nonlinear filtering to remove noise and shape the signal spectrum to take advantage of insensitivities of human viewers; and (2) segmentation of each frame into temporally dynamic and static regions for conditional frame replenishment. The latter technique operates best under the assumption that the sequence can be modelled as a superposition of active foreground and static background. The work was restricted to monochrome data, since the system is expected to use the standard luminance/chrominance decomposition, which concentrates most of the bandwidth requirements in the luminance. Similar methods may be applied to the two chrominance signals.
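The abstract does not say how frames are segmented, so the following is only a rough sketch of conditional frame replenishment under stated assumptions. Each block is marked dynamic when its mean absolute difference from the previous frame exceeds a threshold, and only dynamic blocks are refreshed. The function names, the 8-pixel block size, and the threshold value are hypothetical and are not taken from the original work.

```python
import numpy as np


def dynamic_mask(prev_frame, frame, threshold=10.0, block=8):
    """Return a boolean grid with True for blocks judged temporally dynamic.

    Assumption: a block is dynamic when its mean absolute luminance
    difference from the previous frame exceeds `threshold`.
    """
    diff = np.abs(frame.astype(float) - prev_frame.astype(float))
    rows, cols = diff.shape[0] // block, diff.shape[1] // block
    # Crop to whole blocks, then average within each block.
    blocks = diff[: rows * block, : cols * block].reshape(rows, block, cols, block)
    return blocks.mean(axis=(1, 3)) > threshold


def replenish(reconstructed, frame, mask, block=8):
    """Copy only the dynamic blocks of `frame` into the receiver's picture."""
    out = reconstructed.copy()
    for by, bx in zip(*np.nonzero(mask)):
        ys, xs = by * block, bx * block
        out[ys : ys + block, xs : xs + block] = frame[ys : ys + block, xs : xs + block]
    return out
```

Static blocks are left as they were in the reconstructed picture, so only the dynamic blocks would need to be transmitted. This matches the foreground/background assumption stated in the abstract only loosely.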