A Provable Smoothing Approach for High Dimensional Generalized Regression with Applications in Genomics
In many applications, linear models fit the data poorly. This article studies
an appealing alternative, the generalized regression model. This model only
assumes that there exists an unknown monotonically increasing link function
connecting the response to a single index of explanatory
variables. The generalized regression model is flexible and
covers many widely used statistical models. It fits the data generating
mechanisms well in many real problems, which makes it useful in a variety of
applications where regression models are regularly employed. In low dimensions,
rank-based M-estimators are recommended to deal with the generalized regression
model, giving root-n consistent estimators of the index coefficients. Applications of
these estimators to high dimensional data, however, are questionable. This
article studies, both theoretically and practically, a simple yet powerful
smoothing approach to handle the high dimensional generalized regression model.
Theoretically, a family of smoothing functions is provided, and the amount of
smoothing necessary for efficient inference is carefully calculated.
Practically, our study is motivated by an important and challenging scientific
problem: decoding gene regulation by predicting transcription factors that bind
to cis-regulatory elements. Applying our proposed method to this problem shows
substantial improvement over the state-of-the-art alternative in real data.
Comment: 53 pages
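A smoothed rank-based objective of the kind this abstract describes can be sketched as follows. This is a minimal illustration, not the paper's estimator: it assumes a maximum-rank-correlation style objective and swaps the indicator on the index difference for a logistic smoother. The function name `smoothed_rank_objective`, the bandwidth `h`, and the simulated data are all hypothetical.

```python
import numpy as np


def smoothed_rank_objective(beta, X, y, h=0.1):
    """Smoothed pairwise rank objective (assumed form, for illustration).

    Averages 1(y_i > y_j) * S((x_i - x_j)' beta / h) over ordered pairs,
    where S is the logistic function standing in for a hard indicator.
    """
    index = X @ beta
    # Pairwise differences of the single index: entry (i, j) is index_i - index_j.
    diff_index = index[:, None] - index[None, :]
    # Indicator that response i exceeds response j.
    y_greater = (y[:, None] > y[None, :]).astype(float)
    # Logistic function written with tanh for numerical stability.
    smooth = 0.5 * (1.0 + np.tanh(diff_index / (2.0 * h)))
    n = len(y)
    return float((y_greater * smooth).sum() / (n * (n - 1)))


# Hypothetical data: a monotone (exponential) link applied to a single index.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((200, 3))
beta_demo = np.array([1.0, -1.0, 0.5])
beta_demo = beta_demo / np.linalg.norm(beta_demo)
y_demo = np.exp(X_demo @ beta_demo)

obj_at_truth = smoothed_rank_objective(beta_demo, X_demo, y_demo)
obj_at_flipped = smoothed_rank_objective(-beta_demo, X_demo, y_demo)
```

Because the objective depends on the response only through pairwise orderings, it is unchanged by any increasing transformation of the response, which is why no link function needs to be specified. The smoother makes the objective differentiable in `beta`; how much smoothing is appropriate in high dimensions is the question the paper addresses and is not settled by this sketch.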
Least Squares Based and Two-Stage Least Squares Based Iterative Estimation Algorithms for H-FIR-MA Systems
This paper studies the identification of Hammerstein finite impulse response moving average (H-FIR-MA for short) systems. A new two-stage least squares iterative algorithm is developed to identify the parameters of the H-FIR-MA systems. The simulation cases indicate the efficiency of the proposed algorithms.
STATISTICAL METHODS FOR DECODING GENE REGULATION IN SINGLE CELLS
Single-cell sequencing is rapidly transforming biomedical research. With the ability to measure omics information in individual cells, it provides unprecedented resolution to study heterogeneous biological and clinical samples, enabling scientists to discover and characterize previously unknown biological signals and processes carried by novel or rare cell subpopulations. The new data structure and high level of noise in single-cell genomic data pose significant analytical challenges. To address these challenges, we developed new statistical and computational methods for analyzing single-cell transcriptome and regulome data. First, to infer cells’ underlying developmental trajectories, we developed TSCAN, which performs “pseudotime” analysis with a cluster-based minimum spanning tree approach. TSCAN facilitates accurate construction of pseudotemporal trajectories by regularizing the complexity of spanning trees. By improving the bias-variance tradeoff of the spanning tree estimation, TSCAN substantially improved the accuracy and robustness of the pseudotime analysis. Second, we developed RAISIN to support regression and differential analysis in single-cell RNA-seq datasets with multiple samples. Compared to the classical linear mixed effects model, RAISIN improves variance estimation and statistical power for datasets with a small sample size or cell number, and improves scalability for datasets with a large sample size and millions of cells. Third, we developed SCATE to extract and enhance signals from highly noisy and sparse single-cell ATAC-seq data. SCATE accurately infers genome-wide activities of each individual cis-regulatory element by adaptively integrating information from co-activated cis-regulatory elements, similar cells, and massive amounts of publicly available regulome data. The enhanced signal improves the performance of downstream analyses such as peak calling and prediction of transcription factor binding sites.
These methods have been applied in numerous collaborative projects and helped decipher gene regulatory programs in the T cell exhaustion process and identify molecular signatures in neoadjuvant immunotherapy.
Stacking tunable interlayer magnetism in bilayer CrI3
Diverse interlayer tunability of physical properties of two-dimensional
layers mostly lies in the covalent-like quasi-bonding that is significant in
electronic structures but rather weak for energetics. Such characteristics
result in various stacking orders that are energetically comparable but may
significantly differ in terms of electronic structures, e.g. magnetism.
Inspired by several recent experiments showing interlayer
anti-ferromagnetically coupled CrI3 bilayers, we carried out first-principles
calculations for CrI3 bilayers. We found that the anti-ferromagnetic coupling
results from a new stacking order with the C2/m space group symmetry, rather
than the graphene-like one with R-3 as previously believed. Moreover, we
demonstrated that the intra- and inter-layer couplings in CrI3 bilayer are
governed by two different mechanisms, namely ferromagnetic super-exchange and
direct-exchange interactions, which are largely decoupled because of their
significant difference in strength at the strong- and weak-interaction limits.
This allows the much weaker interlayer magnetic coupling to be more feasibly
tuned by stacking orders solely. Given the fact that interlayer magnetic
properties can be altered by changing crystal structure with different stacking
orders, our work opens a new paradigm for tuning interlayer magnetic properties
with the freedom of stacking order in two-dimensional layered materials.
Geochemical Composition Variations and Tectonic Implications of the Baoligaomiao Formation Volcanic Rocks from the Uliastai Continental Margin, Southeast Central Asian Orogenic Belt
The Permo-Carboniferous tectonic evolution in the Uliastai continental margin (UCM), north of the southeast Central Asian Orogenic Belt, remains controversial. This work examined the geochemical composition of the felsic volcanic rocks from the lower and upper parts of the Baoligaomiao Formation in the UCM. Zircon U-Pb ages reveal that the Baoligaomiao Formation has a long-lived eruption duration, from ca. 328 to 285 Ma. The lower part (ca. 328–310 Ma) of the Baoligaomiao Formation is dominated by clastic and pyroclastic rocks with subordinate intermediate-felsic volcanic rocks, whereas the upper part (ca. 307–285 Ma) mainly consists of felsic volcanic rocks and pyroclastic rocks. Calculations reveal that the felsic volcanic rocks from the lower part have low zircon saturation temperatures (TZr = 747℃–795℃), whereas those from the upper part exhibit high TZr (ca. 793℃–930℃). Zircons from the lower part exhibit high εHf(t) values and 176Lu/177Hf ratios, in contrast to the low εHf(t) values and 176Lu/177Hf ratios of zircons from the upper part. These petrogeological and geochemical shifts might support the tectonic switch model in the UCM at the end of the Carboniferous, providing new constraints on the Late Carboniferous closure of the Hegenshan Ocean.
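The abstract does not say which calibration produced its zircon saturation temperatures. As a rough sketch of how such a value is typically computed, the snippet below assumes the widely used Watson and Harrison (1983) calibration, which is an assumption here rather than something taken from the paper. The function name and the sample inputs are hypothetical.

```python
import math


def zircon_saturation_temp_c(zr_ppm, m_factor):
    """Zircon saturation temperature in degrees Celsius.

    Assumes the Watson and Harrison (1983) calibration:
        ln(D_Zr) = -3.80 - 0.85 * (M - 1) + 12900 / T
    with T in kelvin, D_Zr = 496000 / Zr (whole-rock Zr in ppm), and
    M = (Na + K + 2Ca) / (Al * Si) as a cation ratio.
    """
    d_zr = 496000.0 / zr_ppm
    # Rearranged for T: T = 12900 / (2.95 + 0.85 * M + ln(D_Zr)).
    t_kelvin = 12900.0 / (2.95 + 0.85 * m_factor + math.log(d_zr))
    return t_kelvin - 273.15


# Hypothetical inputs for illustration only, not data from the paper.
t_example = zircon_saturation_temp_c(zr_ppm=200.0, m_factor=1.5)
t_higher_zr = zircon_saturation_temp_c(zr_ppm=400.0, m_factor=1.5)
```

Under this calibration, a higher Zr content at a fixed M gives a higher saturation temperature, so a contrast in TZr between two units can reflect differences in both Zr content and melt composition.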