54 research outputs found
A Splicing Approach to Best Subset of Groups Selection
Best subset of groups selection (BSGS) is the process of selecting a small
part of non-overlapping groups to achieve the best interpretability on the
response variable. It has attracted increasing attention and has far-reaching
applications in practice. However, due to the computational intractability of
BSGS in high-dimensional settings, developing efficient algorithms for solving
BSGS remains a research hotspot. In this paper, we propose a group-splicing
algorithm that iteratively detects the relevant groups and excludes the
irrelevant ones. Moreover, coupled with a novel group information criterion, we
develop an adaptive algorithm to determine the optimal model size. Under mild
conditions, it is certifiable that our algorithm can identify the optimal
subset of groups in polynomial time with high probability. Finally, we
demonstrate the efficiency and accuracy of our methods by comparing them with
several state-of-the-art algorithms on both synthetic and real-world datasets.
Comment: 49 pages, 7 figures
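For orientation, the selection problem described in this abstract is commonly written as a group-cardinality-constrained least-squares problem. The notation below is an assumption for illustration (it does not appear in the listing): $n$ samples, $J$ non-overlapping groups $G_1, \dots, G_J$, a coefficient vector $\beta$, and a model size $T$.

```latex
\min_{\beta}\; \frac{1}{2n}\,\Bigl\| y - \sum_{j=1}^{J} X_{G_j}\,\beta_{G_j} \Bigr\|_2^2
\quad \text{subject to} \quad
\sum_{j=1}^{J} \mathbb{I}\bigl( \|\beta_{G_j}\|_2 \neq 0 \bigr) \le T
```

The constraint counts selected groups rather than individual variables, which is what makes the problem combinatorial in high dimensions.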
A SIMPLE Approach to Provably Reconstruct Ising Model with Global Optimality
Reconstructing the interaction network between random events is a critical
problem arising in fields from statistical physics and politics to sociology,
biology, psychology, and beyond. The Ising model lays the foundation for this
reconstruction process, but finding the underlying Ising model from the least
amount of observed samples in a computationally efficient manner has been
historically challenging for half a century. By using the idea of sparsity
learning, we present an approach named SIMPLE that has a dominant sample
complexity from theoretical limit. Furthermore, a tuning-free algorithm is
developed to give a statistically consistent solution of SIMPLE in polynomial
time with high probability. On extensive benchmarked cases, the SIMPLE approach
provably reconstructs underlying Ising models with global optimality. The
application to U.S. senators' voting in the last six congresses reveals that
both the Republicans and Democrats noticeably assemble in each congress;
interestingly, the assembling of Democrats is particularly pronounced in the
latest congress.
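As background, the standard zero-field Ising model over binary variables can be written as below. The symbols are assumptions for illustration ($p$ variables $x_i \in \{-1, +1\}$, interaction weights $\theta_{ij}$, normalizing constant $Z(\theta)$) and are not taken from the listing.

```latex
P_{\theta}(x) \;=\; \frac{1}{Z(\theta)} \exp\Bigl( \sum_{1 \le i < j \le p} \theta_{ij}\, x_i x_j \Bigr),
\qquad x \in \{-1, +1\}^{p}
```

Under this formulation, reconstructing the network amounts to recovering which $\theta_{ij}$ are nonzero from observed samples of $x$, which is where a sparsity constraint naturally enters.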
Sciences for The 2.5-meter Wide Field Survey Telescope (WFST)
The Wide Field Survey Telescope (WFST) is a dedicated photometric survey
facility under construction jointly by the University of Science and Technology
of China and Purple Mountain Observatory. It is equipped with a primary mirror
of 2.5m in diameter, an active optical system, and a mosaic CCD camera of 0.73
Gpix on the main focus plane to achieve high-quality imaging over a field of
view of 6.5 square degrees. The installation of WFST in the Lenghu observing
site is planned to happen in the summer of 2023, and the operation is scheduled
to commence within three months afterward. WFST will scan the northern sky in
four optical bands (u, g, r, and i) at cadences from hourly/daily to
semi-weekly in the deep high-cadence survey (DHS) and the wide field survey
(WFS) programs, respectively. WFS reaches a depth of 22.27, 23.32, 22.84, and
22.31 in AB magnitudes in a nominal 30-second exposure in the four bands during
a photometric night, respectively, enabling us to search for a tremendous number of
transients in the low-z universe and systematically investigate the variability
of Galactic and extragalactic objects. Intranight 90s exposures as deep as 23
and 24 mag in u and g bands via DHS provide a unique opportunity to facilitate
explorations of energetic transients in demand for high sensitivity, including
the electromagnetic counterparts of gravitational-wave events detected by the
second/third-generation GW detectors, supernovae within a few hours of their
explosions, tidal disruption events and luminous fast optical transients even
beyond a redshift of 1. Meanwhile, the final 6-year co-added images,
anticipated to reach g about 25.5 mag in WFS or even deeper by 1.5 mag in DHS,
will be of significant value to general Galactic and extragalactic sciences.
The highly uniform legacy surveys of WFST will also serve as an indispensable
complement to those of LSST which monitors the southern sky.
Comment: 46 pages, submitted to SCMP
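Two of the figures quoted in this abstract can be sanity-checked with simple arithmetic. The sketch below is an illustration only, under two stated assumptions: the pixels tile the 6.5 square degree field uniformly (chip gaps ignored), and stacking N equal background-limited exposures deepens the limit by 1.25 log10(N) magnitudes. The function and variable names are hypothetical.

```python
import math

# Values quoted in the abstract
FIELD_OF_VIEW_DEG2 = 6.5      # field of view in square degrees
CAMERA_PIXELS = 0.73e9        # 0.73 Gpix mosaic CCD camera
SINGLE_DEPTH_G = 23.32        # g-band depth of one 30 s WFS exposure (AB mag)
COADD_DEPTH_G = 25.5          # anticipated 6-year co-added g-band depth in WFS


def pixel_scale_arcsec(fov_deg2, n_pixels):
    """Approximate pixel scale, assuming pixels cover the field uniformly."""
    fov_arcsec2 = fov_deg2 * 3600.0 ** 2
    return math.sqrt(fov_arcsec2 / n_pixels)


def exposures_needed(single_depth, target_depth):
    """Exposures to stack, assuming depth gain = 1.25 * log10(N) magnitudes."""
    return 10 ** ((target_depth - single_depth) / 1.25)


pixel_scale = pixel_scale_arcsec(FIELD_OF_VIEW_DEG2, CAMERA_PIXELS)
n_g = exposures_needed(SINGLE_DEPTH_G, COADD_DEPTH_G)

print(f"approximate pixel scale: {pixel_scale:.2f} arcsec/pixel")
print(f"g-band exposures to reach the co-added depth: about {n_g:.0f}")
```

Under these assumptions the co-added g-band depth corresponds to a few dozen stacked 30-second exposures per field; real survey depth also depends on seeing, sky brightness, and image quality cuts that this estimate ignores.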
BTOB: Extending the Biased GWAS to Bivariate GWAS
10.3389/fgene.2021.654821, Frontiers in Genetics, 12, 65482
The effects of the E3 ubiquitin–protein ligase UBR7 of Frankliniella occidentalis on the ability of insects to acquire and transmit TSWV
The interactions between plant viruses and insect vectors are very complex. In recent years, RNA sequencing data have been used to elucidate critical genes of Tomato spotted wilt ortho-tospovirus (TSWV) and Frankliniella occidentalis (F. occidentalis). However, very little is known about the essential genes involved in thrips acquisition and transmission of TSWV. Based on transcriptome data of F. occidentalis infected with TSWV, we verified the complete sequence of the E3 ubiquitin–protein ligase UBR7 gene (UBR7), which is closely related to virus transmission. Additionally, we found that UBR7 belongs to the E3 ubiquitin–protein ligase family that is highly expressed in adulthood in F. occidentalis. UBR7 could interfere with virus replication and thus affect the transmission efficiency of F. occidentalis. With low UBR7 expression, TSWV transmission efficiency decreased, while TSWV acquisition efficiency was unaffected. Moreover, the direct interaction between UBR7 and the nucleocapsid (N) protein of TSWV was investigated through surface plasmon resonance and GST pull-down. In conclusion, we found that UBR7 is a crucial protein for TSWV transmission by F. occidentalis, as it directly interacts with TSWV N. This study provides a new direction for developing green pesticides targeting E3 ubiquitin to control TSWV and F. occidentalis.
Blockchain-based incentives for secure and collaborative data sharing in multiple clouds
The prosperity of cloud computing has driven an increasing number of enterprises and organizations to store their data on private or public cloud platforms. Because individual data owners are limited in data volume and diversity, data sharing across different cloud platforms would enable third parties to use big data analysis techniques to provide value-added services, such as providing healthcare services for customers by gathering medical data from multiple hospitals. However, it remains a challenging task to design effective incentives that encourage secure and collaborative data sharing in multiple clouds. In this paper, we propose a reliable collaboration model consisting of three types of participants (data owners, miners, and third parties), where the data is shared via blockchain and recorded by a smart contract. In general, these participants may acquire and store the shared data using their private or public clouds. We analyze the topological relationships between the participants and develop several Shapley value models, from simple to complex, for the revenue distribution process. We also discuss the incentive effect of sharing security data and the rationality of the designed solution through analysis of the distribution rules. This work is partially supported by the Beijing Natural Science Foundation under Grant 4192050, and in part by the National Natural Science Foundation of China under Grants 61972039 and 61872041.
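To illustrate what a Shapley value revenue split looks like, here is a minimal sketch for a three-participant game. It is not the paper's model: the revenue function and its numbers are hypothetical, chosen only to show how each participant's share is the average of its marginal contributions over all join orders.

```python
from itertools import permutations


def shapley_values(players, value):
    """Average each player's marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orders) for p, total in totals.items()}


def revenue(coalition):
    """Hypothetical revenue rule (an assumption, not taken from the paper)."""
    if {"owner", "miner", "third_party"} <= coalition:
        return 90.0   # data shared, recorded on chain, and monetised
    if {"owner", "third_party"} <= coalition:
        return 60.0   # data shared without a miner recording it
    return 0.0        # no revenue without both data and a service provider


players = ["owner", "miner", "third_party"]
phi = shapley_values(players, revenue)
print(phi)
```

With this toy rule the shares always add up to the revenue of the full coalition, which is the efficiency property that makes the Shapley value a natural candidate for distributing revenue among collaborators.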
abess: A Fast Best Subset Selection Library in Python and R
We introduce a new library named abess that implements a unified framework of
best-subset selection for solving diverse machine learning problems, e.g.,
linear regression, classification, and principal component analysis.
In particular, abess certifiably finds the optimal solution in polynomial
time with high probability under the linear model. Our efficient
implementation allows abess to attain the solution of best-subset selection
problems as fast as or even 20x faster than existing competing variable (model)
selection toolboxes. Furthermore, it supports common variants like best group
subset selection and regularized best-subset selection. The core of
the library is programmed in C++. For ease of use, a Python library is designed
for convenient integration with scikit-learn, and it can be installed from
the Python Package Index. In addition, a user-friendly R library is available
at the Comprehensive R Archive Network. The source code is available at:
https://github.com/abess-team/abess
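To make the best-subset selection problem concrete, the sketch below solves a tiny instance by exhaustive search in plain Python. It is neither the abess algorithm nor the abess API: the function names and the toy data are hypothetical, and the search cost grows combinatorially with the number of variables, which is the cost a polynomial-time method is meant to avoid.

```python
from itertools import combinations


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def rss(X, y, subset):
    """Residual sum of squares of the least-squares fit on the chosen columns."""
    cols = list(subset)
    gram = [[sum(row[a] * row[b] for row in X) for b in cols] for a in cols]
    rhs = [sum(row[a] * yi for row, yi in zip(X, y)) for a in cols]
    beta = solve(gram, rhs)  # normal equations restricted to the subset
    return sum(
        (yi - sum(bj * row[c] for bj, c in zip(beta, cols))) ** 2
        for row, yi in zip(X, y)
    )


def best_subset(X, y, k):
    """Try every size-k subset of columns and keep the one with the lowest RSS."""
    p = len(X[0])
    return min(combinations(range(p), k), key=lambda s: rss(X, y, s))


# Toy data (hypothetical): y depends only on columns 0 and 2.
X = [
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 1, 1],
]
y = [2 * row[0] - 3 * row[2] for row in X]

selected = best_subset(X, y, k=2)
print("selected columns:", selected)
```

Because the response is built from columns 0 and 2 only, and the four columns are linearly independent, that pair is the only size-2 subset with a zero residual.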
Distributed vibration sensor with a high strain dynamic range by harmonics analysis
Distributed vibration sensors (DVSs) have important applications in industrial production. A large strain dynamic range is very important for DVSs, but difficult to achieve as it often requires a complex sensing system or cumbersome data processing. This study shows that the strain dynamic range of a DVS can be improved by analyzing the harmonic numbers in the vibration response spectrum, and the vibration amplitude can be quantitatively measured with a strain resolution of 0.78 με for each additional harmonic. The system and data analysis method can significantly improve the strain dynamic range in comparison with traditional DVSs based on the intensity or polarization information.