DNA Storage: A Promising Large Scale Archival Storage?
Deoxyribonucleic Acid (DNA), with its high density and long durability, is a
promising storage medium for long-term archival storage and has attracted much
attention. Several studies have verified the feasibility of using DNA for
archival storage with a small amount of data. However, the achievable storage
capacity of DNA as archival storage has not been comprehensively investigated
yet. Theoretically, the DNA storage density is about 1 exabyte/mm^3 (10^9
GB/mm^3). However, according to our investigation, DNA storage tube capacity
based on current synthesis and sequencing technologies is only
hundreds of gigabytes because of multiple biological and technological
constraints. This paper identifies and investigates the critical factors
affecting the single DNA tube capacity for archival storage. Finally, we
suggest several promising directions to overcome the limitations and enhance
DNA storage capacity.
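As a quick check of the theoretical density quoted in this abstract, the conversion from exabytes to gigabytes can be written out as below. This is only a sketch that assumes decimal (SI) units, which the abstract does not state explicitly.

```latex
\[
1\ \text{EB} = 10^{18}\ \text{bytes}, \qquad 1\ \text{GB} = 10^{9}\ \text{bytes}
\quad\Longrightarrow\quad
1\ \frac{\text{EB}}{\text{mm}^3}
  = \frac{10^{18}}{10^{9}}\ \frac{\text{GB}}{\text{mm}^3}
  = 10^{9}\ \frac{\text{GB}}{\text{mm}^3}.
\]
```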
Genome-wide investigation and expression analysis of OSCA gene family in response to abiotic stress in alfalfa
Alfalfa is an excellent leguminous forage crop that is widely cultivated worldwide, but its yield and quality are often affected by drought and soil salinization. Hyperosmolality-gated calcium-permeable channel (OSCA) proteins are hyperosmotic calcium ion (Ca²⁺) receptors that play an essential role in regulating plant growth, development, and abiotic stress responses. However, no systematic analysis of the OSCA gene family has been conducted in alfalfa. In this study, a total of 14 OSCA genes were identified from the alfalfa genome and classified into three groups based on their sequence composition and phylogenetic relationships. Gene structure, conserved motifs and functional domain prediction showed that all MsOSCA genes had the same functional domain DUF221. Cis-acting element analysis showed that MsOSCA genes had many cis-regulatory elements in response to abiotic or biotic stresses and hormones. Tissue expression pattern analysis demonstrated that the MsOSCA genes had tissue-specific expression; for example, MsOSCA12 was only expressed in roots and leaves but not in stem and petiole tissues. Furthermore, RT–qPCR results indicated that the expression of MsOSCA genes was induced by abiotic stress (drought and salt) and hormones (JA, SA, and ABA). In particular, the expression levels of MsOSCA3, MsOSCA5, MsOSCA12 and MsOSCA13 were significantly increased under drought and salt stress, and MsOSCA7, MsOSCA10, MsOSCA12 and MsOSCA13 genes exhibited significant upregulation under plant hormone treatments, indicating that these genes play a positive role in drought, salt and hormone responses. Subcellular localization results showed that the MsOSCA3 protein was localized on the plasma membrane. This study provides a basis for understanding the biological information and further functional analysis of the MsOSCA gene family and provides candidate genes for stress resistance breeding in alfalfa.
Rethinking and Simplifying Bootstrapped Graph Latents
Graph contrastive learning (GCL) has emerged as a representative paradigm in
graph self-supervised learning, where negative samples are commonly regarded as
the key to preventing model collapse and producing distinguishable
representations. Recent studies have shown that GCL without negative samples
can achieve state-of-the-art performance as well as scalability improvement,
with bootstrapped graph latent (BGRL) as a prominent step forward. However,
BGRL relies on a complex architecture to maintain the ability to scatter
representations, and the underlying mechanisms enabling the success remain
largely unexplored. In this paper, we introduce an instance-level decorrelation
perspective to tackle the aforementioned issue and leverage it as a springboard
to reveal the potential unnecessary model complexity within BGRL. Based on our
findings, we present SGCL, a simple yet effective GCL framework that utilizes
the outputs from two consecutive iterations as positive pairs, eliminating the
negative samples. SGCL only requires a single graph augmentation and a single
graph encoder without additional parameters. Extensive experiments conducted on
various graph benchmarks demonstrate that SGCL can achieve competitive
performance with fewer parameters, lower time and space costs, and significant
convergence speedup. Comment: Accepted by WSDM 202
SAILOR: Structural Augmentation Based Tail Node Representation Learning
Graph Neural Networks (GNNs) have recently achieved state-of-the-art performance
in representation learning for graphs. However, the effectiveness of
GNNs, which capitalize on the key operation of message propagation, highly
depends on the quality of the topology structure. Most of the graphs in
real-world scenarios follow a long-tailed distribution on their node degrees,
that is, a vast majority of the nodes in the graph are tail nodes with only a
few connected edges. GNNs produce inferior node representations for tail nodes
since they lack structural information. In the pursuit of promoting the
expressiveness of GNNs for tail nodes, we explore how the deficiency of
structural information deteriorates the performance of tail nodes and propose a
general Structural Augmentation based taIL nOde Representation learning
framework, dubbed SAILOR, which can jointly learn to augment the graph
structure and extract more informative representations for tail nodes.
Extensive experiments on public benchmark datasets demonstrate that SAILOR can
significantly improve the tail node representations and outperform the
state-of-the-art baselines. Comment: Accepted by CIKM 2023; Code is available at
https://github.com/Jie-Re/SAILO
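To make the notion of tail nodes in this abstract concrete, the sketch below counts node degrees in a toy undirected graph and flags low-degree nodes. It is not taken from the SAILOR code: the function names `node_degrees` and `tail_nodes`, the cutoff `max_degree=2`, and the toy edge list are all hypothetical choices made for illustration.

```python
from collections import Counter


def node_degrees(edges):
    """Count the degree of every node in an undirected edge list."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees


def tail_nodes(edges, max_degree=2):
    """Return nodes whose degree is at most max_degree.

    The cutoff is an assumed, illustrative threshold, not a value
    from the paper.
    """
    degrees = node_degrees(edges)
    return {node for node, deg in degrees.items() if deg <= max_degree}


# Toy graph: node 0 is a hub, nodes 1-5 have only one or two edges each.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2)]
tails = tail_nodes(edges, max_degree=2)
```

In this toy graph five of the six nodes fall under the cutoff, mirroring the long-tailed degree distribution the abstract describes, where most nodes have only a few connected edges.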