LeCo: Lightweight Compression via Learning Serial Correlations

Authors: Liu Yihao; Zeng Xinyu; Zhang Huanchen
Publication date: 27/06/2023

Lightweight data compression is a key technique that allows column stores to exhibit superior performance for analytical queries. Despite a comprehensive study on dictionary-based encodings to approach Shannon's entropy, few prior works have systematically exploited the serial correlation in a column for compression. In this paper, we propose LeCo (i.e., Learned Compression), a framework that uses machine learning to remove the serial redundancy in a value sequence automatically to achieve an outstanding compression ratio and decompression performance simultaneously. LeCo presents a general approach to this end, making existing (ad-hoc) algorithms such as Frame-of-Reference (FOR), Delta Encoding, and Run-Length Encoding (RLE) special cases under our framework. Our microbenchmark with three synthetic and six real-world data sets shows that a prototype of LeCo achieves a Pareto improvement on both compression ratio and random access speed over the existing solutions. When integrating LeCo into widely-used applications, we observe up to a 3.9x speedup in filter-scanning a Parquet file and a 16% increase in RocksDB's throughput.
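The idea of removing serial redundancy with a learned model can be illustrated with a minimal sketch. This is not LeCo's implementation: it assumes a single least-squares line fitted over the whole sequence, and the names `fit_line`, `encode`, and `decode_at` are hypothetical. Only the small residuals between the line and the actual values would need to be stored, and a single value can be recovered without decoding its neighbours.

```python
def fit_line(values):
    """Least-squares line through the points (i, values[i])."""
    n = len(values)
    if n == 1:
        return 0.0, float(values[0])
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def encode(values):
    """Store the model plus non-negative residuals of a fixed bit width."""
    slope, intercept = fit_line(values)
    preds = [round(intercept + slope * i) for i in range(len(values))]
    residuals = [v - p for v, p in zip(values, preds)]
    base = min(residuals)                  # shift so every stored delta is >= 0
    deltas = [r - base for r in residuals]
    width = max(deltas).bit_length()       # bits needed per stored delta
    return {"slope": slope, "intercept": intercept,
            "base": base, "width": width, "deltas": deltas}


def decode_at(enc, i):
    """Random access: recompute the prediction and add the stored residual."""
    pred = round(enc["intercept"] + enc["slope"] * i)
    return pred + enc["base"] + enc["deltas"][i]


values = [100, 103, 105, 109, 112, 114, 118, 121]
enc = encode(values)
# Bits per value for a plain Frame-of-Reference encoding of the same data.
for_width = (max(values) - min(values)).bit_length()
```

In this sketch, forcing the slope to 0 leaves a constant reference value plus offsets, which is a Frame-of-Reference-style encoding; the fitted slope is what absorbs the trend in the data. For real bit-packing the `deltas` list would be packed at `width` bits per entry rather than kept as Python integers.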

arXiv.org e-Print Archive

HotRAP: Hot Record Retention and Promotion for LSM-trees with tiered storage

Authors: Qiu Jiansheng; Yuan Fangzhou; Zhang Huanchen
Publication date: 03/02/2024

The multi-level design of Log-Structured Merge-trees (LSM-trees) naturally fits the tiered storage architecture: the upper levels (recently inserted/updated records) are kept in fast storage to guarantee performance while the lower levels (the majority of records) are placed in slower but cheaper storage to reduce cost. However, frequently accessed records may have been compacted and reside in slow storage, and existing algorithms are inefficient in promoting these "hot" records to fast storage, leading to compromised read performance. We present HotRAP, a key-value store based on RocksDB that can promptly promote hot records individually from slow to fast storage and keep them in fast storage while they are hot. HotRAP uses an on-disk data structure (a specially-made LSM-tree) to track the hotness of keys and includes three pathways to ensure that hot records reach fast storage with short delays. Our experiments show that HotRAP outperforms state-of-the-art LSM-trees on tiered storage by up to 3.3× compared to the second best for read-only and read-write-balanced workloads with common access skew patterns.

arXiv.org e-Print Archive

Systematic electronic structure in the cuprate parent state from quantum many-body simulations

Authors: Chan Garnet Kin-Lic; Cui Zhi-Hao; Zhai Huanchen; Zhang Xing
Publication venue: American Association for the Advancement of Science (AAAS)
Publication date: 20/06/2022

The quantitative description of correlated electron materials remains a modern computational challenge. We demonstrate a numerical strategy to simulate correlated materials at the fully ab initio level beyond the solution of effective low-energy models, and apply it to gain a detailed microscopic understanding across a family of cuprate superconducting materials in their parent undoped states. We uncover microscopic trends in the electron correlations and reveal the link between the material composition and magnetic energy scales via a many-body picture of excitation processes involving the buffer layers. Our work illustrates a path towards a quantitative and reliable understanding of more complex states of correlated materials at the ab initio many-body level.

Comment: 21 pages, 5 figures, with Supplementary Material

arXiv.org e-Print Archive

SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores

Authors: Fu Wei; Mei Zhiyu; Wang Guangju; Wu Yi; Zhang Huanchen
Publication date: 29/06/2023

The ever-growing complexity of reinforcement learning (RL) tasks demands a distributed RL system to efficiently generate and process a massive amount of data to train intelligent agents. However, existing open-source libraries suffer from various limitations, which impede their practical use in challenging scenarios where large-scale training is necessary. While industrial systems from OpenAI and DeepMind have achieved successful large-scale RL training, their system architecture and implementation details remain undisclosed to the community. In this paper, we present a novel abstraction on the dataflows of RL training, which unifies practical RL training across diverse applications into a general framework and enables fine-grained optimizations. Following this abstraction, we develop a scalable, efficient, and extensible distributed RL system called ReaLly Scalable RL (SRL). The system architecture of SRL separates major RL computation components and allows massively parallelized training. Moreover, SRL offers user-friendly and extensible interfaces for customized algorithms. Our evaluation shows that SRL outperforms existing academic libraries on both a single machine and a medium-sized cluster. In a large-scale cluster, the novel architecture of SRL leads to up to 3.7x speedup compared to the design choices adopted by the existing libraries. We also conduct a direct benchmark comparison to OpenAI's industrial system, Rapid, in the challenging hide-and-seek environment. SRL reproduces the same solution as reported by OpenAI with up to 5x speedup in wall-clock time. Furthermore, we also examine the performance of SRL in a much harder variant of the hide-and-seek environment and achieve substantial learning speedup by scaling SRL to over 15k CPU cores and 32 A100 GPUs. Notably, SRL is the first in the academic community to perform RL experiments at such a large scale.

Comment: 15 pages, 12 figures, 6 tables

arXiv.org e-Print Archive

An Empirical Evaluation of Columnar Storage Formats

Authors: Hui Yulong; McKinney Wes; Pavlo Andrew; Shen Jiahong; Zeng Xinyu; Zhang Huanchen
Publication date: 11/04/2023

Columnar storage is one of the core components of a modern data analytics system. Although many database management systems (DBMSs) have proprietary storage formats, most provide extensive support for open-source storage formats such as Parquet and ORC to facilitate cross-platform data sharing. But these formats were developed over a decade ago, in the early 2010s, for the Hadoop ecosystem. Since then, both the hardware and workload landscapes have changed significantly. In this paper, we revisit the most widely adopted open-source columnar storage formats (Parquet and ORC) with a deep dive into their internals. We designed a benchmark to stress-test the formats' performance and space efficiency under different workload configurations. From our comprehensive evaluation of Parquet and ORC, we identify design decisions advantageous with modern hardware and real-world data distributions. These include using dictionary encoding by default, favoring decoding speed over compression ratio for integer encoding algorithms, making block compression optional, and embedding finer-grained auxiliary data structures. Our analysis identifies important considerations that may guide future formats to better fit modern technology trends.

arXiv.org e-Print Archive

Research on strategy of load-side resonant soft-switching inverter based on interconnection and damping assignment-passivity based control

Authors: Huanchen Zhang; Jianguo Li; Jiuhe Wang; Yajing Zhang
Publication venue: Polish Academy of Sciences
Publication date: 01/03/2024

Soft-switching technologies can effectively solve the problem of switching losses caused by the increasing switching frequency of grid-connected inverters. As a branch of soft-switching technologies, load-side resonant soft-switching is a hotspot for applications of high-frequency inverters, because it has the advantage of achieving soft-switching without using additional components. However, the traditional PI control strategy based on the linear model is prone to instability and poor dynamic robustness when large-signal perturbations occur. In this paper, a novel Passivity-Based Control (PBC) method is proposed to improve the dynamic performance of the load-side resonant soft-switching grid-connected inverter. In addition, a Port Controlled Hamiltonian (PCH) model of the soft-switching inverter is established, and the passivity-based controller is designed on this model using interconnection and damping assignment-passivity based control (IDA-PBC). Both the stable and dynamic performance of the load-side resonant soft-switching inverter can be improved over the whole operating range. Finally, a 750 W load-side resonant soft-switching inverter simulation model is built and the output performance is compared with the traditional PI control strategy under stable and dynamic conditions. The simulation results show that the proposed control strategy reduces the harmonic distortion rate and improves the quality of the output waveforms.

Directory of Open Access Journals

SALI: A Scalable Adaptive Learned Index Framework based on Probability Models

Authors: Chai Yunpeng; Chen Yuxing; Ge Jiake; Guo Yunda; Luo Yuanhui; Pan Anqun; Shi Boyu; Zhang Huanchen
Publication date: 04/09/2023

The growth in data storage capacity and the increasing demands for high performance have created several challenges for concurrent indexing structures. One promising solution is learned indexes, which use a learning-based approach to fit the distribution of stored data and predictively locate target keys, significantly improving lookup performance. Despite their advantages, prevailing learned indexes exhibit constraints and encounter issues of scalability on multi-core data storage. This paper introduces SALI, the Scalable Adaptive Learned Index framework, which incorporates two strategies aimed at achieving high scalability, improving efficiency, and enhancing the robustness of the learned index. Firstly, a set of node-evolving strategies is defined to enable the learned index to adapt to various workload skews and enhance its concurrency performance in such scenarios. Secondly, a lightweight strategy is proposed to maintain statistical information within the learned index, with the goal of further improving the scalability of the index. Furthermore, to validate their effectiveness, SALI applied the two strategies mentioned above to the learned index structure that utilizes fine-grained write locks, known as LIPP. The experimental results have demonstrated that SALI significantly enhances the insertion throughput with 64 threads by an average of 2.04x compared to the second-best learned index. Furthermore, SALI accomplishes a lookup throughput similar to that of LIPP+.

Comment: Accepted by Conference SIGMOD 24, June 09-15, 2024, Santiago, Chile
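The basic mechanism the abstract attributes to learned indexes, fitting the key distribution and predicting where a key sits, can be sketched in a few lines. This is a single-threaded illustration only and does not reflect SALI's strategies or LIPP's structure; the class name `LinearLearnedIndex` and the choice of one linear model over unique sorted keys are assumptions made for the example.

```python
import bisect


class LinearLearnedIndex:
    """One linear model over sorted unique keys, with a bounded correction search."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        lo, hi = self.keys[0], self.keys[-1]
        # Map the smallest key to position 0 and the largest to position n - 1.
        self.slope = (n - 1) / (hi - lo) if hi != lo else 0.0
        self.intercept = -self.slope * lo
        # Worst-case distance between predicted and true position.
        self.max_err = max(abs(self._predict(k) - i)
                           for i, k in enumerate(self.keys))

    def _predict(self, key):
        return int(self.slope * key + self.intercept)

    def lookup(self, key):
        """Return the position of key, or None if it is absent."""
        p = self._predict(key)
        lo = max(0, p - self.max_err)
        hi = min(len(self.keys), p + self.max_err + 1)
        # Binary search only inside the window the model guarantees.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


index = LinearLearnedIndex([2, 3, 5, 8, 13, 21, 34, 55, 89])
```

The recorded `max_err` is what keeps the final search short: the better the model fits the key distribution, the smaller the window that has to be searched.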

arXiv.org e-Print Archive

Ab initio quantum many-body description of superconducting trends in the cuprates

Authors: Berkelbach Timothy C.; Chan Garnet Kin-Lic; Cui Zhi-Hao; Kim Raehyun; Lin Lin; Tölle Johannes; Yang Junjie; Ye Hong-Zhou; Zhai Huanchen; Zhang Xing
Publication date: 28/06/2023

Using a systematic ab initio quantum many-body approach that goes beyond low-energy models, we directly compute the superconducting pairing order of several doped cuprate materials and structures. We find that we can correctly capture two well-known trends: the pressure effect, where pairing order increases with intra-layer pressure, and the layer effect, where the pairing order varies with the number of copper-oxygen layers. From these calculations, we observe that the strength of superexchange and the covalency at optimal doping are the best descriptors of the maximal pairing order. Our microscopic analysis further identifies short-range copper spin fluctuations, together with multi-orbital charge fluctuations, as central to the pairing trends. Our work illustrates the possibility of a quantitative computational understanding of high-temperature superconducting materials.

Comment: 10 pages, 5 figures, with supplementary material

arXiv.org e-Print Archive

Crystallographic and Nuclear Magnetic Resonance Evaluation of the Impact of Peptide Binding to the Second PDZ Domain of Protein Tyrosine Phosphatase 1E

Authors: Chang Aram; Hengel Sarah R.; Ke Hengming; Lee Andrew L.; Phillips George N.; Sapienza Paul J.; Wang Huanchen; Zhang Jun
Publication date: 01/01/2010

PDZ (PSD95/Discs large/ZO-1) domains are ubiquitous protein interaction motifs found in scaffolding proteins involved in signal transduction. Despite the fact that many PDZs show a limited tendency to undergo structural change, the PDZ family has been associated with long-range communication and allostery. One of the PDZ domains studied most in terms of structure and biophysical properties is the second PDZ (“PDZ2”) domain from protein tyrosine phosphatase 1E (PTP1E, also known as PTPL1). Previously we showed through NMR relaxation studies that binding of the RA-GEF2 C-terminal peptide substrate results in long-range propagation of side-chain dynamic changes in human PDZ2 [Fuentes, et al., J. Mol. Biol. (2004), 335, 1105-1115]. Here, we present the first X-ray crystal structures of PDZ2 in the absence and presence of RA-GEF2 ligand, solved to resolutions of 1.65 and 1.3 Å, respectively. These structures deviate somewhat from previously determined NMR structures, and indicate that very minor structural changes in PDZ2 accompany peptide binding. NMR residual dipolar couplings confirm the crystal structures to be accurate models of the time-averaged atomic coordinates of PDZ2. The impact on side-chain dynamics was further tested with a C-terminal peptide from APC, which showed near-identical results to that of RA-GEF2. Thus, allosteric transmission in PDZ2 induced by peptide binding is conveyed purely and robustly by dynamics. 15N relaxation dispersion measurements did not detect appreciable populations of a kinetic structural intermediate. Collectively, for ligand binding to PDZ2, these data support a lock-and-key binding model from a structural perspective and an allosteric model from a dynamical perspective, which together suggest a complex energy landscape for functional transitions within the ensemble.

Carolina Digital Repository