On Efficient Range-Summability of IID Random Variables in Two or Higher Dimensions
d-dimensional (for d > 1) efficient range-summability (dD-ERS) of random variables (RVs) is a fundamental algorithmic problem with applications to two important families of database problems, namely, fast approximate wavelet tracking (FAWT) on data streams and approximate answering of range-sum queries over a data cube. Whether efficient solutions exist for the dD-ERS problem, or for the latter database problem, have been two long-standing open questions; both are answered in this work. Specifically, we propose a novel solution framework for dD-ERS on RVs that have a Gaussian or Poisson distribution. Our dD-ERS solutions are the first to have polylogarithmic time complexities. Furthermore, we develop a novel k-wise independence theory that allows our dD-ERS solutions to achieve both high computational efficiency and strong provable independence guarantees. Finally, we show that under a sufficient and likely necessary condition, certain existing solutions for 1D-ERS can be generalized to higher dimensions.
A Dyadic Simulation Approach to Efficient Range-Summability
Efficient range-summability (ERS) of a long list of random variables is a fundamental algorithmic problem with applications to three important database problems, namely, data stream processing, space-efficient histogram maintenance (SEHM), and approximate nearest neighbor search (ANNS). In this work, we propose a novel dyadic simulation framework and use it to develop three novel ERS solutions, namely the Gaussian dyadic simulation tree (DST), Cauchy-DST, and Random-Walk-DST. We also propose novel rejection sampling techniques to make these solutions computationally efficient. Furthermore, we develop a novel k-wise independence theory that allows our ERS solutions to achieve both high computational efficiency and strong provable independence guarantees.
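The core idea behind a dyadic simulation tree can be illustrated with a minimal sketch for the Gaussian case: the root holds the sum of all n = 2^k iid N(0,1) variables, and each child's sum is drawn from its exact conditional distribution given the parent (the left half of 2m iid standard normals, conditioned on their total p, is N(p/2, m/2)). The class name, the string-based per-node seeding, and the O(log^2 n) recursive descent below are illustrative assumptions, not the paper's actual construction, which uses k-wise independent hashing and rejection sampling.

```python
import math
import random

def _z(tag):
    # Deterministic standard-normal draw keyed by a node tag; a stand-in
    # (assumption) for the k-wise independent hashing used in the paper.
    return random.Random(tag).gauss(0.0, 1.0)

class GaussianDST:
    """Toy Gaussian dyadic simulation tree over n = 2**k iid N(0,1) RVs.

    Every node implicitly holds the sum of the variables it covers; a
    child's sum is drawn from its conditional distribution given the
    parent's sum, so all queries see one consistent realization.
    """

    def __init__(self, k, seed=0):
        self.k = k          # n = 2**k underlying variables
        self.seed = seed

    def _node_sum(self, level, idx):
        m = 2 ** (self.k - level)            # variables covered by this node
        if level == 0:
            return math.sqrt(m) * _z(f"{self.seed}:root")  # root ~ N(0, n)
        p = self._node_sum(level - 1, idx // 2)            # parent covers 2m
        z = _z(f"{self.seed}:{level - 1}:{idx // 2}")      # one split per parent
        # Given parent sum p over 2m iid N(0,1) RVs, the left-half sum is
        # N(p/2, m/2); the right half is the exact remainder.
        left = p / 2 + math.sqrt(m / 2) * z
        return left if idx % 2 == 0 else p - left

    def range_sum(self, l, r, level=0, idx=0):
        """Sum X_l + ... + X_{r-1} by visiting O(log n) dyadic nodes."""
        lo = idx * 2 ** (self.k - level)
        hi = lo + 2 ** (self.k - level)
        if r <= lo or hi <= l:
            return 0.0
        if l <= lo and hi <= r:
            return self._node_sum(level, idx)
        return (self.range_sum(l, r, level + 1, 2 * idx)
                + self.range_sum(l, r, level + 1, 2 * idx + 1))
```

Because each right child is defined as parent minus left child, adjacent range sums compose exactly: range_sum(l, m) + range_sum(m, r) equals range_sum(l, r) for any split point m.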
Rethinking Similarity Search: Embracing Smarter Mechanisms over Smarter Data
In this vision paper, we propose a shift in perspective for improving the
effectiveness of similarity search. Rather than focusing solely on enhancing
the data quality, particularly machine learning-generated embeddings, we
advocate for a more comprehensive approach that also enhances the underpinning
search mechanisms. We highlight three novel avenues that call for a
redefinition of the similarity search problem: exploiting implicit data
structures and distributions, engaging users in an iterative feedback loop, and
moving beyond a single query vector. These novel pathways have gained relevance
in emerging applications such as large-scale language models, video clip
retrieval, and data labeling. We discuss the corresponding research challenges
posed by these new problem areas and share insights from our preliminary
discoveries.
RECIPE: Rateless Erasure Codes Induced by Protocol-Based Encoding
LT (Luby transform) codes are a celebrated family of rateless erasure codes
(RECs). Most existing LT codes were designed for applications in which a
centralized encoder possesses all message blocks and is solely responsible for
encoding them into codewords. Distributed LT codes, in which message blocks are
physically scattered across multiple locations (encoders) that need to
collaboratively perform the encoding, have never been systematically studied
before, despite their growing importance in applications. In this work, we present
the first systematic study of LT codes in the distributed setting, and make the
following three major contributions. First, we show that only a proper subset
of LT codes are feasible in the distributed setting, and give the necessary
and sufficient condition for such feasibility. Second, we propose a distributed
encoding protocol that can efficiently implement any feasible code. The
protocol is parameterized by a so-called action probability array (APA) that is
only a few KBs in size, and any feasible code corresponds to a valid APA
setting and vice versa. Third, we propose two heuristic search algorithms that
have led to the discovery of feasible codes that are much more efficient than
the state of the art. Comment: Accepted by IEEE ISIT 202
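The centralized baseline that distributed LT encoding generalizes can be sketched briefly: an encoder samples a degree d from a soliton distribution, picks d message blocks uniformly at random, and emits their XOR as one codeword. The sketch below uses the ideal soliton distribution and integer-valued blocks for simplicity; real LT codes use the robust soliton distribution and byte-string blocks, and the function names are illustrative.

```python
import random

def ideal_soliton(n, rng):
    # Sample a degree from the ideal soliton distribution over {1, ..., n}:
    # P(1) = 1/n, and P(d) = 1 / (d * (d - 1)) for d = 2..n.
    u, cum = rng.random(), 1.0 / n
    if u < cum:
        return 1
    for d in range(2, n + 1):
        cum += 1.0 / (d * (d - 1))
        if u < cum:
            return d
    return n

def lt_encode_one(blocks, rng):
    """Produce one LT codeword: the XOR of d uniformly chosen blocks."""
    d = ideal_soliton(len(blocks), rng)
    chosen = rng.sample(range(len(blocks)), d)
    codeword = 0
    for i in chosen:
        codeword ^= blocks[i]
    return chosen, codeword
```

In the distributed setting studied in the paper, no single party holds all of `blocks`, which is why only a proper subset of such codes remains feasible and a coordination protocol (the APA-parameterized one above) is needed.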
MP-RW-LSH: An Efficient Multi-Probe LSH Solution to ANNS-L1
Approximate Nearest Neighbor Search (ANNS) is a fundamental algorithmic problem, with numerous applications in many areas of computer science. Locality-Sensitive Hashing (LSH) is one of the most popular solution approaches for ANNS. A common shortcoming of many LSH schemes is that since they probe only a single bucket in a hash table, they need to use a large number of hash tables to achieve a high query accuracy. For ANNS-L2, a multi-probe scheme was proposed to overcome this drawback by strategically probing multiple buckets in a hash table. In this work, we propose MP-RW-LSH, the first and so far only multi-probe LSH solution to ANNS in L1 distance, and show that it achieves a better tradeoff between scalability and query efficiency than all existing LSH-based solutions. We also explain why a state-of-the-art ANNS-L1 solution called Cauchy projection LSH (CP-LSH) is fundamentally not suitable for multi-probe extension. Finally, as a use case, we construct, using MP-RW-LSH as the underlying "ANNS-L1 engine", a new ANNS-E (E for edit distance) solution that beats the state of the art.
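The single-probe-versus-multi-probe tradeoff can be made concrete with a toy sketch. Below is one classical p-stable (Gaussian) LSH function for L2 with naive probing of adjacent buckets; this is NOT MP-RW-LSH itself, which targets L1 via random-walk hashes, and the class and parameter names are assumptions for illustration only.

```python
import random

class MultiProbeL2LSH:
    """One p-stable LSH function h(v) = floor((a . v + b) / w) for L2,
    with naive multi-probing of neighboring buckets (toy stand-in)."""

    def __init__(self, dim, w, rng):
        self.a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random projection
        self.b = rng.uniform(0.0, w)                        # random offset
        self.w = w                                          # bucket width

    def bucket(self, v):
        s = sum(ai * vi for ai, vi in zip(self.a, v)) + self.b
        return int(s // self.w)

    def probes(self, v, extra=1):
        # Multi-probe idea: examine the home bucket plus `extra` neighbors
        # on each side, instead of building additional hash tables.
        b = self.bucket(v)
        return [b] + [b + s * k for k in range(1, extra + 1) for s in (1, -1)]
```

A near neighbor whose projection lands just across a bucket boundary is missed by a single probe but caught by probing the adjacent buckets, which is why multi-probing lets a scheme use far fewer hash tables for the same recall.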