
Query-centric regression for In-DBMS analytics

Author: Ma Qingzhi
Triantafillou Peter
Publication date: 01/01/2020

Warwick Research Archives Portal Repository

DBEst : revisiting approximate query processing engines with machine learning models

Author: Ma Qingzhi
Triantafillou Peter
Publication venue: 'Association for Computing Machinery (ACM)'

In the era of big data, computing exact answers to analytical queries becomes prohibitively expensive. This greatly increases the value of approaches that can efficiently compute approximate, but highly accurate, answers to analytical queries. Alas, the state of the art still suffers from many shortcomings: errors are still high unless large memory investments are made; many important analytics tasks are not supported; and query response times are too long, so approaches rely on parallel execution of queries atop large big data analytics clusters, in situ or in the cloud, whose acquisition and use are costly. Hence, the following questions are crucial: Can we develop approximate query processing (AQP) engines that reduce response times by orders of magnitude, ensure high accuracy, and support most aggregate functions, with smaller memory footprints and small overheads to build the state upon which they are based? With this paper, we show that the answers to all the questions above can be positive. The paper presents DBEst, a system based on machine learning models (regression models and probability density estimators). It will discuss its limitations and promise, and how it can complement existing systems. It will substantiate its advantages using queries and data from the TPC-DS benchmark and real-life datasets, compared against state-of-the-art AQP engines.
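The abstract says DBEst answers queries from regression models and density estimators rather than from the data itself. A minimal sketch of how two such models can be combined for a range-predicate aggregate, assuming one numeric predicate column `x`, one aggregate column `y`, a hand-rolled Gaussian KDE for the density D(x), and a least-squares line for the regression R(x) ≈ E[y | x]: over [lb, ub], COUNT ≈ N∫D, SUM ≈ N∫D·R and AVG ≈ ∫D·R / ∫D. The integral formulation, the function names and the parameters (`bandwidth`, `grid`) are illustrative assumptions, not DBEst's published code.

```python
import numpy as np


def gaussian_kde(samples, bandwidth):
    """Return a 1-D Gaussian kernel density estimate D(x) built from `samples`."""
    samples = np.asarray(samples, dtype=float)
    norm = 1.0 / (len(samples) * bandwidth * np.sqrt(2.0 * np.pi))

    def density(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]
        z = (x - samples[None, :]) / bandwidth
        return norm * np.exp(-0.5 * z ** 2).sum(axis=1)

    return density


def fit_regression(x, y, degree=1):
    """Least-squares polynomial R(x), used here as an estimate of E[y | x]."""
    coeffs = np.polyfit(x, y, degree)
    return lambda q: np.polyval(coeffs, q)


def trapezoid(fx, xs):
    """Trapezoidal rule for the integral of sampled values fx over the grid xs."""
    return float(np.sum((fx[1:] + fx[:-1]) * np.diff(xs)) / 2.0)


def approx_aggregates(density, regressor, n_rows, lb, ub, grid=401):
    """Approximate COUNT, SUM(y) and AVG(y) for rows with lb <= x <= ub from the models alone."""
    xs = np.linspace(lb, ub, grid)
    d = density(xs)
    mass = trapezoid(d, xs)                      # ~ fraction of rows with x in [lb, ub]
    weighted = trapezoid(d * regressor(xs), xs)  # ~ average per-row contribution to SUM
    avg = weighted / mass if mass > 0 else float("nan")
    return {"COUNT": n_rows * mass, "SUM": n_rows * weighted, "AVG": avg}


# Usage on synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n_rows = 1_000_000                        # size of the hypothetical full table
x = rng.uniform(0.0, 10.0, size=5000)     # uniform sample of the predicate column
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=x.size)
est = approx_aggregates(gaussian_kde(x, 0.3), fit_regression(x, y), n_rows, 2.0, 4.0)
print(est)  # the underlying true values for this setup are COUNT = 200,000 and AVG = 7
```

At query time only the fitted models are consulted, so the per-query cost depends on the integration grid and the model size rather than on `n_rows`; storing raw KDE points, as this sketch does, is the simplest choice, not necessarily the most compact one.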

Warwick Research Archives Portal Repository

Approximate query processing using machine learning

Author: Ma Qingzhi

In the era of big data, the volume of collected data grows faster than computational power, and computing exact answers to analytical queries becomes prohibitively expensive. This greatly increases the value of approaches that can efficiently compute approximate, but highly accurate, answers to analytical queries. Approximate query processing (AQP) aims to reduce query latency and memory footprint at the cost of small losses in accuracy. Previous AQP efforts largely rely on samples, sketches, and similar synopses. However, trade-offs between query response time (or memory footprint) and accuracy are unavoidable: to guarantee higher accuracy, a large sample is usually generated and maintained, which increases query response time and space overheads. In this thesis, we aim to overcome the drawbacks of current AQP solutions by applying machine learning models: instead of accessing the data (or samples of it), models are used to make predictions. Our model-based AQP solutions are developed and improved in three stages:

1. We first investigate candidate regression models for AQP and propose query-centric regression, coined QReg. QReg is an ensemble method based on regression models. It achieves better accuracy than state-of-the-art regression models and overcomes the generalization-overfit dilemma of employing machine learning models within DBMSs.
2. We introduce DBEst, the first AQP engine based on classical machine learning models. Specifically, regression models and density estimators are trained over the data (or samples) and then combined to produce the final approximate answers.
3. We further improve DBEst by replacing the classical machine learning models with deep learning networks and word embeddings. This overcomes the drawbacks of queries with large numbers of groups, and further reduces query response time and space overheads.

We conduct experiments against state-of-the-art AQP engines over various datasets, and show that our method achieves better accuracy while offering orders-of-magnitude savings in space overheads and query response time.
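The sample-size/accuracy trade-off described above can be made concrete with the standard sampling-based estimator that model-based AQP aims to replace. A minimal sketch, assuming a uniform sample drawn without replacement, a CLT-based 95% confidence interval, and a synthetic exponential column; the function name `sample_avg_with_ci` is hypothetical. The interval half-width shrinks roughly as 1/√n, so a 10× tighter interval needs about 100× more sampled rows.

```python
import numpy as np


def sample_avg_with_ci(column, sample_size, rng, z=1.96):
    """Estimate AVG(column) from a uniform sample, with a CLT-based confidence interval."""
    sample = rng.choice(column, size=sample_size, replace=False)
    mean = float(sample.mean())
    # Standard error of the sample mean; the finite-population correction is ignored here.
    half_width = z * float(sample.std(ddof=1)) / np.sqrt(sample_size)
    return mean, half_width


rng = np.random.default_rng(42)
table = rng.exponential(scale=50.0, size=1_000_000)  # hypothetical full column
results = {n: sample_avg_with_ci(table, n, rng) for n in (100, 10_000)}
for n, (est_avg, hw) in results.items():
    print(f"n={n:>6}: AVG ~ {est_avg:.2f} +/- {hw:.2f}")
```

Tighter intervals require keeping and scanning larger samples, which is exactly the latency and space cost the thesis sets out to avoid.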

Warwick Research Archives Portal Repository

Hepatitis B virus infection and replication in human bone marrow mesenchymal stem cells

Author: Hao Qingzhi
Li Xia
Ma Lixian
Ma Ruiping
Sai Lintao
Shao Lihua
Wang Dakun
Xing Quantai
Publication venue: BioMed Central
Publication date: 01/01/2011

Springer - Publisher Connector

PubMed Central

Streaming weighted sampling over join queries

Author: Cormode Graham
Ma Qingzhi
Shanghooshabad A. M.
Shekelyan Michael
Triantafillou Peter
Publication date: 01/03/2023

Warwick Research Archives Portal Repository

Learned approximate query processing : make it light, accurate and fast

Author: Almasi Mehrdad
Kurmanji Meghdad
Ma Qingzhi
Shanghooshabad Ali M.
Triantafillou Peter
Publication date: 23/03/2021

Warwick Research Archives Portal Repository

Sirtuin 6 maintains epithelial STAT6 activity to support intestinal tuft cell development and type 2 immunity

Author: He Wei-Qi
Huang Rong
Li Zun
Ma Honghui
Ma Jie
Ren Kaiqun
Ruan Hai-Bin
Song Yu
Vakoc Christopher R
Wang Qingzhi
Wang Qunyi
Wu Xiaoli S
Xin Yue
Xiong Xiwen
Xu Lin
Xu Shaofang
Yang Chenyan
Yu Jiahui
Zhang Xinge
Zhong Genshen
Zhong Jiateng
Zhu Xiaofei
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 03/09/2022

Dynamic regulation of intestinal epithelial cell (IEC) differentiation is crucial for both homeostasis and the response to helminth infection. SIRT6 belongs to the NAD+-dependent deacetylases and has established, diverse roles in aging, metabolism and disease. Here, we report that IEC Sirt6 deletion leads to impaired tuft cell development and type 2 immunity in response to helminth infection, thereby resulting in compromised worm expulsion. Conversely, after helminth infection, IEC SIRT6 transgenic mice exhibit an enhanced epithelial remodeling process and more efficient worm clearance. Mechanistically, Sirt6 ablation causes elevated Socs3 expression and, subsequently, attenuated tyrosine 641 phosphorylation of STAT6 in IECs. Notably, intestinal epithelial overexpression of constitutively activated STAT6 (STAT6vt) in mice is sufficient to induce the expansion of the tuft and goblet cell lineages. Furthermore, epithelial STAT6vt overexpression remarkably reverses the defects in intestinal epithelial remodeling caused by Sirt6 ablation. Our results reveal a novel function of SIRT6 in regulating intestinal epithelial remodeling and mucosal type 2 immunity in response to helminth infection.

Cold Spring Harbor Laboratory Institutional Repository

PubMed Central

Query-centric regression

Author: Ma Qingzhi
Triantafillou Peter
Publication venue: 'Elsevier BV'
Publication date: 01/02/2022

Regression models (RMs), and machine learning (ML) models in general, aim to offer high prediction accuracy, even for unforeseen queries/datasets. This depends on their fundamental ability to generalize. However, overfitting a model with respect to the current DB state may be best suited to offer excellent accuracy. This overfit-generalize divide bears many practical implications faced by a data analyst. The paper will reveal, shed light on, and quantify this divide using a large number of real-world datasets and a large number of RMs. It will show that different RMs occupy different positions in this divide, which results in different RMs being better suited to answer queries on different parts of the same dataset (as queries typically target specific data subspaces defined using selection operators on attributes). It will study in detail 8 real-life datasets and data from the TPC-DS benchmark, and experiment with various dimensionalities therein. It will employ new, appropriate metrics that reveal the performance differences of RMs, and will substantiate the problem across a wide variety of popular RMs, ranging from simple linear models to advanced, state-of-the-art ensembles (which enjoy excellent generalization performance). It will put forth and study a new, query-centric model that addresses this problem, improving per-query accuracy while also offering excellent overall accuracy. Finally, it will study the effects of scale on the problem and its solutions.
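The abstract argues that different regression models suit different data subspaces, as selected by a query's predicates. A minimal sketch of that idea as per-query model selection, assuming two hypothetical base models (a global least-squares line that generalizes, and a k-nearest-neighbour regressor that closely tracks the stored data) and a held-out validation set used to score each model inside the query's range. This illustrates the overfit-generalize trade-off only; it is not the paper's QReg algorithm, and all names are hypothetical.

```python
import numpy as np


def fit_linear(x, y):
    """Global model that generalizes smoothly: an ordinary least-squares line."""
    coeffs = np.polyfit(x, y, 1)
    return lambda q: np.polyval(coeffs, q)


def fit_knn(x, y, k=5):
    """Local model that closely follows the stored data: mean of the k nearest neighbours."""
    def predict(q):
        q = np.atleast_1d(q)
        nearest = np.argsort(np.abs(q[:, None] - x[None, :]), axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict


def pick_model_for_query(models, x_val, y_val, lb, ub):
    """Name of the model with the lowest validation MSE inside the query range [lb, ub]."""
    in_range = (x_val >= lb) & (x_val <= ub)
    if not in_range.any():
        return None  # no validation evidence for this subspace
    errors = {name: float(np.mean((model(x_val[in_range]) - y_val[in_range]) ** 2))
              for name, model in models.items()}
    return min(errors, key=errors.get)


rng = np.random.default_rng(1)


def make_data(n):
    """Synthetic data: linear on [0, 5), strongly non-linear on [5, 10]."""
    x = rng.uniform(0.0, 10.0, size=n)
    y = np.where(x < 5, x, x + 3.0 * np.sin(4.0 * x)) + rng.normal(0.0, 0.3, size=n)
    return x, y


x_train, y_train = make_data(2000)
x_val, y_val = make_data(1000)
models = {"linear": fit_linear(x_train, y_train), "knn": fit_knn(x_train, y_train)}
choices = {(lb, ub): pick_model_for_query(models, x_val, y_val, lb, ub)
           for lb, ub in [(0.0, 5.0), (5.0, 10.0), (20.0, 30.0)]}
print(choices)
```

On the sinusoidal half of the data the local model should clearly beat the straight line, while a query over a range with no validation points returns `None`, signalling that a fallback (for example, a default global model) would be needed.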

Warwick Research Archives Portal Repository

Complete mitochondrial genome sequence of Pseudecheneis sulcata in the Yarlung Zangbo River, Tibet

Author: Bo Ma
Hongyu Jin
Lei Li
Qingzhi Ma
Yu Du
Publication venue: Taylor & Francis Group
Publication date: 01/07/2019

Pseudecheneis sulcata belongs to the genus Pseudecheneis of the family Sisoridae. It is mainly distributed in India and in Tibet, China, where it occurs in Motuo and Chayu in the lower reaches of the Yarlung Zangbo River. In the present study, we obtained the complete mitochondrial genome sequence of Pseudecheneis sulcata, which is 16,535 bp in length. The genome consists of 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes and a non-coding control region. The protein-coding genes use three start codons (GTG, ATG, and CTA) and four stop codons, including three complete stop codons and one incomplete stop codon. To verify the accuracy and utility of the newly determined mitogenome sequence, we constructed a phylogenetic tree of related species, and we expect the complete mitochondrial genome sequence to help interpret related evolutionary events.

Directory of Open Access Journals