641 research outputs found

Envelopes and principal component regression

Authors: Deng Kai, Mai Qing, Zhang Xin
Publication date: 12/07/2022

Envelope methods offer targeted dimension reduction for various models. The overarching goal is to improve efficiency in multivariate parameter estimation by projecting the data onto a lower-dimensional subspace known as the envelope. Envelope approaches have advantages in analyzing data with highly correlated variables, but their iterative Grassmannian optimization algorithms do not scale well to ultra-high-dimensional data. While the connections between envelopes and partial least squares in multivariate linear regression have promoted recent progress in high-dimensional studies of envelopes, we propose a more straightforward way of envelope modeling from a novel principal components regression perspective. The proposed procedure, Non-Iterative Envelope Component Estimation (NIECE), has excellent computational advantages over the iterative Grassmannian optimization alternatives in high dimensions. We develop a unified NIECE theory that bridges the gap between envelope methods and principal components in regression. The new theoretical insights also shed light on the envelope subspace estimation error as a function of eigenvalue gaps of two symmetric positive definite matrices used in envelope modeling. We apply the new theory and algorithm to several envelope models, including response and predictor reduction in multivariate linear models, logistic regression, and the Cox proportional hazards model. Simulations and illustrative data analysis show the potential for NIECE to significantly improve on standard methods in linear and generalized linear models.

arXiv.org e-Print Archive
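
For readers unfamiliar with the principal components regression viewpoint mentioned in the abstract above, the following is a minimal sketch of ordinary principal component regression in NumPy. It is not the NIECE procedure, whose details are not given in this listing; the function names `pcr_fit` and `pcr_predict` and the simulated data are assumptions made purely for illustration.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Plain principal component regression (illustrative, not NIECE)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                  # p x k loading matrix
    Z = Xc @ V                               # n x k component scores
    # Least-squares regression of the centered response on the scores.
    gamma = np.linalg.lstsq(Z, y - y_mean, rcond=None)[0]
    beta = V @ gamma                         # coefficients on the original predictors
    intercept = y_mean - x_mean @ beta
    return beta, intercept

def pcr_predict(X, beta, intercept):
    """Predict responses from the fitted PCR coefficients."""
    return X @ beta + intercept

# Simulated data (hypothetical, for demonstration only).
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta_true = np.arange(1, p + 1, dtype=float)
y = X @ beta_true + 2.0 + 0.01 * rng.normal(size=n)

beta_full, b0_full = pcr_fit(X, y, n_components=p)   # all components kept
beta_3, b0_3 = pcr_fit(X, y, n_components=3)         # reduced to 3 components
```

When every component is kept, the fit coincides with ordinary least squares; keeping fewer components trades some bias for lower variance, which is the kind of dimension reduction the abstract relates to envelopes.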

Statistical analysis for a penalized EM algorithm in high-dimensional mixture linear regression model

Authors: Mai Qing, Wang Ning, Zhang Xin
Publication date: 21/07/2023

The expectation-maximization (EM) algorithm and its variants are widely used in statistics. In high-dimensional mixture linear regression, the model is assumed to be a finite mixture of linear regressions and the number of predictors is much larger than the sample size. The standard EM algorithm, which attempts to find the maximum likelihood estimator, becomes infeasible for such a model. We devise a group lasso penalized EM algorithm and study its statistical properties. Existing theoretical results on regularized EM algorithms often rely on dividing the sample into many independent batches and employing a fresh batch in each iteration of the algorithm. Our algorithm and theoretical analysis do not require sample-splitting, and can be extended to multivariate response cases. The proposed methods also show encouraging performance in numerical studies.

arXiv.org e-Print Archive

Mobile App Development to Increase Student Engagement and Problem Solving Skills

Authors: Dekhane Sonal, Tsoi Mai Y., Xu Xin
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2013

This paper describes a project designed to promote problem solving and critical thinking skills in a general education computing course at an open access institution. A visual programming tool, GameSalad, was used to enable students to create educational apps for mobile platforms. The students worked on a game development project for the entire semester, incorporating various skills learned throughout the semester. Pre- and post-quiz analysis showed a significant improvement in students’ ability to design comprehensive solutions to a given problem. Survey results also showed increased student engagement, high interest in computing and a “better” understanding of information technology.

AIS Electronic Library (AISeL)

Slicing-free Inverse Regression in High-dimensional Sufficient Dimension Reduction

Authors: Mai Qing, Shao Xiaofeng, Wang Runmin, Zhang Xin
Publication venue: Institute of Statistical Science
Publication date: 12/04/2023

Sliced inverse regression (SIR, Li 1991) is a pioneering work and the most recognized method in sufficient dimension reduction. While promising progress has been made in the theory and methods of high-dimensional SIR, two challenges remain in high-dimensional multivariate applications. First, choosing the number of slices in SIR is a difficult problem, and it depends on the sample size, the distribution of variables, and other practical considerations. Second, the extension of SIR from a univariate response to a multivariate one is not trivial. Targeting the same dimension reduction subspace as SIR, we propose a new slicing-free method that provides a unified solution to sufficient dimension reduction with high-dimensional covariates and a univariate or multivariate response. We achieve this by adopting the recently developed martingale difference divergence matrix (MDDM, Lee & Shao 2018) and penalized eigen-decomposition algorithms. To establish the consistency of our method with a high-dimensional predictor and a multivariate response, we develop a new concentration inequality for the sample MDDM around its population counterpart using theories for U-statistics, which may be of independent interest. Simulations and real data analysis demonstrate the favorable finite sample performance of the proposed method.

arXiv.org e-Print Archive
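
To make the slicing step discussed in the abstract above concrete, here is a minimal sketch of classical slice-based SIR in NumPy for a univariate response. It is the textbook method, not the slicing-free MDDM approach the authors propose; the function name `sir_directions`, the default of 10 slices, and the simulated data are assumptions chosen for illustration.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_directions=1):
    """Classical sliced inverse regression (illustrative sketch)."""
    n, p = X.shape
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)
    # Whiten the predictors: Z = (X - mu) Sigma^{-1/2}.
    evals, evecs = np.linalg.eigh(sigma)
    sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ sigma_inv_sqrt
    # Slice the sample into groups of roughly equal size by sorted response.
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)
    # Weighted covariance of the slice means of Z.
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # eigh returns ascending eigenvalues, so reverse to take the leading ones.
    _, vecs = np.linalg.eigh(M)
    top = vecs[:, ::-1][:, :n_directions]
    # Map back to the original predictor scale and normalize each column.
    B = sigma_inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0)

# Simulated single-index model (hypothetical, for demonstration only).
rng = np.random.default_rng(1)
n, p = 1000, 5
X = rng.normal(size=(n, p))
b_true = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
y = np.exp(X @ b_true) + 0.1 * rng.normal(size=n)

B = sir_directions(X, y, n_slices=10, n_directions=1)
alignment = abs(float(B[:, 0] @ b_true))   # near 1 when the direction is recovered
```

The `n_slices` argument is exactly the tuning parameter the abstract identifies as difficult to choose, and the sorting step has no obvious analogue for a multivariate response.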

Identifying Malicious Nodes in Multihop IoT Networks using Dual Link Technologies and Unsupervised Learning

Authors: David Tipper, Mai Abdelhakim, Prashant Krishnamurthy, Xin Liu
Publication venue: RonPub
Publication date: 01/01/2018

Packet manipulation attacks are among the challenging threats in cyber-physical systems (CPSs) and the Internet of Things (IoT), where information packets are corrupted during transmission by compromised devices. These attacks consume network resources, result in delays in decision making, and could potentially lead to triggering wrong actions that disrupt an overall system's operation. Such malicious attacks as well as unintentional faults are difficult to locate/identify in a large-scale mesh-like multihop network, which is the typical topology suggested by most IoT standards. In this paper, first, we propose a novel network architecture that utilizes powerful nodes that can support two distinct communication link technologies for identification of malicious networked devices (with typical single-link technology). Such powerful nodes equipped with dual-link technologies can reveal hidden information within meshed connections that is otherwise hard to detect. By applying machine intelligence at the dual-link nodes, malicious networked devices in an IoT network can be accurately identified. Second, we propose two techniques based on unsupervised machine learning, namely hard detection and soft detection, that enable dual-link nodes to identify malicious networked devices. Our techniques exploit network diversity as well as the statistical information computed by dual-link nodes to identify the trustworthiness of resource-constrained devices. Simulation results show that the detection accuracy of our algorithms is superior to the conventional watchdog scheme, where nodes passively listen to neighboring transmissions to detect corrupted packets. The results also show that as the density of the dual-link nodes increases, the detection accuracy improves and the false alarm rate decreases.

RonPub -- Research Online Publishing

Directory of Open Access Journals

D-Scholarship@Pitt

DAQE: Enhancing the Quality of Compressed Images by Finding the Secret of Defocus

Authors: Deng Xin, Guo Yichen, Xing Qunliang, Xu Mai
Publication date: 20/11/2022

Image defocus is inherent in the physics of image formation caused by the optical aberration of lenses, providing plentiful information on image quality. Unfortunately, the existing quality enhancement approaches for compressed images neglect the inherent characteristic of defocus, resulting in inferior performance. This paper finds that in compressed images, the significantly defocused regions have better compression quality and two regions with different defocus values possess diverse texture patterns. These findings motivate our defocus-aware quality enhancement (DAQE) approach. Specifically, we propose a novel dynamic region-based deep learning architecture of the DAQE approach, which considers the region-wise defocus difference of compressed images in two aspects. (1) The DAQE approach employs fewer computational resources to enhance the quality of significantly defocused regions, and more resources to enhance the quality of other regions; (2) The DAQE approach learns to separately enhance diverse texture patterns for the regions with different defocus values, such that texture-wise one-on-one enhancement can be achieved. Extensive experiments validate the superiority of our DAQE approach in terms of quality enhancement and resource-saving, compared with other state-of-the-art approaches.

arXiv.org e-Print Archive