Envelopes and principal component regression
Envelope methods offer targeted dimension reduction for various models. The
overarching goal is to improve efficiency in multivariate parameter estimation
by projecting the data onto a lower-dimensional subspace known as the envelope.
Envelope approaches have advantages in analyzing data with highly correlated
variables, but their iterative Grassmannian optimization algorithms do not
scale well to ultra-high-dimensional data. While the connections between
envelopes and partial least squares in multivariate linear regression have
promoted recent progress in high-dimensional studies of envelopes, we propose a
more straightforward way of envelope modeling from a novel principal components
regression perspective. The proposed procedure, Non-Iterative Envelope
Component Estimation (NIECE), offers substantial computational advantages over the
iterative Grassmannian optimization alternatives in high dimensions. We develop
a unified NIECE theory that bridges the gap between envelope methods and
principal components in regression. The new theoretical insights also shed
light on the envelope subspace estimation error as a function of eigenvalue
gaps of two symmetric positive definite matrices used in envelope modeling. We
apply the new theory and algorithm to several envelope models, including
response and predictor reduction in multivariate linear models, logistic
regression, and the Cox proportional hazards model. Simulations and illustrative
data analyses show that NIECE can significantly improve on standard methods in
linear and generalized linear models.
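For readers unfamiliar with the principal components regression perspective this abstract starts from, the sketch below fits classic one-component PCR in plain Python: center the predictors, find the leading eigenvector of the sample covariance by power iteration, and regress the response on the resulting score. It is not the NIECE procedure, which the abstract does not spell out; the function names and the fixed iteration count are assumptions for illustration.

```python
def pcr_one_component(X, y, iters=200):
    """Classic one-component principal components regression (not NIECE).

    X: list of n rows, each a list of p floats; y: list of n floats.
    Returns (x_mean, y_mean, beta) so that y_hat = y_mean + beta . (x - x_mean).
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[a] for row in X) / n for a in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[a] - x_mean[a] for a in range(p)] for row in X]
    yc = [v - y_mean for v in y]

    # Scatter matrix Xc^T Xc; it is proportional to the sample covariance,
    # so it has the same eigenvectors.
    S = [[sum(r[a] * r[b] for r in Xc) for b in range(p)] for a in range(p)]

    # Power iteration for the leading eigenvector; the all-ones start vector
    # assumes it is not orthogonal to that eigenvector.
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(S[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(t * t for t in w) ** 0.5
        if norm == 0.0:
            raise ValueError("no predictor variation along the start vector")
        v = [t / norm for t in w]

    # Scores on the first principal component, then least squares of y on them.
    z = [sum(r[a] * v[a] for a in range(p)) for r in Xc]
    gamma = sum(zi * yi for zi, yi in zip(z, yc)) / sum(zi * zi for zi in z)
    beta = [gamma * va for va in v]
    return x_mean, y_mean, beta


def predict(model, x):
    """Prediction from the fitted one-component model."""
    x_mean, y_mean, beta = model
    return y_mean + sum(b * (xi - m) for b, xi, m in zip(beta, x, x_mean))
```

With two perfectly collinear predictors, for example `X = [[1, 1], [2, 2], [3, 3], [4, 4]]` and `y = [2, 4, 6, 8]`, the single component captures all predictor variation and the fitted coefficients split the effect equally, giving `beta` close to `[1.0, 1.0]`.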
Statistical analysis for a penalized EM algorithm in high-dimensional mixture linear regression model
The expectation-maximization (EM) algorithm and its variants are widely used
in statistics. In high-dimensional mixture linear regression, the model is
assumed to be a finite mixture of linear regressions and the number of
predictors is much larger than the sample size. The standard EM algorithm,
which attempts to find the maximum likelihood estimator, becomes infeasible for
such a model. We devise a group lasso penalized EM algorithm and study its
statistical properties. Existing theoretical results for regularized EM
algorithms often rely on dividing the sample into many independent batches and
employing a fresh batch of sample in each iteration of the algorithm. Our
algorithm and theoretical analysis do not require sample-splitting, and can be
extended to multivariate response cases. The proposed methods also show
encouraging performance in numerical studies.
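Group lasso penalties of the form λ Σ_g ||β_g||_2 are commonly handled with group soft-thresholding, the proximal operator of that penalty. The sketch below shows this building block in plain Python. The abstract does not say how the groups are formed; purely as an assumption for illustration, each group here collects one predictor's coefficients across the K mixture components, so a predictor is kept or dropped jointly for all components. The function names are hypothetical, and this is not the authors' full EM update.

```python
def group_soft_threshold(z, lam):
    """Proximal operator of lam * ||z||_2: shrink the whole group toward zero."""
    norm = sum(v * v for v in z) ** 0.5
    if norm <= lam:
        return [0.0] * len(z)  # the entire group is set exactly to zero
    scale = 1.0 - lam / norm
    return [scale * v for v in z]


def penalize_predictor_groups(B, lam):
    """Apply group soft-thresholding row by row.

    B is a p x K list of lists: row j holds predictor j's coefficients in each
    of the K mixture components (an assumed grouping, for illustration only).
    """
    return [group_soft_threshold(row, lam) for row in B]
```

In a proximal-gradient style M-step (one possible design, not necessarily the paper's), a step like this would follow an unpenalized update. For `B = [[3.0, 4.0], [0.3, 0.4]]` and `lam = 1.0`, the first row (norm 5) is scaled by 0.8 to roughly `[2.4, 3.2]`, while the second row (norm 0.5) is zeroed.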
Mobile App Development to Increase Student Engagement and Problem Solving Skills
This paper describes a project designed to promote problem-solving and critical thinking skills in a general education computing course at an open-access institution. A visual programming tool, GameSalad, was used to enable students to create educational apps for mobile platforms. The students worked on a game development project for the entire semester, incorporating various skills learned in the course. Pre- and post-quiz analysis showed a significant improvement in students’ ability to design comprehensive solutions to a given problem. Survey results also showed increased student engagement, high interest in computing, and a “better” understanding of information technology.
Slicing-free Inverse Regression in High-dimensional Sufficient Dimension Reduction
Sliced inverse regression (SIR, Li 1991) is a pioneering work and the most
recognized method in sufficient dimension reduction. While promising progress
has been made in theory and methods of high-dimensional SIR, two challenges
still hamper high-dimensional multivariate applications. First,
choosing the number of slices in SIR is a difficult problem, and it depends on
the sample size, the distribution of variables, and other practical
considerations. Second, extending SIR from a univariate response to a
multivariate one is nontrivial. Targeting the same dimension reduction subspace
as SIR, we propose a new slicing-free method that provides a unified solution
to sufficient dimension reduction with high-dimensional covariates and
univariate or multivariate response. We achieve this by adopting the recently
developed martingale difference divergence matrix (MDDM, Lee & Shao 2018) and
penalized eigen-decomposition algorithms. To establish the consistency of our
method with a high-dimensional predictor and a multivariate response, we
develop a new concentration inequality for sample MDDM around its population
counterpart using theories for U-statistics, which may be of independent
interest. Simulations and real data analysis demonstrate the favorable
finite-sample performance of the proposed method.
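The slicing-free idea rests on the martingale difference divergence matrix of Lee & Shao (2018). As we understand that definition, for predictors X and response Y with an independent copy (X', Y'), MDDM(X | Y) = -E[(X - EX)(X' - EX')^T ||Y - Y'||], which handles a multivariate Y through the Euclidean norm without any slicing. The sketch below computes the plug-in (V-statistic) sample version, -(1/n^2) Σ_j Σ_k (X_j - X̄)(X_k - X̄)^T ||Y_j - Y_k||, in plain Python. The function name is hypothetical, and neither the paper's U-statistic analysis nor the penalized eigen-decomposition that would be applied to such a matrix is reproduced.

```python
import math


def sample_mddm(X, Y):
    """Plug-in sample MDDM(X | Y), as a p x p list of lists.

    X: n rows of p predictor values; Y: n rows of response values (any
    dimension, compared with the Euclidean distance, so no slicing is needed).
    """
    n, p = len(X), len(X[0])
    means = [sum(row[a] for row in X) / n for a in range(p)]
    D = [[row[a] - means[a] for a in range(p)] for row in X]  # centered X

    M = [[0.0] * p for _ in range(p)]
    for j in range(n):
        for k in range(n):
            dist = math.dist(Y[j], Y[k])  # ||Y_j - Y_k||
            if dist == 0.0:
                continue  # includes the j == k terms
            for a in range(p):
                for b in range(p):
                    M[a][b] -= D[j][a] * D[k][b] * dist
    return [[M[a][b] / (n * n) for b in range(p)] for a in range(p)]
```

A predictor that does not vary contributes a zero row and column, and a constant response yields the zero matrix, since every pairwise response distance vanishes.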
Identifying Malicious Nodes in Multihop IoT Networks using Dual Link Technologies and Unsupervised Learning
Packet manipulation attacks are among the challenging threats in cyber-physical systems (CPSs) and the Internet of Things (IoT): information packets are corrupted during transmission by compromised devices. These attacks consume network resources, delay decision making, and could potentially trigger wrong actions that disrupt an overall system's operation. Such malicious attacks, as well as unintentional faults, are difficult to locate in a large-scale mesh-like multihop network, which is the typical topology suggested by most IoT standards. In this paper, first, we propose a novel network architecture that uses powerful nodes supporting two distinct communication link technologies to identify malicious networked devices (which use a typical single-link technology). Such powerful nodes equipped with dual-link technologies can reveal hidden information within meshed connections that is otherwise hard to detect. By applying machine intelligence at the dual-link nodes, malicious networked devices in an IoT network can be accurately identified. Second, we propose two techniques based on unsupervised machine learning, namely hard detection and soft detection, that enable dual-link nodes to identify malicious networked devices. Our techniques exploit network diversity as well as the statistical information computed by dual-link nodes to assess the trustworthiness of resource-constrained devices. Simulation results show that the detection accuracy of our algorithms is superior to that of the conventional watchdog scheme, in which nodes passively listen to neighboring transmissions to detect corrupted packets. The results also show that as the density of dual-link nodes increases, the detection accuracy improves and the false alarm rate decreases.
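The abstract does not specify how hard and soft detection work internally. Purely as a hypothetical illustration of the difference between a binary verdict and a graded score, the sketch below clusters per-node packet-corruption rates (an assumed statistic that dual-link nodes might compute) with one-dimensional two-means. Nodes nearer the high-rate center are flagged (hard), and every node also receives a score in [0, 1] from its relative distance to the two centers (soft). The statistic, the function name, and the clustering rule are assumptions, not the authors' techniques.

```python
def detect_malicious(rates, iters=50):
    """Hypothetical hard/soft flagging from per-node packet-corruption rates.

    rates: dict mapping node id -> fraction of that node's forwarded packets
    observed as corrupted (an assumed statistic, for illustration only).
    Returns (hard, soft): a set of flagged nodes and a score in [0, 1] per node.
    """
    values = list(rates.values())
    lo, hi = min(values), max(values)
    if lo == hi:
        # No spread in the statistic, so there is nothing to separate.
        return set(), {node: 0.0 for node in rates}

    # 1-D two-means (Lloyd's algorithm) with a "benign" and a "suspicious" center.
    for _ in range(iters):
        benign = [v for v in values if abs(v - lo) <= abs(v - hi)]
        suspicious = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(benign) / len(benign) if benign else lo
        new_hi = sum(suspicious) / len(suspicious) if suspicious else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # assignments are stable
        lo, hi = new_lo, new_hi

    # Hard detection: a binary verdict from the nearest center.
    hard = {node for node, v in rates.items() if abs(v - lo) > abs(v - hi)}
    # Soft detection: values near 1 mean "close to the suspicious center".
    soft = {}
    for node, v in rates.items():
        d_lo, d_hi = abs(v - lo), abs(v - hi)
        soft[node] = d_lo / (d_lo + d_hi)
    return hard, soft
```

For rates `{"a": 0.01, "b": 0.02, "c": 0.6, "d": 0.03, "e": 0.7}`, the hard rule flags `c` and `e`, and the soft scores place `c` above 0.9 and `a` below 0.1. A soft score lets a downstream policy trade detection rate against false alarms by choosing its own threshold.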
DAQE: Enhancing the Quality of Compressed Images by Finding the Secret of Defocus
Image defocus, caused by the optical aberration of lenses, is inherent in the
physics of image formation and provides rich information about image quality.
Unfortunately, the existing quality enhancement approaches for compressed
images neglect the inherent characteristic of defocus, resulting in inferior
performance. This paper finds that in compressed images, significantly
defocused regions have better compression quality, and that two regions with
different defocus values possess diverse texture patterns. These findings
motivate our defocus-aware quality enhancement (DAQE) approach. Specifically,
we propose a novel dynamic region-based deep learning architecture of the DAQE
approach, which considers the region-wise defocus difference of compressed
images in two respects. (1) The DAQE approach uses fewer computational
resources to enhance the quality of significantly defocused regions and more
resources to enhance the quality of other regions; (2) the DAQE approach
learns to separately enhance diverse texture patterns for the regions with
different defocus values, such that texture-wise one-on-one enhancement can be
achieved. Extensive experiments validate the superiority of our DAQE approach
over other state-of-the-art approaches in terms of both quality enhancement and
resource savings.
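To make the two aspects above concrete, the sketch below computes a per-region processing plan. Regions are binned by defocus level, standing in for level-specific enhancement of texture patterns, and more strongly defocused regions receive fewer processing blocks, standing in for lower computational cost. It is a hypothetical illustration: the normalization of defocus to [0, 1], the number of levels, the block counts, and the function name are all assumptions, and the DAQE network itself is not reproduced.

```python
def plan_region_enhancement(defocus, levels=3, max_blocks=6):
    """Hypothetical defocus-aware plan for a list of image regions.

    defocus: per-region defocus values normalized to [0, 1] (1 = most defocused).
    Each region gets a defocus level (which would select a level-specific
    enhancer) and a number of processing blocks that shrinks as defocus grows.
    Returns (plan, total_blocks), where plan is a list of (level, blocks) pairs.
    """
    plan = []
    for d in defocus:
        if not 0.0 <= d <= 1.0:
            raise ValueError("defocus values must lie in [0, 1]")
        level = min(int(d * levels), levels - 1)  # quantize into `levels` bins
        # Fewer blocks for more defocused regions, but at least one block each.
        blocks = max(1, max_blocks * (levels - level) // levels)
        plan.append((level, blocks))
    total_blocks = sum(blocks for _, blocks in plan)
    return plan, total_blocks
```

For three regions with defocus 0.1, 0.9, and 0.5, this plan assigns levels 0, 2, and 1 with 6, 2, and 4 blocks (12 in total), compared with 18 if every region received the full budget.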