54 research outputs found
Combined Quantile Forecasting for High-Dimensional Non-Gaussian Data
This study proposes a novel method for forecasting a scalar variable based on
high-dimensional predictors that is applicable to various data distributions.
In the literature, one of the popular approaches for forecasting with many
predictors is to use factor models. However, these traditional methods are
ineffective when the data exhibit non-Gaussian characteristics such as skewness
or heavy tails. In this study, we use a quantile factor model to
extract quantile factors that describe specific quantiles of the data beyond
the mean factor. We then build a quantile-based forecast model using the
estimated quantile factors at different quantile levels as predictors. Finally,
the predicted values at the various quantile levels are combined into a single
forecast as a weighted average with weights determined by a Markov chain based
on past trends of the target variable. The main idea of the proposed method is
to incorporate a quantile approach into the forecasting method to handle
non-Gaussian characteristics effectively. The performance of the proposed
method is evaluated through a simulation study and real data analysis of PM2.5
data in South Korea, where the proposed method outperforms other existing
methods in most cases.
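The final combination step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the Markov chain is built, so the sketch assumes that the target variable has been discretized into states (0 = low, 1 = middle, 2 = high), that each state is paired with one quantile level, and that the weights are the estimated transition probabilities out of the most recent state. The function names `transition_matrix` and `combine_quantile_forecasts` are hypothetical.

```python
def transition_matrix(states, n_states):
    """Estimate first-order Markov transition probabilities from a state sequence."""
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total:
            matrix.append([c / total for c in row])
        else:
            # Assumption: a state that is never left in the sample gets uniform weights.
            matrix.append([1.0 / n_states] * n_states)
    return matrix


def combine_quantile_forecasts(forecasts, weights):
    """Weighted average of the forecasts produced at the different quantile levels."""
    return sum(w * f for w, f in zip(weights, forecasts))


# Hypothetical history of discretized target states (0 = low, 1 = middle, 2 = high).
past_states = [0, 1, 2, 1, 0, 1, 2, 2]
P = transition_matrix(past_states, n_states=3)

# Hypothetical forecasts at a low, a middle, and a high quantile level.
quantile_forecasts = [10.0, 20.0, 30.0]

# Assumption: the weights are the transition probabilities out of the latest state.
weights = P[past_states[-1]]
combined = combine_quantile_forecasts(quantile_forecasts, weights)
```

Because each row of the transition matrix sums to one, the combined forecast always lies between the smallest and largest quantile forecasts.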
Robust coherence analysis for long-memory processes
This paper investigates the linear relationships between two time series in the frequency domain, termed coherence analysis. It is widely used in various fields, including signal processing, engineering, and meteorology. However, conventional coherence analysis tends to be sensitive to outliers. The Laplace cross-periodogram and a corresponding robust coherence analysis based on least absolute deviation (LAD) regression have recently been developed to address this shortcoming. In this paper, to extend the scope of the Laplace cross-periodogram, we study a robust cross-periodogram for long-memory processes and derive its asymptotic distribution. Through numerical studies, we demonstrate the usefulness of the proposed robust coherence analysis for long-memory processes.
Radiomics signature on 3T dynamic contrast-enhanced magnetic resonance imaging for estrogen receptor-positive invasive breast cancers: Preliminary results for correlation with Oncotype DX recurrence scores
To evaluate the ability of a radiomics signature based on 3T dynamic contrast-enhanced (DCE) magnetic resonance imaging (MRI) to distinguish between low and non-low Oncotype DX (OD) risk groups in estrogen receptor (ER)-positive invasive breast cancers. Between May 2011 and March 2016, 67 women with ER-positive invasive breast cancer who underwent preoperative 3T MRI and OD assay were included. We divided the patients into low (OD recurrence score [RS] <18) and non-low risk (RS ≥18) groups. Extracted radiomics features included 8 morphological, 76 histogram-based, and 72 higher-order texture features. A radiomics signature (Rad-score) was generated using the least absolute shrinkage and selection operator (LASSO). Univariate and multivariate logistic regression analyses were performed to investigate the association between clinicopathologic factors, MRI findings, and the Rad-score with OD risk groups, and the areas under the receiver operating characteristic curves (AUC) were used to assess classification performance of the Rad-score. The Rad-score was constructed for each tumor by selecting 10 (6.3%) of 158 radiomics features. A higher Rad-score (odds ratio [OR], 65.209; P < .001), Ki-67 expression (OR, 17.462; P = .007), and high p53 (OR, 8.449; P = .077) were associated with non-low OD risk. The Rad-score classified low and non-low OD risk with an AUC of 0.759. The Rad-score showed the potential for discrimination between low and non-low OD risk groups in patients with ER-positive invasive breast cancers. Copyright © 2019 the Author(s)
A Data-Adaptive Principal Component Analysis: Use of Composite Asymmetric Huber Function
This article considers a new type of principal component analysis (PCA) that adaptively reflects the information of data. The ordinary PCA is useful for dimension reduction and identifying important features of multivariate data. However, it uses only the second moment of the data, and consequently, it is not efficient for analyzing real observations that are skewed or asymmetric. To extend the scope of PCA to non-Gaussian distributed data that cannot be well represented by the second moment, a new approach for PCA is proposed. The core of the methodology is to use a composite asymmetric Huber function defined as a weighted linear combination of modified Huber loss functions, which replaces the conventional square loss function. A practical algorithm to implement the data-adaptive PCA is discussed. Results from numerical studies including a simulation study and real data analysis demonstrate the promising empirical properties of the proposed approach. Supplementary materials for this article are available online.
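To make the loss concrete, the sketch below builds a weighted linear combination of asymmetric Huber-type losses. It is an illustration under stated assumptions, not the article's definition: the abstract does not specify how the Huber loss is modified, so the sketch assumes the standard Huber loss with threshold `c`, made asymmetric by weighting positive residuals by `tau` and negative residuals by `1 - tau`. All function names are hypothetical.

```python
def huber(u, c):
    """Standard Huber loss: quadratic for |u| <= c, linear beyond."""
    a = abs(u)
    return 0.5 * u * u if a <= c else c * (a - 0.5 * c)


def asymmetric_huber(u, tau, c):
    """Assumed asymmetric variant: weight tau for u >= 0, (1 - tau) for u < 0."""
    weight = tau if u >= 0 else 1.0 - tau
    return weight * huber(u, c)


def composite_asymmetric_huber(u, taus, weights, c):
    """Weighted linear combination of asymmetric Huber losses over several levels."""
    return sum(w * asymmetric_huber(u, t, c) for t, w in zip(taus, weights))


# Hypothetical asymmetry levels and equal weights; a data-adaptive method would
# choose the weights from the data instead.
taus = [0.1, 0.5, 0.9]
weights = [1.0 / 3.0] * 3
loss_at_two = composite_asymmetric_huber(2.0, taus, weights, c=1.0)
```

With `tau = 0.5` alone the loss is half the ordinary Huber loss, so the symmetric case is recovered up to scale. Unequal weights across levels let the combined loss penalize positive and negative residuals differently.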
Composite quantile periodogram for spectral analysis
We propose a new type of periodogram for identifying hidden frequencies and providing a better understanding of the frequency behaviour. The quantile periodogram by Li () provides richer information on the frequency of a signal than a single estimation of the mean frequency does. However, it is difficult to find a specific quantile that identifies hidden frequencies. In this study, we consider a weighted linear combination of quantile periodograms, termed 'composite quantile periodogram'. It is completely data adaptive and does not require prior knowledge of the signal. Simulation results and a real-data example demonstrate significant improvement in the quality of the periodogram.
Dynamic principal component analysis with missing values
Dynamic principal component analysis (DPCA), also known as frequency domain principal component analysis, was developed by Brillinger [Time Series: Data Analysis and Theory, Vol. 36, SIAM, 1981] to decompose multivariate time-series data into a few principal component series. A primary advantage of DPCA is its capability of extracting essential components from the data by reflecting their serial dependence. It is also used to estimate the common component in a dynamic factor model, which is frequently used in econometrics. However, this beneficial property cannot be utilized when missing values are present, and missing values should not simply be ignored when estimating the spectral density matrix in the DPCA procedure. Based on a novel combination of conventional DPCA and the self-consistency concept, we propose a DPCA method for data with missing values. We demonstrate the advantage of the proposed method over some existing imputation methods through Monte Carlo experiments and real data analysis.
A generalization of functional clustering for discrete multivariate longitudinal data
This paper presents a new model-based generalized functional clustering method for discrete longitudinal data, such as multivariate binomial and Poisson distributed data. For this purpose, we propose a multivariate functional principal component analysis (MFPCA)-based clustering procedure that is applied to a latent multivariate Gaussian process rather than directly to the original functional data. The main contribution of this study is two-fold: the modeling of discrete longitudinal data with the latent multivariate Gaussian process and the development of a clustering algorithm based on MFPCA coupled with the latent multivariate Gaussian process. Numerical experiments, including real data analysis and a simulation study, demonstrate the promising empirical properties of the proposed approach.
Ensemble clustering for step data via binning
This paper considers the clustering problem of physical step count data recorded on wearable devices. Clustering step data gives insight into an individual's activity status and further provides the groundwork for health-related policies. However, classical methods, such as K-means clustering and hierarchical clustering, are not suitable for step count data, which are typically high-dimensional and zero-inflated. This paper presents a new clustering method for step data based on a novel combination of ensemble clustering and binning. We first construct multiple sets of binned data by changing the size and starting position of the bin, and then merge the clustering results from the binned data using a voting method. The advantage of binning, as a critical component, is that it substantially reduces the dimension of the original data while preserving the essential characteristics of the data. As a result, combining clustering results from multiple binned data sets can provide an improved clustering result that reflects both local and global structures of the data. Simulation studies and real data analysis were carried out to evaluate the empirical performance of the proposed method and demonstrate its general utility.
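The binning and merging steps can be illustrated with a short sketch. It rests on assumptions that go beyond the abstract: bins are formed by summing consecutive counts, observations before the starting position are dropped, and the voting step is represented by a co-association matrix (the share of clusterings in which two subjects fall in the same cluster). The abstract does not specify the voting rule, and the names `bin_series` and `co_association` are hypothetical.

```python
def bin_series(x, size, start=0):
    """Sum consecutive step counts into bins of `size`, beginning at index `start`.

    Observations before `start` are dropped and the last bin may be partial;
    both are simplifying assumptions of this sketch.
    """
    return [sum(x[i:i + size]) for i in range(start, len(x), size)]


def co_association(labelings):
    """Fraction of clusterings in which each pair of subjects shares a cluster."""
    n = len(labelings[0])
    m = len(labelings)
    return [
        [sum(lab[i] == lab[j] for lab in labelings) / m for j in range(n)]
        for i in range(n)
    ]


# Hypothetical step counts for one subject over eight time points.
steps = [0, 0, 5, 3, 0, 2, 7, 1]
coarse = bin_series(steps, size=4)            # bins start at index 0
shifted = bin_series(steps, size=3, start=1)  # different size and starting position

# Hypothetical cluster labels for three subjects from two binned data sets.
votes = co_association([[0, 0, 1], [0, 1, 1]])
```

Any base clusterer could be run on each binned data set to produce the label lists. Pairs with a high co-association value would then be placed in the same final cluster.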