
9 research outputs found

Spline-Lasso in High-Dimensional Linear Regression

Author: Bing-Yi Jing (290313)
Jianchang Hu (688693)
Jianhua Guo (55282)
Zhen Zhang (86004)
Publication venue
Publication date: 02/01/2016
Field of study

We consider a high-dimensional linear regression problem in which the covariates (features) are ordered in some meaningful way and the number of covariates p can be much larger than the sample size n. The fused lasso of Tibshirani et al. is designed specifically for this type of problem; it yields sparse coefficients, selects grouped variables, and encourages a locally constant coefficient profile within each group. However, in some applications, the effects of different features within a group may differ and change smoothly. In this article, we propose a new spline-lasso or, more generally, spline-MCP to better capture the different effects within a group. The proposed method is easy to implement because it can be recast as a lasso or MCP problem. Simulations show that the method works very effectively in both feature selection and prediction accuracy. A real application is also given to illustrate the benefits of the method. Supplementary materials for this article are available online.
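The recasting as a lasso problem mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the penalty combines an L1 term with a squared second-order difference term on neighbouring coefficients, and it uses a plain coordinate-descent lasso solver. The names `second_difference_matrix`, `augment`, `lasso_cd`, `lam1` and `lam2`, as well as the simulated data, are hypothetical.

```python
import numpy as np

def second_difference_matrix(p):
    """(p-2) x p matrix whose rows are (1, -2, 1), shifted one column per row."""
    D = np.zeros((p - 2, p))
    for j in range(p - 2):
        D[j, j:j + 3] = [1.0, -2.0, 1.0]
    return D

def augment(X, y, lam2):
    """Fold the assumed smoothness penalty 0.5 * lam2 * ||D beta||^2 into the data."""
    p = X.shape[1]
    D = second_difference_matrix(p)
    X_aug = np.vstack([X, np.sqrt(lam2) * D])
    y_aug = np.concatenate([y, np.zeros(p - 2)])
    return X_aug, y_aug

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam1, n_iter=100):
    """Coordinate descent for 0.5 * ||y - X beta||^2 + lam1 * ||beta||_1."""
    p = X.shape[1]
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    resid = np.array(y, dtype=float)  # equals y - X @ beta, since beta starts at zero
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]      # remove coordinate j from the fit
            rho = X[:, j] @ resid
            beta[j] = soft_threshold(rho, lam1) / col_sq[j]
            resid -= X[:, j] * beta[j]      # put the updated coordinate back
    return beta

# Hypothetical data: p > n, with one group of smoothly varying effects.
rng = np.random.default_rng(0)
n, p = 30, 60
true_beta = np.zeros(p)
true_beta[20:30] = np.sin(np.linspace(0.0, np.pi, 10))
X = rng.normal(size=(n, p))
y = X @ true_beta + 0.1 * rng.normal(size=n)

X_aug, y_aug = augment(X, y, lam2=5.0)
beta_hat = lasso_cd(X_aug, y_aug, lam1=0.5)
```

Under this assumed penalty, any lasso solver applied to the augmented pair `(X_aug, y_aug)` minimizes the combined objective, which is why no dedicated optimizer is needed in the sketch.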

Crossref

ScholarBank@NUS

FigShare

On SURE-Type Double Shrinkage Estimation

Author: Bing-Yi Jing (290313)
Guangming Pan (839398)
Wang Zhou (324766)
Zhouping Li (3633625)

This article is concerned with empirical Bayes shrinkage estimators for the heteroscedastic hierarchical normal model using Stein's unbiased risk estimate (SURE). Recently, Xie, Kou, and Brown proposed a class of estimators for this type of problem and established their asymptotic optimality properties under the assumption of known but unequal variances. In this article, we consider the problem with unequal and unknown variances, which may be more appropriate in real situations. By placing priors on both means and variances, we propose novel SURE-type double shrinkage estimators that shrink both means and variances. Optimality properties for these estimators are derived under certain regularity conditions. Extensive simulation studies are conducted to compare the newly developed methods with other shrinkage techniques. Finally, the methods are applied to the well-known baseball dataset and a gene expression dataset. Supplementary materials for this article are available online.

FigShare

Comparison of the performance of different peak picking methods.

Author: Ahmed Abbas (290309)
Bing-Yi Jing (290313)
Xin Gao (14001)
Xin-Bing Kong (290311)
Zhi Liu (165121)

Rec stands for recall values and Pre stands for precision values. The recall and precision values of PICKY and WaVPeak are taken from [1]. B-H (WaVPeak) denotes the WaVPeak peaks selected by the proposed B-H algorithm. Consensus () is the consensus of WaVPeak and PICKY obtained by simply considering the top peaks from each method. B-H (Consensus) is the consensus of WaVPeak and PICKY obtained by considering the top peaks determined by the proposed B-H algorithm. All values are given as percentages.

FigShare

Comparison of the missing peak rate of the fixed number-based method () and the Benjamini-Hochberg (B–H) algorithm with on the 32 spectra of the eight proteins in the benchmark set picked by PICKY.

Author: Ahmed Abbas (290309)
Bing-Yi Jing (290313)
Xin Gao (14001)
Xin-Bing Kong (290311)
Zhi Liu (165121)

Column is the relative improvement of the missing peak rate of B-H over . All values except the last two rows are the missing peak rates. The “” row lists the standard deviations of the missing peak rates for the corresponding columns, which demonstrate the robustness of the different methods. The last row gives the average precision values. All values are given as percentages.

FigShare

Precision-recall curves for different peak picking methods and sensitivity analysis of B-H WaVPeak.

Author: Ahmed Abbas (290309)
Bing-Yi Jing (290313)
Xin Gao (14001)
Xin-Bing Kong (290311)
Zhi Liu (165121)

(a)–(e): precision-recall curves for different methods on 15N-HSQC, HNCO, HNCA, CBCA(CO)NH and NHCACB, respectively. The solid black curves are for the B-H consensus method; the dashed black curves are for the 1.5 consensus method; the solid cyan curves are for B-H WaVPeak; the dashed cyan curves are for the original WaVPeak; the solid magenta curves are for B-H PICKY; and the dashed magenta curves are for the original PICKY. The relative area under curve (AUC) values are given in the legends; each is the area under the curve divided by the total area over the region where recall is at least 0.7. (f): sensitivity analysis for different numbers of peaks. The precision and recall values of B-H WaVPeak are shown when , , and top peaks are used to calculate the p-values.
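One way to read the relative AUC described in this caption is sketched below. It is an interpretation, not the authors' code: it assumes the curve is integrated with the trapezoidal rule over recall values from 0.7 to 1 and then divided by the area of that region (0.3 times a maximum precision of 1). The function name `relative_auc` and the sample curves are hypothetical.

```python
import numpy as np

def relative_auc(recall, precision, min_recall=0.7):
    """Area under the precision-recall curve for recall >= min_recall,
    divided by the total area of that region (assumed to be 1 - min_recall)."""
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    order = np.argsort(recall)
    recall, precision = recall[order], precision[order]

    # Precision at the cut-off, by linear interpolation between curve points.
    p_cut = np.interp(min_recall, recall, precision)

    keep = recall > min_recall
    r = np.concatenate([[min_recall], recall[keep]])
    p = np.concatenate([[p_cut], precision[keep]])

    # Trapezoidal rule written out explicitly.
    area = np.sum((r[1:] - r[:-1]) * (p[1:] + p[:-1]) / 2.0)
    return area / (1.0 - min_recall)

# Hypothetical curves: perfect precision, and precision falling linearly to 0.
perfect = relative_auc([0.0, 0.5, 1.0], [1.0, 1.0, 1.0])
falling = relative_auc([0.0, 1.0], [1.0, 0.0])
```

A curve that stops before recall reaches 1 contributes no area beyond its last point under this reading, so it scores lower than one that extends to full recall.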

FigShare

Comparison of the missing peak rate of the fixed number-based method () and the Benjamini-Hochberg (B-H) algorithm with on the 32 spectra of the eight proteins in the benchmark dataset as picked by WaVPeak.

Author: Ahmed Abbas (290309)
Bing-Yi Jing (290313)
Xin Gao (14001)
Xin-Bing Kong (290311)
Zhi Liu (165121)

Column is the relative improvement of the missing peak rate of B-H over . All values except the last two rows are the missing peak rates. The “” row lists the standard deviations of the missing peak rates for the corresponding columns, demonstrating the robustness of the different methods. The last row is the average precision value. All values are given as percentages.

FigShare

Illustration of the Benjamini-Hochberg procedure.

Author: Ahmed Abbas (290309)
Bing-Yi Jing (290313)
Xin Gao (14001)
Xin-Bing Kong (290311)
Zhi Liu (165121)

In this example, the number of hypotheses () is 10 and the false discovery proportion () is 0.2. The largest index of the hypotheses that is below the line is 6 (). Therefore, the first six hypotheses are rejected as the predicted peaks.
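The step-up rule in this caption can be sketched in a few lines. This is a generic Benjamini-Hochberg implementation, not the authors' peak-picking code. The p-values are hypothetical, chosen only so that 10 hypotheses at level 0.2 give a largest index of 6 as in the caption; the names `benjamini_hochberg`, `fdr_level` and `example_p_values` are likewise assumptions.

```python
import numpy as np

def benjamini_hochberg(p_values, fdr_level):
    """Return (k, rejected): k is the largest rank i whose sorted p-value lies
    on or below the line fdr_level * i / n; the k smallest p-values are rejected."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    line = fdr_level * np.arange(1, n + 1) / n   # the B-H line at ranks 1..n
    below = np.nonzero(p[order] <= line)[0]
    k = int(below[-1]) + 1 if below.size else 0
    rejected = np.zeros(n, dtype=bool)
    rejected[order[:k]] = True
    return k, rejected

# Hypothetical p-values for 10 hypotheses (not taken from the source).
example_p_values = [0.001, 0.008, 0.05, 0.07, 0.09, 0.11, 0.2, 0.3, 0.5, 0.9]
k, rejected = benjamini_hochberg(example_p_values, fdr_level=0.2)
```

Because the rule looks for the largest rank below the line, every hypothesis with a smaller p-value is rejected as well, even if its own p-value sits above the line at its rank.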

FigShare

Original intensity curves and the corresponding p-value curves.

Author: Ahmed Abbas (290309)
Bing-Yi Jing (290313)
Xin Gao (14001)
Xin-Bing Kong (290311)
Zhi Liu (165121)

(a) and (d): sorted intensity curve (a) and the corresponding p-value curve (d) of peaks predicted by PICKY on the 2D 15N-HSQC spectrum of the protein TM1112; (b) and (e): sorted intensity curve (b) and the corresponding p-value curve (e) of peaks predicted by PICKY on the 3D HNCO spectrum of the protein COILIN; (c) and (f): sorted intensity curve (c) and the corresponding p-value curve (f) of peaks predicted by PICKY on the 3D CBCA(CO)NH spectrum of the protein RP3384. In these figures, true peaks are shown in black and false ones are shown in cyan. In (d), (e) and (f), the decision boundaries of and the B-H procedure are shown in black and magenta, respectively.

FigShare