9 research outputs found

    Spline-Lasso in High-Dimensional Linear Regression

    No full text
    <p>We consider a high-dimensional linear regression problem, where the covariates (features) are ordered in some meaningful way, and the number of covariates <i>p</i> can be much larger than the sample size <i>n</i>. The fused lasso of Tibshirani et al. is designed especially to tackle this type of problem; it yields sparse coefficients, selects grouped variables, and encourages a locally constant coefficient profile within each group. However, in some applications, the effects of different features within a group might differ and change smoothly. In this article, we propose a new spline-lasso or, more generally, spline-MCP to better capture the different effects within the group. The newly proposed method is very easy to implement since it can be easily turned into a lasso or MCP problem. Simulations show that the method works very effectively in both feature selection and prediction accuracy. A real application is also given to illustrate the benefits of the method. Supplementary materials for this article are available online.</p>

    On SURE-Type Double Shrinkage Estimation

    No full text
    <p>The article is concerned with empirical Bayes shrinkage estimators for the heteroscedastic hierarchical normal model using Stein's unbiased estimate of risk (SURE). Recently, Xie, Kou, and Brown proposed a class of estimators for this type of problem and established their asymptotic optimality properties under the assumption of known but unequal variances. In this article, we consider this problem with unequal and unknown variances, which may be more appropriate in real situations. By placing priors on both means and variances, we propose novel SURE-type double shrinkage estimators that shrink both means and variances. Optimal properties for these estimators are derived under certain regularity conditions. Extensive simulation studies are conducted to compare the newly developed methods with other shrinkage techniques. Finally, the methods are applied to the well-known baseball dataset and a gene expression dataset. Supplementary materials for this article are available online.</p>

    Comparison of the performance of different peak picking methods.

    No full text
    <p><i>Rec</i> stands for recall values and <i>Pre</i> stands for precision values. The recall and precision values of PICKY and WaVPeak are taken from <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0053112#pone.0053112-Liu1" target="_blank">[1]</a>. B-H (WaVPeak) denotes the WaVPeak peaks selected by the proposed B-H algorithm. Consensus () is the consensus of WaVPeak and PICKY obtained by simply considering the top peaks from each method. B-H (Consensus) is the consensus of WaVPeak and PICKY obtained by considering the top peaks that are determined by the proposed B-H algorithm. All values are given as percentages.</p>

    Comparison of the missing peak rate of the fixed number-based method () and the Benjamini-Hochberg (B–H) algorithm with on the 32 spectra of the eight proteins in the benchmark set picked by PICKY.

    No full text
    <p>Column is the relative improvement of the missing peak rate of B-H over . All values except the last two rows are the missing peak rates. The “” row lists the standard deviations of the missing peak rates for the corresponding columns, which demonstrates the robustness of different methods. The last row gives the average precision values. All values are given as percentages.</p>

    Precision-recall curves for different peak picking methods and sensitivity analysis of B-H WaVPeak.

    No full text
    <p>(a)–(e): precision-recall curves for different methods on <sup>15</sup>N-HSQC, HNCO, HNCA, CBCA(CO)NH and NHCACB, respectively. The solid black curves are for the B-H consensus method; the dashed black curves are for the 1.5 consensus method; the solid cyan curves are for B-H WaVPeak; the dashed cyan curves are for the original WaVPeak; the solid magenta curves are for B-H PICKY; and the dashed magenta curves are for the original PICKY. The relative area under the curve (AUC) values are given in the legends; each is the area under the curve divided by the total area for recall of at least 0.7. (f): sensitivity analysis for different numbers of peaks. The precision and recall values of B-H WaVPeak are shown when , , and top peaks are used to calculate the p-values.</p>

    Comparison of the missing peak rate of the fixed number-based method () and the Benjamini-Hochberg (B-H) algorithm with on the 32 spectra of the eight proteins in the benchmark dataset as picked by WaVPeak.

    No full text
    <p>Column is the relative improvement of the missing peak rate of B-H over . All values except the last two rows are the missing peak rates. The “” row lists the standard deviations of the missing peak rates for the corresponding columns, demonstrating the robustness of different methods. The last row is the average precision value. All values are given as percentages.</p>

    Illustration of the Benjamini-Hochberg procedure.

    No full text
    <p>In this example, the number of hypotheses () is 10 and the false discovery proportion () is 0.2. The largest index of the hypotheses that is below the line is 6 (). Therefore, the first six hypotheses are rejected as the predicted peaks.</p>
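
The step-up rule described in this caption can be sketched in Python as follows. This is a minimal illustration of the standard Benjamini-Hochberg procedure, not the authors' implementation; the function name `benjamini_hochberg` and the p-values in `example_p_values` are hypothetical, chosen only so that 10 hypotheses at level 0.2 yield six rejections, as in the caption.

```python
def benjamini_hochberg(p_values, q):
    """Return indices (into p_values) of the hypotheses rejected at level q.

    Standard step-up rule: sort the p-values, find the largest rank k with
    p_(k) <= k * q / m, and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    # Indices of the hypotheses ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        # The "line" in the illustration is the threshold rank * q / m.
        if p_values[idx] <= rank * q / m:
            k = rank  # keep the largest rank that falls on or below the line
    return order[:k]


# Hypothetical p-values (assumed for illustration, not taken from the paper).
example_p_values = [0.005, 0.01, 0.03, 0.09, 0.095, 0.11, 0.30, 0.45, 0.60, 0.90]

rejected = benjamini_hochberg(example_p_values, q=0.2)
print(len(rejected), "hypotheses rejected:", rejected)
```

With these assumed values, the fourth p-value lies above its own threshold but is still rejected, because the rule uses the largest index below the line rather than the first failure.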

    Original intensity curves and the corresponding p-value curves.

    No full text
    <p>(a) and (d): sorted intensity curve (a) and the corresponding p-value curve (d) of peaks predicted by PICKY on the 2D <sup>15</sup>N-HSQC spectrum of the protein TM1112; (b) and (e): sorted intensity curve (b) and the corresponding p-value curve (e) of peaks predicted by PICKY on the 3D HNCO spectrum of the protein COILIN; (c) and (f): sorted intensity curve (c) and the corresponding p-value curve (f) of peaks predicted by PICKY on the 3D CBCA(CO)NH spectrum of the protein RP3384. In these figures, true peaks are shown in black and false ones are shown in cyan. In (d), (e) and (f), the decision boundaries of and the B-H procedure are shown in black and magenta, respectively.</p>