25 research outputs found

    Robust Variable and Interaction Selection for Logistic Regression and General Index Models

    <p>Under the logistic regression framework, we propose a forward-backward method, SODA, for variable selection with both main and quadratic interaction terms. In the forward stage, SODA adds predictors that have significant overall effects, whereas in the backward stage SODA removes unimportant terms to optimize the extended Bayesian information criterion (EBIC). Compared with existing methods for variable selection in quadratic discriminant analysis, SODA can handle high-dimensional data in which the number of predictors is much larger than the sample size, and it does not require the joint normality assumption on predictors, which makes it considerably more robust. We further extend SODA to conduct variable selection and model fitting for general index models. Compared with existing variable selection methods based on sliced inverse regression (SIR), SODA requires neither the linearity condition nor the constant-variance condition and is thus more robust. Our theoretical analysis establishes the variable-selection consistency of SODA under high-dimensional settings, and our simulation studies as well as real-data applications demonstrate the superior performance of SODA in dealing with non-Gaussian design matrices in both logistic and general index models. Supplementary materials for this article are available online.</p>
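The backward stage described above can be illustrated with a minimal sketch. It assumes the standard EBIC form, -2 log L + k log n + 2γ log C(p, k); the exact penalty SODA uses is not given in this abstract and may differ. The callable `log_lik_of`, the term names, and the default `gamma` are hypothetical placeholders, not part of SODA's published interface.

```python
import math

def log_binom(p: int, k: int) -> float:
    """Natural log of the binomial coefficient C(p, k)."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)

def ebic(log_lik: float, k: int, n: int, p: int, gamma: float = 0.5) -> float:
    """Standard EBIC (assumed form): -2 log L + k log n + 2*gamma*log C(p, k)."""
    return -2.0 * log_lik + k * math.log(n) + 2.0 * gamma * log_binom(p, k)

def backward_eliminate(terms, log_lik_of, n, p, gamma=0.5):
    """Greedy backward stage: drop one term at a time while EBIC decreases.

    log_lik_of is a hypothetical callable returning the maximized
    log-likelihood of a model containing the given list of terms.
    """
    current = list(terms)
    best = ebic(log_lik_of(current), len(current), n, p, gamma)
    improved = True
    while improved and current:
        improved = False
        for t in list(current):
            cand = [u for u in current if u != t]
            score = ebic(log_lik_of(cand), len(cand), n, p, gamma)
            if score < best:  # removing t improves the criterion
                best, current, improved = score, cand, True
                break
    return current, best

# Toy log-likelihood: "a" and "b" matter, "noise" barely helps.
def toy_log_lik(selected):
    gains = {"a": 20.0, "b": 15.0, "noise": 0.1}
    return -50.0 + sum(gains[t] for t in selected)

kept, score = backward_eliminate(["a", "b", "noise"], toy_log_lik, n=100, p=10)
```

In the toy run, the `noise` term is dropped because its small likelihood gain does not offset the EBIC penalty, while `a` and `b` are retained.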

    Nonparametric <i>K</i>-Sample Tests via Dynamic Slicing

    <div><p><i>K</i>-sample testing problems arise in many scientific applications and have attracted statisticians’ attention for many years. We propose an omnibus nonparametric method based on an optimal discretization (aka “slicing”) of continuous random variables in the test. The novelty of our approach lies in the inclusion of a term penalizing the number of slices (i.e., the resolution of the discretization) so as to regularize the corresponding likelihood-ratio test statistic. An efficient dynamic programming algorithm is developed to determine the optimal slicing scheme. Asymptotic and finite-sample properties such as power and null distribution of the resulting test statistic are studied. We compare the proposed testing method with some existing well-known methods and demonstrate its statistical power through extensive simulation studies as well as a real data example. A dynamic slicing method for the one-sample testing problem is further developed and studied under the same framework. Supplementary materials including technical derivations and proofs are available online.</p></div>
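The penalized slicing idea above can be sketched as a small dynamic program. This is an illustration only: the flat per-slice `penalty`, the function name `dynamic_slicing`, and the O(n²K) recursion are assumptions, and the paper's actual penalty and algorithm may differ. Observations are assumed to be already sorted by the continuous variable, with `labels` holding each one's group index.

```python
import math

def dynamic_slicing(labels, num_groups, penalty):
    """Maximize (log-likelihood ratio) - penalty * (number of slices).

    labels: group indices (0..num_groups-1), ordered by the continuous variable.
    Returns (best_score, cuts), where cuts lists the exclusive end index
    of each slice.
    """
    n = len(labels)
    # prefix[j][k] = number of group-k observations among the first j.
    prefix = [[0] * num_groups]
    for lab in labels:
        row = prefix[-1][:]
        row[lab] += 1
        prefix.append(row)
    total = prefix[n]

    def slice_gain(i, j):
        # Log-likelihood-ratio contribution of the slice covering i..j-1.
        size = j - i
        gain = 0.0
        for k in range(num_groups):
            c = prefix[j][k] - prefix[i][k]
            if c > 0:
                gain += c * math.log((c / size) / (total[k] / n))
        return gain

    best = [0.0] + [-math.inf] * n   # best[j]: optimum over the first j points
    back = [0] * (n + 1)             # back[j]: start index of the last slice
    for j in range(1, n + 1):
        for i in range(j):
            cand = best[i] + slice_gain(i, j) - penalty
            if cand > best[j]:
                best[j], back[j] = cand, i

    cuts, j = [], n
    while j > 0:                     # walk back-pointers to recover the slices
        cuts.append(j)
        j = back[j]
    cuts.reverse()
    return best[n], cuts

# Two perfectly separated groups: the optimum is a single cut at position 5.
score, cuts = dynamic_slicing([0] * 5 + [1] * 5, num_groups=2, penalty=1.0)
```

When the groups are interleaved and the penalty is large enough, the same routine returns a single slice, which corresponds to no evidence against the null.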

    Benchmarking the performance of CLIC on three pathway databases.

    <p>Leave-one-out cross-validation is shown for CORUM (A, D), KEGG (B, E), and GO (C, F) gene sets using 1774 mouse datasets from GEO. (A-C) Precision-recall curves show results based on CLIC and average correlation (AvCorr) using the GNFv3 tissue atlas. These plots highlight the utility of each of CLIC’s components: module-specific co-expression (CLIC vs. CLIC no-partitioning), frequent co-expression (CLIC vs. CLIC GNFv3 tissue atlas), and specific co-expression (CLIC vs. CLIC no-background). (D-F) Recall-rank curves show the recall (sensitivity) of the different methods when only the top N predictions are considered (N ranging from 10 to 400). Results are shown for all gene sets, as well as for subsets with different CEM strength <i>ϕ</i> cut-offs, where n indicates the number of pathways used in generating the curves.</p>
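The recall-rank and precision-recall quantities in this caption can be made concrete with a short sketch. The function names and the toy gene identifiers are hypothetical; this shows only the generic definitions of recall and precision at the top N predictions, not CLIC's evaluation code.

```python
def recall_at_top_n(ranked_genes, true_members, n):
    """Fraction of held-out true members recovered in the top n predictions."""
    hits = len(set(ranked_genes[:n]) & set(true_members))
    return hits / len(true_members)

def precision_at_top_n(ranked_genes, true_members, n):
    """Fraction of the top n predictions that are true members."""
    top = ranked_genes[:n]
    hits = len(set(top) & set(true_members))
    return hits / len(top)

# Toy ranking: two of the three true members appear among four predictions.
ranked = ["g1", "g2", "g3", "g4"]
truth = {"g1", "g4", "g9"}
recall_2 = recall_at_top_n(ranked, truth, 2)
recall_4 = recall_at_top_n(ranked, truth, 4)
precision_2 = precision_at_top_n(ranked, truth, 2)
```

Sweeping `n` over a range such as 10 to 400 and plotting recall against `n` gives a recall-rank curve of the kind shown in panels D-F.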

    C7orf55 regulates ATP synthase activity.

    <p>(A) Confocal microscopy of HeLa cells expressing a mitochondria-targeted version of dsRed (mito-dsRed) immunolabeled with antibodies to endogenous C7orf55. (B) Protein immunoblot analysis of K562 cells depleted for <i>C7orf55</i> and/or expressing a CRISPR-resistant version of <i>C7orf55</i>. * denotes a nonspecific band recognized by the C7orf55 antibody. (C) Blue-native PAGE analysis of the cells described in (B) before (top) and after (bottom) in-gel ATPase activity reaction. (D) C7orf55-FLAG immunoprecipitation and mass spectrometry analysis of co-immunoprecipitated proteins from two replicates. (E) Co-immunoprecipitation of C7orf55-FLAG and ATPAF2-V5.</p>

    Ranunculus shinano-alpinus Ohwi

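    Original Japanese name: タカネキンポウゲ (Takane-kinpouge). Family: キンポウゲ科 = Ranunculaceae. Collection locality: Mt. Hakuba-Yarigatake, Toyama Prefecture (Etchu, Hakuba-Yarigatake). Collection date: 1978/8/22. Collector: 古瀬 義. Catalog number: JH000107. National Museum of Nature and Science catalog number: TNS-VS-95010.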

    Comparing performance between CLIC and other co-expression algorithms.

    <p>Leave-one-out cross-validation results for CLIC and three other methods (SEEK, COXPRESdb, and GeneFriends with microarray and RNA-seq data) are shown as precision-recall curves (A-C) and as recall-rank curves that show the recall (sensitivity) of each algorithm when considering the top N predictions (D-F). Results are shown for CORUM (A, D), KEGG (B, E), and GO (C, F). n indicates the number of pathways used in generating each curve.</p>

    Functional predictions for uncharacterized human genes.

    <p>349 uncharacterized human genes (<i>x</i>-axis) are ranked by the highest normalized LLR score received from any of the 910 CORUM, KEGG, or GO annotated gene sets. The <i>y</i>-axis shows the top LLR score, normalized by the size of the corresponding gene set. The inset table shows the top predictions. Arrowheads indicate existing literature support for a functional association, and red text indicates new experimental validation.</p>

    Schematic overview of CLIC.

    <p>CLIC partitions an input Query gene set into co-expressed modules (CEMs), assigns a weight to each dataset according to the intra-correlation of each module relative to background, and then predicts additional genes co-expressed with each CEM in high-weight datasets. CLIC takes as input a compendium of <i>D</i> microarray datasets (e.g., from GEO) and a Query gene set. In the Partition step, input genes are partitioned into distinct CEMs (in this example, CEM 1 in red, CEM 2 in orange), using a Bayesian partition model to simultaneously infer the number of CEMs and assign weights to datasets. Dataset weights quantify the significance of each intra-CEM correlation compared to the background distribution of correlation in each dataset (gray density curves). Genes from the input set that are not assigned to any CEM are assigned to a “Null” cluster. In the Expansion step, each CEM is expanded by identifying additional genes that show higher co-expression with the CEM genes compared to the gene-specific background distribution, scored by the log-likelihood ratio (LLR).</p>