
176 research outputs found

Density plot for correlation scores using Jones-Taylor-Thornton matrix for common interacting and non-interacting protein pairs from Dataset 1 (A) and the corresponding ROC plot (B).

Author: Eric Jakobsson (7156)
Hua Zhou (50017)


FigShare

Correlation density plot for interacting (A) and non-interacting (B) Protein pairs of different evolutionary span from Dataset 3.

Author: Eric Jakobsson (7156)
Hua Zhou (50017)

In this plot we separately consider the protein pairs that are conserved only in chordates, the pairs that are conserved across the metazoans but not elsewhere in the eukaryotes, and the pairs that are distributed across the eukaryotes beyond the metazoans. (C) The corresponding ROC plots for the correlation analysis of these three sub-datasets.

FigShare

Matthews correlation coefficient (MCC) vs. choice of binary classification threshold for Datasets 1, 2, 3.

Author: Eric Jakobsson (7156)
Hua Zhou (50017)

There is a much higher and more distinct peak for Dataset 1, supporting the inference derived from the relative AUC scores (Figures 1, 2, and 3; http://www.plosone.org/article/info:doi/10.1371/journal.pone.0081100) that Dataset 1 provides the best differentiation between interacting and non-interacting pairs.

FigShare

Protein sequences' within ortholog set degree of conservation (mean pairwise fraction identity for all orthologs in each set) vs. protein pairs correlation score for Dataset 1.

Author: Eric Jakobsson (7156)
Hua Zhou (50017)

(A) Scatter plots of degree of conservation vs. protein pairs correlation score for interacting protein pairs. (B) Scatter plots of degree of conservation vs. protein pairs correlation score for non-interacting protein pairs. (C) Mean degree of conservation vs. protein pairs correlation score for interacting pairs with standard deviation as error bar. (D) Mean degree of conservation vs. protein pairs correlation score for interacting pairs with standard deviation as error bar.
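The degree of conservation described here (mean pairwise fraction identity over all orthologs in a set) can be sketched as follows. This is only an illustration under stated assumptions: the sequences are taken to be already aligned to equal length, columns containing a gap are skipped, and the function names and toy sequences are hypothetical rather than taken from the authors' pipeline.

```python
from itertools import combinations


def fraction_identity(seq_a, seq_b, gap="-"):
    """Fraction of aligned, non-gap positions at which two sequences match.

    Assumption: both sequences come from the same alignment, so they have
    equal length. Skipping gap columns is an illustrative choice.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # ignore columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else 0.0


def mean_pairwise_identity(aligned_seqs):
    """Average fraction identity over every unordered pair of sequences."""
    pairs = list(combinations(aligned_seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(fraction_identity(a, b) for a, b in pairs) / len(pairs)


# Toy ortholog set (made-up sequences, for illustration only).
toy_orthologs = ["ACDE", "ACDF", "ACGF"]
toy_conservation = mean_pairwise_identity(toy_orthologs)
```

Other conventions (for example, counting gap columns as mismatches) would give different values, so the convention should be fixed before comparing ortholog sets.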

FigShare

Plot of sensitivity, specificity, and MCC vs. threshold for binary classification using Dataset 1.

Author: Eric Jakobsson (7156)
Hua Zhou (50017)

It is seen that the peak of the MCC (dashed vertical line) occurs in this case where the specificity is somewhat larger than the sensitivity. A user may wish to use a threshold either larger or smaller than the position of the peak of the MCC, depending on whether specificity or sensitivity is more highly valued.
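As a rough illustration of how such a curve can be produced, the sketch below sweeps a classification threshold over correlation scores and computes sensitivity, specificity, and MCC at each value. The scores, labels, and threshold grid are made-up toy data, and the function names are hypothetical; none of this reproduces the authors' datasets or code.

```python
import math


def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are called interacting."""
    tp = fp = tn = fn = 0
    for score, is_interacting in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_interacting:
            tp += 1
        elif predicted:
            fp += 1
        elif is_interacting:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def threshold_metrics(scores, labels, threshold):
    """Return (sensitivity, specificity, MCC) at one threshold."""
    tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


def best_mcc_threshold(scores, labels, thresholds):
    """Threshold on the grid with the highest MCC (first one wins on ties)."""
    return max(thresholds, key=lambda t: threshold_metrics(scores, labels, t)[2])


# Toy data: five interacting pairs followed by five non-interacting pairs.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.5, 0.4, 0.3, 0.1, 0.05]
toy_labels = [True] * 5 + [False] * 5
toy_thresholds = [i / 10 for i in range(1, 10)]

peak = best_mcc_threshold(toy_scores, toy_labels, toy_thresholds)
sens, spec, mcc = threshold_metrics(toy_scores, toy_labels, peak)
```

In this toy example the MCC peak also falls where specificity exceeds sensitivity; moving the threshold down from the peak trades specificity for sensitivity, which is the choice the caption leaves to the user.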

FigShare

Average correlation vs. evolutionary span for Dataset 3.

Author: Eric Jakobsson (7156)
Hua Zhou (50017)

(A) Interacting protein pairs. (B) Non-interacting protein pairs. The evolutionary span is defined as the time since the last common ancestor of the most distantly related species in the data subset. Correlation scores are mean values for each evolutionary span; error bars show the standard deviation of the correlation scores within the respective range. The range of conservation is defined by the range of the relevant OMA orthology sets. Time since the last common ancestor is derived from the TimeTree database [35]. The mean score is lower and the standard deviation is larger for data subsets that contain only closely related species.

FigShare

ConvexLAR: An Extension of Least Angle Regression

Author: Hua Zhou (50017)
Wei Xiao (16583)
Yichao Wu (259129)
Publication date: 01/07/2015

Least angle regression (LAR) was proposed by Efron, Hastie, Johnstone and Tibshirani in 2004 for continuous model selection in linear regression. It is motivated by a geometric argument and tracks a path along which the predictors enter successively and the active predictors always maintain the same absolute correlation (angle) with the residual vector. Although it gained popularity quickly, its extensions seem rare compared to the penalty methods. In this expository article, we show that the powerful geometric idea of LAR can be generalized in a fruitful way. We propose a ConvexLAR algorithm that works for any convex loss function and naturally extends to group selection and data adaptive variable selection. After simple modification, it also yields new exact path algorithms for certain penalty methods such as a convex loss function with lasso or group lasso penalty. Variable selection in recurrent event and panel count data analysis, Ada-Boost, and Gaussian graphical model is reconsidered from the ConvexLAR angle. Supplementary materials for this article are available online.
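The equal-correlation property mentioned in the abstract can be written out for the classical linear-regression case. The notation below (response y, standardized design matrix X with columns x_j, path parameter t, active set A(t)) is introduced here for illustration and is not taken from the article; the ConvexLAR generalization itself is not reproduced.

```latex
% Residual along the LAR path
r(t) = y - X\beta(t)

% Active predictors share one common absolute correlation C(t)
% with the residual; inactive predictors never exceed it.
\lvert x_j^{\top} r(t) \rvert = C(t)   \quad \text{for all } j \in \mathcal{A}(t),
\qquad
\lvert x_k^{\top} r(t) \rvert \le C(t) \quad \text{for all } k \notin \mathcal{A}(t)
```

A new predictor enters the active set at the point on the path where its absolute correlation with the residual first reaches C(t).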

Crossref

PubMed Central

eScholarship - University of California

FigShare

Regression Models for Multivariate Count Data

Author: Hua Zhou (50017)
Jin Zhou (90734)
Wei Sun (93580)
Yiwen Zhang (512835)
Publication date: 01/01/2017

Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limiting due to its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology by the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of overdispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study some generalized linear models that incorporate various correlation structures among the counts. Current literature lacks a treatment of these models, partly because they do not belong to the natural exponential family. We study the estimation, testing, and variable selection for these models in a unifying framework. The regression models are compared on both synthetic and real RNA-seq data. Supplementary materials for this article are available online.
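The restrictive mean-variance structure referred to above follows from the standard moments of the multinomial distribution. The symbols below (count vector Y with d categories, total m, covariates x, coefficients beta_j) are notation chosen here for illustration, not the article's.

```latex
% Multinomial-logit model: category probabilities from covariates x
p_j = \frac{\exp(x^{\top}\beta_j)}{\sum_{l=1}^{d} \exp(x^{\top}\beta_l)},
\qquad Y \sim \mathrm{Multinomial}(m, p_1, \dots, p_d)

% Moments: the variance is fully determined by the mean, and every
% pair of counts is negatively correlated.
\mathrm{E}(Y_j) = m p_j, \qquad
\mathrm{Var}(Y_j) = m p_j (1 - p_j), \qquad
\mathrm{Cov}(Y_j, Y_k) = -m p_j p_k \quad (j \ne k)
```

Because no free parameter governs dispersion or the sign of the correlations, counts that are overdispersed or positively correlated cannot be fitted by this model, which is the limitation the abstract describes.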

PubMed Central

eScholarship - University of California

FigShare

Systems Design, Modeling, and Thermoeconomic Analysis of Azeotropic Distillation Processes for Organic Waste Treatment and Recovery in Nylon Plants

Author: Fengqi You (1360893)
Hua Zhou (50017)
Yintian Cai (5043620)

Nylon-6 and nylon-6,6 processes produce a considerable amount of organic waste (known as light oil) consisting of n-pentanol, cyclohexanone, and cyclohexene oxide, which are difficult to separate and recover. This article proposes six novel process designs to separate the light oil into three value-added products based on azeotropic distillation using water as an entrainer. These azeotropic distillation process designs take into account direct sequence, indirect sequence, thermally coupled column, and three types of dividing wall columns (dividing wall at the top, bottom, and middle of the column, respectively) for entrainer recovery. A conventional distillation process design for separation of the same light oil is also modeled and analyzed for comparison. High-fidelity process simulations are performed for each of the seven process designs in Aspen Plus. We further conduct exergy analyses and technoeconomic analyses to evaluate and compare the exergy efficiencies and economic performances of these seven process designs. The results indicate that the proposed azeotropic distillation process design with the dividing wall at the middle of the column has the best performance in terms of both exergy efficiency and total annual cost.

FigShare

An Algorithm for Generating Individualized Treatment Decision Trees and Random Forests

Author: Haoda Fu (5039792)
Hua Zhou (50017)
Jin Zhou (90734)
Kevin Doubleday (5039795)
Publication date: 29/03/2018

With new treatments and novel technology available, precision medicine has become a key topic in the new era of healthcare. Traditional statistical methods for precision medicine focus on subgroup discovery through identifying interactions between a few markers and treatment regimes. However, given the large scale and high dimensionality of modern datasets, it is difficult to detect the interactions between treatment and high-dimensional covariates. Recently, novel approaches have emerged that seek to directly estimate individualized treatment rules (ITR) via maximizing the expected clinical reward by using, for example, support vector machines (SVM) or decision trees. The latter enjoys great popularity in clinical practice due to its interpretability. In this article, we propose a new reward function and a novel decision tree algorithm to directly maximize rewards. We further improve a single tree decision rule by an ensemble decision tree algorithm, ITR random forests. Our final decision rule is an average over single decision trees and it is a soft probability rather than a hard choice. Depending on how strong the treatment recommendation is, physicians can make decisions based on our model along with their own judgment and experience. Performance of ITR forest and tree methods is assessed through simulations along with applications to a randomized controlled trial (RCT) of 1385 patients with diabetes and an EMR cohort of 5177 patients with diabetes. ITR forest and tree methods are implemented using statistical software R (https://github.com/kdoub5ha/ITR.Forest). Supplementary materials for this article are available online.
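The idea of averaging single-tree rules into a soft probability can be sketched in a few lines. This is a toy illustration only: the trees are hand-written stand-ins, the covariate names (hba1c, age, bmi) and cut-offs are hypothetical, and nothing here reflects the reward function, the tree-growing algorithm, or the API of the ITR.Forest R package.

```python
def forest_treatment_probability(trees, patient):
    """Average the 0/1 recommendations of individual trees.

    Each tree is assumed to be a callable mapping a patient record to
    1 (recommend treatment) or 0 (recommend control).
    """
    if not trees:
        raise ValueError("forest must contain at least one tree")
    votes = [tree(patient) for tree in trees]
    return sum(votes) / len(votes)


# Hypothetical single-tree rules; the covariates and cut-offs are made up.
toy_trees = [
    lambda p: 1 if p["hba1c"] > 8.0 else 0,
    lambda p: 1 if p["age"] < 60 else 0,
    lambda p: 1 if p["bmi"] > 30 else 0,
    lambda p: 1 if p["hba1c"] > 7.5 and p["age"] < 70 else 0,
]

toy_patient = {"hba1c": 8.4, "age": 65, "bmi": 28}
probability = forest_treatment_probability(toy_trees, toy_patient)
```

A value near 0 or 1 corresponds to a strong recommendation, while a value near 0.5 signals that the trees disagree and leaves more room for the physician's own judgment.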

Crossref

eScholarship - University of California

FigShare