177 research outputs found

    Correlation density plot for interacting (A) and non-interacting (B) protein pairs of different evolutionary span from Dataset 3.

    No full text
    In this plot we separately consider the protein pairs that are conserved only in chordates, the pairs that are conserved across the metazoans but not elsewhere in the eukaryotes, and the protein pairs that are distributed across the eukaryotes beyond the metazoans. (C) The corresponding ROC plots for the correlation analysis of these three sub-datasets.

    Density plot for correlation scores using Jones-Taylor-Thornton matrix for common interacting and non-interacting protein pairs from Dataset 1 (A) and the corresponding ROC plot (B).

    No full text

    Matthews correlation coefficient (MCC) vs. choice of binary classification threshold for Datasets 1, 2, 3.

    No full text
    Dataset 1 shows a much higher and more distinct peak, supporting the inference derived from the relative AUC scores (<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0081100#pone-0081100-g001" target="_blank">Figures 1</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0081100#pone-0081100-g002" target="_blank">2</a>, and <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0081100#pone-0081100-g003" target="_blank">3</a>) that Dataset 1 provides the best differentiation between the interacting and non-interacting pairs.
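
The AUC scores mentioned here can be read as the probability that a randomly chosen interacting pair receives a higher correlation score than a randomly chosen non-interacting pair. The following is a minimal sketch of that rank-based calculation; the function name `roc_auc` and the toy scores are illustrative assumptions and are not taken from the article's datasets.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Toy correlation scores (hypothetical values, for illustration only).
interacting = [0.9, 0.8, 0.7, 0.3]
non_interacting = [0.6, 0.4, 0.2, 0.1]

auc = roc_auc(interacting, non_interacting)
print(f"AUC = {auc:.3f}")
```

An AUC near 0.5 would indicate no separation between the two classes, while a value near 1 indicates that almost every interacting pair outranks every non-interacting pair.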

    Within-ortholog-set degree of conservation of protein sequences (mean pairwise fraction identity for all orthologs in each set) vs. protein-pair correlation score for Dataset 1.

    No full text
    (A) Scatter plots of degree of conservation vs. protein-pair correlation score for interacting protein pairs. (B) Scatter plots of degree of conservation vs. protein-pair correlation score for non-interacting protein pairs. (C) Mean degree of conservation vs. protein-pair correlation score for interacting pairs, with the standard deviation as error bars. (D) Mean degree of conservation vs. protein-pair correlation score for non-interacting pairs, with the standard deviation as error bars.
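
The degree of conservation used in this figure is the mean pairwise fraction identity over all orthologs in a set. A rough sketch of that quantity is shown below, assuming the sequences are already aligned to equal length and that columns containing a gap (`-`) are skipped; the gap handling, the function names, and the toy sequences are assumptions, since the listing does not say how the authors computed identity.

```python
from itertools import combinations

def fraction_identity(a: str, b: str) -> float:
    """Fraction of gap-free aligned columns at which two sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)

def mean_pairwise_identity(aligned: list[str]) -> float:
    """Average fraction identity over every unordered pair of sequences."""
    pairs = list(combinations(aligned, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(fraction_identity(a, b) for a, b in pairs) / len(pairs)

# Toy ortholog set (hypothetical aligned sequences).
orthologs = ["MKTAYIAK", "MKTAYLAK", "MKSAYIAK"]
conservation = mean_pairwise_identity(orthologs)
print(f"mean pairwise identity = {conservation:.3f}")
```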

    Plot of sensitivity, specificity, and MCC vs. threshold for binary classification using Dataset 1.

    No full text
    In this case the peak of the MCC (dashed vertical line) occurs where the specificity is somewhat larger than the sensitivity. A user may wish to use a threshold either larger or smaller than the position of the MCC peak, depending on whether specificity or sensitivity is more highly valued.
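
The trade-off described here can be sketched by sweeping a threshold over the correlation scores and computing sensitivity, specificity, and MCC at each step. The code below is a minimal illustration under the assumption that a pair is predicted as interacting when its score is at or above the threshold; the function name and the toy scores and labels are hypothetical.

```python
import math

def classification_metrics(scores, labels, threshold):
    """Return (sensitivity, specificity, MCC) when score >= threshold predicts 'interacting'."""
    tp = fp = tn = fn = 0
    for score, is_interacting in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_interacting:
            tp += 1
        elif predicted:
            fp += 1
        elif is_interacting:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal count is zero; report 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc

# Toy data (hypothetical): 1 = interacting pair, 0 = non-interacting pair.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 1, 0, 0]

# Choose the threshold that maximizes MCC among the observed scores.
best_threshold = max(
    sorted(set(scores)),
    key=lambda t: classification_metrics(scores, labels, t)[2],
)
sens, spec, mcc = classification_metrics(scores, labels, best_threshold)
print(best_threshold, sens, spec, round(mcc, 3))
```

In this toy example the MCC peak also falls where specificity exceeds sensitivity; lowering the threshold would recover the missed interacting pair at the cost of more false positives.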

    Average correlation vs. evolutionary span for Dataset 3.

    No full text
    (A) Interacting protein pairs. (B) Non-interacting protein pairs. The evolutionary span is defined as the time since the last common ancestor of the most distantly related species in the data subset. Correlation scores are mean values for each evolutionary span; error bars show the standard deviation of the correlation scores within the respective range. The range of conservation is defined by the range of the relevant OMA orthology sets. Time since the last common ancestor is derived from the TimeTree database <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0081100#pone.0081100-Hedges1" target="_blank">[35]</a>. The mean score is lower and the standard deviation is larger for data subsets that contain only closely related species.

    ConvexLAR: An Extension of Least Angle Regression

    No full text
    Least angle regression (LAR) was proposed by Efron, Hastie, Johnstone, and Tibshirani in 2004 for continuous model selection in linear regression. It is motivated by a geometric argument and tracks a path along which the predictors enter successively and the active predictors always maintain the same absolute correlation (angle) with the residual vector. Although it gained popularity quickly, its extensions seem rare compared to the penalty methods. In this expository article, we show that the powerful geometric idea of LAR can be generalized in a fruitful way. We propose a ConvexLAR algorithm that works for any convex loss function and naturally extends to group selection and data-adaptive variable selection. After a simple modification, it also yields new exact path algorithms for certain penalty methods such as a convex loss function with lasso or group lasso penalty. Variable selection in recurrent event and panel count data analysis, Ada-Boost, and Gaussian graphical models is reconsidered from the ConvexLAR angle. Supplementary materials for this article are available online.

    Regression Models for Multivariate Count Data

    No full text
    Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limiting due to its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology by the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of overdispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study some generalized linear models that incorporate various correlation structures among the counts. Current literature lacks a treatment of these models, partly because they do not belong to the natural exponential family. We study the estimation, testing, and variable selection for these models in a unifying framework. The regression models are compared on both synthetic and real RNA-seq data. Supplementary materials for this article are available online.
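
The restrictive mean-variance structure can be made concrete with the standard moments of the multinomial distribution. The notation below is an assumption introduced for illustration (it does not come from the abstract): $m$ is the total count, $d$ the number of categories, $x$ the covariate vector, and $\beta_j$ the coefficients for category $j$, with the last category as baseline.

```latex
\begin{align*}
p_j(x) &= \frac{\exp(x^\top \beta_j)}{\sum_{k=1}^{d} \exp(x^\top \beta_k)}, \qquad \beta_d = 0, \\
\mathbb{E}[Y_j] &= m\,p_j, \\
\operatorname{Var}(Y_j) &= m\,p_j\,(1 - p_j), \\
\operatorname{Cov}(Y_j, Y_k) &= -m\,p_j\,p_k \quad (j \neq k).
\end{align*}
```

Once the means are fixed, the variances and covariances are fully determined, and every pair of counts is negatively correlated. Overdispersed data, where the observed variance exceeds $m\,p_j(1 - p_j)$, or data with positively correlated counts therefore cannot be captured by this model.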

    Systems Design, Modeling, and Thermoeconomic Analysis of Azeotropic Distillation Processes for Organic Waste Treatment and Recovery in Nylon Plants

    No full text
    Nylon-6 and nylon-6,6 processes produce a considerable amount of organic waste (known as light oil) consisting of <i>n</i>-pentanol, cyclohexanone, and cyclohexene oxide, which are difficult to separate and recover. This article proposes six novel process designs to separate the light oil into three value-added products based on azeotropic distillation using water as an entrainer. These azeotropic distillation process designs take into account direct sequence, indirect sequence, thermally coupled column, and three types of dividing wall columns (dividing wall at the top, bottom, and middle of the column, respectively) for entrainer recovery. A conventional distillation process design for separation of the same light oil is also modeled and analyzed for comparison. High-fidelity process simulations are performed for each of the seven process designs in Aspen Plus. We further conduct exergy analyses and technoeconomic analyses to evaluate and compare the exergy efficiencies and economic performances of these seven process designs. The results indicate that the proposed azeotropic distillation process design with the dividing wall at the middle of the column has the best performance in terms of both exergy efficiency and total annual cost.

    An Algorithm for Generating Individualized Treatment Decision Trees and Random Forests

    No full text
    With new treatments and novel technology available, precision medicine has become a key topic in the new era of healthcare. Traditional statistical methods for precision medicine focus on subgroup discovery through identifying interactions between a few markers and treatment regimes. However, given the large scale and high dimensionality of modern datasets, it is difficult to detect the interactions between treatment and high-dimensional covariates. Recently, novel approaches have emerged that seek to directly estimate individualized treatment rules (ITR) via maximizing the expected clinical reward by using, for example, support vector machines (SVM) or decision trees. The latter enjoy great popularity in clinical practice due to their interpretability. In this article, we propose a new reward function and a novel decision tree algorithm to directly maximize rewards. We further improve a single tree decision rule by an ensemble decision tree algorithm, ITR random forests. Our final decision rule is an average over single decision trees, and it is a soft probability rather than a hard choice. Depending on how strong the treatment recommendation is, physicians can make decisions based on our model along with their own judgment and experience. Performance of ITR forest and tree methods is assessed through simulations along with applications to a randomized controlled trial (RCT) of 1385 patients with diabetes and an EMR cohort of 5177 patients with diabetes. ITR forest and tree methods are implemented using statistical software R (<i><a href="https://github.com/kdoub5ha/ITR.Forest" target="_blank">https://github.com/kdoub5ha/ITR.Forest</a></i>). Supplementary materials for this article are available online.
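
The idea of a soft recommendation obtained by averaging over single trees can be sketched as follows. This is only an illustration of the averaging step: it is not the API of the ITR.Forest R package, it does not implement the article's reward function or tree-growing algorithm, and the stump rules, covariate names, and patient values are all hypothetical.

```python
from typing import Callable, Sequence

# A "tree" here is any function mapping patient covariates to 1 (recommend
# treatment) or 0 (do not recommend). Real trees would be fitted to data.
Tree = Callable[[dict], int]

def forest_recommendation(trees: Sequence[Tree], patient: dict) -> float:
    """Average the hard 0/1 votes of the trees into a soft probability."""
    if not trees:
        raise ValueError("the forest must contain at least one tree")
    votes = [tree(patient) for tree in trees]
    return sum(votes) / len(votes)

# Hypothetical hand-written stumps standing in for fitted trees.
toy_forest = [
    lambda p: int(p["hba1c"] > 7.5),
    lambda p: int(p["age"] < 55),
    lambda p: int(p["bmi"] > 25.0),
    lambda p: int(p["hba1c"] > 8.0),
]

patient = {"hba1c": 8.2, "age": 61, "bmi": 27.0}
probability = forest_recommendation(toy_forest, patient)
print(f"share of trees recommending treatment: {probability:.2f}")
```

A value close to 0 or 1 signals a strong recommendation, while a value near 0.5 leaves more room for the physician's own judgment, which matches the way the abstract describes the use of the soft probability.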