7 research outputs found

    Penalized Cluster Analysis With Applications to Family Data

    No full text
    Cluster analysis is the assignment of observations into clusters so that observations in the same cluster are similar in some sense, and many clustering methods have been developed. However, these methods cannot be applied to family data, which possess intrinsic familial structure. To take the familial structure into account, we propose a form of penalized cluster analysis with a tuning parameter controlling its influence. The tuning parameter can be selected based on the concept of clustering stability. The method can also be applied to other clustered data such as panel data. The method is illustrated via simulations and an application to a family study of asthma.

    Analysis of presence-only data via semi-supervised learning approaches

    No full text
    Presence-only data arise in classification problems and consist of a sample of observations from the presence class and a large number of background observations whose presence/absence status is unknown. Since absence data are generally unavailable, conventional semi-supervised learning approaches are no longer appropriate, as they tend to degenerate and assign all observations to the presence class. In this article, we propose a generalized class balance constraint, which can be combined with semi-supervised learning approaches to prevent them from degenerating. Furthermore, to circumvent the difficulty of model tuning with presence-only data, a selection criterion based on classification stability is developed, which measures the robustness of any given classification algorithm against sampling randomness. The effectiveness of the proposed approach is demonstrated through a variety of simulated examples, along with an application to gene function prediction.

    Selection of the number of clusters via the bootstrap method

    No full text
    Here the problem of selecting the number of clusters in cluster analysis is considered. Recently, the concept of clustering stability, which measures the robustness of any given clustering algorithm, has been utilized in Wang (2010) for selecting the number of clusters through cross validation. In this manuscript, an estimation scheme for clustering instability is developed based on the bootstrap, and the number of clusters is then selected so that the corresponding estimated clustering instability is minimized. The effectiveness of the proposed selection criterion is demonstrated on simulations and real examples.
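The selection idea in this abstract can be sketched in a few lines. What follows is a minimal illustration under stated assumptions, not the authors' procedure: k-means is assumed as the base clustering algorithm, it is initialised with a farthest-first rule (a choice made here only to keep the sketch simple), and instability is measured as the fraction of point pairs on which two bootstrap clusterings disagree about co-membership. All function names are hypothetical.

```python
import random
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def farthest_first(points, k, rng):
    """Assumed initialisation: random first center, then repeatedly the point
    farthest from all centers chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, rng, n_iter=20):
    """Plain Lloyd iterations; returns the fitted centers."""
    centers = farthest_first(points, k, rng)
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # An empty cluster keeps its previous center.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers


def clustering_distance(labels_a, labels_b):
    """Fraction of point pairs grouped together by one labelling but not the other."""
    pairs = list(combinations(range(len(labels_a)), 2))
    disagree = sum(
        (labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return disagree / len(pairs)


def bootstrap_instability(points, k, n_pairs=10, seed=0):
    """Average distance between clusterings fitted on pairs of bootstrap samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        labelings = []
        for _ in range(2):
            sample = [rng.choice(points) for _ in points]  # resample with replacement
            centers = kmeans(sample, k, rng)
            # Compare both fits on the same original points.
            labelings.append([nearest(p, centers) for p in points])
        total += clustering_distance(*labelings)
    return total / n_pairs


def select_k(points, candidates, **kwargs):
    """Return the candidate k with the smallest estimated instability, plus all scores."""
    scores = {k: bootstrap_instability(points, k, **kwargs) for k in candidates}
    return min(scores, key=scores.get), scores
```

Calling `select_k(points, [2, 3, 4])` on a list of coordinate tuples returns the chosen candidate together with the instability score of every candidate. The candidate k = 1 is left out of such a search because a single cluster is trivially stable.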

    Regularized k-means clustering of high-dimensional data and its asymptotic consistency

    No full text
    K-means clustering is a widely used tool for cluster analysis due to its conceptual simplicity and computational efficiency. However, its performance can be distorted when clustering high-dimensional data where the number of variables becomes relatively large and many of them may contain no information about the clustering structure. This article proposes a high-dimensional cluster analysis method via regularized k-means clustering, which can simultaneously cluster similar observations and eliminate redundant variables. The key idea is to formulate the k-means clustering in a form of regularization, with an adaptive group lasso penalty term on cluster centers. In order to optimally balance the trade-off between the clustering model fitting and sparsity, a selection criterion based on clustering stability is developed. The asymptotic estimation and selection consistency of the regularized k-means clustering with diverging dimension is established. The effectiveness of the regularized k-means clustering is also demonstrated through a variety of numerical experiments as well as applications to two gene microarray examples. The regularized clustering framework can also be extended to the general model-based clustering.
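As a rough sketch of what such a formulation can look like, the k-means loss can be combined with a group penalty on each variable's coordinates across all cluster centers. The notation below ($x_i$, $A_k$, $c_k$, $c_{(j)}$, $w_j$, $\lambda$) is assumed here for illustration and is not taken from this listing; the exact objective in the article may differ.

```latex
\min_{c_1,\dots,c_K;\; A_1,\dots,A_K}\;
  \frac{1}{n}\sum_{k=1}^{K}\sum_{i \in A_k} \lVert x_i - c_k \rVert_2^2
  \;+\; \lambda \sum_{j=1}^{p} w_j \,\lVert c_{(j)} \rVert_2 ,
\qquad c_{(j)} = (c_{1j},\dots,c_{Kj})^{\top}
```

Here $A_k$ is the set of observations assigned to cluster $k$, $c_k$ is its center, and $c_{(j)}$ collects the $j$-th coordinate of all $K$ centers. Assuming the data are centered, a variable $j$ carries no clustering information when $c_{(j)} = 0$, so the group penalty removes whole variables at once; the weights $w_j$ make the penalty adaptive, and $\lambda$ is the tuning parameter that the abstract proposes to choose by clustering stability.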

    Backbone Degradable N-(2-Hydroxypropyl)methacrylamide Copolymer Conjugates with Gemcitabine and Paclitaxel: Impact of Molecular Weight on Activity toward Human Ovarian Carcinoma Xenografts

    No full text
    Degradable diblock and multiblock (tetrablock and hexablock) N-(2-hydroxypropyl)methacrylamide (HPMA) copolymer–gemcitabine (GEM) and -paclitaxel (PTX) conjugates were synthesized by reversible addition–fragmentation chain-transfer (RAFT) copolymerization followed by click reaction for preclinical investigation. The aim was to validate the hypothesis that long-circulating conjugates are needed to generate a sustained concentration gradient between vasculature and a solid tumor and result in a significant anticancer effect. To evaluate the impact of the molecular weight of the conjugates on treatment efficacy, diblock, tetrablock, and hexablock GEM and PTX conjugates were administered intravenously to nude mice bearing A2780 human ovarian xenografts. For GEM conjugates, three doses of 5 mg/kg were given on days 0, 7, and 14 (q7dx3), whereas a single-dose regimen of 20 mg/kg was applied on day 0 for PTX conjugate treatment. The most effective conjugates for each monotherapy were the diblock ones, 2P–GEM and 2P–PTX (Mw ≈ 100 kDa). Increasing the Mw to 200 or 300 kDa resulted in a decrease in activity, most probably due to changes in the conformation of the macromolecule caused by interaction of hydrophobic residues at side chain termini and formation of “unimer micelles”. In addition to monotherapy, a sequential combination treatment of diblock PTX conjugate followed by GEM conjugate (2P–PTX/2P–GEM) was also performed, which showed the best tumor growth inhibition due to a synergistic effect: complete remission was achieved after the first treatment cycle. However, because of the low dose applied, tumor recurrence was observed 2 weeks after cessation of treatment. To assess the optimal route of administration, intraperitoneal (i.p.) application of 2P–GEM, 2P–PTX, and their combination was examined. The fact that the highest anticancer efficiency was achieved with diblock conjugates that can be synthesized in one scalable step bodes well for translation into the clinic.