152 research outputs found

    Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation

    While several papers have investigated computationally and statistically efficient methods for learning Gaussian mixtures, precise minimax bounds for their statistical performance as well as fundamental limits in high-dimensional settings are not well understood. In this paper, we provide precise information-theoretic bounds on the clustering accuracy and sample complexity of learning a mixture of two isotropic Gaussians in high dimensions under small mean separation. If there is a sparse subset of relevant dimensions that determine the mean separation, then the sample complexity depends only on the number of relevant dimensions and the mean separation, and can be achieved by a simple, computationally efficient procedure. Our results provide the first step of a theoretical basis for recent methods that combine feature selection and clustering.

    Feature Selection For High-Dimensional Clustering

    We present a nonparametric method for selecting informative features in high-dimensional clustering problems. We start with a screening step that uses a test for multimodality. Then we apply kernel density estimation and mode clustering to the selected features. The output of the method consists of a list of relevant features and cluster assignments. We provide explicit bounds on the error rate of the resulting clustering. In addition, we provide the first error bounds on mode-based clustering. Comment: 11 pages, 2 figures

    Efficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures

    We consider the problem of clustering data points in high dimensions, i.e., when the number of data points may be much smaller than the number of dimensions. Specifically, we consider a Gaussian mixture model (GMM) with non-spherical Gaussian components, where the clusters are distinguished by only a few relevant dimensions. The method we propose combines a recent approach for learning the parameters of a Gaussian mixture model with sparse linear discriminant analysis (LDA). In addition to cluster assignments, the method returns an estimate of the set of features relevant for clustering. Our results indicate that the sample complexity of clustering depends on the sparsity of the relevant feature set, while scaling only logarithmically with the ambient dimension. Additionally, we require much milder assumptions than existing work on clustering in high dimensions. In particular, we require neither spherical clusters nor mean separation along relevant dimensions. Comment: 11 pages, 1 figure

    Density-sensitive semisupervised inference

    Semisupervised methods are techniques for using labeled data $(X_1,Y_1),\ldots,(X_n,Y_n)$ together with unlabeled data $X_{n+1},\ldots,X_N$ to make predictions. These methods invoke some assumptions that link the marginal distribution $P_X$ of $X$ to the regression function $f(x)$. For example, it is common to assume that $f$ is very smooth over high-density regions of $P_X$. Many of the methods are ad hoc and have been shown to work in specific examples but lack a theoretical foundation. We provide a minimax framework for analyzing semisupervised methods. In particular, we study methods based on metrics that are sensitive to the distribution $P_X$. Our model includes a parameter $\alpha$ that controls the strength of the semisupervised assumption. We then use the data to adapt to $\alpha$. Comment: Published at http://dx.doi.org/10.1214/13-AOS1092 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Mono-ubiquitination mediated regulation of KMT5A and its role in prostate cancer

    PhD Thesis. Abstract: Prostate cancer (PC) is the most common cancer and the second leading cause of cancer-related death in men. Central to this is the role of the androgen receptor (AR), which acts as a transcription factor, regulating the expression of genes required for normal prostate growth and cancer development. Consequently, the AR remains the primary target for therapeutic intervention. However, these treatments become ineffective, resulting in castrate-resistant prostate cancer (CRPC), which generally retains AR expression. The AR interacts with several co-regulatory proteins which can perturb AR-targeted therapies in CRPC. Targeting these co-regulatory proteins to indirectly target the AR signalling cascade may prove beneficial. Recently, our group identified KMT5A as a potential regulator of AR through selective siRNA library screening. KMT5A is a lysine methyltransferase that mono-methylates histone 4 lysine 20 and non-histone proteins, including p53. Using a relevant in vitro CRPC model, it was shown that KMT5A acquires AR co-activator activity, in contrast to androgen-sensitive models, where KMT5A co-represses AR activity. This highlights the importance of studying KMT5A regulation. KMT5A protein levels are tightly regulated by multiple E3 ligases for cell cycle-dependent poly-ubiquitination-mediated degradation. KMT5A poly-ubiquitination by E3 ligases SCFβ-TRCF, CRL4Cdt2 and APCCdh1 promotes degradation in G1, S and late mitosis cell cycle phases, respectively. Moreover, the Skp2 E3 ligase has been suggested to play a role in KMT5A ubiquitination and degradation, but direct supporting evidence is currently absent. Additionally, Skp2 is suggested to directly regulate the AR signaling pathway. It is also unknown whether KMT5A could be modified directly by ubiquitination without promoting its degradation.
As such, we aimed to investigate KMT5A mono-ubiquitination and the role of Skp2 in regulating KMT5A as well as independently regulating the AR signaling cascade. Mono-ubiquitinated KMT5A was demonstrated in a panel of PC cell lines. Its existence was further confirmed by performing ubiquitination assays in COS7 cells. Furthermore, the KMT5A C-terminal SET domain was identified as the target for mono-ubiquitination. Moreover, mono-ubiquitinated KMT5A was highly enriched in S phase cells, coincident with extremely low levels of unmodified KMT5A. Mono-ubiquitinated KMT5A was exclusively cytoplasmic and its abundance was greatly enhanced by Skp2, but not associated with protein turnover. Together, these data suggest that cell cycle-dependent KMT5A mono-ubiquitination is an important mechanism to diminish nuclear, unmodified KMT5A levels to facilitate cell cycle progression. Thus, insight into the physiological significance of mono-ubiquitinated KMT5A may provide a novel therapeutic target to indirectly target the AR. Finally, Skp2 was not found to have a direct effect on AR signaling.

    Multifactorial Linear Regression Method For Prediction Of Mountain Rivers Flow

    Long-term river flow forecasting methods take the river basin water balance equation as their theoretical basis: for basins where rivers originate at high elevations, the water balance should be organized by elevation zone. Some terms of the spring flood water balance equation are difficult or impossible to measure, so they are replaced by the flow and by approximate correlation relationships between the key factors. For multifactorial flow prediction, the multifactorial linear regression method is recommended, which assumes a linear relationship between the predicted quantity (the predictand) and the variables (factors, or predictors). The method was used to calculate flow forecasts for a number of rivers of Armenia at 25 stations. The agreement between the experimental and calculated results is about 80%.