
152 research outputs found

Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation

Author: Azizyan Martin
Singh Aarti
Wasserman Larry
Publication date: 09/06/2013

While several papers have investigated computationally and statistically efficient methods for learning Gaussian mixtures, precise minimax bounds on their statistical performance, as well as fundamental limits in high-dimensional settings, are not well understood. In this paper, we provide precise information-theoretic bounds on the clustering accuracy and sample complexity of learning a mixture of two isotropic Gaussians in high dimensions under small mean separation. If there is a sparse subset of relevant dimensions that determine the mean separation, then the sample complexity depends only on the number of relevant dimensions and the mean separation, and can be achieved by a simple, computationally efficient procedure. Our results provide a first step toward a theoretical basis for recent methods that combine feature selection and clustering.
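To make "a simple computationally efficient procedure" concrete, here is a minimal sketch of one procedure of that kind, assuming isotropic noise with known variance (1.0 here) and equal mixing weights. For means at plus or minus mu/2, coordinate j has variance noise_var + mu_j**2 / 4, so only coordinates carrying mean separation show inflated variance. The sketch screens on that, then splits the points by the sign of the top principal component of the retained block. The function names, the `threshold` value and the synthetic data are illustrative assumptions, not necessarily the procedure analysed in the paper.

```python
import numpy as np

def select_features(X, noise_var=1.0, threshold=1.0):
    """Keep coordinates whose sample variance exceeds noise_var + threshold.

    Assumes isotropic noise with known variance: relevant coordinates have
    variance noise_var + mu_j**2 / 4, irrelevant ones have noise_var.
    """
    variances = X.var(axis=0)
    return np.flatnonzero(variances > noise_var + threshold)

def cluster_two_gaussians(X, noise_var=1.0, threshold=1.0):
    """Screen features, then split points by the sign of the top principal component."""
    selected = select_features(X, noise_var, threshold)
    Xs = X[:, selected] - X[:, selected].mean(axis=0)
    # The leading right singular vector is the top principal direction of the block.
    _, _, vt = np.linalg.svd(Xs, full_matrices=False)
    labels = (Xs @ vt[0] > 0).astype(int)
    return selected, labels

# Synthetic example: 50 dimensions, only the first 3 separate the two means.
rng = np.random.default_rng(0)
n, d = 500, 50
z = rng.integers(0, 2, size=n)
half_mu = np.zeros(d)
half_mu[:3] = 2.0
X = rng.standard_normal((n, d)) + np.where(z[:, None] == 1, half_mu, -half_mu)

selected, labels = cluster_two_gaussians(X)
# Cluster labels are defined only up to a swap, so score both matchings.
accuracy = max(np.mean(labels == z), np.mean(labels != z))
```

The screening step runs in O(nd) time and the SVD touches only the selected coordinates. This is the point of the abstract: the cost of the clustering step tracks the number of relevant dimensions rather than the ambient dimension.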

arXiv.org e-Print Archive

CiteSeerX

Feature Selection For High-Dimensional Clustering

Author: Azizyan Martin
Singh Aarti
Wasserman Larry
Publication date: 09/06/2014

We present a nonparametric method for selecting informative features in high-dimensional clustering problems. We start with a screening step that uses a test for multimodality. Then we apply kernel density estimation and mode clustering to the selected features. The output of the method consists of a list of relevant features and cluster assignments. We provide explicit bounds on the error rate of the resulting clustering. In addition, we provide the first error bounds on mode-based clustering. Comment: 11 pages, 2 figures
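Mode clustering assigns each point to the mode of the kernel density estimate that its gradient-ascent path reaches. A common way to compute this is Gaussian mean shift. The sketch below covers only that second stage, applied to features assumed to have already passed the screening step, which is not shown. The bandwidth `h`, the merge tolerance and the helper names are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def mean_shift_modes(X, h, max_iter=500, tol=1e-7):
    """Move every point uphill on a Gaussian KDE with bandwidth h until it stops."""
    modes = X.astype(float).copy()
    for _ in range(max_iter):
        sq_dist = ((modes[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        weights = np.exp(-sq_dist / (2.0 * h ** 2))
        # Mean-shift update: kernel-weighted average of the data points.
        shifted = weights @ X / weights.sum(axis=1, keepdims=True)
        converged = np.max(np.abs(shifted - modes)) < tol
        modes = shifted
        if converged:
            break
    return modes

def mode_clustering(X, h, merge_tol=None):
    """Label each point by the KDE mode its mean-shift trajectory converges to."""
    merge_tol = h / 10.0 if merge_tol is None else merge_tol
    modes = mean_shift_modes(X, h)
    centers, labels = [], np.empty(len(X), dtype=int)
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < merge_tol:  # same mode as an earlier point
                labels[i] = k
                break
        else:
            centers.append(m)  # a mode not seen before
            labels[i] = len(centers) - 1
    return np.array(centers), labels

# Two well-separated synthetic blobs in the (already selected) feature space.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)),
               rng.normal(5.0, 0.3, size=(50, 2))])
centers, labels = mode_clustering(X, h=0.5)
```

Mode clustering does not need the number of clusters in advance. It falls out of the number of KDE modes, which is why the bandwidth `h` is the main tuning choice.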

arXiv.org e-Print Archive

CiteSeerX

Efficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures

Author: Azizyan Martin
Singh Aarti
Wasserman Larry
Publication date: 09/06/2014

We consider the problem of clustering data points in high dimensions, i.e., when the number of data points may be much smaller than the number of dimensions. Specifically, we consider a Gaussian mixture model (GMM) with non-spherical Gaussian components, where the clusters are distinguished by only a few relevant dimensions. The method we propose combines a recent approach for learning the parameters of a Gaussian mixture model with sparse linear discriminant analysis (LDA). In addition to cluster assignments, the method returns an estimate of the set of features relevant for clustering. Our results indicate that the sample complexity of clustering depends on the sparsity of the relevant feature set, while scaling only logarithmically with the ambient dimension. Additionally, we require much milder assumptions than existing work on clustering in high dimensions. In particular, we require neither spherical clusters nor mean separation along relevant dimensions. Comment: 11 pages, 1 figure
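The sparse-LDA half of such a method can be illustrated with one simple convex formulation. The sketch minimises 0.5 b'Σb - b'δ + λ‖b‖₁ by coordinate descent, where δ = μ1 - μ2. Without the penalty this recovers the classical LDA direction Σ⁻¹δ; with it, many coordinates become exactly zero, and the nonzero coordinates serve as the estimated relevant feature set. The sketch assumes estimates of μ1, μ2 and a shared covariance Σ are already available from the parameter-learning step, which is not shown. This ℓ1-penalised objective and the helper names are illustrative choices, not necessarily the sparse-LDA variant used in the paper.

```python
import numpy as np

def soft_threshold(x, lam):
    """Shrink x toward zero by lam, returning exactly 0 when |x| <= lam."""
    return np.sign(x) * max(abs(x) - lam, 0.0)

def sparse_lda_direction(Sigma, delta, lam, max_iter=1000, tol=1e-10):
    """Coordinate descent for min_b 0.5 * b' Sigma b - b' delta + lam * ||b||_1."""
    p = len(delta)
    beta = np.zeros(p)
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Partial residual: delta_j minus the contribution of the other coordinates.
            r = delta[j] - Sigma[j] @ beta + Sigma[j, j] * beta[j]
            new = soft_threshold(r, lam) / Sigma[j, j]
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta

def classify(X, beta, mu1, mu2):
    """Return 1 where a row of X lies on mu1's side of the discriminant hyperplane."""
    scores = (X - (mu1 + mu2) / 2.0) @ beta
    return (scores > 0).astype(int)

# Toy parameters: identity covariance, separation only in the first coordinate.
Sigma = np.eye(5)
delta = np.array([2.0, 0.0, 0.0, 0.0, 0.0])
beta = sparse_lda_direction(Sigma, delta, lam=0.5)
relevant = np.flatnonzero(beta)  # estimated relevant-feature set
```

Note that the discriminant direction Σ⁻¹δ can be sparse even when δ itself is not, because Σ can mix coordinates. This is consistent with the abstract's point that mean separation along the relevant dimensions is not required.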

arXiv.org e-Print Archive

CiteSeerX

Density-sensitive semisupervised inference

Author: Azizyan Martin
Singh Aarti
Wasserman Larry
Publication venue: Institute of Mathematical Statistics
Publication date: 24/05/2013

Semisupervised methods are techniques for using labeled data $(X_1,Y_1),\ldots,(X_n,Y_n)$ together with unlabeled data $X_{n+1},\ldots,X_N$ to make predictions. These methods invoke some assumptions that link the marginal distribution $P_X$ of $X$ to the regression function $f(x)$. For example, it is common to assume that $f$ is very smooth over high-density regions of $P_X$. Many of the methods are ad hoc and have been shown to work in specific examples but lack a theoretical foundation. We provide a minimax framework for analyzing semisupervised methods. In particular, we study methods based on metrics that are sensitive to the distribution $P_X$. Our model includes a parameter $\alpha$ that controls the strength of the semisupervised assumption. We then use the data to adapt to $\alpha$. Comment: Published at http://dx.doi.org/10.1214/13-AOS1092 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
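The idea of a density-sensitive metric can be illustrated on a small graph. All points, labeled and unlabeled, form an r-neighbourhood graph. Each edge length is shrunk by a factor exp(-α p̂), where p̂ is a kernel density estimate at the edge midpoint, and distances are shortest paths. With α = 0 this reduces to plain graph distance. Larger α makes paths through high-density regions shorter, which corresponds to a stronger semisupervised assumption. A kernel regression over the labeled points then uses these distances. The exp(-α p̂) edge weighting, the graph construction and all parameter values are hypothetical choices made for illustration, not the metric defined in the paper.

```python
import numpy as np

def gaussian_kde(data, queries, h):
    """Gaussian kernel density estimate of `data`, evaluated at `queries`."""
    d = data.shape[1]
    sq = ((queries[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
    norm = (2.0 * np.pi * h ** 2) ** (d / 2.0)
    return np.exp(-sq / (2.0 * h ** 2)).mean(axis=1) / norm

def density_sensitive_distances(X, alpha, h, radius):
    """Shortest paths on an r-neighbourhood graph over labeled and unlabeled
    points; each edge is shrunk by exp(-alpha * density at its midpoint)."""
    n, d = X.shape
    eucl = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    midpoints = ((X[:, None, :] + X[None, :, :]) / 2.0).reshape(-1, d)
    dens = gaussian_kde(X, midpoints, h).reshape(n, n)
    D = np.where(eucl <= radius, eucl * np.exp(-alpha * dens), np.inf)
    np.fill_diagonal(D, 0.0)
    for k in range(n):  # Floyd-Warshall: allow paths through vertex k
        D = np.minimum(D, D[:, k:k + 1] + D[k:k + 1, :])
    return D

def kernel_regression(D, labeled_idx, y, bandwidth):
    """Estimate f at every point from labeled responses, weighted by distance D."""
    W = np.exp(-(D[:, labeled_idx] / bandwidth) ** 2)
    return W @ y / W.sum(axis=1)  # NaN for a point that reaches no labeled point

# Two 1-D clusters of mostly unlabeled points; one labeled point per cluster.
X = np.concatenate([np.linspace(0.0, 1.0, 10), np.linspace(5.0, 6.0, 10)])[:, None]
labeled_idx = np.array([0, 10])
y = np.array([0.0, 1.0])
D = density_sensitive_distances(X, alpha=2.0, h=0.3, radius=0.5)
f_hat = kernel_regression(D, labeled_idx, y, bandwidth=1.0)
```

Here the unlabeled points matter only through the graph and the density estimate. That is the sense in which the method links $P_X$ to $f$, and `alpha` plays the role of the strength parameter.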

arXiv.org e-Print Archive

Crossref

Mono-ubiquitination mediated regulation of KMT5A and its role in prostate cancer

Author: Azizyan Mahsa
Publication venue: Newcastle University
Publication date: 01/01/2017

PhD Thesis. Abstract: Prostate cancer (PC) is the most common cancer and the second leading cause of cancer-related death in men. Central to this is the role of the androgen receptor (AR), which acts as a transcription factor regulating the expression of genes required for normal prostate growth and cancer development. Consequently, the AR remains the primary target for therapeutic intervention. However, these treatments eventually become ineffective, resulting in castrate-resistant prostate cancer (CRPC), which generally retains AR expression. The AR interacts with several co-regulatory proteins that can perturb AR-targeted therapies in CRPC. Targeting these co-regulatory proteins to indirectly target the AR signalling cascade may prove beneficial. Recently, our group identified KMT5A as a potential regulator of the AR through selective siRNA library screening. KMT5A is a lysine methyltransferase that mono-methylates histone H4 lysine 20 and non-histone proteins, including p53. Using a relevant in vitro CRPC model, it was shown that KMT5A acquires AR co-activator activity, in contrast to androgen-sensitive models, where KMT5A co-represses AR activity. This highlights the importance of studying KMT5A regulation. KMT5A protein levels are tightly regulated by multiple E3 ligases that mediate cell cycle-dependent poly-ubiquitination and degradation. KMT5A poly-ubiquitination by the E3 ligases SCFβ-TrCP, CRL4Cdt2 and APCCdh1 promotes its degradation in the G1, S and late mitosis phases of the cell cycle, respectively. Moreover, the Skp2 E3 ligase has been suggested to play a role in KMT5A ubiquitination and degradation, but direct supporting evidence is currently lacking. Additionally, Skp2 is suggested to directly regulate the AR signalling pathway. It is also unknown whether KMT5A can be modified by ubiquitination without being targeted for degradation.
As such, we aimed to investigate KMT5A mono-ubiquitination and the role of Skp2 in regulating KMT5A, as well as in independently regulating the AR signalling cascade. Mono-ubiquitinated KMT5A was demonstrated in a panel of PC cell lines, and its existence was further confirmed by ubiquitination assays in COS7 cells. Furthermore, the C-terminal SET domain of KMT5A was identified as the target of mono-ubiquitination. Mono-ubiquitinated KMT5A was highly enriched in S-phase cells, coinciding with extremely low levels of unmodified KMT5A. Mono-ubiquitinated KMT5A was exclusively cytoplasmic, and its abundance was greatly enhanced by Skp2 but was not associated with protein turnover. Together, these data suggest that cell cycle-dependent KMT5A mono-ubiquitination is an important mechanism for diminishing nuclear, unmodified KMT5A levels to facilitate cell cycle progression. Thus, insight into the physiological significance of mono-ubiquitinated KMT5A may provide a novel therapeutic target to indirectly target the AR. Finally, Skp2 was not found to have a direct effect on AR signalling.

Newcastle University eTheses

Multifactorial Linear Regression Method For Prediction Of Mountain Rivers Flow

Author: Azizyan Levon
Sarukhanyan Arestak
Vardanyan Levon
Yeroyan Yelizaveta
Publication venue: CUNY Academic Works
Publication date: 01/08/2014

Long-term river flow forecasting methods use the river basin water balance equation as their theoretical basis. For basins whose rivers originate at high elevations, the water balance should be organized by altitude zones. Because the terms of the spring flood water balance equation are difficult or impossible to obtain, they are replaced by approximate correlation relationships between flow and the key factors. For multifactorial flow prediction, the multifactorial linear regression method is recommended. It assumes a linear relationship between the predicted quantity (predictand) and the variables (factors, or predictors). This method was used to calculate flow forecasts for a number of rivers in Armenia at 25 stations. The agreement between the experimental and calculated results is about 80%.
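A multifactorial linear regression of this kind fits flow = b0 + b1·x1 + … + bk·xk by ordinary least squares. Flow is the predictand and the xi are the predictor factors. The abstract does not list the factors used. In the minimal sketch below, the two predictors (for example, winter precipitation and snow reserve) and all data are synthetic, hypothetical placeholders, not the Armenian station data.

```python
import numpy as np

def fit_multifactor_regression(factors, flow):
    """Least-squares fit of flow = b0 + b1*x1 + ... + bk*xk; returns [b0, b1, ..., bk]."""
    design = np.column_stack([np.ones(len(flow)), factors])  # intercept column first
    coef, *_ = np.linalg.lstsq(design, flow, rcond=None)
    return coef

def predict_flow(coef, factors):
    """Forecast flow for one or more rows of predictor values."""
    return coef[0] + np.atleast_2d(factors) @ coef[1:]

# Synthetic, noise-free data with two hypothetical predictors.
rng = np.random.default_rng(2)
factors = rng.uniform(0.0, 100.0, size=(30, 2))
flow = 10.0 + 0.5 * factors[:, 0] + 2.0 * factors[:, 1]
coef = fit_multifactor_regression(factors, flow)
forecast = predict_flow(coef, [[4.0, 1.0]])
```

With real station records, the data would include noise, and the fitted coefficients would be judged by how well forecasts match observed flow on held-out years.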

City University of New York