Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation
While several papers have investigated computationally and statistically
efficient methods for learning Gaussian mixtures, precise minimax bounds for
their statistical performance as well as fundamental limits in high-dimensional
settings are not well-understood. In this paper, we provide precise information
theoretic bounds on the clustering accuracy and sample complexity of learning a
mixture of two isotropic Gaussians in high dimensions under small mean
separation. If there is a sparse subset of relevant dimensions that determine
the mean separation, then the sample complexity only depends on the number of
relevant dimensions and mean separation, and can be achieved by a simple
computationally efficient procedure. Our results provide the first step of a
theoretical basis for recent methods that combine feature selection and
clustering.
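The abstract above does not spell out its "simple computationally efficient procedure", so the following is only an illustrative sketch of the general idea of combining feature selection with clustering for two isotropic Gaussians whose means differ on a few coordinates. The variance-based screening rule, the power-iteration projection, and every function name here are assumptions made for illustration, not the procedure analyzed in the paper.

```python
import random


def column_variances(X):
    """Sample variance of each coordinate (column) of X."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [sum((row[j] - means[j]) ** 2 for row in X) / n for j in range(d)]


def select_features(X, s):
    """Keep the s coordinates with the largest sample variance.

    Assumption: a coordinate on which the two means differ has inflated
    variance, because it mixes two shifted unit-variance Gaussians.
    """
    var = column_variances(X)
    return sorted(sorted(range(len(var)), key=lambda j: -var[j])[:s])


def cluster_by_projection(X, features, iters=50):
    """Label points by the sign of their projection onto the leading
    principal direction of the selected coordinates (power iteration)."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in features]
    Z = [[row[j] - m for j, m in zip(features, means)] for row in X]
    # Starting vector of ones; a random start would be safer in general.
    v = [1.0] * len(features)
    for _ in range(iters):
        proj = [sum(a * b for a, b in zip(z, v)) for z in Z]
        v = [sum(p * z[k] for p, z in zip(proj, Z)) for k in range(len(features))]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return [1 if sum(a * b for a, b in zip(z, v)) > 0 else 0 for z in Z]


def sample_mixture(n, d, relevant, shift, seed=0):
    """Synthetic data: two unit-variance Gaussians whose means are
    +shift and -shift on the `relevant` coordinates and 0 elsewhere."""
    rng = random.Random(seed)
    X, labels = [], []
    for _ in range(n):
        y = rng.randrange(2)
        sign = 1.0 if y == 1 else -1.0
        X.append([rng.gauss(0.0, 1.0) + (sign * shift if j in relevant else 0.0)
                  for j in range(d)])
        labels.append(y)
    return X, labels
```

On synthetic data with a large separation, the screening step should recover the relevant coordinates and the projection step should split the two components, up to a swap of the two labels.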
Feature Selection For High-Dimensional Clustering
We present a nonparametric method for selecting informative features in
high-dimensional clustering problems. We start with a screening step that uses
a test for multimodality. Then we apply kernel density estimation and mode
clustering to the selected features. The output of the method consists of a
list of relevant features, and cluster assignments. We provide explicit bounds
on the error rate of the resulting clustering. In addition, we provide the
first error bounds on mode-based clustering. Comment: 11 pages, 2 figures
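As a rough illustration of the mode clustering step mentioned in the abstract above, the sketch below runs mean shift on a Gaussian kernel density estimate in one dimension and groups points by the mode they climb to. It leaves out the multimodality screening step entirely, and the function names, the bandwidth handling, and the merging tolerance are all assumptions made for this example rather than the authors' implementation.

```python
import math


def mean_shift_1d(points, bandwidth, iters=200, tol=1e-8):
    """Move each point uphill on a Gaussian kernel density estimate."""
    modes = []
    for x in points:
        for _ in range(iters):
            # Gaussian kernel weights of all data points around x.
            w = [math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points]
            new_x = sum(wi * p for wi, p in zip(w, points)) / sum(w)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)
    return modes


def mode_clusters(points, bandwidth):
    """Label each point by the density mode it converges to."""
    modes = mean_shift_1d(points, bandwidth)
    centers, labels = [], []
    for m in modes:
        for k, c in enumerate(centers):
            # Assumed tolerance: endpoints within half a bandwidth are one mode.
            if abs(m - c) < bandwidth / 2:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels
```

For two well-separated clumps, for example `mode_clusters([0.0, 0.1, 0.2, -0.1, 5.0, 5.1, 4.9, 5.2], 0.5)`, the first four points share one label and the last four share another.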
Efficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures
We consider the problem of clustering data points in high dimensions, i.e.
when the number of data points may be much smaller than the number of
dimensions. Specifically, we consider a Gaussian mixture model (GMM) with
non-spherical Gaussian components, where the clusters are distinguished by only
a few relevant dimensions. The method we propose is a combination of a recent
approach for learning parameters of a Gaussian mixture model and sparse linear
discriminant analysis (LDA). In addition to cluster assignments, the method
returns an estimate of the set of features relevant for clustering. Our results
indicate that the sample complexity of clustering depends on the sparsity of
the relevant feature set, while only scaling logarithmically with the ambient
dimension. Additionally, we require much milder assumptions than existing work
on clustering in high dimensions. In particular, we do not require spherical
clusters nor necessitate mean separation along relevant dimensions. Comment: 11 pages, 1 figure
Density-sensitive semisupervised inference
Semisupervised methods are techniques for using labeled data together with
unlabeled data to make predictions. These methods invoke some assumptions that
link the marginal distribution of X to the regression function f(x). For
example, it is common to assume that f is very smooth over high-density regions
of the marginal distribution of X. Many of the methods are ad hoc and have been
shown to work in specific examples but lack a theoretical foundation. We
provide a minimax framework for analyzing semisupervised methods. In
particular, we study methods based on metrics that are sensitive to the
distribution of X. Our model includes a parameter that controls the strength of
the semisupervised assumption. We then use the data to adapt to this parameter.
Comment: Published at http://dx.doi.org/10.1214/13-AOS1092 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
Mono-ubiquitination mediated regulation of KMT5A and its role in prostate cancer
PhD Thesis. Abstract: Prostate cancer (PC) is the most common cancer and the second leading cause of cancer-related death in men. Central to this is the role of the androgen receptor (AR), which acts as a transcription factor, regulating the expression of genes required for normal prostate growth and cancer development. Consequently, the AR remains the primary target for therapeutic intervention. However, these treatments become ineffective, resulting in castrate resistant prostate cancer (CRPC), which generally retains AR expression. The AR interacts with several co-regulatory proteins which can perturb AR-targeted therapies in CRPC. Targeting these co-regulatory proteins to indirectly target the AR signalling cascade may prove beneficial. Recently, our group identified KMT5A as a potential regulator of AR through selective siRNA library screening.
KMT5A is a lysine methyltransferase that mono-methylates histone 4 lysine 20 and non-histone proteins, including p53. Using a relevant in vitro CRPC model it was shown that KMT5A acquires AR co-activator activity which is in contrast to androgen sensitive models where KMT5A co-represses AR activity. This highlights the importance of studying KMT5A regulation. KMT5A protein levels are tightly regulated by multiple E3 ligases for cell cycle-dependent poly-ubiquitination-mediated degradation. KMT5A poly-ubiquitination by E3 ligases SCFβ-TRCF, CRL4Cdt2 and APCCdh1 promotes degradation in G1, S and late mitosis cell cycle phases, respectively. Moreover, the Skp2 E3 ligase has been suggested to play a role in KMT5A ubiquitination and degradation but direct supporting evidence is currently absent. Additionally, Skp2 is suggested to directly regulate the AR signaling pathway. It is also unknown whether KMT5A could be modified directly by ubiquitination without promoting its degradation. As such, we aimed to investigate KMT5A mono-ubiquitination and the role of Skp2 in regulating KMT5A as well as independently regulating the AR signaling cascade.
Mono-ubiquitinated KMT5A was demonstrated in a panel of PC cell lines. Its existence was further confirmed by performing ubiquitination assays in COS7 cells. Furthermore, the KMT5A C-terminal SET domain was identified as the target for mono-ubiquitination. Moreover, mono-ubiquitinated KMT5A was highly enriched in S phase cells, coincident with extremely low levels of unmodified KMT5A. Mono-ubiquitinated KMT5A was exclusively cytoplasmic and its abundance was greatly enhanced by Skp2, but not associated with protein turnover. Together, these data suggest that cell cycle-dependent KMT5A mono-ubiquitination is an important mechanism to diminish nuclear, unmodified KMT5A levels to facilitate cell cycle progression. Thus, insight into the physiological significance of mono-ubiquitinated KMT5A may provide a novel therapeutic target to indirectly target the AR. Finally, Skp2 was not found to have a direct effect on AR signaling.
Multifactorial Linear Regression Method For Prediction Of Mountain Rivers Flow
Long-term river flow forecasting methods take the river basin water balance equation as their theoretical basis; for basins where the river originates at high elevations, the water balance should be organized by elevation zones. The components of the spring flood water balance equation are difficult or impossible to measure, so they are replaced by approximate correlation relationships between the flow and its key factors. For multifactorial flow prediction, the multifactorial linear regression method is recommended, which assumes a linear relationship between the predicted quantity (the predictand) and the variables (the factors, or predictors). This method was used to calculate the flow of a number of rivers of Armenia, with forecasts for 25 stations. The agreement between the experimental and calculated results is about 80%.
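A minimal sketch of multifactorial (multiple) linear regression, in which a predictand is modeled as an intercept plus a weighted sum of several predictors, might look as follows. The solver below uses the normal equations with Gaussian elimination; the function names are hypothetical, and the abstract does not say which factors or fitting procedure the authors used.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept, via the normal equations.

    X is a list of rows of predictor values, y the observed predictand.
    Returns [intercept, b1, b2, ...].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    # Augmented system [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(M[r][c]))
        if abs(M[pivot][c]) < 1e-12:
            raise ValueError("predictors are collinear; the fit is not unique")
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(coeffs, factors):
    """Predicted value for one set of predictor values."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], factors))
```

With synthetic data generated exactly from y = 2 + 3*x1 - x2, the fit recovers those three coefficients up to rounding error, and `predict` applies them to a new pair of factor values.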