Learning without labels and nonnegative tensor factorization
Supervised learning tasks, such as building a classifier or estimating the error rate of
predictors, are typically performed with labeled data. In most cases, obtaining labeled data
is costly as it requires manual labeling. On the other hand, unlabeled data is available in
abundance. In this thesis, we discuss methods to perform supervised learning tasks with
no labeled data. We prove the consistency of the proposed methods and demonstrate their applicability
with synthetic and real-world experiments. In some cases, small quantities of labeled data may be easily available and can be supplemented with large quantities of unlabeled data (semi-supervised learning). We derive the asymptotic efficiency of generative models for semi-supervised learning and quantify the effect of labeled and unlabeled data on the quality of the estimate. Another independent track of the thesis is efficient computational methods for nonnegative tensor factorization (NTF). NTF provides the user with rich modeling capabilities, but it comes with an added computational cost. We provide a fast algorithm for performing NTF using a modified active set method called the block principal pivoting method and demonstrate its applicability to social network analysis and text mining.
M.S.
Committee Chair: Lebanon, Guy; Committee Co-Chair: Park, Haesun; Committee Member: Gray, Alexande
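To make the NTF problem concrete, the following is a minimal sketch of a nonnegative CP factorization of a 3-way tensor. It uses simple multiplicative updates, not the block principal pivoting method described in the thesis; that substitution is an assumption made purely for brevity. The function names (`khatri_rao`, `ntf_multiplicative`, `reconstruct`) and all parameters are hypothetical.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product; result has shape (J*K, R)."""
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def reconstruct(A, B, C):
    """Rebuild the 3-way tensor from its three factor matrices."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def ntf_multiplicative(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Nonnegative CP factorization via multiplicative updates.

    NOTE: a stand-in solver for illustration, not the thesis's
    block principal pivoting algorithm.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.random((I, rank))
    B = rng.random((J, rank))
    C = rng.random((K, rank))
    # Mode unfoldings whose column order matches khatri_rao above.
    X1 = X.reshape(I, J * K)
    X2 = np.transpose(X, (1, 0, 2)).reshape(J, I * K)
    X3 = np.transpose(X, (2, 0, 1)).reshape(K, I * J)
    for _ in range(n_iter):
        M = khatri_rao(B, C)
        A *= (X1 @ M) / (A @ (M.T @ M) + eps)
        M = khatri_rao(A, C)
        B *= (X2 @ M) / (B @ (M.T @ M) + eps)
        M = khatri_rao(A, B)
        C *= (X3 @ M) / (C @ (M.T @ M) + eps)
    return A, B, C
```

Each update solves one factor's nonnegative least-squares subproblem only approximately; an active-set-style method such as the one named in the abstract would instead solve each subproblem exactly.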
Linguistic Geometries for Unsupervised Dimensionality Reduction
Text documents are complex high dimensional objects. To effectively visualize
such data it is important to reduce its dimensionality and visualize the low
dimensional embedding as a 2-D or 3-D scatter plot. In this paper we explore
dimensionality reduction methods that draw upon domain knowledge in order to
achieve a better low dimensional embedding and visualization of documents. We
consider the use of geometries specified manually by an expert, geometries
derived automatically from corpus statistics, and geometries computed from
linguistic resources.
Comment: 13 pages, 15 figure
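As a rough illustration of how a geometry could shape a low dimensional embedding, the sketch below assumes the geometry is encoded as a linear map `T` on term-frequency vectors, so that the distance between documents x and y is ||T(x - y)||. This encoding, the function name `embed_with_geometry`, and the use of PCA for the final projection are all assumptions of this sketch; the abstract does not specify them.

```python
import numpy as np

def embed_with_geometry(X, T, dim=2):
    """Project documents to `dim` dimensions under a geometry T.

    X   : (n_docs, n_terms) term-frequency matrix.
    T   : (m, n_terms) hypothetical linear map encoding the geometry,
          e.g. built by an expert or from corpus statistics.
    dim : target dimensionality (2 or 3 for a scatter plot).
    """
    Z = X @ T.T                      # documents in the transformed space
    Zc = Z - Z.mean(axis=0)          # center before PCA
    # Principal directions from the SVD of the centered data.
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:dim].T
```

With `T` set to the identity matrix this reduces to ordinary PCA on the raw term vectors, which gives a baseline against which a domain-informed `T` can be compared.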