Semi-Supervised Kernel PCA
We present three generalisations of Kernel Principal Components Analysis
(KPCA) which incorporate knowledge of the class labels of a subset of the data
points. The first, MV-KPCA, penalises within-class variances, similarly to Fisher
discriminant analysis. The second, LSKPCA, is a hybrid of least squares
regression and kernel PCA. The third, LR-KPCA, is an iteratively reweighted
version of LSKPCA which achieves a sigmoid loss function on the labeled
points. We provide a theoretical risk bound as well as illustrative experiments
on real and toy data sets.
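For orientation, the following is a minimal sketch of standard, fully unsupervised kernel PCA, the baseline that the three variants above generalise. It is not the authors' MV-KPCA, LSKPCA, or LR-KPCA and uses no class labels. The RBF kernel, the `gamma` parameter, and the function names `rbf_kernel` and `kernel_pca` are assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix; the kernel choice is an assumption."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def kernel_pca(X, n_components=2, gamma=1.0):
    """Plain kernel PCA on the training points (no labels involved).

    Returns the projections of the training points onto the leading
    components and the corresponding eigenvalues of the centred kernel.
    """
    n = X.shape[0]
    K = rbf_kernel(X, gamma)

    # Centre the kernel matrix in feature space.
    one_n = np.full((n, n), 1.0 / n)
    K_centred = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(K_centred)
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = eigvals[order]
    vecs = eigvecs[:, order]

    # For a unit-norm eigenvector v with eigenvalue lam, the projection
    # of the training points is sqrt(lam) * v.
    projections = vecs * np.sqrt(np.maximum(lam, 0.0))
    return projections, lam
```

A call such as `kernel_pca(X, n_components=2, gamma=0.5)` on an `(n, d)` array returns an `(n, 2)` array of projections. The semi-supervised variants described in the abstract would modify this objective using the labeled subset, which this sketch does not attempt.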
Theory and Applications for Advanced Text Mining
Thanks to the growth of computer and web technologies, we can easily collect and store large amounts of text data, and we can expect these data to contain useful knowledge. Since the late 1990s, text mining techniques have been studied intensively in order to extract that knowledge. Although many important techniques have been developed, the text mining research field continues to expand to meet the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques, ranging from relation extraction to techniques for under-resourced or less-resourced languages. I believe that this book will contribute new knowledge to the text mining field and help many readers open up new research fields.
Representation Learning for Words and Entities
This thesis presents new methods for unsupervised learning of distributed
representations of words and entities from text and knowledge bases. The first
algorithm presented in the thesis is a multi-view algorithm for learning
representations of words called Multiview Latent Semantic Analysis (MVLSA). By
incorporating up to 46 different types of co-occurrence statistics for the same
vocabulary of English words, I show that MVLSA outperforms other
state-of-the-art word embedding models. Next, I focus on learning entity
representations for search and recommendation and present the second method of
this thesis, Neural Variational Set Expansion (NVSE). NVSE is also an
unsupervised learning method, but it is based on the Variational Autoencoder
framework. Evaluations with human annotators show that NVSE can facilitate
better search and recommendation of information gathered from noisy, automatic
annotation of unstructured natural language corpora. Finally, I move from
unstructured data and focus on structured knowledge graphs. I present novel
approaches for learning embeddings of vertices and edges in a knowledge graph
that obey logical constraints.
Comment: PhD thesis, Machine Learning, Natural Language Processing,
Representation Learning, Knowledge Graphs, Entities, Word Embeddings, Entity
Embedding