50 research outputs found
Climbing the Tower of Babel: Unsupervised Multilingual Learning
For centuries, scholars have explored the deep
links among human languages. In this paper,
we present a class of probabilistic models
that use these links as a form of naturally
occurring supervision. These models allow
us to substantially improve performance for
core text processing tasks, such as morphological
segmentation, part-of-speech tagging,
and syntactic parsing. Besides these traditional
NLP tasks, we also present a multilingual
model for the computational decipherment
of lost languages.
Modelling the Lexicon in Unsupervised Part of Speech Induction
Automatically inducing the syntactic part-of-speech categories for words in
text is a fundamental task in Computational Linguistics. While the performance
of unsupervised tagging models has been slowly improving, current
state-of-the-art systems make the obviously incorrect assumption that all
tokens of a given word type must share a single part-of-speech tag. This
one-tag-per-type heuristic counters the tendency of Hidden Markov Model-based
taggers to over-generate tags for a given word type. However, it is clearly
incompatible with basic syntactic theory. In this paper we extend a
state-of-the-art Pitman-Yor Hidden Markov Model tagger with an explicit model
of the lexicon. In doing so we are able to incorporate a soft bias towards
inducing few tags per type. We develop a particle filter for drawing samples
from the posterior of our model and present empirical results that show that
our model is competitive with and faster than the state-of-the-art without
making any unrealistic restrictions.
Comment: To be presented at the 14th Conference of the European Chapter of the Association for Computational Linguistics.
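The token-level tagging that the abstract contrasts with the one-tag-per-type restriction can be sketched with a minimal collapsed Gibbs sampler for a bigram HMM with symmetric Dirichlet priors. This is a generic toy stand-in, not the paper's Pitman-Yor model or its particle filter; all names and hyperparameters here are illustrative, and the predictive ignores the small correction needed when the sampled tag also appears in the adjacent transition.

```python
import random
from collections import defaultdict

def gibbs_hmm_tagger(sentences, num_tags=3, alpha=0.1, beta=0.1, iters=50, seed=0):
    """Token-level unsupervised tagging: collapsed Gibbs sampling for a
    bigram HMM with symmetric Dirichlet priors (a toy sketch, not the
    Pitman-Yor HMM of the abstract). Each token keeps its own tag, so
    different tokens of one word type may receive different tags."""
    rng = random.Random(seed)
    vocab = {w for s in sentences for w in s}
    V = len(vocab)
    # Random initial tag for every token (no one-tag-per-type restriction).
    tags = [[rng.randrange(num_tags) for _ in s] for s in sentences]
    trans = defaultdict(int)   # (prev_tag, tag) counts; -1 marks sentence start
    emit = defaultdict(int)    # (tag, word) counts
    tag_tot = defaultdict(int)
    for s, ts in zip(sentences, tags):
        prev = -1
        for w, t in zip(s, ts):
            trans[(prev, t)] += 1
            emit[(t, w)] += 1
            tag_tot[t] += 1
            prev = t
    prev_tot = defaultdict(int)
    for (p, t), c in trans.items():
        prev_tot[p] += c
    for _ in range(iters):
        for i, s in enumerate(sentences):
            for j, w in enumerate(s):
                t_old = tags[i][j]
                prev = tags[i][j - 1] if j > 0 else -1
                nxt = tags[i][j + 1] if j + 1 < len(s) else None
                # Remove this token's counts from the model.
                trans[(prev, t_old)] -= 1; prev_tot[prev] -= 1
                emit[(t_old, w)] -= 1; tag_tot[t_old] -= 1
                if nxt is not None:
                    trans[(t_old, nxt)] -= 1; prev_tot[t_old] -= 1
                # Posterior predictive weight for each candidate tag.
                weights = []
                for t in range(num_tags):
                    p = (trans[(prev, t)] + alpha) / (prev_tot[prev] + alpha * num_tags)
                    p *= (emit[(t, w)] + beta) / (tag_tot[t] + beta * V)
                    if nxt is not None:
                        p *= (trans[(t, nxt)] + alpha) / (prev_tot[t] + alpha * num_tags)
                    weights.append(p)
                t_new = rng.choices(range(num_tags), weights=weights)[0]
                tags[i][j] = t_new
                # Re-add counts for the newly sampled tag.
                trans[(prev, t_new)] += 1; prev_tot[prev] += 1
                emit[(t_new, w)] += 1; tag_tot[t_new] += 1
                if nxt is not None:
                    trans[(t_new, nxt)] += 1; prev_tot[t_new] += 1
    return tags
```

Replacing this token-by-token Gibbs sweep with a particle filter, as the paper does, trades the sweep's sequential count updates for a set of weighted hypotheses propagated left to right.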
The Importance of Category Labels in Grammar Induction with Child-directed Utterances
Recent work has shown that grammar induction is possible without explicit
assumptions of language-specific knowledge. However,
evaluation of induced grammars usually has ignored phrasal labels, an essential
part of a grammar. Experiments in this work using a labeled evaluation metric,
RH, show that linguistically motivated predictions about grammar sparsity and
use of categories can only be revealed through labeled evaluation. Furthermore,
depth-bounding as an implementation of human memory constraints in grammar
inducers is still effective with labeled evaluation on multilingual transcribed
child-directed utterances.
Comment: The 16th International Conference on Parsing Technologies (IWPT 2020).
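The abstract's point that labeled and unlabeled evaluation can disagree is easy to illustrate. The sketch below computes plain labeled vs. unlabeled bracket F1 (not the RH metric the paper uses); the tuple tree encoding is a hypothetical format chosen for the example.

```python
def brackets(tree, start=0):
    """Collect (label, start, end) spans from a tree encoded as nested
    (label, child, ...) tuples with string leaves (hypothetical format)."""
    if isinstance(tree, str):
        return start + 1, set()        # a leaf covers one word
    label, *children = tree
    spans = set()
    pos = start
    for child in children:
        pos, child_spans = brackets(child, pos)
        spans |= child_spans
    spans.add((label, start, pos))
    return pos, spans

def f1(gold, pred, labeled=True):
    """Bracket F1; with labeled=False the category labels are dropped."""
    g = {s if labeled else s[1:] for s in gold}
    p = {s if labeled else s[1:] for s in pred}
    tp = len(g & p)
    prec = tp / len(p) if p else 0.0
    rec = tp / len(g) if g else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

An induced grammar that recovers every span but assigns arbitrary category labels scores a perfect unlabeled F1 yet a labeled F1 of zero, which is why labeled evaluation can reveal sparsity and category-use effects that unlabeled evaluation hides.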
Recruitment Market Trend Analysis with Sequential Latent Variable Models
Recruitment market analysis provides valuable understanding of
industry-specific economic growth and plays an important role for both
employers and job seekers. With the rapid development of online recruitment
services, massive recruitment data have been accumulated and enable a new
paradigm for recruitment market analysis. However, traditional methods for
recruitment market analysis largely rely on the knowledge of domain experts and
classic statistical models, which are usually too general to model large-scale
dynamic recruitment data and have difficulty capturing the fine-grained
market trends. To this end, in this paper, we propose a new research paradigm
for recruitment market analysis by leveraging unsupervised learning techniques
for automatically discovering recruitment market trends based on large-scale
recruitment data. Specifically, we develop a novel sequential latent variable
model, named MTLVM, which is designed for capturing the sequential dependencies
of corporate recruitment states and is able to automatically learn the latent
recruitment topics within a Bayesian generative framework. In particular, to
capture the variability of recruitment topics over time, we design hierarchical
Dirichlet processes for MTLVM. These processes allow us to dynamically generate
the evolving recruitment topics. Finally, we implement a prototype system to
empirically evaluate our approach based on real-world recruitment data in
China. By visualizing the results from MTLVM, we successfully reveal many
interesting findings; for example, the popularity of LBS-related jobs peaked
in the second half of 2014 and declined in 2015.
Comment: 11 pages, 30 figures, SIGKDD 201