116,772 research outputs found
The Importance of Being Clustered: Uncluttering the Trends of Statistics from 1970 to 2015
In this paper we retrace the recent history of statistics by analyzing all
the papers published in five prestigious statistical journals since 1970,
namely: Annals of Statistics, Biometrika, Journal of the American Statistical
Association, Journal of the Royal Statistical Society, series B and Statistical
Science. The aim is to construct a kind of "taxonomy" of the statistical papers
by organizing and by clustering them in main themes. In this sense being
identified in a cluster means being important enough to be uncluttered in the
vast and interconnected world of the statistical research. Since the main
statistical research topics naturally born, evolve or die during time, we will
also develop a dynamic clustering strategy, where a group in a time period is
allowed to migrate or to merge into different groups in the following one.
Results show that statistics is a very dynamic and evolving science, stimulated
by the rise of new research questions and types of data
Recruitment Market Trend Analysis with Sequential Latent Variable Models
Recruitment market analysis provides valuable understanding of
industry-specific economic growth and plays an important role for both
employers and job seekers. With the rapid development of online recruitment
services, massive recruitment data have been accumulated and enable a new
paradigm for recruitment market analysis. However, traditional methods for
recruitment market analysis largely rely on the knowledge of domain experts and
classic statistical models, which are usually too general to model large-scale
dynamic recruitment data, and have difficulties to capture the fine-grained
market trends. To this end, in this paper, we propose a new research paradigm
for recruitment market analysis by leveraging unsupervised learning techniques
for automatically discovering recruitment market trends based on large-scale
recruitment data. Specifically, we develop a novel sequential latent variable
model, named MTLVM, which is designed for capturing the sequential dependencies
of corporate recruitment states and is able to automatically learn the latent
recruitment topics within a Bayesian generative framework. In particular, to
capture the variability of recruitment topics over time, we design hierarchical
dirichlet processes for MTLVM. These processes allow to dynamically generate
the evolving recruitment topics. Finally, we implement a prototype system to
empirically evaluate our approach based on real-world recruitment data in
China. Indeed, by visualizing the results from MTLVM, we can successfully
reveal many interesting findings, such as the popularity of LBS related jobs
reached the peak in the 2nd half of 2014, and decreased in 2015.Comment: 11 pages, 30 figure, SIGKDD 201
Temporal Topic Analysis with Endogenous and Exogenous Processes
We consider the problem of modeling temporal textual data taking endogenous
and exogenous processes into account. Such text documents arise in real world
applications, including job advertisements and economic news articles, which
are influenced by the fluctuations of the general economy. We propose a
hierarchical Bayesian topic model which imposes a "group-correlated"
hierarchical structure on the evolution of topics over time incorporating both
processes, and show that this model can be estimated from Markov chain Monte
Carlo sampling methods. We further demonstrate that this model captures the
intrinsic relationships between the topic distribution and the time-dependent
factors, and compare its performance with latent Dirichlet allocation (LDA) and
two other related models. The model is applied to two collections of documents
to illustrate its empirical performance: online job advertisements from
DirectEmployers Association and journalists' postings on BusinessInsider.com
- …