2,223,492 research outputs found
Learning Determinantal Point Processes
Determinantal point processes (DPPs), which arise in random matrix theory and
quantum physics, are natural models for subset selection problems where
diversity is preferred. Among many remarkable properties, DPPs offer tractable
algorithms for exact inference, including computing marginal probabilities and
sampling; however, an important open question has been how to learn a DPP from
labeled training data. In this paper we propose a natural feature-based
parameterization of conditional DPPs, and show how it leads to a convex and
efficient learning formulation. We analyze the relationship between our model
and binary Markov random fields with repulsive potentials, which are
qualitatively similar but computationally intractable. Finally, we apply our
approach to the task of extractive summarization, where the goal is to choose a
small subset of sentences conveying the most important information from a set
of documents. In this task there is a fundamental tradeoff between sentences
that are highly relevant to the collection as a whole, and sentences that are
diverse and not repetitive. Our parameterization allows us to naturally balance
these two characteristics. We evaluate our system on data from the DUC 2003/04
multi-document summarization task, achieving state-of-the-art results
Analyzing collaborative learning processes automatically
In this article we describe the emerging area of text classification research focused on the problem of collaborative learning process analysis both from a broad perspective and more specifically in terms of a publicly available tool set called TagHelper tools. Analyzing the variety of pedagogically valuable facets of learners’ interactions is a time consuming and effortful process. Improving automated analyses of such highly valued processes of collaborative learning by adapting and applying recent text classification technologies would make it a less arduous task to obtain insights from corpus data. This endeavor also holds the potential for enabling substantially improved on-line instruction both by providing teachers and facilitators with reports about the groups they are moderating and by triggering context sensitive collaborative learning support on an as-needed basis. In this article, we report on an interdisciplinary research project, which has been investigating the effectiveness of applying text classification technology to a large CSCL corpus that has been analyzed by human coders using a theory-based multidimensional coding scheme. We report promising results and include an in-depth discussion of important issues such as reliability, validity, and efficiency that should be considered when deciding on the appropriateness of adopting a new technology such as TagHelper tools. One major technical contribution of this work is a demonstration that an important piece of the work towards making text classification technology effective for this purpose is designing and building linguistic pattern detectors, otherwise known as features, that can be extracted reliably from texts and that have high predictive power for the categories of discourse actions that the CSCL community is interested in
Optimality of Poisson processes intensity learning with Gaussian processes
In this paper we provide theoretical support for the so-called "Sigmoidal
Gaussian Cox Process" approach to learning the intensity of an inhomogeneous
Poisson process on a -dimensional domain. This method was proposed by Adams,
Murray and MacKay (ICML, 2009), who developed a tractable computational
approach and showed in simulation and real data experiments that it can work
quite satisfactorily. The results presented in the present paper provide
theoretical underpinning of the method. In particular, we show how to tune the
priors on the hyper parameters of the model in order for the procedure to
automatically adapt to the degree of smoothness of the unknown intensity and to
achieve optimal convergence rates
- …