135 research outputs found

Time-varying Learning and Content Analytics via Sparse Factor Analysis

Author: Bishop C. M.
Butler A. C.
Hastie T.
Jazwinski A. H.
Kasiviswanathan S. P.
Minka T. P.
Rasmussen C. E.
Thai-Nghe N.
Wan E. A.
Yu H.
Publication date: 19/12/2013

We propose SPARFA-Trace, a new machine learning-based framework for time-varying learning and content analytics for education applications. We develop a novel message passing-based, blind, approximate Kalman filter for sparse factor analysis (SPARFA) that jointly (i) traces learner concept knowledge over time, (ii) analyzes learner concept knowledge state transitions (induced by interacting with learning resources, such as textbook sections and lecture videos, or by the forgetting effect), and (iii) estimates the content organization and intrinsic difficulty of the assessment questions. These quantities are estimated solely from binary-valued (correct/incorrect) graded learner response data and a summary of the specific actions each learner performs (e.g., answering a question or studying a learning resource) at each time instance. Experimental results on two online course datasets demonstrate that SPARFA-Trace is capable of tracing each learner's concept knowledge evolution over time, as well as analyzing the quality and content organization of learning resources, the question-concept associations, and the question intrinsic difficulties. Moreover, we show that SPARFA-Trace achieves comparable or better performance in predicting unobserved learner responses than existing collaborative filtering and knowledge tracing approaches for personalized education.
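The tracing idea in this abstract can be illustrated with a textbook scalar Kalman filter. This is only a sketch under simplifying assumptions, not the SPARFA-Trace algorithm: it uses a linear-Gaussian observation instead of binary responses, a single concept, and known parameters. The names `Belief`, `predict`, `update`, `gain`, `shift`, `loading`, and the noise variances are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Belief:
    """Gaussian belief about one learner's knowledge of one concept."""
    mean: float
    var: float


def predict(b: Belief, gain: float = 1.0, shift: float = 0.0,
            process_var: float = 0.0) -> Belief:
    """State transition, e.g. studying a resource (shift > 0) or forgetting (gain < 1)."""
    return Belief(gain * b.mean + shift, gain * gain * b.var + process_var)


def update(b: Belief, y: float, loading: float = 1.0,
           obs_var: float = 1.0) -> Belief:
    """Standard Kalman measurement update for y = loading * state + noise."""
    innovation_var = loading * loading * b.var + obs_var
    k = b.var * loading / innovation_var          # Kalman gain
    return Belief(b.mean + k * (y - loading * b.mean),
                  (1.0 - k * loading) * b.var)
```

Alternating `update` (after a graded response) and `predict` (after a study action) yields a running estimate of the knowledge state together with its uncertainty.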

arXiv.org e-Print Archive

CiteSeerX

Crossref

Model-based machine learning

Author: Bishop CM
Dangauthier P
Elo AE
Frey BJ
Herbrich R
Jelinek F
Koller D
Lauritzen SL
Manning CD
Minka T
Pearl J
Shotton J
Wiegerinck W
Winn J
Publication venue: 'The Royal Society'
Publication date: 01/01/2013

Several decades of research in the field of machine learning have resulted in a multitude of different algorithms for solving a broad range of problems. To tackle a new application, a researcher typically tries to map their problem onto one of these existing methods, often influenced by their familiarity with specific algorithms and by the availability of corresponding software implementations. In this study, we describe an alternative methodology for applying machine learning, in which a bespoke solution is formulated for each new application. The solution is expressed through a compact modelling language, and the corresponding custom machine learning code is then generated automatically. This model-based approach offers several major advantages, including the opportunity to create highly tailored models for specific scenarios, as well as rapid prototyping and comparison of a range of alternative models. Furthermore, newcomers to the field of machine learning do not have to learn about the huge range of traditional methods, but instead can focus their attention on understanding a single modelling environment. In this study, we show how probabilistic graphical models, coupled with efficient inference algorithms, provide a very flexible foundation for model-based machine learning, and we outline a large-scale commercial application of this framework involving tens of millions of users. We also describe the concept of probabilistic programming as a powerful software environment for model-based machine learning, and we discuss a specific probabilistic programming language called Infer.NET, which has been widely used in practical applications.

Crossref

PubMed Central

Edinburgh Research Explorer

Content-Based Image Retrieval Using Self-Organizing Maps

Author: P. Koikkalainen
S.-F. Chang
T. Honkela
T. P. Minka
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Noise and nonlinearities in high-throughput data

Author: Bagnoli F
Franco Bagnoli
Koukolíková-Nicola Z
Minka T
Nguyen V-A Nicola-Koulikova Z Bagnoli F Lió P Ho Tu Bao Zhou Zhi-Hua
Pietro Lió
Rajan J J
Viet-Anh Nguyen
Zdena Koukolíková-Nicola
Publication venue: 'IOP Publishing'
Publication date: 05/01/2010

High-throughput data analyses are becoming common in biology, communications, economics and sociology. The vast amounts of data are usually represented in the form of matrices and can be considered as knowledge networks. Spectra-based approaches have proved useful in extracting hidden information within such networks and for estimating missing data, but these methods are based essentially on linear assumptions. The physical models of matching, when applicable, often suggest non-linear mechanisms, which may sometimes be identified as noise. The use of non-linear models in data analysis, however, may require the introduction of many parameters, which lowers the statistical weight of the model. Depending on the quality of the data, a simpler linear analysis may be more convenient than more complex approaches. In this paper, we show how a simple non-parametric Bayesian model may be used to explore the role of non-linearities and noise in synthetic and experimental data sets.

arXiv.org e-Print Archive

Crossref

A Geometric Variational Approach to Bayesian Inference

Author: Abhijoy Saha
Barber D.
Bauer M.
Bhattacharyya A
Bishop C. M
Broderick T.
Chen T.
Ghahramani Z.
Hernández-Lobato J.
Hoffman M.
Hoffman M. D.
Jaakkola T.
Karthik Bharath
Kass R. E.
Kingma D. P.
Kucukelbir A.
Lang S
Li Y.
Minka T. P
Rao C. R
Rezende D.
Rényi A
Saul L. K.
Sebastian Kurtek
Sigillito V. G.
Srivastava A.
Tan L. S
Wang C.
Yeung D.
Publication date: 27/03/2019

We propose a novel Riemannian geometric framework for variational inference in Bayesian models based on the nonparametric Fisher-Rao metric on the manifold of probability density functions. Under the square-root density representation, the manifold can be identified with the positive orthant of the unit hypersphere in L2, and the Fisher-Rao metric reduces to the standard L2 metric. Exploiting such a Riemannian structure, we formulate the task of approximating the posterior distribution as a variational problem on the hypersphere based on the alpha-divergence. This provides a tighter lower bound on the marginal distribution when compared to, and a corresponding upper bound unavailable with, approaches based on the Kullback-Leibler divergence. We propose a novel gradient-based algorithm for the variational problem based on Fréchet derivative operators motivated by the geometry of the Hilbert sphere, and examine its properties. Through simulations and real-data applications, we demonstrate the utility of the proposed geometric framework and algorithm on several Bayesian models.
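The square-root representation mentioned in this abstract can be written out as follows. This is a short sketch of the standard construction, with the symbols $p$, $\psi$, and $d_{FR}$ chosen here for illustration; it does not reproduce the paper's alpha-divergence objective.

```latex
% Square-root density: every density p maps to a unit-norm, non-negative function.
\psi(x) = \sqrt{p(x)}, \qquad
\|\psi\|_{L^2}^2 = \int \psi(x)^2 \, dx = \int p(x)\, dx = 1 .

% On the unit sphere the geodesic (Fisher-Rao) distance is the arc length
% between the two square-root densities:
d_{FR}(p_1, p_2)
  = \arccos \langle \psi_1, \psi_2 \rangle_{L^2}
  = \arccos \int \sqrt{p_1(x)\, p_2(x)} \, dx .
```

Because $\psi \ge 0$, the inner product lies in $[0, 1]$, so the distance is at most $\pi/2$, which is consistent with the densities occupying only the positive orthant of the sphere.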

arXiv.org e-Print Archive

Crossref

Repository@Nottingham

FigShare

Learning a Factor Model via Regularized PCA

Author: A. A. Amini
A. D’Aspremont
B. Laurent
Benjamin Van Roy
C. M. Bishop
D. B. Rubin
D. Paul
E. J. Candès
G. Pison
H. Akaike
H. H. Harman
H. M. Markowitz
H. Xu
I. M. Johnstone
I. T. Jolliffe
J. Baik
J. Friedman
K. Hirose
M. E. Tipping
M. Pourahmadi
M. Yuan
O. Banerjee
P. Ravikumar
S. Boyd
T. P. Minka
V. Chandrasekaran
Yi-Hao Kao
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

We consider the problem of learning a linear factor model. We propose a regularized form of principal component analysis (PCA) and demonstrate through experiments with synthetic and real data the superiority of resulting estimates to those produced by pre-existing factor analysis approaches. We also establish theoretical results that explain how our algorithm corrects the biases induced by conventional approaches. An important feature of our algorithm is that its computational requirements are similar to those of PCA, which enjoys wide use in large part due to its efficiency.

arXiv.org e-Print Archive

CiteSeerX

Crossref

On the Use of Upper Trust Bounds in Constrained Bayesian Optimization Infill Criterion

Author: Hernández-Lobato J. M.
Kandasamy K.
Krige D. G.
Minka T. P.
Picheny V.
Srinivas N.
Wang Z.
Publication venue: 'American Institute of Aeronautics and Astronautics (AIAA)'
Publication date: 01/01/2019

To handle constrained optimization problems with a large number of design variables, a new approach is proposed to address constraints in a surrogate-based optimization framework. This approach focuses on sequential enrichment using adaptive surrogate models based on Bayesian optimization and Gaussian process models. A constraint criterion using the uncertainty estimation of the Gaussian process models is introduced. Different evolutions of the algorithm, based on the accuracy of the constraint surrogate models, are used for selecting the infill sample points. The resulting algorithm has been tested on the well-known modified Branin optimization problem.
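One way to read "a constraint criterion using the uncertainty estimation of the Gaussian process models" is sketched below. Everything here is an assumption made for illustration: the constraint convention g(x) <= 0, the optimistic rule mean - k * std <= 0, the default k, and the helper names `optimistically_feasible` and `select_infill`. The surrogate predictions are passed in as plain numbers rather than produced by a real Gaussian process library.

```python
import math
from typing import List, Optional, Tuple


def optimistically_feasible(mean: float, std: float, k: float = 3.0) -> bool:
    """Assumed rule: accept a point if the constraint g(x) <= 0 could still hold
    within k predictive standard deviations of the surrogate mean."""
    return mean - k * std <= 0.0


def select_infill(candidates: List[Tuple[float, float, float]],
                  k: float = 3.0) -> Optional[int]:
    """Pick the candidate with the highest acquisition value among those that
    pass the optimistic feasibility test.

    Each candidate is (acquisition, constraint_mean, constraint_std).
    Returns the index of the chosen candidate, or None if none qualifies.
    """
    best_index, best_value = None, -math.inf
    for i, (acq, mean, std) in enumerate(candidates):
        if optimistically_feasible(mean, std, k) and acq > best_value:
            best_index, best_value = i, acq
    return best_index
```

With k = 0 the rule falls back to testing the surrogate mean alone; a larger k keeps uncertain regions in play while the constraint model is still inaccurate.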

Crossref

Open Archive Toulouse Archive Ouverte

PolyPublie

Piecewise Approximate Bayesian Computation: fast inference for discretely observed Markov models using a factorised posterior distribution

Author: A. Golightly
B.W. Silverman
C. Andrieu
D. Moriña
D.J. Wilkinson
E. Gabriel
E. McKenzie
E.B. Sudderth
G.B. Durham
G.J. Székely
J.C. Cox
J.K. Pritchard
J.M. Marin
K. Fukunaga
K.V. Mardia
M.A. Al-Osh
M.A. Beaumont
M.G.B. Blum
N.G. Van Kampen
P. Fearnhead
P. Marjoram
P. Neal
P. Wand
R.D. Wilkinson
R.J. Boys
S. P. Preston
S. R. White
T. Kypraios
T. McKinley
T. Toni
T.P. Minka
Y. Aït-Sahalia
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Many modern statistical applications involve inference for complicated stochastic models for which the likelihood function is difficult or even impossible to calculate, and hence conventional likelihood-based inferential techniques cannot be used. In such settings, Bayesian inference can be performed using Approximate Bayesian Computation (ABC). However, in spite of many recent developments to ABC methodology, in many applications the computational cost of ABC necessitates the choice of summary statistics and tolerances that can potentially severely bias the estimate of the posterior. We propose a new “piecewise” ABC approach suitable for discretely observed Markov models that involves writing the posterior density of the parameters as a product of factors, each a function of only a subset of the data, and then using ABC within each factor. The approach has the advantage of side-stepping the need to choose a summary statistic and it enables a stringent tolerance to be set, making the posterior “less approximate”. We investigate two methods for estimating the posterior density based on ABC samples for each of the factors: the first is to use a Gaussian approximation for each factor, and the second is to use a kernel density estimate. Both methods have their merits. The Gaussian approximation is simple, fast, and probably adequate for many applications. On the other hand, using instead a kernel density estimate has the benefit of consistently estimating the true piecewise ABC posterior as the number of ABC samples tends to infinity. We illustrate the piecewise ABC approach with four examples; in each case, the approach offers fast and accurate inference.
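The Gaussian-approximation variant described in this abstract can be sketched for a scalar parameter as follows. The sketch assumes that ABC draws for each factor are already available and that the prior is flat, so the prior-power correction that arises when each factor contains its own copy of the prior is a constant and is ignored. The function names `fit_gaussian` and `combine_factors` are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_gaussian(samples: Sequence[float]) -> Tuple[float, float]:
    """Summarise the ABC draws for one factor by their mean and sample variance."""
    return statistics.mean(samples), statistics.variance(samples)


def combine_factors(factors: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Multiply scalar Gaussian factors given as (mean, variance) pairs.

    A product of Gaussian densities is proportional to a Gaussian whose
    precision is the sum of the factor precisions and whose mean is the
    precision-weighted average of the factor means.
    """
    precision = sum(1.0 / var for _, var in factors)
    mean = sum(m / var for m, var in factors) / precision
    return mean, 1.0 / precision
```

For example, `combine_factors([fit_gaussian(draws) for draws in per_factor_draws])` would give the mean and variance of the combined approximation, where `per_factor_draws` is a hypothetical list holding one list of accepted ABC samples per factor.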

Nottingham ePrints

Nottingham eTheses

Crossref

Repository@Nottingham

Springer - Publisher Connector

PubMed Central

Tabular: A Schema-driven Probabilistic Programming Language

Author: Bachrach Y.
Bishop C. M.
Domingos P.
Getoor L.
Goodman N.
Grosse R.
Herbrich R.
Izbicki M.
Koller D.
McCallum A.
Minka T.
Pfeffer A.
Shafto P.
Wingate D.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2013

We propose a new kind of probabilistic programming language for machine learning. We write programs simply by annotating existing relational schemas with probabilistic model expressions. We describe a detailed design of our language, Tabular, complete with formal semantics and type system. A rich series of examples illustrates the expressiveness of Tabular. We report an implementation, and show evidence of the succinctness of our notation relative to current best practice. Finally, we describe and verify a transformation of Tabular schemas so as to predict missing values in a concrete database. The ability to query for missing values provides a uniform interface to a wide variety of tasks, including classification, clustering, recommendation, and ranking.

CiteSeerX

Crossref

Publikationer från Uppsala Universitet

Edinburgh Research Explorer

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Distinguishing Asthma Phenotypes Using Machine Learning Approaches.

Author: A Custovic
A Fraser
A Høst
A Pickles
A Simpson
A Wijga
Adnan Custovic
AJ Lowe
AV Berg
B Clarisse
BD Spycher
BG Toelle
BL Jones
C-M Chen
CA Figueiredo
CE Kuehni
CJ Lodge
CL Storr
D Barber
D Belgrave
D Caudri
D Nagin
DA Linzer
DC Belgrave
DCM Belgrave
F Kauffmann
FD Martinez
FL Garden
FP Perera
G Bochenek
G Weinmayr
GB Marks
GP Anderson
J Hagenaars
J Henderson
J Lotvall
J Magidson
J Sunyer
J Winn
JA Smith
JK Vermunt
K Burnham
KE van Wonderen
KL Nylund
L García-Marcos Álvarez
L Hunt
L Lowe
L Panico
LA Lowe
M Depner
M Herr
M Scott
Magnus Rattray
Mattia Prosperi
MJ Ege
ML Barreto
MM Hagendorens
MW Pijnenburg
N Lazic
NC Nicolaou
NG Papadopoulos
OE Savenije
P Burney
P Haldar
P Rzehak
PD Sly
Q Chen
Q Vuong
Rebecca Howard
RJP van der Valk
RL Bergmann
RL Miller
RO Crapo
RT Stein
American Thoracic Society
S Havstad
S Mihrshahi
S Rabe-Hesketh
S Stanojevic
SE Wenzel
SK Weiland
ST Lanza
T Jung
T Minka
The European Community Respiratory Health Survey
V Siroux
WC Moore
X Robin
Y Lo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Asthma is not a single disease, but an umbrella term for a number of distinct diseases, each of which is caused by a distinct underlying pathophysiological mechanism. These discrete disease entities are often labelled as asthma endotypes. The discovery of different asthma subtypes has moved from subjective approaches in which putative phenotypes are assigned by experts to data-driven ones which incorporate machine learning. This review focuses on the methodological developments of one such machine learning technique, latent class analysis, and how it has contributed to distinguishing asthma and wheezing subtypes in childhood. It also gives a clinical perspective, presenting the findings of studies from the past 5 years that used this approach. The identification of true asthma endotypes may be a crucial step towards understanding their distinct pathophysiological mechanisms, which could ultimately lead to more precise prevention strategies, identification of novel therapeutic targets and the development of effective personalized therapies.
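For readers unfamiliar with latent class analysis, the basic model can be written as below. This sketch assumes binary indicators (for example, wheeze present or absent at each follow-up) and $K$ classes; the symbols $\pi_k$, $\theta_{jk}$, and $y_{ij}$ are notation chosen here, not taken from the review.

```latex
% Mixture likelihood for child i with J binary indicators y_{i1}, ..., y_{iJ},
% assumed conditionally independent given the latent class:
P(\mathbf{y}_i)
  = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{J}
    \theta_{jk}^{\,y_{ij}} \, (1 - \theta_{jk})^{1 - y_{ij}},
  \qquad \sum_{k=1}^{K} \pi_k = 1 .

% Posterior probability that child i belongs to class k (Bayes' rule):
P(c_i = k \mid \mathbf{y}_i)
  = \frac{\pi_k \prod_{j} \theta_{jk}^{\,y_{ij}} (1 - \theta_{jk})^{1 - y_{ij}}}
         {\sum_{l=1}^{K} \pi_l \prod_{j} \theta_{jl}^{\,y_{ij}} (1 - \theta_{jl})^{1 - y_{ij}}} .
```

Here $\pi_k$ is the class prevalence and $\theta_{jk}$ is the probability of a positive indicator $j$ within class $k$; assigning each child to the class with the highest posterior probability gives the data-driven subtypes.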

Crossref

Springer - Publisher Connector

PubMed Central

Spiral - Imperial College Digital Repository

The University of Manchester - Institutional Repository