Search CORE

518 research outputs found

Investigating the relationship between language model perplexity and IR precision-recall measures

Author: Azzopardi L.
Girolami M.
Van Rijsbergen K.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2003

An empirical study investigates the relationship between the perplexity of an aspect-based language model and the information retrieval performance obtained with it. On the corpora considered, the perplexity of the language model has a systematic relationship with the achievable precision-recall performance, though this relationship is not statistically significant.
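The two quantities compared in this study can be made concrete with a minimal Python sketch. It uses an add-alpha smoothed unigram model as a stand-in for the aspect-based model. That is an assumption, since the abstract does not give the model's form. Perplexity is the exponential of the negative mean log-likelihood of held-out tokens, and precision and recall are computed per query on sets of document IDs. The function names are illustrative only.

```python
import math
from collections import Counter


def unigram_perplexity(train_tokens, test_tokens, alpha=1.0):
    """Perplexity of an add-alpha smoothed unigram model on held-out tokens.

    Stand-in for the aspect-based model (assumption). Lower perplexity means
    the model assigns higher probability to the held-out text. alpha must be
    > 0 if the test set contains words unseen in training.
    """
    counts = Counter(train_tokens)
    vocab = set(train_tokens) | set(test_tokens)
    total = sum(counts.values()) + alpha * len(vocab)
    log_likelihood = sum(math.log((counts[w] + alpha) / total) for w in test_tokens)
    return math.exp(-log_likelihood / len(test_tokens))


def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, a model trained on `["a", "b"]` assigns probability 0.5 to each word, so its perplexity on `["a", "b"]` is 2.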

CiteSeerX

Crossref

Enlighten

Entanglement distribution and quantum discord

Author: A Brodutch
A Fedrizzi
A Ferraro
A Kay
A Streltsov
A Uhlmann
A Winter
B Dakić
C Peuntinger
C Silberhorn
CE Vollmer
CH Bennett
D Bruß
D Girolami
E Chitambar
FF Fanchini
G Adesso
G Vidal
H Ollivier
J Ma
J Oppenheim
K Audenaert
K Modi
K Życzkowski
L Henderson
L Mišta Jr
M Ali
M Ali
M Gessner
M Horodecki
M Koashi
M Piani
M Zuppardo
MB Plenio
P Horodecki
PM Hayden
PW Shor
R Auccaise
R Horodecki
R Pal
RF Werner
S Luo
T Baumgratz
T-C Wei
TK Chuan
TR Bromley
TS Cubitt
V Vedral
Publication venue: Springer Science and Business Media LLC
Publication date: 17/10/2016

Establishing entanglement between distant parties is one of the most important problems of quantum technology, since long-distance entanglement is an essential part of fundamental tasks such as quantum cryptography and quantum teleportation. In this lecture we review basic properties of entanglement and quantum discord, and discuss recent results on entanglement distribution and the role of quantum discord therein. We also review entanglement distribution with separable states, and discuss important problems that remain open. One such open problem is whether indirect entanglement distribution can offer an advantage over direct distribution protocols.
Comment: 7 pages, 2 figures, contribution to "Lectures on general quantum correlations and their applications", edited by Felipe Fanchini, Diogo Soares-Pinto, and Gerardo Adesso

arXiv.org e-Print Archive

Crossref

Classification of protein interaction sentences via gaussian processes

Author: A. Aizerman
A.M. Cohen
C.D. Manning
C.E. Rasmussen
C.H. Ding
D.D. Lewis
E.M. Marcotte
H. Chen
J. Huang
J.C. Platt
J.D. Kim
J.H. Albert
K. Crammer
K. Sugiyama
K.M.A. Chai
M. Girolami
N. Lama
N. Lawrence
R. Bunescu
S. Rogers
S.S. Keerthi
Silva
T. Joachims
V. Vapnik
W. Chu
Y. Hao
Y. Lee
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

The increasing availability of protein interaction studies in textual format, coupled with the demand for easier access to their key results, has led to a need for text mining solutions. In the text processing pipeline, classification is a key step for extracting small sections of relevant text. Consequently, for the task of locating protein-protein interaction sentences, we examine Gaussian processes (GPs), a classifier that has rarely been applied to text. GPs are a non-parametric probabilistic analogue to the more popular support vector machines (SVMs). We find that GPs outperform SVM and naïve Bayes classifiers on binary sentence data, while showing equivalent performance on abstract and multiclass sentence corpora. In addition, the absence of a margin parameter, which requires costly tuning, together with the principled multiclass extensions enabled by the probabilistic framework, makes GPs an appealing alternative worthy of further adoption.

Semi-supervised prediction of protein interaction sentences exploiting semantically encoded metrics

Author: D.D. Lewis
E.M. Marcotte
J.D. Kim
K. Lund
L. Azzopardi
M. Girolami
M.N. Jones
R. Bunescu
S. Padó
S. Pyysalo
S. Rogers
T. Joachims
T.K. Landauer
Z. Minier
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

Protein-protein interaction (PPI) identification is an integral component of many biomedical research and database curation tools. Automating this task through classification is one of the key goals of text mining (TM). However, the labelled PPI corpora required to train classifiers are generally small. To overcome this sparsity in the training data, we propose a novel method of integrating corpora that do not contain relevance judgements. Our approach uses a semantic language model to gather word similarity from a large unlabelled corpus. This additional information is integrated into the sentence classification process using kernel transformations. It has a re-weighting effect on the training features that leads to an 8% improvement in F-score over the baseline results. Furthermore, we find that some words generally considered indicative of interactions are actually neutralised by this process.
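One common way to fold corpus-derived word similarity into a kernel classifier is a semantic smoothing kernel, k(x, y) = xᵀ S y. Here x and y are bag-of-words sentence vectors and S is a word-similarity matrix. The abstract does not specify the authors' semantic model or transformation. The sketch below is therefore a hypothetical illustration: it assumes S comes from cosine similarity between word context-count vectors gathered from an unlabelled corpus. The function names and inputs are invented for the example.

```python
import numpy as np


def word_similarity(context_counts):
    """Cosine similarity between rows of a word-by-context count matrix.

    context_counts: (V, C) array built from an unlabelled corpus (hypothetical
    input). Returns S = W W^T, where W holds the L2-normalised rows, so S is
    positive semi-definite. All-zero rows are left as zeros.
    """
    norms = np.linalg.norm(context_counts, axis=1, keepdims=True)
    W = context_counts / np.where(norms == 0, 1.0, norms)
    return W @ W.T


def semantic_kernel(X, S):
    """Gram matrix K = X S X^T for bag-of-words sentence vectors X of shape (n, V).

    Words that share contexts now contribute similarity even when two sentences
    have no word in common, which re-weights the original features.
    """
    return X @ S @ X.T
```

Because S = W Wᵀ, the Gram matrix stays positive semi-definite. It could therefore be passed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.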

Recommended from our members

Digital twin of an urban-integrated hydroponic farm

Author: Choudhary R
Girolami M
Jans-Singh M
Leeming K
Publication venue: Data-Centric Engineering
Publication date: 01/01/2020

This paper presents the development process of a digital twin of a unique hydroponic underground farm in London, Growing Underground (GU). Growing 12 times more per unit area than traditional greenhouse farming in the UK, the farm also consumes 4 times more energy per unit area. Key to the ongoing operational success of this farm and similar enterprises is finding ways to minimize energy use while maximizing crop growth by maintaining optimal growing conditions. The farm belongs to the class of Controlled Environment Agriculture, in which indoor environments are carefully controlled to maximize crop growth using artificial lighting and smart heating, ventilation, and air conditioning systems. We tracked changing environmental conditions and crop growth across 89 different variables through a wireless sensor network and unstructured manual records, and combined all the data into a database. We show how the digital twin can provide enhanced outputs for a bespoke site like GU by creating inferred data fields, and we show the limitations of data collection in a commercial environment. For example, we find that lighting is the dominant environmental factor for temperature, and thus for crop growth, in this farm, and that the effects of external temperature and ventilation are confounded. We combine information learned from interpreting historical data to create a bespoke temperature forecasting model (root mean squared error < 1.3°C), using a dynamic linear model with a data-centric lighting component. Finally, we present how the forecasting model can be integrated into the digital twin to provide feedback to the farmers for decision-making assistance.

Apollo (Cambridge)

An efficient and principled method for detecting communities in networks

Author: A. Gyenge
B. W. Kernighan
Brian Ball
Brian Karrer
C. Ding
D. E. Knuth
D. M. Blei
E. M. Airoldi
H. Zhang
J. Parkinnen
K. Henderson
L. A. Adamic
L. Backstrom
M. E. J. Newman
M. Girolami
T. Hofmann
W. W. Zachary
Publication venue: American Physical Society (APS)
Publication date: 18/04/2011

A fundamental problem in the analysis of network data is the detection of network communities, groups of densely interconnected nodes, which may be overlapping or disjoint. Here we describe a method for finding overlapping communities based on a principled statistical approach using generative network models. We show how the method can be implemented using a fast, closed-form expectation-maximization algorithm that allows us to analyze networks of millions of nodes in reasonable running times. We test the method both on real-world networks and on synthetic benchmarks and find that it gives results competitive with previous methods. We also show that the same approach can be used to extract nonoverlapping community divisions via a relaxation method, and demonstrate that the algorithm is competitively fast and accurate for the nonoverlapping problem.
Comment: 14 pages, 5 figures, 1 table
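A closed-form EM of this kind can be sketched as follows. The sketch assumes a Poisson model in which the expected number of edges between nodes i and j is Σ_z θ_iz θ_jz, where θ_iz is node i's propensity for community z. The E-step splits each edge among communities in proportion to θ_iz θ_jz. Maximizing the resulting bound then gives the closed-form update θ_iz = Σ_j A_ij q_ij(z) / sqrt(Σ_ij A_ij q_ij(z)). This dense NumPy version needs O(n²K) memory and is only an illustration, not the authors' code. An implementation meant for millions of nodes would loop over the edge list instead.

```python
import numpy as np


def overlapping_communities_em(A, K, n_iter=200, seed=0):
    """EM sketch for a Poisson edge model with E[A_ij] = sum_z theta_iz * theta_jz.

    A: symmetric (n, n) adjacency matrix with zero diagonal.
    K: number of communities.
    Runs a fixed number of iterations (no convergence test) and returns theta of
    shape (n, K).
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    theta = rng.random((n, K)) + 0.1  # strictly positive random start
    for _ in range(n_iter):
        # E-step: q[i, j, z] = theta_iz theta_jz / sum_z' theta_iz' theta_jz'
        pair = theta[:, None, :] * theta[None, :, :]  # (n, n, K)
        q = pair / np.maximum(pair.sum(axis=2, keepdims=True), 1e-300)
        # M-step: theta_iz = sum_j A_ij q_ijz / sqrt(sum_ij A_ij q_ijz)
        weighted = A[:, :, None] * q  # edge mass assigned to each community
        num = weighted.sum(axis=1)  # (n, K)
        denom = np.sqrt(np.maximum(weighted.sum(axis=(0, 1)), 1e-300))  # (K,)
        theta = num / denom
    return theta
```

Larger entries of `theta[i]` indicate a stronger affinity of node i for a community. A node with substantial weight in several columns sits in overlapping communities. Because EM only finds a local optimum, running from several seeds and keeping the best log-likelihood is a reasonable precaution.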

arXiv.org e-Print Archive

Crossref

Large-Scale Distributed Bayesian Matrix Factorization using Stochastic Gradient MCMC

Author: Adams R.
Ahn S.
Bardenet R.
Bennett J.
Bezanson J.
Chen T.
Ding N.
Dror G.
Girolami M.
Hall K. B.
Korattikara A.
Mann G.
McDonald R.
Mnih A.
Neal R.
Patterson S.
Porteous I.
Rossky P.
Welling M.
Zinkevich M.
Publication venue
Publication date: 01/01/2015

Despite attractive qualities such as high prediction accuracy and the ability to quantify uncertainty and avoid over-fitting, Bayesian matrix factorization has not been widely adopted because of the prohibitive cost of inference. In this paper, we propose a scalable distributed Bayesian matrix factorization algorithm using stochastic gradient MCMC. Our algorithm, based on Distributed Stochastic Gradient Langevin Dynamics, not only matches the prediction accuracy of standard MCMC methods such as Gibbs sampling, but is also as fast and simple as stochastic gradient descent. In our experiments, we show that our algorithm reaches the same level of prediction accuracy as Gibbs sampling an order of magnitude faster. We also show that our method reduces the prediction error as fast as distributed stochastic gradient descent, achieving a 4.1% improvement in RMSE on the Netflix dataset and a 1.8% improvement on the Yahoo Music dataset.
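Below is a single-machine sketch of stochastic gradient Langevin dynamics (SGLD) for Bayesian matrix factorization. It is not the distributed algorithm from the paper, since it does not partition the rating matrix across workers. The Gaussian likelihood and priors are assumptions chosen for illustration, and so are the hyperparameters `tau`, `lam` and `eps`. Each step takes half a step of size `eps` along a minibatch estimate of the log-posterior gradient. The likelihood part of that estimate is rescaled by N / batch. The step then adds Gaussian noise with variance `eps`.

```python
import numpy as np


def sgld_matrix_factorization(ratings, n_users, n_items, rank=2, eps=1e-3,
                              tau=1.0, lam=1.0, batch=32, n_steps=2000, seed=0):
    """Single-machine SGLD sketch for Bayesian matrix factorization.

    ratings: list of (user, item, value) triples. Assumed model:
      r_ij ~ N(u_i . v_j, 1/tau),  u_i ~ N(0, I/lam),  v_j ~ N(0, I/lam).
    Returns the (U, V) samples collected after discarding the first half of
    the steps as burn-in.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(ratings, dtype=float)
    users, items, values = data[:, 0].astype(int), data[:, 1].astype(int), data[:, 2]
    N = len(values)
    U = 0.1 * rng.standard_normal((n_users, rank))
    V = 0.1 * rng.standard_normal((n_items, rank))
    samples = []
    for t in range(n_steps):
        idx = rng.choice(N, size=min(batch, N), replace=False)
        u, v, r = users[idx], items[idx], values[idx]
        err = r - np.sum(U[u] * V[v], axis=1)  # residuals on the minibatch
        scale = N / len(idx)  # rescale the minibatch likelihood to the full data
        grad_U = -lam * U  # gradient of the Gaussian log prior
        grad_V = -lam * V
        # Accumulate likelihood gradients; np.add.at handles repeated indices.
        np.add.at(grad_U, u, scale * tau * err[:, None] * V[v])
        np.add.at(grad_V, v, scale * tau * err[:, None] * U[u])
        # Langevin step: half a step along the gradient plus N(0, eps) noise.
        U = U + 0.5 * eps * grad_U + np.sqrt(eps) * rng.standard_normal(U.shape)
        V = V + 0.5 * eps * grad_V + np.sqrt(eps) * rng.standard_normal(V.shape)
        if t >= n_steps // 2:
            samples.append((U.copy(), V.copy()))
    return samples
```

Averaging `U @ V.T` over the returned samples gives a posterior-mean prediction. The spread of those samples supplies the uncertainty estimate that plain stochastic gradient descent does not provide.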

arXiv.org e-Print Archive

Crossref

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Comparing families of dynamic causal models

Mathematical models of scientific data can be formally compared using Bayesian model evidence. Previous applications in the biological sciences have mainly focussed on model selection, in which one first selects the model with the highest evidence and then makes inferences based on the parameters of that model. This "best model" approach is very useful but can become brittle if there is a large number of models to compare, or if different subjects use different models. To overcome this shortcoming, we propose combining two further approaches: (i) family-level inference and (ii) Bayesian model averaging within families. Family-level inference removes uncertainty about aspects of model structure other than the characteristic of interest. For example: What are the inputs to the system? Is processing serial or parallel? Is it linear or nonlinear? Is it mediated by a single, crucial connection? We apply Bayesian model averaging within families to provide inferences about parameters that are independent of further assumptions about model structure. We illustrate the methods using Dynamic Causal Models of brain imaging data.
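Family-level inference and within-family averaging can be sketched in a few lines. The sketch assumes the log model evidences have already been computed; here they are plain inputs. The prior gives each family equal mass, split evenly over its members, so large families are not favoured. That prior is an assumption of this sketch. A family's posterior is the sum of its models' posteriors. Within-family Bayesian model averaging weights each model's parameter estimate by its renormalised posterior. This is a fixed-effects illustration that treats all data as coming from one subject. It does not handle the case, mentioned above, where different subjects use different models.

```python
import numpy as np


def family_posteriors(log_evidence, families):
    """Posterior probability of each model family from log model evidences.

    log_evidence: (M,) values of ln p(y | m).
    families: length-M list of family labels.
    Prior (assumption): equal mass per family, split evenly over its models.
    Returns (dict mapping family -> posterior, (M,) array of model posteriors).
    """
    log_evidence = np.asarray(log_evidence, dtype=float)
    labels = sorted(set(families))
    sizes = {f: families.count(f) for f in labels}
    log_prior = np.array([-np.log(len(labels) * sizes[f]) for f in families])
    log_joint = log_evidence + log_prior
    log_joint -= log_joint.max()  # subtract the max for numerical stability
    post = np.exp(log_joint)
    post /= post.sum()
    fam = {f: float(post[np.array([g == f for g in families])].sum()) for f in labels}
    return fam, post


def bma_within_family(param_means, model_post, families, family):
    """Bayesian model average of one parameter over the models in a family."""
    mask = np.array([g == family for g in families])
    weights = model_post[mask] / model_post[mask].sum()
    return float(np.dot(weights, np.asarray(param_means, dtype=float)[mask]))
```

With equal evidence for two "serial" models and one "parallel" model, each family receives posterior probability 0.5. Averaging within the serial family then weights its two models equally.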

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

UCL Discovery

PubMed Central

ZORA

King's Research Portal

University of East Anglia digital repository

Using differential reinforcement of high rates of behavior to improve work productivity: a replication and extension

Author: Drucker P.
Girolami K. M.
Hemmes N. S.
Horner R. H.
Lanovaz M. J.
Lysaght R.
Martin G. L.
Rojahn J.
Siberski J.
Publication venue: Wiley
Publication date: 30/04/2019

Background: Due to deficits in adaptive and cognitive functioning, productivity may pose challenges for individuals with intellectual disability in the workplace.
Method: Using a changing-criterion design embedded in a multiple-baseline-across-participants design, we examined the effects of differential reinforcement of high rates of behaviour (DRH) on the rate of data entry (i.e., productivity) in four adults with intellectual disability.
Results: Although the DRH procedure increased the rate of correct data entry in all four participants, none of the participants achieved the criterion that we set with novice undergraduate students.
Conclusions: Our results indicate that DRH is an effective intervention to increase the rate of correct responding in individuals with intellectual disability, but that achieving the same productivity as workers without disability may not always be possible.

Crossref

Dépôt Institutionnel Numérique

Infinite factorization of multiple non-parametric views

Author: A. Gelman
A. Klami
A. Rodriguez
A. Vinokourov
Arto Klami
C. Archambeau
C. Rasmussen
D. Blackwell
D. Blei
D. Cohn
D. Lee
D. M. Blei
D. M. Roy
G. Englebienne
I. Rivals
I. S. Dhillon
Janne Sinkkonen
K. Barnard
M. Welling
Mark Girolami
N. Friedman
N. L. Johnson
R. M. Neal
S. Becker
S. Rogers
Samuel Kaski
Simon Rogers
T. Hofmann
Y. W. Teh
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

Combined analysis of multiple data sources is of increasing interest in applications, in particular for distinguishing shared and source-specific aspects. We extend this rationale of classical canonical correlation analysis into a flexible, generative and non-parametric clustering setting by introducing a novel non-parametric hierarchical mixture model. The lower level of the model describes each source with a flexible non-parametric mixture, and the top level combines these to describe the commonalities of the sources. The lower-level clusters arise from hierarchical Dirichlet processes, inducing an infinite-dimensional contingency table between the views. The commonalities between the sources are modeled by an infinite block model of the contingency table. This block model can be interpreted as a non-negative factorization of infinite matrices, or as a prior for infinite contingency tables. With Gaussian mixture components plugged in for continuous measurements, the model is applied to two views of genes, mRNA expression and abundance of the produced proteins, to expose groups of genes that are co-regulated in either or both of the views. Cluster analysis of co-expression is a standard, simple way of screening for co-regulation, and the two-view analysis extends the approach to distinguishing between pre- and post-translational regulation.

CUED - Cambridge University Engineering Department