Distributional Sentence Entailment Using Density Matrices
The categorical compositional distributional model of Coecke et al. (2010) suggests a way to combine the grammatical composition of formal, type-logical models with the corpus-based, empirical word representations of distributional semantics. This paper contributes to the project by extending the model to also capture entailment relations. This is achieved by generalizing the representations of words from points in meaning space to density operators, which are probability distributions on the subspaces of the space. A symmetric measure of similarity and an asymmetric measure of entailment are defined, where lexical entailment is measured using quantum relative entropy, the density-matrix variant of Kullback-Leibler divergence. Lexical entailment, combined with the composition map on word representations, provides a method to obtain entailment relations at the level of sentences. Truth-theoretic and corpus-based examples are provided.
Comment: 11 pages
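The asymmetric measure here is quantum relative entropy between density matrices. The following is a minimal numerical sketch, not the paper's code; the toy context vectors, uniform mixing weights, and smoothing constant eps are assumptions:

```python
# Minimal sketch of entailment scoring with density matrices.
# Word representations, mixing weights, and eps are illustrative assumptions.
import numpy as np
from scipy.linalg import logm

def density_matrix(context_vectors, weights=None):
    """Mix normalized context vectors into a density operator (PSD, trace 1)."""
    vs = [v / np.linalg.norm(v) for v in context_vectors]
    w = np.ones(len(vs)) / len(vs) if weights is None else np.asarray(weights)
    rho = sum(wi * np.outer(v, v) for wi, v in zip(w, vs))
    return rho / np.trace(rho)

def quantum_relative_entropy(rho, sigma, eps=1e-9):
    """S(rho || sigma) = Tr[rho (log rho - log sigma)]; eps avoids log of zero."""
    d = rho.shape[0]
    sigma = (1 - eps) * sigma + eps * np.eye(d) / d   # smooth to full rank
    rho   = (1 - eps) * rho   + eps * np.eye(d) / d
    return float(np.trace(rho @ (logm(rho) - logm(sigma))).real)

# Intuition: "dog" entails "animal" if dog's contexts sit inside animal's.
dog    = density_matrix([np.array([1.0, 0.1, 0.0])])
animal = density_matrix([np.array([1.0, 0.0, 0.0]),
                         np.array([0.0, 1.0, 0.0])])
print(quantum_relative_entropy(dog, animal))   # small: 'dog' entails 'animal'
print(quantum_relative_entropy(animal, dog))   # larger: not vice versa
```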
Argo real-time quality control intercomparison
The real-time quality control (RTQC) methods applied to Argo profiling float data by the United Kingdom (UK) Met Office, the United States (US) Fleet Numerical Meteorology and Oceanography Centre, the Australian Bureau of Meteorology and the Coriolis Centre are compared and contrasted. Data are taken from the period 2007 to 2011 inclusive, and RTQC performance is assessed with respect to Argo delayed-mode quality control (DMQC). An intercomparison of RTQC techniques is performed using a common data set of profiles from 2010 and 2011. The RTQC systems are found to have similar power in identifying faulty Argo profiles but to vary widely in the number of good profiles incorrectly rejected. The efficacy of individual QC tests is inferred from the results of the intercomparison. Techniques to increase QC performance are discussed.
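The two headline quantities of the intercomparison, power on faulty profiles and the rate of good profiles incorrectly rejected, reduce to a simple contingency computation; a sketch with invented flag arrays:

```python
# Illustrative only: computing the two metrics from paired RTQC decisions
# and DMQC ground truth. The flag arrays below are made up for the example.
import numpy as np

rtqc_rejected = np.array([1, 1, 0, 0, 1, 0, 0, 1], dtype=bool)  # RTQC verdicts
dmqc_faulty   = np.array([1, 1, 0, 0, 0, 0, 0, 0], dtype=bool)  # DMQC truth

power = (rtqc_rejected & dmqc_faulty).sum() / dmqc_faulty.sum()
false_rejection = (rtqc_rejected & ~dmqc_faulty).sum() / (~dmqc_faulty).sum()
print(f"power={power:.2f}, false rejection rate={false_rejection:.2f}")
```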
Utilizing Online Social Network and Location-Based Data to Recommend Products and Categories in Online Marketplaces
Recent research has unveiled the importance of online social networks for
improving the quality of recommender systems and encouraged the research
community to investigate better ways of exploiting social information for recommendations. To contribute to this sparse field of research, in this paper we exploit users' interactions across three data sources (marketplace, social network and location-based) to assess their performance in a barely studied domain: recommending products and domains of interest (i.e., product categories) to people in an online marketplace environment. To that end we defined sets of content- and network-based user similarity features for each data source and studied them both in isolation, using a user-based Collaborative Filtering (CF) approach, and in combination, via a hybrid recommender algorithm, to assess which one provides the best recommendation performance. Interestingly, in our experiments conducted on a rich dataset collected from SecondLife, a popular online virtual world, we found that recommenders relying on user similarity features obtained from the social network data clearly yielded the best results in terms of accuracy when predicting products, whereas the features obtained from the marketplace and location-based data sources also achieved very good results when predicting categories. This finding indicates that all three types of data sources are important and should be taken into account depending on the level of specialization of the recommendation task.
Comment: 20 pages, book chapter
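A minimal sketch of the user-based CF step described above, under assumptions: a toy purchase matrix and cosine similarity over invented feature vectors standing in for one of the three data sources. This is not the chapter's actual pipeline:

```python
# User-based CF sketch: neighbors by cosine similarity over per-source
# user features, items scored by similarity-weighted neighbor purchases.
import numpy as np

def cosine_sim(F):
    """Pairwise cosine similarity between user feature vectors (rows of F)."""
    U = F / np.clip(np.linalg.norm(F, axis=1, keepdims=True), 1e-12, None)
    return U @ U.T

def recommend(purchases, sim, user, k=2, top_n=2):
    """Score items by similarity-weighted purchases of the k nearest users."""
    neighbors = np.argsort(-sim[user])
    neighbors = neighbors[neighbors != user][:k]
    scores = sim[user, neighbors] @ purchases[neighbors]
    scores[purchases[user] > 0] = -np.inf          # skip items already owned
    return np.argsort(-scores)[:top_n]

# Rows: users, columns: products (1 = bought). The feature vectors could come
# from any of the three sources (marketplace, social network, location-based).
purchases = np.array([[1, 0, 1, 0],
                      [1, 1, 0, 0],
                      [0, 1, 1, 1]], dtype=float)
social_features = np.array([[3, 1], [2, 1], [0, 4]], dtype=float)
print(recommend(purchases, cosine_sim(social_features), user=0))
```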
From E-MAPs to module maps: dissecting quantitative genetic interactions using physical interactions
Recent technological breakthroughs allow the quantification of hundreds of thousands of genetic interactions (GIs) in Saccharomyces cerevisiae. The interpretation of these data is often difficult, but it can be improved by the joint analysis of GIs along with complementary data types. Here, we describe a novel methodology that integrates genetic and physical interaction data. We use our method to identify a collection of functional modules related to chromosomal biology and to investigate the relations among them. We show how the resulting map of modules provides clues for the elucidation of function both at the level of individual genes and at the level of functional modules.
Gaze direction when driving after dark on main and residential roads: Where is the dominant location?
CIE JTC-1 has requested data regarding the size and shape of the distribution of drivers' eye movement in order to characterise their visual adaptation. This article reports the eye movement of drivers along two routes in Berlin after dark, a main road and a residential street, captured using eye tracking. It was found that viewing behaviour differed between the two types of road. On the main road, eye movement was clustered within a circle of approximately 10° diameter, centred at the horizon of the lane. On the residential street, eye movement was clustered slightly (3.8°) towards the near side; eye movements were best captured with either an ellipse of approximate axes 10° vertical and 20° horizontal, centred on the lane ahead, or a 10° circle centred 3.8° towards the near side. These distributions reflect a driver's tendency to look towards locations of anticipated hazards.
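A small sketch of how gaze samples could be tested against the reported envelopes; the angular sizes and the 3.8° offset come from the text above, while the sample data and coordinate conventions are assumptions:

```python
# Fraction of gaze samples inside the reported envelopes: a 10 deg circle
# (main road) and a 10 x 20 deg ellipse offset 3.8 deg toward the near side
# (residential street). Samples are assumed to be (horizontal, vertical)
# angles in degrees relative to the horizon of the lane.
import numpy as np

def inside_ellipse(h, v, center_h=0.0, center_v=0.0, axis_h=20.0, axis_v=10.0):
    """True where (h, v) lies within an ellipse with the given full axes."""
    return ((h - center_h) / (axis_h / 2))**2 + ((v - center_v) / (axis_v / 2))**2 <= 1.0

gaze = np.random.default_rng(0).normal(0, 5, size=(1000, 2))  # fake samples
h, v = gaze[:, 0], gaze[:, 1]
main_road   = inside_ellipse(h, v, axis_h=10.0, axis_v=10.0)  # 10 deg circle
residential = inside_ellipse(h, v, center_h=-3.8)             # offset ellipse
print(main_road.mean(), residential.mean())
```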
Correlation-Adjusted Regression Survival Scores for High-Dimensional Variable Selection
Background: The development of classification methods for personalized medicine is highly dependent on the identification of predictive genetic markers. In survival analysis it is often necessary to discriminate between influential and non-influential markers. It is common to perform univariate screening using Cox scores, which quantify the associations between survival and each of the markers to provide a ranking. Since Cox scores do not account for dependencies between the markers, their use is suboptimal in the presence of highly correlated markers.
Methods: As an alternative to the Cox score, we propose the correlation-adjusted regression survival (CARS) score for right-censored survival outcomes. By removing the correlations between the markers, the CARS score quantifies the associations between the outcome and the set of "de-correlated" marker values. Estimation of the scores is based on inverse probability weighting, which is applied to log-transformed event times. For high-dimensional data, estimation is based on shrinkage techniques.
Results: The consistency of the CARS score is proven under mild regularity conditions. In simulations with high correlations, survival models based on CARS score rankings achieved higher areas under the precision-recall curve than competing methods. Two example applications on prostate and breast cancer confirmed these results. CARS scores are implemented in the R package carSurv.
Conclusions: In research applications involving high-dimensional genetic data, the use of CARS scores for marker selection is a favorable alternative to Cox scores even when correlations between covariates are low. Having a straightforward interpretation and low computational requirements, CARS scores are an easy-to-use screening tool in personalized medicine research.
This research was supported by the Deutsche Forschungsgemeinschaft (Project SCHM 2966/1-2), Wellcome Trust and the Royal Society (Grant Number 204623/Z/16/Z) and the UK Medical Research Council (Grant Number MC_UU_00002/7).
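The reference implementation is the R package carSurv; the following Python sketch only mirrors the idea (inverse-probability-of-censoring weights on log event times, then de-correlation of the marginal correlations by R^{-1/2}), with invented data and simplified weighting:

```python
# Conceptual CARS-type score, not the package's exact estimator. Assumptions:
# X is an n x p marker matrix, (time, event) are right-censored survival data,
# and censoring weights come from a Kaplan-Meier estimate of the censoring
# distribution (inverse probability of censoring weighting, IPCW).
import numpy as np
from scipy.linalg import fractional_matrix_power

def cars_like_scores(X, time, event):
    order = np.argsort(time)
    t, d = time[order], event[order]
    n = len(t)
    at_risk = n - np.arange(n)
    g = np.cumprod(1.0 - (1.0 - d) / at_risk)        # KM of censoring distribution
    w = np.zeros(n)
    w[d == 1] = 1.0 / np.maximum(g[d == 1], 1e-8)    # weight observed events only
    y = np.log(np.maximum(t, 1e-8)) * w              # weighted log event times
    Xs = (X[order] - X[order].mean(0)) / X[order].std(0)
    ys = (y - y.mean()) / y.std()
    rho = Xs.T @ ys / n                              # marker-outcome correlations
    R = np.corrcoef(Xs, rowvar=False)                # shrunken in practice
    return fractional_matrix_power(R, -0.5) @ rho    # correlation-adjusted scores

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5)); X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)
time = np.exp(-X[:, 0] + rng.normal(size=200)); event = rng.integers(0, 2, 200)
print(np.abs(cars_like_scores(X, time, event)))     # rank markers by |score|
```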
Distance matters! Cumulative proximity expansions for ranking documents
In the information retrieval process, functions that rank documents according to their estimated relevance to a query typically regard query terms as being independent. However, it is often the joint presence of query terms that is of interest to the user, which is overlooked when matching independent terms. One feature that can be used to express the relatedness of co-occurring terms is their proximity in text. In past research, models that are trained on the proximity information in a collection have performed better than models that are not estimated on data. We analyzed how co-occurring query terms can be used to estimate the relevance of documents based on their distance in text, and we use this analysis to extend a unigram ranking function with a proximity model that accumulates the scores of all occurring term combinations. This proximity model is more practical than existing models, since it does not require any co-occurrence statistics, it obviates the need to tune additional parameters, and it has a retrieval speed close to that of competing models. We show that this approach is more robust than existing models, on both Web and newswire corpora, and on average performs as well as or better than existing proximity models across collections.
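A sketch of the accumulation idea: a unigram score extended with a proximity bonus summed over all co-occurring query-term pairs. The 1/distance kernel and the mixing weight lam are illustrative assumptions, not the paper's exact ranking function:

```python
# Unigram score plus an accumulated proximity bonus over query-term pairs,
# where closer pairs contribute more.
from itertools import combinations
from math import log

def proximity_bonus(positions, query_terms):
    """Sum over query-term pairs of 1/min-distance between their occurrences."""
    bonus = 0.0
    for a, b in combinations(query_terms, 2):
        if positions.get(a) and positions.get(b):
            dist = min(abs(i - j) for i in positions[a] for j in positions[b])
            bonus += 1.0 / max(dist, 1)
    return bonus

def score(doc_tokens, query_terms, lam=0.5):
    positions = {}
    for i, tok in enumerate(doc_tokens):
        positions.setdefault(tok, []).append(i)
    unigram = sum(log(1 + len(positions.get(t, []))) for t in query_terms)
    return unigram + lam * proximity_bonus(positions, query_terms)

doc = "new york traffic is heavy while york in england is quiet".split()
print(score(doc, ["new", "york"]))   # the adjacent pair earns a large bonus
```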
Learning Pretopological Spaces for Lexical Taxonomy Acquisition
In this paper, we propose a new methodology for semi-supervised acquisition of lexical taxonomies from a list of existing terms. Our approach is based on the theory of pretopology, which offers a powerful formalism to model semantic relations and transform a list of terms into a structured term space by combining different discriminant criteria. In order to learn a parameterized pretopological space, we define the Learning Pretopological Spaces strategy based on genetic algorithms. The rare but accurate pieces of knowledge given by an expert (semi-supervision) or automatically extracted with existing linguistic patterns (auto-supervision) are used to parameterize the different features defining the pretopological term space. Then, a structuring algorithm is used to transform the pretopological space into a lexical taxonomy, i.e. a directed acyclic graph. Results over three standard datasets (two from WordNet and one from UMLS) show improved performance over existing associative and pattern-based state-of-the-art approaches.
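A toy sketch of the pseudo-closure step at the heart of a pretopological space, assuming invented criteria functions and weights (in the paper the weights are learned by the genetic algorithm):

```python
# A term set expands by pulling in terms supported by enough weighted
# criteria (e.g. co-occurrence, pattern hits); iterating the expansion to a
# fixed point yields the nested sets from which a taxonomy DAG is read off.
def pseudo_closure(seed, terms, criteria, weights, threshold=0.5):
    """One expansion step: add terms whose weighted criteria support the seed."""
    expanded = set(seed)
    for t in terms - expanded:
        support = sum(w * c(t, expanded) for w, c in zip(weights, criteria))
        if support >= threshold:
            expanded.add(t)
    return expanded

def structure(seed, terms, criteria, weights):
    """Iterate the pseudo-closure to a fixed point."""
    current = set(seed)
    while True:
        nxt = pseudo_closure(current, terms, criteria, weights)
        if nxt == current:
            return current
        current = nxt

# Invented criteria: each returns a support value in [0, 1] for term t.
cooc    = lambda t, s: 1.0 if t in {"dog", "cat"} and "animal" in s else 0.0
pattern = lambda t, s: 1.0 if (t, "animal") in {("dog", "animal")} and "animal" in s else 0.0
print(structure({"animal"}, {"animal", "dog", "cat", "car"}, [cooc, pattern], [0.4, 0.6]))
```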
Semantic distillation: a method for clustering objects by their contextual specificity
Techniques for data mining, latent semantic analysis, contextual search of databases, etc. were developed long ago by computer scientists working on information retrieval (IR). Experimental scientists from all disciplines, having to analyse large collections of raw experimental data (astronomical, physical, biological, etc.), have developed powerful methods for their statistical analysis and for clustering, categorising, and classifying objects. Finally, physicists have developed a theory of quantum measurement, unifying the logical, algebraic, and probabilistic aspects of queries into a single formalism. The purpose of this paper is twofold: first, to show that when formulated at an abstract level, problems from IR, from statistical data analysis, and from physical measurement theories are very similar and hence can profitably be cross-fertilised; and secondly, to propose a novel method of fuzzy hierarchical clustering, termed "semantic distillation", strongly inspired by the theory of quantum measurement, which we developed to analyse raw data coming from various types of experiments on DNA arrays. We illustrate the method by analysing DNA array experiments and clustering the genes of the array according to their specificity.
Comment: Accepted for publication in Studies in Computational Intelligence, Springer-Verlag
A Compromise between Neutrino Masses and Collider Signatures in the Type-II Seesaw Model
A natural extension of the standard SU(2)_L × U(1)_Y gauge model to accommodate massive neutrinos is to introduce one Higgs triplet and three right-handed Majorana neutrinos, leading to a 6×6 neutrino mass matrix which contains three 3×3 sub-matrices M_L, M_D and M_R. We show that the three light Majorana neutrinos (i.e., the mass eigenstates ν_1, ν_2 and ν_3) are exactly massless in this model, if and only if M_L = M_D M_R^{-1} M_D^T exactly holds. This no-go theorem implies that small but non-vanishing neutrino masses may result from a significant but incomplete cancellation between the M_L and M_D M_R^{-1} M_D^T terms in the Type-II seesaw formula, provided the three right-handed Majorana neutrinos have O(TeV) masses and are experimentally detectable at the LHC. We propose three simple Type-II seesaw scenarios with a flavor symmetry to interpret the observed neutrino mass spectrum and neutrino mixing pattern. Such a TeV-scale neutrino model can be tested in two complementary ways: (1) searching for possible collider signatures of lepton number violation induced by the right-handed Majorana neutrinos and doubly-charged Higgs particles; and (2) searching for possible consequences of unitarity violation of the 3×3 neutrino mixing matrix in future long-baseline neutrino oscillation experiments.
Comment: RevTeX, 19 pages, no figures
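For reference, the standard Type-II seesaw relation behind the cancellation argument, with the sub-matrix notation reconstructed above:

```latex
% Type-II seesaw: light neutrino mass matrix from the full 6x6 mass matrix
% with sub-matrices M_L (triplet), M_D (Dirac), M_R (right-handed Majorana).
\[
  M_\nu \;\simeq\; M_L - M_D\, M_R^{-1} M_D^{T},
\]
% so the three light neutrinos are exactly massless iff the two terms cancel:
\[
  M_\nu = 0 \quad\Longleftrightarrow\quad M_L = M_D\, M_R^{-1} M_D^{T}.
\]
```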