Distributional Sentence Entailment Using Density Matrices
The categorical compositional distributional model of Coecke et al. (2010) suggests a way to combine the grammatical composition of formal, type-logical models with the corpus-based, empirical word representations of distributional semantics. This paper contributes to the project by extending the model to also capture entailment relations. This is achieved by generalizing the representations of words from points in meaning space to density operators, which are probability distributions on the subspaces of the space. A symmetric measure of similarity and an asymmetric measure of entailment are defined, where lexical entailment is measured using quantum relative entropy, the density-matrix variant of Kullback-Leibler divergence. Lexical entailment, combined with the composition map on word representations, provides a method to obtain entailment relations at the level of sentences. Truth-theoretic and corpus-based examples are provided.
Comment: 11 pages
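The asymmetric measure here is quantum relative entropy between density matrices. The following is a minimal numerical sketch, not the paper's code; the toy context vectors, uniform mixing weights, and smoothing constant eps are assumptions:

```python
# Minimal sketch of entailment scoring with density matrices.
# Word representations, mixing weights, and eps are illustrative assumptions.
import numpy as np
from scipy.linalg import logm

def density_matrix(context_vectors, weights=None):
    """Mix normalized context vectors into a density operator (PSD, trace 1)."""
    vs = [v / np.linalg.norm(v) for v in context_vectors]
    w = np.ones(len(vs)) / len(vs) if weights is None else np.asarray(weights)
    rho = sum(wi * np.outer(v, v) for wi, v in zip(w, vs))
    return rho / np.trace(rho)

def quantum_relative_entropy(rho, sigma, eps=1e-9):
    """S(rho || sigma) = Tr[rho (log rho - log sigma)]; eps avoids log of zero."""
    d = rho.shape[0]
    sigma = (1 - eps) * sigma + eps * np.eye(d) / d   # smooth to full rank
    rho   = (1 - eps) * rho   + eps * np.eye(d) / d
    return float(np.trace(rho @ (logm(rho) - logm(sigma))).real)

# Intuition: "dog" entails "animal" if dog's contexts sit inside animal's.
dog    = density_matrix([np.array([1.0, 0.1, 0.0])])
animal = density_matrix([np.array([1.0, 0.0, 0.0]),
                         np.array([0.0, 1.0, 0.0])])
print(quantum_relative_entropy(dog, animal))   # small: 'dog' entails 'animal'
print(quantum_relative_entropy(animal, dog))   # larger: not vice versa
```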
Argo real-time quality control intercomparison
The real-time quality control (RTQC) methods applied to Argo profiling float data by the United Kingdom (UK) Met Office, the United States (US) Fleet Numerical Meteorology and Oceanography Centre, the Australian Bureau of Meteorology and the Coriolis Centre are compared and contrasted. Data are taken from the period 2007 to 2011 inclusive, and RTQC performance is assessed with respect to Argo delayed-mode quality control (DMQC). An intercomparison of RTQC techniques is performed using a common data set of profiles from 2010 and 2011. The RTQC systems are found to have similar power in identifying faulty Argo profiles but to vary widely in the number of good profiles incorrectly rejected. The efficacy of individual QC tests is inferred from the results of the intercomparison. Techniques to increase QC performance are discussed.
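The two headline quantities of the intercomparison, power on faulty profiles and the rate of good profiles incorrectly rejected, reduce to a simple contingency computation; a sketch with invented flag arrays:

```python
# Illustrative only: computing the two metrics from paired RTQC decisions
# and DMQC ground truth. The flag arrays below are made up for the example.
import numpy as np

rtqc_rejected = np.array([1, 1, 0, 0, 1, 0, 0, 1], dtype=bool)  # RTQC verdicts
dmqc_faulty   = np.array([1, 1, 0, 0, 0, 0, 0, 0], dtype=bool)  # DMQC truth

power = (rtqc_rejected & dmqc_faulty).sum() / dmqc_faulty.sum()
false_rejection = (rtqc_rejected & ~dmqc_faulty).sum() / (~dmqc_faulty).sum()
print(f"power={power:.2f}, false rejection rate={false_rejection:.2f}")
```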
Utilizing Online Social Network and Location-Based Data to Recommend Products and Categories in Online Marketplaces
Recent research has unveiled the importance of online social networks for
improving the quality of recommender systems and encouraged the research
community to investigate better ways of exploiting social information for recommendations. To contribute to this sparse field of research, in this paper we exploit users' interactions across three data sources (marketplace, social network and location-based) to assess their performance in a barely studied domain: recommending products and domains of interest (i.e., product categories) to people in an online marketplace environment. To that end we defined sets of content- and network-based user similarity features for each data source and studied them both in isolation, using a user-based Collaborative Filtering (CF) approach, and in combination, via a hybrid recommender algorithm, to assess which one provides the best recommendation performance. Interestingly, in our experiments conducted on a rich dataset collected from SecondLife, a popular online virtual world, we found that recommenders relying on user similarity features obtained from the social network data clearly yielded the best results in terms of accuracy when predicting products, whereas the features obtained from the marketplace and location-based data sources also achieved very good results when predicting categories. This finding indicates that all three types of data sources are important and should be taken into account depending on the level of specialization of the recommendation task.
Comment: 20 pages, book chapter
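A minimal sketch of the user-based CF step described above, under assumptions: a toy purchase matrix and cosine similarity over invented feature vectors standing in for one of the three data sources. This is not the chapter's actual pipeline:

```python
# User-based CF sketch: neighbors by cosine similarity over per-source
# user features, items scored by similarity-weighted neighbor purchases.
import numpy as np

def cosine_sim(F):
    """Pairwise cosine similarity between user feature vectors (rows of F)."""
    U = F / np.clip(np.linalg.norm(F, axis=1, keepdims=True), 1e-12, None)
    return U @ U.T

def recommend(purchases, sim, user, k=2, top_n=2):
    """Score items by similarity-weighted purchases of the k nearest users."""
    neighbors = np.argsort(-sim[user])
    neighbors = neighbors[neighbors != user][:k]
    scores = sim[user, neighbors] @ purchases[neighbors]
    scores[purchases[user] > 0] = -np.inf          # skip items already owned
    return np.argsort(-scores)[:top_n]

# Rows: users, columns: products (1 = bought). The feature vectors could come
# from any of the three sources (marketplace, social network, location-based).
purchases = np.array([[1, 0, 1, 0],
                      [1, 1, 0, 0],
                      [0, 1, 1, 1]], dtype=float)
social_features = np.array([[3, 1], [2, 1], [0, 4]], dtype=float)
print(recommend(purchases, cosine_sim(social_features), user=0))
```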
From E-MAPs to module maps: dissecting quantitative genetic interactions using physical interactions
Recent technological breakthroughs allow the quantification of hundreds of thousands of genetic interactions (GIs) in Saccharomyces cerevisiae. The interpretation of these data is often difficult, but it can be improved by the joint analysis of GIs along with complementary data types. Here, we describe a novel methodology that integrates genetic and physical interaction data. We use our method to identify a collection of functional modules related to chromosomal biology and to investigate the relations among them. We show how the resulting map of modules provides clues for the elucidation of function both at the level of individual genes and at the level of functional modules.
Gaze direction when driving after dark on main and residential roads: Where is the dominant location?
CIE JTC-1 has requested data regarding the size and shape of the distribution of drivers' eye movement in order to characterise their visual adaptation. This article reports the eye movement of drivers along two routes in Berlin after dark, a main road and a residential street, captured using eye tracking. It was found that viewing behaviour differed between the two types of road. On the main road, eye movement was clustered within a circle of approximately 10° diameter, centred at the horizon of the lane. On the residential street, eye movement was clustered slightly (3.8°) towards the near side; eye movements were best captured with either an ellipse of approximate axes 10° vertical and 20° horizontal, centred on the lane ahead, or a 10° circle centred 3.8° towards the near side. These distributions reflect a driver's tendency to look towards locations of anticipated hazards.
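A small sketch of how gaze samples could be tested against the reported envelopes; the angular sizes and the 3.8° offset come from the text above, while the sample data and coordinate conventions are assumptions:

```python
# Fraction of gaze samples inside the reported envelopes: a 10 deg circle
# (main road) and a 10 x 20 deg ellipse offset 3.8 deg toward the near side
# (residential street). Samples are assumed to be (horizontal, vertical)
# angles in degrees relative to the horizon of the lane.
import numpy as np

def inside_ellipse(h, v, center_h=0.0, center_v=0.0, axis_h=20.0, axis_v=10.0):
    """True where (h, v) lies within an ellipse with the given full axes."""
    return ((h - center_h) / (axis_h / 2))**2 + ((v - center_v) / (axis_v / 2))**2 <= 1.0

gaze = np.random.default_rng(0).normal(0, 5, size=(1000, 2))  # fake samples
h, v = gaze[:, 0], gaze[:, 1]
main_road   = inside_ellipse(h, v, axis_h=10.0, axis_v=10.0)  # 10 deg circle
residential = inside_ellipse(h, v, center_h=-3.8)             # offset ellipse
print(main_road.mean(), residential.mean())
```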
Correlation-Adjusted Regression Survival Scores for High-Dimensional Variable Selection
Background: The development of classification methods for personalized medicine is highly dependent on the identification of predictive genetic markers. In survival analysis it is often necessary to discriminate between influential and non-influential markers. It is common to perform univariate screening using Cox scores, which quantify the associations between survival and each of the markers to provide a ranking. Since Cox scores do not account for dependencies between the markers, their use is suboptimal in the presence of highly correlated markers.
Methods: As an alternative to the Cox score, we propose the correlation-adjusted regression survival (CARS) score for right-censored survival outcomes. By removing the correlations between the markers, the CARS score quantifies the associations between the outcome and the set of "de-correlated" marker values. Estimation of the scores is based on inverse probability weighting, which is applied to log-transformed event times. For high-dimensional data, estimation is based on shrinkage techniques.
Results: The consistency of the CARS score is proven under mild regularity conditions. In simulations with high correlations, survival models based on CARS score rankings achieved higher areas under the precision-recall curve than competing methods. Two example applications on prostate and breast cancer confirmed these results. CARS scores are implemented in the R package carSurv.
Conclusions: In research applications involving high-dimensional genetic data, the use of CARS scores for marker selection is a favorable alternative to Cox scores even when correlations between covariates are low. Having a straightforward interpretation and low computational requirements, CARS scores are an easy-to-use screening tool in personalized medicine research.
This research was supported by the Deutsche Forschungsgemeinschaft (Project SCHM 2966/1-2), Wellcome Trust and the Royal Society (Grant Number 204623/Z/16/Z) and the UK Medical Research Council (Grant Number MC_UU_00002/7).
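The reference implementation is the R package carSurv; the following Python sketch only mirrors the idea (inverse-probability-of-censoring weights on log event times, then de-correlation of the marginal correlations by R^{-1/2}), with invented data and simplified weighting:

```python
# Conceptual CARS-type score, not the package's exact estimator. Assumptions:
# X is an n x p marker matrix, (time, event) are right-censored survival data,
# and censoring weights come from a Kaplan-Meier estimate of the censoring
# distribution (inverse probability of censoring weighting, IPCW).
import numpy as np
from scipy.linalg import fractional_matrix_power

def cars_like_scores(X, time, event):
    order = np.argsort(time)
    t, d = time[order], event[order]
    n = len(t)
    at_risk = n - np.arange(n)
    g = np.cumprod(1.0 - (1.0 - d) / at_risk)        # KM of censoring distribution
    w = np.zeros(n)
    w[d == 1] = 1.0 / np.maximum(g[d == 1], 1e-8)    # weight observed events only
    y = np.log(np.maximum(t, 1e-8)) * w              # weighted log event times
    Xs = (X[order] - X[order].mean(0)) / X[order].std(0)
    ys = (y - y.mean()) / y.std()
    rho = Xs.T @ ys / n                              # marker-outcome correlations
    R = np.corrcoef(Xs, rowvar=False)                # shrunken in practice
    return fractional_matrix_power(R, -0.5) @ rho    # correlation-adjusted scores

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5)); X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)
time = np.exp(-X[:, 0] + rng.normal(size=200)); event = rng.integers(0, 2, 200)
print(np.abs(cars_like_scores(X, time, event)))     # rank markers by |score|
```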
Distance matters! Cumulative proximity expansions for ranking documents
In the information retrieval process, functions that rank documents according to their estimated relevance to a query typically regard query terms as being independent. However, it is often the joint presence of query terms that is of interest to the user, which is overlooked when matching independent terms. One feature that can be used to express the relatedness of co-occurring terms is their proximity in text. In past research, models that are trained on the proximity information in a collection have performed better than models that are not estimated on data. We analyzed how co-occurring query terms can be used to estimate the relevance of documents based on their distance in text, and we use this analysis to extend a unigram ranking function with a proximity model that accumulates the scores of all occurring term combinations. This proximity model is more practical than existing models, since it does not require any co-occurrence statistics, it obviates the need to tune additional parameters, and it has a retrieval speed close to that of competing models. We show that this approach is more robust than existing models, on both Web and newswire corpora, and on average performs as well as or better than existing proximity models across collections.
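A sketch of the accumulation idea: a unigram score extended with a proximity bonus summed over all co-occurring query-term pairs. The 1/distance kernel and the mixing weight lam are illustrative assumptions, not the paper's exact ranking function:

```python
# Unigram score plus an accumulated proximity bonus over query-term pairs,
# where closer pairs contribute more.
from itertools import combinations
from math import log

def proximity_bonus(positions, query_terms):
    """Sum over query-term pairs of 1/min-distance between their occurrences."""
    bonus = 0.0
    for a, b in combinations(query_terms, 2):
        if positions.get(a) and positions.get(b):
            dist = min(abs(i - j) for i in positions[a] for j in positions[b])
            bonus += 1.0 / max(dist, 1)
    return bonus

def score(doc_tokens, query_terms, lam=0.5):
    positions = {}
    for i, tok in enumerate(doc_tokens):
        positions.setdefault(tok, []).append(i)
    unigram = sum(log(1 + len(positions.get(t, []))) for t in query_terms)
    return unigram + lam * proximity_bonus(positions, query_terms)

doc = "new york traffic is heavy while york in england is quiet".split()
print(score(doc, ["new", "york"]))   # the adjacent pair earns a large bonus
```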
Learning Pretopological Spaces for Lexical Taxonomy Acquisition
In this paper, we propose a new methodology for semi-supervised acquisition of lexical taxonomies from a list of existing terms. Our approach is based on the theory of pretopology, which offers a powerful formalism to model semantic relations and transform a list of terms into a structured term space by combining different discriminant criteria. In order to learn a parameterized pretopological space, we define the Learning Pretopological Spaces strategy based on genetic algorithms. The rare but accurate pieces of knowledge given by an expert (semi-supervision) or automatically extracted with existing linguistic patterns (auto-supervision) are used to parameterize the different features defining the pretopological term space. Then, a structuring algorithm is used to transform the pretopological space into a lexical taxonomy, i.e. a directed acyclic graph. Results over three standard datasets (two from WordNet and one from UMLS) show improved performance over existing associative and pattern-based state-of-the-art approaches.
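A toy sketch of the pseudo-closure step at the heart of a pretopological space, assuming invented criteria functions and weights (in the paper the weights are learned by the genetic algorithm):

```python
# A term set expands by pulling in terms supported by enough weighted
# criteria (e.g. co-occurrence, pattern hits); iterating the expansion to a
# fixed point yields the nested sets from which a taxonomy DAG is read off.
def pseudo_closure(seed, terms, criteria, weights, threshold=0.5):
    """One expansion step: add terms whose weighted criteria support the seed."""
    expanded = set(seed)
    for t in terms - expanded:
        support = sum(w * c(t, expanded) for w, c in zip(weights, criteria))
        if support >= threshold:
            expanded.add(t)
    return expanded

def structure(seed, terms, criteria, weights):
    """Iterate the pseudo-closure to a fixed point."""
    current = set(seed)
    while True:
        nxt = pseudo_closure(current, terms, criteria, weights)
        if nxt == current:
            return current
        current = nxt

# Invented criteria: each returns a support value in [0, 1] for term t.
cooc    = lambda t, s: 1.0 if t in {"dog", "cat"} and "animal" in s else 0.0
pattern = lambda t, s: 1.0 if (t, "animal") in {("dog", "animal")} and "animal" in s else 0.0
print(structure({"animal"}, {"animal", "dog", "cat", "car"}, [cooc, pattern], [0.4, 0.6]))
```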
Semantic distillation: a method for clustering objects by their contextual specificity
Techniques for data mining, latent semantic analysis, contextual search of databases, etc. were developed long ago by computer scientists working on information retrieval (IR). Experimental scientists from all disciplines, having to analyse large collections of raw experimental data (astronomical, physical, biological, etc.), have developed powerful methods for their statistical analysis and for clustering, categorising, and classifying objects. Finally, physicists have developed a theory of quantum measurement, unifying the logical, algebraic, and probabilistic aspects of queries into a single formalism. The purpose of this paper is twofold: first, to show that when formulated at an abstract level, problems from IR, from statistical data analysis, and from physical measurement theories are very similar and hence can profitably be cross-fertilised; and secondly, to propose a novel method of fuzzy hierarchical clustering, termed "semantic distillation", strongly inspired by the theory of quantum measurement, which we developed to analyse raw data coming from various types of experiments on DNA arrays. We illustrate the method by analysing DNA array experiments and clustering the genes of the array according to their specificity.
Comment: Accepted for publication in Studies in Computational Intelligence, Springer-Verlag
A Compromise between Neutrino Masses and Collider Signatures in the Type-II Seesaw Model
A natural extension of the standard SU(2)_L × U(1)_Y gauge model to accommodate massive neutrinos is to introduce one Higgs triplet and three right-handed Majorana neutrinos, leading to a 6×6 neutrino mass matrix which contains three 3×3 sub-matrices M_L, M_D and M_R. We show that the three light Majorana neutrinos (i.e., the mass eigenstates ν_1, ν_2 and ν_3) are exactly massless in this model, if and only if M_L = M_D M_R^{-1} M_D^T exactly holds. This no-go theorem implies that small but non-vanishing neutrino masses may result from a significant but incomplete cancellation between the M_L and M_D M_R^{-1} M_D^T terms in the Type-II seesaw formula, provided the three right-handed Majorana neutrinos have O(TeV) masses and are experimentally detectable at the LHC. We propose three simple Type-II seesaw scenarios with a flavor symmetry to interpret the observed neutrino mass spectrum and neutrino mixing pattern. Such a TeV-scale neutrino model can be tested in two complementary ways: (1) searching for possible collider signatures of lepton number violation induced by the right-handed Majorana neutrinos and doubly-charged Higgs particles; and (2) searching for possible consequences of unitarity violation of the 3×3 neutrino mixing matrix in future long-baseline neutrino oscillation experiments.
Comment: RevTeX, 19 pages, no figures
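For reference, the standard Type-II seesaw relation behind the cancellation argument, with the sub-matrix notation reconstructed above:

```latex
% Type-II seesaw: light neutrino mass matrix from the full 6x6 mass matrix
% with sub-matrices M_L (triplet), M_D (Dirac), M_R (right-handed Majorana).
\[
  M_\nu \;\simeq\; M_L - M_D\, M_R^{-1} M_D^{T},
\]
% so the three light neutrinos are exactly massless iff the two terms cancel:
\[
  M_\nu = 0 \quad\Longleftrightarrow\quad M_L = M_D\, M_R^{-1} M_D^{T}.
\]
```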