6,880 research outputs found
Distributional Sentence Entailment Using Density Matrices
The categorical compositional distributional model of Coecke et al. (2010)
suggests a way to combine grammatical composition of the formal, type logical
models with the corpus based, empirical word representations of distributional
semantics. This paper contributes to the project by expanding the model to also
capture entailment relations. This is achieved by extending the representations
of words from points in meaning space to density operators, which are
probability distributions on the subspaces of the space. A symmetric measure of
similarity and an asymmetric measure of entailment are defined, where lexical
entailment is measured using von Neumann entropy, the quantum variant of
Kullback-Leibler divergence. Lexical entailment, combined with the composition
map on word representations, provides a method to obtain entailment relations
on the level of sentences. Truth theoretic and corpus-based examples are
provided.
Comment: 11 pages
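The asymmetry of an entropy-based entailment measure can be illustrated with a minimal sketch. It makes several assumptions that are not in the abstract: both density matrices are taken to be diagonal in a shared basis (so the quantum relative entropy reduces to the classical Kullback-Leibler divergence of their eigenvalues), the score 1 / (1 + relative entropy) is one illustrative way to map the divergence into [0, 1], and the `dog` and `animal` distributions are hypothetical toy values.

```python
import math

def relative_entropy(p, q):
    """Relative entropy S(rho || sigma) for two density matrices that are
    diagonal in the same basis, given as eigenvalue lists p and q."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # the term 0 * log 0 is taken to be 0
        if qi == 0.0:
            return math.inf  # support of rho is not inside support of sigma
        total += pi * (math.log(pi) - math.log(qi))
    return total

def entailment_score(p, q):
    """Assumed scoring: 1 when p equals q, 0 when the divergence is infinite."""
    return 1.0 / (1.0 + relative_entropy(p, q))

# Hypothetical toy eigenvalues: "dog" occupies a subspace of "animal".
dog = [0.5, 0.5, 0.0]
animal = [0.4, 0.4, 0.2]

print(entailment_score(dog, animal))  # positive: dog lies within animal
print(entailment_score(animal, dog))  # zero: animal is not contained in dog
```

The score is direction-dependent because the divergence becomes infinite as soon as the first argument has weight outside the support of the second, which is the behaviour one would want from a measure of entailment rather than similarity.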
On the spatial modelling of mixed and constrained geospatial data
Spatial uncertainty modelling and prediction of a set of regionalized dependent variables from various sample spaces (e.g. continuous and categorical) is a common challenge for geoscience modellers and many geoscience applications such as evaluation of mineral resources, characterization of oil reservoirs or hydrology of groundwater. To consider the complex statistical and spatial relationships, categorical data such as rock types, soil types, alteration units, and continental crustal blocks should be modelled jointly with other continuous attributes (e.g. porosity, permeability, seismic velocity, mineral and geochemical compositions or pollutant concentration). These multivariate geospatial data normally have complex statistical and spatial relationships which should be honoured in the predicted models.
Continuous variables in the form of percentages, proportions, frequencies, and concentrations are compositional which means they are non-negative values representing some parts of a whole. Such data carry just relative information and the constant sum constraint forces at least one covariance to be negative and induces spurious statistical and spatial correlations. As a result, classical (geo)statistical techniques should not be implemented on the original compositional data. Several geostatistical techniques have been developed recently for the spatial modelling of compositional data. However, few of these consider the joint statistical and/or spatial relationships of regionalized compositional data with the other dependent categorical information.
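The constant-sum constraint described above is commonly handled by moving to log-ratio coordinates before applying classical statistics. The following is a minimal sketch of one such choice, the centred log-ratio (clr) transform; the abstract does not say which transform the thesis uses, so the function names and the choice of clr are assumptions made for illustration. All parts must be strictly positive.

```python
import math

def closure(parts):
    """Rescale positive parts so they sum to 1 (only ratios carry information)."""
    total = sum(parts)
    return [x / total for x in parts]

def clr(parts):
    """Centred log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(x) for x in closure(parts)]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def clr_inverse(coords):
    """Map clr coordinates back to a composition that sums to 1."""
    return closure([math.exp(v) for v in coords])

# Hypothetical three-part composition, given in two different units.
sample_percent = [10.0, 20.0, 70.0]
sample_fraction = [0.1, 0.2, 0.7]

print(clr(sample_percent))
print(clr_inverse(clr(sample_percent)))
```

Because clr depends only on ratios between parts, `sample_percent` and `sample_fraction` yield the same coordinates, and the coordinates sum to zero rather than to a fixed positive constant.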
This PhD thesis explores and introduces approaches to spatial modelling of regionalized compositional and categorical data. The first proposed approach is in the multiple-point geostatistics framework, where the direct sampling algorithm is developed for joint simulation of compositional and categorical data. The second proposed method is based on two-point geostatistics and is useful when a large and representative training image is unavailable or difficult to build. Approaches to geostatistical simulation of regionalized compositions consisting of several populations are also investigated. The multi-population characteristic is usually related to a dependent categorical variable (e.g. rock type, soil type, and land use). Finally, a hybrid predictive model based on advanced geostatistical simulation techniques for compositional data and machine learning is introduced. Such a hybrid model can rank and select features internally, which is useful for geoscience process discovery analysis.
The proposed techniques were evaluated via several case studies, and the results supported their usefulness and applicability.
The Computational Diet: A Review of Computational Methods Across Diet, Microbiome, and Health.
Food and human health are inextricably linked. As such, revolutionary impacts on health have been derived from advances in the production and distribution of food relating to food safety and fortification with micronutrients. During the past two decades, it has become apparent that the human microbiome has the potential to modulate health, including in ways that may be related to diet and the composition of specific foods. Despite the excitement and potential surrounding this area, fully understanding the complexity of the gut microbiome, the chemical composition of food, and their interplay in situ remains a daunting task. However, recent advances in high-throughput sequencing, metabolomics profiling, compositional analysis of food, and the emergence of electronic health records provide new sources of data that can contribute to addressing this challenge. Computational science will play an essential role in this effort as it will provide the foundation to integrate these data layers and derive insights capable of revealing and understanding the complex interactions between diet, gut microbiome, and health. Here, we review the current knowledge on diet-health-gut microbiota, relevant data sources, bioinformatics tools, machine learning capabilities, as well as the intellectual property and legislative regulatory landscape. We provide guidance on employing machine learning and data analytics, identify gaps in current methods, and describe new scenarios to be unlocked in the next few years in the context of current knowledge.
Unifying Amplitude and Phase Analysis: A Compositional Data Approach to Functional Multivariate Mixed-Effects Modeling of Mandarin Chinese
Mandarin Chinese is characterized by being a tonal language; the pitch (or
F0) of its utterances carries considerable linguistic information. However,
speech samples from different individuals are subject to changes in amplitude
and phase which must be accounted for in any analysis which attempts to provide
a linguistically meaningful description of the language. A joint model for
amplitude, phase and duration is presented which combines elements from
Functional Data Analysis, Compositional Data Analysis and Linear Mixed Effects
Models. By decomposing functions via a functional principal component analysis,
and connecting registration functions to compositional data analysis, a joint
multivariate mixed effect model can be formulated which gives insights into the
relationship between the different modes of variation as well as their
dependence on linguistic and non-linguistic covariates. The model is applied to
the COSPRO-1 data set, a comprehensive database of spoken Taiwanese Mandarin,
containing approximately 50 thousand phonetically diverse sample contours
(syllables), and reveals that phonetic information is jointly carried by both
amplitude and phase variation.
Comment: 49 pages, 13 figures, small changes to discussion
De-linearizing Linearity: Projective Quantum Axiomatics from Strong Compact Closure
Elaborating on our joint work with Abramsky in quant-ph/0402130 we further
unravel the linear structure of Hilbert spaces into several constituents. Some
prove to be very crucial for particular features of quantum theory while others
obstruct the passage to a formalism which is not saturated with physically
insignificant global phases.
First we show that the bulk of the required linear structure is purely
multiplicative, and arises from the strongly compact closed tensor which,
besides providing a variety of notions such as scalars, trace, unitarity,
self-adjointness and bipartite projectors, also provides Hilbert-Schmidt norm,
Hilbert-Schmidt inner-product, and in particular, the preparation-state
agreement axiom which enables the passage from a formalism of the vector space
kind to a rather projective one, as it was intended in the (in)famous Birkhoff
& von Neumann paper.
Next we consider additive types which distribute over the tensor, from which
measurements can be built, and the correctness proofs of the protocols
discussed in quant-ph/0402130 carry over to the resulting weaker setting. A
full probabilistic calculus is obtained when the trace is moreover linear and
satisfies the diagonal axiom, which brings us to a second main result,
characterization of the necessary and sufficient additive structure of a both
qualitatively and quantitatively effective categorical quantum formalism
without redundant global phases. Along the way we show that if in a category a
(additive) monoidal tensor distributes over a strongly compact closed tensor,
then this category is always enriched in commutative monoids.
Comment: Essential simplification of the definitions of orthostructure and
ortho-Bornian structure: the key new insight is captured by the definitions
in terms of commutative diagrams on pages 13 and 14, which state that if in a
category a (additive) monoidal tensor distributes over a strongly compact
closed tensor, then this category is always enriched in commutative monoids
An interactive semantics of logic programming
We apply to logic programming some recently emerging ideas from the field of
reduction-based communicating systems, with the aim of giving evidence of the
hidden interactions and the coordination mechanisms that rule the operational
machinery of such a programming paradigm. The semantic framework we have chosen
for presenting our results is tile logic, which has the advantage of allowing a
uniform treatment of goals and observations and of applying abstract
categorical tools for proving the results. As main contributions, we mention
the finitary presentation of abstract unification, and a concurrent and
coordinated abstract semantics consistent with the most common semantics of
logic programming. Moreover, the compositionality of the tile semantics is
guaranteed by standard results, as it reduces to checking that the tile systems
associated to logic programs enjoy the tile decomposition property. An
extension of the approach for handling constraint systems is also discussed.
Comment: 42 pages, 24 figures, 3 tables, to appear in the CUP journal of Theory
and Practice of Logic Programming
Hydrological characterization of watersheds in the Blue Nile Basin, Ethiopia
Thirty-two watersheds (31–4350 km²) in the Blue Nile Basin, Ethiopia, were hydrologically characterized with data from a study of water and land resources by the US Department of Interior, Bureau of Reclamation (USBR) published in 1964. The USBR document contains data on flow, topography, geology, soil type, and land use for the period 1959 to 1963. The aim of the study was to identify watershed variables best explaining the variation in the hydrological regime, with a special focus on low flows. Moreover, this study aimed to identify variables that may be susceptible to management policies for developing and securing water resources in dry periods. Principal Component Analysis (PCA) and Partial Least Squares (PLS) were used to analyze the relationship between five hydrologic response variables (total flow, high flow, low flow, runoff coefficient, low flow index) and 30 potential explanatory watershed variables. The explanatory watershed variables were classified into three groups: land use; climate and topography; and geology and soil type. Each of the three groups had almost equal influence on the variation in hydrologic variables (R² values ranging from 0.3 to 0.4). Specific variables from within each of the three groups of explanatory variables were better at explaining the variation. Low flow and low flow index were positively correlated to land use types woodland, dense wet forest and savannah grassland, whereas grazing land and bush land were negatively correlated. We concluded that extra care for preserving low flow should be taken on tuffs/basalts, which comprise 52% of the Blue Nile Basin. Land use management plans should recognize that woodland, dense wet forest and savannah grassland can promote higher low flows, while grazing land diminishes low flows.