GAMs and functional kriging for air quality data
Data with a spatio-temporal structure are often observed in the environmental sciences. They may be regarded as discrete observations of curves along time and/or space and treated as functional data. Generalized Additive Models (GAMs) are a useful tool for modelling such data, for example pollutant concentrations, by describing their spatial and/or temporal trends. Prediction of a curve at an unmonitored site is often required; to this end, we extend kriging for functional data to a multivariate context. Moreover, even if we are interested in predicting only a single pollutant, such as PM10, the estimate can be improved by exploiting its correlation with the other pollutants. Cross-validation is used to assess the performance of the proposed procedure.
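A minimal sketch of the univariate building block, functional ordinary kriging with a trace-variogram, may help fix ideas: the predicted curve at an unmonitored site is a weighted sum of the observed curves, with weights that solve an ordinary kriging system whose variogram is built from integrated squared differences between curves. This is not the multivariate extension described above, and the GAM smoothing step is skipped. The sketch assumes curves already evaluated on a common time grid and an exponential variogram with hypothetical, already fitted parameters (`sill`, `range_`); all function names are illustrative.

```python
import numpy as np


def exponential_variogram(h, sill=1.0, range_=1.0):
    """Exponential variogram without nugget: gamma(h) = sill * (1 - exp(-h / range_))."""
    return sill * (1.0 - np.exp(-np.asarray(h, dtype=float) / range_))


def empirical_trace_variogram(coords, curves, t):
    """For each pair of sites, return the distance and 0.5 * integral of (X_i - X_j)^2 dt."""
    coords = np.asarray(coords, dtype=float)
    curves = np.asarray(curves, dtype=float)
    dt = np.diff(t)
    dists, gammas = [], []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            sq = (curves[i] - curves[j]) ** 2
            integral = np.sum(0.5 * (sq[1:] + sq[:-1]) * dt)  # trapezoidal rule
            dists.append(np.linalg.norm(coords[i] - coords[j]))
            gammas.append(0.5 * integral)
    return np.array(dists), np.array(gammas)


def functional_ordinary_kriging(coords, curves, s0, variogram):
    """Predict the curve at s0 as sum_i lambda_i * X_i(t), with sum_i lambda_i = 1."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    d_sites = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d_target = np.linalg.norm(coords - np.asarray(s0, dtype=float), axis=1)
    # Ordinary kriging system; the last row/column carries the Lagrange multiplier
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d_sites)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(d_target)
    weights = np.linalg.solve(A, b)[:n]
    return weights, weights @ np.asarray(curves, dtype=float)


# Hypothetical example: four sites on a unit square, curves observed on a common grid
t = np.linspace(0.0, 1.0, 50)
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
curves = np.array([np.sin(2 * np.pi * t) + k for k in range(4)])


def model(h):
    return exponential_variogram(h, sill=1.0, range_=0.5)


dists, gammas = empirical_trace_variogram(coords, curves, t)
weights, prediction = functional_ordinary_kriging(coords, curves, [0.5, 0.5], model)
```

Because the variogram has no nugget, the predictor interpolates exactly: kriging at a monitored site returns that site's own curve. In a multivariate setting, one common route is to let the other pollutants' curves enter through cross-variograms (a cokriging-type extension), which this sketch does not model.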
Ranking coherence in Topic Models using Statistically Validated Networks
Probabilistic topic models have become one of the most widespread machine learning techniques in textual analysis. Topic discovery is an unsupervised process that does not guarantee the interpretability of its output. Hence, the automatic evaluation of topic coherence has attracted the interest of many researchers over the last decade, and it remains an open research area. The present article offers a new quality evaluation method based on Statistically Validated Networks (SVNs). The proposed probabilistic approach represents each topic as a weighted network of its most probable words. The presence of a link between each pair of words is assessed by statistically validating their co-occurrence in sentences against the null hypothesis of random co-occurrence. The proposed method makes it possible to distinguish between high-quality and low-quality topics by means of a battery of statistical tests. The statistically significant pairwise associations of words represented by the links in the SVN can reasonably be expected to be closely related to the semantic coherence and interpretability of a topic. Therefore, the more connected the network, the more coherent the topic in question. We demonstrate the effectiveness of the method through an analysis of a real text corpus, which shows that the proposed measure is more correlated with human judgement than the state-of-the-art coherence measures.
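The link-validation step can be sketched as follows, under simplifying assumptions: each sentence is treated as a set of tokens, the co-occurrence count of two words is tested against a hypergeometric null (random placement of the two words across sentences), and a Bonferroni correction is applied over all tested pairs. The density of validated links is used here as a hypothetical coherence score; the article's actual battery of tests and final measure may differ, and the names `hypergeom_sf` and `svn_topic_coherence` are illustrative.

```python
from itertools import combinations
from math import comb


def hypergeom_sf(k, n_total, n_a, n_b):
    """P(X >= k) for X ~ Hypergeometric(n_total, n_a, n_b): the null of random co-occurrence."""
    terms = (comb(n_a, x) * comb(n_total - n_a, n_b - x) for x in range(k, min(n_a, n_b) + 1))
    return sum(terms) / comb(n_total, n_b)


def svn_topic_coherence(sentences, top_words, alpha=0.01):
    """Validate co-occurrence links among a topic's top words; return (links, density)."""
    docs = [set(s) for s in sentences]
    n_total = len(docs)
    pairs = list(combinations(top_words, 2))
    threshold = alpha / len(pairs)  # Bonferroni correction over all tested pairs
    links = []
    for a, b in pairs:
        n_a = sum(a in d for d in docs)
        n_b = sum(b in d for d in docs)
        k = sum(a in d and b in d for d in docs)
        if k > 0 and hypergeom_sf(k, n_total, n_a, n_b) < threshold:
            links.append((a, b, k))  # co-occurrence count k kept as a link weight
    return links, len(links) / len(pairs)


# Hypothetical toy corpus: each sentence is a list of tokens
corpus = ([["air", "pollution", "pm10", "levels"]] * 10
          + [["kemeny", "ranking", "consensus"]] * 10
          + [["wheat", "nitrogen", "weeds"]] * 20)

coherent_links, coherent_density = svn_topic_coherence(corpus, ["air", "pollution", "pm10"])
mixed_links, mixed_density = svn_topic_coherence(corpus, ["air", "kemeny", "wheat"])
```

In this toy corpus the words of the first topic always appear together, so every pair is validated and the network is complete; the second topic mixes words that never share a sentence, so its network has no links.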
Element weighted Kemeny distance for ranking data
Preference data are a particular type of ranking data that arise when n individuals express their preferences over a finite set of items. Within this framework, the main issue concerns aggregating the preferences to identify a compromise or “consensus”, defined as the ranking closest (i.e. with the minimum distance or maximum correlation) to the whole set of preferences. Many approaches have been proposed, but they are not sensitive to the importance of items: changing the rank of a highly relevant element should result in a higher penalty than changing the rank of a negligible one. The goal of this paper is to investigate the consensus between rankings while taking into account the importance of items (element weights). For this purpose, we present: i) an element weighted rank correlation coefficient τ_ew as an extension of Emond and Mason's τ, and ii) an element weighted rank distance d_ew as an extension of the Kemeny distance d. The one-to-one correspondence between the weighted distance and the rank correlation coefficient is proved analytically. Moreover, a procedure to obtain the consensus ranking among n individuals is described, and its performance is studied both by simulation and by application to real datasets.
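For rankings without ties, the unweighted Kemeny distance adds 2 for every pair of items ordered differently, and Emond and Mason's τ (often written τ_x) satisfies τ = 1 − 2d/(n(n − 1)). The sketch below computes both and finds a consensus by brute force over all permutations, which is feasible only for a handful of items. The optional `weights` argument multiplies each pair's penalty by w_i·w_j; this is a hypothetical weighting chosen to illustrate that moving an important item should cost more, not the definition of d_ew from the paper.

```python
from itertools import combinations, permutations


def kemeny_distance(r1, r2, weights=None):
    """Kemeny distance between two rankings without ties.

    r1[i], r2[i] are the positions (1 = best) of item i. Each discordant pair
    contributes |a_ij - b_ij| = 2. If `weights` is given, each pair is scaled by
    weights[i] * weights[j] (a HYPOTHETICAL weighting, not the paper's d_ew).
    """
    total = 0.0
    for i, j in combinations(range(len(r1)), 2):
        a = 1 if r1[i] < r1[j] else -1
        b = 1 if r2[i] < r2[j] else -1
        w = 1.0 if weights is None else weights[i] * weights[j]
        total += w * abs(a - b)
    return total


def tau_x(r1, r2):
    """Emond and Mason's tau for rankings without ties (equals Kendall's tau here)."""
    n = len(r1)
    return 1.0 - 2.0 * kemeny_distance(r1, r2) / (n * (n - 1))


def consensus_ranking(rankings, weights=None):
    """Brute-force consensus: the permutation minimising the summed distance to all rankings."""
    n = len(rankings[0])
    best, best_cost = None, float("inf")
    for perm in permutations(range(1, n + 1)):
        cost = sum(kemeny_distance(perm, r, weights) for r in rankings)
        if cost < best_cost:
            best, best_cost = perm, cost
    return best, best_cost


# Hypothetical preferences of three individuals over three items
votes = [(1, 2, 3), (1, 2, 3), (2, 1, 3)]
consensus, cost = consensus_ranking(votes)
```

With weights (3, 1, 1), swapping the two unimportant items costs 2, while swapping the important first item with the second costs 6, which is the qualitative behaviour the element weights are meant to capture.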
Clustering alternatives in preference-approvals via novel pseudometrics
Preference-approval structures combine preference rankings and approval voting for declaring opinions over a set of alternatives. In this paper, we propose a new procedure for clustering alternatives in order to reduce the complexity of the preference-approval space and provide a more accessible interpretation of data. To that end, we present a new family of pseudometrics on the set of alternatives that take into account voters' preferences via preference-approvals. To obtain clusters, we use the Ranked k-medoids (RKM) partitioning algorithm, which takes as input the similarities between pairs of alternatives based on the proposed pseudometrics. Finally, the clusters are represented in two-dimensional space using non-metric multidimensional scaling.
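As an illustration only, the sketch below builds a hypothetical pseudometric between alternatives (the average, over voters, of the normalised rank gap plus an indicator of approval disagreement), which is not one of the article's pseudometrics, and clusters the alternatives with plain Voronoi-iteration k-medoids on the resulting dissimilarity matrix. Standard k-medoids is swapped in for Ranked k-medoids (RKM), whose update rules are not reproduced here; the function names are illustrative.

```python
import numpy as np


def hypothetical_pseudometric(ranks, approvals):
    """Average over voters of normalised rank gap plus approval disagreement.

    ranks[v][a]: position of alternative a for voter v (1 = best);
    approvals[v][a]: 1 if voter v approves alternative a, else 0.
    """
    ranks = np.asarray(ranks, dtype=float)
    approvals = np.asarray(approvals, dtype=float)
    n_alt = ranks.shape[1]
    rank_gap = np.abs(ranks[:, :, None] - ranks[:, None, :]) / (n_alt - 1)
    approval_gap = np.abs(approvals[:, :, None] - approvals[:, None, :])
    return (rank_gap + approval_gap).mean(axis=0)


def k_medoids(D, init, max_iter=100):
    """Voronoi-iteration k-medoids on a precomputed dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    medoids = [int(m) for m in init]
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)  # nearest medoid for each alternative
        new_medoids = []
        for c in range(len(medoids)):
            members = np.flatnonzero(labels == c)
            if members.size == 0:  # possible with pseudometrics (zero distances)
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total dissimilarity to its cluster
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids.append(int(members[np.argmin(costs)]))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels


# Hypothetical data: 3 voters, 4 alternatives; each voter approves their top two
ranks = [[1, 2, 3, 4], [2, 1, 4, 3], [1, 2, 4, 3]]
approvals = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0]]
D = hypothetical_pseudometric(ranks, approvals)
medoids, labels = k_medoids(D, init=[0, 2])
```

Since a pseudometric may assign zero distance to distinct alternatives, a cluster can end up empty; the sketch then keeps its previous medoid. The two-dimensional display by non-metric multidimensional scaling is omitted.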
Nitrogen uptake and nitrogen fertilizer recovery in old and modern wheat genotypes grown in the presence or absence of interspecific competition
Choosing genotypes with a high capacity for taking up nitrogen (N) from the soil and the ability to compete efficiently with weeds for this nutrient is essential to increasing the sustainability of cropping systems that are less dependent on auxiliary inputs. This research aimed to verify whether differences exist in N uptake and N fertilizer recovery capacity among wheat genotypes and, if so, whether these differences are related to differences in the genotypes' competitive ability against weeds. To this end, 12 genotypes, varying widely in morphological traits and year of release, were grown in the presence or absence of interspecific competition (using Avena sativa L. as a surrogate weed). The isotopic tracer ¹⁵N was used to measure the fertilizer N uptake efficiencies of the wheat genotypes and the weed. A field experiment, arranged in a split-plot design with four replications, was conducted during two consecutive growing seasons in a typical Mediterranean environment. In the absence of interspecific competition, few differences in either total N uptake (range: 98–112 kg N ha⁻¹) or the ¹⁵N fertilizer recovery fraction (range: 30.0–36.7%) were observed among the wheat genotypes. Compared with competitor-free conditions, competition reduced grain yield (49%), total N uptake (29%), and the ¹⁵N fertilizer recovery fraction (32%); these reductions were on average markedly greater in modern varieties than in old ones. Both biomass and grain yield reductions were strongly related to the biomass of the competitor (correlation coefficients > 0.95), which ranged from 135 to 573 g m⁻². Variations in both grain and biomass yield due to interspecific competition were significantly correlated with percentage of soil cover and leaf area at tillering, plant height at heading, and total N uptake, highlighting that the ability to take up N from the soil played a role in determining the genotypes' differing competitive abilities against the weed.
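The abstract does not spell out the ¹⁵N calculations. A common isotope-dilution formulation, assumed here, computes the fraction of plant N derived from fertilizer (Ndff) as the ratio of ¹⁵N atom% excess in the plant to that in the fertilizer, multiplies it by total N uptake to get fertilizer N in the plant, and divides by the N rate applied to obtain the recovery fraction. The 0.3663 atom% natural-abundance background is an assumed default, all input values are hypothetical (not data from the study), and the function names are illustrative.

```python
NATURAL_ABUNDANCE = 0.3663  # atom% 15N in atmospheric N2 (assumed background)


def atom_excess(atom_percent, background=NATURAL_ABUNDANCE):
    """Atom% 15N excess above the background."""
    return atom_percent - background


def fertilizer_recovery(total_n_uptake, plant_atom_pct, fert_atom_pct, n_applied):
    """Return (Ndff fraction, fertilizer N in plant [kg N/ha], recovery fraction [%])."""
    ndff = atom_excess(plant_atom_pct) / atom_excess(fert_atom_pct)  # N derived from fertilizer
    fert_n_in_plant = total_n_uptake * ndff
    recovery_pct = 100.0 * fert_n_in_plant / n_applied
    return ndff, fert_n_in_plant, recovery_pct


# Hypothetical values, not data from the study
ndff, fert_n, recovery = fertilizer_recovery(
    total_n_uptake=100.0, plant_atom_pct=1.3663, fert_atom_pct=5.3663, n_applied=60.0)
```

In field studies the background is sometimes taken from plants grown without labelled fertilizer rather than the atmospheric value; the `background` argument of `atom_excess` accommodates that choice.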
- …