Simple Alcohols with the Lowest Normal Boiling Point Using Topological Indices
We identify simple saturated alcohols with a given number of carbon atoms and
the minimal normal boiling point. The boiling point is predicted with a
weighted sum of the generalized first Zagreb index, the second Zagreb index,
the Wiener index for vertex-weighted graphs, and a simple index that accounts
for the degree of the carbon atom incident to the hydroxyl group. To find
extremal alcohol molecules, we characterize the chemical trees of a given
order that minimize the sum of the second Zagreb index and the generalized
first Zagreb index, and we also construct the chemical trees that minimize the
Wiener index over all chemical trees with given vertex weights.
Comment: 22 pages, 5 figures, accepted in 2014 by MATCH Commun. Math. Comput. Chem.
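The topological indices in this weighted sum are easy to compute directly from the molecular graph. A minimal sketch in plain Python, using an illustrative adjacency-list representation of the carbon skeleton (not code from the paper):

```python
from collections import deque

def zagreb_indices(adj):
    """First Zagreb index M1 = sum of squared vertex degrees;
    second Zagreb index M2 = sum over edges of deg(u)*deg(v)."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    m1 = sum(d * d for d in deg.values())
    m2 = sum(deg[u] * deg[v] for u in adj for v in adj[u] if u < v)
    return m1, m2

def wiener_index(adj):
    """Wiener index W = sum of shortest-path distances over all unordered
    vertex pairs (BFS from every vertex; fine for small molecular graphs)."""
    total = 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(dist.values())
    return total // 2          # each pair was counted from both ends

# Carbon skeleton of butane: a path on 4 vertices.
butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(zagreb_indices(butane))  # (10, 8)
print(wiener_index(butane))    # 10
```

For the path on 4 vertices the degrees are 1, 2, 2, 1, giving M1 = 1+4+4+1 = 10 and M2 = 2+4+2 = 8, which the code reproduces.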
Principal Polynomial Analysis
This paper presents a new framework for manifold learning based on a sequence
of principal polynomials that capture the possibly nonlinear nature of the
data. The proposed Principal Polynomial Analysis (PPA) generalizes PCA by
modeling the directions of maximal variance by means of curves, instead of
straight lines. Contrary to previous approaches, PPA reduces to performing
simple univariate regressions, which makes it computationally feasible and
robust. Moreover, PPA shows a number of interesting analytical properties.
First, PPA is a volume-preserving map, which in turn guarantees the existence
of the inverse. Second, this inverse can be obtained in closed form.
Invertibility is an important advantage over other learning methods, because it
makes it possible to interpret the identified features in the input domain,
where the data have physical meaning. Moreover, it allows the performance of
dimensionality reduction to be evaluated in sensible (input-domain) units.
Volume preservation also allows an easy computation of information-theoretic
quantities, such as the reduction in multi-information after the transform.
Third, the analytical nature of PPA leads to a clear geometrical
interpretation of the manifold: it allows the computation of Frenet-Serret
frames (local features) and of generalized curvatures at any point of the
space. Fourth, the analytical Jacobian allows the computation of the metric
induced by the data, thus generalizing the Mahalanobis distance. These
properties are demonstrated theoretically and illustrated experimentally. The
performance of PPA is evaluated in dimensionality and redundancy reduction, on
both synthetic and real datasets from the UCI repository.
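The core computational step, bending a straight principal axis into a polynomial curve via a simple univariate regression, can be sketched with numpy (the function name and the toy parabola data are illustrative, not the authors' reference implementation):

```python
import numpy as np

def ppa_step(X, degree=2):
    """One PPA step: project on the leading principal direction, then fit a
    polynomial curve (a single univariate regression per remaining
    coordinate) that follows the data along that direction."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    v = Vt[0]                        # leading principal direction
    t = Xc @ v                       # 1-D coordinate along that direction
    R = Xc - np.outer(t, v)          # residual orthogonal to the direction
    V = np.vander(t, degree + 1)     # polynomial design matrix in t
    coef, *_ = np.linalg.lstsq(V, R, rcond=None)
    return t, R - V @ coef, coef     # residual left after the curve fit

rng = np.random.default_rng(0)
t0 = rng.uniform(-1, 1, 500)
X = np.c_[t0, t0**2] + 0.01 * rng.standard_normal((500, 2))  # noisy parabola
t, resid, _ = ppa_step(X, degree=2)
print(resid.std())   # far below the raw orthogonal spread: the curve fits
```

On this parabola a straight PCA axis leaves the quadratic structure in the residual, while the degree-2 curve reduces it to the noise level.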
Spatial and Temporal Diffusion of House Prices in the UK
This paper provides a method for analysing the spatial and temporal diffusion of shocks in a dynamic system. We use changes in real house prices across UK regions to illustrate its use. Adjustment to shocks involves both a region-specific and a spatial effect. Shocks to a dominant region, London, are propagated contemporaneously and spatially to other regions, which in turn impact on further regions with a delay. We allow for lagged effects to echo back to the dominant region. London in turn is influenced by international developments through its link to New York and other financial centers. It is shown that New York house prices have a direct effect on London house prices. We analyse the effect of shocks using generalised spatio-temporal impulse responses, which highlight the diffusion of shocks both over time (as with conventional impulse responses) and over space.
Keywords: house prices, cross-sectional dependence, spatial dependence
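A toy version of such spatio-temporal impulse responses can be sketched with an invented three-region first-order system (the coefficient matrix is illustrative only, not the paper's estimates): a unit shock to the dominant region spreads to its neighbours with a one-period delay and echoes back.

```python
import numpy as np

# Rows: region i's adjustment; columns: lagged regions feeding into it.
Phi = np.array([[0.6, 0.1, 0.0],    # region 0: the dominant region
                [0.3, 0.5, 0.1],    # region 1 responds to region 0's lag
                [0.1, 0.2, 0.5]])   # region 2, further away in space

def impulse_responses(Phi, shock_region, horizon):
    """Trace y_h = Phi^h e for h = 0..horizon: the response of every
    region to a one-off unit shock in shock_region."""
    e = np.zeros(Phi.shape[0])
    e[shock_region] = 1.0
    out = [e]
    for _ in range(horizon):
        out.append(Phi @ out[-1])
    return np.array(out)            # shape (horizon + 1, n_regions)

irf = impulse_responses(Phi, shock_region=0, horizon=8)
print(irf[:3].round(3))   # the shock reaches regions 1 and 2 with a delay
```

At horizon 0 only the shocked region moves; at horizon 1 the other regions respond through the spatial lag, which is the diffusion pattern the impulse responses are designed to display.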
Compositional data for global monitoring: the case of drinking water and sanitation
Introduction
At a global level, access to safe drinking water and sanitation has been monitored by the Joint Monitoring Programme (JMP) of WHO and UNICEF. The methods employed are based on analysis of data from household surveys and linear regression modelling of these results over time. However, there is evidence of non-linearity in the JMP data. In addition, the compositional nature of these data is not taken into consideration. This article seeks to address these two shortcomings in order to produce more accurate estimates.
Methods
We employed an isometric log-ratio transformation designed for compositional data. We applied linear and non-linear time regressions to both the original and the transformed data. Specifically, different modelling alternatives for non-linear trajectories were analysed, all of which are based on a generalized additive model (GAM).
Results and discussion
Non-linear methods, such as GAM, may be used for modelling non-linear trajectories in the JMP data. This projection method is particularly suited for data-rich countries. Moreover, the ilr transformation of compositional data is conceptually sound and fairly simple to implement. It improves the performance of both linear and non-linear regression models, particularly in the presence of extreme data points, i.e. when coverage rates are near either 0% or 100%.
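The ilr transform is indeed simple to implement. A sketch for a 3-part composition, assuming the standard sequential-binary-partition ilr (the coverage categories here are illustrative; the paper's actual partition may differ):

```python
import numpy as np

def ilr(x):
    """Map a composition (parts > 0 summing to 1) to unconstrained real
    coordinates: z_i = sqrt(i/(i+1)) * ln(g_i / x_{i+1}), where g_i is the
    geometric mean of the first i parts."""
    x = np.asarray(x, dtype=float)
    z = []
    for i in range(1, len(x)):
        g = np.exp(np.mean(np.log(x[:i])))  # geometric mean of first i parts
        z.append(np.sqrt(i / (i + 1)) * np.log(g / x[i]))
    return np.array(z)

def ilr_inv(z):
    """Invert: rebuild the log-contrasts, then close to sum 1."""
    n = len(z) + 1
    y = np.zeros(n)
    for i in range(1, n):
        # distribute each contrast between the first i parts and part i
        y[:i] += np.sqrt(1.0 / (i * (i + 1))) * z[i - 1]
        y[i] -= np.sqrt(i / (i + 1)) * z[i - 1]
    e = np.exp(y)
    return e / e.sum()

coverage = np.array([0.70, 0.20, 0.10])  # e.g. piped / other improved / unimproved
z = ilr(coverage)
print(z, ilr_inv(z))                     # round-trips to the composition
```

Because the ilr coordinates live in unconstrained real space, ordinary linear or GAM regressions can be applied to them without producing coverage estimates below 0% or above 100% after back-transformation.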
The influential effect of blending, bump, changing period and eclipsing Cepheids on the Leavitt law
The investigation of the non-linearity of the Leavitt law is a topic that
began more than seven decades ago, when some of the studies in this field found
that the Leavitt law has a break at about ten days. The goal of this work is to
investigate a possible statistical cause of this non-linearity. By applying
linear regressions to OGLE-II and OGLE-IV data, we find that, in order to
obtain the Leavitt law by linear regression, robust techniques that can deal
with influential points and/or outliers are needed instead of the ordinary
least-squares regression traditionally used. In particular, by using robust
regression estimators we firmly establish the linearity of the Leavitt law in
the Large Magellanic Cloud, without rejecting or excluding Cepheid data from
the analysis. This implies that Cepheids whose light curves suggest blending,
bumps, eclipses or period changes do not affect the Leavitt law for this
galaxy. For the SMC, when these Cepheids are included, it is not possible to
find an adequate model, probably due to the geometry of the galaxy; in that
case, a possible influence of these stars could exist.
Comment: 47 pages, 1 figure, 5 tables. Accepted for publication in ApJ.
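The mechanism at issue, robust downweighting of influential points versus ordinary least squares, can be illustrated with a Huber M-estimator fitted by iteratively reweighted least squares on synthetic period-luminosity data (a sketch only; the abstract's estimators are more refined, and the data here are invented, not OGLE's):

```python
import numpy as np

def huber_fit(x, y, k=1.345, n_iter=50):
    """Fit y = a + b*x by iteratively reweighted least squares with
    Huber weights w = min(1, k/|r/s|), s a robust (MAD) scale."""
    X = np.c_[np.ones_like(x), x]
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # start from OLS
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745         # MAD scale estimate
        u = np.abs(r) / max(s, 1e-12)
        w = np.minimum(1.0, k / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

rng = np.random.default_rng(1)
logP = rng.uniform(0.4, 1.8, 200)                 # toy log-periods
mag = 17.0 - 2.8 * logP + 0.05 * rng.standard_normal(200)
mag[:5] += 1.5                                    # a few "blended" outliers
ols = np.linalg.lstsq(np.c_[np.ones_like(logP), logP], mag, rcond=None)[0]
rob = huber_fit(logP, mag)
print("OLS slope %.3f, robust slope %.3f (true -2.8)" % (ols[1], rob[1]))
```

The handful of outliers pulls the ordinary least-squares line, while the Huber weights shrink their influence and recover a slope close to the true value, without ever deleting the outlying stars from the sample.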
A more accurate measurement of the Si lattice parameter
In 2011, a discrepancy between the values of the Planck constant measured by
counting Si atoms and by comparing mechanical and electrical powers prompted a
review, among others, of the measurement of the spacing of the ²⁸Si {220}
lattice planes, either to confirm the measured value and its uncertainty or to
identify errors. This exercise confirmed the result of the previous
measurement and yields an additional value with a reduced uncertainty.
Comment: 12 pages, 17 figures, 1 table, submitted to J. Phys. Chem. Ref. Data.
SISSO: a compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates
The lack of reliable methods for identifying descriptors - the sets of
parameters capturing the underlying mechanisms of a materials property - is one
of the key factors hindering efficient materials development. Here, we propose
a systematic approach for discovering descriptors for materials properties,
within the framework of compressed-sensing based dimensionality reduction.
SISSO (sure independence screening and sparsifying operator) tackles immense
and correlated feature spaces, and converges to the optimal solution from a
combination of features relevant to the materials' property of interest. In
addition, SISSO gives stable results even with small training sets. The
methodology is benchmarked with the quantitative prediction of the ground-state
enthalpies of octet binary materials (using ab initio data) and applied to the
showcase example of predicting the metal/insulator classification of binaries
(with experimental data). Accurate, predictive models are found in both cases.
For the metal/insulator classification model, the predictive capability is
tested beyond the training data: it rediscovers the available pressure-induced
insulator→metal transitions and allows for the prediction of yet-unknown
transition candidates, ripe for experimental validation. As a step forward
with respect to previous model-identification methods, SISSO can become an
effective tool for automatic materials development.
Comment: 11 pages, 5 figures, in press in Phys. Rev. Materials.
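The two ingredients named in the SISSO acronym can be sketched schematically in plain numpy, on linear toy features (real SISSO first constructs a huge nonlinear candidate-feature space and iterates the screening on residuals, which this toy version omits):

```python
import numpy as np
from itertools import combinations

def sis(X, y, n_keep):
    """Sure independence screening: rank features by absolute correlation
    with the target and keep the top n_keep."""
    Xs = (X - X.mean(0)) / X.std(0)
    score = np.abs(Xs.T @ (y - y.mean()))
    return np.argsort(score)[::-1][:n_keep]

def sparsifying_operator(X, y, idx, dim):
    """L0 step: exhaustive least squares over all dim-sized subsets of the
    screened features; return the subset with the smallest residual."""
    best, best_err = None, np.inf
    for subset in combinations(idx, dim):
        A = np.c_[np.ones(len(y)), X[:, list(subset)]]
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        err = np.linalg.norm(y - A @ coef)
        if err < best_err:
            best, best_err = subset, err
    return sorted(int(i) for i in best)

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 50))            # 50 candidate features
y = 2.0 * X[:, 3] - 1.5 * X[:, 17] + 0.01 * rng.standard_normal(100)
idx = sis(X, y, n_keep=10)
print(sparsifying_operator(X, y, idx, dim=2))  # recovers features [3, 17]
```

Screening makes the exhaustive L0 search tractable: instead of all pairs among 50 features, only pairs among the 10 screened candidates are fitted.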
When Does More Regularization Imply Fewer Degrees of Freedom? Sufficient Conditions and Counterexamples from Lasso and Ridge Regression
Regularization aims to improve prediction performance of a given statistical
modeling approach by moving to a second approach which achieves worse training
error but is expected to have fewer degrees of freedom, i.e., better agreement
between training and prediction error. We show here, however, that this
expected behavior does not hold in general. In fact, counterexamples are given
showing that regularization can increase the degrees of freedom in simple
situations, including lasso and ridge regression, the most common
regularization approaches in use. In such situations, the regularization
increases both the training error and the degrees of freedom, and is thus
inherently without merit. On the other hand, two important regularization
scenarios are described where the expected reduction in degrees of freedom is
indeed guaranteed: (a) all symmetric linear smoothers, and (b) linear
regression versus convex-constrained linear regression (as in the constrained
variants of ridge regression and lasso).
Comment: Main text: 15 pages, 2 figures; supplementary material is included at
the end of the main text: 9 pages, 7 figures.
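The guaranteed scenario (a) is easy to check numerically: ridge regression is a symmetric linear smoother whose degrees of freedom equal the trace of the hat matrix, trace(H_λ) = Σ dᵢ²/(dᵢ²+λ), which is monotone non-increasing in λ (a minimal numpy sketch of the guaranteed case, not of the counterexamples):

```python
import numpy as np

def ridge_df(X, lam):
    """Effective degrees of freedom of ridge regression:
    trace of H = X (X'X + lam I)^-1 X' = sum_i d_i^2 / (d_i^2 + lam)."""
    d = np.linalg.svd(X, compute_uv=False)   # singular values of X
    return float(np.sum(d**2 / (d**2 + lam)))

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 8))
lams = [0.0, 0.1, 1.0, 10.0, 100.0]
dfs = [ridge_df(X, l) for l in lams]
print([round(v, 3) for v in dfs])            # 8.0 at lam = 0, then decreasing
assert all(a >= b for a, b in zip(dfs, dfs[1:]))
```

At λ = 0 the fit is ordinary least squares with exactly 8 degrees of freedom (one per column), and every term dᵢ²/(dᵢ²+λ) shrinks as λ grows, so the trace can only decrease.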
Stochastic Time-Domain Mapping for Comprehensive Uncertainty Assessment in Eye Diagrams
The eye diagram is one of the most common tools used for quality assessment in high-speed links. This article proposes a method of predicting the shape of the inner eye for a link subject to uncertainties. The approach relies on machine learning regression and is tested on the very challenging example of a flexible link for smart textiles. Several sources of uncertainty are taken into account, related to both manufacturing tolerances and physical deformation. The resulting model is fast and accurate. It is also extremely versatile: rather than focusing on a specific metric derived from the eye diagram, its aim is to fully reconstruct the inner eye and enable designers to use it as they see fit. This article investigates the features and convergence of three alternative machine learning algorithms: single-output support vector machine regression, its least-squares variant, and vector-valued kernel ridge regression. The latter method is arguably the most promising, resulting in an accurate, fast and robust tool enabling a complete parametric stochastic map of the eye diagram.
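The vector-valued kernel ridge regression idea, one shared kernel system solved once for all sampled points of the inner-eye contour, can be sketched as follows (an RBF kernel and a toy smooth target standing in for the sampled contour; not the article's data, kernel, or tuning):

```python
import numpy as np

def rbf(A, B, gamma):
    """Gaussian (RBF) kernel matrix between two sample sets."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit(X, Y, gamma=1.0, lam=1e-3):
    """Vector-valued kernel ridge: solve (K + lam I) alpha = Y once;
    alpha holds one coefficient column per output dimension."""
    K = rbf(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), Y)

def krr_predict(Xtr, alpha, Xte, gamma=1.0):
    return rbf(Xte, Xtr, gamma) @ alpha

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, (200, 2))    # uncertain link parameters (toy)
t = np.linspace(0, np.pi, 16)       # 16 "samples" of an inner-eye contour
Y = np.sin(X[:, :1] * t) + 0.1 * X[:, 1:] * np.cos(t)
alpha = krr_fit(X, Y)
rmse = np.sqrt(((Y - krr_predict(X, alpha, X)) ** 2).mean())
print("train RMSE:", rmse)
```

Because the kernel matrix is shared across outputs, the factorization cost is paid once regardless of how finely the contour is sampled, which is one reason this variant can reconstruct the full inner eye cheaply.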