2,427 research outputs found

    Simple Alcohols with the Lowest Normal Boiling Point Using Topological Indices

    Full text link
    We find simple saturated alcohols with a given number of carbon atoms and the minimal normal boiling point. The boiling point is predicted by a weighted sum of the generalized first Zagreb index, the second Zagreb index, the Wiener index for vertex-weighted graphs, and a simple index accounting for the degree of the carbon atom incident to the hydroxyl group. To find extremal alcohol molecules, we characterize the chemical trees of order n that minimize the sum of the second Zagreb index and the generalized first Zagreb index, and also construct chemical trees that minimize the Wiener index over all chemical trees with given vertex weights. (22 pages, 5 figures; accepted in 2014 by MATCH Commun. Math. Comput. Chem.)
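    The indices named above are standard graph invariants. As an illustration (not code from the paper), the sketch below computes the first Zagreb index, the second Zagreb index and the plain (unweighted) Wiener index of a tree given as an edge list, using breadth-first search for the distances.

```python
from collections import deque

def zagreb_and_wiener(n, edges):
    """First Zagreb index, second Zagreb index and Wiener index
    of a tree on vertices 0..n-1 given as an edge list."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    deg = [len(a) for a in adj]
    m1 = sum(d * d for d in deg)                 # first Zagreb: sum of squared degrees
    m2 = sum(deg[u] * deg[v] for u, v in edges)  # second Zagreb: sum over edges
    total = 0
    for s in range(n):                           # BFS from every vertex
        dist = [-1] * n
        dist[s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(dist)
    return m1, m2, total // 2                    # each pair was counted twice

# path on 4 vertices (butane's carbon skeleton): degrees 1, 2, 2, 1
print(zagreb_and_wiener(4, [(0, 1), (1, 2), (2, 3)]))  # -> (10, 8, 10)
```

    The paper's predictor additionally weights vertices and singles out the hydroxyl-bearing carbon; this sketch only shows the underlying index computations.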

    Principal Polynomial Analysis

    Full text link
    This paper presents a new framework for manifold learning based on a sequence of principal polynomials that capture the possibly nonlinear nature of the data. The proposed Principal Polynomial Analysis (PPA) generalizes PCA by modeling the directions of maximal variance with curves instead of straight lines. In contrast to previous approaches, PPA reduces to performing simple univariate regressions, which makes it computationally feasible and robust. Moreover, PPA has a number of interesting analytical properties. First, PPA is a volume-preserving map, which in turn guarantees the existence of the inverse. Second, this inverse can be obtained in closed form. Invertibility is an important advantage over other learning methods, because it permits interpreting the identified features in the input domain, where the data have physical meaning, and it allows evaluating the performance of dimensionality reduction in sensible (input-domain) units. Volume preservation also allows easy computation of information-theoretic quantities, such as the reduction in multi-information after the transform. Third, the analytical nature of PPA leads to a clear geometrical interpretation of the manifold: it allows the computation of Frenet-Serret frames (local features) and of generalized curvatures at any point of the space. Fourth, the analytical Jacobian allows the computation of the metric induced by the data, thus generalizing the Mahalanobis distance. These properties are demonstrated theoretically and illustrated experimentally. The performance of PPA is evaluated in dimensionality and redundancy reduction, on both synthetic and real datasets from the UCI repository.
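    As a rough illustration of the core idea (not the authors' implementation), the sketch below performs one PPA-style step on 2-D data: it finds the leading PCA direction, projects the data onto it, and fits a quadratic polynomial to the orthogonal residual as a function of the projection, so the first "component" is a curve rather than a line.

```python
import math

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x

def ppa_first_curve(X):
    """One PPA-style step in 2-D: leading PCA direction plus a quadratic
    fitted to the orthogonal residual (a curve, not a straight line)."""
    n = len(X)
    mx = sum(p[0] for p in X) / n
    my = sum(p[1] for p in X) / n
    cxx = sum((p[0] - mx) ** 2 for p in X) / n
    cyy = sum((p[1] - my) ** 2 for p in X) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in X) / n
    # leading eigenvector of the 2x2 covariance matrix
    lam = 0.5 * (cxx + cyy) + 0.5 * math.hypot(cxx - cyy, 2 * cxy)
    if abs(cxy) > 1e-12:
        vx, vy = cxy, lam - cxx
    else:
        vx, vy = (1.0, 0.0) if cxx >= cyy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    t = [(p[0] - mx) * vx + (p[1] - my) * vy for p in X]     # scores on the PC
    r = [-(p[0] - mx) * vy + (p[1] - my) * vx for p in X]    # orthogonal residual
    # least-squares fit r ~ a + b t + c t^2 via the normal equations
    S = [sum(ti ** k for ti in t) for k in range(5)]
    A = [[S[0], S[1], S[2]], [S[1], S[2], S[3]], [S[2], S[3], S[4]]]
    rhs = [sum(r),
           sum(ri * ti for ri, ti in zip(r, t)),
           sum(ri * ti ** 2 for ri, ti in zip(r, t))]
    return solve3(A, rhs)   # coefficients (a, b, c)

# points on the gentle parabola y = 0.1 x^2: the fit recovers the curvature
X = [(x, 0.1 * x * x) for x in (-2, -1, 0, 1, 2)]
coef = ppa_first_curve(X)
```

    For these points the leading direction is the x-axis and the fitted curvature coefficient comes out as 0.1, which a straight PCA line cannot represent.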

    Spatial and Temporal Diffusion of House Prices in the UK

    Get PDF
    This paper provides a method for analysing the spatial and temporal diffusion of shocks in a dynamic system. We use changes in real house prices within the UK economy at the regional level to illustrate its use. Adjustment to shocks involves both a region-specific and a spatial effect. Shocks to a dominant region - London - are propagated contemporaneously and spatially to other regions; these in turn impact on further regions with a delay, and we allow for lagged effects to echo back to the dominant region. London in turn is influenced by international developments through its link to New York and other financial centers; it is shown that New York house prices have a direct effect on London house prices. We analyse the effect of shocks using generalised spatio-temporal impulse responses, which highlight the diffusion of shocks both over time (as with conventional impulse responses) and over space. (Keywords: house prices, cross-sectional dependence, spatial dependence.)

    Compositional data for global monitoring: the case of drinking water and sanitation

    Get PDF
    Introduction: At the global level, access to safe drinking water and sanitation has been monitored by the Joint Monitoring Programme (JMP) of WHO and UNICEF. The methods employed are based on the analysis of household-survey data and linear regression modelling of the results over time. However, there is evidence of non-linearity in the JMP data, and the compositional nature of these data is not taken into consideration. This article addresses these two shortcomings in order to produce more accurate estimates. Methods: We employed an isometric log-ratio (ilr) transformation designed for compositional data, and applied linear and non-linear time regressions to both the original and the transformed data. Specifically, different modelling alternatives for non-linear trajectories were analysed, all based on a generalized additive model (GAM). Results and discussion: Non-linear methods such as GAM may be used for modelling non-linear trajectories in the JMP data; this projection method is particularly suited to data-rich countries. Moreover, the ilr transformation of compositional data is conceptually sound and fairly simple to implement. It improves the performance of both linear and non-linear regression models, especially in the presence of extreme data points, i.e. when coverage rates are near 0% or 100%.
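    For concreteness, here is a minimal sketch of one common ilr construction, the pivot coordinates (the article does not state which ilr basis it uses), applied to a hypothetical three-part water-access composition.

```python
import math

def ilr(x):
    """Isometric log-ratio transform of a composition (all parts > 0).
    Pivot coordinates: coordinate k contrasts the geometric mean of the
    first k parts against part k+1."""
    D = len(x)
    out = []
    for k in range(1, D):
        gm = math.exp(sum(math.log(v) for v in x[:k]) / k)  # geometric mean of first k parts
        out.append(math.sqrt(k / (k + 1)) * math.log(gm / x[k]))
    return out

# hypothetical 3-part composition: piped water / other improved / unimproved
z = ilr([0.6, 0.3, 0.1])   # two unconstrained real coordinates
```

    A linear or GAM trend can then be fitted to the ilr coordinates and mapped back to shares, which keeps projected coverage rates inside the 0-100% range by construction.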

    The influential effect of blending, bump, changing period and eclipsing Cepheids on the Leavitt law

    Full text link
    The investigation of the non-linearity of the Leavitt law began more than seven decades ago, when some studies in this field found that the Leavitt law has a break at about ten days. The goal of this work is to investigate a possible statistical cause of this non-linearity. By applying linear regressions to OGLE-II and OGLE-IV data, we find that, in order to obtain the Leavitt law by linear regression, robust techniques for dealing with influential points and/or outliers are needed instead of the ordinary least-squares regression traditionally used. In particular, using M- and MM-regressions we firmly establish the linearity of the Leavitt law in the Large Magellanic Cloud, without rejecting or excluding Cepheid data from the analysis. This implies that light curves of Cepheids exhibiting blending, bumps, eclipses or period changes do not affect the Leavitt law for this galaxy. For the SMC, when these kinds of Cepheids are included, it is not possible to find an adequate model, probably due to the geometry of the galaxy; in that case, a possible influence of these stars could exist. (47 pages, 1 figure, 5 tables; accepted for publication in ApJ.)
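    As a toy illustration of why robustness matters here, the sketch below fits a straight line by iteratively reweighted least squares with Huber weights - a basic M-estimator. (The paper's MM-estimator adds a high-breakdown initial fit and scale step, omitted here; a plain M-estimator like this handles vertical outliers but not high-leverage ones.)

```python
def huber_line_fit(x, y, k=1.345, iters=30):
    """Straight-line fit by iteratively reweighted least squares with
    Huber weights (a basic M-estimator).  Residuals are rescaled each
    round by a MAD-based robust scale estimate."""
    n = len(x)
    a, b = 0.0, 0.0                       # intercept, slope
    for _ in range(iters):
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        s = sorted(abs(ri) for ri in r)[n // 2] / 0.6745 or 1.0   # robust scale
        w = [1.0 if abs(ri) <= k * s else k * s / abs(ri) for ri in r]
        sw = sum(w)
        sx = sum(wi * xi for wi, xi in zip(w, x))
        sy = sum(wi * yi for wi, yi in zip(w, y))
        sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        det = sw * sxx - sx * sx          # weighted normal equations
        a, b = (sy * sxx - sx * sxy) / det, (sw * sxy - sx * sy) / det
    return a, b

# y = 2x + 1 with one gross outlier in the middle of the x-range
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 30, 7, 9, 11]                 # the third point should read 5
a, b = huber_line_fit(xs, ys)
```

    Ordinary least squares on these data gives a slope of about 1.29; the robust fit downweights the influential point and stays close to the true intercept 1 and slope 2.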

    A more accurate measurement of the ^{28}Si lattice parameter

    Full text link
    In 2011, a discrepancy between the values of the Planck constant measured by counting Si atoms and by comparing mechanical and electrical powers prompted a review of, among other things, the measurement of the spacing of the ^{28}Si {220} lattice planes, either to confirm the measured value and its uncertainty or to identify errors. This exercise confirmed the result of the previous measurement and yields the additional value d_{220} = 192014711.98(34) am with a reduced uncertainty. (12 pages, 17 figures, 1 table; submitted to J. Phys. Chem. Ref. Data.)

    SISSO: a compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates

    Full text link
    The lack of reliable methods for identifying descriptors - the sets of parameters capturing the underlying mechanisms of a material's property - is one of the key factors hindering efficient materials development. Here, we propose a systematic approach for discovering descriptors for materials properties within the framework of compressed-sensing-based dimensionality reduction. SISSO (sure independence screening and sparsifying operator) tackles immense and correlated feature spaces, and converges to the optimal solution from a combination of features relevant to the materials property of interest. In addition, SISSO gives stable results even with small training sets. The methodology is benchmarked on the quantitative prediction of the ground-state enthalpies of octet binary materials (using ab initio data) and applied to the showcase example of predicting the metal/insulator classification of binaries (with experimental data). Accurate, predictive models are found in both cases. For the metal/insulator classification model, the predictive capability is tested beyond the training data: it rediscovers the available pressure-induced insulator-to-metal transitions and allows the prediction of yet-unknown transition candidates, ripe for experimental validation. As a step forward with respect to previous model-identification methods, SISSO can become an effective tool for automatic materials development. (11 pages, 5 figures; in press in Phys. Rev. Materials.)
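    The two ingredients in the acronym can be shown with a deliberately tiny sketch (not the actual SISSO code, which builds huge nonlinear feature spaces and multi-dimensional descriptors): sure independence screening ranks candidate features by correlation with the target, and the sparsifying operator then performs an exhaustive least-squares search over the screened subset.

```python
def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (su * sv)

def rss1(f, y):
    """Residual sum of squares of the best 1-D linear fit y ~ a + b f."""
    n = len(f)
    mf, my = sum(f) / n, sum(y) / n
    sff = sum((a - mf) ** 2 for a in f)
    syy = sum((b - my) ** 2 for b in y)
    sfy = sum((a - mf) * (b - my) for a, b in zip(f, y))
    return syy - sfy * sfy / sff

def sis_so(features, y, k=2):
    """SIS: keep the k features most correlated with the target.
    SO: exhaustive least-squares search over the screened set."""
    ranked = sorted(features, key=lambda name: -abs(pearson(features[name], y)))
    screened = ranked[:k]
    best = min(screened, key=lambda name: rss1(features[name], y))
    return screened, best

y = [1, 2, 3, 4, 10]
feats = {"f1": [1, 2, 3, 4, 10],   # perfectly correlated with y
         "f2": [1, 2, 3, 4, 5],    # strongly but imperfectly correlated
         "f3": [3, 1, 4, 1, 5]}    # weakly correlated
screened, best = sis_so(feats, y, k=2)
```

    Screening cheaply discards "f3"; the exhaustive search over the survivors then picks the exact descriptor "f1". Real SISSO iterates this with residuals to build descriptors of dimension two and higher.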

    When Does More Regularization Imply Fewer Degrees of Freedom? Sufficient Conditions and Counterexamples from Lasso and Ridge Regression

    Full text link
    Regularization aims to improve the prediction performance of a given statistical modeling approach by moving to a second approach that achieves worse training error but is expected to have fewer degrees of freedom, i.e., better agreement between training and prediction error. We show here, however, that this expected behavior does not hold in general. In fact, we give counterexamples showing that regularization can increase the degrees of freedom in simple situations, including lasso and ridge regression, the most common regularization approaches in use. In such situations, the regularization increases both training error and degrees of freedom, and is thus inherently without merit. On the other hand, two important regularization scenarios are described where the expected reduction in degrees of freedom is indeed guaranteed: (a) all symmetric linear smoothers, and (b) linear regression versus convex-constrained linear regression (as in the constrained variants of ridge regression and lasso). (Main text: 15 pages, 2 figures; supplementary material included at the end of the main text: 9 pages, 7 figures.)
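    For scenario (a), ridge regression with a fixed penalty is a symmetric linear smoother, and its effective degrees of freedom have the classical closed form df(lam) = sum_i d_i^2 / (d_i^2 + lam), where the d_i are the singular values of the design matrix; this is monotonically decreasing in the penalty, as the guarantee requires. A minimal sketch with hypothetical singular values:

```python
def ridge_df(singular_values, lam):
    """Effective degrees of freedom of ridge regression with penalty lam:
    df(lam) = sum_i d_i^2 / (d_i^2 + lam), d_i = design singular values."""
    return sum(d * d / (d * d + lam) for d in singular_values)

d = [3.0, 2.0, 1.0]          # hypothetical singular values
print(ridge_df(d, 0.0))      # lam = 0: df equals the number of predictors, 3
print(ridge_df(d, 1.0))      # shrinks below 3
print(ridge_df(d, 10.0))     # shrinks further: monotone in lam
```

    The paper's counterexamples live outside this fixed-penalty setting, which is precisely the point of the sufficient conditions.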

    Stochastic Time-Domain Mapping for Comprehensive Uncertainty Assessment in Eye Diagrams

    Get PDF
    The eye diagram is one of the most common tools used for quality assessment in high-speed links. This article proposes a method for predicting the shape of the inner eye for a link subject to uncertainties. The approach relies on machine-learning regression and is tested on the very challenging example of a flexible link for smart textiles. Several sources of uncertainty are taken into account, related to both manufacturing tolerances and physical deformation. The resulting model is fast and accurate. It is also extremely versatile: rather than focusing on a specific metric derived from the eye diagram, its aim is to fully reconstruct the inner eye and enable designers to use it as they see fit. This article investigates the features and convergence of three alternative machine-learning algorithms: single-output support vector machine regression, its least-squares variant, and vector-valued kernel ridge regression. The latter method is arguably the most promising, resulting in an accurate, fast and robust tool enabling a complete parametric stochastic map of the eye diagram.
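    Kernel ridge regression itself is compact enough to sketch from scratch. The toy below is single-output with an RBF kernel (the article's vector-valued variant predicts all inner-eye coordinates jointly, and the data here are an arbitrary smooth stand-in, not link measurements).

```python
import math

def rbf(u, v, gamma=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-gamma * (u - v) ** 2)

def solve(A, b):
    """Gaussian elimination with partial pivoting for a dense system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def krr_fit(xs, ys, alpha=1e-3, gamma=1.0):
    """Kernel ridge regression: solve (K + alpha*I) c = y, return predictor."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j], gamma) + (alpha if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    c = solve(K, ys)
    return lambda x: sum(ci * rbf(x, xi, gamma) for ci, xi in zip(c, xs))

# fit a smooth 1-D response (a stand-in for one coordinate of the inner eye)
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.sin(x) for x in xs]
f = krr_fit(xs, ys)
```

    Once trained, evaluating the predictor is a single weighted kernel sum, which is why such surrogates are fast enough to map the inner eye over many uncertainty samples.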