26 research outputs found

    Subsampling and aggregation: a solution to the scalability problem in distance based prediction for mixed-type data

    The distance-based linear model (DB-LM) extends classical linear regression to mixed-type predictors, or to settings where the only available information is a distance matrix between regressors (as sometimes happens with big data). The main drawback of these DB methods is their computational cost, particularly that of the eigendecomposition of the Gram matrix. In this context, ensemble regression techniques provide a useful alternative to fitting the model to the whole sample. This work analyzes the performance of three subsampling and aggregation techniques in DB regression on two large real datasets. We also analyze, via simulations, the performance of bagging and DB logistic regression in the classification problem with mixed-type features and large sample sizes. A. Baíllo is supported by the Spanish MCyT grant PID2019-109387GB-I00.
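
As a rough illustration of the subsample-and-aggregate idea described in this abstract, the sketch below fits a base predictor on several random subsamples and averages the predictions. It is not the DB-LM: a k-nearest-neighbour predictor driven only by a user-supplied distance is swapped in as a stand-in, and the names `knn_predict` and `subsample_and_aggregate`, the toy data, and all parameter values are hypothetical.

```python
import random
from statistics import mean


def knn_predict(train, query, dist, k=3):
    """Average response of the k training points closest to `query` under `dist`.

    `train` is a list of (predictor, response) pairs. This is a stand-in base
    learner (an assumption), not the distance-based linear model itself.
    """
    nearest = sorted(train, key=lambda row: dist(row[0], query))[:k]
    return mean(y for _, y in nearest)


def subsample_and_aggregate(data, query, dist,
                            n_subsamples=20, subsample_size=50, seed=0):
    """Fit the base predictor on random subsamples and average the predictions."""
    rng = random.Random(seed)
    size = min(subsample_size, len(data))
    predictions = [
        knn_predict(rng.sample(data, size), query, dist)
        for _ in range(n_subsamples)
    ]
    return mean(predictions)


# Toy data (hypothetical): y = 2x on a regular grid of 200 points.
data = [(x / 10, 2 * x / 10) for x in range(200)]
estimate = subsample_and_aggregate(data, 5.0, lambda a, b: abs(a - b))
```

Each base fit only touches `subsample_size` observations, which is the property that makes this kind of scheme attractive when a full-sample eigendecomposition is too expensive.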

    Visualizing Profiles of Large Datasets of Weighted and Mixed Data

    This work provides a procedure with which to construct and visualize profiles, i.e., groups of individuals with similar characteristics, for weighted and mixed data by combining two classical multivariate techniques, multidimensional scaling (MDS) and the k-prototypes clustering algorithm. The well-known drawback of classical MDS in large datasets is circumvented by selecting a small random sample of the dataset, whose individuals are clustered by means of an adapted version of the k-prototypes algorithm and mapped via classical MDS. Gower’s interpolation formula is used to project the remaining individuals onto the previous configuration. Throughout the process, Gower’s distance is used to measure the proximity between individuals. The methodology is illustrated on a real dataset, obtained from the Survey of Health, Ageing and Retirement in Europe (SHARE), which was carried out in 19 countries and represents over 124 million aged individuals in Europe. The performance of the method was evaluated through a simulation study, whose results indicate that the new proposal overcomes the high computational cost of classical MDS with low error. This research was funded by the Spanish Ministry of Economy and Competitiveness, grant number MTM2014-56535-R; and the V Regional Plan for Scientific Research and Technological Innovation 2016-2020 of the Community of Madrid, an agreement with Universidad Carlos III de Madrid in the action of "Excellence for University Professors".
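
To make the role of Gower's distance for mixed data more concrete, here is a minimal sketch of a Gower-type dissimilarity between two records. It assumes equal variable weights, uses the plain "one minus similarity" form, and treats binary variables like any other categorical variable; the weighted version used in the paper may differ. The function name `gower_distance`, the `kinds` and `ranges` arguments, and the example records are all hypothetical.

```python
def gower_distance(a, b, kinds, ranges):
    """Gower-type dissimilarity between two mixed-type records.

    kinds[i] is "num" or "cat"; ranges[i] is the observed range of numeric
    variable i (ignored for categorical variables). Equal weights are an
    assumption made for this sketch.
    """
    total = 0.0
    for x, y, kind, span in zip(a, b, kinds, ranges):
        if kind == "num":
            # Range-normalised absolute difference, in [0, 1].
            total += abs(x - y) / span if span else 0.0
        else:
            # Simple matching: 0 if the categories agree, 1 otherwise.
            total += 0.0 if x == y else 1.0
    return total / len(kinds)


# Hypothetical records: (age, sex, income).
kinds = ("num", "cat", "num")
ranges = (40, None, 40000)
first = (60, "F", 20000)
second = (70, "M", 30000)
d = gower_distance(first, second, kinds, ranges)  # (0.25 + 1 + 0.25) / 3
```

A matrix of such pairwise values on the small random sample is the kind of input that classical MDS would then take.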

    Smart visualization of mixed data

    In this work, we propose a new protocol that integrates robust classification and visualization techniques to analyze mixed data. This protocol is based on the combination of the Forward Search Distance-Based (FS-DB) algorithm (Grané, Salini, and Verdolini 2020) and robust clustering. The resulting groups are visualized via MDS maps and characterized through an analysis of several graphical outputs. The methodology is illustrated on a real dataset combining European COVID-19 numerical health data with the policy and restriction measures adopted during the 2020-2021 COVID-19 pandemic across the EU Member States. The results show similarities among countries in terms of incidence and the management of the emergency across several waves of the disease. The proposed methodology thus provides new smart visualization tools for analyzing mixed data.

    Outliers, GARCH-type models and risk measures: a comparison of several approaches

    accounting for outlier effects by robust estimation. The main conclusions of the simulation study are that the presence of outliers biases these risk measures, and that the proposal by Grane and Veiga (2010) provides the highest bias reduction. From the out-of-sample results for four international stock market indexes we found weak evidence that more complex models (specification and error distribution) perform better in estimating the minimum capital risk requirements during the last global financial crisis. (C) 2014 Elsevier B.V. All rights reserved. Financial support from research grants MTM2010-17323, ECO2009-08100 and ECO2012-32401 (Spanish Ministries of Science and Innovation and Economy and Competitiveness).

    Visualizing health and well-being inequalities among older Europeans

    Financial support from research project MTM2014-56535-R by the Spanish Ministry of Economy and Competitiveness. This paper uses data from SHARE Wave 6 (https://doi.org/10.6103/SHARE.w6.710); see Börsch-Supan et al. (2013) for methodological details. The SHARE data collection has been funded by the European Commission through FP5 (QLK6-CT-2001-00360), FP6 (SHARE-I3: RII-CT-2006-062193, COMPARE: CIT5-CT-2005-028857, SHARELIFE: CIT4-CT-2006-028812), FP7 (SHARE-PREP: GA N°211909, SHARE-LEAP: GA N°227822, SHARE M4: GA N°261982) and Horizon 2020 (SHARE-DEV3: GA N°676536, SERISS: GA N°654221) and by DG Employment, Social Affairs and Inclusion. Additional funding from the German Ministry of Education and Research, the Max Planck Society for the Advancement of Science, the U.S. National Institute on Aging (U01-AG09740-13S2, P01-AG005842, P01-AG08291, P30-AG12815, R21-AG025169, Y1-AG-4553-01, IAG-BSR06-11, OGHA-04-064, HHSN271201300071C) and from various national funding sources is gratefully acknowledged (see www.share-project.org).

    Visualizing Inequality in Health and Socioeconomic Wellbeing in the EU: Findings from the SHARE Survey

    This article belongs to the Special Issue The Economics of Caring. The main objective of this paper is to visualize profiles of older Europeans to better understand differing levels of dependency across Europe. Data come from wave 6 of the Survey of Health, Ageing and Retirement in Europe (SHARE), carried out in 18 countries and representing over 124 million aged individuals in Europe. Using the information of around 30 mixed-type variables, we design four composite indices of wellbeing for each respondent: self-perception of health, physical health and nutrition, mental agility, and level of dependency. Next, by implementing the k-prototypes clustering algorithm, profiles are created by combining those indices with a collection of socio-economic and demographic variables about the respondents. Five profiles are established that segment the dataset from the individuals least at risk to those most at risk in terms of health and socio-economic wellbeing. The methodology we propose is general enough to be extended to other surveys or disciplines. This research was funded by the Spanish Ministry of Economy and Competitiveness, grant number MTM2014-56535-R.

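    La eficacia de la reparación a la víctima en el proceso penal a través de las indemnizaciones. Un estudio de campo en la Comunidad de Madrid [The effectiveness of reparation to victims in criminal proceedings through compensation. A field study in the Community of Madrid]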

    This paper presents the results of a study on how effectively the compensation awarded to victims in criminal convictions is enforced. A field study was carried out on the enforcement files (ejecutorias) of the Criminal Courts (Juzgados de lo Penal) and the Provincial Court (Audiencia Provincial) of Madrid. The main objective is to assess the effectiveness of financial reparation to victims. The field study was conducted in two stages, October 2015 and October 2016, and the target population comprises all enforcement files in the Community of Madrid from 2012 to 2015, excluding minor offences, traffic offences and gender-violence offences, as well as cases in which there is no victim. The main finding is that, for the most part, the compensation awarded in criminal judgments is not paid by those convicted: although the courts set high compensation amounts, both in the Criminal Courts and in the Provincial Court, most victims and abused persons receive less than €300 in compensation. Moreover, financial aid from the Spanish Administration to victims is almost non-existent. Other relevant findings are as follows. First, for those who end up collecting at least part of the compensation, the time between the commission of the offence and payment is about 5 years on average. Second, compensation is more likely to be paid when the convicted person does not go to prison. Third, civil liability insurance mechanisms are generally not used. Finally, precautionary measures are adopted during the investigation phase in only 15% of cases. All of these situations reflect a breach of European standards of justice regarding reparation to victims.

    Profile identification via weighted related metric scaling : an application to dependent Spanish children

    AMS subject classification: 62-07, 62-09, 62H20, 62H99, 62P05. Disability and dependency (lack of autonomy in performing common everyday actions) affect health status and quality of life, and are therefore significant public health issues. The main purpose of this study is to establish the relationship between different variables (continuous, categorical and binary) relating to children between 3 and 6 years old and their functional dependence in basic activities of daily living. We combine different types of information via weighted related metric scaling to obtain homogeneous profiles for dependent Spanish children. The redundant information between groups of variables is modeled with an interaction parameter that can be optimized according to several criteria. In this paper, the goal is to obtain maximum explained variability in a Euclidean configuration. Data come from the Survey about Disabilities, Personal Autonomy and Dependence Situations, EDAD 2008 (Spanish National Institute of Statistics, 2008). This work has been partially supported by Spanish grant MTM2010-17323 (Spanish Ministry of Science and Innovation).

    Constructing a Children's Subjective Well-Being Index: an Application to Socially Vulnerable Spanish Children

    It is well known that traditional economic measures such as household income appear to play less of a role in explaining children's subjective well-being than adults'. This paper focuses on the construction of a children's well-being index taking into account subjective and emotional factors, such as children's experiences of material deprivation and bullying, the quality of relationships with family and peers, the quality of services in their neighbourhood and personal well-being. The index is constructed from principal component analysis and rescaled to 0-100% for better interpretation. Data come from a survey run in Spain in 2016 by the largest humanitarian organization involved in social programs in the country, covering socially vulnerable children aged 8-11, with around 2,900 respondents. The main findings are: (i) bullying makes the difference between children being moderately or completely dissatisfied with their lives; (ii) there is not a single Spanish region reaching satisfactory well-being levels across all the components of the index. The methodology proposed for the construction of the index is general enough to be applied to the general child population, regardless of social vulnerability or even country, provided the questionnaire is adapted appropriately. Financial support from research project MTM2014-56535-R by the Spanish Ministry of Economy and Competitiveness.

    Assessing scale-wise similarity of curves with a thick pen: As illustrated through comparisons of spectral irradiance

    Forest canopies create dynamic light environments in their understorey, where spectral composition changes among patterns of shade and sunflecks, and through the seasons with canopy phenology and sun angle. Plants use spectral composition as a cue to adjust their growth strategy for optimal resource use. Quantifying the ever-changing nature of the understorey light environment is technically challenging with respect to data collection. Thus, to capture the simultaneous variation occurring in multiple regions of the solar spectrum, we recorded spectral irradiance from forest understoreys over the wavelength range 300-800 nm using an array spectroradiometer. It is also methodologically challenging to analyze solar spectra because of their multi-scale nature and multivariate lay-out. To compare spectra, we therefore used a novel method termed thick pen transform (TPT), which is simple and visually interpretable. This enabled us to show that sunlight position in the forest understorey (i.e., shade, semi-shade, or sunfleck) was the most important factor in determining shape similarity of spectral irradiance. Likewise, the contributions of stand identity and time of year could be distinguished. Spectra from sunflecks were consistently the most similar, irrespective of differences in global irradiance. On average, the degree of cross-dependence increased with increasing scale, sometimes shifting from negative (dissimilar) to positive (similar) values. We conclude that the interplay of sunlight position, stand identity, and date cannot be ignored when quantifying and comparing spectral composition in forest understoreys. Technological advances mean that array spectroradiometers, which can record spectra contiguously over very short time intervals, are being widely adopted, not only to measure irradiance under pollution, clouds, atmospheric changes, and in biological systems, but also spectral changes at small scales in the photonics industry. 
We consider that TPT is an appl... TMR was supported by the Academy of Finland through funding decisions #266523 and #304519. AJ and AG were partially supported by project MTM2014-56535-R of the Spanish Ministry of Economy and Competitiveness. We thank Lammi Biological Research Station of the University of Helsinki, Research Coordinator John Loehr and Director Janne Sundell, for logistical and practical support.
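
The thick pen idea in this abstract can be sketched as follows, under stated assumptions: a square pen of thickness `tau` traces upper and lower boundaries over each curve, and a pointwise association measure compares the overlap of the two pen areas with their joint span, giving values in (-1, 1]. The function names `thick_pen` and `tpma`, the scaling factor `gamma`, and the toy curves are hypothetical; any normalisation of the spectra that the authors apply beforehand is not reproduced here.

```python
def thick_pen(series, tau, gamma=1.0):
    """Upper and lower boundaries of a square pen of thickness `tau` (tau >= 1)."""
    upper, lower = [], []
    for t in range(len(series) - tau):
        window = series[t:t + tau + 1]
        upper.append(max(window) + gamma * tau / 2)
        lower.append(min(window) - gamma * tau / 2)
    return upper, lower


def tpma(x, y, tau, gamma=1.0):
    """Pointwise thick pen association between two curves at scale `tau`.

    Positive values mean the two pen areas overlap (similar shape at this
    scale); negative values mean there is a gap between them.
    """
    ux, lx = thick_pen(x, tau, gamma)
    uy, ly = thick_pen(y, tau, gamma)
    return [
        (min(a, c) - max(b, d)) / (max(a, c) - min(b, d))
        for a, b, c, d in zip(ux, lx, uy, ly)
    ]


# Toy curves (hypothetical): same shape, shifted far apart vertically.
x = [0, 1, 0, 1]
y = [10, 11, 10, 11]
same = tpma(x, x, tau=2)   # identical curves: pen areas coincide
apart = tpma(x, y, tau=1)  # distant curves: pen areas do not overlap
```

Increasing `tau` thickens both pens, so curves that are far apart at small scales can start to overlap at larger ones, which is consistent with the sign changes described in the abstract.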