3 research outputs found

    Kernel-based Information Criterion

    This paper introduces the Kernel-based Information Criterion (KIC) for model selection in regression analysis. The novel kernel-based complexity measure in KIC efficiently computes the interdependency between parameters of the model using a variable-wise variance and yields selection of better, more robust regressors. Experimental results show superior performance on both simulated and real data sets compared to Leave-One-Out Cross-Validation (LOOCV), kernel-based Information Complexity (ICOMP), and the maximum log marginal likelihood in Gaussian Process Regression (GPR).
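
    The abstract does not spell out KIC's complexity term, but it does name the baselines the criterion is compared against. The sketch below (Python, scikit-learn) illustrates two of those baselines on a toy dataset: choosing an RBF bandwidth for kernel ridge regression by Leave-One-Out Cross-Validation, and choosing it by the log marginal likelihood of a Gaussian process with a fixed kernel. The dataset and candidate bandwidths are illustrative assumptions, not taken from the paper.

    # A minimal sketch of two baselines named above (LOOCV for kernel ridge
    # regression and the GP log marginal likelihood), not KIC itself, whose
    # complexity term is not given in the abstract. Data and bandwidths are toy choices.
    import numpy as np
    from sklearn.kernel_ridge import KernelRidge
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(60, 1))
    y = np.sin(X).ravel() + 0.1 * rng.standard_normal(60)
    widths = [0.1, 0.5, 1.0, 2.0]

    # Baseline 1: pick the bandwidth that minimises the LOOCV squared error.
    loocv_error = {
        w: -cross_val_score(
            KernelRidge(kernel="rbf", gamma=1.0 / (2 * w**2), alpha=1e-3),
            X, y, cv=LeaveOneOut(), scoring="neg_mean_squared_error",
        ).mean()
        for w in widths
    }

    # Baseline 2: pick the bandwidth whose fixed-kernel GP has the highest
    # log marginal likelihood (optimizer=None keeps the length scale fixed).
    lml = {
        w: GaussianProcessRegressor(kernel=RBF(length_scale=w), alpha=1e-2,
                                    optimizer=None).fit(X, y).log_marginal_likelihood_value_
        for w in widths
    }

    print(min(loocv_error, key=loocv_error.get), max(lml, key=lml.get))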

    Information Criteria and Effective Feature Size Estimation for Data with Inherent Dependencies

    In model selection, one must select a model from a set of candidate models based on some observed data. The model should fit the data well, but without being overly complex, since excessive complexity prevents the model from generalizing well to unseen data. Information criteria are widely used model selection methods: they estimate a score for each candidate model and use that score to make a selection. A common way of estimating such a score rewards the candidate model for its goodness of fit on the observed data and penalizes it for its complexity. Many popular information criteria, such as Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC), penalize model complexity by the feature dimension. However, in a non-standard setting with inherent dependencies, these criteria are prone to over-penalizing model complexity. Motivated by this tendency, we evaluate AIC and BIC on a multi-target setting with correlated features. We compare AIC and BIC with the Fisher Information Criterion (FIC), a criterion that takes correlations amongst features into consideration and does not penalize model complexity solely by the feature dimension of the candidate model. We evaluate the feature selection and predictive performance of the three information criteria in a linear regression setting with correlated features, measuring the precision, recall and F1 score of the feature set each criterion selects against the feature set of the generative model. Under this setting's assumptions, FIC yields the best results, compared with AIC and BIC, in both feature selection and predictive performance. Finally, using FIC's properties for feature selection, we derive a formulation that approximates the effective feature dimension of models with correlated features in linear regression settings.
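
    As a concrete illustration of the AIC/BIC side of this comparison, the Python sketch below scores every feature subset of a small linear regression problem with correlated features using the standard Gaussian-likelihood forms AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, then reports precision, recall and F1 of the selected feature set against the generative one. FIC itself is not implemented here, since its exact form is not given in this abstract; the data-generating process is an assumed toy setup.

    # AIC/BIC subset selection on correlated features, with precision/recall/F1
    # of the selected support against the generative one. Illustrative data only.
    import itertools
    import numpy as np
    from sklearn.metrics import precision_recall_fscore_support

    rng = np.random.default_rng(0)
    n, p = 200, 6
    cov = 0.8 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # AR(1)-style correlation
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    true_support = np.array([1, 1, 0, 0, 1, 0], dtype=bool)             # generative feature set
    y = X[:, true_support] @ np.array([2.0, -1.5, 1.0]) + rng.standard_normal(n)

    def gaussian_ic(X_sub, y):
        """Return (AIC, BIC) for an OLS fit with a Gaussian likelihood."""
        beta, rss, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
        rss = float(rss[0]) if rss.size else float(np.sum((y - X_sub @ beta) ** 2))
        k = X_sub.shape[1] + 1                      # coefficients plus noise variance
        loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
        return 2 * k - 2 * loglik, k * np.log(n) - 2 * loglik

    subsets = [s for r in range(1, p + 1) for s in itertools.combinations(range(p), r)]
    for name, idx in (("AIC", 0), ("BIC", 1)):
        chosen = min(subsets, key=lambda s: gaussian_ic(X[:, list(s)], y)[idx])
        selected = np.isin(np.arange(p), chosen)
        prec, rec, f1, _ = precision_recall_fscore_support(true_support, selected,
                                                           average="binary")
        print(f"{name}: features={chosen} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")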

    Unit-level models for generating poverty maps at the subnational level

    There is growing interest in statistics on various population groups at a high level of geographic disaggregation, which generally exceeds the capacity of household surveys to provide representative information at those levels. Small area estimation (SAE) is a set of statistical techniques for estimating parameters at the subnational level, aimed at improving the quality of direct survey-based estimates when the disaggregation does not meet the quality criteria required for publication. This document presents the methodology and results of applying a unit-level small area estimation model with nested errors and expansion factors to estimate poverty indicators at the provincial level in Peru, the communal level in Chile and the municipal level in Colombia. By using population censuses as a source of auxiliary information, the model improves the precision of the indicators of interest in geographic areas where surveys do not achieve adequate representativeness.
    Contents: Abstract -- Introduction -- I. Data sources -- II. Theoretical elements of the Pseudo-EBP methodology -- III. Error estimation in the EBP model -- IV. Criteria for model selection and validation of model assumptions -- V. Results and poverty maps -- VI. Conclusions
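
    As a rough illustration of the modelling building block described above, the Python sketch below fits a unit-level nested-error regression (a random intercept per area) on simulated survey data with statsmodels, then combines the fixed effects with census-style area means and the predicted area intercepts to produce synthetic area predictions. It is not the Pseudo-EBP estimator with expansion factors used in the document; all data, covariates and variable names are assumptions for illustration.

    # Unit-level nested-error model: y_ij = x_ij'beta + u_i + e_ij, with a random
    # intercept u_i per area. Survey and "census" data below are simulated placeholders.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    areas = np.repeat(np.arange(20), 30)                 # 20 areas, 30 sampled households each
    area_effect = rng.normal(0, 0.3, size=20)[areas]
    survey = pd.DataFrame({
        "area": areas,
        "educ": rng.normal(8, 2, size=areas.size),       # auxiliary covariate (years of schooling)
    })
    survey["log_income"] = (1.0 + 0.15 * survey["educ"]
                            + area_effect + rng.normal(0, 0.5, size=areas.size))

    # Fit the nested-error model on the survey sample.
    result = smf.mixedlm("log_income ~ educ", survey, groups=survey["area"]).fit()

    # Census auxiliary information: area-level means of the covariate (simulated here).
    census = pd.DataFrame({"area": np.arange(20), "educ": rng.normal(8, 0.5, size=20)})

    # Synthetic area predictions: fixed effects applied to census means, plus the
    # predicted random intercept of each area.
    fe = result.params
    u = np.array([result.random_effects[a].iloc[0] for a in census["area"]])
    census["pred_log_income"] = fe["Intercept"] + fe["educ"] * census["educ"] + u
    print(census.head())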