1,411 research outputs found

    Aggregative quantification for regression

    The final publication is available at Springer via http://dx.doi.org/10.1007/s10618-013-0308-z
    The problem of estimating the class distribution (or prevalence) for a new unlabelled dataset (from a possibly different distribution) is a very common problem which has been addressed in one way or another in the past decades. This problem has been recently reconsidered as a new task in data mining, renamed quantification when the estimation is performed as an aggregation (and possible adjustment) of a single-instance supervised model (e.g., a classifier). However, the study of quantification has been limited to classification, while it is clear that this problem also appears, perhaps even more frequently, with other predictive problems, such as regression. In this case, the goal is to determine a distribution or an aggregated indicator of the output variable for a new unlabelled dataset. In this paper, we introduce a comprehensive new taxonomy of quantification tasks, distinguishing between the estimation of the whole distribution and the estimation of some indicators (summary statistics), for both classification and regression. This distinction is especially useful for regression, since predictions are numerical values that can be aggregated in many different ways, as in multi-dimensional hierarchical data warehouses. We focus on aggregative quantification for regression and see that the approaches borrowed from classification do not work. We present several techniques based on segmentation which are able to produce accurate estimations of the expected value and the distribution of the output variable. We show experimentally that these methods especially excel for the relevant scenarios where training and test distributions dramatically differ.
    We would like to thank the anonymous reviewers for their careful reviews, insightful comments and very useful suggestions.
This work was supported by the MEC/MINECO projects CONSOLIDER-INGENIO CSD2007-00022 and TIN 2010-21062-C02-02, GVA project PROMETEO/2008/051, the COST-European Cooperation in the field of Scientific and Technical Research IC0801 AT, and the REFRAME project granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences & Technologies ERA-Net (CHIST-ERA), and funded by the Ministerio de Economia y Competitividad in Spain.
Bella Sanjuán, A.; Ferri Ramírez, C.; Hernández Orallo, J.; Ramírez Quintana, MJ. (2014). Aggregative quantification for regression. Data Mining and Knowledge Discovery. 28(2):475-518. https://doi.org/10.1007/s10618-013-0308-z

    Description of the female of Poeciloxestia plagiata (Waterhouse, 1880), and of the male of Iuati spinithorax Martins & Galileo, 2010 (Coleoptera, Cerambycidae)

    Based on material recently collected in Peru (Loreto department), the female of Poeciloxestia plagiata (Waterhouse, 1880) and the male of Iuati spinithorax Martins & Galileo, 2010 are described and figured for the first time. Both species are recorded for the first time from Peru.

    Modeling, Simulation, and Control of Steam Generation Processes

    This chapter describes a modeling methodology that provides the main characteristics of a simulation tool for analyzing the steady state, transient operation, and control of steam generation processes, such as heat recovery steam generators (HRSG). The methodology follows a modular strategy that considers individual heat exchangers (economizers, evaporators, and superheaters), drum tanks, and control systems. The modular strategy consists of developing a numerical modeling tool that integrates sub-models based upon first-principles equations of mass, energy, and momentum balance. The main heat transfer mechanisms characterize the dynamics of steam generation systems during normal and abnormal operations, including the response of key process variables such as vapor pressure, temperature, and mass flow rate. Other important variables are gas temperature, fluid temperature, drum pressure, drum liquid level, and mass flow rate at each module. These variables are usually analyzed against the design-predicted performance of real industrial equipment such as HRSG systems. Finally, two case studies applying the modeling strategy are provided to show the effectiveness and utility of the methodology.

    Who does what the cardiologist recommends? Psychosocial markers of unhealthy behavior in coronary disease patients

    Patients diagnosed with coronary heart disease should follow lifestyle recommendations that can reduce their cardiovascular risk (e.g., avoid smoking). However, some patients fail to follow these recommendations and engage in unhealthy behavior. To identify psychosocial factors that characterize patients at high risk of repeated cardiovascular events, we investigated the relationship between social support, mental health (coping, self-esteem, and perceived stress), and unhealthy behavior. We conducted a cross-sectional study of 419 patients recently diagnosed with coronary heart disease (myocardial infarction or angina) who participated in the National Health Survey in Spain (2018). Unhealthy behaviors were defined according to the European Guidelines on cardiovascular disease prevention. Only 1% of patients reported no unhealthy behaviors, with 11% reporting one, 40% two, 35% three, and 13% four or more unhealthy behaviors. In multiple regression controlling for demographic and traditional risk factors, mental health was the only significant psychosocial factor, doubling the odds of accumulated unhealthy behaviors, OR(high vs. low) = 2.03, 95% CI [1.14, 3.64]. Mental health was especially strongly related to unhealthy behavior among patients with obesity, OR(high vs. low) = 3.50, 95% CI [1.49, 8.45]. The relationship between mental health and unhealthy behaviors suggests that a large proportion of patients may fail to adhere to lifestyle recommendations not because they purposefully choose to do so, but because they lack the coping skills to maintain the recommended healthy behaviors. Low mental well-being may be especially detrimental for behavior change in patients with obesity.
    Dafina Petrova is supported by a Juan de la Cierva Fellowship (FJCI-2016-28279) from the Spanish Ministry of Economy, Industry, and Competitiveness. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

    G20 α-stable portfolios: Empirical evidence with Markowitz, Tobin and CAPM

    Objective: This research extends the Markowitz, Tobin, and CAPM optimal portfolios with α-stable processes. Methodology: The following procedures are performed on a portfolio with the G20 stock indices: 1) descriptive statistics and α-stable parameters of index returns are estimated, 2) a goodness-of-fit test is applied to validate the α-stable processes, 3) the covariation matrix is estimated to calculate the optimal portfolio assignments, and 4) the systematic risk indicators are estimated. Results: The efficient frontier is calculated without short sales and shows that α-stable portfolios present greater aversion to risk than Gaussian portfolios, and that α-stable portfolios are more efficient with respect to the return and risk ratio. Recommendations: Apply α-stable processes to model leptokurtosis, asymmetry, and volatility clusters. Limitations: The α-stable multivariate analysis presents different stability parameters. Originality: G20 returns are modeled with α-stable processes and a sensitivity analysis is performed. Conclusion: α-stable analysis allows market risk to be quantified more adequately than Gaussian analysis.

    The ethics of care and bioethics in the quality of nursing care

    Nursing professionals' knowledge of ethics and bioethics constitutes the systematic principles of human conduct in the field of life and health sciences, which seek compliance with moral and humanistic principles in order to gain further knowledge and adequately provide the necessary care to healthy or sick people. Nursing is socially recognized as a service profession, oriented toward helping, serving, and caring for health, with responsibility for the patient and their care. The quality of nursing care is based on bioethical principles focused on promoting health, preventing disease, restoring health, and alleviating suffering. Objective. This work seeks to reflect on the contributions of the ethics of care and bioethics to the quality of nursing care, allowing an analysis of the limitations and scientific contributions of bioethics in nursing. Materia

    Comparison of two synthesis methods on the preparation of Fe, N-Co-doped TiO2 materials for degradation of pharmaceutical compounds under visible light

    In this work, we report the synthesis, characterization and photocatalytic evaluation of a visible-light-active iron-nitrogen co-doped titanium dioxide (Fe3+-TiO2-xNx) nanostructured catalyst. Fe3+-TiO2-xNx was synthesized using two different chemical approaches: sol-gel (SG) and microwave (MW) methods. The materials were fully characterized using several techniques (SEM, UV–Vis diffuse reflectance spectroscopy (DRS), X-ray diffraction (XRD), and X-ray photoelectron spectroscopy (XPS)). The photocatalytic activity of the nanostructured materials synthesized by both methods was evaluated for the degradation of amoxicillin (AMX), streptomycin (STR) and diclofenac (DCF) in aqueous solution. Higher degradation efficiencies were found for the materials synthesized by the SG method; for instance, degradation efficiencies of 58.61% (SG) and 46.12% (MW) were observed for AMX after 240 min of photocatalytic treatment under visible light at pH 3.5. For STR, removal efficiencies of 49.67% (SG) and 39.90% (MW) were obtained at pH 8. Degradation efficiencies increased with longer treatment periods: after 300 min of photocatalytic treatment under visible light, AMX reached a degradation efficiency of 69.15% (MW) at pH 3.5, DCF 72.3% (MW) at pH 5, and STR 58.49% (MW) at pH 8.

    The Parrandas of central Cuba: a resource for the diversification of the tourist offer and local development

    Introduction: The Parrandas of Central Cuba, declared Intangible Cultural Heritage of Humanity, first emerged in 1820 in Remedios and are festivities celebrated by eighteen towns in three provinces of the country: Villa Clara, Sancti Spíritus, and Ciego de Ávila. Of the eighteen parrandas, only that of San Juan de los Remedios, being their birthplace, is exploited as an attraction. This is evidenced by the background of the present research, which analyzes only the tourist potential of the festivity of the Eighth Villa of Cuba and not that of the rest of the parrandas. Objective: The objective of the research is to demonstrate the potential of the Parrandas of Central Cuba as a resource for diversifying the tourism offering and for local development. Methodology: The methods employed include theoretical methods, such as the analysis and synthesis of the documents that underpin the research, and empirical methods, such as direct observation and interviews with the local population and with officials from various institutions linked to intangible heritage. Results: The inclusion of these festivities in the tourism offering would yield positive results, such as increased income for the host population, reflected in the development of handicrafts and culinary culture, as well as improvements to the infrastructure of these territories. To achieve this, it was necessary to clarify the distances to the main established destinations in the region and the access routes; information channels were created to publicize this tradition, given the current influence of social media; and guidance was given on calculating an appropriate carrying capacity to avoid negative consequences related to cultural tourism. Conclusion: This research serves as a precedent for the development of "La Ruta de las Parrandas" as a future integrated product of the Central Region of Cuba. General area of study: Tourism. Specific area of study: Cultural Tourism.
