Nonlinear Canonical correspondence Analysis: Description of the data of Coffee
The formulation of coffee blends is of paramount importance for the coffee industry, as it strengthens a product's ability to compete in the market and adds sensory attributes that complement the consumption experience. Redundancy analysis and canonical correspondence analysis make it possible to study the relationships between a set of sensory notes and a set of blends with different proportions of coffee varieties through multivariate linear regression models. However, it is unrealistic to assume that sensory responses depend linearly on the formulation of the blends, since some coffee species carry greater weight in the sensory evaluation (quadratic terms) and the mixtures themselves have an effect (interaction terms). With this motivation, this work proposes nonlinear versions of redundancy analysis and canonical correspondence analysis, based on multivariate polynomial regression, to evaluate the acceptance of coffee blends of different varieties according to the scores given by the evaluators. The polynomial models explained a larger percentage of the total variance than the classic models.
New orders of 2^k factorial designs generated by simulated annealing adapted to optimality criteria
Excessive changes in factor levels can be costly in practice and hinder the conduct of experiments, in addition to raising the computational cost and causing loss of the orthogonality property, which leads to numerical problems when estimating the parameters of a model. Specifying the experimental points sequentially, seen as treatments in a 2^k factorial design, results in a large order bias in some factors, caused by the accumulation of −1 or +1 signs. This study aimed to propose new designs generated by the simulated annealing technique that respect the A-optimality and D-optimality criteria and whose execution orders minimize the order bias. This approach allowed the generation of 2^4 and 2^5 factorials, which were compared to the designs in standard order. Simulated annealing is a viable method for generating A-optimal and D-optimal designs that are as efficient as the usual designs and whose new execution orders reduce the effect of order bias relative to standard order designs. Regarding efficiency, the variance of the model parameter estimates in the generated designs was similar to that of the original designs.
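The idea of reordering the runs of a 2^k design can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' algorithm: the cost function (a linear time-trend contamination of each main effect plus a weighted count of factor level changes), the weight `w`, the cooling schedule, and all function names are assumptions made here. Because reordering runs only permutes the rows of the model matrix X, the matrix X'X, and with it the A- and D-criteria of a model without time terms, is unchanged by the search.

```python
import math
import random


def standard_order(k):
    """Runs of a 2^k full factorial in standard order (first factor changes fastest)."""
    return [tuple(1 if (run >> j) & 1 else -1 for j in range(k))
            for run in range(2 ** k)]


def level_changes(design):
    """Total number of factor level changes between consecutive runs."""
    return sum(a != b
               for r1, r2 in zip(design, design[1:])
               for a, b in zip(r1, r2))


def trend_bias(design):
    """Sum over factors of |sum_t x_tj * (t - mean(t))|.

    Measures how strongly each main-effect column is aligned with a
    linear drift over run order (an assumed proxy for order bias).
    """
    n, k = len(design), len(design[0])
    centre = (n - 1) / 2
    return sum(abs(sum(row[j] * (t - centre) for t, row in enumerate(design)))
               for j in range(k))


def cost(design, w=1.0):
    """Assumed objective: trend bias plus w times the number of level changes."""
    return trend_bias(design) + w * level_changes(design)


def anneal(design, steps=20000, t0=10.0, cooling=0.9995, seed=0):
    """Simulated annealing over run orders using random pairwise swaps."""
    rng = random.Random(seed)
    current = list(design)
    cur_cost = cost(current)
    best, best_cost = list(current), cur_cost
    temp = t0
    for _ in range(steps):
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        new_cost = cost(current)
        # Always accept improvements; accept worse orders with Metropolis probability.
        if new_cost <= cur_cost or rng.random() < math.exp((cur_cost - new_cost) / temp):
            cur_cost = new_cost
            if cur_cost < best_cost:
                best, best_cost = list(current), cur_cost
        else:
            current[i], current[j] = current[j], current[i]  # undo the swap
        temp *= cooling
    return best, best_cost


std = standard_order(4)
best, best_cost = anneal(std)
print("standard order cost:", cost(std))
print("annealed order cost:", best_cost)
```

The returned order contains exactly the same 16 runs as the standard order, so only the execution sequence differs between the two designs.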
Semiparametric models for recurrent events: an application to breast cancer patients
The recurrence of an event in a patient is the observed frequency of that event over a period of follow-up, e.g. successive hospitalizations for pneumonia, episodes of epilepsy, or cancer relapses. Recurrent event models are well suited to these phenomena. This research illustrates and compares several models for recurrent event data without a random effect: Andersen and Gill (A-G); Wei, Lin and Weissfeld (WLW); and Prentice, Williams and Peterson (PWP). All three are extensions of the Cox proportional hazards model and assume independence of events. Another model studied is the Gamma shared frailty model for recurrent events, which includes a frailty term and assumes that this term influences the recurrence of events within the same subject. The parameters of the models without a random effect were estimated by maximum partial likelihood, while those of the frailty model were estimated by penalized maximum likelihood, which penalizes the baseline hazard function. The data were provided by the gynecologic oncologist Dr. Vladimir Villoslada Terrones of the Instituto Nacional de Enfermedades Neoplásicas (INEN, the National Institute of Neoplastic Diseases). They describe a set of variables related to breast cancer in a prospective cohort of 68 patients with a positive diagnosis who underwent mastectomy. The analysis found that the Andersen and Gill (A-G) and Prentice, Williams and Peterson (PWP) models fit this data set best. The factors associated with the risk of breast cancer recurrence were age at study entry, age at first menstruation (menarche), and lobular carcinoma type. These models give similar results owing to the statistical significance of the variables and compliance with the proportional hazards assumption.
Thesis. Universidad Nacional Agraria La Molina. Escuela de Posgrado. Maestría en Estadística Aplicad
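The A-G, PWP, and WLW models differ mainly in how each subject's history is laid out before a Cox-type model is fitted. The Python sketch below shows the commonly used data layouts for one subject; it is an illustration under assumptions, not the thesis's code. The function names, the choice of the gap-time variant of PWP, and the example subject (events at months 5 and 12, follow-up ending at month 20) are all hypothetical.

```python
def ag_rows(subject, event_times, followup_end):
    """Andersen-Gill counting-process rows: (id, start, stop, status)."""
    rows, start = [], 0
    for t in sorted(event_times):
        rows.append((subject, start, t, 1))  # interval ending in an event
        start = t
    if followup_end > start:
        rows.append((subject, start, followup_end, 0))  # censored tail
    return rows


def pwp_gap_rows(subject, event_times, followup_end):
    """PWP gap-time rows: (id, gap_length, status, stratum).

    The clock restarts after each event, and the stratum is the event number.
    """
    return [(sid, stop - start, status, k)
            for k, (sid, start, stop, status)
            in enumerate(ag_rows(subject, event_times, followup_end), start=1)]


def wlw_rows(subject, event_times, followup_end, max_events):
    """WLW marginal rows: (id, time_from_origin, status, stratum).

    The subject is treated as at risk for every event number from time 0.
    """
    times = sorted(event_times)
    rows = []
    for k in range(1, max_events + 1):
        if k <= len(times):
            rows.append((subject, times[k - 1], 1, k))
        else:
            rows.append((subject, followup_end, 0, k))  # k-th event not observed
    return rows


# Hypothetical subject: relapses at months 5 and 12, followed until month 20.
print(ag_rows(1, [5, 12], 20))
print(pwp_gap_rows(1, [5, 12], 20))
print(wlw_rows(1, [5, 12], 20, max_events=3))
```

In this layout A-G uses a common baseline hazard over all intervals, while PWP and WLW stratify by event number, which is one reason the three models can lead to different estimates on the same cohort.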
Non-linear regression models in the management of accumulated production of parchment coffee in Peru
Parchment coffee results from washing the coffee cherry, and its production has increased significantly in the coffee-growing regions of Peru. Knowing the production pattern of this grain is essential to help coffee producers make economic and social decisions. As growth curves generally have sigmoidal behavior, which is well fit by non-linear models, this study aimed to model the cumulative production pattern of parchment coffee as a function of time (in months) in the year 2022, comparing the fit of the non-linear Logistic, Gompertz and von Bertalanffy models. The cumulative national production and the production of the departments of Huánuco and San Martín, in Peru, were analyzed. Data used to fit the models were obtained from the Ministry of Agrarian Development and Irrigation (MIDAGRI) of Peru. To check the assumptions of normality, homoscedasticity, and independence of residuals, the Shapiro-Wilk, Breusch-Pagan, and Durbin-Watson tests were used, respectively. The model parameters were estimated by least squares using the Gauss-Newton algorithm in the R software. The fit of the models was assessed with the Coefficient of Determination (R^2), Residual Standard Deviation (RSD), Akaike Information Criterion (AIC), and nonlinearity measures. Based on these measures, the Gompertz model with a first-order autoregressive error term (AR1) fit the national production data best, and the Logistic model was the most suitable for describing the production of the departments of Huánuco and San Martín.
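The fitting step can be illustrated with a minimal Python sketch of least squares by Gauss-Newton. The study itself used R, so this is not the authors' code. The parameterizations of the three curves (asymptote `a`, inflection-related time `b`, rate `k`) are one common choice and are assumed here, as are the step-halving safeguard, the finite-difference Jacobian, and the starting values. The twelve monthly values are synthetic, generated from a Gompertz curve without noise, and are not MIDAGRI data.

```python
import math


def logistic(t, a, b, k):
    return a / (1.0 + math.exp(k * (b - t)))


def gompertz(t, a, b, k):
    return a * math.exp(-math.exp(k * (b - t)))


def von_bertalanffy(t, a, b, k):
    return a * (1.0 - math.exp(k * (b - t)) / 3.0) ** 3


def sse(model, t, y, theta):
    """Residual sum of squares; infinite if the model overflows."""
    try:
        return sum((yi - model(ti, *theta)) ** 2 for ti, yi in zip(t, y))
    except OverflowError:
        return math.inf


def solve(A, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def gauss_newton(model, t, y, theta, iters=100, h=1e-6):
    """Gauss-Newton with a forward-difference Jacobian and step halving."""
    theta = list(theta)
    m = len(theta)
    best = sse(model, t, y, theta)
    for _ in range(iters):
        fitted = [model(ti, *theta) for ti in t]
        resid = [yi - fi for yi, fi in zip(y, fitted)]
        J = []
        for ti, fi in zip(t, fitted):
            row = []
            for p in range(m):
                step = h * max(1.0, abs(theta[p]))
                bumped = theta[:]
                bumped[p] += step
                row.append((model(ti, *bumped) - fi) / step)
            J.append(row)
        # Normal equations of the linearized problem: (J'J) delta = J'r
        JtJ = [[sum(r[i] * r[j] for r in J) for j in range(m)] for i in range(m)]
        Jtr = [sum(r[i] * e for r, e in zip(J, resid)) for i in range(m)]
        delta = solve(JtJ, Jtr)
        scale = 1.0
        while scale > 1e-8:
            trial = [th + scale * d for th, d in zip(theta, delta)]
            trial_sse = sse(model, t, y, trial)
            if trial_sse < best:
                break
            scale /= 2.0
        else:
            return theta  # no step improves the fit: treat as converged
        theta, best = trial, trial_sse
    return theta


# Synthetic, noise-free monthly data from a Gompertz curve (illustration only).
months = list(range(1, 13))
production = [gompertz(ti, 100.0, 5.0, 0.6) for ti in months]
fit = gauss_newton(gompertz, months, production, theta=[90.0, 4.5, 0.5])
print([round(v, 4) for v in fit])
```

Fitting `logistic` and `von_bertalanffy` to the same series with the same routine and comparing the residual sums of squares mirrors, in a simplified way, the model comparison described in the abstract; the AR(1) error structure and the diagnostic tests are not covered by this sketch.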