
    Use of multivariate statistical methods for the analysis of metabolomic data

    [EN] In recent decades, advances in technology have enabled the gathering of an increasing amount of data in the fields of biology and biomedicine. The so-called "-omics" technologies, such as genomics, epigenomics, transcriptomics or metabolomics, among others, produce hundreds, thousands or even millions of variables per data set. The analysis of 'omic' data presents both methodological and computational complexities, which have driven a revolution in the development of new statistical methods specifically designed for this type of data. To these methodological complexities one must add the logistic and economic restrictions usually present in research projects, which lead to small sample sizes paired with these wide data sets. This makes the analyses even harder, since there are many more variables than observations. Among the methods developed to deal with this type of data, some are based on penalization of the coefficients, such as lasso or elastic net; others on projection onto latent structures, such as PCA or PLS; and others on regression or classification trees and ensembles of trees, such as random forest.
All these techniques work well on 'omic' data in matrix format (IxJ). Sometimes, however, these IxJ data sets are expanded, for example by taking repeated measurements at different time points on the same individuals, which yields IxJxK data sets known as three-way data. In these cases, most of the cited techniques lose all or a good part of their applicability, leaving very few viable options for analyzing this type of data structure. One useful tool for analyzing three-way data, when some Y data structure is to be predicted, is N-PLS. Compared to PLS, N-PLS reduces the inclusion of noise in the models and obtains more robust parameters while producing easy-to-understand plots. Related to the problem of small sample sizes and exorbitant numbers of variables is the issue of variable selection, which is essential for the biological interpretation of results from 'omic' data sets. Often, the aim of a study is not only to predict the outcome, but also to understand why it happens and which variables are involved. It is also of interest to make new predictions without collecting all the variables again, using only the few most important ones to design cost-effective predictive kits of real utility.
For all these reasons, the main goal of this thesis is to improve the existing methods for 'omic' data analysis, specifically those for three-way data, by incorporating variable selection and by improving predictive capacity and the interpretability of results. All of this is implemented in a fully documented R package that includes all the functions needed to perform complete analyses of three-way data. The work in this thesis consists of a first theoretical-conceptual part, in which the idea of the algorithm is developed, tuned and validated and its performance is assessed; a second empirical-practical part, in which the algorithm is compared with other existing variable selection methodologies; and an additional programming and software development part, which presents the development of the R package, its functionality and its analysis capabilities. The development and validation of the technique, as well as the publication of the R package, have broadened the current options for the analysis of three-way data and opened many future research lines.
Hervás Marín, D. (2019). Use of multivariate statistical methods for the analysis of metabolomic data [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/130847
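The difference between matrix (IxJ) and three-way (IxJxK) data described in the abstract above can be illustrated with a small array example. This is only a sketch: the array sizes and the helper name `unfold_three_way` are made up for illustration and do not come from the thesis or its R package. It shows the "unfolding" step that matrix-based methods require before they can be applied to three-way data.

```python
import numpy as np


def unfold_three_way(X):
    """Unfold an I x J x K array into an I x (J*K) matrix.

    Hypothetical helper for illustration; each row keeps one individual,
    and column j*K + k holds variable j measured at time point k.
    """
    I, J, K = X.shape
    return X.reshape(I, J * K)


# Simulated data with made-up sizes: 5 individuals, 4 variables, 3 time points.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4, 3))

X_unfolded = unfold_three_way(X)
print(X.shape, "->", X_unfolded.shape)
```

After unfolding, the link between a variable and its repeated measurements is only implicit in the column order, which is one reason methods that work directly on the IxJxK array, such as N-PLS, can be easier to interpret.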

    Sparse N-way partial least squares with R package sNPLS

    [EN] We introduce the R package sNPLS, which performs N-way partial least squares (N-PLS) regression and sparse (L1-penalized) N-PLS regression on three-way arrays. N-PLS regression is superior to other methods for three-way data that are based on unfolding, thanks to a better stabilization of the decomposition. This provides better interpretability and improves predictions. The sparse version adds variable selection through L1 penalization, which can lower prediction errors and further improve the interpretability and usability of the N-PLS results. After a short introduction to both methods, the functions of the package are presented by showing their use on simulated data and on a real data set.
Research in this study was partially supported by the Conselleria de Educacion, Investigacion, Cultura y Deporte de la Generalitat Valenciana under the project PROMETEO/2016/093.
Hervás-Marín, D.; Prats-Montalbán, JM.; Lahoz Rodríguez, AG.; Ferrer, A. (2018). Sparse N-way partial least squares with R package sNPLS. Chemometrics and Intelligent Laboratory Systems. 179:54-63. https://doi.org/10.1016/j.chemolab.2018.06.005

    Sparse N-way Partial Least Squares by L1-penalization

    [EN] N-PLS, as the natural extension of PLS to N-way structures, tries to maximize the covariance between X and Y N-way data arrays. It provides a useful framework for fitting prediction models to N-way data. However, N-PLS by itself does not perform variable selection, which can facilitate interpretation in different situations (e.g. the so-called 'omics' data). In this work, we propose a method for variable selection within N-PLS that introduces sparsity in the weight matrices WJ and WK by means of L1-penalization. The sparse version of N-PLS is able to provide lower prediction errors by filtering out the noise variables and to further improve the interpretability and usability of the N-PLS results. To test the performance of Sparse N-PLS, two different simulated data sets were used, whereas a real time-course metabolomics data set was used to show its utility in a biological context.
Hervás-Marín, D.; Prats-Montalbán, JM.; Garcia-Cañaveras, J.; Lahoz Rodríguez, AG.; Ferrer, A. (2019). Sparse N-way Partial Least Squares by L1-penalization. Chemometrics and Intelligent Laboratory Systems. 185:85-91. https://doi.org/10.1016/j.chemolab.2019.01.004
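The idea of making the weights WJ and WK sparse through an L1 penalty can be sketched for a single component and a single response vector. The code below is a simplified illustration and not the published algorithm: it assumes that the unpenalized weights are the leading singular vectors of the J x K covariance matrix between X and y, and that the L1 penalty acts as soft-thresholding on those vectors. The names `soft_threshold`, `sparse_weights`, `lam_j` and `lam_k` are hypothetical, and the data are simulated.

```python
import numpy as np


def soft_threshold(w, lam):
    """Shrink each entry of w toward zero by lam (the L1 proximal operator)."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)


def sparse_weights(X, y, lam_j, lam_k):
    """One-component sparse weights for an I x J x K array X and response y.

    Assumption for this sketch: start from the leading singular vectors of
    Z[j, k] = sum_i y[i] * X[i, j, k], then soft-threshold and renormalise.
    """
    Z = np.einsum("i,ijk->jk", y, X)
    U, _, Vt = np.linalg.svd(Z, full_matrices=False)
    w_j = soft_threshold(U[:, 0], lam_j)
    w_k = soft_threshold(Vt[0], lam_k)
    # Renormalise to unit length unless everything was shrunk to zero.
    for w in (w_j, w_k):
        norm = np.linalg.norm(w)
        if norm > 0:
            w /= norm
    return w_j, w_k


# Simulated example: only variable 0 (at every time point) drives the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6, 4))
y = X[:, 0, :].sum(axis=1) + 0.1 * rng.normal(size=500)

w_j, w_k = sparse_weights(X, y, lam_j=0.5, lam_k=0.0)
print("variables kept:", np.flatnonzero(w_j))
```

In this toy setting the penalty on the variable mode removes the noise variables while the time mode is left unpenalized; in practice the penalty strength would be chosen by cross-validation.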

    Infliximab reduces Zaprinast-induced retinal degeneration in cultures of porcine retina

    Background: Mutations in cGMP-degrading phosphodiesterase 6 (PDE6) cause around 4 to 5% of cases of retinitis pigmentosa (RP), a rare form of retinal dystrophy. Growing evidence suggests that inflammation is involved in the progression of RP. The aims of this study were to corroborate the presence of high TNFα concentrations in the eyes of RP patients and to evaluate whether blockade of TNFα with Infliximab, a monoclonal anti-TNFα antibody, prevented retinal degeneration induced by PDE6 inhibition in cultures of porcine retina. Methods: Aqueous humor samples from 30 patients with RP and 13 healthy controls were used to quantify the inflammatory mediators IL-6, TNFα, IL-1β and IL-10 with a multiplex enzyme-linked immunosorbent assay (ELISA) system. Retinal explants from pigs were exposed to Zaprinast, a PDE6 inhibitor, for 24 hours in the absence or presence of Infliximab. Cell death was evaluated by TUNEL assay. The number and distribution of caspase-3 positive cells, indirect poly(ADP)ribose polymerase (PARP) activation and glial fibrillary acidic protein (GFAP) content were visualized by immunolabeling. Total antioxidant capacity, nitrites and thiobarbituric acid reactive substances (TBARS) formation were determined to evaluate antioxidant-oxidant status. Results: IL-6 and TNFα concentrations were higher in the aqueous humor of RP patients than in controls. In an ex vivo model of porcine retina, Infliximab prevented retinal degeneration, as judged by the reduced presence of TUNEL-positive cells, reduced caspase-3 activation and reduced glial activation. Additionally, Infliximab partially reduced oxidative stress in retinal explants exposed to Zaprinast. Conclusions: The inflammatory mediators IL-6 and TNFα were elevated in the aqueous humor of RP patients, corroborating previous studies that suggest sustained chronic inflammation. Our study suggests that TNFα plays an important role in cell death in an ex vivo model of retinal degeneration by activating different cell pathways in different cell layers of the retina, which should be studied further.
This work was supported by the European Regional Development Fund, Institute of Health Carlos III, PI10/01825 and PI12/0481 from the Spanish Ministry of Economy and Competitiveness (MEC). CIBERER is an initiative of the Institute of Health Carlos III from the MEC. Regina Rodrigo holds an SNS Miguel Servet research contract (CP09/118) from the Institute of Health Carlos III.

    Relationship between Skin Temperature, Electrical Manifestations of Muscle Fatigue, and Exercise-Induced Delayed Onset Muscle Soreness for Dynamic Contractions: A Preliminary Study

    Delayed onset muscle soreness (DOMS) indicates the presence of muscle damage and impairs force production and control. Monitoring DOMS is useful for improving recovery intervention plans. The magnitude of DOMS may relate to muscle fatigue, which can be monitored by surface electromyography (EMG). Additionally, there is growing interest in determining whether the skin temperature over a muscle group during exercise to fatigue could be a non-invasive marker for DOMS. Here we determine whether skin temperature and manifestations of muscle fatigue during exercise are correlated and can predict DOMS after concentric-eccentric bicep curl exercises. We tested 10 young adults, who performed concentric-eccentric bicep curl exercises to induce muscle damage in the biceps brachialis. Muscle activation and skin temperature were recorded during exercise, and DOMS was evaluated 24 h after exercise. Data analysis was performed using Bayesian regression models with regularizing priors. We found significant muscle fatigue and an increase in skin temperature during exercise. DOMS was observed 24 h after exercise. The regression models showed no correlation of changes in skin temperature and muscle fatigue during exercise with DOMS 24 h after exercise. In conclusion, our preliminary results do not support a relationship between skin temperature measured during exercise and either muscle fatigue during exercise or the ability to predict DOMS 24 h after exercise.

    Expired Tidal Volume and Respiratory Rate During Postnatal Stabilization of Newborn Infants Born at Term via Cesarean Delivery

    Objective: To retrieve evolving respiratory measures in the first minutes after birth in normal neonates born at term using a respiratory function monitor. Study design: We evaluated newborn babies delivered at term via cesarean after uncomplicated pregnancies. Immediately after birth, a respiratory function monitor with an adapted flowmeter and a face mask was applied at 2, 5, and 10 minutes after birth for 90 seconds in each period. We analyzed expired and inspired tidal volume, respiratory rate (RR), percentage of leakage, and number of analyzed breaths in each individual infant's recording using respiratory research software. Results: A total of 243 infants completed the study. The final data set included 59 058 (48.35%) valid observations for each of the variables, representing the analysis of 32 801 breaths. With these data, we constructed a reference range with 10th, 25th, 50th, 75th, and 90th percentiles for expired tidal volume and RR. Tidal volumes plateaued earlier in female than in male infants. No correlation with delayed cord clamping, gestational age, maternal morbidity, or indication for cesarean delivery was established. Conclusions: We have constructed a reference range with percentiles for inspired and expired tidal volumes and RR in newborn babies born at term for the first 10 minutes after birth. Reference ranges can be employed for research and can be useful in the clinical setting to guide positive pressure ventilation in the delivery room.
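As a rough illustration of how a percentile-based reference range such as the one described above can be computed, the sketch below takes the 10th, 25th, 50th, 75th and 90th percentiles of a set of measurements. The helper name `reference_range` and the simulated values are hypothetical and are not data or code from the study, which may have used a different estimation procedure.

```python
import numpy as np


def reference_range(values, percentiles=(10, 25, 50, 75, 90)):
    """Return a {percentile: value} mapping, ignoring missing observations."""
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]
    return dict(zip(percentiles, np.percentile(values, percentiles)))


# Simulated measurements in arbitrary units, for illustration only.
rng = np.random.default_rng(2)
simulated = rng.normal(loc=6.0, scale=1.5, size=1000)

ranges = reference_range(simulated)
for p, v in ranges.items():
    print(f"P{p}: {v:.2f}")
```

A real reference range would be computed separately for each time point after birth, and possibly by sex, since the abstract reports that tidal volumes plateaued earlier in female infants.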

    Impact of maternal age on infants' emotional regulation and psychomotor development

    Background. Maternal age has progressively increased in industrialized countries. Most studies focus on the consequences of delayed motherhood for women's physical and mental health, but little is known about potential effects on infants' neurodevelopment. This prospective study examines the association between maternal age and offspring neurodevelopment in terms of both psychomotor development (Ages & Stages Questionnaires-3) and emotional competences (Early Childhood Behavior Questionnaire). Methods. We evaluated a cohort of healthy pregnant women aged 20-41 years and their offspring, assessed at 38 weeks of gestation (n = 131) and 24 months after birth (n = 101). Potential age-related variables were considered (paternal age, education level, parity, social support, maternal cortisol levels, and maternal anxiety and depressive symptoms). Bayesian ordinal regression models were fitted for each neurodevelopmental outcome. Results. Maternal age was negatively associated with poor child development in terms of personal-social skills [odds ratio (OR) −0.13, 95% confidence interval (CI) 0.77-0.99] and with difficult temperament in terms of worse emotional regulation (OR −0.13, 95% CI 0.78-0.96) and lower positive affect (OR 0.16, 95% CI 0.75-0.95). As for age-related variables, maternal anxiety symptoms and cortisol levels were also correlated with poor child development and difficult temperament, whereas maternal social support and parental educational level were associated with better psychomotor and emotional competences. Conclusion. Increasing maternal age may be associated with child temperament difficulties and psychomotor delay in terms of social interaction skills. Early detection of neurodevelopmental difficulties in these babies would allow preventive psychosocial interventions to avoid future neuropsychiatric disorders.
