A variable selection approach for highly correlated predictors in high-dimensional genomic data
In genomic studies, identifying biomarkers associated with a variable of
interest is a major concern in biomedical research. Regularized approaches are
classically used to perform variable selection in high-dimensional linear
models. However, these methods can fail in highly correlated settings. We
propose a novel variable selection approach called WLasso, taking these
correlations into account. It consists in rewriting the initial
high-dimensional linear model to remove the correlation between the biomarkers
(predictors) and in applying the generalized Lasso criterion. The performance
of WLasso is assessed using synthetic data in several scenarios and compared
with recent alternative approaches. The results show that when the biomarkers
are highly correlated, WLasso outperforms the other approaches in sparse
high-dimensional frameworks. The method is also successfully illustrated on
publicly available gene expression data in breast cancer. Our method is
implemented in the WLasso R package which is available from the Comprehensive R
Archive Network.
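The rewriting step described in this abstract can be sketched in formulas. This is a hedged illustration, not a statement taken from the abstract: the symbols $y$, $X$, $\Sigma$, $\widetilde{X}$, $\widetilde{\beta}$ and $\lambda$ are assumed notation, with $\Sigma$ standing for the correlation matrix of the biomarkers.

```latex
% Assumed notation: y is the n-vector of responses, X the n x p matrix of
% biomarkers, \Sigma the (invertible) correlation matrix of the predictors.
\begin{align*}
  y &= X\beta + \varepsilon
     = \underbrace{X\Sigma^{-1/2}}_{\widetilde{X}}\,
       \underbrace{\Sigma^{1/2}\beta}_{\widetilde{\beta}} + \varepsilon, \\
  \widehat{\widetilde{\beta}}(\lambda)
    &\in \operatorname*{arg\,min}_{\widetilde{\beta}}
       \left\{ \lVert y - \widetilde{X}\widetilde{\beta} \rVert_2^2
       + \lambda \,\lVert \Sigma^{-1/2}\widetilde{\beta} \rVert_1 \right\}.
\end{align*}
```

Under this sketch the columns of $\widetilde{X}$ are decorrelated, while the penalty still acts on $\beta = \Sigma^{-1/2}\widetilde{\beta}$, which is what makes the criterion a generalized Lasso rather than a plain Lasso.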
Parenting style and children emotion management skills among Chinese children aged 3–6: the chain mediation effect of self-control and peer interactions
Drawing on ecosystem theory, which is based on the interaction of family environment, individual characteristics, and social adaptation, this study aimed to examine the effects of parenting style on emotion management skills and the mediating roles of self-control and peer interactions among Chinese children aged 3–6 years. Some studies have investigated the relationship between parenting style and emotion management skills. However, research on the underlying mechanisms is still deficient. A sample of 2,303 Chinese children completed the PSDQ-Short Version, the Self-Control Teacher Rating Questionnaire, the Peer Interaction Skills Scale, and the Emotion Management Skills Questionnaire. The results show that: (1) Authoritarian parenting style negatively predicted children's emotion management skills, self-control, and peer interactions; (2) Authoritative parenting style positively predicted children's emotion management skills, self-control, and peer interactions; (3) Structural equation models indicated that self-control and peer interactions partially mediated the effects of authoritarian and authoritative parenting styles. The parenting style of Chinese children aged 3–6 years is related to emotion management skills, and self-control and peer interactions have chain mediating effects between parenting style and children's emotion management skills. These results provide further guidance for the prevention and intervention of emotional and mental health problems in children.
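For readers unfamiliar with the term, a chain (serial) mediation model with two mediators is commonly written as the three regressions below. This is a generic textbook sketch, not the authors' fitted model: the path labels $a_1, a_2, b_1, b_2, d_{21}, c'$ and the ordering of the mediators (self-control first, peer interactions second) are assumptions, and intercepts are omitted.

```latex
% X: parenting style, M_1: self-control, M_2: peer interactions,
% Y: emotion management skills (ordering of M_1 and M_2 assumed).
\begin{align*}
  M_1 &= a_1 X + e_1, \\
  M_2 &= a_2 X + d_{21} M_1 + e_2, \\
  Y   &= c' X + b_1 M_1 + b_2 M_2 + e_3, \\
  \text{chain indirect effect} &= a_1\, d_{21}\, b_2, \\
  \text{total indirect effect} &= a_1 b_1 + a_2 b_2 + a_1 d_{21} b_2 .
\end{align*}
```

Partial mediation, as reported in the abstract, corresponds to a direct path $c'$ that remains non-zero alongside the indirect effects.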
Structural Behavior of Thin-Walled Concrete-Filled Steel Tube Used in Cable Tunnel: An Experimental and Numerical Investigation
One steel grid and five thin-walled concrete-filled steel tubes (CTST) used as tunnel supports were tested on site to investigate their mechanical behavior. The influences of wall thickness, node form, and concrete filling on the CTST were obtained and compared with those on the steel grid. The results indicate that the high deformation resistance of the CTST improved the stability of the surrounding rock within a short time. The cementitious grouted sleeve connection exhibited superior flexibility when the CTST was erected and assembled. Although the deformation of rock and soil in the tunnel kept increasing, the CTST with the new connection type showed good compression resistance. The vault, tube foot, and connections also showed larger absolute strain values. A finite element analysis (FEA) was carried out using the ABAQUS program, and its results were validated by comparison with the experimental results. The FE model can serve as a reference for similar projects.
RNA-Seq reveals the key pathways and genes involved in the light-regulated flavonoids biosynthesis in mango (Mangifera indica L.) peel
Introduction: Flavonoids are important water-soluble secondary metabolites in plants, and light is one of the most essential environmental factors regulating flavonoid biosynthesis. In a previous study, we found that bagging treatment significantly inhibited the accumulation of flavonols and anthocyanins but promoted proanthocyanidin accumulation in the fruit peel of mango (Mangifera indica L.) cultivar 'Sensation', while the relevant molecular mechanism is still unknown. Methods: In this study, RNA-seq was conducted to identify the key pathways and genes involved in light-regulated flavonoid biosynthesis in mango peel. Results: Weighted gene co-expression network analysis (WGCNA) identified 16 flavonoid biosynthetic genes as crucial for the biosynthesis of different flavonoid compositions under bagging treatment in mango. The higher expression level of LAR (mango026327) in bagged samples might be the reason why light inhibits proanthocyanidin accumulation in mango peel. MiMYB1, the MYB reported to positively regulate anthocyanin biosynthesis in mango, was also identified by WGCNA in this study. Apart from MYB and bHLH, ERF, WRKY and bZIP were the three most important transcription factors (TFs) involved in light-regulated flavonoid biosynthesis in mango, with both activators and repressors. Surprisingly, two HY5 transcripts, which are usually induced by light, showed higher expression levels in bagged samples. Discussion: Our results provide new insights into the regulatory effect of light on flavonoid biosynthesis in mango fruit peel.
Development of statistical learning methods for the identification of prognostic and predictive biomarkers using high-dimensional "-omics" data in precision medicine
With the genomic revolution and the new era of precision medicine, the identification of biomarkers that are informative (i.e. active) for a response (endpoint) is becoming increasingly important in clinical research. These biomarkers help to better understand the progression of a disease (prognostic biomarkers) and to better identify patients more likely to benefit from a given treatment (predictive biomarkers). Biomarker data (e.g. genomics, transcriptomics, and proteomics) are usually high-dimensional, with the number of measured biomarkers (variables) much larger than the sample size. However, only a fraction of biomarkers is truly active, which raises the need for variable selection. Among various statistical learning approaches, regularized methods such as the Lasso have become very popular for high-dimensional variable selection due to their statistical and numerical performance. However, their selection consistency is not guaranteed when the biomarkers are highly correlated. Throughout my PhD, several novel regularized approaches were developed to perform variable selection in this challenging context. More precisely, four methods were proposed in different statistical models (linear regression model, ANCOVA-type model, and logistic regression model). The main idea is to remove the correlations by whitening the design matrix. For one of the methods, sign consistency results were established under mild conditions. The proposed approaches were evaluated through simulation studies and applications on publicly available datasets. The results suggest that our approaches perform better than the compared methods for selecting prognostic and predictive biomarkers in high-dimensional (correlated) settings. Three of our methods are implemented in the R packages WLasso, PPLasso, and WLogit, available from the CRAN (Comprehensive R Archive Network).
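The whitening idea mentioned in this abstract can be illustrated on the smallest possible case. The sketch below is not code from the WLasso, PPLasso or WLogit packages: it assumes two predictors with a known correlation `rho`, and the function names (`inv_sqrt_corr2`, `matmul`, `whitened_covariance`) are hypothetical helpers introduced only for this example. It builds the inverse square root of the correlation matrix in closed form and checks that the whitened predictors have identity correlation.

```python
import math


def inv_sqrt_corr2(rho):
    """Return Sigma^{-1/2} for the 2x2 correlation matrix [[1, rho], [rho, 1]].

    Hypothetical helper for illustration only (two predictors, known rho).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    # Sigma has eigenvalues 1 + rho and 1 - rho, with eigenvectors
    # (1, 1)/sqrt(2) and (1, -1)/sqrt(2); invert the square roots of the
    # eigenvalues and rotate back.
    s_plus = 1.0 / math.sqrt(1.0 + rho)
    s_minus = 1.0 / math.sqrt(1.0 - rho)
    a = 0.5 * (s_plus + s_minus)
    b = 0.5 * (s_plus - s_minus)
    return [[a, b], [b, a]]


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def whitened_covariance(rho):
    """Correlation of the whitened predictors: Sigma^{-1/2} Sigma Sigma^{-1/2}."""
    sigma = [[1.0, rho], [rho, 1.0]]
    w = inv_sqrt_corr2(rho)
    return matmul(matmul(w, sigma), w)


if __name__ == "__main__":
    # Even for strongly correlated predictors, the result is (numerically)
    # the identity matrix.
    for row in whitened_covariance(0.9):
        print([round(v, 6) for v in row])
```

In practice the correlation matrix is unknown and has to be estimated, which this toy example deliberately leaves out.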
Identification of prognostic and predictive biomarkers in high-dimensional data with PPLasso
In clinical trials, identification of prognostic and predictive biomarkers is essential to precision medicine. Prognostic biomarkers can be useful for preventing the occurrence of the disease, and predictive biomarkers can be used to identify patients who may benefit from the treatment. Previous research has mainly focused on clinical characteristics, and the use of genomic data in this area has hardly been studied. A new method is required to simultaneously select prognostic and predictive biomarkers in high-dimensional genomic data where biomarkers are highly correlated. We propose a novel approach called PPLasso (Prognostic Predictive Lasso) integrating prognostic and predictive effects into one statistical model. PPLasso also takes into account the correlations between biomarkers that can alter the biomarker selection accuracy. Our method consists in transforming the design matrix to remove the correlations between the biomarkers before applying the generalized Lasso. In a comprehensive numerical evaluation, we show that PPLasso outperforms the traditional Lasso approach on both prognostic and predictive biomarker identification in various scenarios. Finally, our method is applied to publicly available transcriptomic data from clinical trial RV144. Our method is implemented in the PPLasso R package, which will soon be available from the Comprehensive R Archive Network (CRAN).
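One plausible way to put prognostic and predictive effects into a single model is an ANCOVA-type formulation with two treatment arms. The formulas below are a hedged sketch under that assumption and are not quoted from the abstract: the arm-specific design matrices $X_1, X_2$, the intercepts $\alpha_1, \alpha_2$, the coefficient vectors $\beta_1, \beta_2$ and the tuning parameters $\lambda_1, \lambda_2$ are all assumed notation, and the decorrelation of the design matrix is left out for brevity.

```latex
% Assumed two-arm setting: arm 1 (e.g. control) and arm 2 (e.g. treatment).
\begin{align*}
  \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
    &= \begin{pmatrix} \mathbf{1} & 0 & X_1 & 0 \\ 0 & \mathbf{1} & 0 & X_2 \end{pmatrix}
       \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \beta_1 \\ \beta_2 \end{pmatrix}
       + \varepsilon, \\
  \text{criterion:}\quad
    & \lVert y - X\gamma \rVert_2^2
      + \lambda_1 \lVert \beta_1 \rVert_1
      + \lambda_2 \lVert \beta_2 - \beta_1 \rVert_1 ,
  \qquad \gamma = (\alpha_1, \alpha_2, \beta_1^\top, \beta_2^\top)^\top .
\end{align*}
```

In such a sketch, a biomarker $j$ would be read as prognostic when $\beta_{1j} \neq 0$ and as predictive when $\beta_{2j} - \beta_{1j} \neq 0$, that is, when its effect differs between arms.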
STUDY ON THE 3.5-CELL DC-SC PHOTO-INJECTOR
To obtain a high-quality electron beam for the PKU-ERL-FEL project, a 3.5-cell DC-SC photo-injector was designed and optimized. The Pierce gun and the 3.5-cell superconducting Nb cavity serve as the DC acceleration section and the RF acceleration section, respectively. A tuner for the whole 3.5-cell superconducting cavity has been designed. The beam parameters of the 3.5-cell DC-SC photo-injector are also presented in this paper. The disadvantages and problems of the 1.5-cell DC-SC photocathode injector, which was built for principle demonstration, have been overcome in the design of the 3.5-cell DC-SC photocathode injector.
Sign Consistency of the Generalized Elastic Net Estimator
In this paper, we propose a novel variable selection approach in the framework of high-dimensional linear models where the columns of the design matrix are highly correlated. It consists in rewriting the initial high-dimensional linear model to remove the correlation between the columns of the design matrix and in applying a generalized Elastic Net criterion since it can be seen as an extension of the generalized Lasso. The properties of our approach called gEN (generalized Elastic Net) are investigated both from a theoretical and a numerical point of view. More precisely, we provide a new condition called GIC (Generalized Irrepresentable Condition) which generalizes the EIC (Elastic Net Irrepresentable Condition) of Jia and Yu (2010) under which we prove that our estimator can recover the positions of the null and non-null entries of the coefficients when the sample size tends to infinity. We also assess the performance of our methodology using synthetic data and compare it with alternative approaches. Our numerical experiments show that our approach improves the variable selection performance in many cases.
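To make the recovery property concrete, the sketch below shows one simple way to check, in a simulation, whether an estimate has the same sign pattern (positive, negative, or null entries) as the true coefficient vector. It is an illustrative helper, not part of the gEN method itself: the function names and the numerical tolerance `tol` are assumptions introduced for this example.

```python
def sign(x, tol=0.0):
    """Return +1, -1 or 0, treating |x| <= tol as a null entry."""
    if x > tol:
        return 1
    if x < -tol:
        return -1
    return 0


def sign_pattern(beta, tol=0.0):
    """Sign of each coefficient in a vector given as a list of floats."""
    return [sign(b, tol) for b in beta]


def recovers_signs(beta_hat, beta_true, tol=1e-8):
    """True when the estimate and the truth share the same sign pattern.

    The tolerance is applied to the estimate only, so that tiny numerical
    residues from an optimizer are not counted as selected variables.
    """
    if len(beta_hat) != len(beta_true):
        raise ValueError("coefficient vectors must have the same length")
    return sign_pattern(beta_hat, tol) == sign_pattern(beta_true)


if __name__ == "__main__":
    beta_true = [1.0, 0.0, -2.0]
    # Shrunken but correctly signed estimate: the pattern is recovered.
    print(recovers_signs([0.8, 0.0, -1.3], beta_true))
    # A false positive on the second coefficient breaks the recovery.
    print(recovers_signs([0.8, 0.1, -1.3], beta_true))
```

Averaging `recovers_signs` over repeated simulated datasets gives an empirical frequency of sign recovery, one of several ways selection performance could be compared across methods.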