
    Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges

    Background: In high-dimensional data (HDD) settings, the number of variables associated with each observation is very large. Prominent examples of HDD in biomedical research include omics data with a large number of variables, such as many measurements across the genome, proteome, or metabolome, as well as electronic health records data, which have large numbers of variables recorded for each patient. The statistical analysis of such data requires knowledge and experience, sometimes with complex methods adapted to the respective research questions.
Methods: Advances in statistical methodology and machine learning offer new opportunities for innovative analyses of HDD, but they also require a deeper understanding of some fundamental statistical concepts. Topic group TG9 “High-dimensional data” of the STRATOS (STRengthening Analytical Thinking for Observational Studies) initiative provides guidance for the analysis of observational studies, addressing particular statistical challenges and opportunities for the analysis of studies involving HDD. In this overview, we discuss key aspects of HDD analysis to provide a gentle introduction for non-statisticians and for classically trained statisticians with little experience specific to HDD.
Results: The paper is organized by the subtopics most relevant to the analysis of HDD, in particular initial data analysis, exploratory data analysis, multiple testing, and prediction. For each subtopic, the main analytical goals in HDD settings are outlined, and for each goal, basic explanations of some commonly used analysis methods are provided. Situations are identified where traditional statistical methods cannot, or should not, be used in the HDD setting, or where adequate analytic tools are still lacking. Many key references are provided.
Conclusions: This review aims to provide a solid statistical foundation for researchers, including statisticians and non-statisticians, who are new to research with HDD or simply want to better evaluate and understand the results of HDD analyses.
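Multiple testing is one of the subtopics named in this abstract. As an illustration that is not taken from the paper, the minimal sketch below contrasts an unadjusted 0.05 threshold with two standard corrections: Bonferroni, which controls the family-wise error rate, and the Benjamini-Hochberg step-up procedure, which controls the false discovery rate. The function names `bonferroni` and `benjamini_hochberg` and the ten p-values are hypothetical, chosen only for demonstration, and the review itself may present these procedures differently.

```python
def bonferroni(p_values, alpha=0.05):
    """Indices rejected by Bonferroni: p <= alpha / m (controls the FWER)."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


def benjamini_hochberg(p_values, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    # Sort hypothesis indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the LARGEST rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    return sorted(order[:k_max])


if __name__ == "__main__":
    # Hypothetical p-values for ten features (e.g. genes); not data from the paper.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
    naive = [i for i, pv in enumerate(p) if pv < 0.05]
    print("unadjusted p < 0.05:", naive)                  # [0, 1, 2, 3, 4]
    print("Bonferroni:", bonferroni(p))                   # [0]
    print("Benjamini-Hochberg:", benjamini_hochberg(p))   # [0, 1]
```

Bonferroni bounds the probability of even one false positive, so it becomes very strict when thousands of features are tested. Benjamini-Hochberg instead bounds the expected proportion of false discoveries among the rejected hypotheses (guaranteed under independence or positive dependence of the tests), which is often the more useful target in omics screening.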

    Identification of genes modifying the age at onset of primary open-angle glaucoma in a French-Canadian founder family

    Glaucoma is a heterogeneous group of ocular disorders characterized by apoptosis of retinal ganglion cells and progressive degeneration of the optic nerve. It is the leading cause of irreversible blindness, affecting an estimated 60 million people worldwide. Its most common form, open-angle glaucoma (OAG), is a polygenic disorder caused mainly by genetic predisposition interacting with other risk factors such as age and elevated intraocular pressure (IOP). OAG is a complex genetic disease, although some severe forms are autosomal dominant. Seventeen loci have been linked to the disease and accepted by the Human Genome Organisation (HUGO), and five genes have been identified at these loci (MYOC, OPTN, WDR36, NTF4, ASB10). Recently, genome-wide association studies have identified more than 20 common risk factors with relatively small effects. For more than 50 years, our team has been studying 749 members of CA, a large French-Canadian founder family in which the MYOCK423E mutation causes an autosomal dominant form of OAG with a highly variable age at onset. First, it was shown that this variability in the age at onset of ocular hypertension has an important genetic component caused by at least one modifier gene. This modifier interacts with the primary mutation and alters the severity of glaucoma in MYOCK423E carriers. A candidate modifier gene, WDR36, was genotyped in two large families, CA and BV. In the CA family, carriers of both non-synonymous WDR36 variants and MYOCK423E tended to develop the disease at a younger age.
A data-mining tool has been developed to represent known information about a disease as a graphical diagram of its causal model and to facilitate the prioritization of candidate genes. It has been applied successfully to bipolar disorder and glaucoma. The next step of this project is to complete a genome scan of the CA family and to sequence the loci in order to identify the variants that modify glaucoma. Eventually, these variants could allow identification of the individuals whose glaucoma is likely to be more aggressive.