
    A Data Mining Approach to Discover Genetic and Environmental Factors involved in Multifactorial Diseases

    In this paper, we are interested in discovering genetic factors that are involved in multifactorial diseases. To this end, experiments have been carried out by the Biological Institute of Lille, generating a large amount of data. Exploiting this data requires data mining tools, and we propose a two-phase optimization approach using a specific genetic algorithm. In the first phase, we select significant features with a specific genetic algorithm. In the second phase, we cluster affected individuals according to the features selected in the first phase. The paper describes the specificities of the genetic problem we are studying and presents in detail the genetic algorithm we have developed to deal with this very large feature-selection problem. Results on both artificial and real data are presented.
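    The abstract does not give the algorithm's encoding, operators or fitness function. As a minimal sketch of the first phase only, assuming a generic genetic algorithm over binary feature masks (truncation selection, one-point crossover, bit-flip mutation) and a hypothetical fitness that rewards case/control mean differences and penalizes subset size; `fitness`, `select_features` and all parameter values are illustrative, not the authors' specialized algorithm:

```python
import random


def fitness(mask, X, y, size_penalty=0.1):
    """Hypothetical fitness: for each selected feature, add the absolute
    difference in mean value between affected (y=1) and unaffected (y=0)
    individuals, then subtract a penalty per selected feature."""
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0
    cases = [row for row, label in zip(X, y) if label == 1]
    controls = [row for row, label in zip(X, y) if label == 0]
    score = 0.0
    for j in selected:
        mean_case = sum(row[j] for row in cases) / len(cases)
        mean_ctrl = sum(row[j] for row in controls) / len(controls)
        score += abs(mean_case - mean_ctrl)
    return score - size_penalty * len(selected)


def select_features(X, y, pop_size=30, generations=50, mutation_rate=0.05, seed=0):
    """Generic genetic algorithm over binary feature masks (illustrative only)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        population.sort(key=lambda m: fitness(m, X, y), reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = parents + children
    return max(population, key=lambda m: fitness(m, X, y))


# Toy data: feature 0 separates cases from controls; features 1-5 have equal
# means in both groups and only add the size penalty.
X = [[1, 0, 1, 0, 1, 0], [1, 1, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1]]
y = [1, 1, 0, 0]
best = select_features(X, y)
print(best)
```

Because the better half is carried over unchanged each generation, the best mask found so far is never lost. The second (clustering) phase would then operate only on the columns where the returned mask is 1.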

    THREE METHODS TO INCREASE THE LIKELY TO IDENTIFY GENE INVOLVED IN COMPLEX DISEASE

    A large part of human pathology consists of complex diseases, such as heart disease, obesity, cancer, diabetes, and many common psychiatric and neurological conditions. The common feature of all these conditions is a multifactorial etiology that involves both genetic and environmental factors. The common disease-common variant (CDCV) hypothesis posits that common, interacting alleles underlie most common diseases, in association with environmental factors. Furthermore, according to the thrifty genotype hypothesis, such alleles have been subjected to selective pressure, particularly those involved in metabolic diseases such as type 2 diabetes (T2DM) and obesity. Although the concept of gene–environment interaction is central to ecogenetics and has long been recognized by geneticists (Haldane 1946), there are relatively few detailed descriptions of gene–environment interaction in the biomedical literature. This gap may be explained by the difficulty of collecting environmental information of sufficient quality and the even greater difficulty of analyzing it. Indeed, when the number of factors to analyze is large, the curse of dimensionality and the multiple-testing problem become overwhelming. In this thesis, we tested the hypothesis that knowledge-driven approaches can improve the ability to identify genes involved in complex disease. Three approaches are presented, each leading to the identification of a factor or of an interaction between factors. The study of a complex disease comprises three steps: (1) selection of candidate genes, (2) collection of genetic and non-genetic information, and (3) statistical analysis of the data; we show that each of these steps can be improved by taking the biological background into account. The first study explored the possibility of exploiting evolutionary information to identify genes involved in type 2 diabetes, building on the thrifty genotype hypothesis. 
One gene, ACO1, was identified and successfully associated with the disease. In the second study, we analyzed PPARγ, a gene that has been inconsistently associated with obesity. We hypothesized that this inconsistency may be due to its interaction with the environment. We then jointly analyzed the PPARγ genotype and comprehensive nutritional information from a cohort and demonstrated an interaction: the PPARγ genotype modulated the response to diet, with Ala carriers gaining more weight than Pro/Pro individuals at the same caloric intake. In the third study, we implemented a software tool that creates simulated populations based on gene–environment interactions, using genetic information to make the simulated populations realistic. We used these simulated populations to evaluate the statistical methods most frequently used to study case-control samples. We then built an ensemble of these methods and applied it to a real sample, showing that the ensemble outperformed each individual method when the sample size was small.
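    The thesis does not describe the model behind its population simulator. A minimal sketch of the idea, assuming a logistic risk model with an additively coded genotype (Hardy-Weinberg proportions), a binary exposure and a multiplicative G×E term; `simulate_population`, `stratum_risk` and every parameter value are hypothetical:

```python
import math
import random


def simulate_population(n, maf=0.3, p_exposed=0.5, beta0=-2.0,
                        beta_g=0.2, beta_e=0.3, beta_ge=0.8, seed=0):
    """Simulate (genotype, exposure, status) triples under a logistic model
    with a gene-environment interaction term. All parameters are hypothetical."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        # Additive coding: number of risk alleles (0, 1 or 2), Hardy-Weinberg.
        g = int(rng.random() < maf) + int(rng.random() < maf)
        e = int(rng.random() < p_exposed)  # binary exposure
        logit = beta0 + beta_g * g + beta_e * e + beta_ge * g * e
        risk = 1.0 / (1.0 + math.exp(-logit))
        status = int(rng.random() < risk)  # 1 = affected
        people.append((g, e, status))
    return people


def stratum_risk(people, g, e):
    """Observed proportion affected among individuals with genotype g and exposure e."""
    stratum = [s for gg, ee, s in people if gg == g and ee == e]
    return sum(stratum) / len(stratum) if stratum else float("nan")


population = simulate_population(20000)
for g in (0, 1, 2):
    # Risk without exposure vs. with exposure, per genotype.
    print(g, round(stratum_risk(population, g, 0), 3),
          round(stratum_risk(population, g, 1), 3))
```

Comparing stratum-specific risks shows the interaction: the exposure raises risk far more in carriers of two risk alleles than in non-carriers. Simulated samples of this kind, with a known ground truth, are what allow competing case-control methods to be compared.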

    Pathophysiology of age-related diseases

    A Symposium regarding the Pathophysiology of Successful and Unsuccessful Ageing was held in Palermo, Italy, on 7-8 April 2009. Three lectures from that Symposium, by G. Campisi, L. Ginaldi and F. Licastro, are summarized here. Ageing is a complex process that negatively affects the development of various bodily systems and their ability to function. A long life in a healthy, vigorous, youthful body has always been one of humanity's greatest dreams. Thus, a better understanding of the pathophysiology of age-related diseases is urgently required to improve our understanding of how to maintain good health in the elderly and to plan possible therapeutic interventions.

    Utilising proteomics to understand and define hypertension: where are we and where do we go?

    Introduction: Hypertension is a complex and multifactorial cardiovascular disorder. Because different mechanisms contribute to different extents to an individual’s blood pressure, discovering novel pathogenetic principles of hypertension is challenging. However, there is an urgent and unmet clinical need to improve the prevention, detection and therapy of hypertension in order to reduce the global burden associated with hypertension-related cardiovascular diseases. Areas covered: Proteomic techniques have been applied in reductionist experimental models, including angiotensin II infusion models in rodents and the spontaneously hypertensive rat, in order to unravel mechanisms involved in blood pressure control and end-organ damage. In humans, proteomic studies mainly focus on the prediction and detection of organ damage, particularly heart failure and renal disease. Whilst there are only a few proteomic studies specifically addressing human primary hypertension, more data are available on hypertensive disorders of pregnancy such as preeclampsia. We will review these studies and discuss the implications of proteomics for precision medicine approaches. Expert commentary: Despite the potential of proteomic studies in hypertension, there has been only moderate progress in this area of research. Standardised large-scale studies are required to make best use of the potential that proteomics offers in hypertension and other cardiovascular diseases.

    Unraveling the Genetic Basis of Asthma and Allergic Diseases

    Asthma and allergic diseases are believed to be complex genetic diseases that may result from the interaction of multiple genetic factors and environmental stimuli. In past decades, great efforts have been made to unravel their genetic basis. The strategies for discovering genes and genetic variants and for confirming their importance in the pathogenesis of asthma and allergic diseases, as well as the strengths and limitations of these strategies, are summarized comprehensively and concisely. The current consensus about the genetic basis of asthma and allergic diseases is also briefly described.

    Data Mining: How Popular Is It?

    Data mining is a process used in industry to facilitate decision making. As the name implies, large volumes of data are mined, or sifted, to find useful information for decision making. With the advent of e-business, data mining has become more important to practitioners. The purpose of this paper is to assess the importance of data mining by examining the different application areas that have used it for decision making.

    A comparison of internal validation techniques for multifactor dimensionality reduction

    Background: It is hypothesized that common, complex diseases may be due to complex interactions between genetic and environmental factors, which are difficult to detect in high-dimensional data using traditional statistical approaches. Multifactor Dimensionality Reduction (MDR) is the most commonly used data-mining method to detect epistatic interactions. In all data-mining methods, it is important to consider internal validation procedures that yield prediction estimates, in order to prevent model over-fitting and reduce potential false-positive findings. Currently, MDR uses cross-validation for internal validation. In this study, we incorporate a three-way split (3WS) of the data, combined with a post-hoc pruning procedure, as an alternative to cross-validation for internal model validation, in order to reduce computation time without impairing performance. We compare the power to detect true disease-causing loci using MDR with both 5- and 10-fold cross-validation to that of MDR with 3WS for a range of single-locus and epistatic disease models. Additionally, we analyze a dataset in HIV immunogenetics to demonstrate the results of the two strategies on real data.
    Results: MDR with 3WS is approximately five times faster computationally than MDR with 5-fold cross-validation. The power to find exactly the true disease loci without detecting false-positive loci is higher with 5-fold cross-validation than with 3WS before pruning. However, the power to find the true disease-causing loci together with false-positive loci is equivalent for the two approaches. With a pruning procedure applied after the 3WS, the power of the 3WS approach to detect only the exact disease loci is equivalent to that of MDR with cross-validation. In the real data application, the cross-validation and 3WS analyses indicate the same two-locus model.
    Conclusions: Our results reveal that the performance of the two internal validation methods is equivalent when pruning procedures are used. The specific pruning procedure should be chosen with an understanding of the trade-off between identifying all relevant genetic effects at the cost of including false positives, and missing important genetic factors. This suggests that 3WS may be a powerful and computationally efficient approach to screen for epistatic effects, and could be used to identify candidate interactions in large-scale genetic studies.
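    MDR's core step, as commonly described, pools multilocus genotype cells into high- and low-risk groups by comparing each cell's case:control ratio with a threshold (here the overall ratio) and scores the resulting classifier, here by balanced accuracy. A minimal sketch pairing that step with one plausible three-way split (training set to pick the best model of each order, testing set to choose among them, validation set to estimate accuracy); the split proportions, the low-risk default for unseen cells and all function names are assumptions, and the paper's pruning procedure is not shown:

```python
import itertools
import random
from collections import Counter


def mdr_fit(X, y, loci):
    """Label each multilocus genotype cell high-risk (1) or low-risk (0)."""
    cases, controls = Counter(), Counter()
    for row, label in zip(X, y):
        cell = tuple(row[i] for i in loci)
        if label == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    # Threshold T = overall case:control ratio; high-risk if cases > T * controls.
    threshold = sum(cases.values()) / sum(controls.values())
    return {cell: int(cases[cell] > threshold * controls[cell])
            for cell in set(cases) | set(controls)}


def balanced_accuracy(model, X, y, loci):
    """Mean of sensitivity and specificity; unseen cells are called low-risk (assumption)."""
    tp = tn = n_pos = n_neg = 0
    for row, label in zip(X, y):
        pred = model.get(tuple(row[i] for i in loci), 0)
        if label == 1:
            n_pos += 1
            tp += pred == 1
        else:
            n_neg += 1
            tn += pred == 0
    return 0.5 * (tp / n_pos + tn / n_neg)


def three_way_split(n, fractions=(0.5, 0.25, 0.25), seed=0):
    """Shuffle indices and cut them into training, testing and validation sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    a = int(fractions[0] * n)
    b = a + int(fractions[1] * n)
    return idx[:a], idx[a:b], idx[b:]


def mdr_3ws(X, y, max_order=2, seed=0):
    def subset(idx):
        return [X[i] for i in idx], [y[i] for i in idx]

    train, test, valid = (subset(s) for s in three_way_split(len(y), seed=seed))
    n_loci = len(X[0])

    def score(loci, data):
        model = mdr_fit(*train, loci)  # models are always fitted on the training set
        return balanced_accuracy(model, *data, loci)

    # 1) Training set: best locus combination of each order.
    best_per_order = [max(itertools.combinations(range(n_loci), k),
                          key=lambda loci: score(loci, train))
                      for k in range(1, max_order + 1)]
    # 2) Testing set: choose the final model among those candidates.
    chosen = max(best_per_order, key=lambda loci: score(loci, test))
    # 3) Validation set: estimate the prediction accuracy of the chosen model.
    return chosen, score(chosen, valid)


# Toy, noise-free epistatic model: status depends only on loci 0 and 1 jointly.
rng = random.Random(1)
X = [[rng.randint(0, 2) for _ in range(4)] for _ in range(400)]
y = [(row[0] + row[1]) % 2 for row in X]
result = mdr_3ws(X, y)
print(result)
```

On this toy model the pair (0, 1) classifies every individual correctly, so it should be selected. Each candidate is fitted once on the training set rather than once per fold, which is consistent with the speed-up over 5-fold cross-validation reported above.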

    Towards a comprehensive characterisation of the human internal chemical exposome: Challenges and perspectives

    The holistic characterisation of the human internal chemical exposome using high-resolution mass spectrometry (HRMS) would be a step forward in investigating the environmental aetiology of chronic diseases with unprecedented precision. HRMS-based methods are already able to reproducibly profile thousands of endogenous metabolites, as well as externally derived chemicals and their biotransformation products, in large numbers of biological samples from human cohorts. These approaches provide a solid basis for the discovery of unrecognised biomarkers of exposure and metabolic effects associated with many chronic diseases. Nevertheless, some limitations remain and must be overcome before chemical exposomics can provide unbiased detection of chemical exposures affecting disease susceptibility in epidemiological studies. These limitations include (i) the lack of versatility of analytical techniques to capture the wide diversity of chemicals; (ii) the lack of analytical sensitivity, which prevents the detection of exogenous (and endogenous) chemicals occurring at (ultra-)trace levels in restricted sample amounts; and (iii) the lack of automation of the annotation/identification process. In this article, we discuss a number of technological and methodological limitations hindering applications of HRMS-based methods and propose initial steps towards a more comprehensive characterisation of the internal chemical exposome. We also discuss other challenges, including the need for harmonisation, the difficulty inherent in assessing the dynamic nature of the internal chemical exposome, and the need to establish strong international collaboration, high-level networking and sustainable research infrastructure. 
Much research, technological development and innovative bioinformatics tooling is still needed to profile and characterise the "invisible" (not profiled), "hidden" (not detected) and "dark" (not annotated) components of the internal chemical exposome, and concerted efforts across numerous research fields are paramount.

    A novel approach to risk exposure and epigenetics—the use of multidimensional context to gain insights into the early origins of cardiometabolic and neurocognitive health

    Background: Each mother–child dyad represents a unique combination of genetic and environmental factors. This constellation of variables impacts the expression of countless genes. Numerous studies have uncovered changes in DNA methylation (DNAm), a form of epigenetic regulation, in offspring related to maternal risk factors. How these changes work together to link maternal-child risks to childhood cardiometabolic and neurocognitive traits remains unknown. This question is a key research priority, as such traits predispose to future non-communicable diseases (NCDs). We propose viewing risk and the genome through a multidimensional lens to identify common DNAm patterns shared among diverse risk profiles. Methods: We identified multifactorial Maternal Risk Profiles (MRPs) generated from population-based data (n = 15,454, Avon Longitudinal Study of Parents and Children (ALSPAC)). Using cord blood HumanMethylation450 BeadChip data, we identified genome-wide patterns of DNAm that co-vary with these MRPs. We tested the prospective relation of these DNAm patterns (n = 914) to future outcomes using decision tree analysis. We then tested the reproducibility of these patterns in (1) DNAm data at ages 7 and 17 years within the same cohort (n = 973 and 974, respectively) and (2) cord DNAm in an independent cohort, the Generation R Study (n = 686). Results: We identified twenty MRP-related DNAm patterns at birth in ALSPAC. Four were prospectively related to cardiometabolic and/or neurocognitive childhood outcomes. These patterns were replicated in DNAm data from blood collected at later ages. Three of these patterns were externally validated in cord DNAm data in Generation R. Compared to previous literature, DNAm patterns exhibited a novel spatial distribution across the genome that intersects with chromatin functional and tissue-specific signatures. 
Conclusions: To our knowledge, we are the first to leverage multifactorial population-wide data to detect patterns of variability in DNAm. This context-based approach decreases biases stemming from overreliance on specific samples or variables. We discovered molecular patterns demonstrating prospective and replicable relations to complex traits. Moreover, results suggest that these patterns harbour a genome-wide organisation specific to chromatin regulation and target tissues. These preliminary findings warrant further investigation to better reflect the reality of human context in molecular studies of NCDs.