
    R-Gada: a fast and flexible pipeline for copy number analysis in association studies

    Background Genome-wide association studies (GWAS) using Copy Number Variation (CNV) are becoming a central focus of genetic research. CNVs have successfully provided target genome regions for some disease conditions where simple genetic variation (i.e., SNPs) has previously failed to provide a clear association. Results Here we present a new R package that integrates: (i) data import from most common formats of Affymetrix, Illumina and aCGH arrays; (ii) a fast and accurate segmentation algorithm to call CNVs based on Genome Alteration Detection Analysis (GADA); and (iii) functions for displaying and exporting the Copy Number calls, identification of recurrent CNVs, multivariate analysis of population structure, and tools for performing association studies. Using a large dataset containing 270 HapMap individuals (Affymetrix Human SNP Array 6.0 Sample Dataset) we demonstrate a flexible pipeline implemented with the package. It requires less than one minute per sample (3 million probe arrays) on a single-core computer, and provides a flexible parallelization for very large datasets. Case-control data were generated from the HapMap dataset to demonstrate a GWAS analysis. Conclusions The package provides the tools for creating a complete integrated pipeline from data normalization to statistical association. It can efficiently handle a massive volume of data consisting of millions of genetic markers and hundreds or thousands of samples with very accurate results.

    An analysis of single amino acid repeats as use case for application specific background models

    Background Sequence analysis aims to identify biologically relevant signals against a backdrop of functionally meaningless variation. Increasingly, it is recognized that the quality of the background model directly affects the performance of analyses. State-of-the-art approaches rely on classical sequence models that are adapted to the studied dataset. Although performing well in the analysis of globular protein domains, these models break down in regions of stronger compositional bias or low complexity. While these regions are typically filtered, there is increasing anecdotal evidence of functional roles. This motivates an exploration of more complex sequence models and application-specific approaches for the investigation of biased regions. Results Traditional Markov-chains and application-specific regression models are compared using the example of predicting runs of single amino acids, a particularly simple class of biased regions. Cross-fold validation experiments reveal that the alternative regression models capture the multi-variate trends well, despite their low dimensionality and in contrast even to higher-order Markov-predictors. We show how the significance of unusual observations can be computed for such empirical models. The power of a dedicated model in the detection of biologically interesting signals is then demonstrated in an analysis identifying the unexpected enrichment of contiguous leucine-repeats in signal-peptides. Considering different reference sets, we show how the question examined actually defines what constitutes the 'background'. Results can thus be highly sensitive to the choice of appropriate model training sets. Conversely, the choice of reference data determines the questions that can be investigated in an analysis. 
Conclusions Using a specific case of studying biased regions as an example, we have demonstrated that the construction of application-specific background models is both necessary and feasible in a challenging sequence analysis situation.
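
To make the idea of a Markov-chain background model for single amino acid runs concrete, here is a minimal sketch in Python. It is not the authors' implementation and does not cover their regression models. It only illustrates, under simplifying assumptions, how a first-order Markov chain trained on a reference set assigns a probability to a run continuing. The function names `train_first_order` and `run_continuation_prob` and the toy sequences are hypothetical.

```python
from collections import Counter, defaultdict


def train_first_order(seqs):
    """Estimate first-order transition probabilities P(next | current)
    from a reference set of sequences (maximum likelihood, no smoothing)."""
    counts = defaultdict(Counter)
    for s in seqs:
        # Count every adjacent pair of residues in the sequence.
        for current, nxt in zip(s, s[1:]):
            counts[current][nxt] += 1
    model = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        model[current] = {nxt: n / total for nxt, n in followers.items()}
    return model


def run_continuation_prob(model, residue, length):
    """Probability that a run of `residue` reaches at least `length`
    residues, given that it has already started with one `residue`.
    Under a first-order chain this is P(residue | residue) ** (length - 1)."""
    if length < 1:
        raise ValueError("length must be at least 1")
    p_stay = model.get(residue, {}).get(residue, 0.0)
    return p_stay ** (length - 1)


# Toy reference set (hypothetical data, chosen only for illustration).
background = train_first_order(["LLLA", "ALLA"])
p_run = run_continuation_prob(background, "L", 3)
```

Training the same functions on a different reference set (for example, only signal peptides instead of whole proteins) changes the transition probabilities, which is one way to see why the choice of reference data defines the background.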

    Hierarchical Models in the Brain

    This paper describes a general model that subsumes many parametric models for continuous data. The model comprises hidden layers of state-space or dynamic causal models, arranged so that the output of one provides input to another. The ensuing hierarchy furnishes a model for many types of data, of arbitrary complexity. Special cases range from the general linear model for static data to generalised convolution models, with system noise, for nonlinear time-series analysis. Crucially, all of these models can be inverted using exactly the same scheme, namely, dynamic expectation maximization. This means that a single model and optimisation scheme can be used to invert a wide range of models. We present the model and a brief review of its inversion to disclose the relationships among, apparently, diverse generative models of empirical data. We then show that this inversion can be formulated as a simple neural network and may provide a useful metaphor for inference and learning in the brain

    Social disparities in food preparation behaviours: a DEDIPAC study

    BACKGROUND: The specific role of major socio-economic indicators in influencing food preparation behaviours could reveal distinct socio-economic patterns, thus enabling mechanisms to be understood that contribute to social inequalities in health. This study investigated whether there was an independent association of each socio-economic indicator (education, occupation, income) with food preparation behaviours. METHODS: A total of 62,373 adults participating in the web-based NutriNet-Santé cohort study were included in our cross-sectional analyses. Cooking skills, preparation from scratch and kitchen equipment were assessed using a 0-10-point score; frequency of meal preparation, enjoyment of cooking and willingness to cook better/more frequently were categorical variables. Independent associations between socio-economic factors (education, income and occupation) and food preparation behaviours were assessed using analysis of covariance and logistic regression models stratified by sex. The models simultaneously included the three socio-economic indicators, adjusting for age, household composition and whether or not they were the main cook in the household. RESULTS: Participants with the lowest education, the lowest income group and female manual and office workers spent more time preparing food daily than participants with the highest education, those with the highest income and managerial staff (P < 0.0001). The lowest educated individuals were more likely to be non-cooks than those with the highest education level (Women: OR = 3.36 (1.69;6.69); Men: OR = 1.83 (1.07;3.16)) while female manual and office workers and the never-employed were less likely to be non-cooks (OR = 0.52 (0.28;0.97); OR = 0.30 (0.11;0.77)). Female manual and office workers had lower scores of preparation from scratch and were less likely to want to cook more frequently than managerial staff (P < 0.001 and P < 0.001). 
Women belonging to the lowest income group had a lower kitchen equipment score (P < 0.0001) and were less likely to enjoy cooking meals daily (OR = 0.68 (0.45;0.86)) than those with the highest income. CONCLUSION: The lowest socio-economic groups, particularly women, spend more time preparing food than high socio-economic groups. However, female manual and office workers used fewer raw or fresh ingredients to prepare meals than managerial staff. In the unfavourable context in France, where time spent preparing meals has declined over recent decades, our findings showed socio-economic disparities in food preparation behaviours in women, whereas few differences were observed in men.

    Genome-Wide Analysis of Transcriptional Reprogramming in Mouse Models of Acute Myeloid Leukaemia

    Acute leukaemias are commonly caused by mutations that corrupt the transcriptional circuitry of haematopoietic stem/progenitor cells. However, the mechanisms underlying large-scale transcriptional reprogramming remain largely unknown. Here we investigated transcriptional reprogramming at genome-scale in mouse retroviral transplant models of acute myeloid leukaemia (AML) using both gene-expression profiling and ChIP-sequencing. We identified several thousand candidate regulatory regions with altered levels of histone acetylation that were characterised by differential distribution of consensus motifs for key haematopoietic transcription factors including Gata2, Gfi1 and Sfpi1/Pu.1. In particular, downregulation of Gata2 expression was mirrored by abundant GATA motifs in regions of reduced histone acetylation suggesting an important role in leukaemogenic transcriptional reprogramming. Forced re-expression of Gata2 was not compatible with sustained growth of leukaemic cells thus suggesting a previously unrecognised role for Gata2 in downregulation during the development of AML. Additionally, large scale human AML datasets revealed significantly higher expression of GATA2 in CD34+ cells from healthy controls compared with AML blast cells. The integrated genome-scale analysis applied in this study represents a valuable and widely applicable approach to study the transcriptional control of both normal and aberrant haematopoiesis and to identify critical factors responsible for transcriptional reprogramming in human cancer

    A social engineering model for poverty alleviation

    Poverty, the quintessential denominator of a developing nation, has been traditionally defined against an arbitrary poverty line; individuals (or countries) below this line are deemed poor and those above it, not so! This has two pitfalls. First, absolute reliance on a single poverty line, based on basic food consumption, and not on total consumption distribution, is only a partial poverty index at best. Second, a single expense descriptor is an exogenous quantity that does not evolve from income-expenditure statistics. Using extensive income-expenditure statistics from India, here we show how a self-consistent endogenous poverty line can be derived from an agent-based stochastic model of market exchange, combining all expenditure modes (basic food, other food and non-food), whose parameters are probabilistically estimated using advanced Machine Learning tools. Our mathematical study establishes a consumption based poverty measure that combines labor, commodity, and asset market outcomes, delivering an excellent tool for economic policy formulation

    Stable Genetic Effects on Symptoms of Alcohol Abuse and Dependence from Adolescence into Early Adulthood

    Relatively little is known about how genetic influences on alcohol abuse and dependence (AAD) change with age. We examined the change in influence of genetic and environmental factors which explain symptoms of AAD from adolescence into early adulthood. Symptoms of AAD were assessed using the four AAD screening questions of the CAGE inventory. Data were obtained up to six times by self-report questionnaires for 8,398 twins from the Netherlands Twin Register aged between 15 and 32 years. Longitudinal genetic simplex modeling was performed with Mx. Results showed that shared environmental influences were present for age 15–17 (57%) and age 18–20 (18%). Unique environmental influences gained importance over time, contributing 15% of the variance at age 15–17 and 48% at age 30–32. At younger ages, unique environmental influences were largely age-specific, while at later ages, age-specific influences became less important. Genetic influences on AAD symptoms over age could be accounted for by one factor, with the relative influence of this factor differing across ages. Genetic influences increased from 28% at age 15–17 to 58% at age 21–23 and remained high in magnitude thereafter. These results are in line with a developmentally stable hypothesis that predicts that a single set of genetic risk factors acts on symptoms of AAD from adolescence into young adulthood