
    How to do quantile normalization correctly for gene expression data analyses.

    Quantile normalization is an important normalization technique commonly used in high-dimensional data analysis. However, it is susceptible to class-effect proportion effects (the proportion of class-correlated variables in a dataset) and batch effects (the presence of potentially confounding technical variation) when applied blindly to whole datasets, resulting in higher false-positive and false-negative rates. We evaluate five strategies for performing quantile normalization, and demonstrate that good performance in terms of batch-effect correction and statistical feature selection can be readily achieved by first splitting data by sample class-labels before performing quantile normalization independently on each split ("Class-specific"). Via simulations with both real and simulated batch effects, we demonstrate that the "Class-specific" strategy (and others relying on similar principles) readily outperforms whole-data quantile normalization and is robust, preserving useful signals even during the combined analysis of separately normalized datasets. Quantile normalization is a commonly used procedure, but when carelessly applied to whole datasets without first considering class-effect proportion and batch effects, it can result in poor performance. If quantile normalization must be used, then we recommend using the "Class-specific" strategy.
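
    The "Class-specific" idea described in this abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration: the function names, a genes-by-samples matrix layout, the use of the mean of the sorted columns as the reference distribution, and ties being broken by sort order rather than averaged.

```python
import numpy as np


def quantile_normalize(X):
    """Quantile-normalize the columns (samples) of a genes x samples matrix.

    Assumption: the reference distribution is the row-wise mean of the
    column-sorted values; ties are broken by sort order, not averaged.
    """
    X = np.asarray(X, dtype=float)
    # Row indices that would sort each column independently.
    order = np.argsort(X, axis=0, kind="stable")
    sorted_X = np.take_along_axis(X, order, axis=0)
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = sorted_X.mean(axis=1)
    out = np.empty_like(X)
    # Write the k-th reference value back to wherever the k-th smallest value was.
    np.put_along_axis(out, order, reference[:, None], axis=0)
    return out


def class_specific_quantile_normalize(X, labels):
    """Split samples by class label and quantile-normalize each split separately."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    out = np.empty_like(X)
    for cls in np.unique(labels):
        cols = labels == cls  # boolean mask selecting this class's samples
        out[:, cols] = quantile_normalize(X[:, cols])
    return out


# Hypothetical toy data: 4 genes x 4 samples, two samples per class.
X_demo = np.array([
    [5.0, 4.0, 3.0, 30.0],
    [2.0, 1.0, 4.0, 10.0],
    [3.0, 3.0, 6.0, 20.0],
    [4.0, 2.0, 8.0, 40.0],
])
labels_demo = ["a", "a", "b", "b"]
normalized_demo = class_specific_quantile_normalize(X_demo, labels_demo)
```

    After this step, samples within a class share one distribution, while the distributions of different classes are left free to differ, which is the property the abstract credits with preserving class-correlated signal.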

    Growth gone awry: exploring the role of embryonic liver development genes in HCV induced cirrhosis and hepatocellular carcinoma

    Introduction and methods: Hepatocellular carcinoma (HCC) remains a difficult disease to study even after a decade of genomic analysis. Metabolic and cell-cycle perturbations are known, large changes in tumors that add little to our understanding of the development of tumors, but generate “noise” that obscures potentially important smaller-scale expression changes in “driver genes”. Recently, some researchers have suggested that HCC shares pathways involving the master regulators of embryonic development. Here, we investigated the involvement and specificity of developmental genes in HCV-cirrhosis and HCV-HCC. We obtained microarray studies from 30 patients with HCV-cirrhosis and 49 patients with HCV-HCC and compared them to 12 normal livers. Differential gene expression is specific to liver development genes: 86 of 202 (43%) genes specific to liver development had differential expression between normal and cirrhotic or HCC samples. Of 60 genes with paralogous function, which are specific to development of other organs and have known associations with other cancer types, none were expressed in either adult normal liver or tumor tissue. Developmental genes are widely differentially expressed in both cirrhosis and early HCC, but not late HCC: 69 liver development genes were differentially expressed in cirrhosis, and 58 of these (84%) were also dysregulated in early HCC. 19/58 (33%) had larger-magnitude changes in cirrhosis and 5 (9%) had larger-magnitude changes in early HCC. 16 (9%) genes were uniquely altered in early tumors, while only 2 genes were uniquely changed in late-stage (T3 and T4) HCC. Together, these results suggest that the master regulators of liver development are active in the pre-cancerous cirrhotic liver and in cirrhotic livers with emerging tumors but play a limited role in the transition from early to late-stage HCC.
Common patterns of coordinated developmental gene expression include: (1) Dysregulation of BMP2 signaling in cirrhosis followed by overexpression of BMP inhibitors in HCC. BMP inhibitor GPC3 was overexpressed in nearly all tumors, while GREM1 was associated specifically with recurrence-free survival after ablation and transplant. (2) Cirrhosis tissues acquire a progenitor-like signature including high expression of Vimentin, EPCAM, and KRT19, and these markers remain over-expressed to a lesser extent in HCC. (3) Hepatocyte proliferation inhibitors (HPI) E-cadherin (CDH1), BMP2, and MST1 were highly expressed in cirrhosis and remained over-expressed in 16 HCC patients who were transplanted with excellent recurrence-free survival (94% survival after 2 years; mean recurrence-free survival = 5.6 yrs), while loss in early HCC was associated with early recurrence and (2 year). Loss of HPI overexpression was also correlated with overexpression of c-MET and loss of STAT3, LAMA2, FGFR2, CITED2, KIT, SMAD7, GATA6, ERBB2, and NOTCH2.

    Interplay of genetic, epigenetic and transcription factors in the regulation of transcriptional variation in Plasmodium falciparum

    Doctoral Programme in Biomedicine / Thesis carried out at the Barcelona Institute for Global Health (ISGlobal). The most severe form of malaria, caused by Plasmodium falciparum parasites, still kills over half a million people every year, most of them children under the age of five. Despite huge research efforts, reduction in the global burden of disease has stalled in recent years. P. falciparum has a very complex life cycle including, among other steps, sexual reproduction in female Anopheles mosquitoes and an asexual intraerythrocytic development cycle (IDC) inside the human host, which causes the disease. During the IDC, the parasite needs to continuously adapt to changes in its environment including fluctuations in blood temperature, concentration of nutrients and other metabolites, presence of drugs, and a constant fight against the host’s immune system. In this thesis, we have studied the adaptation mechanisms of P. falciparum to this plethora of challenges, with a special focus on clonally variant genes (CVGs). In P. falciparum, CVGs are a set of genes participating in host-parasite interactions that can be found either in a transcriptionally active state, characterized by euchromatin, or in a transcriptionally silenced state, characterized by heterochromatin. The state of CVGs is inherited by the progeny of a parasite, with stochastic switches occurring at a low frequency. Parasites with optimal patterns of CVG expression are continuously selected as the environment changes, leading to adaptation and survival of the infecting population. In the first paper of this thesis, we have analyzed subcloned parasite populations to characterize, with unprecedented detail, the heterochromatin distribution associated with the active and silenced states of CVGs.
This has allowed us to define different kinds of heterochromatin transitions between the active and silenced states of CVGs and has given us new insights into the regulation of var genes (one of the main virulence factors for malaria) and into the regulation of sexual conversion, a process crucial for malaria transmission. Continuing with CVG regulation, in the second paper of the thesis, we have analyzed how patterns of CVG expression are established at the onset of human infections, after passage through transmission stages. Our results suggest a loss of the epigenetic memory during transmission stages and a reset of the heterochromatin patterns that drive CVG expression. Similar patterns of CVG expression arose in different infected individuals, suggesting that the activation probability of a given CVG is an intrinsic property of the gene. In the third paper of the thesis, we have further studied the sexual conversion phenomenon. We have generated a conditional over-expression system for pfap2-g, the CVG that acts as master regulator of sexual conversion, achieving sexual conversion rates of ~90% after induction. Our results have provided new insights into how heterochromatin at different positions affects expression of pfap2-g and have allowed us to characterize the transcriptional profile of the initial stages of sexual commitment with unprecedented sensitivity. Finally, in the fourth paper of this thesis, we have studied the adaptation of the parasite to heat-shock, which happens in natural infections due to fever episodes. We expected CVGs to participate in this phenomenon, but instead we have identified pfap2-hs, a non-clonally variant transcription factor (TF), as the main driver of the heat-shock response in P. falciparum. AP2-HS acts as the functional homolog of HSF1 (a TF that drives the heat-shock response from yeast to mammals, but is absent in P. falciparum), driving a very tight transcriptional response to heat-shock, characterized by the up-regulation of hsp70 and hsp90. Although the presence of directed responses had previously been demonstrated for other cues, this is the first time that the transcription factor driving such a response has been identified in P. falciparum. Taken together, the results of this thesis have broadened our knowledge of the regulation of adaptive mechanisms in P. falciparum. Learning about this deadly parasite’s defense mechanisms will be instrumental in designing better strategies to fight it in the future.
