Combining Support Vector Machines to Predict Novel Angiogenesis Genes
Cancer is one of the most common and dangerous diseases today, causing 13% of all deaths worldwide every year. Despite years of effort, no effective treatment for this disease has yet been found. It is known, however, that angiogenesis plays an important role in cancer development: during this process, the tumour causes the surrounding blood vessels to branch and grow. A better understanding of this process could potentially make it possible to develop new and more effective treatments.
Experiments carried out over the years have measured the expression of most human genes in more than 5000 conditions. In addition, our collaborators have compiled a list of 341 genes involved in blood vessel formation. The aim of this work is to investigate how new angiogenesis-related genes can be predicted from gene expression data and a small set of known angiogenesis genes.
To this end, we first compare several existing machine learning methods and publicly available bioinformatics tools that could be used to predict candidate genes. All of these methods are given input data that are as similar as possible. We then use 10-fold cross-validation to measure how well they recover already known angiogenesis genes.
In the second part of the work, we propose a novel method, Comb-SVM, for predicting candidate genes. Its core idea consists of three steps. First, known angiogenesis genes and randomly selected negative genes are used to train several classifiers in parallel, each based on a support vector machine (SVM). Next, these classifiers are used to predict new angiogenesis genes. Finally, the results of all classifiers are aggregated into a single prediction.
At the end of the work, we show that, according to 10-fold cross-validation, Comb-SVM is more accurate than most existing methods. We also show that Comb-SVM predictions are significantly more stable with respect to small changes in the training data than the results of the second-best algorithm. Finally, we use the scientific literature and the Gene Ontology database to confirm that the newly predicted genes are indeed associated with angiogenesis.

Angiogenesis is the process of growing new blood vessels. It is part of normal bodily functions like wound healing, but it also plays an important role in cancer development. Without angiogenesis, tumors would not be able to grow larger than 1-2 millimeters in diameter due to the lack of oxygen and nutrients. However, only some of the genes involved in angiogenesis are known.
In this work, we propose Comb-SVM, a new machine learning method for predicting new members of the positive class that does not require clearly defined negative examples. The idea is to train multiple Support Vector Machines (SVMs) using known genes as positive examples and various randomly selected sets of genes as negative examples. The multiple SVMs are then used to separately classify all remaining human genes, and the results are finally aggregated using a rank aggregation algorithm. The outcome is a list of genes ranked according to their similarity to the known input genes.
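The three steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis implementation. The linear SVM is a hypothetical from-scratch Pegasos-style sub-gradient solver. The number of models (`n_models`) and the size of each random negative set (equal to the number of positives) are arbitrary choices. For simplicity, every model scores every candidate gene. The final step uses a plain mean rank as a stand-in for the rank aggregation algorithm used in the work. All function names are hypothetical.

```python
import random
from statistics import fmean


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, epochs=30, rng=None):
    """Linear SVM fitted by Pegasos-style sub-gradient descent on the hinge loss.

    A constant 1.0 is appended to every vector, so the bias is learned as an
    ordinary (regularised) weight.
    """
    rng = rng or random.Random(0)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    order = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, Xb[i]) < 1.0
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (L2 penalty)
            if violated:  # hinge-loss sub-gradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def svm_score(w, x):
    return dot(w, list(x) + [1.0])


def comb_svm(positives, candidates, n_models=10, seed=0):
    """Rank candidate genes by similarity to the known (positive) genes.

    positives:  dict gene -> expression profile of a known gene
    candidates: dict gene -> expression profile of every other gene
    Returns candidate gene names, most similar to the positives first.
    """
    rng = random.Random(seed)
    pos = list(positives.values())
    names = list(candidates)
    ranks = {g: [] for g in names}
    for _ in range(n_models):
        # Step 1: treat a random sample of unlabelled genes as negatives.
        negatives = rng.sample(names, k=min(len(pos), len(names)))
        X = pos + [candidates[g] for g in negatives]
        y = [1] * len(pos) + [-1] * len(negatives)
        w = train_linear_svm(X, y, rng=rng)
        # Step 2: this model scores and ranks every candidate gene.
        ordered = sorted(names, key=lambda g: svm_score(w, candidates[g]), reverse=True)
        for r, g in enumerate(ordered, start=1):
            ranks[g].append(r)
    # Step 3: aggregate; the mean rank is a simple stand-in for a proper
    # rank aggregation algorithm.
    return sorted(names, key=lambda g: fmean(ranks[g]))
```

A reasonable alternative design is out-of-bag scoring. In that variant, each model ranks only the genes it did not draw as negatives, at the cost of uneven numbers of ranks per gene.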
We applied this method to 341 known angiogenesis genes. Experiments were conducted on a large Affymetrix microarray gene expression matrix, obtained from ArrayExpress, consisting of 5732 experiments and 22283 probe sets. We compared Comb-SVM to many other state-of-the-art approaches. According to cross-validation experiments, our method outperformed most of the existing methods when looking at the areas under the Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves. We also determined that our method gave significantly more stable results than the second-best approach. Finally, we verified the biological relevance of the predicted genes by searching the literature and Gene Ontology.
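The evaluation described above can be illustrated with a small sketch. It splits known genes into folds and scores a method's ranking with the two areas under the curves. The assumption here is that, in each fold, held-out known genes are labelled positive and all other genes negative. The area under the PR curve is estimated with average precision, which is one common choice and not necessarily the estimator used in the work. Function names are hypothetical.

```python
import random


def k_folds(items, k=10, seed=0):
    """Split items (e.g. known gene IDs) into k disjoint folds at random."""
    items = list(items)
    random.Random(seed).shuffle(items)
    return [items[i::k] for i in range(k)]


def roc_auc(scores, labels):
    """Area under the ROC curve, via the Mann-Whitney U statistic.

    Equals the probability that a random positive outscores a random
    negative (ties count one half).
    """
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Average precision: mean precision at the rank of each positive.

    One common estimate of the area under the precision-recall curve; tied
    scores are broken by input order.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, lab) in enumerate(ranked, start=1):
        if lab:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 0, 0]
print(roc_auc(scores, labels))            # 0.8333... (5 of 6 pairs ordered correctly)
print(average_precision(scores, labels))  # 0.8333... ((1/1 + 2/3) / 2)
```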
Regulation of gene expression in macrophage immune response
Gene expression quantitative trait loci (eQTL) mapping studies can provide mechanistic insights into the functions of disease-associated variants. However, many eQTLs are cell type- and context-specific. This is particularly relevant for immune cells, whose cellular function and behaviour can be substantially altered by external cues. Furthermore, understanding the mechanisms behind eQTLs is hindered by the difficulty of identifying causal variants. We differentiated macrophages from induced pluripotent stem cells derived from 86 unrelated, healthy individuals as part of the Human Induced Pluripotent Stem Cells Initiative. We generated RNA-seq data from these cells in four experimental conditions: naïve, interferon-gamma (IFNγ) treatment (18h), Salmonella infection (5h), and IFNγ treatment followed by Salmonella infection. We also measured chromatin accessibility with ATAC-seq in 31-42 individuals in the same four conditions. We detected eQTLs for 4326 genes, over 900 of which were condition-specific. We also detected a similar number of transcript ratio QTLs (trQTLs) that influenced mRNA processing and alternative splicing. Macrophage eQTLs and trQTLs were enriched for variants associated with Alzheimer’s disease, multiple autoimmune disorders and lipid traits. We also detected chromatin accessibility QTLs (caQTLs) for 14,602 accessible regions, including hundreds of long-range interactions. Joint analysis of eQTLs with caQTLs allowed us to greatly reduce the set of credible causal variants, often pinpointing a single most likely variant. We found that caQTLs were less condition-specific than eQTLs. We also found that ~50% of the stimulation-specific eQTLs were already manifested at the chromatin level in naïve cells.
These observations might help to explain the discrepancy between the strong enrichment of disease associations in regulatory elements and their only modest overlap with current eQTL studies. They suggest that many regulatory elements are in a ‘primed’ state, waiting for an appropriate environmental signal before regulating gene expression.

Wellcome Trust scholarship for the PhD programme in Mathematical Genomics and Medicine
eQTL Catalogue 2023: New datasets, X chromosome QTLs, and improved detection and visualisation of transcript-level QTLs
The eQTL Catalogue is an open database of uniformly processed human molecular quantitative trait loci (QTLs). We are continuously updating the resource to further increase its utility for interpreting genetic associations with complex traits. Over the past two years, we have increased the number of uniformly processed studies from 21 to 31 and added X chromosome QTLs for 19 compatible studies. We have also implemented Leafcutter to directly identify splice-junction usage QTLs in all RNA sequencing datasets. Finally, to improve the interpretability of transcript-level QTLs, we have developed static QTL coverage plots. These plots visualise the association between the genotype and average RNA sequencing read coverage in the region for all 1.7 million fine-mapped associations. To illustrate the utility of these updates to the eQTL Catalogue, we performed colocalisation analysis between vitamin D levels in the UK Biobank and all molecular QTLs in the eQTL Catalogue. Most GWAS loci colocalised with both eQTLs and transcript-level QTLs. Even so, we found that visual inspection could sometimes be used to distinguish primary splicing QTLs from those that appear to be secondary consequences of large-effect gene expression QTLs. These visually confirmed primary splicing QTLs explain just 6/53 of the colocalising signals. However, they are significantly less pleiotropic than eQTLs and identify a prioritised causal gene in 4/6 cases.
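Colocalisation of the kind described above is commonly assessed with the approximate Bayes factor approach of the coloc R package. This sketch re-implements that calculation in plain Python from per-variant effect sizes and standard errors. It returns posterior probabilities for H0 (no association), H1/H2 (only one trait associated), H3 (two distinct causal variants) and H4 (one shared causal variant). The priors `p1 = p2 = 1e-4` and `p12 = 1e-5` and the prior effect SD of 0.15 are, to our knowledge, coloc's defaults for quantitative traits. They are used here as assumptions; the abstract does not state the settings actually used. Function names are hypothetical.

```python
from math import exp, log


def log_abf(beta, se, prior_sd=0.15):
    """Wakefield approximate Bayes factor (natural log) for one variant."""
    v, w = se ** 2, prior_sd ** 2
    r = w / (w + v)
    z = beta / se
    return 0.5 * (log(1 - r) + r * z * z)


def log_sum(xs):
    m = max(xs)
    return m + log(sum(exp(x - m) for x in xs))


def log_diff(a, b):
    """log(exp(a) - exp(b)); -inf when the difference underflows to zero."""
    d = 1.0 - exp(b - a)
    return a + log(d) if d > 0 else float("-inf")


def coloc_posteriors(l1, l2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0..H4] from aligned per-variant log ABFs."""
    s1, s2 = log_sum(l1), log_sum(l2)
    s12 = log_sum([a + b for a, b in zip(l1, l2)])
    lh = [
        0.0,                                             # H0: neither trait
        log(p1) + s1,                                    # H1: trait 1 only
        log(p2) + s2,                                    # H2: trait 2 only
        log(p1) + log(p2) + log_diff(s1 + s2, s12),      # H3: distinct variants
        log(p12) + s12,                                  # H4: shared variant
    ]
    total = log_sum(lh)
    return [exp(x - total) for x in lh]


shared = [log_abf(b, 0.05) for b in (0.5, 0.01, 0.02)]
pp = coloc_posteriors(shared, shared)
print(pp[4] > 0.99)  # True: both traits peak at the same variant
```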
Common genetic variation drives molecular heterogeneity in human iPSCs
Technology utilizing human induced pluripotent stem cells (iPS cells) has enormous potential to provide improved cellular models of human disease. However, variable genetic and phenotypic characterization of many existing iPS cell lines limits their potential use for research and therapy. Here we describe the systematic generation, genotyping and phenotyping of 711 iPS cell lines derived from 301 healthy individuals by the Human Induced Pluripotent Stem Cells Initiative. Our study outlines the major sources of genetic and phenotypic variation in iPS cells and establishes their suitability as models of complex human traits and cancer. Through genome-wide profiling, we find that 5-46% of the variation in different iPS cell phenotypes, including differentiation capacity and cellular morphology, arises from differences between individuals. Additionally, we assess the phenotypic consequences of genomic copy-number alterations that are repeatedly observed in iPS cells. Finally, we present a comprehensive map of common regulatory variants affecting the transcriptome of human pluripotent cells.
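The figure above (5-46% of variation arising from differences between individuals) is a variance-component estimate. A minimal way to compute such a fraction is the one-way random-effects ANOVA (method-of-moments) estimator below. Donors are treated as groups and their iPS cell lines as replicates. This is a simpler technique swapped in for illustration; the study may have used a different model, such as a linear mixed model with additional factors. The function name is hypothetical.

```python
from statistics import fmean


def donor_variance_fraction(groups):
    """Fraction of phenotype variance explained by donor (one-way ANOVA).

    groups: list of lists; each inner list holds one phenotype measured in
    the cell lines derived from a single donor. Needs at least two donors
    and more lines than donors.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    means = [fmean(g) for g in groups]
    grand = fmean([x for g in groups for x in g])
    ss_between = sum(n * (m - grand) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Effective number of lines per donor, corrected for unbalanced designs.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    var_between = max(0.0, (ms_between - ms_within) / n0)  # truncate at zero
    return var_between / (var_between + ms_within)


# Three donors, two lines each; donors differ far more than their lines do.
print(donor_variance_fraction([[1.0, 1.2], [3.0, 3.2], [5.0, 5.2]]))
```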
Loss of IL-10 signaling in macrophages limits bacterial killing driven by prostaglandin E2
Loss of IL-10 signaling in macrophages (Mφs) leads to inflammatory bowel disease (IBD). Induced pluripotent stem cells (iPSCs) were generated from an infantile-onset IBD patient lacking a functional IL10RB gene. Mφs differentiated from IL10RB−/− iPSCs lacked IL-10RB mRNA expression and were unable to phosphorylate STAT3. They also failed to reduce LPS-induced inflammatory cytokines in the presence of exogenous IL-10. IL-10RB−/− Mφs exhibited a striking defect in their ability to kill Salmonella enterica serovar Typhimurium. This defect was rescued by experimentally introducing functional copies of the IL10RB gene. Genes involved in the synthesis and receptor pathways of the eicosanoid prostaglandin E2 (PGE2) were more highly induced in IL-10RB−/− Mφs, and these Mφs produced higher amounts of PGE2 after LPS stimulation than controls. Furthermore, pharmacological inhibition of PGE2 synthesis and PGE2 receptor blockade enhanced bacterial killing in Mφs. These results identify a regulatory interaction between IL-10 and PGE2. Its dysregulation may drive aberrant Mφ activation and impaired host defense, contributing to IBD pathogenesis.
Elucidating the transcriptional regulatory network controlling the TPO1 response to benzoic acid in yeast
Multidrug resistance (MDR) is the simultaneous acquisition of resistance to a wide range of structurally and functionally unrelated cytotoxic chemical compounds. It has severe consequences in cancer therapy, agriculture and the food industry. Saccharomyces cerevisiae is a well-established model organism for studying the mechanisms of MDR. In yeast and related organisms, MDR is often caused by drug-efflux pumps that are able to export a wide range of unrelated chemicals. Tpo1, a drug:H+ antiporter of the major facilitator superfamily, is one such drug-efflux pump. In the current work, our aim was to characterize the transcriptional regulatory network controlling the TPO1 response to benzoic acid. We have employed two complementary approaches to achieve this aim. First, we used RT-PCR to measure the transcript levels of TPO1 and five of its known and putative regulators (GCN4, STP1, STP2, PDR1, PDR3). These measurements were taken over a time course in the wild type and the respective deletion mutants. We subsequently used this information to construct a logical model of TPO1 regulation. Second, we developed a computational approach that combines data from multiple public sources to predict novel regulators of TPO1. We verified some of the predictions experimentally using β-galactosidase assays. Our results indicate that under benzoic acid stress, Pdr1/Pdr3 seem to play no role in regulating TPO1. Instead, a complex interplay between Gcn4 and Stp1 is responsible for the upregulation of TPO1. Screening for new regulators revealed Hal9 and Ash1, which seem to repress TPO1 expression in control conditions and under benzoic acid stress, respectively. Furthermore, multiple transcription factors previously implicated in pseudohyphal growth also have a small effect on TPO1 expression.
Chromatin accessibility QTL lead variants in macrophages stimulated with IFNg and Salmonella
Lead caQTL variants from RASQUAL and FastQTL analyses
Summary statistics of transcript usage QTLs in naive and stimulated macrophages (part 1)
Summary statistics of transcript usage QTLs in naive and stimulated macrophages
Revised transcript annotations for GRCh38 reference genome and Ensembl v87.
Custom transcript annotations generated using the reviseAnnotations package.
Reference genome: GRCh38
Ensembl version: 87
See the GitHub page of reviseAnnotations for more details:
https://github.com/kauralasoo/reviseAnnotation