12 research outputs found

    Rare variant collapsing in conjunction with mean log p-value and gradient boosting approaches applied to Genetic Analysis Workshop 17 data

    In addition to methods that can identify common variants associated with susceptibility to common diseases, there has been increasing interest in approaches that can identify rare genetic variants. We use the simulated data provided to the participants of Genetic Analysis Workshop 17 (GAW17) to identify both rare and common single-nucleotide polymorphisms (SNPs) and pathways associated with disease status. We apply a rare variant collapsing approach and the usual association tests for common variants to identify candidates for further analysis using pathway-based and tree-based ensemble approaches. We use the mean log p-value approach to identify a top set of pathways and compare it with the pathways used to simulate the GAW17 dataset. We conclude that the mean log p-value approach places those pathways, as well as related pathways, in the top list. We also apply stochastic gradient boosting to the selected subset of SNPs. When we compare the result of this tree-based method with the list of SNPs used in the dataset simulation, we observe a number of false positives in addition to the correct SNPs.
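As a rough illustration of the two screening steps named in this abstract, the sketch below collapses rare variants into a single carrier indicator and ranks pathways by the mean of -log10(p) over their member genes. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the 0.01 minor allele frequency threshold, the use of base-10 logarithms, and the toy data are all hypothetical.

```python
import math


def collapse_rare_variants(genotypes, mafs, maf_threshold=0.01):
    """Collapse rare variants into one 0/1 indicator per individual.

    genotypes: list of rows, one per individual, holding minor-allele counts.
    mafs: minor allele frequency of each variant (column).
    The 0.01 threshold is an assumption; the abstract does not state one.
    """
    rare = [j for j, maf in enumerate(mafs) if maf < maf_threshold]
    return [int(any(row[j] > 0 for j in rare)) for row in genotypes]


def mean_log_p(pathways, p_values):
    """Rank pathways by the mean of -log10(p) over their member genes."""
    scores = {}
    for name, genes in pathways.items():
        logs = [-math.log10(p_values[g]) for g in genes if g in p_values]
        if logs:  # skip pathways with no tested genes
            scores[name] = sum(logs) / len(logs)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (hypothetical): 4 individuals, 3 variants, the third one common.
genotypes = [[0, 1, 0], [0, 0, 0], [2, 0, 1], [0, 0, 1]]
mafs = [0.005, 0.008, 0.20]
collapsed = collapse_rare_variants(genotypes, mafs)

# Toy gene-level p-values and pathway memberships (hypothetical).
pathways = {"pathway_A": ["g1", "g2"], "pathway_B": ["g2", "g3"]}
p_values = {"g1": 0.001, "g2": 0.01, "g3": 0.1}
ranked = mean_log_p(pathways, p_values)
```

In this toy run only the first and third individuals carry a rare allele, and pathway_A ranks above pathway_B because its member genes have smaller p-values on average.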

    Classification and multiple testing for microarray data

    This thesis aims to provide a solution to the classification and hypothesis testing problems, and to create a tool that performs clustering, hypothesis testing or classification tasks automatically through a simple menu-driven interface. Since microarrays first appeared in 1995, they have become a technique used worldwide for large-scale gene expression screening. The quantity of data generated by microarray experiments is enormous and requires new, careful methods for analyzing these high-dimensional data. One of the problems encountered with this type of data is overfitting, which happens when the information selected is related to the condition of interest only by chance. The thesis consists of four major parts. The first part gives an overview of microarray methodology and of current techniques for analyzing gene expression data. The second part uses an idea based on partial least squares to develop an algorithm that controls the false discovery rate (FDR) when extracting differentially expressed genes from gene expression data. This procedure can be used on its own, or as part of a scheme in which it provides weights that are combined with another selection method or used within an ensemble. The third part deals with the problem of comparing several treatments to a control. In the setting where one wants to find a ‘bump’ in the measurements of several groups, a test statistic based on the maximum and minimum of the group mean differences is considered, and the derived distribution of this statistic can then be used to make inferences. The fourth part describes the software developed to provide a menu-driven computing environment for data manipulation and analysis. It includes different methods for comparing the expression profiles of genes, methods for gene clustering, and various visualization and exploration methods. Ph.D. Includes bibliographical references. Includes vita. By Yauheniya Cherka
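To make the idea of controlling the FDR concrete, here is a minimal sketch of the standard Benjamini-Hochberg step-up procedure. This is a swapped-in illustration: the abstract does not say which FDR procedure the thesis uses, and the partial-least-squares-based algorithm it describes is not reproduced here. The function name, the 0.05 level and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of hypotheses rejected at FDR level alpha.

    Standard Benjamini-Hochberg step-up rule: sort the p-values, find the
    largest rank k with p_(k) <= (k / m) * alpha, and reject the k smallest.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff = rank  # keep the largest rank that passes
    return sorted(order[:cutoff])


# Hypothetical per-gene p-values from a differential expression test.
gene_p_values = [0.041, 0.001, 0.20, 0.008]
rejected = benjamini_hochberg(gene_p_values, alpha=0.05)
```

With these four p-values the thresholds are 0.0125, 0.025, 0.0375 and 0.05, so only the genes at indices 1 and 3 are declared differentially expressed.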

    ABC gene-ranking for prediction of drug-induced cholestasis in rats

    As legacy toxicogenomics databases have become available, improved data mining approaches are now key to extracting and visualizing subtle relationships between toxicants and gene expression. In the present study, a novel “aggregating bundles of clusters” (ABC) procedure was applied to separate cholestatic from non-cholestatic drugs and model toxicants in the Johnson & Johnson (Janssen) rat liver toxicogenomics database [3]. Drug-induced cholestasis is an important issue, particularly when a new compound enters the market with this liability, and standard preclinical models often mispredict this toxicity. Three well-characterized cholestasis-responsive genes (Cyp7a1, Mrp3 and Bsep) were chosen from a previous in-house Janssen gene expression signature; these three genes show differing, non-redundant responses across the 90+ paradigm compounds in our database. Using the ABC procedure, extraneous contributions were minimized in comparisons of compound gene responses. All genes were assigned weights proportional to their correlations with Cyp7a1, Mrp3 and Bsep, and a resampling technique was used to derive a stable measure of compound similarity. The compounds known to be associated with rat cholestasis generally had small values of this measure relative to each other and large values relative to non-cholestatic compounds. Visualization of the data with the ABC-derived signature showed four groups: a very tight, essentially identically behaving cluster of robust human cholestatic drugs and experimental cholestatic toxicants (ethinyl estradiol, LPS, ANIT and methylene dianiline, disulfiram, naltrexone, methapyrilene, phenacetin, alpha-methyl dopa, flutamide, the NSAIDs indomethacin, flurbiprofen, diclofenac, flufenamic acid, sulindac and nimesulide, butylated hydroxytoluene, piperonyl butoxide, and bromobenzene); some slightly less active compounds (3′-acetamidofluorene, amsacrine, hydralazine, tannic acid); some drugs that behaved very differently and were distinct from both non-cholestatic and cholestatic drugs (ketoconazole, dipyridamole, cyproheptadine and aniline); and many postulated human cholestatic drugs that showed no evidence of cholestasis in rat (chlorpromazine, erythromycin, niacin, captopril, dapsone, rifampicin, glibenclamide, simvastatin, furosemide, tamoxifen, and sulfamethoxazole). Most of the drugs in this last group were noted previously by other groups as showing cholestasis only in humans. The results of this work suggest that the ABC procedure and similar statistical approaches can be instrumental in combining data to compare toxicants across toxicogenomics databases, extract similarities among responses and reduce unexplained data variation. Keywords: Cluster analysis, Cholestasis, Gene signature, Microarray, Prediction, Toxicogenomic

    Hindsight of the Habsburg Empire by historians from the United Kingdom

    In my thesis I would like to compare approaches to viewing the history of the Habsburg Monarchy. I will compare the approaches of Czech and English historians and their views of our history and of the history of the Habsburg Monarchy. Will there be a difference between their view and ours? Since I do not expect the British to show great enthusiasm for the history of the Habsburg Monarchy, I will at times extend my topic to the older and more recent history of the Czech lands. Because we encounter history in school lessons, the didactic part of the thesis will compare the approach to teaching history in our country and in Great Britain. I will select a particular age group of pupils and compare the textbooks used to teach them with ours, as well as their educational system. Later I will examine general historical publications published in Great Britain. The research question will be whether events from our history appear at all in historical encyclopedias or scholarly publications. And if so, which events from our past are they? For me, the outcome of comparing the publications will be an orientation in the topics that British authors do or do not know. How far will their knowledge reach? Which topics are interpreted differently from the way they are understood in our country? These are the main questions I ask myself. Finally, I will apply my findings quite concretely. I will choose one of the problematic..

    Additional file 7: Figure S3. of Integrative genomic deconvolution of rheumatoid arthritis GWAS loci into gene and cell type associations

    Genes associated with RA GWAS in T cell specific epigenomic datasets. Heatmap of genes associated with RA GWAS SNPs overlapping enhancers in the shown T cell datasets. PB, peripheral blood. Two T cell epigenomes uniquely identify genes, which might be explained by unique aspects of the selection markers and (when carried out) in vitro differentiation protocols: (1) “Primary T cells from PB” was the only T cell sample to use CD3+ as a selection marker; and (2) “Primary T helper cells PMA-I stimulated” was the only sample that used magnetic-activated cell sorting (MACS) [5]. The differences between the two “Primary T helper memory cells from PB” samples might be explained by their different differentiation protocols (number 1 uniquely used CD25M and CD45RO as selection markers) or by the differing donors of origin [5]. (PDF 7665 kb)