
    Rgtsp: a generalized top scoring pairs package for class prediction

    Summary: A top scoring pair (TSP) classifier consists of a pair of variables whose relative ordering can be used to accurately predict the class label of a sample. This classification rule is easy to interpret and is more robust against technical variation in the data, such as that introduced by different microarray platforms. Here we describe a parallel implementation of this classifier, which significantly reduces the training time, together with a number of extensions, including a multi-class approach that has the potential to improve classification performance. Availability and Implementation: Full C++ source code and the R package Rgtsp are freely available from http://lausanne.isb-sib.ch/~vpopovic/research/. The implementation relies on existing OpenMP libraries. Contact: [email protected]
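
    The decision rule described in this summary can be illustrated with a minimal serial sketch. It is not the Rgtsp implementation (which is parallel C++ with an R interface); the function names `fit_tsp` and `predict_tsp` are hypothetical, the labels are assumed to be binary (0/1), and ties between equally scoring pairs are simply resolved in favour of the first pair encountered.

```python
from itertools import combinations


def _frac_less(samples, i, j):
    """Fraction of samples in which feature i is strictly below feature j."""
    return sum(1 for x in samples if x[i] < x[j]) / len(samples)


def fit_tsp(X, y):
    """Return (score, i, j, less_means_class0) for the top scoring pair.

    X: list of samples (each a list of numeric features); y: list of 0/1 labels.
    Assumes both classes are present and there are at least two features.
    """
    class0 = [x for x, label in zip(X, y) if label == 0]
    class1 = [x for x, label in zip(X, y) if label == 1]
    best = None
    for i, j in combinations(range(len(X[0])), 2):
        p0 = _frac_less(class0, i, j)  # P(x_i < x_j | class 0)
        p1 = _frac_less(class1, i, j)  # P(x_i < x_j | class 1)
        score = abs(p0 - p1)
        # Keep the first pair with the highest score (simplified tie handling).
        if best is None or score > best[0]:
            best = (score, i, j, p0 > p1)
    return best


def predict_tsp(model, x):
    """Predict 0 or 1 from the relative ordering of the selected pair."""
    _, i, j, less_means_class0 = model
    return 0 if (x[i] < x[j]) == less_means_class0 else 1
```

    Because the rule only compares two values within the same sample, any monotone rescaling of a sample leaves the prediction unchanged, which is one way to read the robustness claim above.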

    MADAP, a flexible clustering tool for the interpretation of one-dimensional genome annotation data

    A recurring task in the analysis of mass genome annotation data from high-throughput technologies is the identification of peaks or clusters in a noisy signal profile. Examples of such applications are the definition of promoters on the basis of transcription start site profiles, the mapping of transcription factor binding sites based on ChIP-chip data and the identification of quantitative trait loci (QTL) from whole genome SNP profiles. The input to such an analysis is a set of genome coordinates associated with counts or intensities. The output consists of a discrete number of peaks with their respective volumes, extensions and center positions. For this purpose we have developed a flexible one-dimensional clustering tool, called MADAP, which we make available as a web server and as a standalone program. A set of parameters enables the user to customize the procedure to a specific problem. The web server, which returns results in textual and graphical form, is useful for small to medium-scale applications, as well as for evaluation and parameter tuning ahead of large-scale applications that require a local installation. The program, written in C++, can be freely downloaded from ftp://ftp.epd.unil.ch/pub/software/unix/madap. The MADAP web server can be accessed at http://www.isrec.isb-sib.ch/madap/

    Relationship between estrogen receptor α location and gene induction reveals the importance of downstream sites and cofactors

    BACKGROUND: Understanding cancer-related modifications to transcriptional programs requires detailed knowledge about the activation of signal-transduction pathways and gene expression programs. To investigate the mechanisms of target gene regulation by human estrogen receptor alpha (hERalpha), we combine extensive location and expression datasets with genomic sequence analysis. In particular, we study the influence of patterns of DNA occupancy by hERalpha on expression phenotypes. RESULTS: We find that strong ChIP-chip sites co-localize with strong hERalpha consensus sites and detect nucleotide bias near hERalpha sites. The localization of ChIP-chip sites relative to annotated genes shows that weak sites are enriched near transcription start sites, while stronger sites show no positional bias. Assessing the relationship between binding configurations and expression phenotypes, we find binding sites downstream of the transcription start site (TSS) to be equally good or better predictors of hERalpha-mediated expression than upstream sites. The study of FOX and SP1 cofactor sites near hERalpha ChIP sites shows that induced genes frequently have FOX or SP1 sites. Finally, we integrate these multiple datasets to define a high confidence set of primary hERalpha target genes. CONCLUSION: Our results support the model of long-range interactions of hERalpha with the promoter-bound cofactor SP1 residing at the promoter of hERalpha target genes. FOX motifs co-occur with hERalpha motifs along responsive genes. Importantly, we show that the spatial arrangement of sites near the start sites and within the full transcript is important in determining the response to estrogen signaling

    HIV Modifies the m6A and m5C Epitranscriptomic Landscape of the Host Cell

    The study of RNA modifications, today known as epitranscriptomics, is of growing interest. The N6-methyladenosine (m6A) and 5-methylcytosine (m5C) RNA modifications are abundantly present on mRNA molecules, and impact RNA interactions with other proteins or molecules, thereby affecting cellular processes such as RNA splicing, export, stability, and translation. Recently, m6A and m5C marks were found to be present on human immunodeficiency virus (HIV) transcripts as well, and to affect viral replication. Therefore, the discovery of RNA methylation provides a new layer of regulation of HIV expression and replication, and thus offers a novel array of opportunities to inhibit replication. However, no study has been performed to date to investigate the impact of HIV replication on the transcript methylation level in the infected cell. We used a productive HIV infection model, consisting of the CD4+ SupT1 T cell line infected with a VSV-G pseudotyped HIVeGFP-based vector, to explore the temporal landscape of m6A and m5C epitranscriptomic marks upon HIV infection, and to compare it to mock-treated cells. Cells were collected at 12, 24, and 36 h post-infection for mRNA extraction and FACS analysis. m6A RNA modifications were investigated by methylated RNA immunoprecipitation followed by high-throughput sequencing (MeRIP-Seq). m5C RNA modifications were investigated using a bisulfite conversion approach followed by high-throughput sequencing (BS-Seq). Our data suggest that HIV infection impacted the methylation landscape of HIV-infected cells, inducing mostly increased methylation of cellular transcripts upon infection. Indeed, differential methylation (DM) analysis identified 59 m6A hypermethylated and only 2 hypomethylated transcripts, and 14 m5C hypermethylated and 7 hypomethylated transcripts. All data and analyses are also freely accessible on an interactive web resource (http://sib-pc17.unil.ch/HIVmain.html). Furthermore, both m6A and m5C methylations were detected on viral transcripts and viral particle RNA genomes, as previously described, but additional patterns were identified. This work used differential epitranscriptomic analysis to identify novel players involved in the HIV life cycle, thereby providing innovative opportunities for HIV regulation

    The Eukaryotic Promoter Database EPD: the impact of in silico primer extension

    The Eukaryotic Promoter Database (EPD) is an annotated non‐redundant collection of eukaryotic POL II promoters, experimentally defined by a transcription start site (TSS). There may be multiple promoter entries for a single gene. The underlying experimental evidence comes from journal articles and, starting from release 73, from 5′ ESTs of full‐length cDNA clones used for so‐called in silico primer extension. Access to promoter sequences is provided by pointers to TSS positions in nucleotide sequence entries. The annotation part of an EPD entry includes a description of the type and source of the initiation site mapping data, links to other biological databases and bibliographic references. EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis. Web‐based interfaces have been developed that enable the user to view EPD entries in different formats, to select and extract promoter sequences according to a variety of criteria and to navigate to related databases exploiting different cross‐references. Tools for analysing sequence motifs around TSSs defined in EPD are provided by the signal search analysis server. EPD can be accessed at http://www.epd. isb‐sib.c

    A novel bioinformatics pipeline to discover genes related to arbuscular mycorrhizal symbiosis based on their evolutionary conservation pattern among higher plants

    Genes involved in arbuscular mycorrhizal (AM) symbiosis have been identified primarily by mutant screens, followed by identification of the mutated genes (forward genetics). In addition, a number of AM-related genes have been identified by their AM-related expression patterns, and their function has subsequently been elucidated by knock-down or knock-out approaches (reverse genetics). However, genes that are members of functionally redundant gene families, or genes that have a vital function and therefore result in lethal mutant phenotypes, are difficult to identify. If such genes are constitutively expressed and therefore escape differential expression analyses, they remain elusive. The goal of this study was to systematically search for AM-related genes with a bioinformatics strategy that is insensitive to these problems. The central element of our approach is based on the fact that many AM-related genes are conserved only among AM-competent species. Results: Our approach involves genome-wide comparisons at the proteome level of AM-competent host species with non-mycorrhizal species. Using a clustering method, we first established orthologous/paralogous relationships and subsequently identified protein clusters that contain members only of the AM-competent species. Proteins of these clusters were then analyzed in an extended set of 16 plant species and ranked based on their relatedness among AM-competent monocot and dicot species, relative to non-mycorrhizal species. In addition, we combined the information on the protein-coding sequence with gene expression data and with promoter analysis. As a result, we present a list of as yet uncharacterized proteins that show a strongly AM-related pattern of sequence conservation, indicating that the respective genes may have been under selection for a function in AM. Among the top candidates are three genes that encode a small family of similar receptor-like kinases that are related to the S-locus receptor kinases involved in sporophytic self-incompatibility. Conclusions: We present a new systematic strategy of gene discovery based on conservation of the protein-coding sequence that complements classical forward and reverse genetics. This strategy can be applied to diverse other biological phenomena if species with established genome sequences fall into distinct groups that differ in a defined functional trait of interest
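
    The cluster-selection step described in the Results (keeping protein clusters whose members come only from AM-competent species) can be sketched as a simple set filter. This is an illustration under assumptions, not the authors' pipeline: the function name `am_only_clusters`, the data layout, and the placeholder species and protein identifiers are all hypothetical, and the upstream orthology clustering is taken as given.

```python
def am_only_clusters(clusters, am_species):
    """Keep clusters whose members all come from AM-competent species.

    clusters: dict mapping a cluster id to a list of (species, protein_id)
              tuples, as produced by some upstream orthology clustering.
    am_species: set of species names considered AM-competent.
    """
    selected = {}
    for cluster_id, members in clusters.items():
        species_in_cluster = {species for species, _ in members}
        # A non-empty cluster qualifies only if no member comes from a
        # species outside the AM-competent set.
        if species_in_cluster and species_in_cluster <= am_species:
            selected[cluster_id] = members
    return selected


# Hypothetical toy input: two AM-competent hosts and one non-host.
clusters = {
    "c1": [("host_A", "p1"), ("host_B", "p2")],
    "c2": [("host_A", "p3"), ("nonhost_X", "p4")],
    "c3": [("nonhost_X", "p5")],
}
result = am_only_clusters(clusters, {"host_A", "host_B"})
print(sorted(result))
```

    In the study, the selected clusters are then ranked across an extended species set and combined with expression and promoter data; those later steps are not covered by this sketch.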

    Context-based retrieval of functional modules in protein-protein interaction networks

    Various techniques have been developed for identifying the most probable interactants of a protein under a given biological context. In this article, we dissect the effects of the choice of the protein–protein interaction network (PPI) and the manipulation of PPI settings on the network neighborhood of the influenza A virus (IAV) network, as well as hits in genome-wide small interfering RNA screen results for IAV host factors. We investigate the potential of context filtering, which uses text mining evidence linked to PPI edges, as a complement to the edge confidence scores typically provided in PPIs for filtering, for obtaining more biologically relevant network neighborhoods. Here, we estimate the maximum performance of context filtering to isolate a Kyoto Encyclopedia of Genes and Genomes (KEGG) network Ki from a union of KEGG networks and its network neighborhood. The work gives insights on the use of human PPIs in network neighborhood approaches for functional inference

    A Well-Controlled Experimental System to Study Interactions of Cytotoxic T Lymphocytes with Tumor Cells.

    While T cell-based immunotherapies are steadily improving, there are still many patients who progress, despite T cell-infiltrated tumors. Emerging evidence suggests that T cells themselves may provoke immune escape of cancer cells. Here, we describe a well-controlled co-culture system for studying the dynamic T cell - cancer cell interplay, using human melanoma as a model. We explain starting material, controls, and culture parameters to establish reproducible and comparable cultures with highly heterogeneous tumor cells. Low passage melanoma cell lines and melanoma-specific CD8+ T cell clones generated from patient blood were cultured together for up to 3 days. Living melanoma cells were isolated from the co-culture system by fluorescence-activated cell sorting. We demonstrate that the characterization of isolated melanoma cells is feasible using flow cytometry for protein expression analysis as well as an Agilent whole human genome microarray and the NanoString technology for differential gene expression analysis. In addition, we identify five genes (ALG12, GUSB, RPLP0, KRBA2, and ADAT2) that are stably expressed in melanoma cells independent of the presence of T cells or the T cell-derived cytokines IFNγ and TNFα. These genes are essential for correct normalization of gene expression data by NanoString. Further to the characterization of melanoma cells after exposure to CTLs, this experimental system might be suitable to answer a series of questions, including how the affinity of CTLs for their target antigen influences the melanoma cell response and whether CTL-induced gene expression changes in melanoma cells are reversible. Taken together, our human T cell - melanoma cell culture system is well suited to characterize immune-related mechanisms in cancer cells