81 research outputs found

    Predicting variation of DNA shape preferences in protein-DNA interaction in cancer cells with a new biophysical model

    DNA shape readout is an important mechanism of target site recognition by transcription factors, in addition to sequence readout. Several models of transcription factor-DNA binding that consider DNA shape have been developed in recent years. We present a new biophysical model of protein-DNA interaction that incorporates DNA shape features and is based on the neighbour dinucleotide dependency model BayesPI2. The parameters of the new model are restricted to a subspace spanned by the 2-mer DNA shape features, which allows them to be interpreted biophysically as position-dependent preferences for certain values of the features. Using the new model, we explore the variation of DNA shape preferences in several transcription factors across cancer cell lines and cellular conditions. We find evidence of DNA shape variations at FOXA1 binding sites in MCF7 cells after treatment with steroids. The new model is useful for elucidating finer details of transcription factor-DNA interaction, and it may be used to improve the prediction of cancer mutation effects in the future.
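
    The parameter restriction described above can be illustrated with a minimal sketch. It assumes that each dinucleotide energy parameter at position i is a linear combination of that dinucleotide's shape features, weighted by position-dependent preferences beta[i][k], and that occupancy follows a Fermi-Dirac-style function with chemical potential mu. The two "shape" features are hypothetical toy descriptors, not real DNA shape values, and the function names are illustrative rather than part of BayesPI2.

```python
import math
from itertools import product

BASES = "ACGT"


def toy_shape_features(dinuc):
    """Hypothetical 2-mer 'shape' features (toy stand-ins, not real DNA shape data).

    f0: number of A/T bases in the dinucleotide
    f1: 1.0 for a purine-pyrimidine (RY) step, else 0.0
    """
    at_count = sum(base in "AT" for base in dinuc)
    ry_step = 1.0 if dinuc[0] in "AG" and dinuc[1] in "CT" else 0.0
    return (float(at_count), ry_step)


# Feature vector for each of the 16 dinucleotides
SHAPE_TABLE = {a + b: toy_shape_features(a + b) for a, b in product(BASES, repeat=2)}


def dinucleotide_energies(beta):
    """Constrain eps[i][d] to the span of the shape features:
    eps[i][d] = sum_k beta[i][k] * shape_k(d), where beta[i][k] is the
    position-dependent preference for feature k at dinucleotide position i."""
    return [
        {d: sum(b * f for b, f in zip(beta_i, feats)) for d, feats in SHAPE_TABLE.items()}
        for beta_i in beta
    ]


def site_energy(seq, eps):
    """Sum dinucleotide energies over a site of length len(eps) + 1 (units of kT)."""
    return sum(eps[i][seq[i:i + 2]] for i in range(len(eps)))


def binding_probability(energy, mu=0.0):
    """Fermi-Dirac-style occupancy: lower energy gives higher probability."""
    return 1.0 / (1.0 + math.exp(energy - mu))


# Usage: a 4-bp site (3 dinucleotide positions) whose preferences favour A/T-rich steps
eps = dinucleotide_energies([(-1.0, 0.0)] * 3)
print(site_energy("AATT", eps), site_energy("GGCC", eps))
print(binding_probability(site_energy("AATT", eps)) > binding_probability(site_energy("GGCC", eps)))
```

Because the 16 energies per position are generated from only two preferences, the fitted beta values can be read directly as preferences for feature values, which is the interpretability the restriction is meant to buy.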

    A new framework for identifying combinatorial regulation of transcription factors: A case study of the yeast cell cycle

    By integrating heterogeneous functional genomic datasets, we have developed a new framework for detecting combinatorial control of gene expression. The framework includes estimating transcription factor activities using a singular value decomposition method and reducing the high-dimensional input gene space by considering genomic properties of gene clusters. Cooperative gene regulation is predicted using either Gaussian Graphical Models or Pairwise Mixed Graphical Models. The proposed framework was tested on yeast cell cycle datasets: (1) 54 known yeast cell cycle genes with 9 cell cycle regulators and (2) 676 putative yeast cell cycle genes with 9 cell cycle regulators. The new framework gave promising results in inferring TF–TF and TF–gene interactions. It also revealed several interesting mechanisms, such as negatively correlated protein–protein interactions and low-affinity protein–DNA interactions, that may be important during the yeast cell cycle. The new framework may readily be extended to study higher eukaryotes.
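
    One common way to estimate transcription factor activities with a singular value decomposition is to model expression as E ≈ A @ P. Here A is a gene-by-TF connectivity matrix (for example, from ChIP-chip binding calls) and P holds TF activities per condition, and P is solved for by least squares through the SVD pseudoinverse of A. The sketch below assumes that formulation, since the abstract does not specify the exact model. The function name and toy matrices are hypothetical, and the graphical-model step is not shown.

```python
import numpy as np


def estimate_tf_activities(expression, connectivity, rcond=1e-10):
    """Least-squares TF activities P from expression ≈ connectivity @ P via SVD.

    expression:   genes x conditions matrix E
    connectivity: genes x TFs matrix A (hypothetical binding calls)
    """
    U, s, Vt = np.linalg.svd(connectivity, full_matrices=False)
    # Invert only singular values that are numerically non-zero (pseudoinverse)
    s_inv = np.zeros_like(s)
    keep = s > rcond * s.max()
    s_inv[keep] = 1.0 / s[keep]
    # P = V diag(1/s) U^T E
    return Vt.T @ (s_inv[:, None] * (U.T @ expression))


# Usage with toy data: 4 genes, 2 TFs, 3 conditions
A = np.array([[1, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
P_true = np.array([[1.0, 2.0, 3.0], [0.5, -1.0, 0.0]])
E = A @ P_true
P_hat = estimate_tf_activities(E, A)
print(np.round(P_hat, 6))
```

Truncating near-zero singular values keeps the estimate stable when two TFs share almost the same target genes, at the cost of not separating their activities.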

    Tumor classification and marker gene prediction by feature selection and fuzzy c-means clustering using microarray data

    BACKGROUND: Using DNA microarrays, we have developed two novel models for tumor classification and target gene prediction. First, gene expression profiles are summarized by optimally selected Self-Organizing Maps (SOMs), followed by tumor sample classification by Fuzzy C-means clustering. Then, marker genes are predicted by either manual feature selection (visualizing the weighted/mean SOM component plane) or automatic feature selection (by pair-wise Fisher's linear discriminant). RESULTS: The proposed models were tested on four published datasets: (1) Leukemia, (2) Colon cancer, (3) Brain tumors and (4) NCI cancer cell lines. The models gave class predictions with markedly lower error rates than other class prediction approaches, and the importance of feature selection in microarray data analysis was also emphasized. CONCLUSIONS: Our models identify marker genes with predictive potential, often better than other methods available in the literature. The models are potentially useful for medical diagnostics and may provide insights into cancer classification. Additionally, we illustrated two limitations in tumor classification from microarray data that relate to the biology underlying the data: (1) the class size of the data and (2) the internal structure of the classes. These limitations are not specific to the classification models used.
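
    The Fuzzy C-means step can be sketched as follows. The sketch assumes the standard algorithm (fuzzifier m, Euclidean distance) applied to sample vectors such as SOM-summarized expression profiles. Each sample receives a membership in every cluster rather than a hard label, and taking the largest membership gives a class call. The function name, parameters and toy data are illustrative, not the authors' code.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (centers, memberships), memberships n x c."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each sample's memberships sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero when a sample sits on a center
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


# Usage: two well-separated groups of toy "samples"
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
centers, U = fuzzy_c_means(X, c=2)
labels = U.argmax(axis=1)
print(labels)
```

Soft memberships are useful in this setting because a sample with nearly equal memberships in two classes flags an ambiguous diagnosis instead of hiding it behind a hard label.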

    M-CGH: Analysing microarray-based CGH experiments

    BACKGROUND: Microarray-based comparative genomic hybridisation (array CGH) is a technique by which variation in relative copy numbers between two genomes can be analysed by competitive hybridisation to DNA microarrays. This technology has most commonly been used to detect chromosomal amplifications and deletions in cancer. Dedicated tools are needed to analyse the results of such experiments. These tools should provide appropriate visualisation and take into account the physical relation in the genome between the probes on the array. RESULTS: M-CGH is a MATLAB toolbox with a graphical user interface designed specifically for the analysis of array CGH experiments, with multiple approaches to ratio normalization. Specifically, the distributions of three classes of DNA copy numbers (gains, normal and losses) can be estimated using a maximum likelihood method. Amplicon boundaries are computed by either the fuzzy K-nearest neighbour method or a wavelet approach. The program also allows each genomic clone to be linked to the corresponding genomic information in the Ensembl database. CONCLUSIONS: M-CGH, which encompasses the basic tools needed for analysing array CGH experiments, is freely available for academics and does not require any other MATLAB toolbox.
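
    The three-class estimate can be sketched as a one-dimensional, three-component Gaussian mixture (losses, normal, gains) fitted to normalized log2 ratios by expectation-maximization, which is a standard maximum likelihood approach. This is an assumption about the general technique, not a description of M-CGH's MATLAB implementation. The function name and simulated data are hypothetical.

```python
import numpy as np


def fit_copy_number_classes(log_ratios, n_iter=500, tol=1e-8):
    """EM fit of a 3-component Gaussian mixture to log2 ratios.

    Returns weights, means, standard deviations (ordered loss, normal, gain)
    and a per-probe class label (0 = loss, 1 = normal, 2 = gain) taken from
    the responsibilities of the last E-step.
    """
    x = np.asarray(log_ratios, dtype=float)
    mu = np.array([np.quantile(x, 0.05), np.median(x), np.quantile(x, 0.95)])
    sigma = np.full(3, x.std())
    w = np.full(3, 1.0 / 3.0)
    prev_loglik = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of each class for each probe
        dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        total = np.fmax(dens.sum(axis=1, keepdims=True), 1e-300)
        resp = dens / total
        loglik = np.log(total).sum()
        # M-step: maximum likelihood updates of weights, means and SDs
        nk = resp.sum(axis=0)
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.fmax(np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk), 1e-3)
        if loglik - prev_loglik < tol:
            break
        prev_loglik = loglik
    order = np.argsort(mu)  # keep classes ordered loss < normal < gain
    labels = resp[:, order].argmax(axis=1)
    return w[order], mu[order], sigma[order], labels


# Usage with simulated log2 ratios
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-1.0, 0.1, 50),   # simulated losses
                    rng.normal(0.0, 0.1, 200),   # simulated normal copy number
                    rng.normal(0.7, 0.1, 50)])   # simulated gains
w, mu, sigma, labels = fit_copy_number_classes(x)
print(np.round(mu, 2), np.bincount(labels, minlength=3))
```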

    Clustering of the SOM easily reveals distinct gene expression patterns: results of a reanalysis of lymphoma study

    BACKGROUND: A method to evaluate and analyze the massive data generated by series of microarray experiments is of utmost importance for revealing hidden patterns of gene expression. Because of the complexity and high dimensionality of microarray gene expression profiles, reducing the dimensionality of the raw expression data and selecting the features required for tasks such as classifying disease samples remain a challenge. To address this problem, we propose a two-level analysis. First, a self-organizing map (SOM) is used. SOM is a vector quantization method that simplifies and reduces the dimensionality of the original measurements and visualizes individual tumor samples in a SOM component plane. Next, hierarchical clustering and K-means clustering are used to identify patterns of gene expression useful for classifying samples. RESULTS: We tested the two-level analysis on public data from diffuse large B-cell lymphomas. Without the need for supervision, the analysis easily distinguished four major gene expression patterns: a germinal center-related, a proliferation, an inflammatory and a plasma cell differentiation-related pattern. The first three patterns matched those described in the original publication using supervised clustering analysis, whereas the fourth was novel. CONCLUSIONS: Our study shows that by using SOM as an intermediate step to analyze genome-wide gene expression data, gene expression patterns can be revealed more easily. The "expression display" by the SOM component plane summarises the complicated data in a way that allows the clinician to evaluate the classification options rather than giving a fixed diagnosis.
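
    The first level of the two-level analysis can be sketched with a minimal online SOM. The sketch assumes a rectangular grid, a Gaussian neighbourhood, and a learning rate and radius that decay linearly. The resulting codebook vectors W would then be clustered in the second level (for example with hierarchical or K-means clustering), which is not shown. Function names, parameters and data are hypothetical, and this is not the software used in the study.

```python
import numpy as np


def train_som(X, rows, cols, n_iter=2000, lr0=0.5, seed=0):
    """Minimal online SOM; returns codebook vectors W (rows*cols x dim)."""
    rng = np.random.default_rng(seed)
    # (row, col) position of each map unit on the grid
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = X[rng.integers(0, X.shape[0], rows * cols)].astype(float)  # init from samples
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                   # learning rate decays linearly to 0
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # neighbourhood radius shrinks
        x = X[rng.integers(0, X.shape[0])]
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))  # best-matching unit
        # Gaussian neighbourhood on the grid around the BMU
        h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
        W += lr * h[:, None] * (x - W)
    return W


def map_samples(X, W):
    """Index of the best-matching SOM unit for each sample."""
    return np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2), axis=1)


# Usage: two groups of toy "expression profiles" on a 2 x 2 map
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
W = train_som(X, rows=2, cols=2)
units = map_samples(X, W)
print(units)
```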

    BayesPI-BAR2: A New Python Package for Predicting Functional Non-coding Mutations in Cancer Patient Cohorts

    Most somatic mutations in cancer occur outside gene coding regions. These mutations may disrupt gene regulation by affecting protein-DNA interaction, so studying these disruptions is important for understanding tumorigenesis. However, current computational tools process DNA sequence variants individually when predicting their effect on protein-DNA binding. This makes it a daunting task to identify functional regulatory disturbances among thousands of mutations in a patient. Previously, we reported and validated a pipeline for identifying functional non-coding somatic mutations in cancer patient cohorts. The pipeline integrates diverse information, such as gene expression, the spatial distribution of the mutations, and a biophysical model for estimating protein binding affinity. Here, we present BayesPI-BAR2, a new user-friendly Python package based on that pipeline for integrative whole-genome sequence analysis. It may be the first prediction package that considers information from both multiple mutations and multiple patients. It was evaluated in follicular lymphoma and skin cancer patients, focusing on sequence variants in gene promoter regions. BayesPI-BAR2 is a useful tool for predicting functional non-coding mutations in whole-genome sequencing data: it allows the identification of novel transcription factors (TFs) whose binding is altered by non-coding mutations in cancer. The program can analyze multiple datasets of genome-wide mutations at once and generate concise, easily interpretable reports on potentially affected gene regulatory sites. The package is freely available at http://folk.uio.no/junbaiw/BayesPI-BAR2/
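
    A simplified proxy can illustrate the core question of how much a single variant changes predicted binding. The reference and alternative sequences are scored with a position weight matrix (PWM) on both strands, and the difference of the best scores is taken. BayesPI-BAR2 itself uses a biophysical binding model and integrates multiple mutations, patients and gene expression, and this sketch attempts none of that. The toy motif, sequences and function names are hypothetical.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def log_odds_pwm(probs, background=0.25):
    """Convert per-position base probabilities to log2 odds scores."""
    return [{b: math.log2(p / background) for b, p in col.items()} for col in probs]


def best_score(seq, pwm):
    """Best PWM score over all windows on both strands."""
    width = len(pwm)
    scores = [
        sum(pwm[i][s[start + i]] for i in range(width))
        for s in (seq, revcomp(seq))
        for start in range(len(s) - width + 1)
    ]
    return max(scores)


def delta_binding_score(ref_seq, alt_seq, pwm):
    """Negative values suggest the variant weakens the best-scoring site."""
    return best_score(alt_seq, pwm) - best_score(ref_seq, pwm)


# Hypothetical 3-bp motif "TGA"
PWM = log_odds_pwm([
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
])
# A T>G substitution at the first motif position
delta = delta_binding_score("CCTGACC", "CCGGACC", PWM)
print(round(delta, 3))
```

Scoring both strands matters because a variant can create or destroy a site on either strand.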

    Characterizing a collective and dynamic component of chromatin immunoprecipitation enrichment profiles in yeast

    Background: Recent chromatin immunoprecipitation (ChIP) experiments in fly, mouse, and human have revealed the existence of high-occupancy target (HOT) regions or “hotspots” that show enrichment across many assayed DNA-binding proteins. Similar co-enrichment observed in yeast has so far been treated as artifactual and has not been fully characterized. Results: Here we reanalyze ChIP data from both array-based and sequencing-based experiments to show that in the yeast S. cerevisiae, the collective enrichment phenomenon is strongly associated with proximity to noncoding RNA genes and with nucleosome depletion. DNA sequence motifs that confer binding affinity for the proteins are largely absent from these hotspots, suggesting that protein-protein interactions play a prominent role. The hotspots are condition-specific, suggesting that they reflect a chromatin state or protein state rather than a static feature of the underlying sequence. Additionally, only a subset of all assayed factors is associated with these loci, suggesting that the co-enrichment cannot simply be explained by a chromatin state that is universally more prone to immunoprecipitation. Conclusions: Together, our results suggest that the co-enrichment patterns observed in yeast represent transcription factor co-occupancy. More generally, they make clear that great caution must be used when interpreting ChIP enrichment profiles for individual factors in isolation, as these profiles include factor-specific as well as collective contributions.
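
    A simple co-enrichment count can make the notion of a high-occupancy region concrete: a locus enriched for many assayed factors at once. The sketch below flags regions where at least a given fraction of factors exceed an enrichment threshold and reports which factors contribute. The threshold, fraction, factor names and values are hypothetical, and the published analysis is considerably more involved.

```python
def find_hotspots(enrichment, threshold, min_fraction):
    """Flag regions enriched for many factors at once.

    enrichment:   dict mapping factor name -> list of per-region enrichment
                  values (all lists in the same region order)
    threshold:    value at or above which a factor counts as enriched
    min_fraction: fraction of factors that must be enriched for a hotspot
    Returns {region index: sorted list of enriched factors} for hotspots.
    """
    n_regions = {len(values) for values in enrichment.values()}
    if len(n_regions) != 1:
        raise ValueError("all factors must cover the same regions")
    hotspots = {}
    for region in range(n_regions.pop()):
        enriched = sorted(f for f, values in enrichment.items() if values[region] >= threshold)
        if len(enriched) / len(enrichment) >= min_fraction:
            hotspots[region] = enriched
    return hotspots


# Hypothetical log2 enrichment for four factors over three regions
data = {
    "TF_A": [0.1, 2.5, 0.3],
    "TF_B": [0.2, 1.8, 2.2],
    "TF_C": [0.0, 2.1, 0.1],
    "TF_D": [0.3, 0.4, 0.2],
}
print(find_hotspots(data, threshold=1.5, min_fraction=0.5))
```

Returning the contributing factors, not just the region index, reflects the observation above that only a subset of assayed factors is associated with these loci.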

    Application of new probabilistic graphical models in the genetic regulatory networks studies

    This paper introduces two new probabilistic graphical models for reconstructing genetic regulatory networks from DNA microarray data. One is an Independence Graph (IG) model with either a forward or a backward search algorithm, and the other is a Gaussian Network (GN) model with a novel greedy search method. The performance of both models was evaluated on four MAPK pathways in yeast and three simulated data sets. Generally, an IG model provides a sparse graph, whereas a GN model produces a dense graph in which more information about gene-gene interactions is preserved. Additionally, we found two key limitations in the prediction of genetic regulatory networks from DNA microarray data. First, the sample size may be insufficient. Second, the complexity of network structures may not be captured without additional data at the protein level. These limitations apply to all prediction methods that use only DNA microarray data. Comment: 38 pages, 3 figures
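
    In a Gaussian independence graph, a missing edge between two genes means that they are conditionally independent given all other genes; equivalently, their partial correlation is zero. The minimal sketch below uses a single-pass edge-exclusion test on the saturated model instead of the paper's forward or backward search. The test compares the deviance -n·log(1 - r²) with 3.84, the 5% critical value of chi-square with 1 degree of freedom. The function names and simulated data are hypothetical.

```python
import numpy as np


def partial_correlations(X):
    """Partial correlation of each pair of columns given all other columns."""
    precision = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def independence_graph(X, crit=3.84):
    """Keep edge (i, j) if the edge-exclusion deviance -n*log(1 - r_ij^2)
    exceeds crit (3.84 is the 5% critical value of chi-square with 1 df)."""
    n, p = X.shape
    pcor = partial_correlations(X)
    r2 = np.clip(pcor ** 2, 0.0, 1.0 - 1e-12)  # clip keeps the log finite on the diagonal
    deviance = -n * np.log1p(-r2)
    return [(i, j) for i in range(p) for j in range(i + 1, p) if deviance[i, j] > crit]


# Usage: simulated chain gene0 -> gene1 -> gene2 (gene0 and gene2 independent given gene1)
rng = np.random.default_rng(0)
g0 = rng.normal(size=2000)
g1 = g0 + rng.normal(size=2000)
g2 = g1 + rng.normal(size=2000)
X = np.column_stack([g0, g1, g2])
edges = independence_graph(X)
print(edges)
print(np.round(partial_correlations(X), 2))
```

Note that the raw correlation between gene0 and gene2 is large here. Only the partial correlation reveals that their association is fully mediated by gene1, which is why independence graphs tend to be sparse.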

    POLD2 and KSP37 (FGFBP2) Correlate Strongly with Histology, Stage and Outcome in Ovarian Carcinomas

    BACKGROUND: Epithelial ovarian cancer (EOC) constitutes more than 90% of ovarian cancers and is associated with high mortality. EOC comprises a heterogeneous group of tumours, and its causes and molecular pathology are essentially unknown. Improved insight into the molecular characteristics of the different subgroups of EOC is urgently needed and should eventually lead to earlier diagnosis as well as more individualized and effective treatments. Previously, we reported a limited number of mRNAs that are strongly upregulated in human osteosarcomas and other malignancies. Six of these were selected and tested for a possible association with three subgroups of ovarian carcinomas and with clinical parameters. METHODOLOGY/PRINCIPAL FINDINGS: The six selected mRNAs were quantified by RT-qPCR in biopsies from eleven poorly differentiated serous carcinomas (PDSC, stage III-IV), twelve moderately differentiated serous carcinomas (MDSC, stage III-IV) and eight clear cell carcinomas (CCC, stage I-IV) of the ovary. Superficial scrapings from six normal ovaries (SNO), as well as biopsies from three normal ovaries (BNO) and three benign ovarian cysts (BBOC), were analyzed for comparison. Gene expression levels were related to the histological and clinical parameters of the ovarian carcinoma samples. One of the mRNAs, DNA polymerase delta 2 small subunit (POLD2), was increased on average 2.5-fold in MDSC and almost 20-fold in PDSC. This increase paralleled the degree of dedifferentiation and was concordant with a poor prognosis. Except for POLD2, the serous carcinomas showed a similar transcription profile that was clearly different from that of CCC. Another mRNA, Killer-specific secretory protein of 37 kDa (KSP37), showed six- to eight-fold higher levels in stage I CCC than in the more advanced-stage carcinomas and correlated positively with improved clinical outcome. CONCLUSIONS/SIGNIFICANCE: We have identified two biomarkers that are markedly upregulated in two subgroups of ovarian carcinomas and are also associated with stage and outcome. The results suggest that POLD2 and KSP37 may be potential prognostic biomarkers.
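
    Relative expression fold changes of the kind reported above are commonly computed from RT-qPCR cycle thresholds (Ct) with the comparative 2^(-ΔΔCt) method. The abstract does not state which quantification method was used, so the sketch below is an assumption. It also assumes roughly 100% amplification efficiency for both the target and the reference gene, and the Ct values are invented for illustration.

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Comparative Ct (2^-ddCt) fold change of a target gene in a sample vs a control.

    Each argument is a (mean) cycle threshold; the reference gene normalizes
    for input amount. Assumes ~100% PCR efficiency (doubling per cycle).
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target is detected 3 cycles earlier (relative to
# the reference gene) in the tumour sample than in the normal control
fc = fold_change_ddct(ct_target_sample=22.0, ct_ref_sample=18.0,
                      ct_target_control=25.0, ct_ref_control=18.0)
print(fc)  # 8.0
```

Because each cycle roughly doubles the product, a 3-cycle shift corresponds to an 8-fold difference, and a fold change near 20 would correspond to a shift of about 4.3 cycles.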