
    Factorbook: an Updated Catalog of Transcription Factor Motifs and Candidate Regulatory Motif Sites [preprint]

    The human genome contains roughly 1,600 transcription factors (TFs) (1), DNA-binding proteins that recognize characteristic sequence motifs to regulate gene expression. The binding specificities of these factors have been profiled both in vitro, using techniques such as HT-SELEX (2), and in vivo, using techniques such as ChIP-seq (3, 4). We previously developed Factorbook, a TF-centric database of annotations, motifs, and integrative analyses based on ChIP-seq data from Phase II of the ENCODE Project. Here we present an update to Factorbook that substantially expands its coverage of cell types and TFs. The update includes an expanded motif catalog derived from thousands of ENCODE Phase II and III ChIP-seq experiments and from HT-SELEX experiments; this catalog is integrated with the ENCODE registry of candidate cis-regulatory elements to annotate a comprehensive collection of genome-wide candidate TF binding sites. The database also offers new tools for applying the motif models within machine learning frameworks and for using these models in integrative analyses, including the annotation of variants and of disease and trait heritability. We will continue to expand the resource as ENCODE Phase IV data are released.
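To illustrate what applying a motif model to a sequence involves, here is a minimal sketch of position weight matrix (PWM) scanning on both strands. It assumes a motif stored as per-position base counts, a uniform 0.25 background, a pseudocount of 0.5 and a fixed log-odds threshold; these choices, and the names `pfm_to_log_odds` and `scan`, are illustrative assumptions, not Factorbook's actual pipeline or file format.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pfm_to_log_odds(counts, background=0.25, pseudocount=0.5):
    """Convert per-position base counts into log2-odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background) for b in BASES})
    return pwm


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def scan(seq, pwm, threshold):
    """Return (start, strand, score) for every window scoring at or above threshold.

    Starts are 0-based positions on the forward strand for hits on either strand.
    """
    seq = seq.upper()  # treat soft-masked (lower-case) bases like any other base
    width = len(pwm)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            if any(b not in BASES for b in window):
                continue  # skip windows containing N or other ambiguity codes
            score = sum(pwm[j][b] for j, b in enumerate(window))
            if score >= threshold:
                # convert reverse-strand offsets back to forward-strand coordinates
                start = i if strand == "+" else len(s) - i - width
                hits.append((start, strand, round(score, 3)))
    return hits
```

With a toy four-position GATA motif, scanning `"CCGATACC"` at a threshold of 5.0 reports a single forward-strand hit at position 2. Real pipelines typically convert raw scores to p-values (as FIMO does) rather than relying on a fixed cutoff.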

    IRaPPA: information retrieval based integration of biophysical models for protein assembly selection

    Motivation: In order to function, proteins frequently bind to one another and form 3D assemblies. Knowledge of the atomic details of these structures helps us understand how proteins work together and how mutations can lead to disease, and it facilitates the design of drugs that prevent or mimic the interaction. Results: Atomic modeling of protein-protein interactions requires selecting near-native structures from a set of docked poses based on their calculable properties. By treating this as an information retrieval problem, we adapted methods developed for Internet search ranking and electoral voting into IRaPPA, a pipeline that integrates biophysical properties. The approach improves the identification of near-native structures when applied to four docking methods, placing a near-native structure in the top 10 solutions for up to 50% of the benchmarked complexes, and in the top 100 for up to 70%. Availability and Implementation: IRaPPA has been implemented in the SwarmDock server (http://bmm.crick.ac.uk/~SwarmDock/), the pyDock server (http://life.bsc.es/pid/pydockrescoring/) and the ZDOCK server (http://zdock.umassmed.edu/), with code available on request. Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.
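The abstract says IRaPPA adapts electoral voting methods to rank docked poses. As a hedged illustration of that general idea (not IRaPPA's actual scoring functions or combination scheme), the sketch below applies a Borda count: each scoring function ranks all poses, a pose earns more points the higher it ranks, and poses are re-ranked by their summed points. The pose IDs, score values and the name `borda_aggregate` are hypothetical.

```python
def borda_aggregate(score_table, higher_is_better):
    """Re-rank poses by summed Borda points across several scoring functions.

    score_table: dict mapping pose id -> list of scores, one per scoring function
    higher_is_better: list of bools, one per scoring function
    Returns pose ids ordered from best to worst (ties broken by id).
    """
    poses = list(score_table)
    n = len(poses)
    points = {pose: 0 for pose in poses}
    for k, better_high in enumerate(higher_is_better):
        # rank every pose under scoring function k
        ordered = sorted(poses, key=lambda p: score_table[p][k], reverse=better_high)
        for rank, pose in enumerate(ordered):
            points[pose] += n - 1 - rank  # best pose gets n-1 points, worst gets 0
    return sorted(poses, key=lambda p: (-points[p], p))
```

One appeal of rank-based voting is that it needs no calibration between scores measured in different units, such as interaction energies and contact counts.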

    Consensus guidelines for the use and interpretation of angiogenesis assays

    The formation of new blood vessels, or angiogenesis, is a complex process that plays important roles in growth and development, in tissue and organ regeneration, and in numerous pathological conditions. Angiogenesis proceeds through multiple discrete steps that can be individually evaluated and quantified with a large number of bioassays. These independent assessments have advantages but also limitations. This article describes the in vivo, ex vivo, and in vitro bioassays available for evaluating angiogenesis and highlights the aspects critical to their execution and proper interpretation. As such, this collaborative work is the first edition of consensus guidelines on angiogenesis bioassays, intended to serve as a reference now and in the future.

    High-Throughput MicroRNA (miRNAs) Arrays Unravel the Prognostic Role of MiR-211 in Pancreatic Cancer

    BACKGROUND: Only a subset of patients with radically resected pancreatic ductal adenocarcinoma (PDAC) benefit from chemotherapy, so identifying prognostic factors is warranted. miRNAs have recently emerged as diagnostic biomarkers and innovative therapeutic targets, and high-throughput arrays open new opportunities to evaluate whether they can predict clinical outcome. The present study evaluated whether comprehensive miRNA expression profiling correlated with overall survival (OS) in patients with resected PDAC. METHODOLOGY/PRINCIPAL FINDINGS: High-resolution miRNA profiles were obtained with Toray's 3D-Gene™ miRNA chip, which detects more than 1200 human miRNAs. RNA was successfully isolated from paraffin-embedded primary tumors of 19 of 26 homogeneously treated stage pT3N1 patients (adjuvant gemcitabine 1000 mg/m²/day on days 1, 8 and 15, every 28 days), carefully selected according to their outcome (OS < 12 months, N = 13, vs. OS > 30 months, N = 6, i.e. short/long OS). Highly stringent statistics included the t-test, a distance matrix with Spearman rank correlation, and iterative approaches. Unsupervised hierarchical analysis revealed that the PDACs clustered according to their short/long-OS classification, while the feature selection algorithm RELIEF identified the top 4 miRNAs discriminating between the two groups. These miRNAs target more than 1500 transcripts, 169 of which are targeted by two or more of them. MiR-211 emerged as the best discriminating miRNA, with significantly higher expression in long- vs. short-OS patients. Its expression was subsequently assessed by quantitative PCR in an independent cohort of laser-microdissected PDACs from 60 resected patients treated with the same gemcitabine regimen. Patients with low miR-211 expression, dichotomized at the median value, had a significantly shorter median OS (14.8 months, 95% CI = 13.1-16.5, vs. 25.7 months, 95% CI = 16.2-35.1, log-rank P = 0.004). Multivariate analysis demonstrated that low miR-211 expression was an independent factor of poor prognosis (hazard ratio 2.3, P = 0.03) after adjusting for all the factors influencing outcome. CONCLUSIONS/SIGNIFICANCE: Through comprehensive microarray analysis and PCR validation, we identified miR-211 as a prognostic factor in resected PDAC. These results prompt further prospective studies and research on the biological role of miR-211 in PDAC.
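The validation step splits patients at the median miR-211 expression and compares overall survival between the two groups with a log-rank test. A minimal, dependency-free sketch of that comparison follows. The convention that values strictly below the median count as "low", the helper names and the toy data are assumptions for illustration; the study's own data and software are not reproduced here. The p-value uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal, so its upper tail equals `erfc(sqrt(x / 2))`.

```python
import math
from statistics import median


def split_by_median(expression, times, events):
    """Split patients into low/high groups at the median expression value.

    Assumption: 'low' means strictly below the median; ties go to 'high'.
    Returns ((low_times, low_events), (high_times, high_events)).
    """
    cut = median(expression)
    low = [i for i, x in enumerate(expression) if x < cut]
    high = [i for i, x in enumerate(expression) if x >= cut]

    def pick(indices, values):
        return [values[i] for i in indices]

    return (pick(low, times), pick(low, events)), (pick(high, times), pick(high, events))


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test (events: 1 = death observed, 0 = censored).

    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    o_minus_e = 0.0  # observed minus expected deaths in group A
    variance = 0.0   # hypergeometric variance, summed over event times
    for t in event_times:
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e == 1)
        d_a = sum(1 for tt, e, g in at_risk if tt == t and e == 1 and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = o_minus_e ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p
```

For real analyses, a maintained package such as lifelines (`lifelines.statistics.logrank_test`) would normally be preferred; the sketch only makes the observed-minus-expected bookkeeping explicit.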

    Delineation of chromatin states and transcription factor binding in mouse and tools for large-scale data integration

    The goal of the ENCODE project has been to characterize regulatory elements in the human genome, such as regions bound by transcription factors (TFs), regions of open chromatin, and regions with altered histone modifications. The ENCODE consortium has performed a large number of whole-genome experiments measuring TF binding, chromatin accessibility, gene expression, and histone modifications in a multitude of cell types and conditions in both human and mouse. In this dissertation I first describe the analysis of numerous datasets from mouse embryonic development, comprising 66 epigenomes together with chromatin accessibility and expression data across twelve tissues and seven time points. We defined chromatin states using histone modification data and performed integrative analyses on these states. We observed coordinated changes between histone mark signals at enhancers and promoters and gene expression. We detected evolutionarily conserved bivalent promoters that selectively silence ~3,400 genes, including hundreds of TFs regulating embryonic development. Second, I present a supervised method to predict TF binding across cell types, with features based on DNA sequence and on patterns in DNase I cleavage data. We found that sequence and DNase read-count features can outperform other features as well as state-of-the-art methods. I also describe our contribution to the ENCODE TF Binding DREAM challenge, for which we developed a method using multiscale features and extreme gradient boosting. Third, I describe the methods, tools, and computational infrastructure that we developed to handle large amounts of experimental data and metadata. These tools are fundamental to selecting and integrating large experimental datasets and are at the core of the pipelines described in this dissertation. Finally, I present the protein docking server I developed, as well as algorithms and routines for post-processing predictions and protein structures. Collectively, this body of work encompasses computational approaches to the analysis of chromatin states, gene regulation, and the integration of large experimental datasets.
    2021-08-31
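The dissertation mentions features derived from DNase I cleavage data at multiple scales without spelling them out. One plausible reading, sketched below as an assumption rather than the method actually used, is to sum per-base DNase I cut counts in windows of several widths centred on a candidate site and log-transform the totals. The function name and the default half-widths are hypothetical.

```python
import math


def multiscale_counts(cuts, center, half_widths=(5, 25, 100, 500)):
    """Sum DNase I cut counts in windows of increasing size centred on a site.

    cuts: per-base cleavage counts for a genomic segment (list of ints)
    center: 0-based index of the candidate site centre within cuts
    half_widths: hypothetical window half-widths in bp; each window spans
                 center - hw .. center + hw, clipped to the segment
    Returns one log1p-transformed total per window scale.
    """
    features = []
    for hw in half_widths:
        lo = max(0, center - hw)
        hi = min(len(cuts), center + hw + 1)
        features.append(math.log1p(sum(cuts[lo:hi])))
    return features
```

Narrow windows capture the footprint around the site itself, while wide windows summarize the accessibility of the surrounding region; vectors like these could be combined with sequence-derived features and passed to a gradient-boosted tree model.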

    Quantifying the Displacement of Mismatches in Multiple Sequence Alignment Benchmarks - Fig 1

    (A): The CS score compares columns in the reference to the corresponding columns in the query; the SP score compares all aligned pairs in the reference to the corresponding pairs in the query. (B): Toy alignments illustrating the different penalties for alignment shifts under the CS, SP, and SPdist (Eq 8) scores. All sequences in this example contain a short sequence motif, H-x-D-E, which has to be aligned correctly. Queries A and B contain the same number of mismatches (indicated by the same SP score for both queries); the only difference between them is that sequence 2 in query B is shifted five positions further. In this situation one might argue that, although both alignments are wrong (i.e. the sequence motifs are not aligned properly), query A is a slightly better alignment than B, because the motifs in query A are positioned closer to each other than in query B.
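Panel (A) distinguishes the column score (CS) from the sum-of-pairs (SP) score. A minimal sketch of both follows, assuming alignments given as equal-length gapped strings with '-' as the gap character and scoring every reference column (benchmarks such as BAliBASE often restrict scoring to core blocks, which this sketch ignores). The function names are hypothetical.

```python
from itertools import combinations


def columns(alignment):
    """Map each alignment column to a tuple of residue indices (None for a gap)."""
    counters = [0] * len(alignment)
    cols = []
    for c in range(len(alignment[0])):
        col = []
        for s, row in enumerate(alignment):
            if row[c] == "-":
                col.append(None)
            else:
                col.append(counters[s])
                counters[s] += 1
        cols.append(tuple(col))
    return cols


def aligned_pairs(alignment):
    """Set of (seq_a, residue_i, seq_b, residue_j) pairs placed in the same column."""
    pairs = set()
    for col in columns(alignment):
        for a, b in combinations(range(len(col)), 2):
            if col[a] is not None and col[b] is not None:
                pairs.add((a, col[a], b, col[b]))
    return pairs


def sp_score(reference, query):
    """Fraction of reference residue pairs that the query also aligns."""
    ref = aligned_pairs(reference)
    return len(ref & aligned_pairs(query)) / len(ref)


def cs_score(reference, query):
    """Fraction of reference columns reproduced exactly in the query."""
    ref_cols = columns(reference)
    query_cols = set(columns(query))
    return sum(col in query_cols for col in ref_cols) / len(ref_cols)
```

For a three-sequence reference `["AHDE", "AHDE", "AHDE"]` and a query in which only the last sequence gains a gap after its second residue (`["AHDE-", "AHDE-", "AH-DE"]`), `sp_score` gives 8/12 ≈ 0.67 while `cs_score` gives 0.5: SP still credits the pairs that remain correct inside a damaged column, whereas CS discards the whole column.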

    Distributions of residue-pair distances.

    The tools that perform less well in our benchmark show a relatively high number of peaks in the residue-pair distances in the range of about 50–350.

    The distribution of the shift score, SP score and the SPdist score.

    Here reference sets 1–5 of the BAliBASE dataset are used to calculate the scores. Compared with both the SP score and the shift score, the ranking of the methods changes at larger distance thresholds; note that SP score = SPdist(1). Note also that the variance of individual alignment scores remains high at large distance thresholds. The average values associated with the median values displayed in this figure are reproduced in S1 Table.
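The note that SP score = SPdist(1) suggests that SPdist relaxes SP by also crediting reference residue pairs that end up close together, rather than exactly aligned, in the query. Eq 8 is not reproduced here, so the sketch below uses an assumed definition: the distance of a reference pair is the number of columns separating its two residues in the query, and a pair counts if that distance is below the threshold, which makes threshold 1 reduce to the ordinary SP score. The function names are hypothetical.

```python
from itertools import combinations


def residue_columns(alignment):
    """For each sequence, the alignment column of every residue, in order."""
    return [[c for c, ch in enumerate(row) if ch != "-"] for row in alignment]


def reference_pairs(alignment):
    """Residue pairs (seq_a, residue_i, seq_b, residue_j) sharing a column."""
    positions = [{c: r for r, c in enumerate(cols)} for cols in residue_columns(alignment)]
    pairs = []
    for a, b in combinations(range(len(alignment)), 2):
        for c, i in positions[a].items():
            if c in positions[b]:
                pairs.append((a, i, b, positions[b][c]))
    return pairs


def pair_distances(reference, query):
    """Assumed distance: columns separating each reference pair's residues in the query."""
    query_cols = residue_columns(query)
    return [abs(query_cols[a][i] - query_cols[b][j]) for a, i, b, j in reference_pairs(reference)]


def sp_dist(reference, query, threshold):
    """Fraction of reference pairs whose query distance is below threshold."""
    distances = pair_distances(reference, query)
    return sum(d < threshold for d in distances) / len(distances)
```

Under this definition, a one-column shift is forgiven at threshold 2 but not at threshold 1, so sweeping the threshold shows how far misaligned residues have drifted, which is the behaviour the toy example in Fig 1 motivates.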

    The average SPdist(threshold) for each method against the distance thresholds.

    The shaded areas represent the standard error (SE = σ/√N). Here reference sets 1–5 of the BAliBASE dataset are used to calculate the SPdist score. Some methods are more accurate at small distance thresholds, while others converge better at large distances.
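The shaded bands are the standard error SE = σ/√N. A small sketch of computing the mean and SE of per-alignment SPdist values at each threshold follows, assuming σ is the sample standard deviation (n - 1 denominator) and N the number of alignments; the caption does not say which estimator was used, and the input layout is hypothetical.

```python
import math
from statistics import mean, stdev


def mean_and_se(scores):
    """Return (mean, standard error) with SE = sigma / sqrt(N).

    sigma is the sample standard deviation (n - 1 denominator, an assumption).
    """
    return mean(scores), stdev(scores) / math.sqrt(len(scores))


def spdist_curve(scores_by_threshold):
    """Map each distance threshold, in increasing order, to (mean, SE) across alignments.

    scores_by_threshold: dict {threshold: [SPdist of alignment 1, alignment 2, ...]}
    """
    return {t: mean_and_se(v) for t, v in sorted(scores_by_threshold.items())}
```

Plotting mean ± SE against the threshold for each method gives curves of the kind described in this caption.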

    The result page of the VerAlign web server.
