Search CORE

23 research outputs found

Mining Phenotypes for Protein Function Prediction

Author: Groth Philip
Leser Ulf
Pohlenz Hans-Dieter
Weiss Bertram
Publication venue: Dagstuhl Seminar Proceedings. 08131 - Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives
Publication date: 01/01/2008

Until very recently, phenotypes were rarely studied in a systematic manner. While ontologies for describing gene functions have a ten-year tradition, similar vocabularies for describing the phenotypes of genes are only now emerging; likewise, techniques for determining phenotypes on a large scale (especially RNAi) have been available for only a few years, whereas genomic sequencing and gene expression studies have been established for much longer. In this talk, we describe results from a study on exploiting phenotype descriptions for protein function prediction. We used data from PhenomicDB, a phenotype database integrated from several publicly available data sources. Due to the lack of standardization, phenotypes in PhenomicDB can only be viewed as text (short statements, abstracts, single terms, ...). We clustered these texts and analyzed the corresponding gene clusters in terms of the coherence of their functional annotation and their interconnectedness through protein-protein interactions. We also devised a method that uses close similarity between phenotype descriptions to predict the function of proteins. We show that this method yields very good precision at acceptable coverage.

DROPS Dagstuhl Research Online Publication Server

PhenomicDB: a new cross-species genotype/phenotype resource

Author: Georgiev Georgi
Groth Philip
Kalev Ivan
Pavlova Nadia
Pohlenz Hans-Dieter
Tonov Spas
Weiss Bertram
Publication venue: Oxford University Press
Publication date: 18/09/2006

Phenotypes are an important subject of biomedical research for which many repositories have already been created. Most of these databases are dedicated either to a single species or to a single disease of interest. With the advent of technologies to generate phenotypes in a high-throughput manner, not only the volume of phenotype data but also the need to organize these data in more useful ways is growing fast. We have created PhenomicDB (freely available at ), a multi-species genotype/phenotype database, which shows phenotypes associated with their corresponding genes and grouped by gene orthologies across a variety of species. We have recently enhanced PhenomicDB by incorporating quantitative and descriptive RNA interference (RNAi) screening data, by enabling the use of phenotype ontology terms, and by providing information on assays and cell lines. We envision that integrating classical phenotypes with high-throughput data will bring new momentum and insights to our understanding. Modern analysis tools under development may help exploit this wealth of information to transform it into knowledge and, eventually, into novel therapeutic approaches.

CiteSeerX

Crossref

PubMed Central

Mining phenotypes for gene function prediction

Author: A Kahraman
A Keller
AA Dobritsa
AJ Butte
B Hur
B Schwikowski
Bertram Weiss
BP Kelley
CR Scriver
D Kuttenkeuler
D Lin
D Sieburth
E SanJuana
EC Green
F Piano
G Pandey
G Roman
GJ Hannon
Hans-Dieter Pohlenz
JZ Wang
KA Kellerman
KC Gunsalus
KJ Gaulton
LB Vosshall
M Bate
M Steinbach
MA Huynen
MA van Driel
N Daraselia
N Freimer
P Bhandari
P Groth
Philip Groth
PW Lord
RM Cripps
S Jaeger
S Raychaudhuri
SC Rison
SD Brown
T Schupbach
U Nongthomba
Ulf Leser
US Eggert
V Mermall
V Spirin
X Guo
Y Lussier
Y Shi
Y Tao
Y Zhao
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract
Background: Health and disease of organisms are reflected in their phenotypes. Often, a genetic component of a disease is discovered only after its phenotype has been clearly defined. In recent years, many technologies for systematically generating phenotypes in a high-throughput manner, such as RNA interference or gene knock-out, have been developed and used to decipher gene functions. However, there have been relatively few efforts to make use of phenotype data beyond single genotype-phenotype relationships.
Results: We present results of a study in which we use a large set of phenotype data, in textual form, to predict gene annotation. To this end, we use text clustering to group genes based on their phenotype descriptions. We show that these clusters correlate well with several indicators of biological coherence in gene groups, such as functional annotations from the Gene Ontology (GO) and protein-protein interactions. We exploit these clusters to predict gene function by carrying over annotations from well-annotated genes to other, less-characterized genes in the same cluster. For a subset of groups selected by applying objective criteria, we can predict GO-term annotations from the biological process sub-ontology with up to 72.6% precision and 16.7% recall, as evaluated by cross-validation. We manually verified some of these clusters and found them to exhibit high biological coherence, e.g. a group containing all available antennal Drosophila odorant receptors despite inconsistent GO annotations.
Conclusion: The intrinsic nature of phenotypes to visibly reflect genetic activity underlines their usefulness in inferring new gene functions. Thus, systematically analyzing these data on a large scale offers many possibilities for inferring functional annotation of genes. We show that text clustering can play an important role in this process.
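The annotation-transfer idea in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: a bag-of-words cosine similarity with a fixed threshold stands in for their text clustering, and the gene names, phenotype texts, term labels, and the `min_sim` parameter are all hypothetical.

```python
from collections import Counter
from math import sqrt


def bag_of_words(text):
    """Lower-cased word counts for one phenotype description."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def predict_terms(gene, phenotypes, annotations, min_sim=0.5):
    """Carry over terms from annotated genes with a similar phenotype text.

    The similarity threshold is a stand-in for cluster membership.
    """
    query = bag_of_words(phenotypes[gene])
    predicted = set()
    for other, text in phenotypes.items():
        if other != gene and other in annotations:
            if cosine(query, bag_of_words(text)) >= min_sim:
                predicted |= annotations[other]
    return predicted


def precision_recall(predicted, actual):
    """Precision and recall of a predicted term set against a reference set."""
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical toy data, for illustration only.
phenotypes = {
    "geneA": "reduced olfactory response to odorant",
    "geneB": "reduced olfactory response to odorant stimulus",
    "geneC": "embryonic lethal with heart defect",
}
annotations = {"geneB": {"term:smell"}, "geneC": {"term:heart"}}

predicted = predict_terms("geneA", phenotypes, annotations)
scores = precision_recall(predicted, {"term:smell", "term:other"})
```

In a cross-validation setting like the one the abstract mentions, the known annotations of each held-out gene would play the role of the reference set passed to `precision_recall`.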

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Slow-Binding Inhibition of Escherichia coli

Author: Albrecht Messerschmidt
Bernd Laber
Hans-Dieter Pohlenz
Robert Huber
Tim Clausen
Publication venue: American Chemical Society (ACS)

Crossref

Deciphering Seed Sequence Based Off-Target Effects in a Large-Scale RNAi Reporter Screen for E-Cadherin Expression

Author: Barbara Nicke (794319)
Florian Sohler (794320)
Hans-Dieter Pohlenz (45029)
Robert Adams (77677)
Publication date: 11/09/2015

Functional RNAi-based screening is affected by large numbers of false-positive and false-negative hits due to prevalent sequence-based off-target effects. We performed a druggable-genome-targeting siRNA screen intended to identify novel regulators of the expression of E-cadherin (CDH1), a known key player in epithelial-mesenchymal transition (EMT). Analysis of primary screening results indicated a large number of false-positive hits. To address these difficulties we developed an analysis method, SENSORS, which, similar to published methods, is a seed enrichment strategy for analyzing siRNA off-targets in RNAi screens. Using our approach, we were able to demonstrate that accounting for seed-based off-target effects stratifies primary screening results and enables the discovery of additional screening hits. While traditional hit detection methods are prone to false-positive results that go undetected, we were able to identify false-positive hits robustly. The transcription factor MYBL1 was identified as a putative novel target required for CDH1 expression and verified experimentally. No siRNA pool targeting MYBL1 was present in the siRNA library used. Instead, MYBL1 was identified as a putative CDH1-regulating target solely on the basis of the SENSORS off-target score, i.e. as a gene that causes off-target effects downregulating E-cadherin expression.

Directory of Open Access Journals

The Francis Crick Institute

CDK5R1 false positive prediction and validation.

Author: Barbara Nicke (794319)
Florian Sohler (794320)
Hans-Dieter Pohlenz (45029)
Robert Adams (77677)

(A) ZEB1 and KRAS were the most significant off-targets in our screen causing E-cadherin upregulation, while CDH1 and MYBL1 are strong negative off-targets causing a loss of E-cadherin expression. The red dashed line is the hit threshold for primary screening data (shown on the y-axis). Pools that fell within the orange zone (i.e. pools showing a primary score above the primary screen threshold but with no significant off-target z-score) and that have at least one seed matching the strong positive off-targets are considered likely false positives (red circles). These pools were deconvoluted and validated experimentally. (B) Common seed analysis for the CDK5R1 pool. While no other siRNAs with the seed sequence GTACCTC exhibited a significant phenotypic score, some of the siRNAs with the seed sequence AACAATG (match in ZEB1 3’UTR) showed a phenotype similar to that of the CDK5R1 pool (red points). One seed sequence is present only in the CDK5R1 pool. (C) Deconvolution of CDK5R1 siRNAs. The siRNA containing the seed AACAATG (si16899) was the only one showing a significant upregulation of E-cadherin expression, while all other siRNAs targeted against CDK5R1 showed no phenotype. (D) C911 control. The C911 control for si16899 retained the phenotype of the unaltered siRNA, indicating that the observed phenotype is due to a seed sequence-mediated off-target effect. The ZEB1 C911 siRNA showed no phenotype, indicating that the ZEB1 phenotype is a true positive (on-target) result.
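The caption does not define the C911 control. As commonly described, a C911 control replaces bases 9 to 11 of the siRNA with their complement, which leaves the seed region intact while disrupting the on-target match. A minimal sketch under that assumption follows; the 19-nt sequence is hypothetical and is not si16899, and the DNA alphabet is used to match the seed notation in the caption.

```python
# Complement table for sequences written in the DNA alphabet.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def c911_control(sirna):
    """Return the C911 version of an siRNA: bases 9-11 (1-based) complemented."""
    if len(sirna) < 11:
        raise ValueError("sequence must be at least 11 nucleotides long")
    return sirna[:8] + sirna[8:11].translate(COMPLEMENT) + sirna[11:]


# Hypothetical siRNA whose nucleotides 2-8 are the seed AACAATG.
original = "TAACAATGGCATTCGATCG"
control = c911_control(original)
```

Because positions 2 to 8 are unchanged, a phenotype that persists in the C911 control points to a seed-mediated off-target effect, which is the reasoning applied in panel (D).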

The Francis Crick Institute

Seed based off-target effects in pooled siRNA screens.

Author: Barbara Nicke (794319)
Florian Sohler (794320)
Hans-Dieter Pohlenz (45029)
Robert Adams (77677)

(A) An on-target siRNA match is generally understood as a perfect match of nucleotides 1–19 of a 21-nucleotide-long siRNA guide strand within the coding sequence of an intended transcript [8]. We define an off-target heptamer seed match as a perfect match of nucleotides 2–8 of the guide strand within the 3′UTR of an unintended transcript. (B) While an on-target siRNA effect is limited to one or a few different transcripts, mostly of one gene, a seed match can occur in thousands of different transcripts and several times within one 3′UTR. (C) For pooled screens, the elucidation of seed-based off-target effects is much more complex than for individually screened siRNAs. The seeds of the three siRNAs in a pool may match thousands of transcripts and may cause unintended transcript silencing. For an on-target pool situation (left), it is always known which transcript knock-down the phenotype results from (yellow flash symbols near the transcript), while for the off-target situation it is unknown which on- or off-target transcript knock-down causes the phenotype of a pool (yellow and grey flash symbols near the pool).
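The heptamer seed definition in panel (A) can be sketched as a short search routine. This is an illustration only: sequences are assumed to be written 5'→3' in the DNA alphabet, and a "match" is interpreted as a 3′UTR site complementary to the seed, which is an assumption about the convention. Set `site_is_complement=False` to search for the seed as written instead. The sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def seed_of(guide):
    """Nucleotides 2-8 (1-based) of the guide strand: the heptamer seed."""
    return guide[1:8]


def reverse_complement(seq):
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_seed_sites(guide, utr, site_is_complement=True):
    """Count (possibly overlapping) perfect seed sites in a 3'UTR sequence."""
    seed = seed_of(guide)
    site = reverse_complement(seed) if site_is_complement else seed
    count, start = 0, utr.find(site)
    while start != -1:
        count += 1
        start = utr.find(site, start + 1)  # step by one to allow overlaps
    return count


# Hypothetical guide strand and 3'UTR fragment.
guide = "TAACAATGGCATTCGATCG"
utr = "GGCATTGTTAACATTGTTCC"
sites = count_seed_sites(guide, utr)
```

Applying the same count across every 3′UTR in a transcriptome makes the point of panel (B) concrete: a seven-nucleotide site is short enough to recur in many transcripts, and more than once within a single 3′UTR.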

The Francis Crick Institute

Seed enrichment visualizations for 4 high scoring off-targets.

Author: Barbara Nicke (794319)
Florian Sohler (794320)
Hans-Dieter Pohlenz (45029)
Robert Adams (77677)

Density curves show the tendency of high scoring positive (red) and negative (blue) off-targets. The x-axes show the rank of the indicated numbers of seeds while the density of the respective ranks is shown on the y-axes. The difference in trends of high and low scoring off-targets is clearly visible by left- and right-skewed densities, respectively. ZEB1 (top left) and CDH1 (bottom right) were the most significant off-targets observed.

The Francis Crick Institute

Primary screening results and expression of screened targets.

Author: Barbara Nicke (794319)
Florian Sohler (794320)
Hans-Dieter Pohlenz (45029)
Robert Adams (77677)

(A) Overlaid box and violin plot showing the primary screen phenotype distribution. Colored circles show effects of the ZEB1 positive control pool (red), the CDH1 pool (gold) and the CDK5R1 pool (blue), respectively. The dashed grey line indicates the hit threshold. (B) Histogram of log values of primary screening results combined with the expression status for a subset of 8,977 genes. The red dashed line indicates the hit threshold.

The Francis Crick Institute

Contingency table.

Author: Barbara Nicke (794319)
Florian Sohler (794320)
Hans-Dieter Pohlenz (45029)
Robert Adams (77677)

A total of 8,977 genes were assigned an absent or present expression status by integrating two gene expression data sets for PANC-1 cells under stringent criteria (only genes that are absent or present in all data sets were considered). For genes targeted by siRNA pools exhibiting a significant phenotype (primary screening hits), no significant difference between expressed and non-expressed genes could be detected (p = 0.32).
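The caption reports a p-value for a 2×2 contingency table (hit vs. non-hit, expressed vs. non-expressed) but does not name the test. As one possible approach, a two-sided Fisher's exact test can be sketched as below. The choice of test is an assumption, and the counts are hypothetical placeholders, not the study's data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: rows = hit / non-hit, columns = expressed / non-expressed.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

For a table as large as the 8,977 genes described here, a chi-square test of independence would be another reasonable choice.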

The Francis Crick Institute