
    Expression profiles of EFS signature genes.

    <p>Shown is a heatmap depicting the fold-changes of the top 10 informative genes from the predicted EFS signature. Rows represent genes and columns represent treatment groups. Cell colors indicate the strength and direction of differential expression relative to the corresponding control groups (red: upregulation, green: downregulation). Treatment groups which belong to different compound classes are separated by solid vertical lines. The respective classes are indicated by the color bar on top of the heatmap.</p>
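The caption does not say how the fold-changes were scaled. The sketch below assumes a log2 fold-change of mean treated versus mean control expression and maps its sign to the heatmap's red/green convention. The function names and the `tol` threshold are hypothetical.

```python
import math
from statistics import mean


def log2_fold_change(treated, control):
    """Assumed scaling: log2 of mean treated / mean control expression (positive values only)."""
    return math.log2(mean(treated) / mean(control))


def direction(lfc, tol=0.0):
    """Map a log fold-change to the caption's colors: red = up, green = down."""
    if lfc > tol:
        return "red"
    if lfc < -tol:
        return "green"
    return "neutral"
```

For example, a fourfold induction (`[8, 8, 8]` vs. `[2, 2, 2]`) gives a log2 fold-change of 2 and is drawn in red.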

    Evaluation of EFS-based and SR-based signatures on TG-GATEs data.

    <p>The ROC curves obtained from the different cross-validation folds were averaged at matching class-discrimination thresholds and drawn separately for each of the six classification methods. The classifiers evaluated here were trained on features selected using our EFS methodology (<b>A–C</b>) in conjunction with (<b>A</b>) the standard gene selection methods Golub-Ratio, PAM, SVM and RFE, (<b>B</b>) the statistical inference methods t-test, Wilcoxon rank-sum test and permutation test, or (<b>C</b>) all of the aforementioned methods. (<b>D</b>) The prediction accuracy was also determined for the SR signature-based models, and the corresponding ROC curves were generated as described above.</p>
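Averaging ROC curves across folds at shared thresholds is commonly called threshold averaging. The minimal sketch below assumes that each fold yields classifier scores and binary labels (1 = positive class) and that every fold contains both classes. The function names and the threshold grid are illustrative.

```python
from statistics import mean


def roc_point(scores, labels, threshold):
    """Return (FPR, TPR) when samples scoring >= threshold are called positive.
    Assumes the fold contains at least one positive and one negative sample."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return fp / neg, tp / pos


def threshold_averaged_roc(folds, thresholds):
    """Average FPR and TPR over folds at each shared threshold.

    folds: list of (scores, labels) pairs, one per cross-validation fold.
    """
    curve = []
    for t in thresholds:
        points = [roc_point(scores, labels, t) for scores, labels in folds]
        curve.append((mean(p[0] for p in points), mean(p[1] for p in points)))
    return curve
```

Averaging at fixed thresholds, rather than at fixed false-positive rates, keeps each averaged point tied to one concrete decision cutoff.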

    Comparison of predicted and known signature genes.

    <p>(<b>A</b>) The Venn diagram depicts the overlap between the informative genes of the EFS and SR signatures and the genes contained in 7 previously reported signatures. The symbols of known marker genes independently confirmed by our approach are listed in the intersection of the two sets. (<b>B</b>) This diagram illustrates the overlap between the two novel signatures predicted using the Ensemble Feature Selection (EFS) method and the Specificity Ratio (SR) method, respectively. (<b>C</b>) The histogram shows the absolute selection frequencies of the informative genes contained in the 2 novel signatures and the 7 previously published ones. Genes included in 2 or more signatures are named on top of the corresponding bars. (<b>D</b>) For each signature, a pathway enrichment analysis against KEGG was performed; the histogram shows which pathways were most frequently detected as enriched among the informative genes. Some of the KEGG pathways found to be significantly enriched in two or more signatures are listed on top of the respective bars.</p>
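The selection frequency in panel C counts how many signatures contain each gene. A minimal sketch, assuming each signature is given as a list of gene symbols. The signature names and genes in the usage example (`g1`, `g2`, ...) are placeholders, not data from the study.

```python
from collections import Counter


def selection_frequencies(signatures):
    """Count in how many signatures each gene occurs.
    A gene listed twice within one signature is counted once for that signature."""
    counts = Counter()
    for genes in signatures.values():
        counts.update(set(genes))
    return counts


def shared_genes(signatures, min_signatures=2):
    """Genes contained in at least `min_signatures` signatures, sorted by name."""
    counts = selection_frequencies(signatures)
    return sorted(g for g, c in counts.items() if c >= min_signatures)
```

For example, with placeholder signatures `{"EFS": ["g1", "g2", "g3"], "SR": ["g2", "g4"], "S1": ["g2", "g3"]}`, the genes named on top of the bars would be `g2` (3 signatures) and `g3` (2 signatures).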

    Selected compounds from TG-GATEs database.

    <p>The table lists the compounds from the TG-GATEs database that were included in our computational analysis of NGC-specific expression profiles. For each compound, the CAS number is provided as a reference. According to the annotation files from TG-GATEs, either corn oil (CO) or methyl cellulose (MC) was used as the vehicle for administration to rats. Of the three dose levels available in TG-GATEs, we selected the dose level for each compound individually on the basis of its tumorigenic dose rate 50 (TD<sub>50</sub>) known from published animal studies. Each treatment group comprised liver samples from 3 Sprague-Dawley rats. The corresponding IDs include the compound short name as well as the selected dose. Low Dose (LD) = 1/8 of LD<sub>50</sub>, Medium Dose (MD) = 1/4 of LD<sub>50</sub>, High Dose (HD) = 1/2 of LD<sub>50</sub>.</p>
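The three dose levels are fixed fractions of the LD<sub>50</sub>. The sketch below assumes the LD<sub>50</sub> is given in mg/kg and computes only the three candidate doses. The TD<sub>50</sub>-based choice between them is not specified in enough detail to reproduce, so it is left out. The value 2000 mg/kg in the usage example is hypothetical.

```python
from fractions import Fraction

# Fractions of the LD50 as stated in the table caption.
DOSE_FRACTIONS = {"LD": Fraction(1, 8), "MD": Fraction(1, 4), "HD": Fraction(1, 2)}


def dose_levels(ld50_mg_per_kg):
    """Return the low, medium and high dose (assumed unit: mg/kg) for a given LD50."""
    return {level: float(frac * ld50_mg_per_kg) for level, frac in DOSE_FRACTIONS.items()}
```

For a hypothetical LD<sub>50</sub> of 2000 mg/kg this yields 250, 500 and 1000 mg/kg.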

    Separation and classification of compounds based on EFS and SR signature.

    <p>(<b>A</b>) The dots correspond to different treatment groups and are colored according to the class of the compound used for treatment. Each treatment group was originally represented by a vector composed of the fold-changes of the 54 signature genes measured after 14 days of repeated dosing. To inspect the compound-specific expression profiles in a lower-dimensional space, these vectors were projected onto the first and second principal components obtained by PCA. To highlight the clusters of NGCs and NCs, convex hulls were drawn around the respective compounds. The compounds WY, MP and MCT were considered undefined due to ambiguous outcomes of published studies. (<b>B</b>) PCA plot analogous to (A), but generated on the basis of the SR signature. (<b>C</b>) The heatmaps depict the confidence of the predictions made by diverse classifiers when assessing the carcinogenic potential of GCs (AAF, DEN) and undefined compounds (MP, WY, MCT). Columns represent compounds and rows correspond to classifiers. The compound classes are indicated by the color bar on top. The discrimination between carcinogens (blue) and non-carcinogens (green) was based on the EFS signature. (<b>D</b>) Toxicogenomics-based assessment of the carcinogenic potential of GCs and undefined compounds using diverse classifiers that incorporate the SR signature genes as predictive features.</p>
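The projection in panel A can be sketched with NumPy's SVD. Rows are treatment groups and columns are the 54 signature-gene fold-changes. For the hulls, Andrew's monotone chain algorithm is swapped in as a standard choice, since the caption does not name the plotting tool. Function names are hypothetical.

```python
import numpy as np


def pca_2d(X):
    """Project rows of X (treatment groups x genes) onto the first two principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each gene across treatment groups
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                         # scores on PC1 and PC2


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]                # drop duplicated end points
```

In use, `convex_hull(pca_2d(X)[mask])` would outline one compound class, where `mask` is a hypothetical boolean selector for the NGC or NC treatment groups.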

    Evaluation of EFS-based and SR-based signatures on datasets from previous studies.

    <p>The bar plots depict the area under the ROC curve (AUC) achieved by specific prediction models built and evaluated on datasets used in related toxicogenomics studies. The prediction models differ in the adopted classifiers and in the incorporated signatures. On each dataset, two signatures were extracted using the EFS and SR methods, respectively, and compared with the signature from the original study in terms of classification performance. Each dataset corresponds to one panel (see panel headers), each signature is represented by a group of bars, and the classifiers are indicated by different colors (see legend).</p>
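The AUC shown in each bar equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, which is the Mann-Whitney formulation. A minimal sketch, assuming binary labels (1 = positive class) and counting ties as one half. The function name is hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic, which is fine for compound-level datasets of this size. A rank-based version would scale better.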

    ZBIT Bioinformatics Toolbox: A Web-Platform for Systems Biology and Expression Data Analysis

    <div><p>Bioinformatics analysis has become an integral part of research in biology. However, installing and using scientific software can be difficult and often requires technical expertise. Common obstacles include dependencies on specific operating systems or third-party libraries, missing graphical user interfaces and documentation, and nonstandard input and output formats. To make bioinformatics software easily accessible to researchers, we here present a web-based platform. The Center for Bioinformatics Tuebingen (ZBIT) Bioinformatics Toolbox provides web-based access to a collection of bioinformatics tools developed for systems biology, protein sequence annotation, and expression data analysis. Currently, the collection encompasses software for the conversion and processing of the community standards SBML and BioPAX, transcription factor analysis, and analysis of microarray data from transcriptomics and proteomics studies. All tools are hosted on a customized Galaxy instance and run on a dedicated computation cluster. Users only need a web browser and an active internet connection to benefit from this service. The web platform is designed to facilitate the use of the bioinformatics tools by researchers without an advanced technical background. Users can combine tools for complex analyses or use predefined, customizable workflows. All results are stored persistently and are reproducible. For each tool, we provide documentation, tutorials, and example data to maximize usability. The ZBIT Bioinformatics Toolbox is freely available at <a href="https://webservices.cs.uni-tuebingen.de/" target="_blank">https://webservices.cs.uni-tuebingen.de/</a>.</p></div>

    Methodologies used for signature extraction and compound classification.

    <p>(<b>A</b>) Ensemble Feature Selection method: First, the compounds are divided into a training set and a test set. Then <i>n</i> different subsamples (bootstraps), each containing 90% of the training compounds, are drawn at random. Gene rankings are generated on each subsample by <i>m</i> different algorithms, yielding <i>n×m</i> signatures. These individual signatures are subsequently merged and incorporated into diverse classifiers, which are applied to the test compounds to assess their performance. (<b>B</b>) Specificity Ratio method: After the dataset is split, the gene ranking is performed directly on the training compounds. A preliminary signature is derived from a gene ranking based on the specificity ratio. Genes exhibiting inconsistent expression profiles across the compounds of the primary class are then removed using a ROC-based filter. Finally, the performance of the filtered signature is evaluated on the test set.</p>
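Step (A) can be sketched as follows: draw <i>n</i> random 90% subsamples of the training compounds, let each of <i>m</i> rankers pick its top genes on every subsample, and merge the <i>n×m</i> signatures. Several details here are assumptions, not the published procedure. Subsampling is done without replacement. Each signature is the top-<i>k</i> genes of a ranking. Merging keeps genes selected in at least a fraction `min_freq` of all signatures. `mean_difference_ranker` is a hypothetical stand-in for methods such as Golub-Ratio or the t-test.

```python
import random
from collections import Counter


def efs_signature(train, rankers, n_subsamples=10, fraction=0.9, top_k=5, min_freq=0.5, seed=0):
    """Ensemble feature selection sketch.

    train: list of (profile, label) pairs; profile maps gene -> fold-change.
    rankers: m functions mapping such a list to genes ordered by importance.
    """
    rng = random.Random(seed)
    size = max(1, int(round(fraction * len(train))))
    votes = Counter()
    for _ in range(n_subsamples):
        subsample = rng.sample(train, size)          # drawn without replacement
        for rank in rankers:
            votes.update(rank(subsample)[:top_k])    # one of the n*m signatures
    total = n_subsamples * len(rankers)
    # Assumed merging rule: keep genes chosen in >= min_freq of all signatures.
    return sorted(g for g, c in votes.items() if c / total >= min_freq)


def mean_difference_ranker(samples):
    """Hypothetical ranker: order genes by |mean(class 1) - mean(class 0)|."""
    genes = samples[0][0].keys()

    def class_mean(gene, label):
        vals = [p[gene] for p, y in samples if y == label]
        return sum(vals) / len(vals) if vals else 0.0

    return sorted(genes, key=lambda g: abs(class_mean(g, 1) - class_mean(g, 0)), reverse=True)
```

The frequency threshold gives a stable signature: a gene must recur across many subsamples and rankers, not only score well once.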

    TFpredict and SABINE: Sequence-Based Prediction of Structural and Functional Characteristics of Transcription Factors

    <div><p>One of the key mechanisms of transcriptional control is the specific interaction between transcription factors (TFs) and <i>cis</i>-regulatory elements in gene promoters. Elucidating these specific protein-DNA interactions is crucial to gain insights into the complex regulatory mechanisms and networks underlying the adaptation of organisms to dynamically changing environmental conditions. As experimental techniques for determining TF binding sites are expensive and mostly performed for selected TFs only, accurate computational approaches are needed to analyze transcriptional regulation in eukaryotes on a genome-wide level. We implemented a four-step classification workflow that, for a given protein sequence, (1) discriminates TFs from other proteins, (2) determines the structural superclass of TFs, (3) identifies the DNA-binding domains of TFs and (4) predicts their <i>cis</i>-acting DNA motif. While existing tools were extended and adapted for the latter two prediction steps, the first two steps are based on a novel numeric sequence representation that combines existing knowledge from a BLAST scan with robust machine learning-based classification. By evaluating on a set of experimentally confirmed TFs and non-TFs, we demonstrate that our new protein sequence representation enables more reliable identification and structural classification of TFs than previously proposed sequence-derived features. The algorithms underlying our proposed methodology are implemented in the two complementary tools TFpredict and SABINE. The online and stand-alone versions of TFpredict and SABINE are freely available to academics at <a href="http://www.cogsys.cs.uni-tuebingen.de/software/TFpredict/" target="_blank">http://www.cogsys.cs.uni-tuebingen.de/software/TFpredict/</a> and <a href="http://www.cogsys.cs.uni-tuebingen.de/software/SABINE/" target="_blank">http://www.cogsys.cs.uni-tuebingen.de/software/SABINE/</a>.</p></div>

    Bioinformatics pipeline for the structural and functional annotation of transcription factors.

    <p>First, the input protein sequence is aligned to a non-redundant protein database using the BLAST heuristic. The bit score distributions of the TFs and non-TFs among the BLAST hits are represented by percentiles. These percentiles are used as features of SVM classifiers that discriminate TFs from non-TFs (Step 1). If a given protein sequence is classified as a TF, another SVM is applied to predict its structural superclass (Step 2). The tool InterProScan is used to predict the functional domains of the TF, and the DNA-binding domains among these are identified based on the associated GO terms (Step 3). Finally, the tool SABINE infers a DNA motif using an SVR-based algorithm (see Methods section) that takes the structural superclass and DNA-binding domains of the TF as input (Step 4).</p>
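Step 1 can be sketched as turning a list of BLAST hits into a fixed-length feature vector: percentiles of the TF hits' bit scores followed by those of the non-TF hits. The percentile set (0, 25, 50, 75, 100), linear interpolation, and zero-filling when a class has no hits are all assumptions. The published tool may choose differently. The SVM itself is not shown.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def bitscore_features(hits, qs=(0, 25, 50, 75, 100)):
    """hits: list of (bit_score, is_tf) pairs for one query sequence.

    Returns TF-hit percentiles followed by non-TF-hit percentiles.
    A class without hits contributes zeros (an assumption).
    """
    features = []
    for want_tf in (True, False):
        scores = [s for s, is_tf in hits if is_tf == want_tf]
        features.extend(percentile(scores, q) if scores else 0.0 for q in qs)
    return features
```

Summarizing each distribution by percentiles gives every query the same vector length, however many BLAST hits it has. This is what lets a standard SVM consume the BLAST evidence.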