
133 research outputs found

Simultaneous regression and classification for drug sensitivity prediction using an advanced random forest method

Author: Eckhart Lea
Gerstner Nico
Kehl Tim
Lenhof Hans-Peter
Lenhof Kerstin
Publication venue: Saarländische Universitäts- und Landesbibliothek
Publication date: 01/01/2022

Machine learning methods trained on cancer cell line panels are intensively studied for the prediction of optimal anti-cancer therapies. While classification approaches distinguish effective from ineffective drugs, regression approaches aim to quantify the degree of drug effectiveness. However, the high specificity of most anti-cancer drugs induces a skewed distribution of drug response values in favor of the more drug-resistant cell lines, negatively affecting the classification performance (class imbalance) and regression performance (regression imbalance) for the sensitive cell lines. Here, we present a novel approach called SimultAneoUs Regression and classificatiON Random Forests (SAURON-RF) based on the idea of performing a joint regression and classification analysis. We demonstrate that SAURON-RF improves the classification and regression performance for the sensitive cell lines at the expense of a moderate loss for the resistant ones. Furthermore, our results show that simultaneous classification and regression can be superior to regression or classification alone.

PubMed Central

Universaar

NightShift: NMR shift inference by general hybrid model training - a framework for NMR chemical shift prediction

Author: Andreas Hildebrandt
Anna Dehof
Hans-Peter Lenhof
Simon Loew
Publication venue: Springer Nature
Publication date: 01/01/2013

BACKGROUND: NMR chemical shift prediction plays an important role in various applications in computational biology. Among others, structure determination, structure optimization, and the scoring of docking results can profit from efficient and accurate chemical shift estimation from a three-dimensional model. A variety of NMR chemical shift prediction approaches have been presented in the past, but nearly all of these rely on laborious manual data set preparation, and the training itself is not automated, making retraining the model (e.g., when new data become available) or testing new models a time-consuming manual chore. RESULTS: In this work, we present the framework NightShift (NMR Shift Inference by General Hybrid Model Training), which enables automated data set generation as well as model training and evaluation of protein NMR chemical shift prediction. In addition to this main result – the NightShift framework itself – we describe the resulting, automatically generated data set and, as a proof of concept, a random forest model called Spinster that was built using the pipeline. CONCLUSION: By demonstrating that the performance of the automatically generated predictors is at least on par with the state of the art, we conclude that automated data set and predictor generation is well suited for the design of NMR chemical shift estimators. The framework can be downloaded from https://bitbucket.org/akdehof/nightshift. It requires the open source Biochemical Algorithms Library (BALL), and is available under the conditions of the GNU Lesser General Public License (LGPL). We additionally offer a browser-based user interface to our NightShift instance employing the Galaxy framework via https://ballaxy.bioinf.uni-sb.de/

Springer - Publisher Connector

PubMed Central

Glycosylation Patterns of Proteins Studied by Liquid Chromatography-Mass Spectrometry and Bioinformatic Tools

Author: Berger Peter
Hildebrandt Andreas
Hofmann Andreas
Huber Christian G.
Lenhof Hans Peter
Oberacher Herbert
Publication venue: Dagstuhl Seminar Proceedings. 05471 - Computational Proteomics
Publication date: 01/01/2006

Due to their extensive structural heterogeneity, the elucidation of glycosylation patterns in glycoproteins such as the subunits of chorionic gonadotropin (CG), CG-alpha and CG-beta, remains one of the most challenging problems in the proteomic analysis of posttranslational modifications. Consequently, glycosylation is usually studied after decomposition of the intact proteins to the proteolytic peptide level. However, with this approach all information about the combination of the different glycopeptides in the intact protein is lost. In this study we have, therefore, attempted to combine the results of glycan identification after tryptic digestion with molecular mass measurements on the intact glycoproteins. Despite the extremely high number of possible combinations of the glycans identified in the tryptic peptides by high-performance liquid chromatography-mass spectrometry (> 1,000 for CG-alpha and > 10,000 for CG-beta), the mass spectra of intact CG-alpha and CG-beta revealed only a limited number of glycoforms present in CG preparations from pools of pregnancy urines. Peak annotations for CG-alpha were performed with the help of an algorithm that generates a database containing all possible modifications of the proteins (including possible artificial modifications such as oxidation or truncation) and subsequently searches for combinations fitting the mass difference between the polypeptide backbone and the measured molecular masses. Fourteen different glycoforms of CG-alpha, including methionine-oxidized and N-terminally truncated forms, were readily identified. For CG-beta, however, the relatively high mass accuracy of ± 2 Da was still insufficient to unambiguously assign the possible combinations of posttranslational modifications. Finally, the mass spectrometric fingerprints of the intact molecules were shown to be very useful for the characterization of glycosylation patterns in different CG preparations.
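
The peak-annotation idea in this abstract (enumerate combinations of modifications, then keep those whose summed mass fits the difference between the measured mass and the polypeptide backbone) can be illustrated with a brute-force sketch. This is not the authors' algorithm: the site names, modification labels, and masses below are hypothetical round numbers chosen for illustration, and only the ± 2 Da tolerance is taken from the abstract.

```python
from itertools import product

def matching_glycoforms(mass_difference, site_options, tolerance=2.0):
    """Return every combination (one option per site) whose total added
    mass lies within `tolerance` Da of `mass_difference`."""
    sites = sorted(site_options)
    hits = []
    # Each element of `combo` is a (label, mass) pair for one site.
    for combo in product(*(site_options[s].items() for s in sites)):
        total = sum(mass for _, mass in combo)
        if abs(total - mass_difference) <= tolerance:
            hits.append({site: label for site, (label, _) in zip(sites, combo)})
    return hits

# Hypothetical modification table: site -> {label: added mass in Da}.
site_options = {
    "site_A": {"glycan_1": 1000.0, "glycan_2": 1500.0},
    "site_B": {"glycan_1": 1000.0, "glycan_3": 2200.0},
    "Met": {"unmodified": 0.0, "oxidized": 16.0},
}

# Measured intact mass minus backbone mass (hypothetical value).
hits = matching_glycoforms(2516.5, site_options)
for hit in hits:
    print(hit)
```

Exhaustive enumeration grows multiplicatively with the number of options per site, which is consistent with the abstract's remark that thousands of combinations are possible and that a ± 2 Da window can leave several of them indistinguishable.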

Dagstuhl Research Online Publication Server

Phylogenetics from paralogs

Author: Hellmuth Marc
Lechner Markus
Lenhof Hans-Peter
Middendorf Martin
Stadler Peter F.
Wieseke Nicolas
Publication venue: Fakultät 6 - Naturwissenschaftlich-Technische Fakultät I. Fachrichtung 6.2 - Informatik
Publication date: 01/01/2014

Motivation: Sequence-based phylogenetic approaches rely heavily on initial data sets being composed of orthologous sequences only. Paralogs are treated as a dangerous nuisance that has to be detected and removed. Recent advances in mathematical phylogenetics, however, have indicated that gene duplications can also convey meaningful phylogenetic information, provided orthologs and paralogs can be distinguished with a degree of certainty. Results: We demonstrate that plausible phylogenetic trees can be inferred from paralogy information only. To this end, tree-free estimates of orthology, the complement of paralogy, are first corrected to conform to cographs and then translated into equivalent event-labeled gene phylogenies. A certain subset of the triples displayed by these trees translates into constraints on the species trees. While the resolution is very poor for individual gene families, we observe that genome-wide data sets are sufficient to generate fully resolved phylogenetic trees of several groups of eubacteria. The novel method introduced here relies on solving three intertwined NP-hard optimization problems: the cograph editing problem, the maximum consistent triple set problem, and the least resolved tree problem. Implemented as an integer linear program (ILP), paralogy-based phylogenies can be computed exactly for up to some twenty species and their complete protein complements. Availability: The ILP formulation is implemented in the software ParaPhylo using IBM ILOG CPLEX (TM) Optimizer 12.6 and is freely available from http://pacosy.informatik.uni-leipzig.de/paraphyl

Universaar

GraBCas: a bioinformatics tool for score-based prediction of Caspase- and Granzyme B-cleavage sites in protein sequences

Author: Backes Christina
Comtesse Nicole
Kuentzer Jan
Lenhof Hans-Peter
Meese Eckart
Publication venue: Oxford University Press
Publication date: 27/06/2005

Caspases and granzyme B are proteases that share the primary specificity to cleave at the carboxyl terminal of aspartate residues in their substrates. Both caspases and granzyme B are enzymes that are involved in fundamental cellular processes and play a central role in apoptotic cell death. Although various targets are described, many substrates still await identification, and many cleavage sites of known substrates are not identified or experimentally verified. A more comprehensive knowledge of caspase and granzyme B substrates is essential to understand the biological roles of these enzymes in more detail. The relatively high variability in the cleavage site recognition sequence often complicates the identification of cleavage sites. As yet, there is no software available that allows identification of caspase and/or granzyme B cleavage sites differing from the consensus sequence. Here, we present a bioinformatics tool ‘GraBCas’ that provides score-based prediction of potential cleavage sites for the caspases 1–9 and granzyme B, including an estimation of the fragment size. We tested GraBCas on already known substrates and showed its usefulness for protein sequence analysis. GraBCas is available at

Crossref

PubMed Central

Transcriptome analysis by GeneTrail revealed regulation of functional categories in response to alterations of iron homeostasis in Arabidopsis thaliana

Author: Backes Christina
Bauer Petra
Keller Andreas
Lenhof Hans-Peter
Philippar Katrin
Schuler Mara
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: High-throughput technologies have opened new avenues to study biological processes and pathways. The interpretation of the immense number of data sets generated nowadays needs to be facilitated in order to enable biologists to identify complex gene networks and functional pathways. To cope with this task, multiple computer-based programs have been developed. GeneTrail is a freely available online tool that screens comparative transcriptomic data for differentially regulated functional categories and biological pathways extracted from common databases like KEGG, Gene Ontology (GO), TRANSPATH and TRANSFAC. Additionally, GeneTrail offers a feature that allows screening of individually defined biological categories that are relevant for the respective research topic. Results: We have set up GeneTrail for use with Arabidopsis thaliana. To test the functionality of this tool for plant analysis, we generated transcriptome data of root and leaf responses to Fe deficiency and the Arabidopsis metal homeostasis mutant nas4x-1. We performed Gene Set Enrichment Analysis (GSEA) with eight meaningful pairwise comparisons of transcriptome data sets. We were able to uncover several functional pathways, including metal homeostasis, that were affected in our experimental situations. Representation of the differentially regulated functional categories in Venn diagrams uncovered regulatory networks at the level of whole functional pathways. Over-Representation Analysis (ORA) of differentially regulated genes identified in pairwise comparisons revealed specific functional plant physiological categories as major targets upon Fe deficiency and in nas4x-1. Conclusion: Here, we obtained supporting evidence that the nas4x-1 mutant was defective in metal homeostasis. It was confirmed that nas4x-1 showed Fe deficiency in roots and signs of Fe deficiency and Fe sufficiency in leaves.
Besides metal homeostasis, biotic stress, root carbohydrate, leaf photosystem and specific cell biological categories were discovered as main targets for regulated changes in response to -Fe and nas4x-1. Among 258 differentially expressed genes in response to -Fe and nas4x-1, five functional categories were enriched, covering metal homeostasis, redox regulation, cell division and histone acetylation. We proved that GeneTrail offers a flexible and user-adapted way to identify functional categories in large-scale plant transcriptome data sets. The distinguishing feature that allowed analysis of individually assembled functional categories facilitated the study of the Arabidopsis thaliana transcriptome.
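
Over-representation analysis of the kind mentioned in this abstract is commonly computed as a one-sided hypergeometric test. The sketch below shows that standard calculation; whether GeneTrail uses exactly this test is an assumption, and apart from the 258 differentially expressed genes, every number in the example (genome size, category size, overlap) is hypothetical.

```python
from math import comb

def ora_p_value(n_universe, n_category, n_selected, n_overlap):
    """One-sided hypergeometric p-value: probability of drawing at least
    `n_overlap` category members when `n_selected` genes are sampled
    without replacement from `n_universe` genes, of which `n_category`
    belong to the category."""
    upper = min(n_category, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_category, k) * comb(n_universe - n_category, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# 258 selected genes (from the abstract); the other values are hypothetical.
p = ora_p_value(n_universe=20000, n_category=100, n_selected=258, n_overlap=8)
print(f"p = {p:.3g}")
```

In practice such p-values are computed for many categories at once and then corrected for multiple testing, a step this sketch omits.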

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Open Access LMU

Efficient Interpretation of Tandem Mass Tags in Top-Down Proteomics

Author: Althaus Ernst
Hildebrandt Andreas
Hildebrandt Anna Katharina
Hung Chien-Wen
Lenhof Hans-Peter
Tholey Andreas
Publication venue: OASIcs - OpenAccess Series in Informatics. German Conference on Bioinformatics 2013
Publication date: 01/01/2013

Mass spectrometry is the major analytical tool for the identification and quantification of proteins in biological samples. In so-called top-down proteomics, separation and mass spectrometric analysis are performed at the level of intact proteins, without preparatory digestion steps. It has been shown that the tandem mass tag (TMT) labeling technology, which is often used for quantification based on digested proteins (bottom-up studies), can be applied in top-down proteomics as well. This, however, leads to a complex interpretation problem, where we need to annotate measured peaks with their respective generating protein, the number of charges, and the a priori unknown number of TMT groups attached to this protein. In this work, we give an algorithm for the efficient enumeration of all valid annotations that fulfill available experimental constraints. Applying the algorithm to real-world data, we show that the annotation problem can indeed be efficiently solved. However, our experiments also demonstrate that reliable annotation in complex mixtures requires at least partial sequence information and high mass accuracy and resolution to go beyond the proof-of-concept stage.
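
The annotation problem stated in this abstract can be made concrete with a naive enumeration: for each peak, try every candidate protein, charge state, and tag count, and keep the triples whose predicted m/z matches. This is a brute-force illustration only, not the efficient algorithm the paper describes. The protein names and masses are hypothetical, the tag mass is passed in as a parameter (229.16 Da is used purely as an illustrative value), and the tolerance is an assumption.

```python
PROTON_MASS = 1.007276  # Da, mass added per positive charge

def annotate_peak(mz, proteins, tag_mass, max_tags, max_charge, tol=0.01):
    """Return all (protein, charge, tag_count) triples whose predicted
    m/z lies within `tol` of the measured value `mz`."""
    hits = []
    for name, mass in proteins.items():
        for z in range(1, max_charge + 1):
            for k in range(max_tags + 1):
                predicted = (mass + k * tag_mass + z * PROTON_MASS) / z
                if abs(predicted - mz) <= tol:
                    hits.append((name, z, k))
    return hits

# Hypothetical candidate proteins (neutral masses in Da).
proteins = {"protein_A": 10000.0, "protein_B": 12000.0}

# Simulate a peak from protein_A carrying 3 tags at charge 5.
mz = (10000.0 + 3 * 229.16 + 5 * PROTON_MASS) / 5
hits = annotate_peak(mz, proteins, tag_mass=229.16, max_tags=6, max_charge=10)
print(hits)
```

With a wider tolerance or more candidate proteins, several triples can match the same peak, which illustrates why the abstract calls for high mass accuracy and at least partial sequence information.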

Dagstuhl Research Online Publication Server

A minimally invasive multiple marker approach allows highly efficient detection of meningioma tumors

Author: Comtesse Nicole
Hildebrandt Andreas
Keller Andreas
Lenhof Hans-Peter
Ludwig Nicole
Meese Eckart
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The development of effective frameworks that permit an accurate diagnosis of tumors, especially in their early stages, remains a grand challenge in the field of bioinformatics. Our approach uses statistical learning techniques applied to multiple tumor antigen markers, utilizing the immune system as a very sensitive marker of molecular pathological processes. For validation purposes we chose intracranial meningioma tumors as a model system, since they occur very frequently, are mostly benign, and are genetically stable. RESULTS: A total of 183 blood samples from 93 meningioma patients (WHO stages I-III) and 90 healthy controls were screened for seroreactivity with a set of 57 meningioma-associated antigens. We tested several established statistical learning methods on the resulting reactivity patterns using 10-fold cross validation. The best performance was achieved by Naïve Bayes classifiers. With this classification method, our framework, called the Minimally Invasive Multiple Marker (MIMM) approach, yielded a specificity of 96.2%, a sensitivity of 84.5%, and an accuracy of 90.3%; the respective area under the ROC curve was 0.957. Detailed analysis revealed that prediction performs particularly well on low-grade (WHO I) tumors, consistent with our goal of early stage tumor detection. For these tumors the best classification result, with a specificity of 97.5%, a sensitivity of 91.3%, an accuracy of 95.6%, and an area under the ROC curve of 0.971, was achieved using a set of only 12 antigen markers. This antigen set was detected by a subset selection method based on mutual information. Remarkably, our study shows that the inclusion of non-specific antigens, detected not only in tumor but also in normal sera, increases the performance significantly, since non-specific antigens contribute additional diagnostic information.
CONCLUSION: Our approach offers the possibility to screen members of risk groups as a matter of routine, such that tumors can hopefully be diagnosed immediately after their genesis. Early detection will ultimately result in a higher cure rate and a lower morbidity rate.
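
The reported overall figures can be cross-checked with simple arithmetic: accuracy is the class-size-weighted average of sensitivity and specificity. The sketch below applies that identity to the numbers in the abstract (93 patients, 90 controls, sensitivity 84.5%, specificity 96.2%). Because the published values are averages over 10-fold cross validation, exact agreement is not guaranteed; the function name is ours.

```python
def accuracy_from_rates(sensitivity, specificity, n_positive, n_negative):
    """Overall accuracy implied by sensitivity and specificity for a
    given number of positive (patient) and negative (control) samples."""
    true_positives = sensitivity * n_positive
    true_negatives = specificity * n_negative
    return (true_positives + true_negatives) / (n_positive + n_negative)

acc = accuracy_from_rates(0.845, 0.962, n_positive=93, n_negative=90)
print(f"{acc:.1%}")  # prints 90.3%
```

The result agrees with the 90.3% accuracy stated in the abstract, so the three reported figures are mutually consistent for this cohort.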

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central