Application of multivariate statistics and machine learning to phenotypic imaging and chemical high-content data

Abstract

Image-based high-content screens (HCS) hold tremendous promise for cell-based phenotypic screens. Challenges related to HCS include not only storage and management of data, but also critical analysis of the complex image-based data. I implemented a data storage and screen management framework and developed approaches for data analysis of a number of high-content microscopy screen formats. I visualized and analysed pilot screens to develop a robust multi-parametric assay for the identification of genes involved in DNA damage repair in HeLa cells. Further, I developed and implemented new approaches for image processing and screen data normalization. My analyses revealed that the ubiquitin ligase RNF8 plays a central role in the DNA damage response and that a related ubiquitin ligase, RNF168, causes the cellular and developmental phenotypes characteristic of RIDDLE syndrome. My approaches also uncovered a role for the MMS22L-TONSL complex in DSB repair, specifically in the recombination-dependent repair of stalled or collapsed replication forks. The discovery of novel bioactive molecules is a challenge because the fraction of active candidate molecules is usually small and confounded by noise in experimental readouts. Cheminformatics can improve the robustness of chemical high-throughput screens and functional genomics data sets by taking structure-activity relationships into account. I applied statistics, machine learning and cheminformatics to different data sets to discern novel bioactive compounds. I showed that phenothiazines and apomorphines are regulators of cell differentiation in murine embryonic stem cells. Further, I pioneered computational methods for the identification of structural features that influence the degradation and retention of compounds in the nematode C. elegans. I used cheminformatics to assemble a comprehensive screening library of previously approved drugs for redeployment in new bioassays.
A combination of chemical-genetic interactions, cheminformatics and machine learning allowed me to predict novel synergistic antifungal small-molecule combinations from sensitized screens with the drug library. In another study on the biological effects of commonly prescribed psychoactive compounds, I discovered a strong link between lipophilicity and bioactivity of compounds in yeast, as well as unexpected off-target effects that could account for unwanted side effects in humans. I also investigated structure-activity relationships and assessed the chemical diversity of a compound collection that was used to probe chemical-genetic interactions in yeast. Finally, I have made these methods and tools available to the scientific community, including an open-source software package called MolClass that allows researchers to make predictions about the bioactivity of small molecules based on their chemical structure.
