The Partial Order Kernel and its Application to Understanding the Regulatory Grammar of Conserved Non-coding Elements

Abstract

Conserved non-coding elements (CNEs) are regions of non-coding DNA that have remained evolutionarily conserved across various species over millions of years. They are found to cluster near genes involved in early embryonic development, suggesting that they play an important role as regulatory elements. Indeed, many CNEs have been shown to act as enhancers; however, not all regulatory elements are conserved, and in some cases the deletion of CNEs did not result in any notable phenotype. These opposing findings indicate that the functions of CNEs are still poorly understood and that further research is needed to uncover the reasons for their extreme conservation. The aim of this thesis is to investigate the use and development of algorithms for decoding the regulatory grammar of CNEs. First, an assessment of several methods for the functional classification of CNEs is provided. The results obtained using these methods are validated by functional assays, and the limitations of these methods in capturing the grammar of CNEs are discussed. Motivated by these limitations, a partial order graph representation of the sequence of transcription factor binding sites (TFBSs) in a CNE is introduced, which allows overlapping sites to be handled efficiently. A dynamic programming-based method for aligning two such graphs and identifying regulatory signatures composed of co-occurring TFBSs is then proposed and evaluated. The results demonstrate the predictive ability of this method, which can be used to prioritise regions for experimental validation. Building on this method, the partial order kernel (POKer) is introduced for comparing strings that contain alternative substrings and are represented by partial order graphs. The POKer is evaluated in different sequence comparison tasks, including visual localisation. An approach using the POKer for the functional classification of CNEs is introduced, and its effectiveness in capturing the grammar of CNEs is demonstrated.
Finally, the implications of the results presented in this work for modelling the evolution of CNEs are discussed.
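As a rough illustration of how a partial order over TFBSs can accommodate overlapping sites, the following Python sketch treats each predicted site as an element and lets site a precede site b only when a ends before b starts. Overlapping sites are therefore incomparable, acting as alternative branches. Two such orders are then scored by a dynamic-programming search for the longest chain of matching factor labels that is ordered consistently in both, which generalises longest common subsequence to partial orders. This is a simplified stand-in, not the alignment method or the POKer described in the thesis: the function names, the `(start, end, label)` tuples, the unit match score and the toy labels (SOX, PAX, HOX, MEIS) are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical TFBS hit: (start, end, label), half-open [start, end), start < end.
Site = Tuple[int, int, str]


def precedes(a: Site, b: Site) -> bool:
    """Site a precedes site b if a ends before b starts.
    Overlapping sites are incomparable, i.e. alternative branches."""
    return a[1] <= b[0]


def common_chain_score(g1: List[Site], g2: List[Site]) -> int:
    """Length of the longest chain of matching labels ordered consistently
    in both partial orders (an LCS-style DP generalised to DAGs)."""
    # Sorting by start guarantees that every predecessor of a site comes first.
    g1, g2 = sorted(g1), sorted(g2)
    # score[i][j]: best chain that ends by matching g1[i] with g2[j].
    score = [[0] * len(g2) for _ in g1]
    best = 0
    for i, a in enumerate(g1):
        for j, b in enumerate(g2):
            if a[2] != b[2]:
                continue  # labels differ: these sites cannot be matched
            prev = 0
            for pi in range(i):
                if not precedes(g1[pi], a):
                    continue  # overlapping or later site: not a predecessor
                for pj in range(j):
                    if precedes(g2[pj], b):
                        prev = max(prev, score[pi][pj])
            score[i][j] = prev + 1
            best = max(best, score[i][j])
    return best


# Toy example: SOX and PAX overlap in g1; HOX and SOX overlap in g2.
g1 = [(0, 10, "SOX"), (5, 15, "PAX"), (20, 30, "HOX"), (40, 50, "MEIS")]
g2 = [(0, 8, "PAX"), (12, 20, "HOX"), (18, 26, "SOX"), (30, 40, "MEIS")]
print(common_chain_score(g1, g2))  # 3 (PAX -> HOX -> MEIS)
```

The naive double loop over predecessors keeps the recurrence easy to read; a practical implementation would iterate only over graph edges and would likely use a weighted rather than unit score.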
