The discovery of drugs that can effectively treat disease and alleviate pain is one of the core challenges facing modern medicine. The tools and techniques of machine learning have perhaps the greatest potential to provide a fast and efficient route toward the fabrication of novel and effective drugs. In particular, modern structured kernel methods have been successfully applied to a range of problem domains and have recently been adapted for graph structures, making them directly applicable to pharmaceutical drug discovery. Specifically, graph structures are a natural fit for molecular data, in that a molecule can be viewed as a graph whose nodes represent atoms and whose edges represent bonds. In this thesis we use graph kernels that utilize three different graph representations: molecular, topological pharmacophore and reduced graphs. We introduce a set of novel graph kernels which are based on a measure of the number of finite walks within a graph. To calculate this measure we employ a dynamic programming framework which allows us to extend graph kernels so that they can handle non-tottering walks, soft matching and the inclusion of gaps. In addition, we review several graph colouring methods and subsequently incorporate colour into our graph kernel models. These kernels are designed for molecule classification in general, although we show how they can be adapted to other areas in drug discovery. We conduct three sets of experiments and discuss how our augmented graph kernels are designed and adapted for these areas. First, we classify molecules based on their activity against a biological target. Second, we explore the related problem of lead hopping, in which one set of chemicals is used to predict another that is structurally dissimilar. We discuss the problems that arise because some patterns are filtered from the dataset. 
By analyzing lead hopping we are able to go beyond the typical cross-validation approach and construct a dataset that more accurately reflects real-world tasks. Lastly, we explore methods of integrating information from multiple targets. We test our models on a multi-response problem and later introduce a new approach that employs Kernel Canonical Correlation Analysis (KCCA) to predict the best molecules for an unseen target. Overall, we show that graph kernels achieve good results in classification, lead hopping and multiple-target experiments.
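
As a rough illustration of the walk-counting measure and the dynamic programming idea mentioned above, the following minimal sketch counts pairs of label-matching walks of bounded length in two small atom-labelled graphs. It is not the thesis's own kernel: the function names (`adjacency`, `walk_kernel`), the input format (a list of atom labels plus a list of undirected edges) and the choice to ignore bond labels are all assumptions made for illustration. This plain version still permits tottering walks and has no soft matching or gaps.

```python
def adjacency(num_nodes, edges):
    """Build neighbour lists for an undirected graph given as (a, b) pairs."""
    nbrs = [[] for _ in range(num_nodes)]
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    return nbrs


def walk_kernel(labels1, edges1, labels2, edges2, max_len):
    """Count pairs of walks (one per graph) with identical atom-label
    sequences, for every walk length from 0 to max_len edges.

    Hypothetical helper for illustration only; bond labels are ignored.
    """
    nbrs1 = adjacency(len(labels1), edges1)
    nbrs2 = adjacency(len(labels2), edges2)

    # match[u][v] is 1 when atom u of graph 1 and atom v of graph 2 share a label.
    match = [[1 if a == b else 0 for b in labels2] for a in labels1]

    # prev[u][v] = number of matching walk pairs of the current length
    # that end at (u, v). Length 0 is just the label match itself.
    prev = [row[:] for row in match]
    total = sum(map(sum, prev))

    for _ in range(max_len):
        cur = [[0] * len(labels2) for _ in labels1]
        for u in range(len(labels1)):
            for v in range(len(labels2)):
                if match[u][v]:
                    # Extend every shorter matching pair ending at a
                    # neighbour pair (a, b) by one step to (u, v).
                    cur[u][v] = sum(prev[a][b] for a in nbrs1[u] for b in nbrs2[v])
        prev = cur
        total += sum(map(sum, prev))
    return total


# Two toy "molecules": C-O and C-C-O.
g1 = (["C", "O"], [(0, 1)])
g2 = (["C", "C", "O"], [(0, 1), (1, 2)])
similarity = walk_kernel(*g1, *g2, max_len=1)
```

Each dynamic programming step reuses the counts from the previous walk length, so the cost grows linearly with `max_len` rather than with the number of walks, which is what makes walk-based measures tractable on molecular graphs.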