
    A systematic pathway-based network approach for in silico drug repositioning

    Drug repositioning, the practice of finding new uses for existing drugs, holds the potential to reduce the cost and time of drug development. Successful drug repositioning strategies depend heavily on the availability and aggregation of different drug and disease databases. Moreover, to better understand drug prioritisation approaches, it is necessary to objectively assess (benchmark) and compare different methods. Data aggregation requires extensive curation of non-standardised drug nomenclature. To overcome this, we used a graph-theoretic approach to construct a drug synonym resource that collected drug identifiers from a range of publicly available sources, establishing missing links between databases. This allowed us to systematically assess the performance of available in silico drug repositioning methodologies with increased power for scoring true positive drug-disease pairs. We developed a novel pathway-based drug repositioning pipeline based on a bipartite network of pathway- and drug-gene set correlations that captured functional relationships. To prioritise drugs, we combined this bipartite network with the differentially expressed pathways in a given disease, which formed a disease signature. We then took the cumulative network correlation between the disease pathway and drug signatures to generate a drug prioritisation score. We prioritised drugs for three case studies: juvenile idiopathic arthritis, Alzheimer's disease and Parkinson's disease. We explored the use of different true positive lists in the evaluation of drug repositioning performance, providing insight into the most appropriate benchmark designs. We identified several promising drug candidates and showed that our method successfully prioritises disease-modifying treatments over drugs offering symptomatic relief. We compared the pipeline's performance to an alternative well-established method and showed that our method has increased sensitivity to current treatment trends. The successful translation of drug candidates identified in this thesis has the potential to speed up the drug-discovery pipeline and thus more rapidly and efficiently deliver disease-modifying treatments to patients.
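    The abstract does not spell out the scoring formula, so the following is a minimal Python sketch of the cumulative-correlation idea, assuming a precomputed pathway-by-drug correlation matrix and a signed disease signature; all names, the random data and the sign convention are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    pathways = [f"pathway_{i}" for i in range(50)]
    drugs = [f"drug_{j}" for j in range(20)]
    # Bipartite network: correlation between each pathway gene set and each drug gene set.
    corr = rng.uniform(-1, 1, size=(len(pathways), len(drugs)))

    # Disease signature: differentially expressed pathways, signed by direction of change.
    disease_idx = [3, 7, 19, 30]
    direction = np.array([1, -1, 1, -1])

    # Cumulative network correlation between disease and drug signatures; the negation
    # ranks drugs whose correlations oppose the disease signature highest (an assumption).
    scores = -(direction[:, None] * corr[disease_idx, :]).sum(axis=0)
    for drug, s in sorted(zip(drugs, scores), key=lambda t: -t[1])[:5]:
        print(f"{drug}: {s:.2f}")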

    Ms Pac-Man versus Ghost Team CEC 2011 competition

    Games provide an ideal test bed for computational intelligence, and significant progress has been made in recent years, most notably in games such as Go, where the level of play is now competitive with expert human play on smaller boards. Recently, a significantly more complex class of games has received increasing attention: real-time video games. These games pose many new challenges, including strict time constraints, simultaneous moves and open-endedness. Unlike in traditional board games, computational play is generally unable to compete with human players. One driving force in improving the overall performance of artificial intelligence players is game competitions, where practitioners may evaluate and compare their methods against those submitted by others, and possibly against human players as well. In this paper we introduce a new competition based on the popular arcade video game Ms Pac-Man: Ms Pac-Man versus Ghost Team. The competition, to be held at the Congress on Evolutionary Computation 2011 for the first time, allows participants to develop controllers for either the Ms Pac-Man agent or for the Ghost Team; unlike previous Ms Pac-Man competitions that relied on screen capture, the players now interface directly with the game engine. We review previous work and discuss several aspects of setting up the game competition itself. © 2011 IEEE
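    As a rough illustration of the direct engine interface (the actual competition kit is Java-based; the class name, state layout and policy below are invented for this sketch), a controller must return a move within a strict per-tick time budget:

    # Hypothetical controller for the Ms Pac-Man side; a toy greedy policy.
    class MsPacManController:
        def get_move(self, game_state, time_due_ms):
            """Return 'UP', 'DOWN', 'LEFT' or 'RIGHT' before the deadline expires."""
            x, y = game_state["pacman_position"]
            # Head towards the nearest remaining pill (Manhattan distance).
            tx, ty = min(game_state["pill_positions"],
                         key=lambda p: abs(p[0] - x) + abs(p[1] - y))
            if abs(tx - x) > abs(ty - y):
                return "RIGHT" if tx > x else "LEFT"
            return "DOWN" if ty > y else "UP"

    controller = MsPacManController()
    state = {"pacman_position": (5, 5), "pill_positions": [(2, 5), (9, 9)]}
    print(controller.get_move(state, time_due_ms=40))  # strict real-time budget per move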

    Domino D5.1 - Metrics and analysis approach

    This deliverable presents the metrics proposed to assess the impact of innovations in the ATM system, together with a stylised agent-based model (ABM), called a ‘toy model’, to be used as a test ground for the metrics. Existing network metrics are reviewed and their limitations highlighted by applying them to real data. New metrics are then suggested to overcome these limitations, and their improved ability to measure interconnections and causal relationships between the elements of the ATM system is demonstrated on empirical case studies. The design of the toy model is presented and preliminary results of its baseline implementation are shown.
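    As a simple illustration of the kind of classical network metric the deliverable reviews (and whose limitations it highlights), the sketch below computes betweenness centrality over a toy airport network; the airports and edges are invented, and the deliverable's new metrics are not reproduced here.

    import networkx as nx

    G = nx.DiGraph()
    # Toy ATM network: airports as nodes, flight legs as directed edges.
    for origin, dest in [("LHR", "CDG"), ("CDG", "FCO"), ("LHR", "FCO"), ("FCO", "ATH")]:
        G.add_edge(origin, dest)

    # Betweenness centrality treats disturbance propagation as shortest-path flow,
    # one reason such metrics can miss causal relationships between ATM elements.
    centrality = nx.betweenness_centrality(G)
    print(sorted(centrality.items(), key=lambda kv: -kv[1]))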

    Unsupervised learning for anomaly detection in Australian medical payment data

    Fraudulent or wasteful medical insurance claims made by health care providers are costly for insurers. Typically, OECD healthcare organisations lose 3-8% of total expenditure due to fraud. As Australia's universal public health insurer, Medicare Australia, spends approximately A$34 billion per annum on the Medicare Benefits Schedule (MBS) and Pharmaceutical Benefits Scheme, wasted spending of A$1–2.7 billion could be expected. However, fewer than 1% of claims to Medicare Australia are detected as fraudulent, below international benchmarks. Variation is common in medicine, and health conditions, along with their presentation and treatment, are heterogeneous by nature. Increasing volumes of data and rapidly changing patterns bring challenges which require novel solutions. Machine learning and data mining are becoming commonplace in this field, but no gold standard is yet available. In this project, requirements are developed for real-world application to compliance analytics at the Australian Government Department of Health and Aged Care (DoH), covering: unsupervised learning; problem generalisation; human interpretability; context discovery; and cost prediction. Three novel methods are presented which rank providers by potentially recoverable costs. These methods use association analysis, topic modelling, and sequential pattern mining to provide interpretable, expert-editable models of typical provider claims. Anomalous providers are identified through comparison to the typical models, using metrics based on the costs of excess or upgraded services. Domain knowledge is incorporated in a machine-friendly way in two of the methods through the use of the MBS as an ontology. Validation by subject-matter experts and comparison to existing techniques shows that the methods perform well. The methods are implemented in a software framework which enables rapid prototyping and quality assurance. The code is implemented at the DoH, and further applications as decision-support systems are in progress. The developed requirements will apply to future work in this field.
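    A minimal sketch of the "potentially recoverable cost" ranking, assuming a typical-claims model already exists; the item numbers, fees and expected counts are invented, and the thesis's actual models (association rules, topics, sequential patterns) are not reproduced.

    # provider -> {MBS item: (annual count, unit fee in A$)}
    claims = {
        "provider_A": {"23": (900, 41.40), "36": (700, 81.70)},
        "provider_B": {"23": (1100, 41.40), "36": (80, 81.70)},
    }
    typical = {"23": 1000, "36": 100}  # expected counts from the typical-provider model

    def recoverable_cost(provider_claims):
        # Cost of services claimed in excess of the typical model.
        return sum(max(count - typical[item], 0) * fee
                   for item, (count, fee) in provider_claims.items())

    for p in sorted(claims, key=lambda p: recoverable_cost(claims[p]), reverse=True):
        print(p, f"A${recoverable_cost(claims[p]):,.2f}")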

    Integration and visualisation of clinical-omics datasets for medical knowledge discovery

    In recent decades, the rise of various omics fields has flooded the life sciences with unprecedented amounts of high-throughput data, which have transformed the way biomedical research is conducted. This trend will only intensify in the coming decades as the cost of data acquisition continues to decrease. There is therefore a pressing need to find novel ways to turn this ocean of raw data into waves of information, and finally to distil those into drops of translational medical knowledge. This is particularly challenging because of the incredible richness of these datasets, the humbling complexity of biological systems and the growing abundance of clinical metadata, which make the integration of disparate data sources even more difficult. Data integration has proven to be a promising avenue for knowledge discovery in biomedical research. Multi-omics studies allow us to examine a biological problem through different lenses, using more than one analytical platform. Such studies not only present tremendous opportunities for the deep and systematic understanding of health and disease, but also pose new statistical and computational challenges. The work presented in this thesis aims to alleviate this problem with a novel pipeline for omics data integration. Modern omics datasets are extremely feature-rich, and in multi-omics studies this complexity is compounded by a second or even third dataset. However, many of these features might be completely irrelevant to the studied biological problem, or redundant in the context of others. Therefore, in this thesis, clinical-metadata-driven feature selection is proposed as a viable option for narrowing down the focus of analyses in biomedical research. Our visual cortex has been fine-tuned over millions of years to become an outstanding pattern recognition machine. To leverage this incredible resource of the human brain, we need to develop advanced visualisation software that enables researchers to explore these vast biological datasets through illuminating charts and interactivity. Accordingly, a substantial portion of this PhD was dedicated to implementing truly novel visualisation methods for multi-omics studies.
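    A minimal sketch of clinical-metadata-driven feature selection, assuming an omics matrix X and a binary clinical label y drawn from the metadata; the data here are random stand-ins, and the choice of univariate test and k are illustrative rather than the thesis's pipeline.

    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif

    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 5000))    # 40 samples x 5,000 omics features
    y = rng.integers(0, 2, size=40)    # binary clinical label from the metadata

    # Keep only the features most associated with the clinical variable
    # before attempting any multi-omics integration.
    X_reduced = SelectKBest(score_func=f_classif, k=100).fit_transform(X, y)
    print(X_reduced.shape)  # (40, 100)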

    Computational approaches for analysing and engineering micropollutant degradation in microbial communities

    The presence of micropollutants in wastewater is problematic, as many micropollutants exert negative ecological and toxicological effects in their environment. A well-known effect of micropollutants is the feminisation of aquatic wildlife by environmental estrogens, a proportion of which enter water courses from municipal sources via wastewater treatment plants (WWTPs). While WWTPs remove some micropollutants, they are not designed to do so. Given that WWTPs already have high operating costs (both financially and energetically), there is a need for novel approaches to micropollutant removal that are both cost-effective and environmentally sustainable. One proposed approach is to use enzymes to degrade micropollutants, which requires an understanding of the metabolic pathways for the desired micropollutant and a strategy for deploying the enzymes in the environment. Although tools exist to assist with metabolic pathway prediction and enzyme discovery, there are currently no computational approaches able to identify enzymes from a user's collection of proteins, given a query compound and an expected change to that compound. To address this research gap, we developed EnSeP, a data-driven, transformation-specific approach to enzyme discovery. Using EnSeP, we then identified candidate enzymes involved in estradiol degradation. Recent advances in synthetic biology mean that deploying a single synthetic construct in multiple microorganisms is feasible. In the context of micropollutant metabolism, this means that a biodegradative pathway could be introduced into multiple organisms in a community simultaneously, providing more opportunities for the construct (and its functionality) to persist in the population long-term. However, current design tools have not yet been adapted for multiple-organism applications. To address this research gap, we developed an evolutionary algorithm (EA) that optimises a single coding sequence (CDS) for multiple hosts. Finally, based on insights from developing the EA, we developed an improved version of the single-organism CDS optimisation algorithm on which the EA is based.
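    A minimal sketch of the multi-host CDS optimisation idea, assuming per-host codon preference weights and a worst-host fitness; the protein, weights and EA operators below are invented and far simpler than the thesis's algorithm.

    import random

    protein = "MKT"  # toy protein sequence
    codons = {"M": ["ATG"], "K": ["AAA", "AAG"], "T": ["ACT", "ACC", "ACA", "ACG"]}
    # Hypothetical per-host codon preference weights (0-1), one dict per host.
    host_weights = [
        {"ATG": 1.0, "AAA": 0.8, "AAG": 0.2, "ACT": 0.1, "ACC": 0.7, "ACA": 0.1, "ACG": 0.1},
        {"ATG": 1.0, "AAA": 0.3, "AAG": 0.7, "ACT": 0.2, "ACC": 0.6, "ACA": 0.1, "ACG": 0.1},
    ]

    def random_cds():
        return [random.choice(codons[aa]) for aa in protein]

    def fitness(cds):
        # Worst-host score: favours sequences acceptable in all hosts simultaneously.
        return min(sum(w[c] for c in cds) / len(cds) for w in host_weights)

    pop = [random_cds() for _ in range(30)]
    for _ in range(50):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:10]
        children = []
        for p in parents:
            child = p.copy()
            i = random.randrange(len(child))  # point mutation: resample one codon
            child[i] = random.choice(codons[protein[i]])
            children.append(child)
        pop = parents + children + [random_cds() for _ in range(10)]
    best = max(pop, key=fitness)
    print("".join(best), f"fitness={fitness(best):.2f}")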

    Computer Aided Synthesis Prediction to Enable Augmented Chemical Discovery and Chemical Space Exploration

    The drug-like chemical space is estimated at 10^60 molecules, and the largest generated database (GDB), obtained by the Reymond group, contains 165 billion molecules with up to 17 heavy atoms. Furthermore, deep learning techniques to explore regions of chemical space are becoming more popular. However, the key to realising the generated structures experimentally lies in chemical synthesis, which was previously limited to manual planning or slow computer-assisted synthesis planning (CASP) models. Despite the 60-year history of CASP, few synthesis planning tools have been open-sourced to the community. In this thesis I co-led the development of, and investigated, one of the only fully open-source synthesis planning tools, AiZynthFinder, trained on both public and proprietary datasets consisting of up to 17.5 million reactions. This enables synthesis-guided exploration of chemical space in a high-throughput manner, bridging the gap between compound generation and experimental realisation. I first investigate both public and proprietary reaction data and their influence on route-finding capability. Furthermore, I develop metrics for the assessment of retrosynthetic prediction, single-step retrosynthesis models, and automated template extraction workflows, supplemented by a comparison of the underlying datasets and their corresponding models. Given the prevalence of ring systems in the GDB and the wider medicinal chemistry domain, I developed ‘Ring Breaker’, a data-driven approach to enable the prediction of ring-forming reactions. I demonstrate its utility on frequently found and unprecedented ring systems, in agreement with literature syntheses. Additionally, I highlight its potential for incorporation into CASP tools and outline methodological improvements to route-finding capability. To tackle the challenge of model throughput, I report a machine learning (ML) based classifier, the retrosynthetic accessibility score (RAscore), which assesses the likelihood of finding a synthetic route using AiZynthFinder. The RAscore computes at least 4,500 times faster than AiZynthFinder, opening the possibility of pre-screening millions of virtual molecules from enumerated databases or generative models for synthesis-informed compound prioritisation. Finally, I combine chemical library visualisation with synthetic route prediction to facilitate engagement with synthetic chemists. I enable the navigation of chemical property space by using interactive visualisation to deliver associated synthetic data as endpoints, which aids in the prioritisation of compounds. The ability to view synthetic route information alongside structural descriptors facilitates a feedback mechanism for the improvement of CASP tools and enables rapid hypothesis testing. I demonstrate the workflow as applied to the GDB databases to augment compound prioritisation and synthetic route design.
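    A minimal sketch of an RAscore-style classifier, assuming binary fingerprints as features and AiZynthFinder route-found/not-found labels; the random data, model choice and hyperparameters below are stand-ins, not the published model.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(2)
    X = rng.integers(0, 2, size=(500, 2048))  # stand-in for molecular fingerprints
    y = rng.integers(0, 2, size=500)          # 1 = AiZynthFinder found a route

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
    # Scoring a molecule takes milliseconds rather than a full tree search,
    # which is the source of the reported >4,500x speed-up over AiZynthFinder.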

    Novel Algorithm Development for ‘Next-Generation’ Sequencing Data Analysis

    In recent years, the decreasing cost of ‘next-generation’ sequencing has spawned numerous applications for interrogating whole genomes and transcriptomes in research, diagnostic and forensic settings. While the innovations in sequencing have been explosive, the development of scalable and robust bioinformatics software and algorithms for the analysis of the new types of data generated by these technologies has struggled to keep up. As a result, large volumes of NGS data available in public repositories are severely underutilised, despite providing a rich resource for data mining applications. Indeed, the bottleneck in genome and transcriptome sequencing experiments has shifted from data generation to bioinformatics analysis and interpretation. This thesis focuses on the development of novel bioinformatics software to bridge the gap between data availability and interpretation. The work is split between two core topics: computational prioritisation/identification of disease gene variants, and identification of RNA N6-adenosine methylation from sequencing data. The first chapter briefly discusses the emergence and establishment of NGS technology as a core tool in biology and its current applications and perspectives. Chapter 2 introduces the problem of variant prioritisation in the context of Mendelian disease, where tens of thousands of potential candidates are generated by a typical sequencing experiment. Novel software developed for candidate gene prioritisation is described that utilises data mining of tissue-specific gene expression profiles (Chapter 3). The second part of this work investigates an alternative approach to candidate variant prioritisation by leveraging functional and phenotypic descriptions of genes and diseases from multiple biomedical domain ontologies (Chapter 4). Chapter 5 discusses N6-adenosine methylation, a recently re-discovered post-transcriptional modification of RNA. The core of the chapter describes novel software developed for transcriptome-wide detection of this epitranscriptomic mark from sequencing data. Chapter 6 presents a case study application of the software, reporting the previously uncharacterised RNA methylome of Kaposi's Sarcoma Herpes Virus. The chapter further discusses a putative novel N6-methyladenosine RNA-binding protein and its possible roles in the progression of viral infection.
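    A minimal sketch of detecting N6-adenosine methylation from MeRIP/m6A-seq-style data, assuming per-window read coverage for an immunoprecipitated (IP) and an input library; the counts and significance threshold are illustrative, not the thesis's method.

    import numpy as np
    from scipy.stats import fisher_exact

    ip_cov = np.array([12, 15, 90, 110, 95, 14, 13])    # IP coverage per window
    input_cov = np.array([10, 12, 20, 22, 21, 11, 12])  # input coverage per window
    ip_total, input_total = ip_cov.sum(), input_cov.sum()

    for i, (a, b) in enumerate(zip(ip_cov, input_cov)):
        # Test IP enrichment of this window against the library-wide background.
        _, p = fisher_exact([[a, ip_total - a], [b, input_total - b]],
                            alternative="greater")
        if p < 0.01:
            print(f"window {i}: candidate m6A peak (IP={a}, input={b}, p={p:.2e})")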

    Examining approaches to target validation and drug repurposing in large scale genomic projects.

    Drug repurposing presents an opportunity to quickly produce new medications in a cost-effective manner. This is especially important in rare diseases, where patients are frequently underserved. Here, we apply various methods to first select good targets for repurposing. We analyse loss-of-function (LoF) data and assess its role in informing drug discovery. We achieve this by curating, aggregating and labelling LoF data, and then building a model to predict genes that may harbour homozygous LoF variants with no associated negative phenotypes. We produce a model with a relatively high degree of accuracy and recall (F-score 0.7), generating 442 predicted genes in addition to 1,744 from aggregation. Following this, we assess whether such data could inform drug discovery in collaboration with AbbVie, an industrial partner. Through the study of historic drug data, comparing our LoF labels with data from previous studies detailing the effect of genetic knowledge on drug discovery, and against the loss-of-function observed/expected upper bound fraction (LOEUF) score, a metric of constraint, we demonstrate that this data adds significant value to drug discovery. Finally, we build a database focussing on rare diseases and use LoF data, in addition to drug data and expertly curated gene panels, to nominate candidates for repurposing. This database will be made available to researchers within the GEL community, so that avenues for repurposing can be further explored.
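    A minimal sketch of the LoF-tolerance prediction step, assuming per-gene features (e.g. constraint scores, expression breadth) and curated tolerant/intolerant labels; the data below are random stand-ins, so the resulting F-score will not match the reported 0.7.

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(3)
    X = rng.normal(size=(2000, 12))    # 2,000 genes x 12 features
    y = rng.integers(0, 2, size=2000)  # 1 = tolerates homozygous LoF

    clf = GradientBoostingClassifier(random_state=0)
    f1 = cross_val_score(clf, X, y, cv=5, scoring="f1").mean()
    print(f"cross-validated F-score: {f1:.2f}")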