70 research outputs found

    Bayesian post-hoc regularization of random forests

    Random Forests are powerful ensemble learning algorithms widely used in various machine learning tasks. However, they have a tendency to overfit noisy or irrelevant features, which can result in decreased generalization performance. Post-hoc regularization techniques aim to mitigate this issue by modifying the structure of the learned ensemble after its training. Here, we propose Bayesian post-hoc regularization to leverage the reliable patterns captured by leaf nodes closer to the root, while potentially reducing the impact of more specific and potentially noisy leaf nodes deeper in the tree. This approach allows for a form of pruning that does not alter the general structure of the trees but rather adjusts the influence of leaf nodes based on their proximity to the root node. We have evaluated the performance of our method on various machine learning data sets. Our approach demonstrates competitive performance with the state-of-the-art methods and, in certain cases, surpasses them in terms of predictive accuracy and generalization

    Parea: multi-view ensemble clustering for cancer subtype discovery

    Multi-view clustering methods are essential for the stratification of patients into sub-groups of similar molecular characteristics. In recent years, a wide range of methods has been developed for this purpose. However, due to the high diversity of cancer-related data, a single method may not perform sufficiently well in all cases. We present Parea, a multi-view hierarchical ensemble clustering approach for disease subtype discovery. We demonstrate its performance on several machine learning benchmark datasets. We apply and validate our methodology on real-world multi-view cancer patient data. Parea outperforms the current state-of-the-art on six out of seven analysed cancer types. We have integrated the Parea method into our developed Python package Pyrea (https://github.com/mdbloice/Pyrea), which enables the effortless and flexible design of ensemble workflows while incorporating a wide range of fusion and clustering algorithms

    Explainable AI with counterfactual paths

    Explainable AI (XAI) is an increasingly important area of research in machine learning, which in principle aims to make black-box models transparent and interpretable. In this paper, we propose a novel approach to XAI that uses counterfactual paths generated by conditional permutations. Our method provides counterfactual explanations by identifying alternative paths that could have led to different outcomes. The proposed method is particularly suitable for generating explanations based on counterfactual paths in knowledge graphs. By examining hypothetical changes to the input data in the knowledge graph, we can systematically validate the behaviour of the model and examine the features or combination of features that are most important to the model's predictions. Our approach provides a more intuitive and interpretable explanation for the model's behaviour than traditional feature weighting methods and can help identify and mitigate biases in the model

    Degraded Arabinogalactans and Their Binding Properties to Cancer-Associated Human Galectins

    Galectins represent β-galactoside-binding proteins with numerous functions. Due to their role in tumor progression, human galectins-1, -3 and -7 (Gal-1, -3 and -7) are potential targets for cancer therapy. As plant derived glycans might act as galectin inhibitors, we prepared galactans by partial degradation of plant arabinogalactan-proteins. Besides commercially purchased galectins, we produced Gal-1 and -7 in a cell free system and tested binding capacities of the galectins to the galactans by biolayer-interferometry. Results for commercial and cell-free expressed galectins were comparable confirming functionality of the cell-free produced galectins. Our results revealed that galactans from Echinacea purpurea bind to Gal-1 and -7 with KD values of 1-2 µM and to Gal-3 slightly stronger with KD values between 0.36 and 0.70 µM depending on the sensor type. Galactans from the seagrass Zostera marina with higher branching of the galactan and higher content of uronic acids showed stronger binding to Gal-3 (0.08-0.28 µM) compared to galactan from Echinacea. The results contribute to knowledge on interactions between plant polysaccharides and galectins. Arabinogalactan-proteins have been identified as a new source for production of galactans with possible capability to act as galectin inhibitors

    Design, Synthesis and Characterisation of Inhibitors of 3-Deoxy-D-arabino-Heptulosonate 7-Phosphate Synthase

    The enzyme 3-deoxy-D-arabino-heptulosonate 7-phosphate (DAH7P) synthase catalyses the first step of the shikimate pathway. This pathway lies at the heart of bacterial metabolism, and is responsible for the synthesis of a variety of compounds essential to the chemistry of life; from the aromatic amino acids phenylalanine, tyrosine and tryptophan, to a number of aromatic and non-aromatic natural products. This thesis describes the design, synthesis and evaluation of inhibitors of DAH7P synthase. These inhibitors exploit a variety of strategies to interrupt the activity of DAH7P synthase, ranging from simple substrate mimicry to inhibitors that mimic unstable reaction intermediates; inhibitors that exploit metal coordination and entropic effects, and inhibitors that gain improved potency by interacting with multiple sites. In Chapter Two, the synthesis of a mimic for a proposed unstable reaction intermediate is described, and its interaction with DAH7P synthase characterised. The compound was prepared in twelve steps from D-arabinose, and was found to be a slow-tight binding inhibitor of Escherichia coli DAH7P synthase. In Chapter Three, a number of compounds are prepared that were designed to bind to the phosphoenolpyruvate subsite of the DAH7P synthase active site. The binding of these compounds to the enzyme is investigated in order to gain an understanding of the factors involved in DAH7P synthase inhibition. The enantiomeric phospholactates were prepared, and the extent of inhibition of E. coli DAH7P synthase was shown to be dependent on compound chirality. Several other phosphoenolpyruvate-like molecules were prepared, and were also shown to be effective DAH7P synthase inhibitors. In Chapter Four, extended compounds are designed that will bind the enzyme by multiple interactions at both substrate binding sites. Four compounds were prepared, and an increase in inhibitory potency was observed. In Chapter Five, computational techniques are explored to aid the interpretation of the inhibition of DAH7P synthase by the compounds prepared in these studies. Several approaches for more potent inhibition of this enzyme are outlined and discussed

    PopGenome: an efficient Swiss army knife for population genomic analyses in R

    Although many computer programs can perform population genetics calculations, they are typically limited in the analyses and data input formats they offer; few applications can process the large data sets produced by whole-genome resequencing projects. Furthermore, there is no coherent framework for the easy integration of new statistics into existing pipelines, hindering the development and application of new population genetics and genomics approaches. Here, we present PopGenome, a population genomics package for the R software environment (a de facto standard for statistical analyses). PopGenome can efficiently process genome-scale data as well as large sets of individual loci. It reads DNA alignments and single-nucleotide polymorphism (SNP) data sets in most common formats, including those used by the HapMap, 1000 human genomes, and 1001 Arabidopsis genomes projects. PopGenome also reads associated annotation files in GFF format, enabling users to easily define regions or classify SNPs based on their annotation; all analyses can also be applied to sliding windows. PopGenome offers a wide range of diverse population genetics analyses, including neutrality tests as well as statistics for population differentiation, linkage disequilibrium, and recombination. PopGenome is linked to Hudson's MS and Ewing's MSMS programs to assess statistical significance based on coalescent simulations. PopGenome's integration in R facilitates effortless and reproducible downstream analyses as well as the production of publication-quality graphics. Developers can easily incorporate new analysis methods into the PopGenome framework. PopGenome and R are freely available from CRAN for all major operating systems under the GNU General Public License

    Therapeutic opportunities within the DNA damage response

    The DNA damage response (DDR) is essential for maintaining the genomic integrity of the cell, and its disruption is one of the hallmarks of cancer. Classically, defects in the DDR have been exploited therapeutically in the treatment of cancer with radiation therapies or genotoxic chemotherapies. More recently, protein components of the DDR systems have been identified as promising avenues for targeted cancer therapeutics. Here, we present an in-depth analysis of the function, role in cancer and therapeutic potential of 450 expert-curated human DDR genes. We discuss the DDR drugs that have been approved by the US Food and Drug Administration (FDA) or that are under clinical investigation. We examine large-scale genomic and expression data for 15 cancers to identify deregulated components of the DDR, and we apply systematic computational analysis to identify DDR proteins that are amenable to modulation by small molecules, highlighting potential novel therapeutic targets

    Epigenetic regulation of prostate cancer

    Prostate cancer is a commonly diagnosed cancer in men and a leading cause of cancer deaths. Whilst the underlying mechanisms leading to prostate cancer are still to be determined, it is evident that both genetic and epigenetic changes contribute to the development and progression of this disease. Epigenetic changes involving DNA hypo- and hypermethylation, altered histone modifications and, more recently, changes in microRNA expression have been detected at a range of genes associated with prostate cancer. Furthermore, there is evidence that particular epigenetic changes are associated with different stages of the disease. Whilst early detection can lead to effective treatment, and androgen deprivation therapy has a high response rate, many tumours develop towards hormone-refractory prostate cancer, for which there is no successful treatment. Reliable markers for early detection and more effective treatment strategies are, therefore, needed. Consequently, there is considerable interest in the potential of epigenetic changes as markers or targets for therapy in prostate cancer. Epigenetic modifiers that demethylate DNA and inhibit histone deacetylases have recently been explored to reactivate silenced gene expression in cancer. However, further understanding of the mechanisms and the effects of chromatin modulation in prostate cancer is required. In this review, we examine the current literature on epigenetic changes associated with prostate cancer and discuss the potential use of epigenetic modifiers for treatment of this disease