140 research outputs found

    Inductive queries for a drug designing robot scientist

    It is increasingly clear that machine learning algorithms need to be integrated in an iterative scientific discovery loop, in which data is queried repeatedly by means of inductive queries and where the computer provides guidance to the experiments that are being performed. In this chapter, we summarise several key challenges in achieving this integration of machine learning and data mining algorithms in methods for the discovery of Quantitative Structure-Activity Relationships (QSARs). We introduce the concept of a robot scientist, in which all steps of the discovery process are automated; we discuss the representation of molecular data such that knowledge discovery tools can analyse it, and we discuss the adaptation of machine learning and data mining algorithms to guide QSAR experiments.

    Predicting gene function using hierarchical multi-label decision tree ensembles

    Background. S. cerevisiae, A. thaliana and M. musculus are well-studied organisms in biology and the sequencing of their genomes was completed many years ago. It is still a challenge, however, to develop methods that assign biological functions to the ORFs in these genomes automatically. Different machine learning methods have been proposed to this end, but it remains unclear which method is to be preferred in terms of predictive performance, efficiency and usability. Results. We study the use of decision tree based models for predicting the multiple functions of ORFs. First, we describe an algorithm for learning hierarchical multi-label decision trees. These can simultaneously predict all the functions of an ORF, while respecting a given hierarchy of gene functions (such as FunCat or GO). We present new results obtained with this algorithm, showing that the trees found by it exhibit clearly better predictive performance than the trees found by previously described methods. Nevertheless, the predictive performance of individual trees is lower than that of some recently proposed statistical learning methods. We show that ensembles of such trees are more accurate than single trees and are competitive with state-of-the-art statistical learning and functional linkage methods. Moreover, the ensemble method is computationally efficient and easy to use. Conclusions. Our results suggest that decision tree based methods are a state-of-the-art, efficient and easy-to-use approach to ORF function prediction.

    An annotated checklist of bryophytes of Europe, Macaronesia and Cyprus

    Introduction. Following on from work on the European bryophyte Red List, the taxonomically and nomenclaturally updated spreadsheets used for that project have been expanded into a new checklist for the bryophytes of Europe. Methods. A steering group of ten European bryologists was convened, and over the course of a year, the spreadsheets were compared with previous European checklists, and all changes noted. Recent literature was searched extensively. A taxonomic system was agreed, and the advice and expertise of many European bryologists sought. Key results. A new European checklist of bryophytes, comprising hornworts, liverworts and mosses, is presented. Fifteen new combinations are proposed. Conclusions. This checklist provides a snapshot of the current European bryophyte flora in 2019. It will already be out-of-date on publication, and further research, particularly molecular work, can be expected to result in many more changes over the next few years. Peer reviewed.

    Using classification and regression tree modelling to investigate response shift patterns in dentine hypersensitivity

    BACKGROUND: Dentine hypersensitivity (DH) affects people's quality of life (QoL). However, changes in the internal meaning of QoL, known as response shift (RS), may undermine longitudinal assessment of QoL. This study aimed to describe patterns of RS in people with DH using Classification and Regression Trees (CRT) and to explore the convergent validity of CRT with the then-test and ideals approaches. METHODS: Data from an 8-week clinical trial of mouthwashes for dentine hypersensitivity (n = 75), using the Dentine Hypersensitivity Experience Questionnaire (DHEQ) as the outcome measure, were analysed. CRT was used to examine 8-week changes in DHEQ total score as a dependent variable, with clinical status for DH and each DHEQ subscale score (restrictions, coping, social, emotional and identity) as independent variables. Recalibration was inferred when the clinical change was not consistent with the DHEQ change score, using a minimally important difference for DHEQ of 22 points. Reprioritization was inferred from changes in the relative importance of each subscale to the model over time. RESULTS: Overall, 50.7% of participants experienced a clinical improvement in their DH after treatment and 22.7% experienced an important improvement in their quality of life. Thirty-six per cent shifted their internal standards downward and 14.7% upwards, suggesting recalibration. Reprioritization occurred over time among the social and emotional impacts of DH. CONCLUSIONS: CRT was a useful method to reveal both the types and nature of RS in people with a mild health condition and demonstrated convergent validity with design-based approaches to detect RS.

    Complex Aggregates over Clusters of Elements

    Complex aggregates have been proposed as a way to bridge the gap between approaches that handle sets by imposing conditions on specific elements, and approaches that handle them by imposing conditions on aggregated values. A complex aggregate summarises a subset of the elements in a set, where this subset is defined by conditions on the attribute values. In this paper, we present a new type of complex aggregate, where this subset is defined to be a cluster of the set. This is useful if subsets that are relevant for the task at hand are difficult to describe in terms of attribute conditions. This work is motivated by the analysis of flow cytometry data, where the sets are cells, and the subsets are cell populations. We describe two approaches to aggregate over clusters on an abstract level, and validate one of them empirically, motivating future research in this direction.

    Evolution and networks in ancient and widespread symbioses between Mucoromycotina and liverworts

    Like the majority of land plants, liverworts regularly form intimate symbioses with arbuscular mycorrhizal fungi (Glomeromycotina). Recent phylogenetic and physiological studies report that they also form intimate symbioses with Mucoromycotina fungi and that some of these, like those involving Glomeromycotina, represent nutritional mutualisms. To compare these symbioses, we carried out a global analysis of Mucoromycotina fungi in liverworts and other plants using species delimitation, ancestral reconstruction, and network analyses. We found that Mucoromycotina are more common and diverse symbionts of liverworts than previously thought, globally distributed, ancestral, and often co-occur with Glomeromycotina within plants. However, our results also suggest that the associations formed by Mucoromycotina fungi are fundamentally different because, unlike Glomeromycotina, they may have evolved multiple times and their symbiotic networks are un-nested (i.e., not forming nested subsets of species). We infer that the global Mucoromycotina symbiosis is evolutionarily and ecologically distinctive.