25 research outputs found

    Comparative evaluation of set-level techniques in predictive classification of gene expression samples

    Background: Analysis of gene expression data in terms of a priori-defined gene sets has recently received significant attention, as this approach typically yields more compact and interpretable results than those produced by traditional methods that rely on individual genes. The set-level strategy can also be adopted with similar benefits in predictive classification tasks accomplished with machine learning algorithms. Initial studies into the predictive performance of set-level classifiers have yielded rather controversial results. The goal of this study is to provide a more conclusive evaluation by testing various components of the set-level framework within a large collection of machine learning experiments. Results: Genuine curated gene sets constitute better features for classification than sets assembled without biological relevance. For identifying the best gene sets for classification, the Global test outperforms the gene-set methods GSEA and SAM-GS as well as two generic feature selection methods. To aggregate expressions of genes into a feature value, the singular value decomposition (SVD) method as well as the SetSig technique improve on simple arithmetic averaging. Set-level classifiers learned with 10 features constituted by the Global test slightly outperform baseline gene-level classifiers learned with all original data features, although they are slightly less accurate than gene-level classifiers learned with a prior feature-selection step. Conclusion: Set-level classifiers do not boost predictive accuracy; however, they do achieve competitive accuracy if learned with the right combination of ingredients. Availability: Open-source, publicly available software was used for classifier learning and testing. The gene expression datasets and the gene set database used are also publicly available. The full tabulation of experimental results is available a
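
    The aggregation step mentioned in the abstract above (collapsing the expressions of a gene set into one feature value per sample) can be illustrated with a small sketch. This is only an assumption-laden toy, not the study's code: the function names `mean_aggregate` and `svd_aggregate`, the dictionary layout, and the choice of the first right singular vector of the gene-centered matrix as the "SVD" feature are all hypothetical choices made for illustration.

```python
from math import sqrt


def mean_aggregate(expr, gene_set):
    """Arithmetic average of the set's genes, one value per sample.

    expr maps a gene name to its list of per-sample expression values
    (hypothetical layout chosen for this sketch).
    """
    genes = [g for g in gene_set if g in expr]
    n_samples = len(expr[genes[0]])
    return [sum(expr[g][s] for g in genes) / len(genes) for s in range(n_samples)]


def svd_aggregate(expr, gene_set, iters=50):
    """First right singular vector of the gene-centered (genes x samples)
    matrix, found by power iteration. One score per sample."""
    genes = [g for g in gene_set if g in expr]
    rows = []
    for g in genes:
        vals = expr[g]
        mean = sum(vals) / len(vals)
        rows.append([x - mean for x in vals])  # center each gene

    # Start from the centered gene with the largest norm (deterministic).
    v = list(max(rows, key=lambda r: sum(x * x for x in r)))
    norm = sqrt(sum(x * x for x in v))
    if norm == 0:
        return [0.0] * len(v)  # no variation in the set at all
    v = [x / norm for x in v]

    for _ in range(iters):
        u = [sum(a * b for a, b in zip(r, v)) for r in rows]        # u = X v
        w = [sum(rows[i][s] * u[i] for i in range(len(rows)))       # w = X^T u
             for s in range(len(v))]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Resolve the sign ambiguity: make the gene loadings sum to a positive value.
    u = [sum(a * b for a, b in zip(r, v)) for r in rows]
    if sum(u) < 0:
        v = [-x for x in v]
    return v


expr = {"g1": [1, 2, 3], "g2": [2, 4, 6], "g3": [10, 10, 10]}
print(mean_aggregate(expr, ["g1", "g2"]))
print(svd_aggregate(expr, ["g1", "g2"]))
```

    The averaged feature weights every gene equally, whereas the SVD-based feature follows the dominant direction of co-variation within the set, which is one plausible reason it could behave differently from plain averaging.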

    A Performance Prediction Module for Workflow Scheduling

    Through the years, scientific applications have demanded more powerful and sophisticated computing environments and management techniques. Workflows have facilitated the design and management of scientific applications. The complexity of today's workflows demands a large amount of resources and mechanisms for provisioning them. The execution of scientific workflow applications is a complex task and depends on how the resources are assigned. Scheduling is the name given to the process that assigns computing resources to the tasks that make up a workflow. This work presents a scheduling algorithm (PPSA) for workflows tightly coupled to a performance prediction module (PEM). A set of experiments was developed for measuring the performance of the algorithm using the information provided by the proposed performance module. The proposed algorithm is compared with an algorithm included in the well-known workflow middleware systems Condor DAGMan and ASKALON. Sociedad Argentina de Informática e Investigación Operativ
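
    To make the idea of prediction-driven scheduling concrete, here is a minimal sketch of a greedy scheduler that places each workflow task on the resource with the earliest predicted finish time. It is not PPSA and does not model PEM: the function `schedule_workflow`, the `deps` and `predicted` structures, and the greedy rule itself are assumptions introduced purely for illustration.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def schedule_workflow(deps, predicted, resources):
    """Greedy earliest-finish-time placement.

    deps:      task -> list of tasks it depends on (a DAG)
    predicted: task -> {resource: predicted runtime}, standing in for the
               output of some performance prediction module (hypothetical)
    Returns (schedule, makespan), where schedule maps
    task -> (resource, start, finish).
    """
    free_at = {r: 0 for r in resources}  # when each resource becomes idle
    schedule = {}

    # Visit tasks so that every task comes after its dependencies.
    for task in TopologicalSorter(deps).static_order():
        # A task cannot start before all of its parents have finished.
        ready = max((schedule[p][2] for p in deps.get(task, [])), default=0)

        best = None
        for r in resources:
            start = max(free_at[r], ready)
            finish = start + predicted[task][r]
            if best is None or finish < best[2]:
                best = (r, start, finish)

        schedule[task] = best
        free_at[best[0]] = best[2]

    makespan = max(finish for _, _, finish in schedule.values())
    return schedule, makespan


deps = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
predicted = {
    "A": {"r1": 2, "r2": 4},
    "B": {"r1": 3, "r2": 5},
    "C": {"r1": 3, "r2": 5},
    "D": {"r1": 1, "r2": 2},
}
schedule, makespan = schedule_workflow(deps, predicted, ["r1", "r2"])
print(schedule, makespan)
```

    The quality of such a schedule depends directly on how accurate the predicted runtimes are, which is why a scheduler of this kind is naturally coupled to a prediction component.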

    Atomically sharp domain walls in an antiferromagnet

    The interest in understanding scaling limits of magnetic textures such as domain walls spans the entire field of magnetism from its relativistic quantum fundamentals to applications in information technologies. The traditional focus of the field on ferromagnets has recently started to shift towards antiferromagnets which offer a rich materials landscape and utility in ultra-fast and neuromorphic devices insensitive to magnetic field perturbations. Here we report the observation that domain walls in an epitaxial crystal of antiferromagnetic CuMnAs can be atomically sharp. We reveal this ultimate domain wall scaling limit using differential phase contrast imaging within aberration-corrected scanning transmission electron microscopy, which we complement by X-ray magnetic dichroism microscopy and ab initio calculations. We highlight that the atomically sharp domain walls are outside the remits of established spin-Hamiltonian theories and can offer device functionalities unparalleled in ferromagnets. Comment: 8 pages, 4 figures, Supplementary informatio

    Prediction of DNA-binding propensity of proteins by the ball-histogram method using automatic template search

    We contribute a novel, ball-histogram approach to DNA-binding propensity prediction of proteins. Unlike state-of-the-art methods based on constructing an ad-hoc set of features describing physicochemical properties of the proteins, the ball-histogram technique enables a systematic, Monte-Carlo exploration of the spatial distribution of amino acids complying with automatically selected properties. This exploration yields a model for the prediction of DNA binding propensity. We validate our method in prediction experiments, improving on state-of-the-art accuracies. Moreover, our method also provides interpretable features involving spatial distributions of selected amino acids

    Efficiency-conscious propositionalization for relational learning

    Systems aiming at discovering interesting knowledge in data, now commonly called data mining systems, are typically employed in finding patterns in a single relational table. Most mainstream data mining tools are not applicable to the more challenging task of finding knowledge in structured data represented by a multi-relational database. Although a family of methods known as inductive logic programming has been developed to tackle that challenge directly, the idea of adapting structured data into a simpler form digestible by the wealth of attribute-value learning (AVL) systems has always been tempting to data miners. To this end, we present a method based on constructing first-order logic features that conducts this kind of conversion, also known as propositionalization. It incorporates some basic principles suggested in previous research and provides significant enhancements that lead to remarkable improvements in the efficiency of the feature-construction process. We begin by motivating the propositionalization task with an illustrative example, review some previous approaches to propositionalization, and formalize the concept of a first-order feature, elaborating mainly on the points that influence the efficiency of the designed feature-construction algorithm
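
    As a rough illustration of what propositionalization means, the sketch below flattens a tiny two-table dataset into a single attribute-value table by evaluating existential features over related rows. It is a toy under stated assumptions, not the feature-construction algorithm of the paper: the `propositionalize` function, the customer/order data, and the hand-written features are all hypothetical.

```python
def propositionalize(individuals, related, features):
    """Turn multi-relational data into one row per individual.

    individuals: list of individual ids (the main table)
    related:     id -> list of related records (a linked table)
    features:    name -> predicate over a related record; each feature is
                 read existentially: "some related record satisfies it"
    """
    table = []
    for ind in individuals:
        rows = related.get(ind, [])
        record = {"id": ind}
        for name, pred in features.items():
            record[name] = any(pred(r) for r in rows)
        table.append(record)
    return table


customers = ["c1", "c2", "c3"]
orders = {
    "c1": [{"amount": 120, "item": "book"}],
    "c2": [{"amount": 30, "item": "pen"}, {"amount": 15, "item": "book"}],
    # c3 has no orders at all
}
features = {
    "has_large_order": lambda o: o["amount"] > 100,
    "bought_book": lambda o: o["item"] == "book",
}

for row in propositionalize(customers, orders, features):
    print(row)
```

    Each resulting row has a fixed set of boolean attributes, so any single-table learner can consume it. The cost lies in generating and evaluating many candidate features, which is the efficiency concern the abstract refers to.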